Classification and Feature Construction on Time Series Data

surya_mpad · Member · Posts: 3 · Contributor I
edited December 2018 in Help

Hello everyone,

As part of a case study, I've been working on a 'Time series Classification' task: the goal is to classify time series data (each example in the dataset represents one time series) into 7 classes. With a basic process (k-NN with Dynamic Time Warping) I got a classification accuracy of 98.93% and an RMSE of 0.011 +/- 0.103 (which is strange). Since I am new to time series classification, I built a simple process without any feature construction.
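
For readers new to the method: k-NN with Dynamic Time Warping compares two series after allowing one of them to be locally stretched or compressed along the time axis, so two examples with the same shape but a shifted peak can end up very close. The sketch below is a minimal Python illustration of that idea, not RapidMiner's implementation; the absolute-difference local cost, the absence of a warping-window constraint, and the helper names `dtw_distance` and `knn_dtw_predict` are assumptions made for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series.

    Assumptions: absolute difference as the local cost and no warping-window
    constraint; RapidMiner's DynamicTimeWarpingDistance may differ in details.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest path: diagonal match, or stretch either series
            cost[i, j] = local + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return float(cost[n, m])


def knn_dtw_predict(train_x, train_y, query, k=1):
    """Hypothetical helper: majority vote among the k DTW-nearest training rows."""
    dists = np.array([dtw_distance(row, query) for row in train_x])
    nearest = np.argsort(dists, kind="stable")[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

For example, with `a = [0, 0, 1, 2, 1, 0, 0]` and `b = [0, 0, 0, 1, 2, 1, 0]` (the same peak shifted by one period), `dtw_distance(a, b)` is 0.0, whereas the Euclidean distance between the two rows is 2.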

So I would like your comments on the processes I have built, and on the feature engineering (preprocessing) techniques and RapidMiner operators that I can apply to time series data (each example represents a time series) for classification.
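
One common way to construct features from a wide table like this one is to compute a few summary statistics per row (per series) and, when shape matters more than level, to z-normalize each row before a distance-based model such as k-NN with DTW. The Python sketch below illustrates both ideas outside RapidMiner; the particular feature list and the helper names `z_normalize` and `summary_features` are illustrative assumptions, not a recommended or complete feature set.

```python
import numpy as np


def z_normalize(x, eps=1e-8):
    """Rescale each row (one series) to zero mean and unit standard deviation."""
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / (sd + eps)  # constant rows become all zeros


def summary_features(x):
    """Hypothetical per-row features for a wide table (one column per period).

    Columns: mean, std, min, max, position of the max, slope of a linear
    trend, and mean absolute first difference (a rough 'activity' measure).
    """
    x = np.asarray(x, dtype=float)
    t = np.arange(x.shape[1])
    slope = np.polyfit(t, x.T, deg=1)[0]  # one least-squares slope per row
    return np.column_stack([
        x.mean(axis=1),
        x.std(axis=1),
        x.min(axis=1),
        x.max(axis=1),
        x.argmax(axis=1),
        slope,
        np.abs(np.diff(x, axis=1)).mean(axis=1),
    ])
```

The resulting columns could be fed to any classifier instead of, or alongside, the raw period values.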

I have attached the sample data and the XML of the process. Please review the process and the data; it would be great if you could tell me the right way to handle the time series data (each example in the dataset) for the classification task in RapidMiner.

About the dataset:

* Each example (each row) represents a time series and has 34 regular attributes (features), which represent the different periods of the time series.

* The label attribute Type has 7 classes (1, 2, ..., 7); see the picture below.

[Image: Capture.PNG]

Your comments are very welcome,

Many thanks and best regards,

Surya

<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.1.001" expanded="true" height="68" name="Retrieve Classfication_timeseries_with classnames" width="90" x="45" y="34">
<parameter key="repository_entry" value="../data/Classfication_timeseries_with classnames"/>
</operator>
<operator activated="true" class="split_data" compatibility="8.1.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="85">
<enumeration key="partitions">
<parameter key="ratio" value="0.8"/>
<parameter key="ratio" value="0.2"/>
</enumeration>
<parameter key="sampling_type" value="automatic"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="289">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Type"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="true"/>
</operator>
<operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="581" y="34">
<parameter key="create_complete_model" value="false"/>
<parameter key="training_window_width" value="10"/>
<parameter key="training_window_step_size" value="-1"/>
<parameter key="test_window_width" value="10"/>
<parameter key="horizon" value="1"/>
<parameter key="cumulative_training" value="false"/>
<parameter key="average_performances_only" value="true"/>
<process expanded="true">
<operator activated="true" class="k_nn" compatibility="8.1.001" expanded="true" height="82" name="k-NN" width="90" x="112" y="34">
<parameter key="k" value="1"/>
<parameter key="weighted_vote" value="false"/>
<parameter key="measure_types" value="NumericalMeasures"/>
<parameter key="mixed_measure" value="MixedEuclideanDistance"/>
<parameter key="nominal_measure" value="NominalDistance"/>
<parameter key="numerical_measure" value="DynamicTimeWarpingDistance"/>
<parameter key="divergence" value="GeneralizedIDivergence"/>
<parameter key="kernel_type" value="radial"/>
<parameter key="kernel_gamma" value="1.0"/>
<parameter key="kernel_sigma1" value="1.0"/>
<parameter key="kernel_sigma2" value="0.0"/>
<parameter key="kernel_sigma3" value="2.0"/>
<parameter key="kernel_degree" value="3.0"/>
<parameter key="kernel_shift" value="1.0"/>
<parameter key="kernel_a" value="1.0"/>
<parameter key="kernel_b" value="0.0"/>
</operator>
<connect from_port="training" to_op="k-NN" to_port="training set"/>
<connect from_op="k-NN" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="8.1.001" expanded="true" height="82" name="Performance (2)" width="90" x="313" y="34">
<parameter key="main_criterion" value="first"/>
<parameter key="accuracy" value="true"/>
<parameter key="classification_error" value="false"/>
<parameter key="kappa" value="false"/>
<parameter key="weighted_mean_recall" value="false"/>
<parameter key="weighted_mean_precision" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="false"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="false"/>
<parameter key="squared_correlation" value="false"/>
<parameter key="cross-entropy" value="false"/>
<parameter key="margin" value="false"/>
<parameter key="soft_margin_loss" value="false"/>
<parameter key="logistic_loss" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="849" y="136">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<connect from_op="Retrieve Classfication_timeseries_with classnames" from_port="output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Validation" to_port="training"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Validation" from_port="training" to_port="result 3"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>

Answers

  • lionelderkrikor · RapidMiner Certified Analyst, Member · Posts: 1,195 · Unicorn

    Hi @surya_mpad,

    You are using a Sliding Window Validation operator, which is indeed used for time series problems.

    But a priori your problem is a pure classification problem: you want to predict the class of the attribute "Type" from the values of your attributes Period_i, right?
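
    To see why this validation scheme does not fit here: a sliding window validation walks over the examples (rows) in order, training on one block of consecutive rows and testing on the block that follows. The generic Python sketch below shows that indexing; it illustrates the idea only and is not RapidMiner's exact semantics (in particular, how the operator treats `horizon` or a step size of -1 is not modelled), and `sliding_window_splits` is a hypothetical name. With the wide layout of this dataset, a training window of 10 rows holds 10 different products, not 10 consecutive periods.

```python
def sliding_window_splits(n_rows, train_width, test_width, step, horizon=1):
    """Yield (train_indices, test_indices) for a generic sliding-window scheme.

    Training uses `train_width` consecutive rows; testing uses the next
    `test_width` rows, starting `horizon - 1` rows after the training window.
    The window then moves forward by `step` rows.
    """
    if step < 1:
        raise ValueError("step must be a positive number of rows")
    start = 0
    while True:
        train_end = start + train_width
        test_start = train_end + horizon - 1
        test_end = test_start + test_width
        if test_end > n_rows:
            break  # not enough rows left for a full test window
        yield list(range(start, train_end)), list(range(test_start, test_end))
        start += step
```

For example, 40 rows with windows of 10 and a step of 10 give three splits: rows 0-9 vs 10-19, 10-19 vs 20-29, and 20-29 vs 30-39. Each model only ever sees the rows inside its window, and row order is treated as time order.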

    So you should use a Cross Validation operator together with a Performance (Classification) operator.
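
    Cross-validation, in contrast, does not depend on row order: every example is used for testing exactly once, and stratification keeps the class proportions similar in every fold. The Python sketch below is a minimal stand-alone illustration of stratified k-fold cross-validation with a Euclidean 1-NN classifier; it is not how RapidMiner's Cross Validation operator is implemented, and the function names are hypothetical.

```python
import numpy as np


def stratified_folds(y, n_folds, seed=0):
    """Assign each example a fold number so that every class is spread across folds."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fold = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        fold[idx] = np.arange(len(idx)) % n_folds
    return fold


def cross_validate_1nn(x, y, n_folds=10, seed=0):
    """Mean accuracy of a Euclidean 1-NN classifier under stratified k-fold CV."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    fold = stratified_folds(y, n_folds, seed)
    accuracies = []
    for f in range(n_folds):
        test = fold == f
        train = ~test
        if not test.any():
            continue  # possible when classes have fewer examples than folds
        # squared distances: one row per test example, one column per training example
        d = ((x[test][:, None, :] - x[train][None, :, :]) ** 2).sum(axis=2)
        predicted = y[train][d.argmin(axis=1)]
        accuracies.append(np.mean(predicted == y[test]))
    return float(np.mean(accuracies))
```

Because each test fold is disjoint from the data the model was fitted on, the averaged accuracy is an estimate for unseen examples rather than a score on the training data.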

    I don't know how you obtained an accuracy of 98.93% (on the whole dataset? Have you set "Product Id" as "id" using Set Role?); such a high result is suspicious.
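
    One possible explanation, not confirmed for this dataset: if "Product Id" is left as a regular attribute, k-NN uses it in the distance, and if the generating script wrote the rows in class order, neighbouring ids tend to share a class. The Python sketch below demonstrates that effect on hypothetical synthetic data in which the period columns are pure noise; the data and the helper `loo_1nn_accuracy` are invented for illustration.

```python
import numpy as np


def loo_1nn_accuracy(x, y):
    """Leave-one-out accuracy of a Euclidean 1-NN classifier."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # an example may not be its own neighbour
    return float(np.mean(y[d.argmin(axis=1)] == y))


# Hypothetical synthetic data: 7 classes x 10 rows, written in class order.
rng = np.random.default_rng(42)
y = np.repeat(np.arange(1, 8), 10)
ids = np.arange(len(y), dtype=float)      # an id column that follows the row order
noise = rng.random((len(y), 34)) * 0.1    # 34 "period" columns with no class signal

acc_noise_only = loo_1nn_accuracy(noise, y)
acc_with_id = loo_1nn_accuracy(np.column_stack([ids, noise]), y)
print(f"noise only: {acc_noise_only:.2f}, with id as a regular attribute: {acc_with_id:.2f}")
```

    Here the noise-only accuracy should stay near chance level, while adding the id column guarantees at least 58/70: the id gap between adjacent rows always dominates the scaled-down noise, so every nearest neighbour is an adjacent row, and only rows at a class boundary can be misclassified.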

    To answer your question about feature selection: indeed, you have a lot of attributes. To reduce their number (without losing accuracy), and thus gain in simplicity, you can use the Optimize Selection (Evolutionary) operator (documentation about this algorithm here).
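
    For intuition about what an evolutionary feature selection does, the Python sketch below runs a minimal genetic algorithm over boolean attribute masks, scoring each mask by leave-one-out 1-NN accuracy. It is a generic illustration, not a reimplementation of RapidMiner's Optimize Selection (Evolutionary) operator; the population size, mutation rate, tournament selection, uniform crossover, and all function names are assumptions.

```python
import numpy as np


def loo_1nn_accuracy(x, y):
    """Leave-one-out accuracy of a Euclidean 1-NN classifier."""
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # an example may not be its own neighbour
    return float(np.mean(y[d.argmin(axis=1)] == y))


def evolutionary_selection(x, y, pop_size=20, generations=15, p_mutate=0.05, seed=0):
    """Minimal genetic algorithm for feature selection (illustrative only).

    Individuals are boolean column masks. The all-features mask is seeded into
    the first population and the best individual is always carried over
    (elitism), so the result never scores below the all-features baseline.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)

    def fitness(mask):
        return loo_1nn_accuracy(x[:, mask], y) if mask.any() else 0.0

    pop = rng.random((pop_size, x.shape[1])) < 0.5
    pop[0] = True  # the 'all attributes' baseline
    scores = np.array([fitness(m) for m in pop])
    for _ in range(generations):
        # binary tournament: the better of two random individuals becomes a parent
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = pop[np.where(scores[a] >= scores[b], a, b)]
        # uniform crossover between each parent and its neighbour in the list
        partners = np.roll(parents, 1, axis=0)
        children = np.where(rng.random(parents.shape) < 0.5, parents, partners)
        children ^= rng.random(children.shape) < p_mutate  # bit-flip mutation
        children[0] = pop[scores.argmax()]                 # elitism
        pop = children
        scores = np.array([fitness(m) for m in pop])
    best = scores.argmax()
    return pop[best], float(scores[best])
```

    The returned score is measured on the same data that guided the search, so it is optimistic; checking the selected attributes on data not used during the selection gives a fairer estimate.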

    On my side, on your partial dataset, I obtain the following accuracies with the k-NN model:

    k        with Optimize Selection    without Optimize Selection
    1        95%                        89%
    2        88%                        89%
    3        89%                        89%
    ...

    You can find my process here:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="85">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Classification_Period\sample data.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Product_ID.true.real.attribute"/>
    <parameter key="1" value="Period_1.true.real.attribute"/>
    <parameter key="2" value="Period_2.true.real.attribute"/>
    <parameter key="3" value="Period_3.true.real.attribute"/>
    <parameter key="4" value="Period_4.true.real.attribute"/>
    <parameter key="5" value="Period_5.true.real.attribute"/>
    <parameter key="6" value="Period_6.true.real.attribute"/>
    <parameter key="7" value="Period_7.true.real.attribute"/>
    <parameter key="8" value="Period_8.true.real.attribute"/>
    <parameter key="9" value="Period_9.true.real.attribute"/>
    <parameter key="10" value="Period_10.true.real.attribute"/>
    <parameter key="11" value="Period_11.true.real.attribute"/>
    <parameter key="12" value="Period_12.true.real.attribute"/>
    <parameter key="13" value="Period_13.true.real.attribute"/>
    <parameter key="14" value="Period_14.true.real.attribute"/>
    <parameter key="15" value="Period_15.true.real.attribute"/>
    <parameter key="16" value="Period_16.true.real.attribute"/>
    <parameter key="17" value="Period_17.true.real.attribute"/>
    <parameter key="18" value="Period_18.true.real.attribute"/>
    <parameter key="19" value="Period_19.true.real.attribute"/>
    <parameter key="20" value="Period_20.true.real.attribute"/>
    <parameter key="21" value="Period_21.true.real.attribute"/>
    <parameter key="22" value="Period_22.true.real.attribute"/>
    <parameter key="23" value="Period_23.true.real.attribute"/>
    <parameter key="24" value="Period_24.true.real.attribute"/>
    <parameter key="25" value="Period_25.true.real.attribute"/>
    <parameter key="26" value="Period_26.true.real.attribute"/>
    <parameter key="27" value="Period_27.true.real.attribute"/>
    <parameter key="28" value="Period_28.true.real.attribute"/>
    <parameter key="29" value="Period_29.true.real.attribute"/>
    <parameter key="30" value="Period_30.true.real.attribute"/>
    <parameter key="31" value="Period_31.true.real.attribute"/>
    <parameter key="32" value="Period_32.true.real.attribute"/>
    <parameter key="33" value="Period_33.true.real.attribute"/>
    <parameter key="34" value="Period_34.true.real.attribute"/>
    <parameter key="35" value="Period_35.true.real.attribute"/>
    <parameter key="36" value="Period_36.true.real.attribute"/>
    <parameter key="37" value="Type.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="179" y="85">
    <parameter key="attribute_name" value="Type"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles">
    <parameter key="Product_ID" value="id"/>
    </list>
    </operator>
    <operator activated="true" class="numerical_to_polynominal" compatibility="8.2.000" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="313" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Type"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.2.000" expanded="true" height="103" name="Multiply" width="90" x="447" y="59"/>
    <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="8.2.000" expanded="true" height="166" name="Optimize Parameters (Grid)" width="90" x="581" y="34">
    <list key="parameters">
    <parameter key="k-NN.k" value="[3;10;10;linear]"/>
    </list>
    <process expanded="true">
    <operator activated="true" class="optimize_selection_evolutionary" compatibility="8.2.000" expanded="true" height="103" name="Optimize Selection (Evolutionary)" width="90" x="447" y="34">
    <process expanded="true">
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Cross Validation" width="90" x="380" y="34">
    <process expanded="true">
    <operator activated="true" class="k_nn" compatibility="8.2.000" expanded="true" height="82" name="k-NN" width="90" x="179" y="34">
    <parameter key="k" value="10"/>
    </operator>
    <connect from_port="training set" to_op="k-NN" to_port="training set"/>
    <connect from_op="k-NN" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
    <list key="class_weights"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
    <connect from_op="Performance" from_port="example set" to_port="test set results"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="remember" compatibility="8.2.000" expanded="true" height="68" name="Remember" width="90" x="514" y="34">
    <parameter key="name" value="Model"/>
    <parameter key="io_object" value="Model"/>
    </operator>
    <connect from_port="example set" to_op="Cross Validation" to_port="example set"/>
    <connect from_op="Cross Validation" from_port="model" to_op="Remember" to_port="store"/>
    <connect from_op="Cross Validation" from_port="performance 1" to_port="performance"/>
    <portSpacing port="source_example set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_performance" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="recall" compatibility="8.2.000" expanded="true" height="68" name="Recall (2)" width="90" x="581" y="136">
    <parameter key="name" value="Model"/>
    <parameter key="io_object" value="Model"/>
    </operator>
    <connect from_port="input 1" to_op="Optimize Selection (Evolutionary)" to_port="example set in"/>
    <connect from_op="Optimize Selection (Evolutionary)" from_port="example set out" to_port="output 2"/>
    <connect from_op="Optimize Selection (Evolutionary)" from_port="weights" to_port="output 1"/>
    <connect from_op="Optimize Selection (Evolutionary)" from_port="performance" to_port="performance"/>
    <connect from_op="Recall (2)" from_port="result" to_port="model"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_performance" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    <portSpacing port="sink_output 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="weights_to_data" compatibility="8.2.000" expanded="true" height="68" name="Weights to Data" width="90" x="715" y="136"/>
    <operator activated="true" class="sort" compatibility="8.2.000" expanded="true" height="82" name="Sort" width="90" x="849" y="85">
    <parameter key="attribute_name" value="Weight"/>
    <parameter key="sorting_direction" value="decreasing"/>
    </operator>
    <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="8.2.000" expanded="true" height="124" name="Optimize Parameters (2)" width="90" x="581" y="289">
    <list key="parameters">
    <parameter key="k-NN.k" value="[3;10;10;linear]"/>
    </list>
    <process expanded="true">
    <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Cross Validation (2)" width="90" x="380" y="34">
    <process expanded="true">
    <operator activated="true" class="k_nn" compatibility="8.2.000" expanded="true" height="82" name="k-NN (2)" width="90" x="179" y="34"/>
    <connect from_port="training set" to_op="k-NN (2)" to_port="training set"/>
    <connect from_op="k-NN (2)" from_port="model" to_port="model"/>
    <portSpacing port="source_training set" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance (2)" width="90" x="179" y="34">
    <list key="class_weights"/>
    </operator>
    <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
    <connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
    <connect from_op="Performance (2)" from_port="example set" to_port="test set results"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_test set results" spacing="0"/>
    <portSpacing port="sink_performance 1" spacing="0"/>
    <portSpacing port="sink_performance 2" spacing="0"/>
    </process>
    </operator>
    <connect from_port="input 1" to_op="Cross Validation (2)" to_port="example set"/>
    <connect from_op="Cross Validation (2)" from_port="model" to_port="model"/>
    <connect from_op="Cross Validation (2)" from_port="performance 1" to_port="performance"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_performance" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Numerical to Polynominal" to_port="example set input"/>
    <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Optimize Parameters (2)" to_port="input 1"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="parameter set" to_port="result 3"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="output 1" to_op="Weights to Data" to_port="attribute weights"/>
    <connect from_op="Weights to Data" from_port="example set" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_port="result 2"/>
    <connect from_op="Optimize Parameters (2)" from_port="performance" to_port="result 4"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    <portSpacing port="sink_result 5" spacing="0"/>
    </process>
    </operator>
    </process>

    I hope it helps,

    Regards,

    Lionel

  • Thomas_Ott · RapidMiner Certified Analyst, RapidMiner Certified Expert, Member · Posts: 1,761 · Unicorn

    @surya_mpad I want to add that your time series data appears to have very low activity followed by a sudden spike in volatility. Can you account for this?

    [Image: spikes.png]
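
    One way to check every series (row) for the pattern Thomas describes, quiet periods followed by a sudden burst of volatility, is to compare a rolling standard deviation against its typical level. The Python sketch below is a hypothetical diagnostic; the window length, the factor of 5, and the median baseline are arbitrary assumptions, not tuned values. The position of the first jump could also serve as an engineered feature.

```python
import numpy as np


def rolling_std(series, window):
    """Standard deviation over each run of `window` consecutive values."""
    s = np.asarray(series, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(s, window)
    return windows.std(axis=1)


def first_volatility_jump(series, window=5, factor=5.0):
    """Start index of the first window whose std exceeds `factor` x the median std.

    Returns None when no window stands out. Assumes `len(series) >= window`.
    """
    sd = rolling_std(series, window)
    baseline = max(float(np.median(sd)), np.finfo(float).eps)  # avoid a zero threshold
    hits = np.flatnonzero(sd > factor * baseline)
    return int(hits[0]) if hits.size else None
```

    For example, 20 quiet values alternating between 0.0 and 0.1 followed by 10 values alternating between 0.0 and 10.0 give a first jump at window index 17, the first window that reaches into the volatile part beyond a leading 0.0.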

  • surya_mpad · Member · Posts: 3 · Contributor I

    Hi Thomas,

    Thanks for the reply.

    I think the time series you have plotted comes from a single attribute (period_1.0).
    As I mentioned in my post, each example represents a time series, and the task is to classify these examples into categories (the attribute 'type' is the label). So I think we need to analyze the time series within each example (please correct me if I am wrong).
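
    The distinction can be shown on a tiny hypothetical wide table with the same layout as the dataset (one row per product, one column per period): a row is one time series, while a column holds a single period's values across different products, so plotting a column does not show any product's evolution over time. The values below are invented for illustration.

```python
import numpy as np

# Hypothetical wide table: 3 examples (rows) x 5 periods (columns).
data = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # product A: rising
    [5.0, 4.0, 3.0, 2.0, 1.0],   # product B: falling
    [0.0, 0.0, 9.0, 0.0, 0.0],   # product C: a single spike
])

series_of_example_0 = data[0, :]   # one time series: all periods of one product
values_of_period_1 = data[:, 0]    # one attribute: the first period across products
```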

    Also, please keep in mind that the data was generated with a script, so it might be irregular.

    Many Thanks
    Surya
