Bug when running ANOVA

mmarag · September 2018

Hello,

I want to compare the forecasting performance of 5 methods: ARIMA, Generalized Linear Model, Linear Regression, Support Vector Machines and Neural Networks.

I use a sample dataset of Apple (AAPL) containing 137 consecutive trading days.

Since the ARIMA model is evaluated using the AIC score and I need all of the methods to be evaluated using Root Mean Squared Error (RMSE), I apply the ARIMA Trainer to the first 127 days and then ask the ARIMA forecast to predict over a horizon of 10 days, comparing the actual prices with the forecast and calculating the RMSE.

For the other methods it is much easier, since I train each model on the 127 days and apply it to the last 10 days.
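
The evaluation scheme described above (train on the first 127 days, score the last 10 with RMSE) can be sketched outside RapidMiner as follows. This is only a minimal illustration: the function names are hypothetical, the prices are synthetic rather than the attached AAPL data, and a naive "repeat the last value" forecast stands in for ARIMA or any of the other models.

```python
import math

def holdout_split(series, horizon):
    """Split a series into a training part and the final `horizon` points."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    return series[:-horizon], series[-horizon:]

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def naive_forecast(train, horizon):
    """Placeholder model (an assumption, not ARIMA): repeat the last value."""
    return [train[-1]] * horizon

# Synthetic stand-in for 137 closing prices (not the real AAPL sample).
prices = [100.0 + 0.1 * day for day in range(137)]

train, test = holdout_split(prices, horizon=10)   # 127 training days, 10 test days
forecast = naive_forecast(train, horizon=10)
score = rmse(test, forecast)
print(len(train), len(test), round(score, 4))
```

Because every method is scored on the same 10 unseen days with the same metric, the resulting RMSE values are directly comparable, which is the point of the setup above.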

Note that I am using grid search to find optimal parameters for all methods. The process runs perfectly, but when I add the ANOVA operator to compare the performances, an error pops up. When I disable the ANOVA operator and keep only the T-Test, everything runs smoothly.

I attach the process (if you want to run this you have to disable the ANOVA operator!), the error and the data sample in a zip file.

Kind regards,

mmarag

SGolbert · September 2018

Hi @mmarag,

I have tried removing the range filtering, because you don't really need it (you get your performance measures with cross validation). Additionally, I see no way to compare ARIMA with the others.

Here is my process:

<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="9.0.002" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
        <parameter key="excel_file" value="C:\Users\sgolbert\Desktop\AAPL.xlsx"/>
        <list key="annotations"/>
        <parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="date.true.date_time.attribute"/>
          <parameter key="1" value="close.true.real.attribute"/>
          <parameter key="2" value="volume.true.integer.attribute"/>
          <parameter key="3" value="open.true.real.attribute"/>
          <parameter key="4" value="high.true.real.attribute"/>
          <parameter key="5" value="low.true.real.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.0.002" expanded="true" height="103" name="Multiply" width="90" x="45" y="238"/>
      <operator activated="false" class="subprocess" compatibility="9.0.002" expanded="true" height="82" name="Arima Performance" width="90" x="447" y="85">
        <process expanded="true">
          <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.0.002" expanded="true" height="145" name="Optimize Parameters (Grid)" origin="GENERATED_TUTORIAL" width="90" x="514" y="34">
            <list key="parameters">
              <parameter key="ARIMA Trainer.p:_order_of_the_autoregressive_model" value="[1;10;10;linear]"/>
              <parameter key="ARIMA Trainer.q:_order_of_the_moving-average_model" value="[0.0;10;10;linear]"/>
            </list>
            <parameter key="log_performance" value="false"/>
            <process expanded="true">
              <operator activated="true" class="time_series:arima_trainer" compatibility="9.0.002" expanded="true" height="103" name="ARIMA Trainer" origin="GENERATED_TUTORIAL" width="90" x="246" y="85">
                <parameter key="time_series_attribute" value="close"/>
                <parameter key="has_indices" value="true"/>
                <parameter key="indices_attribute" value="date"/>
                <parameter key="p:_order_of_the_autoregressive_model" value="5"/>
                <parameter key="q:_order_of_the_moving-average_model" value="5"/>
              </operator>
              <connect from_port="input 1" to_op="ARIMA Trainer" to_port="example set"/>
              <connect from_op="ARIMA Trainer" from_port="forecast model" to_port="output 1"/>
              <connect from_op="ARIMA Trainer" from_port="performance" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="false" class="time_series:apply_forecast" compatibility="9.0.002" expanded="true" height="82" name="Apply Forecast" origin="GENERATED_TUTORIAL" width="90" x="246" y="391">
            <parameter key="forecast_horizon" value="10"/>
            <description align="center" color="transparent" colored="false" width="126">The best fitting model is used to forecast the next 10 values of the Time Series</description>
          </operator>
          <operator activated="false" class="select_attributes" compatibility="9.0.002" expanded="true" height="82" name="Select Attributes (2)" width="90" x="380" y="391">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="forecast of close"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="false" class="filter_examples" compatibility="9.0.002" expanded="true" height="103" name="Filter Examples" width="90" x="447" y="544">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="forecast of close.is_not_missing."/>
            </list>
          </operator>
          <operator activated="false" breakpoints="after" class="generate_id" compatibility="9.0.002" expanded="true" height="82" name="Generate ID" width="90" x="581" y="544"/>
          <operator activated="false" class="set_role" compatibility="9.0.002" expanded="true" height="82" name="Set Role" width="90" x="715" y="850">
            <parameter key="attribute_name" value="close"/>
            <parameter key="target_role" value="label"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="false" class="performance_regression" compatibility="9.0.002" expanded="true" height="82" name="arima" width="90" x="983" y="748"/>
          <connect from_port="in 1" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="out 1"/>
          <connect from_op="Apply Forecast" from_port="example set" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="arima" to_port="labelled data"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="time_series:windowing" compatibility="9.0.002" expanded="true" height="82" name="Windowing" width="90" x="112" y="442">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="close"/>
        <parameter key="indices_attribute" value="date"/>
        <parameter key="window_size" value="1"/>
        <parameter key="horizon_attribute" value="close"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.0.002" expanded="true" height="229" name="Multiply (2)" width="90" x="179" y="595"/>
      <operator activated="true" class="subprocess" compatibility="9.0.002" expanded="true" height="103" name="Neural Net Performance" width="90" x="648" y="748">
        <process expanded="true">
          <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.0.002" expanded="true" height="124" name="Optimize Parameters (5)" origin="GENERATED_TUTORIAL" width="90" x="380" y="34">
            <list key="parameters">
              <parameter key="Neural Net.learning_rate" value="[0.01;0.5;10;linear]"/>
            </list>
            <parameter key="log_performance" value="false"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Validation (4)" width="90" x="179" y="136">
                <parameter key="sampling_type" value="linear sampling"/>
                <process expanded="true">
                  <operator activated="true" class="neural_net" compatibility="9.0.002" expanded="true" height="82" name="Neural Net" width="90" x="112" y="34">
                    <list key="hidden_layers"/>
                    <parameter key="training_cycles" value="500"/>
                  </operator>
                  <connect from_port="training set" to_op="Neural Net" to_port="training set"/>
                  <connect from_op="Neural Net" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                  <description align="left" color="green" colored="true" height="113" resized="false" width="284" x="33" y="148">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure that you only put numerical attributes into a linear regression!</description>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="9.0.002" expanded="true" height="82" name="Apply Model (7)" width="90" x="45" y="34">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="9.0.002" expanded="true" height="82" name="Performance (3)" width="90" x="179" y="34"/>
                  <connect from_port="model" to_op="Apply Model (7)" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model (7)" to_port="unlabelled data"/>
                  <connect from_op="Apply Model (7)" from_port="labelled data" to_op="Performance (3)" to_port="labelled data"/>
                  <connect from_op="Performance (3)" from_port="performance" to_port="performance 1"/>
                  <connect from_op="Performance (3)" from_port="example set" to_port="test set results"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_test set results" spacing="0"/>
                  <portSpacing port="sink_performance 1" spacing="0"/>
                  <portSpacing port="sink_performance 2" spacing="0"/>
                  <description align="left" color="blue" colored="true" height="107" resized="false" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
                </process>
                <description align="center" color="transparent" colored="false" width="126">A cross validation including a linear regression.</description>
              </operator>
              <connect from_port="input 1" to_op="Validation (4)" to_port="example set"/>
              <connect from_op="Validation (4)" from_port="model" to_port="model"/>
              <connect from_op="Validation (4)" from_port="performance 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <connect from_port="in 1" to_op="Optimize Parameters (5)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (5)" from_port="performance" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="source_in 3" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="subprocess" compatibility="9.0.002" expanded="true" height="103" name="SVM Performance" width="90" x="581" y="544">
        <process expanded="true">
          <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.0.002" expanded="true" height="124" name="Optimize Parameters (4)" origin="GENERATED_TUTORIAL" width="90" x="380" y="34">
            <list key="parameters">
              <parameter key="SVM (Linear).C" value="[0;5;100;linear]"/>
            </list>
            <parameter key="log_performance" value="false"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Validation (3)" width="90" x="179" y="136">
                <parameter key="sampling_type" value="linear sampling"/>
                <process expanded="true">
                  <operator activated="true" class="support_vector_machine_linear" compatibility="9.0.002" expanded="true" height="82" name="SVM (Linear)" width="90" x="112" y="34"/>
                  <connect from_port="training set" to_op="SVM (Linear)" to_port="training set"/>
                  <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                  <description align="left" color="green" colored="true" height="113" resized="false" width="284" x="33" y="148">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure that you only put numerical attributes into a linear regression!</description>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="9.0.002" expanded="true" height="82" name="Apply Model (5)" width="90" x="45" y="34">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="9.0.002" expanded="true" height="82" name="Performance (6)" width="90" x="179" y="34"/>
                  <connect from_port="model" to_op="Apply Model (5)" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model (5)" to_port="unlabelled data"/>
                  <connect from_op="Apply Model (5)" from_port="labelled data" to_op="Performance (6)" to_port="labelled data"/>
                  <connect from_op="Performance (6)" from_port="performance" to_port="performance 1"/>
                  <connect from_op="Performance (6)" from_port="example set" to_port="test set results"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_test set results" spacing="0"/>
                  <portSpacing port="sink_performance 1" spacing="0"/>
                  <portSpacing port="sink_performance 2" spacing="0"/>
                  <description align="left" color="blue" colored="true" height="107" resized="false" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
                </process>
                <description align="center" color="transparent" colored="false" width="126">A cross validation including a linear regression.</description>
              </operator>
              <connect from_port="input 1" to_op="Validation (3)" to_port="example set"/>
              <connect from_op="Validation (3)" from_port="model" to_port="model"/>
              <connect from_op="Validation (3)" from_port="performance 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <connect from_port="in 1" to_op="Optimize Parameters (4)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (4)" from_port="performance" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="source_in 3" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="subprocess" compatibility="9.0.002" expanded="true" height="103" name="LR Performance" width="90" x="514" y="391">
        <process expanded="true">
          <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.0.002" expanded="true" height="166" name="Optimize Parameters (3)" origin="GENERATED_TUTORIAL" width="90" x="313" y="34">
            <list key="parameters">
              <parameter key="Linear Regression.feature_selection" value="none,M5 prime,greedy,T-Test,Iterative T-Test"/>
            </list>
            <parameter key="log_performance" value="false"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Validation (2)" width="90" x="179" y="136">
                <parameter key="sampling_type" value="linear sampling"/>
                <process expanded="true">
                  <operator activated="true" class="linear_regression" compatibility="9.0.002" expanded="true" height="103" name="Linear Regression" width="90" x="45" y="34"/>
                  <connect from_port="training set" to_op="Linear Regression" to_port="training set"/>
                  <connect from_op="Linear Regression" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                  <description align="left" color="green" colored="true" height="113" resized="false" width="284" x="33" y="148">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure that you only put numerical attributes into a linear regression!</description>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="9.0.002" expanded="true" height="82" name="Apply Model (3)" width="90" x="45" y="34">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="9.0.002" expanded="true" height="82" name="Performance (4)" width="90" x="179" y="34"/>
                  <connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
                  <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (4)" to_port="labelled data"/>
                  <connect from_op="Performance (4)" from_port="performance" to_port="performance 1"/>
                  <connect from_op="Performance (4)" from_port="example set" to_port="test set results"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_test set results" spacing="0"/>
                  <portSpacing port="sink_performance 1" spacing="0"/>
                  <portSpacing port="sink_performance 2" spacing="0"/>
                  <description align="left" color="blue" colored="true" height="107" resized="false" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
                </process>
                <description align="center" color="transparent" colored="false" width="126">A cross validation including a linear regression.</description>
              </operator>
              <connect from_port="input 1" to_op="Validation (2)" to_port="example set"/>
              <connect from_op="Validation (2)" from_port="model" to_port="model"/>
              <connect from_op="Validation (2)" from_port="example set" to_port="output 1"/>
              <connect from_op="Validation (2)" from_port="test result set" to_port="output 2"/>
              <connect from_op="Validation (2)" from_port="performance 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
              <portSpacing port="sink_output 3" spacing="0"/>
            </process>
          </operator>
          <connect from_port="in 1" to_op="Optimize Parameters (3)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (3)" from_port="performance" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="source_in 3" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="subprocess" compatibility="9.0.002" expanded="true" height="103" name="GLM Performance" width="90" x="447" y="238">
        <process expanded="true">
          <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.0.002" expanded="true" height="124" name="Optimize Parameters (2)" origin="GENERATED_TUTORIAL" width="90" x="380" y="34">
            <list key="parameters">
              <parameter key="Generalized Linear Model.family" value="gaussian,poisson,gamma"/>
            </list>
            <parameter key="log_performance" value="false"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Validation" width="90" x="179" y="136">
                <parameter key="sampling_type" value="shuffled sampling"/>
                <process expanded="true">
                  <operator activated="true" class="h2o:generalized_linear_model" compatibility="7.2.000" expanded="true" height="124" name="Generalized Linear Model" width="90" x="45" y="34">
                    <list key="beta_constraints"/>
                    <list key="expert_parameters"/>
                  </operator>
                  <connect from_port="training set" to_op="Generalized Linear Model" to_port="training set"/>
                  <connect from_op="Generalized Linear Model" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                  <description align="left" color="green" colored="true" height="113" resized="true" width="284" x="33" y="148">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure that you only put numerical attributes into a linear regression!</description>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="9.0.002" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="9.0.002" expanded="true" height="82" name="Performance (2)" width="90" x="179" y="34"/>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
                  <connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
                  <connect from_op="Performance (2)" from_port="example set" to_port="test set results"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_test set results" spacing="0"/>
                  <portSpacing port="sink_performance 1" spacing="0"/>
                  <portSpacing port="sink_performance 2" spacing="0"/>
                  <description align="left" color="blue" colored="true" height="107" resized="true" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
                </process>
                <description align="center" color="transparent" colored="false" width="126">A cross validation including a linear regression.</description>
              </operator>
              <connect from_port="input 1" to_op="Validation" to_port="example set"/>
              <connect from_op="Validation" from_port="model" to_port="model"/>
              <connect from_op="Validation" from_port="performance 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <connect from_port="in 1" to_op="Optimize Parameters (2)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (2)" from_port="performance" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="source_in 3" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="t_test" compatibility="9.0.002" expanded="true" height="187" name="T-Test" origin="GENERATED_TUTORIAL" width="90" x="849" y="289"/>
      <operator activated="true" class="anova" compatibility="9.0.002" expanded="true" height="187" name="ANOVA" origin="GENERATED_TUTORIAL" width="90" x="1050" y="391"/>
      <connect from_op="Read Excel" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Windowing" to_port="example set"/>
      <connect from_op="Windowing" from_port="windowed example set" to_op="Multiply (2)" to_port="input"/>
      <connect from_op="Multiply (2)" from_port="output 1" to_op="GLM Performance" to_port="in 1"/>
      <connect from_op="Multiply (2)" from_port="output 2" to_op="GLM Performance" to_port="in 2"/>
      <connect from_op="Multiply (2)" from_port="output 3" to_op="LR Performance" to_port="in 1"/>
      <connect from_op="Multiply (2)" from_port="output 4" to_op="LR Performance" to_port="in 2"/>
      <connect from_op="Multiply (2)" from_port="output 5" to_op="SVM Performance" to_port="in 1"/>
      <connect from_op="Multiply (2)" from_port="output 6" to_op="SVM Performance" to_port="in 2"/>
      <connect from_op="Multiply (2)" from_port="output 7" to_op="Neural Net Performance" to_port="in 1"/>
      <connect from_op="Multiply (2)" from_port="output 8" to_op="Neural Net Performance" to_port="in 2"/>
      <connect from_op="Neural Net Performance" from_port="out 1" to_op="T-Test" to_port="performance 5"/>
      <connect from_op="SVM Performance" from_port="out 1" to_op="T-Test" to_port="performance 4"/>
      <connect from_op="LR Performance" from_port="out 1" to_op="T-Test" to_port="performance 3"/>
      <connect from_op="GLM Performance" from_port="out 1" to_op="T-Test" to_port="performance 2"/>
      <connect from_op="T-Test" from_port="significance" to_port="result 1"/>
      <connect from_op="T-Test" from_port="performance 1" to_op="ANOVA" to_port="performance 1"/>
      <connect from_op="T-Test" from_port="performance 2" to_op="ANOVA" to_port="performance 2"/>
      <connect from_op="T-Test" from_port="performance 3" to_op="ANOVA" to_port="performance 3"/>
      <connect from_op="T-Test" from_port="performance 4" to_op="ANOVA" to_port="performance 4"/>
      <connect from_op="T-Test" from_port="performance 5" to_op="ANOVA" to_port="performance 5"/>
      <connect from_op="ANOVA" from_port="significance" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

There is no significant difference between the groups, which may be understandable given the short window (the intercept or the default node has a big weight). Maybe you need to generate more complex features in order to better fit the training data (larger windows, differencing). The dataset itself is also very small.
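
To illustrate the feature suggestion above, here is a small sketch of differencing a series and cutting it into windows wider than the `window_size` of 1 used in the process. The helper names are hypothetical and the numbers are made up; this is not how the Windowing operator is implemented, only the general idea.

```python
def difference(series, lag=1):
    """Differencing: series[t] - series[t - lag] for every valid t."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

def make_windows(series, window_size, horizon=1):
    """Turn a series into (features, label) rows.

    Each row uses `window_size` consecutive values as features and the
    value `horizon` steps after the end of the window as the label.
    """
    rows = []
    last_start = len(series) - window_size - horizon + 1
    for start in range(last_start):
        features = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        rows.append((features, label))
    return rows

# Made-up closing prices, for illustration only.
closes = [10.0, 11.0, 13.0, 12.0, 15.0, 16.0]

diffs = difference(closes)                  # day-over-day changes
rows = make_windows(diffs, window_size=3)   # 3 lagged changes predict the next one
for features, label in rows:
    print(features, "->", label)
```

Note that wider windows leave fewer training rows, which matters on a sample of only 137 days.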

For a data set this small, time series models can work well. You can focus on comparing the predictions and their confidence intervals.

Best regards,

Sebastian

mmarag · September 2018

Hello and thanks for the input.

Actually, the cross validation is performed on the training data (i.e. the 127 days) and is used to help the optimizer choose the best parameters. I need to filter the data because I am interested in the performance on only the last 10 days; I want to ensure that the models meet this data for the first time and haven't seen it before.

As regards the training data size, I have also tried it with the copper dataset (it's a ready-made sample in the time series extension) and it works (apart from the ANOVA operator, which crashes).

With regards

SGolbert · September 2018

Hi,

I tested with the copper dataset and it works if I use the whole sample size. If I take a look at the ANOVA table, I can see that for both cases N = 40. That means that the ANOVA operator seems to get a subset of the results (10 each), and it fails if there are fewer than 10 samples in each group. That doesn't seem to be correct, therefore it must be a bug.

I am tagging @sgenzer to begin the bug-reporting process, thank you!

mmarag · September 2018

Thank you very much for your effort on this.

sgenzer · September 2018

Thanks @SGolbert for pinging me.

@mmarag so I brought in this process and your Excel file and get no error whatsoever. Can you please help me reproduce?

Scott

mmarag · September 2018

Hello @sgenzer,

When the T-Test operator is connected to the ANOVA operator, this is the error that pops up:

mmarag · September 2018

And the error description is:

Exception: org.apache.commons.math3.exception.NotStrictlyPositiveException
Message: degrees of freedom (-5)
Stack trace:
org.apache.commons.math3.distribution.FDistribution.<init>(FDistribution.java:131)
org.apache.commons.math3.distribution.FDistribution.<init>(FDistribution.java:86)
org.apache.commons.math3.distribution.FDistribution.<init>(FDistribution.java:65)
com.rapidminer.tools.math.AnovaCalculator$AnovaSignificanceTestResult.<init>(AnovaCalculator.java:71)
com.rapidminer.tools.math.AnovaCalculator.performSignificanceTest(AnovaCalculator.java:180)
com.rapidminer.operator.validation.significance.AnovaSignificanceTestOperator.performSignificanceTest(AnovaSignificanceTestOperator.java:67)
com.rapidminer.operator.validation.significance.SignificanceTestOperator.doWork(SignificanceTestOperator.java:95)
com.rapidminer.operator.Operator.execute(Operator.java:1025)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:77)
com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:812)
com.rapidminer.operator.ExecutionUnit$2.run(ExecutionUnit.java:807)
java.security.AccessController.doPrivileged(Native Method)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:807)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:428)
com.rapidminer.operator.Operator.execute(Operator.java:1025)
com.rapidminer.Process.execute(Process.java:1322)
com.rapidminer.Process.run(Process.java:1297)
com.rapidminer.Process.run(Process.java:1183)
com.rapidminer.Process.run(Process.java:1136)
com.rapidminer.Process.run(Process.java:1131)
com.rapidminer.Process.run(Process.java:1121)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:65)
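
For readers puzzling over the "degrees of freedom (-5)" message: in a textbook one-way ANOVA with k groups and N observations in total, the F distribution is built with k - 1 and N - k degrees of freedom, and both must be strictly positive. The sketch below shows only that arithmetic. It assumes the textbook formulas, the function names are hypothetical, and the thread does not confirm what `AnovaCalculator` actually received, so the second call is just one combination that happens to give -5.

```python
def anova_degrees_of_freedom(group_sizes):
    """Return (between, within) degrees of freedom for a one-way ANOVA."""
    k = len(group_sizes)      # number of groups being compared
    n = sum(group_sizes)      # total number of observations
    return k - 1, n - k

def check_degrees_of_freedom(group_sizes):
    """Raise if either value is not strictly positive."""
    between, within = anova_degrees_of_freedom(group_sizes)
    if between <= 0 or within <= 0:
        raise ValueError(
            f"degrees of freedom must be positive, got ({between}, {within})"
        )
    return between, within

# Four groups of 10 results each, i.e. N = 40 as seen in the ANOVA table.
print(anova_degrees_of_freedom([10, 10, 10, 10]))   # (3, 36)

# Assumption for illustration: five groups that report zero observations.
print(anova_degrees_of_freedom([0, 0, 0, 0, 0]))    # (4, -5)
```

A guard like `check_degrees_of_freedom` would turn the low-level exception into a readable message before the F distribution is constructed.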

sgenzer · September 2018

OK, thanks. Just so I can replicate it exactly, can you please send me a new XML?

Altair RapidMiner Community

Fixed and Released · Last Updated May 2019