"[Solved] Performance measurement with trend direction"

qwertz · January 2013

Dear all,

I am looking for a particular kind of performance measure. "Relative error", for example, gives an idea of how closely the prediction fits the label. In my case, however, I also want to know whether the prediction over- or underestimates the label.

A workaround might be to use something like "average(prediction) - average(label)" in addition to the relative error. But of course it would be better to have this in one operator.
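To make the workaround concrete, here is a minimal Python sketch (not RapidMiner code) of the two quantities side by side. It assumes relative error is the mean of |prediction - label| / |label|, which may differ in detail from what the Performance (Regression) operator computes; the function names and sample values are made up for illustration.

```python
def relative_error(labels, predictions):
    """Mean of |prediction - label| / |label| (assumed definition).

    Measures how far off the predictions are, but not in which direction.
    """
    return sum(abs(p - y) / abs(y) for y, p in zip(labels, predictions)) / len(labels)


def mean_signed_error(labels, predictions):
    """average(prediction) - average(label).

    Positive: the model overestimates on average. Negative: it underestimates.
    """
    return sum(predictions) / len(predictions) - sum(labels) / len(labels)


# Hypothetical data: both prediction sets are off by 10 percent,
# one upwards and one downwards.
labels = [10.0, 20.0, 40.0]
over = [11.0, 22.0, 44.0]
under = [9.0, 18.0, 36.0]

print(relative_error(labels, over), mean_signed_error(labels, over))
print(relative_error(labels, under), mean_signed_error(labels, under))
```

Both prediction sets have the same relative error, and only the sign of the second value separates over- from underestimation.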

Please let me know your ideas...

Kind regards
Sachs

qwertz · January 2013

It seems that I can have multiple performance criteria if I use the attached setup. But in this case I have a general question of understanding:

1) Which of the two performance operators is used to train the model? (Or can there be several?)
I thought that validation works like this: take performance to adapt SVM > apply model > evaluate performance > back to first step

2) Is there a way to log the standard deviation of the performance measure which is shown in the result view as well?

Kind regards
Sachs


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="459" width="694">
      <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="number_examples" value="10"/>
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="attributes_lower_bound" value="0.0"/>
      </operator>
      <operator activated="true" class="split_validation" compatibility="5.2.008" expanded="true" height="130" name="Validation" width="90" x="179" y="30">
        <process expanded="true" height="459" width="165">
          <operator activated="true" class="support_vector_machine_linear" compatibility="5.2.008" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30"/>
          <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
          <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="459" width="624">
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="120">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.2.008" expanded="true" height="94" name="Multiply" width="90" x="179" y="120"/>
          <operator activated="true" class="performance_regression" compatibility="5.2.008" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="255">
            <parameter key="root_mean_squared_error" value="false"/>
            <parameter key="relative_error" value="true"/>
          </operator>
          <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="447" y="165">
            <parameter key="filename" value="log"/>
            <list key="log">
              <parameter key="log" value="operator.Performance (2).value.relative_error"/>
            </list>
          </operator>
          <operator activated="true" class="performance" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="313" y="30"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="averagable 2"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
          <portSpacing port="sink_averagable 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
      <connect from_op="Validation" from_port="averagable 2" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="36"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

MariusHelf · January 2013

Hi Sachs,

the training of the model is completely independent of the chosen performance measure - the algorithm (in your case, the SVM) always uses the same methods to create the model, and the performance operators are only used to estimate the result of those methods. A detailed description of the Cross Validation can be found here: http://en.wikipedia.org/wiki/Cross-validation_(statistics)#K-fold_cross-validation

Furthermore, you are currently logging the performance of each iteration of the X-Validation. Usually, you do not want to do that, but are only interested in the performance of the entire X-Validation. For that, you have to place the Log operator outside of the X-Validation. Then you can log the final performance by logging the "performance" value of the Validation operator. The standard deviation is available as the "deviation" value of the same operator.
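As a rough illustration of the difference between the per-iteration values and the final result, the following Python sketch (not RapidMiner code) summarizes a list of per-iteration performance values into one average and one deviation. The values are hypothetical, and using the population standard deviation is an assumption; the thread does not say which variant the "deviation" value uses.

```python
from statistics import mean, pstdev


def summarize(per_iteration_values):
    """Collapse per-iteration performance values into (average, deviation).

    pstdev (population standard deviation) is an assumption here.
    """
    return mean(per_iteration_values), pstdev(per_iteration_values)


# Hypothetical relative errors, one per validation iteration.
fold_errors = [0.12, 0.08, 0.10, 0.14, 0.06]

avg, dev = summarize(fold_errors)
print(f"{avg:.3f} +/- {dev:.3f}")
```

A Log operator inside the validation would record each entry of fold_errors separately, while one placed after it would only see the summary.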

You can easily create custom performance measures: you can perform arbitrary operations on the output of Apply Model, e.g. with Aggregate and Generate Attributes, and then use the Extract Performance operator to provide a value of the resulting example set as the performance value.

Best regards,
Marius

qwertz · January 2013

Hi Marius,

Thank you for everything! This helps a lot!

Kind regards
Sachs

PS: Here is my humble contribution to this topic. I set up a sample process as described above that calculates a custom performance value. For anyone who might need it...


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="403" width="701">
      <operator activated="true" class="subprocess" compatibility="5.2.008" expanded="true" height="76" name="Generate Data (2)" width="90" x="45" y="30">
        <process expanded="true" height="403" width="694">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="10"/>
            <parameter key="number_of_attributes" value="3"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.2.008" expanded="true" height="76" name="Rename" width="90" x="179" y="30">
            <parameter key="old_name" value="att3"/>
            <parameter key="new_name" value="prediction"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="313" y="30">
            <parameter key="name" value="prediction"/>
            <parameter key="target_role" value="prediction"/>
            <list key="set_additional_roles"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="aggregate" compatibility="5.2.008" expanded="true" height="76" name="Aggregate" width="90" x="179" y="30">
        <list key="aggregation_attributes">
          <parameter key="label" value="average"/>
          <parameter key="prediction" value="average"/>
        </list>
      </operator>
      <operator activated="true" class="rename" compatibility="5.2.008" expanded="true" height="76" name="Rename (2)" width="90" x="313" y="30">
        <parameter key="old_name" value="average(label)"/>
        <parameter key="new_name" value="avg_label"/>
        <list key="rename_additional_attributes">
          <parameter key="average(prediction)" value="avg_prediction"/>
        </list>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="30">
        <list key="function_descriptions">
          <parameter key="performance_att" value="avg_prediction-avg_label"/>
        </list>
      </operator>
      <operator activated="true" class="extract_performance" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="581" y="30">
        <parameter key="performance_type" value="data_value"/>
        <parameter key="attribute_name" value="performance_att"/>
        <parameter key="example_index" value="1"/>
      </operator>
      <connect from_op="Generate Data (2)" from_port="out 1" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
      <connect from_op="Rename (2)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Performance" to_port="example set"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

qwertz · January 2013

I was just fooling around when I came across this:

To my understanding it should be possible to extract the results of both performance operators. However, I always get just the same value twice...

Best regards
Sachs


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="459" width="694">
      <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="attributes_lower_bound" value="0.0"/>
      </operator>
      <operator activated="true" class="series:sliding_window_validation" compatibility="5.2.001" expanded="true" height="130" name="Validation" width="90" x="179" y="30">
        <parameter key="training_window_width" value="10"/>
        <parameter key="test_window_width" value="10"/>
        <process expanded="true" height="432" width="335">
          <operator activated="true" class="support_vector_machine_linear" compatibility="5.2.008" expanded="true" height="76" name="SVM (Linear)" width="90" x="45" y="30"/>
          <connect from_port="training" to_op="SVM (Linear)" to_port="training set"/>
          <connect from_op="SVM (Linear)" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="432" width="500">
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="series:forecasting_performance" compatibility="5.2.001" expanded="true" height="76" name="Performance" width="90" x="179" y="75">
            <parameter key="horizon" value="1"/>
            <parameter key="main_criterion" value="prediction_trend_accuracy"/>
          </operator>
          <operator activated="true" class="performance_regression" compatibility="5.2.008" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="120">
            <parameter key="root_mean_squared_error" value="false"/>
            <parameter key="relative_error" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 2"/>
          <connect from_op="Performance" from_port="example set" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
          <portSpacing port="sink_averagable 3" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="313" y="75">
        <list key="log">
          <parameter key="performance_a" value="operator.Validation.value.performance"/>
          <parameter key="performance_b" value="operator.Validation.value.performance1"/>
        </list>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
      <connect from_op="Log" from_port="through 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="36"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

MariusHelf · January 2013

Actually, you can only log the first ave output of the Validation. Furthermore, for some reason the performance and performance1 values are the same. You have to change your process in the following way:
- Connect the per output of the first Performance operator to the per input of the second Performance operator
- Connect the second per output to the first ave output of the Validation
- Log performance and performance2 instead of performance and performance1

Best regards,
Marius

qwertz · January 2013

Though I don't understand the underlying logic, it works pretty well the way you described it.

Thank you!

Cheers
Sachs


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="459" width="694">
      <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="number_of_attributes" value="2"/>
        <parameter key="attributes_lower_bound" value="0.0"/>
      </operator>
      <operator activated="true" class="series:sliding_window_validation" compatibility="5.2.001" expanded="true" height="112" name="Validation" width="90" x="179" y="30">
        <parameter key="training_window_width" value="10"/>
        <parameter key="test_window_width" value="10"/>
        <process expanded="true" height="432" width="335">
          <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.2.008" expanded="true" height="76" name="SVM" width="90" x="45" y="30">
            <parameter key="svm_type" value="epsilon-SVR"/>
            <parameter key="kernel_type" value="linear"/>
            <list key="class_weights"/>
          </operator>
          <connect from_port="training" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="432" width="480">
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="series:forecasting_performance" compatibility="5.2.001" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
            <parameter key="horizon" value="1"/>
            <parameter key="main_criterion" value="prediction_trend_accuracy"/>
          </operator>
          <operator activated="true" class="performance_regression" compatibility="5.2.008" expanded="true" height="76" name="Performance (2)" width="90" x="313" y="30">
            <parameter key="root_mean_squared_error" value="false"/>
            <parameter key="relative_error" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_op="Performance (2)" to_port="performance"/>
          <connect from_op="Performance" from_port="example set" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="313" y="30">
        <list key="log">
          <parameter key="per" value="operator.Validation.value.performance"/>
          <parameter key="per2" value="operator.Validation.value.performance2"/>
        </list>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
      <connect from_op="Log" from_port="through 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

MariusHelf · January 2013

The logic is simple: passing one performance vector into another Performance operator adds the new measures to the input performance object.
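A loose Python analogy (not RapidMiner code) for this chaining: treat a performance vector as an ordered mapping from criterion name to value, and let the second operator append its own criteria to whatever it receives. The function name and the numbers are hypothetical, and how RapidMiner handles a criterion that appears in both vectors is not covered in this thread; in this sketch the newer value simply wins.

```python
def add_criteria(incoming, new_criteria):
    """Return a new vector: incoming criteria first, then the new ones.

    The incoming vector is copied, not modified.
    """
    merged = dict(incoming)
    merged.update(new_criteria)
    return merged


# Hypothetical output of the first Performance operator.
trend = {"prediction_trend_accuracy": 0.62}

# The second operator adds its own measure to the vector it was given.
combined = add_criteria(trend, {"relative_error": 0.10})

print(list(combined))
```

The single combined vector is what reaches the first ave output, which is why both measures become available from that one port.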

qwertz · January 2013

I got the part about passing a value into another operator. What puzzles me is that the performance and performance1 values are the same, and that a performance value delivered to ave 2 cannot be logged via performance1 or performance2.

Cheers
Sachs
