Prediction Trend Accuracy doesn't do what's expected

maxdama · September 2008

The regression performance measure prediction_trend_accuracy currently compares the correct label and the predicted label to the previous rightmost data point in an example. When using multivariateseries2window (and maybe other cases too) the rightmost point may not be the previous value of the series being predicted.

For example, with a hypothetical 2 attribute example, we start with data (ex. stock market) like this:
price1 volume1
price2 volume2
price3 volume3
price4 volume4

After windowing to predict the next period's price it becomes this (assuming label dimension=0 and window_size=1):
price1 volume1 label1(price2)
price2 volume2 label2(price3)
price3 volume3 label3(price4)

Then the learner adds its predictions:
price1 volume1 label1(price2) pred1
price2 volume2 label2(price3) pred1
price3 volume3 label3(price4) pred1

And finally we evaluate it with prediction_trend_accuracy. The formula, from the source code, would be,
COUNTIF( {(pred1-volume1)*(label1-volume1), (pred1-volume1)*(label1-volume1), (pred1-volume1)*(label1-volume1)}, >=0) / 3

However one would expect it to use this formula,
COUNTIF( {(pred1-price1)*(label1-price1), (pred1-price1)*(label1-price1), (pred1-price1)*(label1-price1)}, >=0) / 3

I recommend at least adding a note in the description since the problem is hard to recognize. Rather than choosing the rightmost attribute you could make the user pick the correct column as a workaround.

Also, in the source code for PredictionTrendAccuracy the comment explaining it is missing some parts of the formula used to calculate the measure. Here is what it says,

This performance measure then calculates the actuals trend between the last time point
* in the series (T3 here) and the actual label (L) and compares it to the trend between T3
* and the prediction (P), sums the products between both trends, and divides this sum by the
* total number of examples, i.e. [(v4-v3)*(p1-v3)+(v5-v4)*(p2-v4)+...] / 7 in this example.

To agree with the code, the formula should be, [(if ((v4-v3)*(p1-v3)>=0), 1, 0) + (if ((v5-v4)*(p2-v4)>=0), 1, 0) +...] / 7

I'm just trying to help polish the software, not be picky so don't worry if there is not time to fix this immediately. If either of these two issues are actually me misinterpreting what is supposed to happen, then I'm happy to be corrected. If you'd like a clearer explanation I can try to do that too. Thanks

Regards,
Max

TobiasMalbrecht · September 2008

Hi Max,

thanks for pointing this out - and for polishing the software.

We always appreciate bug reports and suggestions from the users of RapidMiner. That helps us a lot in improving the functionality - and even more important - the stability of RapidMiner! So, we will have a look into the issue and see what is the problem here. Your report is probably enough to find it!

Thanks again,
Tobias

IngoRM · September 2008

Hi,

I just wanted to let you know that we have improved the prediction trend accuracy calculation in a way that it also is able to handle windowed series data created by the multivariate series windowing. We therefore added a new performance evaluation operator called "ForecastingPerformance". We also improved the multivariate windowing by creating speaking attribute names and, finally, added another operator "WindowExamples2ModellingData" which transforms the windowed data into a relative data representation. With those extensions, it is now also possible to predict values which were never part of the training data.

With this combination, series forecasting is now drastically improved with RapidMiner. Here is a complete example:


<operator name="Root" class="Process" expanded="yes">
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="number_examples"	value="800"/>
            <parameter key="number_of_attributes"	value="2"/>
            <parameter key="target_function"	value="driller oscillation timeseries"/>
        </operator>
        <operator name="FeatureGeneration (2)" class="FeatureGeneration">
            <list key="functions">
              <parameter key="scaled_att"	value="*(att1,const[100]())"/>
              <parameter key="series"	value="+(scaled_att, att2)"/>
            </list>
        </operator>
        <operator name="ExampleRangeFilter" class="ExampleRangeFilter">
            <parameter key="first_example"	value="1"/>
            <parameter key="last_example"	value="799"/>
        </operator>
        <operator name="FeatureNameFilter" class="FeatureNameFilter">
            <parameter key="skip_features_with_name"	value="scaled_att"/>
        </operator>
    </operator>
    <operator name="SeriesPredictionEvaluation" class="OperatorChain" expanded="yes">
        <operator name="MultivariateSeries2WindowExamples" class="MultivariateSeries2WindowExamples">
            <parameter key="window_size"	value="50"/>
        </operator>
        <operator name="WindowExamples2ModellingData" class="WindowExamples2ModellingData">
            <parameter key="horizon"	value="5"/>
            <parameter key="label_name_stem"	value="series"/>
        </operator>
        <operator name="RemoveUselessAttributes" class="RemoveUselessAttributes">
        </operator>
        <operator name="LinearRegression (2)" class="LinearRegression">
            <parameter key="keep_example_set"	value="true"/>
        </operator>
        <operator name="ModelApplier (2)" class="ModelApplier">
            <list key="application_parameters">
            </list>
            <parameter key="keep_model"	value="true"/>
        </operator>
        <operator name="LabelTransformationChain (2)" class="OperatorChain" expanded="no">
            <operator name="ChangeAttributeName (3)" class="ChangeAttributeName">
                <parameter key="new_name"	value="prediction"/>
                <parameter key="old_name"	value="prediction(series-0)"/>
            </operator>
            <operator name="FeatureGeneration (3)" class="FeatureGeneration">
                <list key="functions">
                  <parameter key="original_label"	value="+(base_value,series-0)"/>
                  <parameter key="transformed_prediction"	value="+(base_value,prediction)"/>
                </list>
                <parameter key="keep_all"	value="true"/>
            </operator>
            <operator name="FeatureNameFilter (3)" class="FeatureNameFilter">
                <parameter key="filter_special_features"	value="true"/>
                <parameter key="skip_features_with_name"	value="series-0||prediction||base_value"/>
            </operator>
            <operator name="ChangeAttributeRole (2)" class="ChangeAttributeRole">
                <parameter key="name"	value="original_label"/>
                <parameter key="target_role"	value="label"/>
            </operator>
            <operator name="ChangeAttributeRole (6)" class="ChangeAttributeRole">
                <parameter key="name"	value="transformed_prediction"/>
                <parameter key="target_role"	value="prediction"/>
            </operator>
            <operator name="IdTagging (3)" class="IdTagging">
            </operator>
            <operator name="ChangeAttributeRole (7)" class="ChangeAttributeRole">
                <parameter key="name"	value="id"/>
            </operator>
        </operator>
        <operator name="ForecastingPerformance" class="ForecastingPerformance">
            <parameter key="horizon"	value="5"/>
            <parameter key="keep_example_set"	value="true"/>
            <parameter key="prediction_trend_accuracy"	value="true"/>
        </operator>
    </operator>
</operator>

Of course the estimated performance is way to optimistic (training error) but it is interesting that the forecasting also works well for linearly separated cross validation now.

Cheers,
Ingo

Legacy User · September 2008

Im noob here and to DM in general sorry for all the questions. Just wanted to ask, isn't sum of squares or variance between label and predicted value best measure of forecast accuracy? (label0-prediction0)^2 + (label1-prediction1)^2... etc. don't we want to minimize this?

Also, is slidingwindowvalidation the same as simplesplit if horizon is 1 unless cumulative training is on?

Does cumulative training do anything (doesn't seem to affect results)? how is it possible to retrain after every prediction starting with just the first 1-2 points for the initial training set and retraining after each new prediction?

How can I get a record of the label and prediction from the model applier to plot a visual of how well the learner predicted the trend (or the last/best one if in an optimization loop)? At the moment I have to use a breakpoint.

Is it possible to switch the objective of a parameteroptimization between maximize and minimize function? e.g. to minimize prediction/label variance

regressionperformance prediction_average appears to be returning label average.

from PredictionTrendAccuracy:

129 for (int i = horizon; i < labels.length; i++) {
130 double actualTrend = labels - labels[i - horizon];
131 double predictionTrend = predictions - predictions[i - horizon];
132 if (actualTrend * predictionTrend >= 0) {
133 correctCounter += weights[i - horizon];
134 }
135 length += weights[i - horizon];
136 }

can you explain this code?

thanks for the great work and help... Dani

IngoRM · September 2008

Hi,

Just wanted to ask, isn't sum of squares or variance between label and predicted value best measure of forecast accuracy? (label0-prediction0)^2 + (label1-prediction1)^2... etc. don't we want to minimize this?

Not necessarily. The squared error often is the error which internally is minimized by the learning method. But you - as a user - could of course be interested in other performance measurements.

Also, is slidingwindowvalidation the same as simplesplit if horizon is 1 unless cumulative training is on?

No. Sliding window validation performs several training and test and reports the average values. Simple split validation only trains once and tests once. And more important: unless the sampling type is not changed to "linear", the data will be shuffled and data from the future will be given to the learner which is often not desired.

Does cumulative training do anything (doesn't seem to affect results)? how is it possible to retrain after every prediction starting with just the first 1-2 points for the initial training set and retraining after each new prediction?

Sorry, I am not sure I fully understand the question. But you could add breakpoints before the learner / applier to see the difference in data which is delivered to the inner operators.

How can I get a record of the label and prediction from the model applier to plot a visual of how well the learner predicted the trend (or the last/best one if in an optimization loop)? At the moment I have to use a breakpoint.

Write the data (label + prediction) to a file with the ExampleSetWriter in "append" mode and load the complete file after finishing the validation run in order to visualize it.

Is it possible to switch the objective of a parameteroptimization between maximize and minimize function? e.g. to minimize prediction/label variance

This is usually not necessary. The optimization operators always maximize the function but the function itself decides if it should be maximized or minimized. If, for example, an error E should be minimized than the value -1
* E is maximized.

can you explain this code?

The latest code version is the following:


1        for (int i = horizon; i < labels.length; i++) {
2       	double actualTrend = labels - labels[i - horizon];
3        	double predictionTrend = predictions - labels[i - horizon];
4        	if (actualTrend * predictionTrend >= 0) {
5        		correctCounter += weights[i - horizon];
6        	}
7        	length += weights[i - horizon];
8        }

The loop (1) runs over the complete data set and performs the following steps:

Calculate the trend between now and the true value after horizon steps (2)
Calculate the trend between now and the predicted value after horizon steps (3)
If the trend is the same (the product between both values will be greater than 0) (4)....
...we add the weight of the example to our correct counter (5); the weight often is 1
In both cases (correct or not) the weight is also added to the field "length". The final result is than the correct counter divided by length.

Cheers,
Ingo

Legacy User · December 2008

Hi Ingo,
is that class right "WindowExamples2ModellingData"?
Ciao
Winfried

land · December 2008

Hi Winfried,
at least this class exists...But if its the correct class depends on your purpose. Since I don't know what you are going to do, I can't answer this question.

Greetings,
Sebastian

Legacy User · February 2009

"WindowExamples2ModellingData" --> error
"WindowExamples2ModelingData" --> right
(not ll , single l)

udatny · March 2011

Hello,

the PredictionTrendAccuracy gives quite optimistic values.
my question is, if it is valid (correct) to count a prediction as correct when just one of the values "actualTrend" or "predictionTrend" equals zero?
in my opinion a correct prediction is one where (actualTrend>0 && predictionTrend >0) || (actualTrend<0 && predictionTrend <0) || (actualTrend==0 && predictionTrend ==0)

i refer to this code from PredictionTrendAccuracy

// fixed bug: length was set to 1. (3.11.2010)
length = 0;
for (int i = horizon; i < labels.length; i++) {
double actualTrend = labels - labels[i - horizon];
double predictionTrend = predictions - labels[i - horizon];
if (actualTrend * predictionTrend >= 0) {
correctCounter += weights[i - horizon];
}
length += weights[i - horizon];
}

kind regards,

ud

Ingo Mierswa wrote:
Hi,

Not necessarily. The squared error often is the error which internally is minimized by the learning method. But you - as a user - could of course be interested in other performance measurements.

No. Sliding window validation performs several training and test and reports the average values. Simple split validation only trains once and tests once. And more important: unless the sampling type is not changed to "linear", the data will be shuffled and data from the future will be given to the learner which is often not desired.

Sorry, I am not sure I fully understand the question. But you could add breakpoints before the learner / applier to see the difference in data which is delivered to the inner operators.

Write the data (label + prediction) to a file with the ExampleSetWriter in "append" mode and load the complete file after finishing the validation run in order to visualize it.

This is usually not necessary. The optimization operators always maximize the function but the function itself decides if it should be maximized or minimized. If, for example, an error E should be minimized than the value -1
* E is maximized.

The latest code version is the following:

1 for (int i = horizon; i < labels.length; i++) {
2 double actualTrend = labels - labels[i - horizon];
3 double predictionTrend = predictions - labels[i - horizon];
4 if (actualTrend * predictionTrend >= 0) {
5 correctCounter += weights[i - horizon];
6 }
7 length += weights[i - horizon];
8 }

The loop (1) runs over the complete data set and performs the following steps:

Calculate the trend between now and the true value after horizon steps (2)
Calculate the trend between now and the predicted value after horizon steps (3)
If the trend is the same (the product between both values will be greater than 0) (4)....
...we add the weight of the example to our correct counter (5); the weight often is 1
In both cases (correct or not) the weight is also added to the field "length". The final result is than the correct counter divided by length.

Cheers,
Ingo

haddock · March 2011

Hi there,

I can find PredictionTrendAccuracy.java in a legacy Yale ::) directory, but not in the current source. Is that really the one you are thinking of?

udatny · March 2011

im referring to this class:

https://rapidminer.svn.sourceforge.net/svnroot/rapidminer/Plugins/ValueSeries/Vega/src/com/rapidminer/operator/performance/PredictionTrendAccuracy.java

regards,

ud

haddock · March 2011

Hi there Ud,

OK, we are talking about series of numbers, your view is that ....

in my opinion a correct prediction is one where (actualTrend>0 && predictionTrend >0) || (actualTrend<0 && predictionTrend <0) || (actualTrend==0 && predictionTrend ==0)

By which you would get the same 'score' just so long as the slope of the the trend is the same in both actual and predicted, 'up', 'down', or 'flat'. The code you disagree with does not however give any credit for cases where both predicted and actual are 'flat', that is precisely the difference of opinion.

I fail to see how one could adjudicate a definitively right answer; however I can see that in my own domain your proposal could have undesirable side-effects, in the form of drawing conclusions from nothing. Let me explain..

Mainly my datamining involves looking for patterns in the foreign exchange market, and I work at the the one minute interval, and below. The vast majority of my time slots involve some form of inactivity, where the close is the same as the open; that is not because great correlation is being exuded, it is because the market for that currency pair is in bed, and nothing is going on! On the other hand, when markets are open there is nearly always some movement, much more frequently than there being none. The smaller the time frame the more this is true.

I don't care for the Thought Police or Slot 42, and like to remind myself that there would be no need for data-mining, if only someone could define meaning.. ;D

Just putting the other side!

wessel · March 2011

Hey,

Is there a reference to Prediction Trend Accuracy in literature?
I searched google, and I only found the rapidminer forums.

Currently I'm doing research, using my own code, to compare different measures of performance, for a forecasting (regression) task.

The most stable measures of performance seem to be those that use a baseline predictor.
For example, a baseline predictor would output the last known target value as a prediction.
You can then count the number of times your learner outperforms the baseline.
And the confidence interval:
double p = worse / (worse + better);
double v = 1.96 * Math.sqrt(p * (1 - p) / (worse + bettter));

This said, any measure that calculates some form of accuracy, for a regression problem, can be quite misleading.
The first post was about stock price prediction.
It is possible to create a predictor with good accuracy but poor squared error.
Such a predictor would make fairly good predictions on most occasions, but be way off at some occasions.
Such a predictor would be practically useless for stock market predictions,
because you don't care about high accuracy, you care about the amount of money earned by your predictions.
So you should consider a measure, that not only checks if your predictions are better or worse, but also how much better or worse.

udatny · March 2011

hi haddock,

thanks for sharing your view on these things

as i understand (or see) it, the 'problem' is not this one:

The code you disagree with does not however give any credit for cases where both predicted and actual are 'flat', that is precisely the difference of opinion.

in fact it does, as i read the code. 0*0=0 which is >=0
what i dont agree with, is, when the actual trend is 'flat' but the predicted trend is 'up' (for example) that such a case is credited as an accurate prediction.

kind regards,

ud

haddock · March 2011

Hi there Ud,

Thanks for pointing that out, you're right, the existing code would in fact score any combo of 'flat' and 'positive' to 'correct'. I don't go along with that either; I should have looked at the code more closely, my excuse is that I was duped by Ingo...

If the trend is the same (the product between both values will be greater than 0)

- and I'm sticking to it!

For reasons that I laid out above I score a trend prediction as correct if Prediction*Actual>0, because my domain demands it.

Well spotted!

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Prediction Trend Accuracy doesn't do what's expected

Answers