# [Solved] Another kind of performance measurement for time series

Especially in financial data mining, one would build a model not on the actual stock price but on the difference from the previous day.

Consequently, the result of a prediction process is an estimate of the change in price from one day to the next.

The currently available "forecasting performance" operator for series determines whether the prediction trend is correct.

(e.g. delta[today] = 4; delta[prediction for tomorrow] = 6; delta[tomorrow] = 5 >> trend is true because tomorrow>today AND prediction>today)

However, this is not sufficient to determine win/loss.

(e.g. delta[today] = -4; delta[prediction for tomorrow] = -3; delta[tomorrow] = -2 >> trend is true but the share still loses value)
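The trend criterion in these two examples can be sketched in Python. This is a minimal sketch that assumes the operator counts a prediction as correct when the actual move and the predicted move, both measured from today's value, point in the same direction. `trend_correct` is a hypothetical helper, not the RapidMiner implementation.

```python
def trend_correct(today: float, predicted: float, actual: float) -> bool:
    """Trend check as described in the examples (hypothetical helper).

    Correct when the actual move and the predicted move, both measured
    relative to today's value, have the same direction.
    """
    return (actual - today) * (predicted - today) >= 0


# First example: today = 4, prediction = 6, tomorrow = 5 -> trend is correct.
print(trend_correct(today=4, predicted=6, actual=5))     # True
# Second example: today = -4, prediction = -3, tomorrow = -2 -> also correct,
# although tomorrow's delta is still negative, i.e. the share loses value.
print(trend_correct(today=-4, predicted=-3, actual=-2))  # True
```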

Hence, the main question is rather whether delta[tomorrow] will be positive or negative.

(e.g. delta[prediction for tomorrow] = -3; delta[tomorrow] = -2 >> trend should be true because prediction and tomorrow have the same sign)

(e.g. delta[prediction for tomorrow] = 4; delta[tomorrow] = -1 >> trend should be false)

(e.g. delta[prediction for tomorrow] = 1; delta[tomorrow] = 3 >> trend should be true)
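The sign criterion from these three examples reduces to a one-line check. A minimal Python sketch; `sign_correct` is a hypothetical name, and counting a zero product (one of the deltas exactly 0) as correct is an assumption that matches the `>=0` in the formula quoted later in this thread.

```python
def sign_correct(predicted: float, actual: float) -> bool:
    """True when the predicted and the actual delta have the same sign.

    A product of zero (either delta exactly 0) is counted as correct.
    """
    return predicted * actual >= 0


print(sign_correct(-3, -2))  # True: both negative
print(sign_correct(4, -1))   # False: predicted rise, actual fall
print(sign_correct(1, 3))    # True: both positive
```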

Can anyone help me implement this kind of performance measure?

PS: With the existing operator I got quite good prediction trend accuracy rates of 0.7 to 0.8, but the overall win/loss simulation was only slightly above 0.5 because of the issue described above. So I was wondering whether different data preprocessing could help (e.g. transforming the stock values into binominal data like "up" and "down", but SVMs cannot handle binominal data). So far I calculate the daily percentage change for all attributes and the label. The attributes that correlate best are then used to build a model with the SVM. Does anyone happen to know whether there are other essential preprocessing steps to improve prediction quality?
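The preprocessing described in this PS (daily percentage change, or alternatively an "up"/"down" label) might look like this in Python. A minimal sketch assuming plain lists of non-zero daily values; the helper names are hypothetical, and labeling a zero change as "down" is an arbitrary choice made here.

```python
def percentage_change(values: list[float]) -> list[float]:
    """Day-over-day change in percent (assumes non-zero values).

    The first day has no predecessor, so the result is one element shorter.
    """
    return [100.0 * (curr - prev) / prev for prev, curr in zip(values, values[1:])]


def up_down(changes: list[float]) -> list[str]:
    """Binominal label: "up" for a positive change, otherwise "down"."""
    return ["up" if c > 0 else "down" for c in changes]


prices = [100.0, 110.0, 99.0, 99.0]
changes = percentage_change(prices)
print(changes)           # [10.0, -10.0, 0.0]
print(up_down(changes))  # ['up', 'down', 'down']
```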

Thank you for your help!

Kind regards

Sachs

## Answers

You could probably use a combination of Generate Attributes and Aggregate to calculate any desired performance measure. Of course, those operators work on example sets and write their results into an example set, but once you have the final value, you can extract it as a performance measure with the Extract Performance operator with performance_type set to data_value.
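To illustrate the two steps outside RapidMiner, here is a Python analog, assuming each example carries the actual delta and the predicted delta. The attribute names `delta`, `prediction` and `sign_ok` are hypothetical; the data are the three examples from the first post. In RapidMiner itself, the final value would still have to be turned into a performance measure with Extract Performance.

```python
# Rows of an example set: actual and predicted delta per day (illustrative data).
rows = [
    {"delta": -2.0, "prediction": -3.0},
    {"delta": -1.0, "prediction": 4.0},
    {"delta": 3.0, "prediction": 1.0},
]

# Step 1 (analog of Generate Attributes): derive a new attribute per example,
# 1 if prediction and actual delta have the same sign, else 0.
for row in rows:
    row["sign_ok"] = 1 if row["delta"] * row["prediction"] >= 0 else 0

# Step 2 (analog of Aggregate): average the new attribute over all examples.
sign_rate = sum(row["sign_ok"] for row in rows) / len(rows)
print(sign_rate)  # 2 of 3 examples correct
```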

Hope this helps!

Best regards,

Marius

Alternatively, convert your data to differences each day, so data points are actual deltas that are computed in advance?
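Computing the deltas in advance amounts to simple differencing. A minimal Python sketch, assuming a plain list of daily prices; `daily_deltas` is a hypothetical helper.

```python
def daily_deltas(prices: list[float]) -> list[float]:
    """Absolute change from the previous day; output is one element shorter."""
    return [curr - prev for prev, curr in zip(prices, prices[1:])]


print(daily_deltas([10.0, 12.0, 11.0, 15.0]))  # [2.0, -1.0, 4.0]
```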

Thank you for your replies!

I am not familiar with the script operator yet, so I am going to try the combination of Generate Attributes and Aggregate first.

I don't get the second part about converting to differences each day. The input data already is the difference. But a positive predicted trend doesn't necessarily mean an absolute positive result, since today's difference could be e.g. -5 and the prediction -3. So the trend is up, but the overall result is still negative.

Best regards

Sachs

There is an operator called "Predict Series".

This gives you "real" and "predicted" for your label attribute.

So you have 2 arrays with N data points

real.length() = N and predicted.length() = N

Can you write pseudo code with these arrays?
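With the two arrays `real` and `predicted`, a minimal Python sketch can compute both measures on the same data so the gap between them becomes visible. Assumptions: `real[i]` and `predicted[i]` are the actual and predicted delta for day i, the trend check compares both against the previous real delta (as in the examples of the first post), day 0 is skipped because it has no predecessor, and a product of zero counts as correct. The function names are hypothetical.

```python
def _check_lengths(real: list[float], predicted: list[float]) -> None:
    if len(real) != len(predicted) or len(real) < 2:
        raise ValueError("real and predicted need the same length N >= 2")


def trend_accuracy(real: list[float], predicted: list[float]) -> float:
    """Trend check: actual and predicted move, measured from the previous real delta."""
    _check_lengths(real, predicted)
    hits = [
        (real[i] - real[i - 1]) * (predicted[i] - real[i - 1]) >= 0
        for i in range(1, len(real))
    ]
    return sum(hits) / len(hits)


def sign_accuracy(real: list[float], predicted: list[float]) -> float:
    """Sign check: predicted and real delta for the same day share a sign."""
    _check_lengths(real, predicted)
    hits = [real[i] * predicted[i] >= 0 for i in range(1, len(real))]
    return sum(hits) / len(hits)


# Illustrative deltas arranged from the examples in the first post;
# predicted[0] is unused because day 0 has no previous day.
real = [-4.0, -2.0, -1.0, 3.0]
predicted = [0.0, -3.0, 4.0, 1.0]
print(trend_accuracy(real, predicted))  # 1.0
print(sign_accuracy(real, predicted))   # 0.6666666666666666
```

On this data every trend is "correct", yet one predicted sign is wrong, which mirrors the gap between trend accuracy and the win/loss simulation described in the first post.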

Unfortunately, "Predict Series" won't work with my example because this operator requires univariate data. However, I believe that the prediction part of the model already works fine.

Any comments welcome...

Best regards

Sachs

As far as I'm aware, you can run Predict Series and then implement a script that does something like:

(e.g. delta[prediction for tomorrow] = -3; delta[tomorrow] = -2 >> trend should be true because prediction and tomorrow have the same sign)

(e.g. delta[prediction for tomorrow] = 4; delta[tomorrow] = -1 >> trend should be false)

(e.g. delta[prediction for tomorrow] = 1; delta[tomorrow] = 3 >> trend should be true)

But maybe I misunderstood from the beginning, in this case I'm sorry.

Best regards,

Wessel

Sorry that it took me so long to answer your kind offer. Although it took a long time, that doesn't mean it is less important to me. The delay is due to a trip of several months, and my internet access is very limited.

I am going to write some pseudo code and post it in the next few days.

Thank you very much!

Sachs

PTA (prediction trend accuracy) is described in http://rapid-i.com/api/rapidminer-4.6/com/rapidminer/operator/performance/PredictionTrendAccuracy.html

In contrast to PTA, I need a formula that calculates [(if ((v4)*(p1)>=0), 1, 0) + (if ((v5)*(p2)>=0), 1, 0) +...] / 7, using the notation of the example in that documentation.

In other words: the subtraction is left out.
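Written as formulas, and assuming the indexing of the documentation example (a series v1 ... v10 with window width 3, so the seven predictions p1 ... p7 belong to the actual values v4 ... v10), PTA and the requested measure differ only in subtracting the last window value. Here [·] is 1 if the condition holds and 0 otherwise.

```latex
\mathrm{PTA} = \frac{1}{7}\sum_{j=1}^{7}\Big[\,(v_{j+3}-v_{j+2})\,(p_j-v_{j+2}) \ge 0\,\Big]
\qquad
\mathrm{requested} = \frac{1}{7}\sum_{j=1}^{7}\Big[\,v_{j+3}\,p_j \ge 0\,\Big]
```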

I would appreciate your help very much!!

Kind regards

Sachs

Cheers

Sachs