"Time series forecast (with Rapid Miner)"

DaiWizard · June 2013

Hi!

I've set up a model exactly as described by Thomas Ott of 'neuralmarkettrends' in videos 8-10 - and it's working well so far.

But what I still need is the probability of the predicted label (horizon = 1). The model only gives the average values, in the form of:
prediction_trend_accuracy: 0.807 +/- 0.067 (mikro: 0.807).

Thanks for your help!

wessel · June 2013

Hello.

I'm now using Google to find the video you describe.
Next time please use a direct link to the video that is of interest.
Video link:
https://www.youtube.com/watch?v=UmGIGEJMmN8

Can you upload your process?

As far as I understand the process is as follows:
- Order your data by date
- Split your data into two parts
- Use data before date X for training, use data after date X for testing.
- Features for training are created using windowing
- SVM is used as learner
* This process does not deal with horizons very well; neuralmarkettrends1 is aware of this, but does not want to complicate his video.
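
The windowing and date-split steps listed above can be sketched in plain Python. This is only an illustration, not the RapidMiner implementation: the function names `make_windows` and `split_by_position` are hypothetical, and it assumes that horizon = 1 means "the value directly after the window".

```python
def make_windows(series, window_size, horizon):
    """Turn a time-ordered series into (features, label) rows.

    Each row holds `window_size` consecutive values as features and the
    value `horizon` steps after the end of the window as the label.
    (Assumption: horizon=1 means the very next value.)
    """
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        features = series[start:start + window_size]
        label = series[start + window_size + horizon - 1]
        rows.append((features, label))
    return rows


def split_by_position(rows, cutoff):
    """Rows before `cutoff` are used for training, the rest for testing."""
    return rows[:cutoff], rows[cutoff:]


# Toy series, already ordered by date.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
rows = make_windows(series, window_size=3, horizon=1)
train, test = split_by_position(rows, cutoff=3)
```

Here the first row uses the values 1, 2, 3 as features and 4 as the label. Any learner (an SVM in the video) would then be trained on `train` and evaluated on `test`.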

Now to answer your question:
My suggestion would be to rescale absolute error to fall into range 0 to 1, and use this as a measure of probability.
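
As a rough sketch of what I mean, assuming a simple min-max rescaling (the function name `rescale_abs_errors` and the toy numbers are made up for illustration). Reading `1 - scaled error` as a confidence score is one possible interpretation; it is not a calibrated probability.

```python
def rescale_abs_errors(predictions, labels):
    """Min-max rescale the absolute errors into the range 0 to 1."""
    abs_errors = [abs(p - y) for p, y in zip(predictions, labels)]
    lo, hi = min(abs_errors), max(abs_errors)
    if hi == lo:
        # All errors are identical, so there is nothing to rescale.
        return [0.0 for _ in abs_errors]
    return [(e - lo) / (hi - lo) for e in abs_errors]


# Toy values, not taken from any real process.
predictions = [0.50, 0.75, 0.25, 1.00]
labels = [0.50, 0.25, 0.50, 0.00]

scaled = rescale_abs_errors(predictions, labels)
# Assumption: a small scaled error is read as high confidence.
confidence = [1.0 - s for s in scaled]
```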

This is the best answer I can give right now.
You need to provide better information to get a better answer.

Best regards,

Wessel

DaiWizard · June 2013

Thank you wessel for your answer!

You are right, the question was a bit too imprecise. However, you got it right: that's the way I'm doing it.

Unfortunately, I don't know exactly what to do with this part of your answer: "Now to answer your question:
My suggestion would be to rescale absolute error to fall into range 0 to 1, and use this as a measure of probability".

Where do I get the absolute error from?

Thank you in advance!

wessel · June 2013

Using this process you can define any performance measure you want.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="5.3.008" expanded="true" height="60" name="Gen TS" width="90" x="45" y="30">
<parameter key="target_function" value="driller oscillation timeseries"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Create Sum" width="90" x="180" y="30">
<list key="function_descriptions">
<parameter key="sum" value="str(11*att1+22*att2+33*att3+44*att4+att5)"/>
</list>
</operator>
<operator activated="true" class="guess_types" compatibility="5.3.008" expanded="true" height="76" name="Guess Types" width="90" x="315" y="30"/>
<operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Sum" width="90" x="450" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="sum"/>
</operator>
<operator activated="true" class="normalize" compatibility="5.3.008" expanded="true" height="94" name="Normalize" width="90" x="585" y="30">
<parameter key="method" value="range transformation"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="5.3.000" expanded="true" height="76" name="Win 3 2" width="90" x="720" y="30">
<parameter key="window_size" value="3"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="sum"/>
<parameter key="horizon" value="2"/>
</operator>
<operator activated="true" class="series:predict_series" compatibility="5.3.000" expanded="true" height="60" name="Predict: 22 5 22" width="90" x="45" y="120">
<parameter key="window_width" value="15"/>
<parameter key="horizon" value="2"/>
<parameter key="max_training_set_size" value="15"/>
<process expanded="true">
<operator activated="true" class="relevance_vector_machine" compatibility="5.3.008" expanded="true" height="76" name="Relevance Vector Machine" width="90" x="45" y="30"/>
<connect from_port="window example set" to_op="Relevance Vector Machine" to_port="training set"/>
<connect from_op="Relevance Vector Machine" from_port="model" to_port="prediction model"/>
<portSpacing port="source_window example set" spacing="0"/>
<portSpacing port="sink_prediction model" spacing="0"/>
</process>
</operator>
<operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename" width="90" x="180" y="120">
<parameter key="old_name" value="prediction(label)"/>
<parameter key="new_name" value="pred"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes" width="90" x="315" y="120">
<list key="function_descriptions">
<parameter key="pred_times_label" value="pred*label"/>
<parameter key="pred_times_label_greater_0" value="if(pred*label>=0, 1, 0)"/>
<parameter key="abs_pred_minus_label" value="abs(pred-label)"/>
</list>
</operator>
<operator activated="true" class="extract_performance" compatibility="5.3.008" expanded="true" height="76" name="Performance" width="90" x="469" y="119">
<parameter key="performance_type" value="statistics"/>
<parameter key="attribute_name" value="abs_pred_minus_label"/>
</operator>
<connect from_op="Gen TS" from_port="output" to_op="Create Sum" to_port="example set input"/>
<connect from_op="Create Sum" from_port="example set output" to_op="Guess Types" to_port="example set input"/>
<connect from_op="Guess Types" from_port="example set output" to_op="Select Sum" to_port="example set input"/>
<connect from_op="Select Sum" from_port="example set output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Win 3 2" to_port="example set input"/>
<connect from_op="Win 3 2" from_port="example set output" to_op="Predict: 22 5 22" to_port="example set"/>
<connect from_op="Predict: 22 5 22" from_port="example set" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Performance" to_port="example set"/>
<connect from_op="Performance" from_port="performance" to_port="result 1"/>
<connect from_op="Performance" from_port="example set" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
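
For readers who find the XML hard to follow: the three attributes created by the "Generate Attributes" operator in the process above amount to the following per-row calculations. This is a plain Python sketch with made-up values; the function name `generate_attributes` and the two summary variables are hypothetical and not part of RapidMiner.

```python
def generate_attributes(pred, label):
    """Mirror the three expressions from the Generate Attributes operator."""
    return {
        "pred_times_label": pred * label,
        # 1 if prediction and label have the same sign (or one is zero).
        "pred_times_label_greater_0": 1 if pred * label >= 0 else 0,
        "abs_pred_minus_label": abs(pred - label),
    }


# Toy (pred, label) pairs, not taken from the generated data set.
pairs = [(0.5, 0.25), (-0.5, 0.25), (0.25, -0.75), (0.5, 0.5)]
rows = [generate_attributes(p, y) for p, y in pairs]

# Averaging a column gives one number per measure.
mean_abs_error = sum(r["abs_pred_minus_label"] for r in rows) / len(rows)
sign_agreement = sum(r["pred_times_label_greater_0"] for r in rows) / len(rows)
```

Any other expression on `pred` and `label` can be added in the same way, which is how a custom performance measure can be defined.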

wessel · June 2013

You should get a result looking like this:
(I have problems uploading images and will add the image later. For now, go into the results data set and plot "pred" and "label", and maybe "abs_pred_minus_label".)

Try to figure out why the absolute error is different from average(abs_pred_minus_label).
Also note that I'm not using a fixed split; instead I'm using a sliding window validation, because this is the proper way to validate time series models.
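
To give an idea of what a sliding window validation does, here is a simplified Python sketch. It is not the RapidMiner operator: the function names are hypothetical, the "model" is just the mean of the training labels as a stand-in learner, and it assumes that horizon = 1 means the test window starts directly after the training window.

```python
def sliding_window_splits(n_rows, training_width, test_width, horizon, step=1):
    """Yield (train_indices, test_indices) pairs that slide through time."""
    start = 0
    while True:
        train_end = start + training_width
        test_start = train_end + horizon - 1
        test_end = test_start + test_width
        if test_end > n_rows:
            break
        yield list(range(start, train_end)), list(range(test_start, test_end))
        start += step


def abs_error_per_split(labels, training_width, test_width, horizon):
    """Train and test once per window position, returning one error per split."""
    errors = []
    splits = sliding_window_splits(len(labels), training_width, test_width, horizon)
    for train_idx, test_idx in splits:
        # Stand-in learner: predict the mean of the training labels.
        prediction = sum(labels[i] for i in train_idx) / len(train_idx)
        split_error = sum(abs(prediction - labels[i]) for i in test_idx) / len(test_idx)
        errors.append(split_error)
    return errors


labels = [float(i) for i in range(10)]  # toy series 0.0 .. 9.0
splits = list(sliding_window_splits(len(labels), training_width=4, test_width=1, horizon=2))
errors = abs_error_per_split(labels, training_width=4, test_width=1, horizon=2)
```

Every test row lies after its training rows, so the model never sees the future, and the model is retrained at each position instead of once at a fixed date X.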

This XML shows how you can use the Regression Performance Operator.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="subprocess" compatibility="5.3.008" expanded="true" height="76" name="Generate Data (6)" width="90" x="45" y="30">
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="driller oscillation timeseries"/>
<parameter key="number_examples" value="200"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Sum" width="90" x="180" y="30">
<list key="function_descriptions">
<parameter key="sum" value="str(11*att1+22*att2+33*att3+44*att4+att5)"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Sum" width="90" x="319" y="29">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="sum"/>
</operator>
<operator activated="true" class="parse_numbers" compatibility="5.3.008" expanded="true" height="76" name="Parse Numbers (2)" width="90" x="441" y="26">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="sum"/>
</operator>
<operator activated="true" class="normalize" compatibility="5.3.008" expanded="true" height="94" name="Normalize" width="90" x="561" y="27">
<parameter key="method" value="range transformation"/>
</operator>
<operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename Label" width="90" x="699" y="28">
<parameter key="old_name" value="sum"/>
<parameter key="new_name" value="label"/>
<list key="rename_additional_attributes"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate Sum" to_port="example set input"/>
<connect from_op="Generate Sum" from_port="example set output" to_op="Select Sum" to_port="example set input"/>
<connect from_op="Select Sum" from_port="example set output" to_op="Parse Numbers (2)" to_port="example set input"/>
<connect from_op="Parse Numbers (2)" from_port="example set output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Rename Label" to_port="example set input"/>
<connect from_op="Rename Label" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="series:windowing" compatibility="5.3.000" expanded="true" height="76" name="Win 3 2" width="90" x="187" y="32">
<parameter key="window_size" value="3"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="label"/>
<parameter key="horizon" value="2"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.3.008" expanded="true" height="94" name="Multiply" width="90" x="309" y="34"/>
<operator activated="true" class="series:sliding_window_validation" compatibility="5.3.000" expanded="true" height="112" name="Validation" width="90" x="515" y="30">
<parameter key="training_window_width" value="15"/>
<parameter key="test_window_width" value="1"/>
<parameter key="horizon" value="2"/>
<parameter key="average_performances_only" value="false"/>
<process expanded="true">
<operator activated="true" class="relevance_vector_machine" compatibility="5.3.008" expanded="true" height="76" name="Relevance VM (2)" width="90" x="152" y="50"/>
<connect from_port="training" to_op="Relevance VM (2)" to_port="training set"/>
<connect from_op="Relevance VM (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.008" expanded="true" height="76" name="Apply Model" width="90" x="91" y="12">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="5.3.008" expanded="true" height="76" name="Performance" width="90" x="282" y="61">
<parameter key="root_mean_squared_error" value="false"/>
<parameter key="absolute_error" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="series:predict_series" compatibility="5.3.000" expanded="true" height="60" name="Predict: 22 5 22" width="90" x="78" y="331">
<parameter key="window_width" value="15"/>
<parameter key="horizon" value="2"/>
<parameter key="max_training_set_size" value="15"/>
<process expanded="true">
<operator activated="true" class="relevance_vector_machine" compatibility="5.3.008" expanded="true" height="76" name="Relevance VM" width="90" x="412" y="29"/>
<connect from_port="window example set" to_op="Relevance VM" to_port="training set"/>
<connect from_op="Relevance VM" from_port="model" to_port="prediction model"/>
<portSpacing port="source_window example set" spacing="0"/>
<portSpacing port="sink_prediction model" spacing="0"/>
</process>
</operator>
<operator activated="true" class="rename" compatibility="5.3.008" expanded="true" height="76" name="Rename" width="90" x="263" y="330">
<parameter key="old_name" value="prediction(label)"/>
<parameter key="new_name" value="pred"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes" width="90" x="439" y="335">
<list key="function_descriptions">
<parameter key="abs_pred_minus_label" value="abs(pred-label)"/>
</list>
</operator>
<operator activated="true" class="extract_performance" compatibility="5.3.008" expanded="true" height="76" name="Performance (2)" width="90" x="657" y="349">
<parameter key="performance_type" value="statistics"/>
<parameter key="attribute_name" value="abs_pred_minus_label"/>
</operator>
<connect from_op="Generate Data (6)" from_port="out 1" to_op="Win 3 2" to_port="example set input"/>
<connect from_op="Win 3 2" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Validation" to_port="training"/>
<connect from_op="Multiply" from_port="output 2" to_op="Predict: 22 5 22" to_port="example set"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
<connect from_op="Predict: 22 5 22" from_port="example set" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Performance (2)" to_port="example set"/>
<connect from_op="Performance (2)" from_port="performance" to_port="result 2"/>
<connect from_op="Performance (2)" from_port="example set" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>

DaiWizard · June 2013

Dear Wessel!

Thank you so much for your answer. Because I'm a beginner, I don't know how to import your XML as a new operator into my process from videos 8 to 10, and I'm not sure where in the chain to place this operator.

Best regards, Dai Wizard!

wessel · June 2013

Click view.

Create new perspective.

In show view, tick XML, untick all others.

In XML tab:
Paste XML code

Click green V symbol.

Return to your standard view.

DaiWizard · June 2013

Hi!

Thank you, wessel, for your tips, but I'm afraid it looks too complicated for me; I don't think I can understand it completely. Therefore I've created a PDF file that you can view using this link: http://www.professor-heusenstamm.com/model.pdf

Figure 1 ("Bild 1" in the PDF) shows my original process; Figure 2 is the content of the validation operator.
Figure 3 shows the general performance output.

Figure 4 is my latest progress :-) I've inserted the "Log" operator and defined the values for performance and prediction accuracy there.

Figure 5 shows the result of the latter.

My question is: did I insert the Log operator at the correct position in the process (Figure 4), so that it delivers the performance of the predicted n+1 value, which is the content of "Read Excel (2)", or do I have to rearrange or add something?

As usual, I'm looking forward to anybody's comments.

wessel · June 2013

My process (I call this a process, not a model) looks like this:

http://i.snag.gy/STABy.jpg

I used this button to create a new perspective (I named this perspective XML):
http://i.snag.gy/A53kc.jpg

So now my screen looks like:
http://i.snag.gy/6QXgV.jpg

This is easy for sharing processes.
