Time Series & Prediction Label Value Range

listslists Member Posts: 39 Guru
edited November 2018 in Help

 

Following the tutorials from 2010, "Rapidminer 5.0 Video Tutorial #10 - Financial Time Series Modeling" from Thomas Ott,

I get prediction labels in the format '31.000' etc., while my actual label values are between 0 and 9 (see below).

What's going on here? Is it because of my RM version, or did I make a mistake somewhere?

Who can help?

PS:

Label = n1

My out-of-sample data are the last 10 rows of a bigger sample (the most recent ones).

My inner sample data are the rest of the data (historically earlier).

 

messed_up_1.gif

 

PS: Are there any newer videos on time series available?


Answers

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I'm not sure what your process looks like or which algorithm you are using, but you may remember from my tutorials that point forecasting was not as robust as trend forecasting in RapidMiner. If you want point forecasts, I suggest using the forecast library in R and wrapping it inside RapidMiner.

     

    There is one updated written tutorial in Vijay and Bala's book, I think Chapter 10.

  • listslists Member Posts: 39 Guru

    Thank you for the response Thomas,

     

    Actually, I want to predict directions, but I'm puzzled by the value range in the forecast.

    Can you confirm that I made no significant mistake and that this is still the right way to do it?

    Here is my process (see attachment).

     

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    I see that you're using an SVM with a dot kernel. What is this time series? Production units? Sales? The choice of SVM kernel, C value, and gamma can have a dramatic effect on forecasting the direction of your time series (see attached). Without knowing the data, it almost looks like a GLM would work better, but I would check.

    [Image: C vs gamma.png]

     

     

     

     

  • listslists Member Posts: 39 Guru

    Hello T-Bone, thank you for the response.

     

    I see the C parameter of the SVM operator but no gamma. How did you produce the image 'C vs gamma'?

    The data is real live data (see attachment). 

    If I had 100 datasets could I use 90 of them as inner sample data and 10 of the 100 as outer sample/validation data?

    Should the validation data be more recent than the training data?

     

    Thank you for the advice.

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Ah yes, the gamma parameter becomes available once you change the kernel from dot to anything else. So I changed it to an RBF kernel, which tends to perform better on time series. I also took your process and built a parameter optimization scheme around it. Once C and gamma changed, the results started to fall into line.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.3.001" expanded="true" height="68" name="Retrieve plus_5_inner_sample_sqlite" width="90" x="45" y="85">
    <parameter key="repository_entry" value="../data/plus_5_inner_sample_sqlite"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.3.001" expanded="true" height="82" name="Set Role" width="90" x="246" y="187">
    <parameter key="attribute_name" value="drDateText"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.3.000" expanded="true" height="82" name="Windowing" width="90" x="380" y="187">
    <parameter key="window_size" value="1"/>
    <parameter key="create_label" value="true"/>
    <parameter key="label_attribute" value="n1"/>
    </operator>
    <operator activated="true" class="optimize_parameters_grid" compatibility="7.3.001" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="581" y="187">
    <list key="parameters">
    <parameter key="SVM.kernel_gamma" value="[0.001;1000;10;logarithmic]"/>
    <parameter key="SVM.C" value="[0;10000;10;linear]"/>
    </list>
    <process expanded="true">
    <operator activated="true" class="series:sliding_window_validation" compatibility="7.3.000" expanded="true" height="124" name="Validation" width="90" x="112" y="34">
    <parameter key="training_window_width" value="20"/>
    <parameter key="training_window_step_size" value="5"/>
    <parameter key="test_window_width" value="20"/>
    <parameter key="horizon" value="5"/>
    <process expanded="true">
    <operator activated="true" class="support_vector_machine" compatibility="7.3.001" expanded="true" height="124" name="SVM" width="90" x="179" y="34">
    <parameter key="kernel_type" value="radial"/>
    <parameter key="kernel_gamma" value="0.0039810717055349725"/>
    </operator>
    <connect from_port="training" to_op="SVM" to_port="training set"/>
    <connect from_op="SVM" from_port="model" to_port="model"/>
    <portSpacing port="source_training" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="7.3.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="series:forecasting_performance" compatibility="7.3.000" expanded="true" height="82" name="Performance" width="90" x="313" y="34">
    <parameter key="horizon" value="1"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_averagable 1" spacing="0"/>
    <portSpacing port="sink_averagable 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="log" compatibility="7.3.001" expanded="true" height="82" name="Log" width="90" x="246" y="85">
    <list key="log">
    <parameter key="C" value="operator.SVM.parameter.C"/>
    <parameter key="Gamma" value="operator.SVM.parameter.kernel_gamma"/>
    <parameter key="Forecast Perf" value="operator.Validation.value.performance"/>
    </list>
    </operator>
    <connect from_port="input 1" to_op="Validation" to_port="training"/>
    <connect from_op="Validation" from_port="model" to_port="result 1"/>
    <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
    <connect from_op="Log" from_port="through 1" to_port="performance"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_performance" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="false" class="legacy:write_model" compatibility="7.3.001" expanded="true" height="68" name="Write Model" width="90" x="246" y="34">
    <parameter key="model_file" value="C:\0000_TRANSFER\HTML5\LOTTO_CORE_2016\lottoData\PLUS_5_ARCHIV_DATA\testmod.mod"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.3.001" expanded="true" height="68" name="Retrieve plus_5_outer_sample_sqlite" width="90" x="112" y="493">
    <parameter key="repository_entry" value="../data/plus_5_outer_sample_sqlite"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="7.3.001" expanded="true" height="82" name="Set Role (2)" width="90" x="246" y="340">
    <parameter key="attribute_name" value="drDateText"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="false" class="read_excel" compatibility="7.3.001" expanded="true" height="68" name="Read Excel" width="90" x="112" y="187">
    <parameter key="excel_file" value="C:\0000_TRANSFER\HTML5\LOTTO_CORE_2016\lottoData\PLUS_5_ARCHIV_DATA\plus_5_inner_sample_sqlite.xlsx"/>
    <parameter key="sheet_number" value="2"/>
    <parameter key="imported_cell_range" value="A1:F4451"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="n1.true.integer.attribute"/>
    <parameter key="1" value="n2.true.integer.attribute"/>
    <parameter key="2" value="n3.true.integer.attribute"/>
    <parameter key="3" value="n4.true.integer.attribute"/>
    <parameter key="4" value="n5.true.integer.attribute"/>
    <parameter key="5" value="drDateText.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="false" class="read_excel" compatibility="7.3.001" expanded="true" height="68" name="Read Excel (2)" width="90" x="112" y="340">
    <parameter key="excel_file" value="C:\0000_TRANSFER\HTML5\LOTTO_CORE_2016\lottoData\PLUS_5_ARCHIV_DATA\plus_5_outer_sample_sqlite.xlsx"/>
    <parameter key="sheet_number" value="2"/>
    <parameter key="imported_cell_range" value="A1:F11"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="date_format" value="yyyy-MM-dd"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="n1.true.integer.attribute"/>
    <parameter key="1" value="n2.true.integer.attribute"/>
    <parameter key="2" value="n3.true.integer.attribute"/>
    <parameter key="3" value="n4.true.integer.attribute"/>
    <parameter key="4" value="n5.true.integer.attribute"/>
    <parameter key="5" value="drDateText.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="false" class="legacy:read_model" compatibility="7.3.001" expanded="true" height="68" name="Read Model" width="90" x="514" y="391">
    <parameter key="model_file" value="C:\0000_TRANSFER\HTML5\LOTTO_CORE_2016\lottoData\PLUS_5_ARCHIV_DATA\testmod.mod"/>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.3.000" expanded="true" height="82" name="Windowing (2)" width="90" x="380" y="340">
    <parameter key="window_size" value="1"/>
    <parameter key="label_attribute" value="n1"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="7.3.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="782" y="289">
    <list key="application_parameters"/>
    </operator>
    <connect from_op="Retrieve plus_5_inner_sample_sqlite" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Windowing" to_port="example set input"/>
    <connect from_op="Windowing" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="result 1" to_op="Apply Model (2)" to_port="model"/>
    <connect from_op="Retrieve plus_5_outer_sample_sqlite" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
    <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <description align="center" color="yellow" colored="false" height="135" resized="true" width="394" x="374" y="26">N(x) direction prediction Part A&lt;br&gt;Procedure: switch the Windowing node through n1-n5&lt;br/&gt;(see bookmarks)&lt;br/&gt;&lt;br&gt;</description>
    </process>
    </operator>
    </process>
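
To get a feel for how much work the grid search in the process above does, here is a minimal Python sketch that enumerates the two ranges from the `Optimize Parameters (Grid)` operator. It assumes that a range like `[0.001;1000;10;logarithmic]` means 10 steps, i.e. 11 points including both endpoints. That reading is consistent with the `kernel_gamma` value stored in the SVM operator (about 10^-2.4, the second point of such a grid), but it is an assumption, and the `grid` helper is a hypothetical name, not a RapidMiner function.

```python
import math

def grid(lo, hi, steps, scale):
    """Return steps + 1 evenly spaced points from lo to hi (linear or log10 spacing)."""
    if scale == "logarithmic":
        a, b = math.log10(lo), math.log10(hi)
        return [10 ** (a + (b - a) * i / steps) for i in range(steps + 1)]
    return [lo + (hi - lo) * i / steps for i in range(steps + 1)]

# Ranges copied from the process XML: SVM.kernel_gamma and SVM.C
gammas = grid(0.001, 1000, 10, "logarithmic")
cs = grid(0, 10000, 10, "linear")

# Every (C, gamma) pair triggers one complete sliding-window validation run
combos = [(c, g) for c in cs for g in gammas]
print(len(combos), "parameter combinations")
```

Under that assumption each additional optimized parameter multiplies the number of validation runs, which is why the run time grows so quickly.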

    With respect to your question about using a Cross Validation or Split Validation: you could try those operators, but then you lose the temporal ordering of the time series.
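
A sliding-window scheme keeps every test window later in time than its training window, which a shuffled split does not guarantee. The Python sketch below only illustrates that idea. The window widths and step are taken from the process, while `n_rows=60` and the `sliding_windows` helper are made up for illustration, and the exact way RapidMiner's Sliding Window Validation handles its `horizon` parameter is not reproduced here.

```python
def sliding_windows(n_rows, train_width, test_width, step):
    """Yield (train_indices, test_indices) pairs; test rows always follow train rows."""
    start = 0
    while start + train_width + test_width <= n_rows:
        train = range(start, start + train_width)
        test = range(start + train_width, start + train_width + test_width)
        yield train, test
        start += step

# Hypothetical series length; widths and step as in the process XML
folds = list(sliding_windows(n_rows=60, train_width=20, test_width=20, step=5))

for train, test in folds:
    # The newest training row is always older than the oldest test row
    assert max(train) < min(test)

print(len(folds), "folds, first test window starts at row", folds[0][1][0])
```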

     

    Note: I don't know how powerful your machine is, but the more parameters you choose to optimize, the longer the run time.

  • listslists Member Posts: 39 Guru

    Thank you very very much Thomas.

    It's wonderful. I'm speechless.

     

    A few questions remain.

    If I understand correctly, I can now take the best-performing C and gamma values from the log, rewire the setup, and hard-code them to get the best predictions for the complete dataset in a shorter time. Is this right?

    The prediction for tomorrow, given up-to-date data, is the last row of the result. Is this right?

     

    results_tomorrow_0.png

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    With respect to your first question, yes. The optimized values of C and gamma can now be used in your process. Just enter them as parameters and run your process again; this time it will be faster.

     

    With respect to your last question: yes, you should expect the value to be lower. When you use Windowing and set your label column, the label value is shifted back in time and the window is used to predict the value for the current window. It's a bit confusing, but for a refresher check out this Community thread: http://community.rapidminer.com/t5/RapidMiner-Studio/Time-Series-using-Windowing-operator-in-RapidMiner/m-p/31791
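
The label shift described above can be pictured with a small Python sketch. It is a generic windowing illustration, not the Windowing operator's actual implementation: `window_with_label` is a hypothetical helper, the series values are made up (in the 0-9 range of `n1`), and a window size and horizon of 1 are assumed.

```python
def window_with_label(series, window_size=1, horizon=1):
    """Turn a series into (features, label) rows; the label lies `horizon` steps after the window."""
    rows = []
    for end in range(window_size, len(series) + 1):
        features = series[end - window_size:end]
        target = end - 1 + horizon
        # The newest window has no known future value yet
        label = series[target] if target < len(series) else None
        rows.append((features, label))
    return rows

series = [3, 7, 1, 9, 4]  # made-up values for illustration only
rows = window_with_label(series)

for features, label in rows:
    print(features, "->", label)
```

In this sketch the last row has features but no label. That row is the one whose prediction would be the forecast for the next, not yet observed, value.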

     

    In cases like this I usually convert the label to Up or Down values using the Classify by Trend operator. Good luck!
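
The idea of turning a numeric label into Up/Down values can be sketched in a few lines of Python. This only mimics the concept and is not the Classify by Trend operator itself: `trend_labels` is a hypothetical helper, the values are made up, and treating an unchanged value as "Down" is an arbitrary choice for this sketch.

```python
def trend_labels(values):
    """Label each value 'Up' or 'Down' relative to its predecessor; the first value gets None."""
    labels = [None]
    for prev, cur in zip(values, values[1:]):
        # Ties are counted as 'Down' here; that is a choice made for this sketch
        labels.append("Up" if cur > prev else "Down")
    return labels

values = [3, 7, 1, 9, 4]  # made-up values for illustration only
labels = trend_labels(values)
print(list(zip(values, labels)))
```

With a label like this the task becomes a two-class classification problem, so the large numeric range of a point forecast no longer matters.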

  • listslists Member Posts: 39 Guru

    Thank you Thomas,

     

     

    Currently I have no idea how to use the Classify by Trend Operator.

    But since I will write to Excel I can classify via VBA.

     

    Are there any useful features for detecting overfitting in RM?

     

    In this case, what do you think, performance-wise, about SVMs versus Recurrent Neural Networks?

    I tried RNN a little bit in TensorFlow (with no success, still learning).

     

    PS: I tried to give you a thumbs up, but my browser fails at that point.

     

     
