Forecasting process - doing it right?

maurits_freriks Member Posts: 28 Contributor I
edited December 2018 in Help

Hi all,

 

I have been working on a forecasting process for three months now. Using historical data, I would like to forecast the flow for the coming days. With help from @Thomas_Ott and @lionelderkrikor I arrived at the process below (see code). I would like to thank both of them for helping me out! But I disturb them all the time, so I am now posting it in this forum.

 

My idea behind the process is as follows:

------------------------------------------------------------------------------------------------

Example:

Imagine the following days. 

Monday - day before yesterday
Tuesday - yesterday
Wednesday - today
Thursday - tomorrow 

 

It is now Wednesday, and we obviously don't know today's flow yet because it is still flowing. But we do know the flows of Monday and Tuesday. Now the client asks for a forecast of tomorrow's flow (Thursday). Therefore we use the flows of Monday and Tuesday to predict the flow on Thursday.
And so on for each individual day in the future.

------------------------------------------------------------------------------------------------
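A minimal sketch of the idea above in plain Python (not RapidMiner): each training example pairs a window of consecutive known days with the value a fixed number of days after the last known day. The function name `make_windows` and the flow values are made up for illustration; `window_size=2` and `horizon=2` correspond to "use Monday and Tuesday to predict Thursday".

```python
def make_windows(series, window_size, horizon):
    """Pair each run of `window_size` values with the value `horizon` steps
    after the last value in that run."""
    examples = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        inputs = series[start:start + window_size]
        target = series[start + window_size - 1 + horizon]
        examples.append((inputs, target))
    return examples


# Hypothetical flow values for Monday..Friday (illustration only).
days = ["Mon", "Tue", "Wed", "Thu", "Fri"]
flow = [10.0, 12.0, 11.0, 13.0, 14.0]

for inputs, target in make_windows(list(zip(days, flow)), window_size=2, horizon=2):
    input_days = [day for day, _ in inputs]
    print(input_days, "->", target[0])
```

With these settings the first example uses Monday and Tuesday to predict Thursday, and the day in between (Wednesday, "today") is never an input for that example, although it is an input for the Friday example.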

 

Am I doing this right in the process I built?!

 

I'm wondering because I obtained the following results (see the images below). The graph looks too accurate to be a realistic forecast. It looks like the model also uses the flow of the day that should be predicted. Also, the table I get as output does not look the way I expected.

 

Data - 0: this is the input data. So row 1 is 1-1-2017, row 2 is 2-1-2017.

Prediction: How is this prediction being made? The prediction for 1-1-2017 (today) needs input data from 29 December and 30 December (yesterday and the day before yesterday). But these days don't exist in the dataset.
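To make the date question concrete, here is a small sketch in plain Python that lists which input days a forecast for a given date would need under the method described above, and which date is the earliest one that can be forecast from the data alone. The helper names are hypothetical, and a window of 2 days with a horizon of 2 days is assumed from the Monday/Tuesday/Thursday example.

```python
from datetime import date, timedelta


def input_dates(target, window_size, horizon):
    """Dates whose flow is needed to forecast `target`."""
    last_input = target - timedelta(days=horizon)
    return [last_input - timedelta(days=k) for k in range(window_size - 1, -1, -1)]


def first_predictable(first_row, window_size, horizon):
    """Earliest date that can be forecast using only rows from `first_row` on."""
    return first_row + timedelta(days=window_size - 1 + horizon)


needed = input_dates(date(2017, 1, 1), window_size=2, horizon=2)
print([d.isoformat() for d in needed])

earliest = first_predictable(date(2017, 1, 1), window_size=2, horizon=2)
print(earliest.isoformat())
```

Under these assumptions a forecast for 1-1-2017 would indeed need 29 and 30 December, so if the data starts on 1-1-2017 the first forecastable day is 4-1-2017. A prediction that appears on the very first rows is therefore a sign that the windows are aligned differently from the intended method.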

 

Below you can find my training set:

https://drive.google.com/file/d/1r-YSKKxXq2tzSsg-GaXBgb48b8csQq5U/view?usp=sharing

Test set:

https://drive.google.com/open?id=15ULtSC36sUUlGCcsICYoAAQMpXnfaK6M

 

Right now it is not about getting the best model. I just want to know whether the process I built corresponds to my idea.

 

Please let me know if you would like to help me. Wednesday I have to present my results! 

 

With kind regards,

 

Maurits Freriks

 


[Images: Screen Shot 2018-02-05 at 12.45.38.png, Screen Shot 2018-02-05 at 12.45.55.png]

The two processes follow: first the scoring process, then the training process.

<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Thomas ott test VRS" width="90" x="112" y="34">
<parameter key="repository_entry" value="../data/Thomas ott test VRS"/>
</operator>
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Test VRS with dates" width="90" x="112" y="238">
<parameter key="repository_entry" value="../data/Test VRS with dates"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing Test" width="90" x="246" y="238">
<parameter key="window_size" value="1"/>
<description align="center" color="transparent" colored="false" width="126">Set the Window size parameter based on the what the optimization said was the best in Process 01.</description>
</operator>
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="447" y="34">
<list key="application_parameters"/>
<description align="center" color="transparent" colored="false" width="126">Apply model and Windowed data, output predictions.</description>
</operator>
<operator activated="true" class="write_excel" compatibility="8.0.001" expanded="true" height="82" name="Write Excel" width="90" x="581" y="34">
<parameter key="excel_file" value="/Users/Maurits/Documents/BA 3/Minor/stage/Tests/SVM/Bedum/Output RapidMiner Thomas ott Bedum Test.xlsx"/>
</operator>
<connect from_op="Retrieve Thomas ott test VRS" from_port="output" to_op="Apply Model" to_port="model"/>
<connect from_op="Retrieve Test VRS with dates" from_port="output" to_op="Windowing Test" to_port="example set input"/>
<connect from_op="Windowing Test" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Write Excel" to_port="input"/>
<connect from_op="Write Excel" from_port="through" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="false" class="write_excel" compatibility="8.0.001" expanded="true" height="82" name="Write Excel" width="90" x="1519" y="187">
<parameter key="excel_file" value="/Users/Maurits/Documents/BA 3/Minor/stage/Tests/SVM/Output RapidMiner Thomas ott.xlsx"/>
</operator>
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Train VRS with dates" width="90" x="45" y="187">
<parameter key="repository_entry" value="../data/Train VRS with dates"/>
</operator>
<operator activated="true" class="sort" compatibility="8.0.001" expanded="true" height="82" name="Sort" width="90" x="45" y="34">
<parameter key="attribute_name" value="time"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="data"/>
<description align="center" color="transparent" colored="false" width="126">Select the 'A' column</description>
</operator>
<operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="313" y="34">
<list key="attributes">
<parameter key="data" value="1"/>
</list>
<description align="center" color="transparent" colored="false" width="126">Lag 'A' column for striping out spikes</description>
</operator>
<operator activated="true" class="aggregate" compatibility="8.0.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34">
<list key="aggregation_attributes">
<parameter key="data" value="standard_deviation"/>
</list>
<description align="center" color="transparent" colored="false" width="126">Calculate std dev of 'A', push to macro</description>
</operator>
<operator activated="true" class="extract_macro" compatibility="8.0.001" expanded="true" height="68" name="Extract Macro" width="90" x="648" y="34">
<parameter key="macro" value="stdev"/>
<parameter key="macro_type" value="data_value"/>
<parameter key="attribute_name" value="standard_deviation(data)"/>
<parameter key="example_index" value="1"/>
<list key="additional_macros"/>
<description align="center" color="transparent" colored="false" width="126">extract std dev value to use in Generate Attributes</description>
</operator>
<operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="648" y="238">
<list key="function_descriptions">
<parameter key="Maintainence" value="if(data &lt; ([data-1]-data), 1, 0)"/>
</list>
<description align="center" color="transparent" colored="false" width="126">Create a Maintenance attribute to help filter out the days it's in maintenance mode</description>
</operator>
<operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="782" y="238">
<list key="filters_list">
<parameter key="filters_entry_key" value="Maintainence.eq.0"/>
</list>
<description align="center" color="transparent" colored="false" width="126">Select only non maintenance mode days</description>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="916" y="238">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="data"/>
<description align="center" color="transparent" colored="false" width="126">Select 'A' again</description>
</operator>
<operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="145" name="Optimize Parameters (Grid)" width="90" x="1050" y="238">
<list key="parameters">
<parameter key="Validation.cumulative_training" value="true,false"/>
<parameter key="SVM.kernel_gamma" value="[0.01;1;5;logarithmic]"/>
<parameter key="SVM.C" value="[0;10000;4;linear]"/>
<parameter key="Validation.training_window_width" value="[40;60;5;linear]"/>
<parameter key="Validation.training_window_step_size" value="[4;6;2;linear]"/>
<parameter key="Validation.test_window_width" value="[3;5;2;linear]"/>
</list>
<process expanded="true">
<operator activated="true" class="set_macro" compatibility="8.0.001" expanded="true" height="82" name="Set Macro" width="90" x="45" y="34">
<parameter key="macro" value="day_ahead"/>
<parameter key="value" value="1"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing Train" width="90" x="179" y="34">
<parameter key="window_size" value="%{day_ahead}"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="data"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing Test" width="90" x="380" y="187">
<parameter key="window_size" value="1"/>
</operator>
<operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="380" y="34">
<parameter key="training_window_width" value="60"/>
<parameter key="training_window_step_size" value="6"/>
<parameter key="test_window_width" value="5"/>
<process expanded="true">
<operator activated="true" class="support_vector_machine" compatibility="8.0.001" expanded="true" height="124" name="SVM" width="90" x="112" y="34">
<parameter key="kernel_type" value="radial"/>
<parameter key="C" value="10000.0"/>
</operator>
<connect from_port="training" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="8.0.001" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="34">
<parameter key="main_criterion" value="root_mean_squared_error"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log" compatibility="8.0.001" expanded="true" height="82" name="Log" width="90" x="581" y="85">
<parameter key="filename" value="tmp"/>
<list key="log">
<parameter key="C" value="operator.SVM.parameter.C"/>
<parameter key="Gamma" value="operator.SVM.parameter.kernel_gamma"/>
<parameter key="Training Width" value="operator.Validation.parameter.training_window_width"/>
<parameter key="Step Width" value="operator.Validation.parameter.training_window_step_size"/>
<parameter key="Testing Width" value="operator.Validation.parameter.test_window_width"/>
<parameter key="Perf" value="operator.Validation.value.performance"/>
<parameter key="Set Macro Value" value="operator.Set Macro.value.macro_value"/>
</list>
</operator>
<connect from_port="input 1" to_op="Set Macro" to_port="through 1"/>
<connect from_op="Set Macro" from_port="through 1" to_op="Windowing Train" to_port="example set input"/>
<connect from_op="Windowing Train" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Windowing Train" from_port="original" to_op="Windowing Test" to_port="example set input"/>
<connect from_op="Windowing Test" from_port="example set output" to_port="result 2"/>
<connect from_op="Validation" from_port="model" to_port="result 1"/>
<connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
<connect from_op="Log" from_port="through 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Optimize and store optimized model</description>
</operator>
<operator activated="true" class="store" compatibility="8.0.001" expanded="true" height="68" name="Store" width="90" x="1251" y="187">
<parameter key="repository_entry" value="../data/Thomas ott test VRS"/>
<description align="center" color="transparent" colored="false" width="126">Store optimized model</description>
</operator>
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="1385" y="289">
<list key="application_parameters"/>
<description align="center" color="transparent" colored="false" width="126">Sanity Check. Review 'A' time series against predicted 'A' time series from training data set.</description>
</operator>
<connect from_op="Retrieve Train VRS with dates" from_port="output" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
<connect from_op="Lag Series" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
<connect from_op="Aggregate" from_port="original" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 2"/>
<connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="result 1" to_op="Store" to_port="input"/>
<connect from_op="Optimize Parameters (Grid)" from_port="result 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Store" from_port="through" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>

 

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @maurits_freriks Ok, first off did you read my response on how the Windowing operator works here: https://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Time-Series-using-Windowing-operator-in-RapidMiner/m-p/31791

     

    If you did, then great. It'll help you understand the Windowing operator better. 

     

    Ok, if you know Monday and Tuesday's flow values, and Wednesday is zero, then I would make sure the training data set's last entry is zero. Then I'd train the model using a Horizon of 2 and check the forecast performance for 2.

     

    OR you could make a naive assumption for Wednesday's value. Instead of zero you could take an average value and then train the model with a Horizon of 1 and check the forecast performance for 1. Or you could just set Wednesday's value equal to Tuesday's value. It's ok to make assumptions, as long as you note them.

     

    Update: Remember the Horizon parameter is for how far ahead in days you want to predict your value. The Window parameter itself is just how many days you want to train the model on. This may help clarify a few things.
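The naive-assumption option mentioned in the answer above can be sketched in a few lines of plain Python. This only illustrates the two fill strategies named there (carry the last known value forward, or use the average of the known values); the function name `fill_today` and the numbers are hypothetical.

```python
def fill_today(known_values, strategy="last"):
    """Return a stand-in for today's still-unknown flow.

    strategy="last": repeat the most recent known value (Tuesday's flow).
    strategy="mean": use the average of the known values.
    """
    if not known_values:
        raise ValueError("need at least one known value")
    if strategy == "last":
        return known_values[-1]
    if strategy == "mean":
        return sum(known_values) / len(known_values)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical flows for Monday and Tuesday.
known = [10.0, 12.0]
series_last = known + [fill_today(known, "last")]
series_mean = known + [fill_today(known, "mean")]
print(series_last)
print(series_mean)
```

Whichever fill is used, it is an assumption about Wednesday and should be noted next to the results, as the answer says.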

  • maurits_freriks Member Posts: 28 Contributor I

    Thanks for your reply @Thomas_Ott. Let me respond to the following passage.

     

    "Ok, if you know Monday and Tuesday's flow values, and Wednesday is zero, then I would make sure the training data set's last entry is zero. Then I'd train the model using a Horizon of 2 and check the forecast performance for 2"

     

    If you only want to predict the value for Thursday, then yes, the flow of Wednesday is zero. BUT you would like to do this continuously for the next couple of days/months/years. So, for example, if you would like to predict the value for Friday, then you need the value of Wednesday as well, so in my opinion setting the flow of Wednesday to zero is not a solution, right?

     

     

    To make it clearer: you only USE the values of Monday and Tuesday to predict the value of Thursday. But that doesn't mean that the value of Wednesday doesn't exist in your training set, does it? This is the reason why we use "windowing", right? Then you select a certain number of days to make predictions?

     

    Btw: I read the post and it looks really interesting and clear. In my opinion I applied it in the process I described above, but we still don't get the result I would like to have.

     

    Regards,

     

    Maurits Freriks

     

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @maurits_freriks in your training data you have days and days of data. If you set your Horizon parameter to 2, then the model will try to predict 2 time units ahead (in your case the time units are days, but it could arbitrarily be weeks, months, years, etc). So let me correct myself a bit for clarity, in essence you don't really need Wednesday values if you have only Monday & Tuesday if you want to predict Thursday.

     

    Continuing on, if you want to predict for Friday (time +2), you will need to have Wednesday's (time) production flow.

     

    If you are using the Forecast Performance operator to test your trend accuracy, then please adjust the horizon value to 2 as well. This way you can measure how good your (time +2) forecast really is.
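One way to sanity-check a (time +2) forecast outside RapidMiner is to compare it with a persistence baseline, i.e. "the flow in two days equals the last known flow". The sketch below is plain Python and is not a reimplementation of the Forecast Performance operator; the function names and flow values are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and equally long")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def persistence_pairs(series, horizon):
    """Align each actual value with the value observed `horizon` steps earlier."""
    actual = series[horizon:]
    baseline = series[:-horizon]
    return actual, baseline


# Hypothetical daily flows.
flow = [10.0, 12.0, 11.0, 13.0, 14.0]
actual, baseline = persistence_pairs(flow, horizon=2)
print(actual, baseline, rmse(actual, baseline))
```

A model whose horizon-2 error is far below the persistence error on data it has not seen deserves a closer look at how its windows are aligned, since a forecast that looks too accurate is often a sign that the value being predicted has slipped into the inputs.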

  • maurits_freriks Member Posts: 28 Contributor I

     

    Thanks @Thomas_Ott

    After reading your previous post and applying it to my process, I came up with the process below.

    Unfortunately I got the following error and I really don't know what to do now. The error is raised by Apply Model (2).

     

    I don't care about the performance right now. I would only like to know if the process is doing what I described above. Then I could draw conclusions tomorrow and optimize the process later on. But the most important thing is whether the process represents my method.

     

    Method: 

    Take the flows of the previous days to predict, today, the flow of tomorrow.

    Example:

    - flows of: 1 January, 2 January, 3 January, 4 January, 5 January.

    - today: 6 January
    - value to predict: 7 January.

     

    And so on ...

     

     

     

    [Image: Screen Shot 2018-02-05 at 22.03.42.png]

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Train VRS with dates" width="90" x="45" y="187">
    <parameter key="repository_entry" value="../data/Train VRS with dates"/>
    </operator>
    <operator activated="true" class="sort" compatibility="8.0.001" expanded="true" height="82" name="Sort" width="90" x="45" y="34">
    <parameter key="attribute_name" value="time"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="data"/>
    <description align="center" color="transparent" colored="false" width="126">Select the 'A' column</description>
    </operator>
    <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="313" y="34">
    <list key="attributes">
    <parameter key="data" value="1"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Lag 'A' column for striping out spikes</description>
    </operator>
    <operator activated="true" class="aggregate" compatibility="8.0.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="34">
    <list key="aggregation_attributes">
    <parameter key="data" value="standard_deviation"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Calculate std dev of 'A', push to macro</description>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="8.0.001" expanded="true" height="68" name="Extract Macro" width="90" x="648" y="34">
    <parameter key="macro" value="stdev"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="standard_deviation(data)"/>
    <parameter key="example_index" value="1"/>
    <list key="additional_macros"/>
    <description align="center" color="transparent" colored="false" width="126">extract std dev value to use in Generate Attributes</description>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="648" y="238">
    <list key="function_descriptions">
    <parameter key="Maintainence" value="if(data &lt; ([data-1]-data), 1, 0)"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Create a Maintenance attribute to help filter out the days it's in maintenance mode</description>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.0.001" expanded="true" height="103" name="Filter Examples" width="90" x="782" y="238">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Maintainence.eq.0"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Select only non maintenance mode days</description>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.0.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="916" y="238">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="data"/>
    <description align="center" color="transparent" colored="false" width="126">Select 'A' again</description>
    </operator>
    <operator activated="true" class="optimize_parameters_grid" compatibility="8.0.001" expanded="true" height="145" name="Optimize Parameters (Grid)" width="90" x="1050" y="238">
    <list key="parameters">
    <parameter key="Validation.cumulative_training" value="true,false"/>
    <parameter key="SVM.kernel_gamma" value="[0.01;1;5;logarithmic]"/>
    <parameter key="SVM.C" value="[0;10000;4;linear]"/>
    <parameter key="Validation.training_window_width" value="[40;60;5;linear]"/>
    <parameter key="Validation.training_window_step_size" value="[4;6;2;linear]"/>
    <parameter key="Validation.test_window_width" value="[3;5;2;linear]"/>
    </list>
    <process expanded="true">
    <operator activated="true" class="set_macro" compatibility="8.0.001" expanded="true" height="82" name="Set Macro" width="90" x="45" y="34">
    <parameter key="macro" value="day_ahead"/>
    <parameter key="value" value="5"/>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing Train" width="90" x="179" y="34">
    <parameter key="window_size" value="%{day_ahead}"/>
    <parameter key="create_label" value="true"/>
    <parameter key="label_attribute" value="data"/>
    </operator>
    <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing Test" width="90" x="380" y="187">
    <parameter key="window_size" value="1"/>
    </operator>
    <operator activated="true" class="series:sliding_window_validation" compatibility="7.4.000" expanded="true" height="124" name="Validation" width="90" x="380" y="34">
    <parameter key="training_window_width" value="60"/>
    <parameter key="training_window_step_size" value="6"/>
    <parameter key="test_window_width" value="5"/>
    <parameter key="horizon" value="2"/>
    <process expanded="true">
    <operator activated="true" class="support_vector_machine" compatibility="8.0.001" expanded="true" height="124" name="SVM" width="90" x="112" y="34">
    <parameter key="kernel_type" value="radial"/>
    <parameter key="C" value="10000.0"/>
    </operator>
    <connect from_port="training" to_op="SVM" to_port="training set"/>
    <connect from_op="SVM" from_port="model" to_port="model"/>
    <portSpacing port="source_training" spacing="0"/>
    <portSpacing port="sink_model" spacing="0"/>
    <portSpacing port="sink_through 1" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="series:forecasting_performance" compatibility="7.4.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
    <parameter key="horizon" value="2"/>
    </operator>
    <connect from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
    <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
    <portSpacing port="source_model" spacing="0"/>
    <portSpacing port="source_test set" spacing="0"/>
    <portSpacing port="source_through 1" spacing="0"/>
    <portSpacing port="sink_averagable 1" spacing="0"/>
    <portSpacing port="sink_averagable 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="log" compatibility="8.0.001" expanded="true" height="82" name="Log" width="90" x="581" y="238">
    <parameter key="filename" value="tmp"/>
    <list key="log">
    <parameter key="C" value="operator.SVM.parameter.C"/>
    <parameter key="Gamma" value="operator.SVM.parameter.kernel_gamma"/>
    <parameter key="Training Width" value="operator.Validation.parameter.training_window_width"/>
    <parameter key="Step Width" value="operator.Validation.parameter.training_window_step_size"/>
    <parameter key="Testing Width" value="operator.Validation.parameter.test_window_width"/>
    <parameter key="Perf" value="operator.Validation.value.performance"/>
    <parameter key="Set Macro Value" value="operator.Set Macro.value.macro_value"/>
    </list>
    </operator>
    <connect from_port="input 1" to_op="Set Macro" to_port="through 1"/>
    <connect from_op="Set Macro" from_port="through 1" to_op="Windowing Train" to_port="example set input"/>
    <connect from_op="Windowing Train" from_port="example set output" to_op="Validation" to_port="training"/>
    <connect from_op="Windowing Train" from_port="original" to_op="Windowing Test" to_port="example set input"/>
    <connect from_op="Windowing Test" from_port="example set output" to_port="result 2"/>
    <connect from_op="Validation" from_port="model" to_port="result 1"/>
    <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
    <connect from_op="Log" from_port="through 1" to_port="performance"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_performance" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Optimize and store optimized model</description>
    </operator>
    <operator activated="true" class="store" compatibility="8.0.001" expanded="true" height="68" name="Store" width="90" x="1251" y="187">
    <parameter key="repository_entry" value="../data/Thomas ott test VRS"/>
    <description align="center" color="transparent" colored="false" width="126">Store optimized model</description>
    </operator>
    <operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="1385" y="289">
    <list key="application_parameters"/>
    <description align="center" color="transparent" colored="false" width="126">Sanity Check. Review 'A' time series against predicted 'A' time series from training data set.</description>
    </operator>
    <operator activated="true" class="write_excel" compatibility="8.0.001" expanded="true" height="82" name="Write Excel" width="90" x="1519" y="187">
    <parameter key="excel_file" value="/Users/Maurits/Documents/BA 3/Minor/stage/Tests/SVM/Output RapidMiner Thomas ott.xlsx"/>
    </operator>
    <connect from_op="Retrieve Train VRS with dates" from_port="output" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
    <connect from_op="Lag Series" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Aggregate" from_port="original" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 2"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 1"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="result 1" to_op="Store" to_port="input"/>
    <connect from_op="Optimize Parameters (Grid)" from_port="result 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
    <connect from_op="Store" from_port="through" to_op="Apply Model (2)" to_port="model"/>
    <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Write Excel" to_port="input"/>
    <connect from_op="Write Excel" from_port="through" to_port="result 3"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>
  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Check that the Window Size parameter is the same in all Windowing operators.

  • maurits_freriksmaurits_freriks Member Posts: 28 Contributor I

    Hi all, 

     

    After editing the process I came up with the following.

    [Image: Screen Shot 2018-02-06 at 12.14.00.png]

    This image shows the results on the training data. In the series plot I can only plot the series Data-0 to data-5, but with my method I would like to plot the data-7 series together with the prediction, because the data points data-0, data-1, data-2, data-3 and data-4 are used to predict the value of data-7. So I'm doing something wrong here, right?

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    To show a data-7 value, you would need to set the Window Size parameter to 8, so you would use 8 days of data to predict your Horizon. The prediction(label) attribute is the predicted value at your Horizon, i.e. however many days ahead you set it.

  • maurits_freriksmaurits_freriks Member Posts: 28 Contributor I

    Thanks @Thomas_Ott

     

    Yes, to show the 7th data value you need to set the window size parameter to 8. But I don't want to use all 8 days; I only want to use the first 5 days to predict the 7th day.

    So is window size = 5 and horizon = 2 correct in this case?
     

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Yes, that is correct. Window Size = 5 and Horizon = 2. Just note, you will not get a 'data-8' attribute. You will get data-0 through data-4. 
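
As an illustration of the window size = 5, horizon = 2 setup discussed above, here is a minimal Python sketch of the same idea outside RapidMiner. It assumes the horizon is counted from the last value in the window, so days 1 to 5 are paired with day 7. The function name `make_windows` and the flow values are hypothetical and only serve the example; this is not the Windowing operator's actual implementation.

```python
def make_windows(series, window_size, horizon):
    """Return a list of (window, label) pairs.

    Each window holds `window_size` consecutive values. The label is the
    value `horizon` steps after the last value in the window (assumption).
    """
    examples = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = series[start:start + window_size]
        label = series[start + window_size - 1 + horizon]
        examples.append((window, label))
    return examples


# Hypothetical flow values for day 1 .. day 10 (each value equals its day number,
# which makes it easy to see which days end up in which window).
flow = list(range(1, 11))

examples = make_windows(flow, window_size=5, horizon=2)
for window, label in examples:
    print(window, "->", label)
```

The first printed pair uses days 1 to 5 as inputs and day 7 as the label. Day 6 is skipped, and the label day itself never appears among the inputs, which is the property to verify when a forecast looks suspiciously accurate.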

  • maurits_freriksmaurits_freriks Member Posts: 28 Contributor I

    I will just compare those values in an Excel spreadsheet!

     

    Thanks  @Thomas_Ott

  • maurits_freriksmaurits_freriks Member Posts: 28 Contributor I

    Sorry @Thomas_Ott, here I am again, this time to specify the parameters. Suppose I want to build the process with days 1, 2, 3, 4 and 5 as input to predict the value of day 7.

     

    Do I have to fill in the parameters like this?

    Trainset:

    Set macro 

    macro = day_ahead
    value = 5? 

     

    Windowing train:
    Windowsize=%{day_ahead}
    stepsize=1

    Windowing test:
    Windowsize=%{day_ahead}

    Performance (forecast)
    Horizon = 2

     

    Scoring process:
    Windowing test:
    Windowsize = 5
    stepsize = 1

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Yes, that is correct. Just make sure the Set Macro operator is not being used in the Optimize Parameters operator (i.e. being optimized).
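
The advice to keep the window size out of the optimization can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `WINDOW_SIZE` plays the role of the `day_ahead` macro, a toy exponentially weighted average stands in for whatever learner the process actually trains, and the flow values and the `alpha` grid are made up. The point is that the grid search varies a model parameter only, while the window size and horizon stay fixed.

```python
WINDOW_SIZE = 5  # stands in for the day_ahead macro: fixed, never tuned
HORIZON = 2      # label lies 2 steps after the last value in the window


def make_windows(series, window_size, horizon):
    """Build (window, label) pairs; label is `horizon` steps past the window end."""
    last_start = len(series) - window_size - horizon
    return [
        (series[s:s + window_size], series[s + window_size - 1 + horizon])
        for s in range(last_start + 1)
    ]


def predict(window, alpha):
    """Toy model: exponentially weighted average, recent values weigh more."""
    estimate = window[0]
    for value in window[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


def mean_abs_error(examples, alpha):
    """Average absolute difference between toy predictions and labels."""
    errors = [abs(predict(window, alpha) - label) for window, label in examples]
    return sum(errors) / len(errors)


# Hypothetical daily flow values.
flow = [12.0, 15.0, 11.0, 14.0, 18.0, 16.0, 13.0, 17.0, 19.0, 15.0, 14.0, 20.0]
examples = make_windows(flow, WINDOW_SIZE, HORIZON)

# Only the model parameter is searched; WINDOW_SIZE is not part of the grid.
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
best_alpha = min(grid, key=lambda a: mean_abs_error(examples, a))
print("best alpha:", best_alpha)
```

Because the same `WINDOW_SIZE` constant feeds every call to `make_windows`, training and scoring data always end up with the same number of input attributes, which is what the earlier advice about matching Window Size values is meant to guarantee.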
