RapidMiner

Learner I malek_sraj

Label versus prediction(label) in series forecasting

Hi,

I have been working on series forecasting similar to what is described at http://www.simafore.com/blog/bid/109175/Time-Series-Forecasting-using-RapidMiner-for-cost-modeling-2.... However, I noticed that when comparing trends, the author plotted the prediction(label) pattern against the Commodity A-0 pattern. I do not understand why he didn't plot label vs. prediction(label) to examine the trend.

I have done some forecasts where, when I compare the trend of prediction(label) to label, the trend accuracy is 50%. However, when I compare prediction(label) against label-0, I get a much higher trend accuracy of 75%.
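"Trend accuracy" is not defined in this thread. One common reading is the share of steps where the forecast moves in the same direction (up, down, flat) as the reference series. The sketch below assumes that definition; `directions` and `trend_accuracy` are hypothetical helpers, and the numbers are made up, with `label_0` being `label` shifted back by one step, like the label-0 column. It only shows how the two comparisons can disagree for the same predictions.

```python
def directions(values):
    """Sign of each step-to-step change: +1 up, -1 down, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]


def trend_accuracy(predicted, reference):
    """Fraction of steps where predicted and reference move in the same direction.

    Hypothetical definition: the thread does not say how trend accuracy
    was computed, so this compares the signs of consecutive differences.
    """
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    pred_dirs = directions(predicted)
    ref_dirs = directions(reference)
    hits = sum(p == r for p, r in zip(pred_dirs, ref_dirs))
    return hits / len(pred_dirs)


# Made-up numbers, for illustration only.
label = [1.0, 2.0, 1.5, 2.5, 2.0, 3.0]
label_0 = [0.5, 1.0, 2.0, 1.5, 2.5, 2.0]   # label shifted back one step
prediction = [0.8, 1.2, 1.9, 1.6, 2.4, 2.1]

print(trend_accuracy(prediction, label))    # 0.2 (vs. the value being forecast)
print(trend_accuracy(prediction, label_0))  # 1.0 (vs. the last known value)
```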

What am I missing here?

RM Certified Expert

Re: Label versus prediction(label) in series forecasting

It's been a while since I looked at that post, but based on his snapshot, this is why I think he did that.

In the upper flow he used the Windowing operator to create a label column from the existing time series data. He selected the output attribute column and called it label. The Windowing operator then offset the remaining attribute columns and renamed them Commodity-0, Commodity-1, etc. Depending on the window size, you'll have attributes like xyz-0, xyz-1, xyz-2, etc. He then trained the model to predict the label column from these time-shifted attributes.
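To make the layout concrete, here is a minimal sketch of windowing a single series into lagged attributes plus a label. It follows the naming described in this thread (xyz-0 is the most recent value in the window); `window_series` and its `horizon` argument are hypothetical and are not RapidMiner's implementation, whose exact label offset depends on the operator's settings.

```python
def window_series(series, window_size, horizon=1, name="xyz"):
    """Turn one series into rows of lagged attributes plus a label.

    Attributes are named name-(w-1) ... name-0, where name-0 is the most
    recent value in the window; the label is the value `horizon` steps
    after name-0. A sketch of the idea, not RapidMiner's code.
    """
    columns = [f"{name}-{lag}" for lag in range(window_size - 1, -1, -1)]
    rows = []
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        window = series[start:start + window_size]
        row = dict(zip(columns, window))
        row["label"] = series[start + window_size - 1 + horizon]
        rows.append(row)
    return rows


series = [10, 11, 13, 12, 14, 15, 17]
for row in window_series(series, window_size=3):
    print(row)
```

The first row is `{'xyz-2': 10, 'xyz-1': 11, 'xyz-0': 13, 'label': 12}`: the label is simply the next value of the same series, one step after xyz-0.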

The second flow is where he used the scoring data. He had to use the same window size but made sure the label wasn't there; after all, that's what you want to predict. That set generated a prediction(label), and he compared it with Commodity-0 because (and here's my guess) there was only one time series.

Have you checked out all the time series material on the Community? I wrote a very detailed response on using the Windowing operator here: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Time-Series-using-Windowing-operator-in-R...

Learner I malek_sraj

Re: Label versus prediction(label) in series forecasting

Hi Thomas, 

Thank you for your quick response. I have a good understanding of the Windowing operator, and I have also watched your videos on series forecasting.

Still, I am a bit perplexed. I have developed a (very simple) series forecasting process: a series goes through a Windowing operator and is fed into a neural network model that has been trained on a windowed series with a horizon of 1. For each row, the output shows t-4, t-3, t-2, t-1, t-0, prediction(label). What is curious is that the trend of prediction(label) follows that of t-0. Imagine the following:

t-10,t-9,t-8,t-7,t-6, prediction(t-5)

t-9,t-8,t-7,t-6,t-5, prediction(t-4)

t-8,t-7,t-6,t-5,t-4, prediction(t-3)

....

....

Why would I observe that prediction(t-5), prediction(t-4), prediction(t-3) follow t-6, t-5, t-4?

Shouldn't prediction(t-5), prediction(t-4), prediction(t-3) follow t-5, t-4, t-3 if I want to test trend forecast accuracy?
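One hypothesis worth checking (it is not established in this thread): if a model mostly reproduces the last value in each window, its prediction series is the label series shifted back by one step, so its ups and downs line up with t-0 rather than with the label. The sketch below makes that concrete with a persistence forecast (prediction = t-0) on a made-up sine series standing in for a wavelet detail. It does not claim this is what the neural net in this thread is doing; it only shows the kind of baseline to compare against.

```python
import math


def directions(values):
    """Sign of each step-to-step change: +1 up, -1 down, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]


def agreement(a, b):
    """Share of steps where series a and b move in the same direction."""
    da, db = directions(a), directions(b)
    return sum(x == y for x, y in zip(da, db)) / len(da)


# Made-up oscillating series, standing in for a wavelet detail.
series = [math.sin(0.9 * t) for t in range(40)]
window_size = 5

last_value, label, prediction = [], [], []
for start in range(len(series) - window_size):
    window = series[start:start + window_size]
    last_value.append(window[-1])              # t-0 in the rows above
    label.append(series[start + window_size])  # the value one step ahead
    prediction.append(window[-1])              # persistence: "next = last"

print("vs t-0:  ", agreement(prediction, last_value))  # 1.0 by construction
print("vs label:", agreement(prediction, label))       # lower: misses every turning point
```

If a trained model scores much like this baseline (high agreement with t-0, lower with the label), that suggests it is leaning heavily on the most recent value.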

Much appreciated, Thomas!

RM Certified Expert

Re: Label versus prediction(label) in series forecasting

Well, you have to be careful here. If you use a Windowing operator across many attributes to predict a label, then each attribute column should influence the outcome of the label. So it's not really safe to assume that prediction(label) follows att1-0 when you also have att1-1, att1-2, att1-3, etc. If you can post some sample data and your process, we can inspect it.
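One way to see which lag columns a model actually leans on is to fit a transparent model on the same windowed rows and inspect its weights. The sketch below swaps in ordinary least squares for the neural net (an assumption, not what the process in this thread uses) and simulates a hypothetical series whose next value depends on the last two values, so att1-0 and att1-1 should get clear weights while att1-2 and att1-3 should land near zero. No intercept is fitted because the simulated series has mean zero.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical series: next value = 0.6 * last - 0.3 * second-to-last + noise.
rng = random.Random(0)
series = [rng.gauss(0, 1), rng.gauss(0, 1)]
for _ in range(2000):
    series.append(0.6 * series[-1] - 0.3 * series[-2] + rng.gauss(0, 0.1))

window_size = 4
names = [f"att1-{lag}" for lag in range(window_size - 1, -1, -1)]
rows, targets = [], []
for start in range(len(series) - window_size):
    rows.append(series[start:start + window_size])  # oldest ... newest (att1-0)
    targets.append(series[start + window_size])     # label: the next value

weights = least_squares(rows, targets)
for name, w in zip(names, weights):
    print(f"{name}: {w:+.2f}")
```

A neural net has no single weight per column, but the same question (which lags matter?) can be asked of it by comparing its errors when individual lag columns are removed or shuffled.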

Learner I malek_sraj

Re: Label versus prediction(label) in series forecasting

Hi Thomas, 

Below is the XML code, and I attached a file for your reference. When I compare the trend against p-0 I get 88% trend accuracy, yet against the label I get 74%, and this happens with all wavelet details. Is this related to the wavelet transform, or is it more an effect of the neural network?

<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.4.000" expanded="true" height="68" name="Retrieve 30min_44Days_20170507" width="90" x="179" y="238">
        <parameter key="repository_entry" value="../30min_44Days_20170507"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.4.000" expanded="true" height="82" name="Select Attributes (39)" width="90" x="380" y="238">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Close"/>
      </operator>
      <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Modwt (3)" width="90" x="514" y="238">
        <parameter key="script" value="rm_main = function(data)&#10;{&#10;&#9;library('wavelets')&#10;&#9;x &lt;- as.numeric(unlist(data))&#10;&#9;w &lt;- modwt(x, filter=&quot;d2&quot;, n.levels=6)&#10;&#9;p &lt;- unlist(w@W[4])&#10;&#9;p &lt;- as.data.frame(p)&#10;&#9;return(p)&#10;}&#10;"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="7.4.000" expanded="true" height="82" name="Split Data (3)" width="90" x="648" y="238">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.7"/>
          <parameter key="ratio" value="0.3"/>
        </enumeration>
        <parameter key="sampling_type" value="linear sampling"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="7.4.000" expanded="true" height="82" name="Windowing (3)" width="90" x="782" y="238">
        <parameter key="window_size" value="6"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_attribute" value="p"/>
      </operator>
      <operator activated="true" class="neural_net" compatibility="7.4.000" expanded="true" height="82" name="Neural Net (3)" width="90" x="916" y="238">
        <list key="hidden_layers"/>
        <parameter key="training_cycles" value="1000"/>
        <parameter key="learning_rate" value="0.6"/>
        <parameter key="momentum" value="0.3"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.4.000" expanded="true" height="82" name="Apply Model (4)" width="90" x="1050" y="238">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="write_excel" compatibility="7.4.000" expanded="true" height="82" name="Write Excel" width="90" x="1184" y="238">
        <parameter key="excel_file" value="C:\Users\msraj002\Desktop\output.xlsx"/>
      </operator>
      <connect from_op="Retrieve 30min_44Days_20170507" from_port="output" to_op="Select Attributes (39)" to_port="example set input"/>
      <connect from_op="Select Attributes (39)" from_port="example set output" to_op="Modwt (3)" to_port="input 1"/>
      <connect from_op="Modwt (3)" from_port="output 1" to_op="Split Data (3)" to_port="example set"/>
      <connect from_op="Split Data (3)" from_port="partition 1" to_op="Windowing (3)" to_port="example set input"/>
      <connect from_op="Windowing (3)" from_port="example set output" to_op="Neural Net (3)" to_port="training set"/>
      <connect from_op="Neural Net (3)" from_port="model" to_op="Apply Model (4)" to_port="model"/>
      <connect from_op="Neural Net (3)" from_port="exampleSet" to_op="Apply Model (4)" to_port="unlabelled data"/>
      <connect from_op="Apply Model (4)" from_port="labelled data" to_op="Write Excel" to_port="input"/>
      <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Best,

Malek