RapidMiner

Time Series using Windowing operator in RapidMiner

SOLVED
Contributor II

I'm trying to use a time series model in RapidMiner to forecast premium paid to an insurance company. Specifically, I have an entry for each month from January 2009 to December 2015, and I want to forecast the data for the next 12 months (January 2016 to December 2016).

I'm having trouble understanding how the Windowing operator works, and I have a few questions:

1) What goes into selecting a window size? If I want to forecast Premium over the next 12 months, is my window size 12? And if so, why do I get 12 attributes for each original attribute in my data set (the original Premium amount is one of these 12)? I get that these are supposed to explain the corresponding label value (which is just the next row's original Premium; I'm not sure why this is happening either), but where are these numbers coming from and why does RapidMiner generate them?

2) What does the option "create single attributes" do?

3) The horizon field: If this is the distance between the last window value and the value to predict, does this mean I can't predict the next 12 months of data at once? And even if I enter the horizon as 1 (which I take to mean "give me the prediction for January 2016", since the last data point is for December 2015), why is there no label value for December 2015 or January 2016 in the output when I run the process?

I'm a beginner, and I would really appreciate any help!

10 REPLIES
Contributor II

Re: Time Series using Windowing operator in RapidMiner

Contributor II

Re: Time Series using Windowing operator in RapidMiner

Yes, that was what I was going off of. The steps in the article are just outlined, not explained.

For example: "Window size: determines how many "attributes" are created for the cross sectional data. Each row of the original time series within the window width will become a new attribute" - this doesn't really explain why this happens, or what I'm supposed to conclude from the many attributes the Windowing operator generates.

Same goes for Thomas Ott's YouTube videos: these resources just tell me what to do, rather than explaining why they do it and what it's used for.

Just hoping for some more clarity on this, since I can't find much online.

Community Manager

Re: Time Series using Windowing operator in RapidMiner

[ Edited ]

Hi Rainaddi,

 

I'm the author of those old videos and you're right, I didn't explain why I chose the Windowing parameters as I did.

 

First off, there's another (older), more detailed explanation of the Windowing operator in our community, so check that out too: http://community.rapidminer.com/t5/RapidMiner-Studio/Prediction-Forecasting-with-RM/td-p/210

 

Great questions. Let me start by prefacing that the Series extension is fantastic for forecasting trend directions and decent at point forecasts too, but if a point forecast is what you're after, I'd mash up the R forecast library in Studio. It's pretty easy to do.

 

Note that a lot of the parameters I chose are just a starting point. I make a "best guess" and then use Parameter Optimization to vary parameters such as Window Size, Training/Testing Window Width, Step Size, etc.

 

I think Simafore's blog said it best: using the Windowing operator is like taking a "cross section of data" in time. You can have multiple attributes (columns) with different data points that help describe your label (target variable). For example, let's take this simple stock close dataset. It has XOM, FB, and MSFT closing values. We're interested in forecasting the trend of XOM_CLOSE, using it as the label (target variable) and the FB and MSFT closing prices as part of the input. You want to create a multivariate data set to describe XOM.

 

 

WindowingExample 1.png

So how do you use FB_CLOSE and MSFT_CLOSE in your forecast? That's where the Windowing operator comes in: I want to take that data and make a "window" of FB/MSFT data points that describe some XOM data point in time. The question is what size window to use. That's where a bit of domain knowledge comes in, and you'll have to make your first "best guess," remembering that you can change the Window Size when you use Parameter Optimization later.

 

For this argument, let's take a 5-day window (the trading week is typically 5 days). That is the Window Size. The Step Size is how far you want to advance the window each time. Setting the Step Size also requires a bit of domain knowledge, because you could be forecasting weekly, monthly, or quarterly data. For our example, we advance it by 1 (the next day).

 

You should see something like this:

 

WindowingExample 2.png

I put red boxes on the image above to highlight an important concept. In example (row) 1, the Date-4 column holds the date of the closing prices stored in XOM_CLOSE-4 and MSFT_CLOSE-4 (FB was cut off in the screenshot). Likewise, in example row 3, Date-3 holds the date of the closing prices in XOM_CLOSE-3 and MSFT_CLOSE-3. Now you have a 5-day window of data on an example-by-example (row-by-row) basis. This is good, but we're not done yet.
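To make the column-to-row rotation concrete, here is a minimal Python sketch of the idea (not RapidMiner's actual implementation). The function name `window_rows` and the closing prices are made up for illustration; the column naming, with "-4" as the oldest value in the window and "-0" as the newest, is an assumption based on the screenshot described above.

```python
def window_rows(series, window_size, step_size=1):
    """Turn a list of values into rows that each hold `window_size` consecutive values."""
    rows = []
    # Slide the window from the start of the series, advancing by step_size each time.
    for start in range(0, len(series) - window_size + 1, step_size):
        window = series[start:start + window_size]
        # Assumed naming: "-4" is the oldest value in the window, "-0" the newest.
        row = {f"XOM_CLOSE-{window_size - 1 - i}": value
               for i, value in enumerate(window)}
        rows.append(row)
    return rows


# Made-up closing prices, for illustration only.
closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
rows = window_rows(closes, window_size=5)
for row in rows:
    print(row)
```

Seven values with a window of 5 and a step of 1 give three rows, and each row repeats most of the values of the row before it, shifted by one position. That overlap is where the "extra" attributes come from.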

 

Why is it important to rotate your data series from columns to rows? You could easily just use a simple univariate column and do a Linear Regression on it, which is just fine, but what if you want to use more than one variable and eventually test the performance (i.e. the trend accuracy)? For that you have to transform the data set into the form in the screenshot above, because that preps it for the Sliding Window Validation operator (which is how you backtest your multivariate data series).

 

Before you can do that, you'll have to create a label from the data set above. You have to tell the Windowing operator which column (attribute) should be used to train a model. There are two main parameters you should use here: the Create Label toggle and the Horizon parameter. Those parameters tell RapidMiner which attribute to use for the label column (XOM_CLOSE) and which value you want to forecast; in this case it's the XOM_CLOSE value on Jan 6, 2016 (73.69).

 

WindowingExample 3.png

That looks like this:

 

WindowingExample 4.png
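As a rough illustration of how a label and a horizon fit together, here is a small Python sketch. It assumes the horizon counts steps from the last value in the window to the value being predicted (the definition used in the original question); `window_with_label` is a hypothetical helper and the series values are placeholders, not data from the screenshots.

```python
def window_with_label(series, window_size, horizon=1):
    """Pair each window with the value `horizon` steps after the window's last value."""
    examples = []
    # The last usable start is the one whose label still falls inside the series.
    last_start = len(series) - window_size - horizon
    for start in range(last_start + 1):
        features = series[start:start + window_size]
        label = series[start + window_size - 1 + horizon]
        examples.append((features, label))
    return examples


# Placeholder series, for illustration only.
series = [1, 2, 3, 4, 5, 6, 7, 8]
examples = window_with_label(series, window_size=5, horizon=1)
for features, label in examples:
    print(features, "->", label)
```

Note that windows near the end of the series produce no labeled example, because the value they would predict lies beyond the last observation.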

The next step would be to feed this data into a Sliding Window Validation operator and nest an algorithm in there to back test your assumptions.
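For intuition about the backtest, the following Python sketch generates generic walk-forward train/test splits. It is not the Sliding Window Validation operator itself: the function name and the `step` argument are assumptions, and the widths of 10 simply echo the `training_window_width` and `test_window_width` values in the sample process posted in this thread.

```python
def sliding_window_splits(n_rows, training_width, test_width, step=None):
    """Return (train_indices, test_indices) pairs that move forward through time."""
    if step is None:
        step = test_width  # assumed default: advance by one test window
    splits = []
    start = 0
    # Stop once a full training window plus a full test window no longer fits.
    while start + training_width + test_width <= n_rows:
        train = range(start, start + training_width)
        test = range(start + training_width, start + training_width + test_width)
        splits.append((train, test))
        start += step
    return splits


splits = sliding_window_splits(n_rows=40, training_width=10, test_width=10)
for train, test in splits:
    print(f"train rows {train[0]}-{train[-1]}, test rows {test[0]}-{test[-1]}")
```

Each model is trained only on rows that come before its test rows, so the backtest never uses future data.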

 

Hope this helps. 

Regards,
Thomas
LinkedIn: Thomas Ott
Blog: Neural Market Trends
Community Manager

Re: Time Series using Windowing operator in RapidMiner

Ok, I should learn to read the first question, not the last one.

 

Item 1: See my response above, as well as the link I posted.

Item 2: The Create Single Attributes parameter has to do with how you want Studio to recognize the data series. There are additional operators in the Series extension that require the data to be transformed to a "Series" datatype (this is specific to how that particular operator has to read in the data). Typically this is not needed, so leave the toggle on.

Item 3: You should be able to point forecast your values with a horizon greater than one, but I've never done that for my specific problems, so I'd suggest you experiment there. Why aren't there label values for Dec 15/Jan 16? Great question; that has to do with how large a window you created in the first pass. This is why you will always need to use a second Windowing operator (with "Create Label" toggled off) for your testing set. I'll have to follow up on this a bit later this week when I have more time.
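The missing labels can be checked with simple arithmetic. The Python sketch below assumes a labeled row is only produced when both the full window and the value `horizon` steps after it exist in the data; the helper names are hypothetical, the 84 rows correspond to the monthly data from January 2009 to December 2015 in the original question, and the premium values are placeholders.

```python
def labeled_example_count(n_rows, window_size, horizon):
    """Number of windows whose label (horizon steps past the window) is still in the data."""
    return max(0, n_rows - window_size - horizon + 1)


def scoring_window(series, window_size):
    """Most recent window; it has no label because the value to predict is not observed yet."""
    return series[-window_size:]


n_months = 7 * 12                        # January 2009 to December 2015
premiums = list(range(1, n_months + 1))  # placeholder values, not real premiums

count_h1 = labeled_example_count(n_months, window_size=12, horizon=1)
count_h12 = labeled_example_count(n_months, window_size=12, horizon=12)
latest = scoring_window(premiums, window_size=12)

print(count_h1, count_h12)
print(latest)
```

Under that assumption, the window covering the last 12 months can only be used for scoring, which matches the advice to window the testing data a second time without a label.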

Regards,
Thomas
LinkedIn: Thomas Ott
Blog: Neural Market Trends
Community Manager

Re: Time Series using Windowing operator in RapidMiner

And here is a sample process:

 

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data" compatibility="7.1.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
        <parameter key="target_function" value="sinus classification"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="5.3.000" expanded="true" height="82" name="Windowing" width="90" x="179" y="34">
        <parameter key="window_size" value="5"/>
        <parameter key="create_label" value="true"/>
        <parameter key="label_attribute" value="label"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="187">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="series:windowing" compatibility="5.3.000" expanded="true" height="82" name="Windowing (2)" width="90" x="648" y="187">
        <parameter key="window_size" value="5"/>
        <parameter key="label_attribute" value="label"/>
        <parameter key="horizon" value="5"/>
      </operator>
      <operator activated="true" class="series:sliding_window_validation" compatibility="5.3.000" expanded="true" height="124" name="Validation" width="90" x="581" y="34">
        <parameter key="training_window_width" value="10"/>
        <parameter key="test_window_width" value="10"/>
        <process expanded="true">
          <operator activated="true" class="k_nn" compatibility="7.1.001" expanded="true" height="82" name="k-NN" width="90" x="232" y="34"/>
          <connect from_port="training" to_op="k-NN" to_port="training set"/>
          <connect from_op="k-NN" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="series:forecasting_performance" compatibility="5.3.000" expanded="true" height="82" name="Performance" width="90" x="313" y="34">
            <parameter key="horizon" value="1"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model (2)" width="90" x="849" y="136">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Windowing" to_port="example set input"/>
      <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Windowing" from_port="original" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Windowing (2)" to_port="example set input"/>
      <connect from_op="Windowing (2)" from_port="example set output" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Regards,
Thomas
LinkedIn: Thomas Ott
Blog: Neural Market Trends
Contributor II

Re: Time Series using Windowing operator in RapidMiner

Dear Mr. Thomas Ott,

 

I am doing time series analysis to predict the size of incoming emails in order to optimize resources.

I watched your videos and built a model. My dataset consists of only one variable, "Total bytes", and covers 9 consecutive weeks of the academic year. I divided the dataset into two parts: the first 8 weeks for training and the 9th week for testing.

 

I am using the SVM and Cross-validation operators. My problems are:

  • When using the Windowing operator, I want to select the series representation "encode-series-by-attribute" and a window size of "10", but I get the error message "The parameter window-size specifies a window size, but the value 10 exceed the number of attributes".
  • The SVM operator shows the error message "Support Vector Machine cannot handle polynomial attributes".

I am new to RapidMiner Studio. Please help me.

 

Thank you

 

Sunita

Contributor II

Re: Time Series using Windowing operator in RapidMiner

Hi there Sunita,

 

Regarding the SVM error message: does your dataset have any attribute of type String? Some algorithms only work with numerical attributes, so they cannot deal with text attributes. I recommend transforming your non-numerical attribute into a numerical one.

 

If you have a String type attribute, you could use the "Nominal to Numerical" operator.
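As a rough picture of what such a conversion does, here is a Python sketch of dummy coding, one common way to turn a text attribute into numerical columns. It is only an illustration of the idea, not the operator's implementation; the attribute name `day` and its values are hypothetical.

```python
def dummy_code(values, attribute_name="day"):
    """Replace one text column with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    return [
        {f"{attribute_name} = {category}": 1 if value == category else 0
         for category in categories}
        for value in values
    ]


# Hypothetical text attribute, for illustration only.
encoded = dummy_code(["Mon", "Tue", "Mon"])
for row in encoded:
    print(row)
```

After a step like this, every column is numerical, which is what a learner that rejects text attributes needs.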

 

I hope this helps you

 

Iker

Regular Contributor

Re: Time Series using Windowing operator in RapidMiner

[ Edited ]

Mistakenly doubled.


Super Contributor

Re: Time Series using Windowing operator in RapidMiner

[ Edited ]

I have a similar problem. For training I use data from 2009-2015, and for testing I use the 2016 data (the data is monthly) to predict 2017. For both training and testing I set the window size to 12, the step size to 1, and the horizon to 1.

 

But the testing result is only 1 row, when I expected 12 rows (the 12 months of 2017). I know why the result is only 1 row: with a window size of 12, there is only 1 complete window. When I checked "add incomplete windows", 12 rows appeared, but I think something is not right...
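The single row follows from counting windows: 12 monthly rows with a window size of 12 leave room for exactly one complete window. The Python sketch below shows that count and, as one possible workaround that is not confirmed anywhere in this thread, a recursive forecast that feeds each prediction back into the next window. All names are hypothetical, and `mean_model` is a stand-in for a trained model.

```python
def complete_windows(n_rows, window_size, step_size=1):
    """Number of full windows that fit into n_rows."""
    if n_rows < window_size:
        return 0
    return (n_rows - window_size) // step_size + 1


def recursive_forecast(history, window_size, steps, predict_next):
    """Forecast `steps` values by appending each prediction to the history."""
    values = list(history)
    forecasts = []
    for _ in range(steps):
        window = values[-window_size:]   # most recent window, including earlier forecasts
        next_value = predict_next(window)
        forecasts.append(next_value)
        values.append(next_value)
    return forecasts


def mean_model(window):
    """Stand-in for a trained model: predicts the mean of the window."""
    return sum(window) / len(window)


data_2016 = list(range(1, 13))           # placeholder monthly values
print(complete_windows(len(data_2016), window_size=12))
forecast_2017 = recursive_forecast(data_2016, window_size=12, steps=12,
                                   predict_next=mean_model)
print(forecast_2017)
```

Errors accumulate in a recursive forecast because later predictions are built on earlier ones, so the later months deserve less trust than the first.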

 

@Thomas_Ott