Forward looking moving average or moving sum
Does anyone know how to calculate a forward-looking moving sum or average (for time series data)? Is there an operator that does this so that the look-forward length is a variable that can be optimized? From what I can tell, this is not possible. This is very useful for averaging out noisy time series data.
NB -- this is different than calculating a moving average and then taking the difference of that moving average for each example.
Thanks,
Rob
Answers
1
2
3
4
5
6
7
should become (for window = 2)
1 3
2 5
3 7
4 9
5 11
6 13
7 ?
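To make the intended output above concrete outside RapidMiner, here is a minimal Python sketch of a forward-looking moving sum. The function name `forward_moving_sum` is made up for illustration and is not a RapidMiner operator; it assumes the window includes the current value plus the next `window - 1` values, which matches the table above, and it returns `None` where the table shows `?`.

```python
from typing import List, Optional


def forward_moving_sum(values: List[float], window: int) -> List[Optional[float]]:
    """Sum each value with the next (window - 1) values.

    Positions where the window would run past the end of the series
    get None (the '?' entries in the table).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result: List[Optional[float]] = []
    for i in range(len(values)):
        chunk = values[i:i + window]  # current value plus the values ahead of it
        result.append(sum(chunk) if len(chunk) == window else None)
    return result


series = [1, 2, 3, 4, 5, 6, 7]
print(forward_moving_sum(series, 2))  # [3, 5, 7, 9, 11, 13, None]
```

Because `window` is an ordinary argument here, it can be varied in a loop, which is roughly what a parameter optimization would do with the window width.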
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="390" width="413">
<operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
<operator activated="true" class="generate_id" compatibility="5.2.008" expanded="true" height="76" name="Generate ID" width="90" x="180" y="30"/>
<operator activated="true" class="series:moving_average" compatibility="5.2.000" expanded="true" height="76" name="Moving Average" width="90" x="313" y="30">
<parameter key="attribute_name" value="id"/>
<parameter key="window_width" value="2"/>
<parameter key="aggregation_function" value="sum"/>
<parameter key="result_position" value="start"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Moving Average" to_port="example set input"/>
<connect from_op="Moving Average" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Thanks for your reply. For whatever reason, I cannot copy and paste that XML code into RapidMiner. Is there something aside from a simple copy and paste that I need to do?
In any event, more specifically:
Window = 2
ID  Attrib1  Label (created with operator)
1   2        10 (i.e. 4+6)
2   4        14 (i.e. 6+8)
3   6        18
4   8        22
5   10       ??
6   12       ??
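The label in this table differs slightly from a moving sum that includes the current row: it adds up only the values strictly after the current one. A minimal Python sketch of that reading follows. The name `future_sum` is hypothetical, and the rule "sum of the next `window` values, excluding the current value" is inferred from the rows shown (10 = 4+6, 14 = 6+8).

```python
from typing import List, Optional


def future_sum(values: List[float], window: int) -> List[Optional[float]]:
    """Sum the `window` values that follow each position.

    The current value is excluded. Positions without a full window
    of future values get None (the '??' rows in the table).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result: List[Optional[float]] = []
    for i in range(len(values)):
        chunk = values[i + 1:i + 1 + window]  # strictly future values
        result.append(sum(chunk) if len(chunk) == window else None)
    return result


attrib1 = [2, 4, 6, 8, 10, 12]
print(future_sum(attrib1, 2))  # [10, 14, 18, 22, None, None]
```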
So yeah, I think you have the idea right, but I can't see the process. The key is to have it be an operator where I can optimize the window length, ideally with the same variable also controlling a moving average that looks backwards. The problem is that, when using the Windowing operator, the look-forward length has to increase linearly with the lookback length. Otherwise, the forward-looking label has a look-ahead bias.
Thanks again.
Rob
PS: I was able to see your process, but this is nothing more than the Moving Average operator, which calculates the moving average looking backwards, not forwards.
Happy Mining!
~Marius
This is what I get out as a result:
http://img1.uploadscreenshot.com/images/orig/10/27505485476-orig.jpg
Here is the xml.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="390" width="570">
<operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="driller oscillation timeseries"/>
<parameter key="number_examples" value="550"/>
<parameter key="number_of_attributes" value="2"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="180" y="30">
<list key="function_descriptions">
<parameter key="att1" value="att1"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="315" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="att1"/>
</operator>
<operator activated="true" class="optimize_parameters_grid" compatibility="5.2.008" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="450" y="30">
<list key="parameters">
<parameter key="Windowing.window_size" value="[1.0;10;10;linear]"/>
</list>
<process expanded="true" height="390" width="480">
<operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing" width="90" x="45" y="30">
<parameter key="horizon" value="5"/>
<parameter key="window_size" value="10"/>
<parameter key="create_label" value="true"/>
<parameter key="label_attribute" value="att1"/>
</operator>
<operator activated="true" class="series:sliding_window_validation" compatibility="5.2.000" expanded="true" height="112" name="Validation" width="90" x="179" y="30">
<parameter key="training_window_width" value="10"/>
<parameter key="test_window_width" value="1"/>
<parameter key="horizon" value="5"/>
<process expanded="true">
<operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" name="Linear Regression"/>
<connect from_port="training" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" name="Apply Model">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.2.008" expanded="true" name="Performance (2)"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="366" y="62">
<list key="log">
<parameter key="window_size" value="operator.Windowing.parameter.window_size"/>
<parameter key="performance" value="operator.Validation.value.performance"/>
</list>
</operator>
<connect from_port="input 1" to_op="Windowing" to_port="example set input"/>
<connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
<connect from_op="Log" from_port="through 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Thanks a lot for your help. This has been driving me crazy.
You can take the average yourself afterwards.
Note that linear regression on a window computes an optimal weighted sum, which in special cases can be exactly the average (all weights equal).
And you actually optimize these weights for the very thing you are optimizing.
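As a small illustration of the "all weights equal" special case, the Python sketch below compares a weighted sum over one window with the plain average. The helper `weighted_sum` and the sample numbers are made up for illustration; a real linear regression would fit the weights (and an intercept) from data rather than fixing them by hand.

```python
import math
from statistics import mean
from typing import Sequence


def weighted_sum(window_values: Sequence[float], weights: Sequence[float]) -> float:
    """What a linear model without intercept predicts from one window."""
    if len(window_values) != len(weights):
        raise ValueError("need exactly one weight per window value")
    return sum(w * v for w, v in zip(weights, window_values))


window_values = [2.0, 4.0, 6.0, 8.0]

# Special case: every weight is 1/n, so the weighted sum is the average.
equal_weights = [1.0 / len(window_values)] * len(window_values)
print(math.isclose(weighted_sum(window_values, equal_weights), mean(window_values)))  # True

# Any other weighting, e.g. one favouring recent values, is a different smoother.
recent_weights = [0.1, 0.2, 0.3, 0.4]
print(math.isclose(weighted_sum(window_values, recent_weights), mean(window_values)))  # False
```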
If you insist on using the moving average, you most likely also require the lag series operator.
Look at the example below.
I did not include the Sliding Window Validation this time, just to get some extra variation in the process design.
Best regards,
Wessel
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="390" width="570">
<operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="driller oscillation timeseries"/>
<parameter key="number_examples" value="550"/>
<parameter key="number_of_attributes" value="2"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="180" y="30">
<list key="function_descriptions">
<parameter key="att1" value="att1"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="315" y="30">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="att1"/>
</operator>
<operator activated="true" class="optimize_parameters_grid" compatibility="5.2.008" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="450" y="30">
<list key="parameters">
<parameter key="Moving Average.window_width" value="[2;10;11;linear]"/>
</list>
<process expanded="true" height="390" width="570">
<operator activated="true" class="series:moving_average" compatibility="5.2.000" expanded="true" height="76" name="Moving Average" width="90" x="45" y="30">
<parameter key="attribute_name" value="att1"/>
<parameter key="window_width" value="10"/>
<parameter key="result_position" value="start"/>
</operator>
<operator activated="true" class="series:lag_series" compatibility="5.2.000" expanded="true" height="76" name="Lag Series" width="90" x="180" y="30">
<list key="attributes">
<parameter key="moving_average(att1)" value="20"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="315" y="30">
<parameter key="name" value="att1"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles">
<parameter key="moving_average(att1)" value="moving_average(att1)"/>
</list>
</operator>
<operator activated="true" class="filter_examples" compatibility="5.2.008" expanded="true" height="76" name="Filter Examples" width="90" x="450" y="30">
<parameter key="condition_class" value="no_missing_attributes"/>
</operator>
<operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="45" y="120"/>
<operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="180" y="120">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="313" y="120"/>
<operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="447" y="120">
<list key="log">
<parameter key="performance" value="operator.Performance.value.performance"/>
<parameter key="window_width" value="operator.Moving Average.parameter.window_width"/>
</list>
</operator>
<connect from_port="input 1" to_op="Moving Average" to_port="example set input"/>
<connect from_op="Moving Average" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
<connect from_op="Lag Series" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Linear Regression" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_op="Log" to_port="through 1"/>
<connect from_op="Log" from_port="through 1" to_port="performance"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_performance" spacing="126"/>
<portSpacing port="sink_result 1" spacing="0"/>
</process>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
<connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
An example:
Label = moving average of Att1 with window length 2, looking one time step forward
Att2 = moving average of Att1, lookback = 2
ID  Att1  Label  Att2
1   2     3      ?
2   4     5      3
3   6     7      5
4   8     9      7
5   10    ?      9
You'll notice that in each example Att2 incorporates the current label you want to predict. Unless you condition the moving average to sync with your lagging operator, you will undoubtedly incorporate your label in your moving averages, and the optimizer will settle on a result that incorporates look-ahead bias. Does this make sense? It's basically an ARMA model.
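One way to read the table is that the forward-looking label and the backward-looking average share input values unless the feature is lagged. The Python sketch below reproduces the two columns and checks which positions of Att1 both windows use. All function names and the `lag` argument are hypothetical, and the overlap check is only one interpretation of the look-ahead concern described here, not a description of how any RapidMiner operator works.

```python
from typing import List, Optional, Set


def forward_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Mean of the current value and the next (window - 1) values."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        chunk = values[i:i + window]
        out.append(sum(chunk) / window if len(chunk) == window else None)
    return out


def backward_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Mean of the current value and the previous (window - 1) values."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        start = i - window + 1
        out.append(sum(values[start:i + 1]) / window if start >= 0 else None)
    return out


def shared_indices(i: int, window: int, lag: int = 0) -> Set[int]:
    """Positions used by both the label window and the (lagged) feature window at row i."""
    label_idx = set(range(i, i + window))
    feature_idx = set(range(i - lag - window + 1, i - lag + 1))
    return label_idx & feature_idx


att1 = [2, 4, 6, 8, 10]
print(forward_mean(att1, 2))         # [3.0, 5.0, 7.0, 9.0, None]
print(backward_mean(att1, 2))        # [None, 3.0, 5.0, 7.0, 9.0]
print(shared_indices(2, 2))          # {2}: both windows use the current value
print(shared_indices(2, 2, lag=1))   # set(): lagging the feature by one step removes the overlap
```

In this sketch the overlap disappears once the feature is lagged by at least one step, which is the kind of syncing between the moving average and the lag that the post asks for.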
Thanks again for your help. I really appreciate it!
The Windowing and Lag Series operators are there to avoid what you call the "look ahead bias".
I understand how ARMA works, but I'm not sure it really fits into the windowing (embedding) framework.
ARMA has a different philosophy on how to do validation, and it is easy to mix up terminology.
You can simulate ARMA using RapidMiner operators, but this will of course be clumsier than using an actual implementation.
<process version="5.2.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
<process expanded="true" height="390" width="617">
<operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="29" y="170"/>
<operator activated="true" class="set_macro" compatibility="5.2.008" expanded="true" height="76" name="Set Macro" width="90" x="179" y="165">
<parameter key="macro" value="w"/>
<parameter key="value" value="1"/>
</operator>
<operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing" width="90" x="313" y="165"/>
<connect from_op="Generate Data" from_port="output" to_op="Set Macro" to_port="through 1"/>
<connect from_op="Set Macro" from_port="through 1" to_op="Windowing" to_port="example set input"/>
<connect from_op="Windowing" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>