
Forward looking moving average or moving sum

RMRR Member Posts: 12 Contributor II
edited November 2018 in Help
Does anyone know how to calculate a forward-looking moving sum or average (for time series data)?  Is there an operator that does this so that the look-forward length is a variable that can be optimized?  From what I can tell, this is not possible.  It would be very useful for averaging out noisy time series data.

NB: this is different from calculating a moving average and then taking the difference of that moving average for each example.

Thanks,

Rob

Answers

  • wessel Member Posts: 537 Maven
    Can you give an example of what you want?

    Like

    1
    2
    3
    4
    5
    6
    7

    should become (for window = 2)

    1 3
    2 5
    3 7
    4 9
    5 11
    6 13
    7 ?
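
A minimal Python sketch of the transformation in this example, for readers who want to check the numbers outside RapidMiner. It assumes the window includes the current value (so 1 becomes 1+2 = 3 for window = 2); the function name `forward_moving_sum` is hypothetical and not a RapidMiner operator.

```python
from typing import List, Optional


def forward_moving_sum(values: List[float], window: int) -> List[Optional[float]]:
    """Sum of values[t : t + window] for each t.

    Positions where the window would run past the end of the series
    get None (the '?' in the example above).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    result: List[Optional[float]] = []
    for t in range(len(values)):
        chunk = values[t:t + window]
        # Only emit a sum when the full window is available.
        result.append(sum(chunk) if len(chunk) == window else None)
    return result


print(forward_moving_sum([1, 2, 3, 4, 5, 6, 7], window=2))
# [3, 5, 7, 9, 11, 13, None]
```

Dividing each sum by `window` would give the forward-looking average instead of the sum.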






    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="390" width="413">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="generate_id" compatibility="5.2.008" expanded="true" height="76" name="Generate ID" width="90" x="180" y="30"/>
          <operator activated="true" class="series:moving_average" compatibility="5.2.000" expanded="true" height="76" name="Moving Average" width="90" x="313" y="30">
            <parameter key="attribute_name" value="id"/>
            <parameter key="window_width" value="2"/>
            <parameter key="aggregation_function" value="sum"/>
            <parameter key="result_position" value="start"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Moving Average" to_port="example set input"/>
          <connect from_op="Moving Average" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>





  • RMRR Member Posts: 12 Contributor II
    Hi wessel,

    Thanks for your reply. For whatever reason, I cannot copy and paste that XML code into RapidMiner.  Is there something aside from a simple copy and paste that I need to do?

    In any event, more specifically:

    Window 2

    ID  Attrib1  Label (created with operator)
    1   2        10 (i.e. 4+6)
    2   4        14 (i.e. 6+8)
    3   6        18
    4   8        22
    5   10       ??
    6   12       ??
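
The table above can be reproduced with a short Python sketch. Unlike a window that starts at the current row, the label here uses only strictly future values (row 1 gets 4+6, not 2+4). The function name `strictly_forward_sum` is an assumption made for illustration and does not correspond to any RapidMiner operator.

```python
from typing import List, Optional


def strictly_forward_sum(values: List[float], window: int) -> List[Optional[float]]:
    """Sum of the next `window` values after position t, excluding values[t].

    Rows that do not have `window` future values get None ('??' in the table).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    labels: List[Optional[float]] = []
    for t in range(len(values)):
        future = values[t + 1:t + 1 + window]
        labels.append(sum(future) if len(future) == window else None)
    return labels


attrib1 = [2, 4, 6, 8, 10, 12]
print(strictly_forward_sum(attrib1, window=2))
# [10, 14, 18, 22, None, None]
```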

    So yeah, I think you have the right idea, but I can't see the process.  The key is to have it be an operator where I can optimize the window length (ideally with the same variable also controlling a moving average that looks backwards).  The problem is that one needs to increase the look-forward length linearly with the lookback length when using the window operator.  Otherwise, the forward-looking label has a look-ahead bias.

    Thanks again.

    Rob

    PS I was able to see your process, but this is nothing more than the moving average operator, which calculates the moving average looking backwards, not forwards.

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    RMRR wrote:
    Is there something aside from a simple copy and paste that I need to do?
    Yes, you need to press the button with the green checkmark.

    Happy Mining!
    ~Marius
  • RMRR Member Posts: 12 Contributor II
    Thanks!  I did this and revised my comment.

  • wessel Member Posts: 537 Maven
    Hey,

    This is what I get out as a result:
    http://img1.uploadscreenshot.com/images/orig/10/27505485476-orig.jpg



    Here is the xml.




    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="390" width="570">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="driller oscillation timeseries"/>
            <parameter key="number_examples" value="550"/>
            <parameter key="number_of_attributes" value="2"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="180" y="30">
            <list key="function_descriptions">
              <parameter key="att1" value="att1"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="315" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="att1"/>
          </operator>
          <operator activated="true" class="optimize_parameters_grid" compatibility="5.2.008" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="450" y="30">
            <list key="parameters">
              <parameter key="Windowing.window_size" value="[1.0;10;10;linear]"/>
            </list>
            <process expanded="true" height="390" width="480">
              <operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing" width="90" x="45" y="30">
                <parameter key="horizon" value="5"/>
                <parameter key="window_size" value="10"/>
                <parameter key="create_label" value="true"/>
                <parameter key="label_attribute" value="att1"/>
              </operator>
              <operator activated="true" class="series:sliding_window_validation" compatibility="5.2.000" expanded="true" height="112" name="Validation" width="90" x="179" y="30">
                <parameter key="training_window_width" value="10"/>
                <parameter key="test_window_width" value="1"/>
                <parameter key="horizon" value="5"/>
                <process expanded="true">
                  <operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" name="Linear Regression"/>
                  <connect from_port="training" to_op="Linear Regression" to_port="training set"/>
                  <connect from_op="Linear Regression" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" name="Apply Model">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="5.2.008" expanded="true" name="Performance (2)"/>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
                  <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="366" y="62">
                <list key="log">
                  <parameter key="window_size" value="operator.Windowing.parameter.window_size"/>
                  <parameter key="performance" value="operator.Validation.value.performance"/>
                </list>
              </operator>
              <connect from_port="input 1" to_op="Windowing" to_port="example set input"/>
              <connect from_op="Windowing" from_port="example set output" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • RMRR Member Posts: 12 Contributor II
    This does not include a moving average, and it also has the issue that the window and the horizon overlap whenever the window length is >= the target horizon.  If you include the moving average feature, you run into the same issue: you are always looking ahead.

    Thanks a lot for your help.  This has been driving me crazy.
  • wessel Member Posts: 537 Maven
    Normally, you want to use the windowing operator.
    You can take the average yourself after.
    Note that linear regression on a window computes an optimal weighted sum, which in special cases, can exactly be the average (all weights equal).
    And you actually optimize these weights on the target you are trying to predict.
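
The relation between linear regression on a window and the moving average can be written out as follows. The symbols (window length $k$, weights $w_i$, intercept $b$) are notation introduced here for illustration, not taken from the thread.

```latex
% Linear regression on a window of the last k values:
\hat{y}_t = b + \sum_{i=0}^{k-1} w_i \, x_{t-i}

% Special case: zero intercept and equal weights w_i = 1/k
% reduces the prediction to the backward-looking moving average:
\hat{y}_t = \frac{1}{k} \sum_{i=0}^{k-1} x_{t-i}
```

In other words, the moving average is one point in the space of weight vectors that the regression searches over.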

    If you insist on using the moving average, you most likely also require the lag series operator.
    Look at the example below.
    I did not include the Sliding Window Validation this time, just to get some extra variation in the process design.

    Best regards,

    Wessel
  • wessel Member Posts: 537 Maven
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="390" width="570">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="driller oscillation timeseries"/>
            <parameter key="number_examples" value="550"/>
            <parameter key="number_of_attributes" value="2"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="180" y="30">
            <list key="function_descriptions">
              <parameter key="att1" value="att1"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="315" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="att1"/>
          </operator>
          <operator activated="true" class="optimize_parameters_grid" compatibility="5.2.008" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="450" y="30">
            <list key="parameters">
              <parameter key="Moving Average.window_width" value="[2;10;11;linear]"/>
            </list>
            <process expanded="true" height="390" width="570">
              <operator activated="true" class="series:moving_average" compatibility="5.2.000" expanded="true" height="76" name="Moving Average" width="90" x="45" y="30">
                <parameter key="attribute_name" value="att1"/>
                <parameter key="window_width" value="10"/>
                <parameter key="result_position" value="start"/>
              </operator>
              <operator activated="true" class="series:lag_series" compatibility="5.2.000" expanded="true" height="76" name="Lag Series" width="90" x="180" y="30">
                <list key="attributes">
                  <parameter key="moving_average(att1)" value="20"/>
                </list>
              </operator>
              <operator activated="true" class="set_role" compatibility="5.2.008" expanded="true" height="76" name="Set Role" width="90" x="315" y="30">
                <parameter key="name" value="att1"/>
                <parameter key="target_role" value="label"/>
                <list key="set_additional_roles">
                  <parameter key="moving_average(att1)" value="moving_average(att1)"/>
                </list>
              </operator>
              <operator activated="true" class="filter_examples" compatibility="5.2.008" expanded="true" height="76" name="Filter Examples" width="90" x="450" y="30">
                <parameter key="condition_class" value="no_missing_attributes"/>
              </operator>
              <operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="45" y="120"/>
              <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="180" y="120">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.2.008" expanded="true" height="76" name="Performance" width="90" x="313" y="120"/>
              <operator activated="true" class="log" compatibility="5.2.008" expanded="true" height="76" name="Log" width="90" x="447" y="120">
                <list key="log">
                  <parameter key="performance" value="operator.Performance.value.performance"/>
                  <parameter key="window_width" value="operator.Moving Average.parameter.window_width"/>
                </list>
              </operator>
              <connect from_port="input 1" to_op="Moving Average" to_port="example set input"/>
              <connect from_op="Moving Average" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
              <connect from_op="Lag Series" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
              <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_op="Linear Regression" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_op="Log" to_port="through 1"/>
              <connect from_op="Log" from_port="through 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="126"/>
              <portSpacing port="sink_result 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • RMRR Member Posts: 12 Contributor II
    The problem here is that the moving average and the lag need to move together, since one depends on the other if you want to avoid a look-ahead bias in time series data.  With noisy data, you'd like to find the optimal moving average (looking back) to predict some average move over your forecast horizon.  The issue is that one doesn't know how far back to average in order to generate a forward-looking average prediction.

    An example :

    Label = moving average of Att1 using Windowing length = 2, looking one time step forward
    Att2  = moving average of Att1, lookback = 2

    ID  Att1  Label  Att2
    1   2     3      ?
    2   4     5      3
    3   6     7      5
    4   8     9      7
    5   10    ?      9

    You'll notice that in each example Att2 incorporates the current label you want to predict.  Unless you condition the moving average to stay in sync with your lag operator, you will undoubtedly incorporate your label in your moving averages, and the optimizer will settle on a result that incorporates look-ahead bias.  Does this make sense?  It's basically an ARMA model.
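
One way to picture the alignment being asked for is the following Python sketch. It assumes the feature is a backward mean ending at time t and the label is a forward mean starting at t+1, so the two windows never share a value. The names `aligned_pairs`, `lookback` and `horizon` are hypothetical and are not RapidMiner parameters.

```python
from typing import List, Sequence, Tuple


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def aligned_pairs(values: Sequence[float], lookback: int,
                  horizon: int) -> List[Tuple[int, float, float]]:
    """Return (t, feature, label) rows with non-overlapping windows.

    feature = mean(values[t - lookback + 1 .. t])   (looks back, includes t)
    label   = mean(values[t + 1 .. t + horizon])    (looks forward, excludes t)
    Rows without a full window on either side are skipped.
    """
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be at least 1")
    rows = []
    for t in range(lookback - 1, len(values) - horizon):
        feature = mean(values[t - lookback + 1:t + 1])
        label = mean(values[t + 1:t + 1 + horizon])
        rows.append((t, feature, label))
    return rows


att1 = [2, 4, 6, 8, 10]
for row in aligned_pairs(att1, lookback=2, horizon=2):
    print(row)
# (1, 3.0, 7.0)
# (2, 5.0, 9.0)
```

Because the label window starts one step after the feature window ends, no value of Att1 contributes to both sides of a row.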

    Thanks again for your help.  I really appreciate it!
  • wessel Member Posts: 537 Maven
    Everything you describe is perfectly possible in RapidMiner.
    The windowing and lag operators are there to avoid what you call the "look-ahead bias".
    I understand how ARMA works, but I'm not sure it really fits into the windowing (embedding) framework.
    ARMA has a different philosophy on how to do validation, and it is easy to mix up terminology.
    You can simulate ARMA using RapidMiner operators, but this will of course be clumsier than using an actual implementation.
  • RMRR Member Posts: 12 Contributor II
    Right, so how does one include this condition: that the lookback horizon depends on the forward moving average operator?  This was my initial problem.
  • wessel Member Posts: 537 Maven
    You can use macros to change parameters in both operators.
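
The idea of one shared value driving both operators can be illustrated outside RapidMiner with a small Python sketch. This is an analogy, not RapidMiner code: the variable `w` plays the role of the macro, and the helper names and the naive error measure are assumptions chosen only to keep the example short.

```python
from typing import Dict, Sequence


def backward_mean(values: Sequence[float], t: int, w: int) -> float:
    """Mean of the w values ending at position t (inclusive)."""
    return sum(values[t - w + 1:t + 1]) / w


def forward_mean(values: Sequence[float], t: int, w: int) -> float:
    """Mean of the w values strictly after position t."""
    return sum(values[t + 1:t + 1 + w]) / w


def naive_error(values: Sequence[float], w: int) -> float:
    """Mean absolute error when the backward mean of width w is used
    directly as the prediction of the forward mean of the same width."""
    errors = [abs(forward_mean(values, t, w) - backward_mean(values, t, w))
              for t in range(w - 1, len(values) - w)]
    if not errors:
        raise ValueError("series too short for this window width")
    return sum(errors) / len(errors)


series = [2, 4, 6, 8, 10, 12, 14, 16]
# A single loop variable w sets both window widths, like a shared macro.
scores: Dict[int, float] = {w: naive_error(series, w) for w in range(1, 4)}
best_w = min(scores, key=scores.get)
print(scores, best_w)
# {1: 2.0, 2: 4.0, 3: 6.0} 1
```

In a real process the naive predictor would be replaced by a learner and a proper validation scheme; the point here is only that both widths are tied to the same value.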
  • RMRR Member Posts: 12 Contributor II
    Right, but I'm quite confused about how macros are written.  Can you provide some help in this regard?  Thanks!!
  • RMRR Member Posts: 12 Contributor II
    Help, please.  Anyone? 
  • wessel Member Posts: 537 Maven
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="390" width="617">
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="29" y="170"/>
          <operator activated="true" class="set_macro" compatibility="5.2.008" expanded="true" height="76" name="Set Macro" width="90" x="179" y="165">
            <parameter key="macro" value="w"/>
            <parameter key="value" value="1"/>
          </operator>
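          <operator activated="true" class="series:windowing" compatibility="5.2.000" expanded="true" height="76" name="Windowing" width="90" x="313" y="165">
            <!-- Assumption: the macro set above is referenced as %{w} so that one value drives both parameters. -->
            <parameter key="window_size" value="%{w}"/>
            <parameter key="horizon" value="%{w}"/>
          </operator>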
          <connect from_op="Generate Data" from_port="output" to_op="Set Macro" to_port="through 1"/>
          <connect from_op="Set Macro" from_port="through 1" to_op="Windowing" to_port="example set input"/>
          <connect from_op="Windowing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>