Use previous value to calculate next value

amundamund Member Posts: 22 Maven
edited November 2018 in Help

Hi,

 

I'm trying to perform a calculation that is very easy to do in Excel, but I can't find a way to do in RapidMiner. I expected this to be a little tricky in RM, but also thought this would be a common question on the forum. I'm still new to RM, so it's very possible that I have overlooked something.

 

Basically I need the previous value of an attribute to calculate the next.

 

 

Here's an example of a similar calculation in Excel. To calculate B2:

B2 = (B1*2 + A2) / 10

 

To calculate B3:

B3 = (B2*2 + A3) / 10

 

And so on. So I'm using the current value of A, and the previous value of B to get the value of B.

OK. I tried the lag operator, but it can't deliver the previous value of 'itself'. Do I need to use a loop operator? If someone could help me with an example I'd be very happy.

 

 

Answers

  • amundamund Member Posts: 22 Maven

    Ok, I think I'm a little closer to the solution, but I'm still stuck.

     

    It seems like it can be done with a combination of Extract Macro and Loop Examples.

    However I can't find a way to set the example index to be the previous value.

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    That's not hard to do at all, you were on the right track with the Lag operator. See below:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.5.001" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="179" y="34">
    <list key="attributes">
    <parameter key="a1" value="1"/>
    </list>
    <description align="center" color="transparent" colored="false" width="126">Lag Close value by 1</description>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.5.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
    <list key="function_descriptions">
    <parameter key="My Excel Like Math" value="(a1-[a1-1])*10"/>
    </list>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.5.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="a1-1"/>
    <parameter key="invert_selection" value="true"/>
    <description align="center" color="transparent" colored="false" width="126">Remove Lagged column</description>
    </operator>
    <operator activated="true" class="replace_missing_values" compatibility="7.5.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="581" y="34">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="default" value="zero"/>
    <list key="columns"/>
    <description align="center" color="transparent" colored="false" width="126">Replace missing value in Log Returns columns by 0</description>
    </operator>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Lag Series" to_port="example set input"/>
    <connect from_op="Lag Series" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • amundamund Member Posts: 22 Maven

    That code is very clever, but I actually need to use the previous value of the Math-attribute.

     

    So.. 

     

    Say we have the a1 attribute and the Math-attribute from the example.

    The math-attribute should use the current value of a1, and the previous value of itself.

     

    Something like this.

     

    Row no.         a1           Math

    1                    1.5            

    2                    1.2           (prev. value of Math * 9 + 1.2) / 10

    3                    1.6           (prev. value of Math * 9 + 1.6) / 10

    4                    1.3           (prev. value of Math * 9 + 1.3) / 10

     

    I can't do that with the lag operator, right?

  • amundamund Member Posts: 22 Maven

    Any clue, @Thomas_Ott? I'm unable to move forward with my grand plan.

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hey, 

     

    Lag as Tom proposed in combination with Generate Attributes should do the trick.

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • amundamund Member Posts: 22 Maven

    I have tried this, but it doesn't solve my problem. The lag operator can give me the previous value of att1, but that's not what I'm looking for here. I need to create a new attribute that takes the current value of att1, and the previous value of itself to get the current value.

     

    Again - in excel - to calculate cell B2, i would use A2 and B1. Cell B3 requires A3 and B2 values. And so on. Lagging either attributes doesn't do the trick here (or at least not without some additional logic)

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    exactly this is what you can do with Lag+Generate Attributes. See attached process.

     

    Best,

    Martin

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="7.3.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
    <parameter key="number_of_attributes" value="2"/>
    </operator>
    <operator activated="true" class="series:lag_series" compatibility="5.3.000" expanded="true" height="82" name="Lag Series" width="90" x="179" y="34">
    <list key="attributes">
    <parameter key="att1" value="1"/>
    </list>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
    <list key="function_descriptions">
    <parameter key="New Attribute" value="att2+[att1-1]"/>
    </list>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Lag Series" to_port="example set input"/>
    <connect from_op="Lag Series" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • amundamund Member Posts: 22 Maven

    I haven't been able to express myself clearly about this and I'm sorry about this. Normally your solution would work, but it doesn't here and the reason is I need to perform some calculations in the new attribute.

     

    First let's start with this data set.

     

    Row Id           Gain

    1                         5

    2                         0

    3                         3

    4                         1

    5                         0

    6                         2

    ..

     

    Now. We need to generate a new attribute. Let's just label it calc1.

    calc1 is calculated like this:

     

    ( (Previous Value of calc1 * 13) + Current value of 'Gain') / 14

     

    Our new data set then looks like this (calculations explained on right side)

     

    Row Id           Gain           Calc1

    1                         5               0,357      <-  (0*13+5)/14

    2                         0               0,331      <-  (0,357*13+0)/14

    3                         3               0,562      <-  (0,331*13+3)/14

    4                         1               0,593      <-  (0,562*13+1)/14

    5                         0               0,551      <-  (0,593*13+0)/14

    6                         2               0,654      <-  (0,551*13+2)/14

    ..

     

     

    As you can see this is different because the new attribute we are creating really needs to access the previous value of itself, to create the next. Because of the calculation we are performing, lagging is not appropriate (whereas if you simply needed to add current and previous values together, lagging would be perfect).

     

     

  • amundamund Member Posts: 22 Maven

    I haven't been able to express myself clearly about this and I'm sorry about this. Normally your solution would work, but it doesn't here and the reason is I need to perform some calculations in the new attribute.

     

    First let's start with this data set.

     

    Row Id           Gain

    1                         5

    2                         0

    3                         3

    4                         1

    5                         0

    ..

     

    Now. We need to generate a new attribute. Let's just label it calc1.

    calc1 is calculated like this:

     

    ( (Previous Value of calc1 * 13) + Current value of 'Gain') / 14

     

    Our new data set then looks like this (calculations explained on right side)

     

    Row Id           Gain           Calc1

    1                         5               0,357      <-  (0*13+5)/14

    2                         0               0,331      <-  (0,357*13+0)/14

    3                         3               0,562      <-  (0,331*13+3)/14

    4                         1               0,593      <-  (0,562*13+1)/14

    5                         0               0,551      <-  (0,593*13+0)/14

    ..

     

     

    As you can see this is different because the new attribute we are creating really needs to access the previous value of itself, to create the next. Because of the calculation we are performing, lagging is not appropriate (whereas if you simply needed to add current and previous values together, lagging would be perfect).

  • athanasios_ladaathanasios_lada Member Posts: 1 Contributor I

    Hey,

     

    have you found a solution to this?

     

     

  • SGolbertSGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi @amund,

     

     

    I think I managed to do what you want:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="9.0.002" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
    <parameter key="number_of_attributes" value="2"/>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="9.0.002" expanded="true" height="68" name="Extract Macro" width="90" x="179" y="34">
    <parameter key="macro" value="previous"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="att2"/>
    <parameter key="example_index" value="1"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="loop_examples" compatibility="9.0.002" expanded="true" height="82" name="Loop Examples" width="90" x="380" y="34">
    <process expanded="true">
    <operator activated="true" class="branch" compatibility="9.0.002" expanded="true" height="82" name="Branch" width="90" x="380" y="34">
    <parameter key="condition_type" value="macro_defined"/>
    <parameter key="condition_value" value="eval(%{example}) == 1"/>
    <process expanded="true">
    <connect from_port="condition" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    <process expanded="true">
    <operator activated="true" class="generate_macro" compatibility="9.0.002" expanded="true" height="82" name="Generate Macro" width="90" x="112" y="34">
    <list key="function_descriptions">
    <parameter key="previous" value="(eval(%{previous}) * 9 + 1.2) / 10"/>
    </list>
    </operator>
    <operator activated="true" class="set_data" compatibility="9.0.002" expanded="true" height="82" name="Set Data" width="90" x="313" y="34">
    <parameter key="example_index" value="%{example}"/>
    <parameter key="attribute_name" value="att2"/>
    <parameter key="value" value="%{previous}"/>
    <list key="additional_values"/>
    </operator>
    <connect from_port="condition" to_op="Generate Macro" to_port="through 1"/>
    <connect from_op="Generate Macro" from_port="through 1" to_op="Set Data" to_port="example set input"/>
    <connect from_op="Set Data" from_port="example set output" to_port="input 1"/>
    <portSpacing port="source_condition" spacing="0"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_input 1" spacing="0"/>
    <portSpacing port="sink_input 2" spacing="0"/>
    </process>
    </operator>
    <connect from_port="example set" to_op="Branch" to_port="condition"/>
    <connect from_op="Branch" from_port="input 1" to_port="example set"/>
    <portSpacing port="source_example set" spacing="0"/>
    <portSpacing port="sink_example set" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Extract Macro" from_port="example set" to_op="Loop Examples" to_port="example set"/>
    <connect from_op="Loop Examples" from_port="example set" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Values depend on the first value (row 1) of att2, the others are actually replaced. I hope it helps!

     

    Best regards,

    Sebastian

Sign In or Register to comment.