[Solved] Calculating the deltas between following examples

qwertzqwertz Member Posts: 130 Contributor II
edited November 2018 in Help
Dear all,

I have a data set like this

i=id  att1    att2
1     5       1
2     8       4
3     3       3
4     4       7

Now I would like to transform this into a new example set by applying the following rule:
Subtract example i of attribute x from example i+1 of the same attribute (e.g. "8-5").
Even better would be a custom formula that calculates the percentage change between two consecutive examples (e.g. "(8-5)/5*100").


I tried the "distance transformation" operator of the series extension for Rapidminer. However, it only provides absolute values, so it remains unclear whether the delta is positive or negative. Moreover, this operator additionally requires a transformation from data to series and back.

Another way I could think of is to use the "windowing" operator to generate additional attributes shifted by one example. Then one could apply the "generate attributes" operator for the calculation. However, so far I haven't been able to figure out a working process.
This matters especially because I have to run it with different attributes all the time, so automated handling of the attribute names would be highly appreciated.


Search tags "delta" and "distance" revealed no useful results.



Looking forward to hearing from you
Sachs

Answers

  • haddockhaddock Member Posts: 849 Maven
    Hi,
    Loop through the examples using macros.
    Best H
  • swissrussswissruss RapidMiner Certified Expert, Member Posts: 11 Contributor II
    Hi Sachs,

    I think you're on the right track with the series/windowing operators. The ones you're looking for are "Lag Series" (which gives the previous value) and "Differentiate" (which gives the signed difference between consecutive values). Then all you need to do is generate the % based on these two values. Since both operators take an attribute name as a parameter, you need to wrap them in a Loop Attributes operator to repeat the steps for multiple attributes in an example set. I'll attach examples for single and multiple attributes using the Iris dataset (nonsense values of course), which you should be able to adapt - as soon as I've worked out how!

    Cheers,

    Russ
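
For readers who want to check the arithmetic outside RapidMiner, here is a minimal Python sketch of the same idea: take the previous value (the lag), the signed difference, and the percentage change, and repeat that for every attribute. It is only an illustration under stated assumptions, not the RapidMiner implementation. The function name `deltas` and the output names `change_...` and `pcchange_...` are made up for this example, and the data is the small table from the question.

```python
from typing import Dict, List, Optional, Tuple

Column = List[Optional[float]]


def deltas(values: List[float]) -> Tuple[Column, Column]:
    """Return (signed difference, percentage change) versus the previous row.

    The first row has no predecessor, so both results are None there.
    A previous value of 0 also gives None for the percentage change.
    """
    previous: Column = [None] + list(values[:-1])  # lag by one row
    diff: Column = []
    pct: Column = []
    for value, prev in zip(values, previous):
        if prev is None:
            diff.append(None)
            pct.append(None)
        else:
            diff.append(value - prev)
            pct.append(None if prev == 0 else (value - prev) / prev * 100)
    return diff, pct


# Sample data from the question (hypothetical in-memory layout).
data: Dict[str, List[float]] = {"att1": [5, 8, 3, 4], "att2": [1, 4, 3, 7]}

# Loop over all attributes so the names are handled automatically.
result: Dict[str, Column] = {}
for name, column in data.items():
    diff, pct = deltas(column)
    result["change_" + name] = diff
    result["pcchange_" + name] = pct

for name, column in result.items():
    print(name, column)
```

For att1 the second row works out to 8-5 = 3 and (8-5)/5*100 = 60, matching the example in the question.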
  • swissrussswissruss RapidMiner Certified Expert, Member Posts: 11 Contributor II
    Hi Sachs,

    Looks like attachments aren't possible (really?!), so here's the XML. Just cut it out, save and import.

    Single Attribute:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true" height="521" width="955">
          <operator activated="true" class="retrieve" compatibility="5.2.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="id|a1|"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="series:differentiate_example_set" compatibility="5.2.000" expanded="true" height="76" name="Differentiate" width="90" x="313" y="75">
            <parameter key="attribute_name" value="a1"/>
            <parameter key="change_mode" value="difference"/>
            <parameter key="lag" value="1"/>
            <parameter key="keep_original_attribute" value="true"/>
          </operator>
          <operator activated="true" class="series:lag_series" compatibility="5.2.000" expanded="true" height="76" name="Lag Series" width="90" x="447" y="75">
            <list key="attributes">
              <parameter key="a1" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename" width="90" x="581" y="75">
            <parameter key="old_name" value="change(a1)"/>
            <parameter key="new_name" value="change_a1"/>
            <list key="rename_additional_attributes">
              <parameter key="a1-1" value="lag_a1"/>
            </list>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.006" expanded="true" height="76" name="Generate Attributes" width="90" x="715" y="75">
            <list key="function_descriptions">
              <parameter key="pcchange(a1)" value="change_a1/lag_a1"/>
            </list>
            <parameter key="use_standard_constants" value="true"/>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (2)" width="90" x="849" y="75">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="change_a1"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Differentiate" to_port="example set input"/>
          <connect from_op="Differentiate" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
          <connect from_op="Lag Series" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Multiple Attributes:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true" height="521" width="882">
          <operator activated="true" class="retrieve" compatibility="5.2.006" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="|id|a2|a1"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="loop_attributes" compatibility="5.2.006" expanded="true" height="60" name="Loop Attributes" width="90" x="380" y="75">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="|a2|a1"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="iteration_macro" value="loop_attribute"/>
            <process expanded="true" height="575" width="992">
              <operator activated="true" class="series:differentiate_example_set" compatibility="5.2.000" expanded="true" height="76" name="Differentiate" width="90" x="45" y="30">
                <parameter key="attribute_name" value="%{loop_attribute}"/>
                <parameter key="change_mode" value="difference"/>
                <parameter key="lag" value="1"/>
                <parameter key="keep_original_attribute" value="true"/>
              </operator>
              <operator activated="true" class="series:lag_series" compatibility="5.2.000" expanded="true" height="76" name="Lag Series" width="90" x="179" y="75">
                <list key="attributes">
                  <parameter key="%{loop_attribute}" value="1"/>
                </list>
              </operator>
              <operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename" width="90" x="380" y="120">
                <parameter key="old_name" value="change(%{loop_attribute})"/>
                <parameter key="new_name" value="change_%{loop_attribute}"/>
                <list key="rename_additional_attributes">
                  <parameter key="%{loop_attribute}-1" value="lag_%{loop_attribute}"/>
                </list>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="5.2.006" expanded="true" height="76" name="Generate Attributes" width="90" x="514" y="120">
                <list key="function_descriptions">
                  <parameter key="pcchange(%{loop_attribute})" value="change_%{loop_attribute}/lag_%{loop_attribute}"/>
                </list>
                <parameter key="use_standard_constants" value="true"/>
                <parameter key="keep_all" value="true"/>
              </operator>
              <operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes (2)" width="90" x="715" y="120">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attribute" value="change_%{loop_attribute}"/>
                <parameter key="attributes" value="|change_%{loop_attribute}|lag_%{loop_attribute}"/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="true"/>
                <parameter key="include_special_attributes" value="false"/>
              </operator>
              <connect from_port="example set" to_op="Differentiate" to_port="example set input"/>
              <connect from_op="Differentiate" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
              <connect from_op="Lag Series" from_port="example set output" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
              <connect from_op="Select Attributes (2)" from_port="example set output" to_port="example set"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Loop Attributes" to_port="example set"/>
          <connect from_op="Loop Attributes" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    HTH,

    Russ
  • qwertzqwertz Member Posts: 130 Contributor II


    Hi there!

    That really worked out :) Thank you!

    I would never have expected the function "difference" under an operator called "differentiate".
    Isn't that something completely different?
    Anyway, glad to have this operator being part of Rapidminer :)


    @haddock: Just to get the idea behind your approach: Do you mean something like the attached code? While I loop through the examples, I store the last one in a macro to do the calculation before iterating to the next example.
    Observation 1: The very first calculated value is wrong because I need to initialize the macro. Of course, this could be filtered / corrected later after the loop.
    Observation 2: It is not possible to use the "generate attributes" operator in the loop because that way it would overwrite the new attribute all the time and in the end it would read the same value in all lines.
    That's probably not surprising to the more experienced user but I wanted to share what I came across on my learning curve.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="161" width="547">
          <operator activated="true" class="generate_data" compatibility="5.2.003" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30"/>
          <operator activated="true" class="set_macro" compatibility="5.2.003" expanded="true" height="76" name="Set Macro" width="90" x="179" y="30">
            <parameter key="macro" value="last"/>
            <parameter key="value" value="1"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.003" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="30">
            <list key="function_descriptions">
              <parameter key="new" value="0"/>
            </list>
          </operator>
          <operator activated="true" class="loop_examples" compatibility="5.2.003" expanded="true" height="76" name="Loop Examples" width="90" x="447" y="30">
            <process expanded="true" height="512" width="640">
              <operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro (2)" width="90" x="45" y="30">
                <parameter key="macro" value="current"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="att1"/>
                <parameter key="example_index" value="%{example}"/>
              </operator>
              <operator activated="true" class="generate_macro" compatibility="5.2.003" expanded="true" height="76" name="Generate Macro" width="90" x="179" y="30">
                <list key="function_descriptions">
                  <parameter key="result" value="%{current}/%{last}"/>
                </list>
              </operator>
              <operator activated="true" class="set_data" compatibility="5.2.003" expanded="true" height="76" name="Set Data" width="90" x="313" y="30">
                <parameter key="example_index" value="%{example}"/>
                <parameter key="attribute_name" value="new"/>
                <parameter key="value" value="%{result}"/>
                <list key="additional_values"/>
              </operator>
              <operator activated="true" class="extract_macro" compatibility="5.2.003" expanded="true" height="60" name="Extract Macro" width="90" x="447" y="30">
                <parameter key="macro" value="last"/>
                <parameter key="macro_type" value="data_value"/>
                <parameter key="attribute_name" value="att1"/>
                <parameter key="example_index" value="%{example}"/>
              </operator>
              <connect from_port="example set" to_op="Extract Macro (2)" to_port="example set"/>
              <connect from_op="Extract Macro (2)" from_port="example set" to_op="Generate Macro" to_port="through 1"/>
              <connect from_op="Generate Macro" from_port="through 1" to_op="Set Data" to_port="example set input"/>
              <connect from_op="Set Data" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
              <connect from_op="Extract Macro" from_port="example set" to_port="example set"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Set Macro" to_port="through 1"/>
          <connect from_op="Set Macro" from_port="through 1" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Loop Examples" to_port="example set"/>
          <connect from_op="Loop Examples" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
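
The macro-based loop and the two observations can be mirrored in a few lines of plain Python. This is a rough sketch, not a translation of the process: the function name `ratio_to_previous` is hypothetical, the variable `last` plays the role of the "last" macro, and the result list stands in for writing one cell per row with Set Data. Instead of initializing `last` with a dummy 1, the sketch starts with no previous value, so the first row is marked as missing rather than wrong.

```python
from typing import List, Optional


def ratio_to_previous(values: List[float]) -> List[Optional[float]]:
    """Compute current/last row by row, as in the macro loop."""
    last: Optional[float] = None  # no previous example before the first row
    out: List[Optional[float]] = []
    for current in values:
        if last is None or last == 0:
            # First row (or division by zero): no meaningful ratio.
            out.append(None)
        else:
            # One value per row, so earlier rows are never overwritten.
            out.append(current / last)
        last = current  # remember this row for the next iteration
    return out


print(ratio_to_previous([5, 8, 3, 4]))
```

Appending one result per row corresponds to Observation 2: the value is stored for the current example only, instead of regenerating the whole attribute on every iteration.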
    PS: Indeed, there is no upload function - at least not to my knowledge as I was looking for it also.



    Thank you all
    Sachs
  • swissrussswissruss RapidMiner Certified Expert, Member Posts: 11 Contributor II
    No problem! I guess differentiate is usually associated with the meaning assigned to it in calculus, but I'm not a series specialist, so maybe it's the correct expression in that context!

    Glad it helped

    Russ