RapidMiner

RapidMiner

compare values in current and previous row

Contributor II

compare values in current and previous row

Hi


I want to know intervals of purchases. For example, I have records like

- user01,2012/01/01
- user02,2012/01/02
- user01,2012/01/04
- user02,2012/01/06

How I can add "days since the last purchase"?  like

- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days


11 REPLIES
Regular Contributor

Re: compare values in current and previous row

Hi,

Operator "Generate Attribute" is useful. There are many ways to add new attributes.
Especially you can manipulate date attributes with "date_add", "date_diff", "date_before", etc.
Here is an example that adds 5 days to each date:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_sales_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Sales Data" width="90" x="45" y="75"/>
      <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="date"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="75">
        <list key="function_descriptions">
          <parameter key="date02" value="date_add(date, 5, DATE_UNIT_DAY)"/>
        </list>
      </operator>
      <connect from_op="Generate Sales Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
      <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Contributor II

Re: compare values in current and previous row

Hi,


Thanks for your reply. What I expected is not how to calculate dates, but retrieve values from previous records and compare with the current ones.
Isn't it possible?
Regular Contributor

Re: compare values in current and previous row

Perhaps you should clarify our example. What do you mean by "X" ?
Contributor II

Re: compare values in current and previous row

Sorry for confusion.

- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days

What I would like to know is, how many days passed since the last purchases.
Line 1 is the first record for user01, so we can't calculate the duration. I meant "X" as null.
Line 2 is the same as Line 1.
Line 3, it is the second purchase of user 01, the diff of 2012/01/01 and 2012/01/04 is "3 days".
Line 4, it is the second purchase of user 02, the diff of 2012/01/02 and 2012/01/06 is "4 days".

Does it make sense?

Super Contributor

Re: compare values in current and previous row

Hello

Here's an example that uses the Lag operator that you could try.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="subprocess" compatibility="5.3.013" expanded="true" height="76" name="Subprocess" width="90" x="112" y="75">
        <process expanded="true">
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="111" y="30">
            <list key="attribute_values">
              <parameter key="user" value="&quot;user1&quot;"/>
              <parameter key="date" value="&quot;2013-08-01&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="112" y="120">
            <list key="attribute_values">
              <parameter key="user" value="&quot;user1&quot;"/>
              <parameter key="date" value="&quot;2013-08-08&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="112" y="210">
            <list key="attribute_values">
              <parameter key="user" value="&quot;user2&quot;"/>
              <parameter key="date" value="&quot;2013-08-01&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (4)" width="90" x="112" y="300">
            <list key="attribute_values">
              <parameter key="user" value="&quot;user2&quot;"/>
              <parameter key="date" value="&quot;2013-08-06&quot;"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="append" compatibility="5.3.013" expanded="true" height="130" name="Append" width="90" x="380" y="75"/>
          <operator activated="true" class="nominal_to_date" compatibility="5.3.013" expanded="true" height="76" name="Nominal to Date" width="90" x="581" y="75">
            <parameter key="attribute_name" value="date"/>
            <parameter key="date_format" value="yyyy-MM-dd"/>
          </operator>
          <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
          <connect from_op="Generate Data by User Specification (4)" from_port="output" to_op="Append" to_port="example set 4"/>
          <connect from_op="Append" from_port="merged set" to_op="Nominal to Date" to_port="example set input"/>
          <connect from_op="Nominal to Date" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="loop_values" compatibility="5.3.013" expanded="true" height="76" name="Loop Values" width="90" x="112" y="165">
        <parameter key="attribute" value="user"/>
        <process expanded="true">
          <operator activated="true" class="filter_examples" compatibility="5.3.013" expanded="true" height="76" name="Filter Examples" width="90" x="179" y="75">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="user=%{loop_value}"/>
          </operator>
          <operator activated="true" class="series:lag_series" compatibility="5.3.000" expanded="true" height="76" name="Lag Series" width="90" x="380" y="75">
            <list key="attributes">
              <parameter key="date" value="1"/>
            </list>
          </operator>
          <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
          <connect from_op="Lag Series" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="append" compatibility="5.3.013" expanded="true" height="76" name="Append (2)" width="90" x="112" y="255"/>
      <operator activated="true" class="rename" compatibility="5.3.013" expanded="true" height="76" name="Rename" width="90" x="246" y="75">
        <parameter key="old_name" value="date-1"/>
        <parameter key="new_name" value="date1"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="165">
        <list key="function_descriptions">
          <parameter key="diff" value="date_diff(date1,date)/1000/3600/24"/>
        </list>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.3.013" expanded="true" height="76" name="Select Attributes" width="90" x="246" y="255">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="|user|diff"/>
      </operator>
      <connect from_op="Subprocess" from_port="out 1" to_op="Loop Values" to_port="example set"/>
      <connect from_op="Loop Values" from_port="out 1" to_op="Append (2)" to_port="example set 1"/>
      <connect from_op="Append (2)" from_port="merged set" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>


regards

Andrew
Contributor II

Re: compare values in current and previous row

Hi Andrew


Thanks for your post, however I can't execute it with RapidMiner 5.3.
I could not find the operator named "Lag Series". Is this why?

Aug 13, 2013 7:49:50 PM SEVERE: Process failed: The dummy operator Lag Series (replacing series:lag_series) cannot be executed.
Regular Contributor

Re: compare values in current and previous row

The lag operator requires time series plugin.

Maybe you want to experiment with the Script operator.
Put this inside your script operator and find out what happens:



ExampleSet es = operator.getInput(ExampleSet.class);

es.recalculateAllAttributeStatistics();

for (Attribute a : es.getAttributes()) {
    double mean = es.getStatistics(a, Statistics.AVERAGE);
    String name = a.getName();
    for (Example example : es) {
        example[name] = example[name] - mean;
    }
}

double last = 0;
for (Example e : es) {
    e["a"] = e["id"] + last;
    last = e["id"];
}

int size = es.size()
for (int i = 1; i < size; i++) {
Example e1 = es.getExample(i-1);
Example e0 = es.getExample(i);
e0["b"] = e0["id"] + e1["id"];
}


return es;
Regular Contributor

Re: compare values in current and previous row

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" breakpoints="after" class="generate_data" compatibility="5.3.013" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="target_function" value="driller oscillation timeseries"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="5.3.013" expanded="true" height="76" name="Generate ID" width="90" x="180" y="30"/>
      <operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="315" y="30">
        <list key="function_descriptions">
          <parameter key="a" value="&quot;z&quot;"/>
          <parameter key="b" value="&quot;z&quot;"/>
        </list>
      </operator>
      <operator activated="true" class="execute_script" compatibility="5.3.013" expanded="true" height="76" name="Execute Script" width="90" x="450" y="29">
        <parameter key="script" value="&#10;ExampleSet es = operator.getInput(ExampleSet.class);&#10;&#10;es.recalculateAllAttributeStatistics();&#10;&#10;for (Attribute a : es.getAttributes()) {&#10;    double mean = es.getStatistics(a, Statistics.AVERAGE);&#10;    String name = a.getName();&#10;    for (Example example : es) {&#10;        example[name] = example[name] - mean;&#10;    }&#10;}&#10;&#10;double last = 0;&#10;for (Example e : es) {&#10;    e[&quot;a&quot;] = e[&quot;id&quot;] + last;&#10;    last = e[&quot;id&quot;];&#10;}&#10;&#10;int size = es.size()&#10;for (int i = 1; i &lt; size; i++) {&#10;&#9;Example e1 = es.getExample(i-1);&#10;&#9;Example e0 = es.getExample(i);&#10;&#9;e0[&quot;b&quot;] = e0[&quot;id&quot;] + e1[&quot;id&quot;];&#9;&#9;&#10;}&#10;&#10;&#10;return es;"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Execute Script" to_port="input 1"/>
      <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Contributor II

Re: compare values in current and previous row

Hi wessel

Thanks for your reply. "Execute Script" seems very helpful.
Where is the document regarding to the scripting?