compare values in current and previous row

sebisawasebisawa Member Posts: 6 Contributor II
edited November 2018 in Help
Hi


I want to know intervals of purchases. For example, I have records like

- user01,2012/01/01
- user02,2012/01/02
- user01,2012/01/04
- user02,2012/01/06

How I can add "days since the last purchase"?  like

- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days


Answers

  • frasfras Member Posts: 93 Contributor II
    Hi,

    Operator "Generate Attribute" is useful. There are many ways to add new attributes.
    Especially you can manipulate date attributes with "date_add", "date_diff", "date_before", etc.
    Here is an example that adds 5 days to each date:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_sales_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Sales Data" width="90" x="45" y="75"/>
          <operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="date"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="75">
            <list key="function_descriptions">
              <parameter key="date02" value="date_add(date, 5, DATE_UNIT_DAY)"/>
            </list>
          </operator>
          <connect from_op="Generate Sales Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • sebisawasebisawa Member Posts: 6 Contributor II
    Hi,


    Thanks for your reply. What I expected is not how to calculate dates, but retrieve values from previous records and compare with the current ones.
    Isn't it possible?
  • frasfras Member Posts: 93 Contributor II
    Perhaps you should clarify our example. What do you mean by "X" ?
  • sebisawasebisawa Member Posts: 6 Contributor II
    Sorry for confusion.

    - user01,2012/01/01,X
    - user02,2012/01/02,X
    - user01,2012/01/04,3days
    - user02,2012/01/06,4days

    What I would like to know is, how many days passed since the last purchases.
    Line 1 is the first record for user01, so we can't calculate the duration. I meant "X" as null.
    Line 2 is the same as Line 1.
    Line 3, it is the second purchase of user 01, the diff of 2012/01/01 and 2012/01/04 is "3 days".
    Line 4, it is the second purchase of user 02, the diff of 2012/01/02 and 2012/01/06 is "4 days".

    Does it make sense?

  • awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Here's an example that uses the Lag operator that you could try.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="5.3.013" expanded="true" height="76" name="Subprocess" width="90" x="112" y="75">
            <process expanded="true">
              <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="111" y="30">
                <list key="attribute_values">
                  <parameter key="user" value="&quot;user1&quot;"/>
                  <parameter key="date" value="&quot;2013-08-01&quot;"/>
                </list>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="112" y="120">
                <list key="attribute_values">
                  <parameter key="user" value="&quot;user1&quot;"/>
                  <parameter key="date" value="&quot;2013-08-08&quot;"/>
                </list>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="112" y="210">
                <list key="attribute_values">
                  <parameter key="user" value="&quot;user2&quot;"/>
                  <parameter key="date" value="&quot;2013-08-01&quot;"/>
                </list>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="generate_data_user_specification" compatibility="5.3.013" expanded="true" height="60" name="Generate Data by User Specification (4)" width="90" x="112" y="300">
                <list key="attribute_values">
                  <parameter key="user" value="&quot;user2&quot;"/>
                  <parameter key="date" value="&quot;2013-08-06&quot;"/>
                </list>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="append" compatibility="5.3.013" expanded="true" height="130" name="Append" width="90" x="380" y="75"/>
              <operator activated="true" class="nominal_to_date" compatibility="5.3.013" expanded="true" height="76" name="Nominal to Date" width="90" x="581" y="75">
                <parameter key="attribute_name" value="date"/>
                <parameter key="date_format" value="yyyy-MM-dd"/>
              </operator>
              <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
              <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
              <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
              <connect from_op="Generate Data by User Specification (4)" from_port="output" to_op="Append" to_port="example set 4"/>
              <connect from_op="Append" from_port="merged set" to_op="Nominal to Date" to_port="example set input"/>
              <connect from_op="Nominal to Date" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="loop_values" compatibility="5.3.013" expanded="true" height="76" name="Loop Values" width="90" x="112" y="165">
            <parameter key="attribute" value="user"/>
            <process expanded="true">
              <operator activated="true" class="filter_examples" compatibility="5.3.013" expanded="true" height="76" name="Filter Examples" width="90" x="179" y="75">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="user=%{loop_value}"/>
              </operator>
              <operator activated="true" class="series:lag_series" compatibility="5.3.000" expanded="true" height="76" name="Lag Series" width="90" x="380" y="75">
                <list key="attributes">
                  <parameter key="date" value="1"/>
                </list>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
              <connect from_op="Lag Series" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="5.3.013" expanded="true" height="76" name="Append (2)" width="90" x="112" y="255"/>
          <operator activated="true" class="rename" compatibility="5.3.013" expanded="true" height="76" name="Rename" width="90" x="246" y="75">
            <parameter key="old_name" value="date-1"/>
            <parameter key="new_name" value="date1"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="246" y="165">
            <list key="function_descriptions">
              <parameter key="diff" value="date_diff(date1,date)/1000/3600/24"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.3.013" expanded="true" height="76" name="Select Attributes" width="90" x="246" y="255">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|user|diff"/>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_op="Append (2)" to_port="example set 1"/>
          <connect from_op="Append (2)" from_port="merged set" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    regards

    Andrew
  • sebisawasebisawa Member Posts: 6 Contributor II
    Hi Andrew


    Thanks for your post, however I can't execute it with RapidMiner 5.3.
    I could not find the operator named "Lag Series". Is this why?

    Aug 13, 2013 7:49:50 PM SEVERE: Process failed: The dummy operator Lag Series (replacing series:lag_series) cannot be executed.
  • wesselwessel Member Posts: 537 Maven
    The lag operator requires time series plugin.

    Maybe you want to experiment with the Script operator.
    Put this inside your script operator and find out what happens:



    ExampleSet es = operator.getInput(ExampleSet.class);

    es.recalculateAllAttributeStatistics();

    for (Attribute a : es.getAttributes()) {
        double mean = es.getStatistics(a, Statistics.AVERAGE);
        String name = a.getName();
        for (Example example : es) {
            example[name] = example[name] - mean;
        }
    }

    double last = 0;
    for (Example e : es) {
        e["a"] = e["id"] + last;
        last = e["id"];
    }

    int size = es.size()
    for (int i = 1; i < size; i++) {
    Example e1 = es.getExample(i-1);
    Example e0 = es.getExample(i);
    e0["b"] = e0["id"] + e1["id"];
    }


    return es;
  • wesselwessel Member Posts: 537 Maven
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="generate_data" compatibility="5.3.013" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="driller oscillation timeseries"/>
            <parameter key="number_of_attributes" value="2"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="5.3.013" expanded="true" height="76" name="Generate ID" width="90" x="180" y="30"/>
          <operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="315" y="30">
            <list key="function_descriptions">
              <parameter key="a" value="&quot;z&quot;"/>
              <parameter key="b" value="&quot;z&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="execute_script" compatibility="5.3.013" expanded="true" height="76" name="Execute Script" width="90" x="450" y="29">
            <parameter key="script" value="&#10;ExampleSet es = operator.getInput(ExampleSet.class);&#10;&#10;es.recalculateAllAttributeStatistics();&#10;&#10;for (Attribute a : es.getAttributes()) {&#10;    double mean = es.getStatistics(a, Statistics.AVERAGE);&#10;    String name = a.getName();&#10;    for (Example example : es) {&#10;        example[name] = example[name] - mean;&#10;    }&#10;}&#10;&#10;double last = 0;&#10;for (Example e : es) {&#10;    e[&quot;a&quot;] = e[&quot;id&quot;] + last;&#10;    last = e[&quot;id&quot;];&#10;}&#10;&#10;int size = es.size()&#10;for (int i = 1; i &lt; size; i++) {&#10;&#9;Example e1 = es.getExample(i-1);&#10;&#9;Example e0 = es.getExample(i);&#10;&#9;e0[&quot;b&quot;] = e0[&quot;id&quot;] + e1[&quot;id&quot;];&#9;&#9;&#10;}&#10;&#10;&#10;return es;"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • sebisawasebisawa Member Posts: 6 Contributor II
    Hi wessel

    Thanks for your reply. "Execute Script" seems very helpful.
    Where is the document regarding to the scripting?
  • awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    The time series extension is what you need for the Lag operator. 
  • sebisawasebisawa Member Posts: 6 Contributor II
    Hi Andrew and Wessel

    Thanks for your kindly support.

    I could run the process from Andrew by downloading Series Extension from the marketplace.
    However it takes long time to finish processing. I assume it is because the examples must
    be filtered as much as users exists, it is not efficient. So I use this operator without "loop values"
    operator.

    Script one is fantastic and someday I would love to try.

    Best regards
Sign In or Register to comment.