compare values in current and previous row
Hi
I want to know the intervals between purchases. For example, I have records like:
- user01,2012/01/01
- user02,2012/01/02
- user01,2012/01/04
- user02,2012/01/06
How can I add "days since the last purchase", like this?
- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days
Answers
The "Generate Attributes" operator is useful here; it offers many ways to add new attributes.
In particular, you can manipulate date attributes with functions such as "date_add", "date_diff", and "date_before".
Here is an example that adds 5 days to each date:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="generate_sales_data" compatibility="5.3.008" expanded="true" height="60" name="Generate Sales Data" width="90" x="45" y="75"/>
<operator activated="true" class="select_attributes" compatibility="5.3.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="75">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="date"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="5.3.008" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="313" y="75">
<list key="function_descriptions">
<parameter key="date02" value="date_add(date, 5, DATE_UNIT_DAY)"/>
</list>
</operator>
<connect from_op="Generate Sales Data" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
<connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
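For readers without RapidMiner at hand, the same date arithmetic can be sketched in plain Python. This is only an illustration using the standard `datetime` module; the variable names are made up for the example and nothing here is RapidMiner API.

```python
from datetime import date, timedelta

purchase = date(2012, 1, 1)

# Rough counterpart of date_add(date, 5, DATE_UNIT_DAY): shift a date by five days.
shifted = purchase + timedelta(days=5)

# Subtracting two dates yields a timedelta; .days is the whole-day difference.
gap_days = (date(2012, 1, 4) - purchase).days

print(shifted.isoformat(), gap_days)
```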
Thanks for your reply. What I am looking for is not how to calculate with dates, but how to retrieve values from previous records and compare them with the current ones.
Is that possible?
- user01,2012/01/01,X
- user02,2012/01/02,X
- user01,2012/01/04,3days
- user02,2012/01/06,4days
What I would like to know is how many days have passed since each user's last purchase.
Line 1 is the first record for user01, so the duration can't be calculated. By "X" I meant null.
Line 2 is the same case for user02.
Line 3 is the second purchase of user01; the difference between 2012/01/01 and 2012/01/04 is "3 days".
Line 4 is the second purchase of user02; the difference between 2012/01/02 and 2012/01/06 is "4 days".
Does it make sense?
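The rule described in this post can be sketched outside RapidMiner as a single pass that remembers each user's most recent purchase date. This is a minimal Python illustration, not a RapidMiner process; the function name `days_since_last_purchase` is hypothetical, and it assumes the records are already ordered by date.

```python
from datetime import date


def days_since_last_purchase(records):
    """Return (user, day, gap) rows; gap is None for a user's first purchase.

    Assumes `records` is already ordered by purchase date.
    """
    last_seen = {}  # user -> date of that user's most recent purchase so far
    rows = []
    for user, day in records:
        previous = last_seen.get(user)
        gap = (day - previous).days if previous is not None else None
        rows.append((user, day, gap))
        last_seen[user] = day
    return rows


records = [
    ("user01", date(2012, 1, 1)),
    ("user02", date(2012, 1, 2)),
    ("user01", date(2012, 1, 4)),
    ("user02", date(2012, 1, 6)),
]

for user, day, gap in days_since_last_purchase(records):
    print(user, day.isoformat(), gap)
```

The `None` entries play the role of the "X" (null) values in the example data.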
Here's an example that uses the Lag operator that you could try.
Regards
Andrew
Thanks for your post; however, I can't execute it with RapidMiner 5.3.
I could not find an operator named "Lag Series". Is that the reason?
Aug 13, 2013 7:49:50 PM SEVERE: Process failed: The dummy operator Lag Series (replacing series:lag_series) cannot be executed.
Maybe you want to experiment with the Script operator.
Put this inside your script operator and find out what happens:
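// Fetch the example set connected to the operator's first input port.
ExampleSet es = operator.getInput(ExampleSet.class);
es.recalculateAllAttributeStatistics();

// Center each regular attribute by subtracting its mean.
for (Attribute a : es.getAttributes()) {
    double mean = es.getStatistics(a, Statistics.AVERAGE);
    String name = a.getName();
    for (Example example : es) {
        example[name] = example[name] - mean;
    }
}

// Variant 1: carry the previous row's id in a variable (0 for the first row).
double last = 0;
for (Example e : es) {
    e["a"] = e["id"] + last;
    last = e["id"];
}

// Variant 2: read the previous row by index; the first row is left untouched.
int size = es.size();
for (int i = 1; i < size; i++) {
    Example e1 = es.getExample(i-1);  // previous row
    Example e0 = es.getExample(i);    // current row
    e0["b"] = e0["id"] + e1["id"];
}

return es;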
<process version="5.3.013">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="generate_data" compatibility="5.3.013" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="target_function" value="driller oscillation timeseries"/>
<parameter key="number_of_attributes" value="2"/>
</operator>
<operator activated="true" class="generate_id" compatibility="5.3.013" expanded="true" height="76" name="Generate ID" width="90" x="180" y="30"/>
<operator activated="true" class="generate_attributes" compatibility="5.3.013" expanded="true" height="76" name="Generate Attributes" width="90" x="315" y="30">
<list key="function_descriptions">
<parameter key="a" value="&quot;z&quot;"/>
<parameter key="b" value="&quot;z&quot;"/>
</list>
</operator>
<operator activated="true" class="execute_script" compatibility="5.3.013" expanded="true" height="76" name="Execute Script" width="90" x="450" y="29">
<parameter key="script" value="ExampleSet es = operator.getInput(ExampleSet.class);&#10;es.recalculateAllAttributeStatistics();&#10;for (Attribute a : es.getAttributes()) {&#10;double mean = es.getStatistics(a, Statistics.AVERAGE);&#10;String name = a.getName();&#10;for (Example example : es) {&#10;example[name] = example[name] - mean;&#10;}&#10;}&#10;double last = 0;&#10;for (Example e : es) {&#10;e[&quot;a&quot;] = e[&quot;id&quot;] + last;&#10;last = e[&quot;id&quot;];&#10;}&#10;int size = es.size();&#10;for (int i = 1; i &lt; size; i++) {&#10;Example e1 = es.getExample(i-1);&#10;Example e0 = es.getExample(i);&#10;e0[&quot;b&quot;] = e0[&quot;id&quot;] + e1[&quot;id&quot;];&#10;}&#10;return es;"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Execute Script" to_port="input 1"/>
<connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
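The two "previous row" loops in the script above can be illustrated in plain Python. This is only a sketch of the pattern: the `ids` list is made-up sample data, the mean-centering step is left out, and `None` stands in for the first row that the second loop never touches.

```python
ids = [1.0, 2.0, 3.0, 4.0]  # hypothetical id values, one per row

# Pattern 1: carry the previous value in a variable, starting from 0.
a = []
last = 0.0
for value in ids:
    a.append(value + last)
    last = value

# Pattern 2: index the previous row directly. Row 0 has no predecessor,
# so it is left as None here (the script leaves its existing value in place).
b = [None] * len(ids)
for i in range(1, len(ids)):
    b[i] = ids[i] + ids[i - 1]

print(a)
print(b)
```

Both patterns agree from the second row onward; they differ only in how the first row is handled.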
Thanks for your reply. "Execute Script" seems very helpful.
Where can I find the documentation on scripting?
Thanks for your kind support.
I was able to run Andrew's process after downloading the Series Extension from the Marketplace.
However, it takes a long time to finish. I assume this is because the examples must be filtered
once per user, which is not efficient. So I now use the Lag Series operator without the
"Loop Values" operator.
The script approach looks fantastic, and I would love to try it someday.
Best regards
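One way to get per-user intervals from a single lag, without looping over users, is to sort by user and date first and then compare each row only with the row directly before it. The Python sketch below illustrates that idea; it is an assumption that this mirrors the process used here, and the function name `intervals_by_sorted_lag` is hypothetical.

```python
from datetime import date


def intervals_by_sorted_lag(records):
    """Sort by (user, date), then compare each row with the row before it."""
    ordered = sorted(records)  # tuples sort by user first, then by date
    result = []
    for i, (user, day) in enumerate(ordered):
        # "Lagged" values: the previous row, or nothing for the very first row.
        prev_user, prev_day = ordered[i - 1] if i > 0 else (None, None)
        # A gap only makes sense when the previous row belongs to the same user.
        gap = (day - prev_day).days if prev_user == user else None
        result.append((user, day, gap))
    return result


records = [
    ("user01", date(2012, 1, 1)),
    ("user02", date(2012, 1, 2)),
    ("user01", date(2012, 1, 4)),
    ("user02", date(2012, 1, 6)),
]

for user, day, gap in intervals_by_sorted_lag(records):
    print(user, day.isoformat(), gap)
```

The sort costs one pass over the data up front, after which every row is visited once, instead of filtering the full set once per user.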