Access next row data

bea11005 Member Posts: 20 Maven
edited December 2018 in Help

Hello.

I'm working on my final university project and I have some questions.

First, I'd like to know whether there is a way to access data in the next row.

To access the previous row's data I used the Lag Series operator, but I can't find a way to do the same for the next row.

 

My data is like this:

Discussion  Userid  Parent  Created  Modified
1           1       0       12       14
1           2       82      15       16
1           1       85      17       20
1           3       85      22       24
2           45      0       26       32
2           48      89      33       34
2           46      90      34       35

I want to calculate, for each userid, the difference modified(i+1)-created(i).

The attribute parent=0 means that the row is the first message in a discussion.

With that I want to calculate how much time passes between a message from a userid and the response to it.

For the first row I want: 1 1 0 (16-12)=4

How can I do that? Is there a way to know which row corresponds to the last message of a discussion? How can I mark the row that comes just before a row with parent=0?

Best Answer

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    You can generate temporary unique IDs with the Generate ID operator upstream and do the joins downstream. Afterwards, use Select Attributes with invert selection toggled on to remove the ID attribute again. I do this all the time.
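
A minimal sketch of the idea in plain Python, not a RapidMiner process: a temporary id is added, a reversed copy is lagged by one row, the result is joined back on the id, and the id is dropped again. The function name `next_modified_via_temp_id` and the tuple layout are assumptions made for illustration; the rows are the sample data from the question.

```python
# Sample rows from the question: (Discussion, Userid, Parent, Created, Modified)
ROWS = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]


def next_modified_via_temp_id(rows):
    """Attach the next row's Modified value using a temporary id and a join."""
    # Step 1: add a temporary unique id (hypothetical stand-in for Generate ID).
    with_id = [(i,) + row for i, row in enumerate(rows, start=1)]

    # Step 2: on a copy, reverse the order and lag Modified by one row.
    reversed_copy = sorted(with_id, key=lambda r: r[0], reverse=True)
    lagged = {}
    previous_modified = None
    for row in reversed_copy:
        lagged[row[0]] = previous_modified
        previous_modified = row[5]  # index 5 is Modified once the id is prepended

    # Step 3: join back on the id, then drop the id column again.
    return [row[1:] + (lagged[row[0]],) for row in with_id]


for row in next_modified_via_temp_id(ROWS):
    print(row)
```

This sketch ignores discussion boundaries, so the last row of discussion 1 picks up the Modified value of the first row of discussion 2; the final row gets `None` because nothing follows it.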

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    If you reverse the Sort order of your dataset then you should be able to use Lag again for this.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
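
To illustrate why reversing the sort order turns a lag into a "next row" lookup, here is a small plain-Python sketch. The helper names `lag` and `lead_via_reversed_lag` are made up for this example and are not RapidMiner operators.

```python
def lag(values, k=1):
    """Shift values down by k rows, padding the top with None."""
    k = min(k, len(values))
    return [None] * k + values[:len(values) - k]


def lead_via_reversed_lag(values, k=1):
    """Get next-row values by reversing, lagging, and reversing back."""
    return lag(values[::-1], k)[::-1]


modified = [14, 16, 20, 24]  # Modified column of discussion 1 from the question
print(lag(modified))                    # [None, 14, 16, 20]
print(lead_via_reversed_lag(modified))  # [16, 20, 24, None]
```

The second reversal restores the original row order, which matters if later steps depend on it.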
  • bea11005 Member Posts: 20 Maven

    I don't understand. Could you explain in more detail?

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hello @bea11005 - perhaps this will help.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Untitled 6" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//RapidMiner OneDrive/random community stuff/Untitled 6"/>
    </operator>
    <operator activated="true" class="numerical_to_polynominal" compatibility="7.6.001" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Discussion"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="concurrency:loop_values" compatibility="7.6.001" expanded="true" height="82" name="Loop Values" width="90" x="313" y="34">
    <parameter key="attribute" value="Discussion"/>
    <parameter key="iteration_macro" value="id"/>
    <process expanded="true">
    <operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="45" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="Discussion.equals.%{id}"/>
    </list>
    </operator>
    <operator activated="true" class="sort" compatibility="7.6.001" expanded="true" height="82" name="Sort" width="90" x="179" y="34">
    <parameter key="attribute_name" value="Discussion"/>
    </operator>
    <operator activated="true" class="handle_exception" compatibility="7.6.001" expanded="true" height="82" name="Handle Exception" width="90" x="313" y="34">
    <process expanded="true">
    <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="112" y="34">
    <list key="attributes">
    <parameter key="Created" value="1"/>
    </list>
    </operator>
    <connect from_port="in 1" to_op="Lag Series" to_port="example set input"/>
    <connect from_op="Lag Series" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <process expanded="true">
    <connect from_port="in 1" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
    <list key="function_descriptions">
    <parameter key="DIFFERENCE" value="Modified-[Created-1]"/>
    </list>
    </operator>
    <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Handle Exception" to_port="in 1"/>
    <connect from_op="Handle Exception" from_port="out 1" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve Untitled 6" from_port="output" to_op="Numerical to Polynominal" to_port="example set input"/>
    <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Loop Values" to_port="input 1"/>
    <connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Basically you need to Sort each discussion first, then Lag.  See my process.


    Scott
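
For readers without RapidMiner at hand, the "sort each discussion, then lag" idea can be sketched in plain Python as follows. It mirrors the DIFFERENCE expression in the process (Modified minus the lagged Created). Sorting by Created inside each discussion is an assumption of this sketch, and the function name is hypothetical.

```python
from itertools import groupby

# Sample rows from the question: (Discussion, Userid, Parent, Created, Modified)
ROWS = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]


def difference_per_discussion(rows):
    """For each discussion: sort, lag Created by one row, then subtract."""
    result = []
    # Assumption: messages are ordered by Created within a discussion.
    ordered = sorted(rows, key=lambda r: (r[0], r[3]))
    for _, group in groupby(ordered, key=lambda r: r[0]):
        lagged_created = None  # the first row of a discussion has no previous row
        for discussion, userid, parent, created, modified in group:
            diff = None if lagged_created is None else modified - lagged_created
            result.append((discussion, userid, created, modified, diff))
            lagged_created = created
    return result


for row in difference_per_discussion(ROWS):
    print(row)
```

With a lag, the value the question asks for on row i (for example 16-12=4) appears on row i+1, and the first row of each discussion has no value.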

     

  • bea11005 Member Posts: 20 Maven

    I can't get Loop Values to work: the process finishes without producing any output.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hello @bea11005 - I'd recommend posting your XML process here (see "Read Before Posting" on the right when you reply) and attaching your dataset. This way we can replicate what you're doing and help you better.

     

    Scott

     

  • bea11005 Member Posts: 20 Maven

    @Telcontar120 I can't reverse the order twice on the Modified attribute, because if I do, the messages change their order and my process would no longer be correct.

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi,

     

    If you have unique keys (IDs) in your example set, you can create a copy of it using Multiply, sort the copy the way you want, generate the required attribute, and join it back based on the ID.

     

    Regards,

    Balázs

  • bea11005 Member Posts: 20 Maven

    I don't have unique IDs, so I can't.

    Another thing I'd like to know is whether it's possible to split my data depending on the value of the Discussion attribute.

    I want to calculate the difference between messages until I reach the last message of a discussion, where the distance will be 0 because there's no next message. I need modified(i+1)-created(i) for all the messages except the last one in a discussion.

    I've tried Loop Values but I can't get any output from the process. How can I do both things?

     

  • bea11005 Member Posts: 20 Maven

    Oh, that's a good idea. I'll try generating IDs; it looks like it will work.

    Now I'd like to know how to split the data depending on the value of the Discussion attribute.

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi!

     

    Loop Values is the operator you need. Inside the loop you can access the current value with the %{loop_value} macro by default. See the attached example:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.003" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.003" expanded="true" height="68" name="Retrieve Iris" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Iris"/>
    </operator>
    <operator activated="true" class="concurrency:loop_values" compatibility="7.6.003" expanded="true" height="82" name="Loop Values" width="90" x="179" y="34">
    <parameter key="attribute" value="label"/>
    <parameter key="enable_parallel_execution" value="false"/>
    <process expanded="true">
    <operator activated="true" class="filter_examples" compatibility="7.6.003" expanded="true" height="103" name="Filter Examples" width="90" x="112" y="34">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="label.equals.%{loop_value}"/>
    </list>
    </operator>
    <connect from_port="input 1" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve Iris" from_port="output" to_op="Loop Values" to_port="input 1"/>
    <connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Make sure that "Enable parallel execution" is switched off.

    Also, the loop attribute needs to be nominal. You can either create a copy of your original attribute and convert that to nominal (with Numerical to Polynominal or Format Numbers) or just convert the original if you don't need it in the numeric format later.

     

    Regards,

    Balázs
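
As a rough plain-Python analogue of looping over the values of a nominal attribute, the sketch below splits the sample rows by Discussion (converted to strings, in the spirit of the nominal conversion mentioned above) and then computes modified(i+1)-created(i) inside each group, with 0 for the last message as requested earlier in the thread. Both function names are hypothetical, and the rows are assumed to already be in message order within each discussion.

```python
# Sample rows from the question: (Discussion, Userid, Parent, Created, Modified)
ROWS = [
    (1, 1, 0, 12, 14),
    (1, 2, 82, 15, 16),
    (1, 1, 85, 17, 20),
    (1, 3, 85, 22, 24),
    (2, 45, 0, 26, 32),
    (2, 48, 89, 33, 34),
    (2, 46, 90, 34, 35),
]


def split_by_value(rows, index):
    """Group rows by the string form of one column, one group per distinct value."""
    groups = {}
    for row in rows:
        groups.setdefault(str(row[index]), []).append(row)
    return groups


def response_gaps(rows):
    """Return Modified(i+1) - Created(i) per discussion, with 0 for the last message."""
    gaps = {}
    for value, group in split_by_value(rows, 0).items():
        created = [r[3] for r in group]
        modified = [r[4] for r in group]
        # Pair each Created with the Modified of the following row.
        gaps[value] = [m - c for c, m in zip(created, modified[1:])] + [0]
    return gaps


print(response_gaps(ROWS))  # {'1': [4, 5, 7, 0], '2': [8, 2, 0]}
```

Because each group is processed on its own, the last message of one discussion never picks up a value from the next discussion.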
