"Series by examples, differentiate, break by ID?"

mafern76 Member Posts: 45 Contributor II
edited June 2019 in Help
Hi! I have the following data...

id time att0
1 1 5
2 1 8
2 2 9
3 2 5
3 4 4
3 5 2
4 5 6
4 6 5
4 8 8
4 10 5

Different ids have different numbers of recorded instances: some have one, some two, some maybe 10.

How can I get from that data to this:

id time att0 diff_att0
1 1 5 0
2 1 8 0
2 2 9 1
3 2 5 0
3 4 4 -1
3 5 2 -2
4 5 6 0
4 6 5 -1
4 8 8 3
4 10 5 -3

Using the DIFFERENTIATE node in the Series Extension, it is possible to do so, but it disregards the ID. Is there a way to break the DIFFERENTIATE by ID? Or another way to achieve the same result?

Thanks a lot!!

Best regards.

PS: the final idea is to then also aggregate diff_att0 to get further information on how att0 moved through time.
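For reference, the target table above is a difference taken within each id, with 0 on the first row of every id. The following is a minimal sketch of that logic in plain Python, not RapidMiner; the function name `grouped_diff` is hypothetical, and it assumes the rows are already sorted by id and then by time:

```python
from typing import List, Tuple

Row = Tuple[int, int, int]  # (id, time, att0)


def grouped_diff(rows: List[Row]) -> List[Tuple[int, int, int, int]]:
    """Append att0 minus the previous att0 of the same id.

    The first row of each id gets 0, as in the desired output.
    Assumes rows are sorted by id, then time.
    """
    out = []
    prev_id = None
    prev_val = None
    for rid, time, att0 in rows:
        # Only subtract when the previous row belongs to the same id.
        diff = att0 - prev_val if rid == prev_id else 0
        out.append((rid, time, att0, diff))
        prev_id, prev_val = rid, att0
    return out


rows = [
    (1, 1, 5),
    (2, 1, 8), (2, 2, 9),
    (3, 2, 5), (3, 4, 4), (3, 5, 2),
    (4, 5, 6), (4, 6, 5), (4, 8, 8), (4, 10, 5),
]
result = grouped_diff(rows)
for row in result:
    print(*row)
```

Resetting on every id change is what "breaking by ID" amounts to; an ungrouped difference would instead subtract the last value of one id from the first value of the next.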

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi

    I think you need to pivot the table first. Then you get something like

    id att0_time1 att0_time2....
    1  5 ?
    2 1 2
    and you can easily work on this.
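As a rough illustration of the pivot idea, here is a plain-Python sketch (not a RapidMiner process). It assumes one column per distinct time value and marks missing measurements as `None`, which corresponds to the `?` above; the function name `pivot_by_time` is hypothetical:

```python
from typing import Dict, List, Optional, Tuple


def pivot_by_time(
    rows: List[Tuple[int, int, int]]
) -> Tuple[List[int], Dict[int, List[Optional[int]]]]:
    """Turn (id, time, att0) rows into one list of att0 values per id.

    Returns the sorted distinct times (the column order) and a dict
    mapping each id to its values, with None where no measurement exists.
    """
    times = sorted({time for _, time, _ in rows})
    column = {time: i for i, time in enumerate(times)}
    wide: Dict[int, List[Optional[int]]] = {}
    for rid, time, att0 in rows:
        # Create the empty row on first sight of an id, then fill its cell.
        wide.setdefault(rid, [None] * len(times))[column[time]] = att0
    return times, wide


rows = [(1, 1, 5), (2, 1, 8), (2, 2, 9), (3, 2, 5), (3, 4, 4), (3, 5, 2)]
times, wide = pivot_by_time(rows)
print(times)
for rid, values in wide.items():
    print(rid, values)
```

With irregular time stamps this layout gets sparse quickly, which is one reason a per-id instance index can be a more convenient set of columns than the raw time values.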


    Cheers,

    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • mafern76 Member Posts: 45 Contributor II
    Hi Martin, thank you very much for your answer. I thought about working horizontally as well; it allows for more detail regarding series progression...

    I haven't yet figured out, though, how to automatically process any number of attributes using macros.

    For example, if I generate (att0_time1 - att0_time2), is there a way to write %{macro}_time1 - %{macro}_time2 so that it is generated for all attributes?

    Thanks a lot.

    PS: more about my data: it consists of various performance measures taken at different times, so some cases have 1 measurement, some 2, 3, and so on, and not at regular intervals.

    So the idea would be to be able to get a sense of performance progression and not just a general avg/min/max/sd aggregation...
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Hi Mafern.

    You might have a look at Generate Function Set. If you have a table like

    att_time1 att_time2 ...

    it generates for example the difference/sum/product between all of them. Might be what you want.

    If you just want to have att_timeX - att_timeX+1, you might need to work with a loop (a plain Loop, Loop Values, or Loop Attributes).
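To make the "adjacent columns only" case concrete, here is a small plain-Python sketch, not a RapidMiner loop. The function name `consecutive_diffs` is hypothetical, missing values are assumed to be `None`, and the sign follows diff_att0 from the question (later minus earlier); swap the operands for the opposite sign:

```python
from typing import List, Optional


def consecutive_diffs(values: List[Optional[float]]) -> List[Optional[float]]:
    """Return values[i+1] - values[i] for each adjacent pair.

    A pair with a missing value on either side yields None.
    """
    return [
        None if a is None or b is None else b - a
        for a, b in zip(values, values[1:])
    ]


# One row of a wide table: att_time1, att_time2, att_time3, att_time4
print(consecutive_diffs([6, 5, 8, 5]))
print(consecutive_diffs([8, 9, None, None]))
```

This produces n-1 differences for n columns, whereas generating differences between all pairs of columns grows quadratically.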

    Cheers,
    Martin
  • mafern76 Member Posts: 45 Contributor II
    Thank you Martin!

    It's not Generate Function Set I need, just precise calculations like you said, with a loop.

    For example, I'm generating %{loop_attribute} - %{loop_attribute}_1, and this works. But my time intervals are arbitrary, so I need to generate a time instance index like the one below. Any idea how to get it? I couldn't work it out on my own. Thanks a lot.

    id time time_instance
    1 4 1
    1 8 2
    1 20 3
    2 3 1
    2 4 2
    2 80 3
    2 120 4
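The index in the table above is a running counter that restarts at 1 for every id, ordered by time. A minimal plain-Python sketch of that idea follows; it is not a RapidMiner process, and the function name `add_time_instance` is hypothetical:

```python
from typing import Dict, List, Tuple


def add_time_instance(rows: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Append a 1-based position of each (id, time) row within its id.

    Rows are sorted by id, then time, before counting, so the input
    order does not matter.
    """
    counts: Dict[int, int] = {}
    out = []
    for rid, time in sorted(rows):
        counts[rid] = counts.get(rid, 0) + 1
        out.append((rid, time, counts[rid]))
    return out


rows = [(2, 80), (1, 4), (1, 20), (2, 3), (1, 8), (2, 120), (2, 4)]
indexed = add_time_instance(rows)
for row in indexed:
    print(*row)
```

Because the counter depends only on order within an id, gaps such as 4 to 80 to 120 do not affect it.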




  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi...have you tried looping by value (ID) and then setting the data with the filtered set? I do this all the time.  It's less efficient than pivoting but sometimes cleaner.

    Scott
  • mafern76 Member Posts: 45 Contributor II
    Thank you sgenzer, I made it work this way:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="loop_values" compatibility="5.3.015" expanded="true" height="76" name="Loop Values (2)" width="90" x="447" y="165">
        <parameter key="attribute" value="person_id"/>
        <parameter key="iteration_macro" value="loop_value"/>
        <parameter key="parallelize_iteration" value="false"/>
        <process expanded="true">
          <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples" width="90" x="112" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="person_id=%{loop_value}"/>
            <parameter key="invert_filter" value="false"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="5.3.015" expanded="true" height="76" name="Generate ID" width="90" x="313" y="30">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_port="out 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Then I simply append the resulting collection.

    So now I was able to get my data horizontally, but I'm hitting a bug when using Loop Attributes to compute differences and ratios across timestamps.

    I posted the issue at problems and support:

    http://rapid-i.com/rapidforum/index.php/topic,8677.0.html
  • mafern76 Member Posts: 45 Contributor II
    sgenzer wrote:

    Hi...have you tried looping by value (ID) and then setting the data with the filtered set? I do this all the time.  It's less efficient than pivoting but sometimes cleaner.

    Scott
    Scott, I think you actually meant to apply differentiate inside the loop I posted before.

    That makes it work for me, so I guess I don't need to work horizontally anymore.

    Thanks a bunch!
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Yes, sorry, that sounds right. Glad you made it work!

    Scott