
Compare examples within an ExampleSet based on previous values

Leo_179 Member Posts: 5 Contributor I
edited November 2018 in Help

Hi everybody,

 

My current ExampleSet looks like this:

 

Row No.   att_1     att_2     att_3
1         A         (empty)   C
2         A         B         (empty)
3         (empty)   B         (empty)
4         D         B         C
5         (empty)   B         C
6         A         (empty)   C

 

What I want to do now is compare each example with the values of the previous rows for as long as no comparison is false. In this case, the result_attribute should show "1" until the first false occurs. So far, so good...

The "special" thing is that if one of the previous values is (empty), the result of the comparison should still be true. After the first false occurs, the process has to start again, and the result_attribute should now show "2", and so on. For example, the solution should look like this:

 

Row No.   att_1     att_2     att_3     Result
1         A         (empty)   C         1
2         A         B         (empty)   1
3         (empty)   B         (empty)   1
4         D         B         C         2
5         (empty)   B         C         2
6         A         (empty)   C         3

 

A short explanation: Row 2 shows "1" because the comparison between rows 1 and 2 is true. Row 3 still shows "1" because the comparison between rows 1/2 and row 3 is true. Row 4 shows "2" because the comparison between rows 1/2/3 and row 4 is false (att_1 is D, which is not equal to the previous A in row 2), so the process starts again at the point of the first false.
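
The rule described above can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the logic, not a RapidMiner process: the function name `assign_groups` is hypothetical, `None` stands in for (empty), and it assumes that "previous values" means the non-empty values collected so far within the current group.

```python
from typing import List, Optional

Row = List[Optional[str]]  # None stands for an (empty) value


def assign_groups(rows: List[Row]) -> List[int]:
    """Return a 1-based group number (the Result attribute) for each row."""
    groups: List[int] = []
    merged: Row = []  # non-empty values seen so far in the current group
    current = 0
    for row in rows:
        # A conflict needs two non-empty values that differ; (empty) matches anything.
        conflict = any(
            seen is not None and value is not None and seen != value
            for seen, value in zip(merged, row)
        )
        if current == 0 or conflict:
            # First row, or first "false": start a new group from this row.
            current += 1
            merged = list(row)
        else:
            # Compatible row: fill the gaps in the collected values.
            merged = [s if s is not None else v for s, v in zip(merged, row)]
        groups.append(current)
    return groups


rows = [
    ["A", None, "C"],
    ["A", "B", None],
    [None, "B", None],
    ["D", "B", "C"],
    [None, "B", "C"],
    ["A", None, "C"],
]
print(assign_groups(rows))  # [1, 1, 1, 2, 2, 3]
```

On the six example rows this reproduces the Result column shown above.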

 

I hope you can follow my explanation and can give me a little help.

 

Thanks in advance!

 

Best regards,

Leo

 

 

Answers

  • Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist

    Hi @Leo_179,

     

    From your explanation I don't understand what exactly the reason for the result of the comparison is.

    In your use case the examples depend on one another, which is a clear indication that you are dealing with a Time / Value Series problem (even if you do not have a time dimension available in your data).

     

    I would start with creating a helper Attribute (Generate Attributes) which can be used for comparison.

    Then use the Operator Lag Series from the Value Series extension to create a new Attribute containing the value of the helper Attribute from the previous row.

    Generate a new (empty) Attribute which will later contain the desired result.

    Finally, use Loop Examples to compare each helper value and use Set Data to set the desired value in the newly created Attribute.

    You do know about the use of Macros, right? They are crucial in this setup!
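
The lag step can be pictured with a small Python sketch. It is not the Lag Series operator itself, only an illustration of what a one-row lag produces; the helper name `lag` is hypothetical and `None` stands in for (empty).

```python
from typing import List, Optional


def lag(values: List[Optional[str]]) -> List[Optional[str]]:
    """Shift values down by one row; the first row gets None (empty)."""
    return [None] + values[:-1] if values else []


att_1 = ["A", "A", None, "D", None, "A"]
previous = lag(att_1)  # value of att_1 in the previous row

# A row "differs" only when both the current and the lagged value are non-empty.
differs = [
    cur is not None and prev is not None and cur != prev
    for cur, prev in zip(att_1, previous)
]
```

In this sketch no row of att_1 is flagged, because row 4 (D) is compared with the empty value in row 3 rather than with the A in row 2. That is presumably where the loop and a macro holding the last non-empty value come in.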

     

    Happy Mining,

    Edin

  • Leo_179 Member Posts: 5 Contributor I

    Hi @Edin_Klapic,

     

    Thanks for your fast answer!

    The reason I want to do this is to decrease the number of examples by combining all examples until a "not-fitting value" occurs. In the end, according to the example above, the final ExampleSet should look like this:

     

    Current Example Set:

     

    Row No.   att_1     att_2     att_3
    1         A         (empty)   C
    2         A         B         (empty)
    3         (empty)   B         (empty)
    4         D         B         C
    5         (empty)   B         C
    6         A         (empty)   C

     

    Second Step:

     

    Row No.   att_1     att_2     att_3     Result
    1         A         (empty)   C         1
    2         A         B         (empty)   1
    3         (empty)   B         (empty)   1
    4         D         B         C         2
    5         (empty)   B         C         2
    6         A         (empty)   C         3

     

    Final ExampleSet:

     

    Row No.   att_1     att_2     att_3     Result
    1         A         B         C         1
    2         D         B         C         2
    3         A         (empty)   C         3

     

    Therefore, I thought it would be a good idea to first find out which examples can be combined and then classify them (that's what I need to know!). In the final step, I would use the Aggregate operator to combine all examples that show the same value in the result attribute.

    Hopefully it's now a little clearer and you can give me some help with solving this "problem".
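
The final combining step can be sketched in Python as well. This is an illustration of the intended outcome, not the Aggregate operator: the function name `merge_groups` is hypothetical, `None` stands in for (empty), and the group numbers are taken as given from the Result column of the second step.

```python
from typing import Dict, List, Optional

Row = List[Optional[str]]  # None stands for an (empty) value


def merge_groups(rows: List[Row], groups: List[int]) -> Dict[int, Row]:
    """Combine all rows that share a group number, keeping the first non-empty value per attribute."""
    merged: Dict[int, Row] = {}
    for row, group in zip(rows, groups):
        if group not in merged:
            merged[group] = list(row)
        else:
            merged[group] = [
                old if old is not None else new
                for old, new in zip(merged[group], row)
            ]
    return merged


rows = [
    ["A", None, "C"],
    ["A", "B", None],
    [None, "B", None],
    ["D", "B", "C"],
    [None, "B", "C"],
    ["A", None, "C"],
]
groups = [1, 1, 1, 2, 2, 3]  # Result column from the second step

final = merge_groups(rows, groups)
for group, values in final.items():
    print(group, values)
```

Because rows in the same group never hold two different non-empty values for one attribute, taking the first non-empty value per attribute is enough to reach the final ExampleSet shown above.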

     

    Best regards,

    Leo
