Compare Examples within an ExampleSet

Leo_179 Member Posts: 5 Contributor I
edited November 2018 in Help

Hello everyone,

 

The ExampleSet looks like:

 

Row No.   Att1   Att2   Att3
1         A      B      C
2         A      B      C
3         A      B      C
4         D      E      F
5         D      E      F
6         A      B      C
7         D      E      F
8         D      E      F

 

What I want to do now is compare each example with the one in the first row and check whether they are similar to each other. If they are, the Result attribute should show the same value (here "1" for rows 1, 2 and 3). This continues until the comparison fails for the first time (here after row 3). The process then starts again, but this time the "first row" is the one that failed the previous comparison (in this case row 4). The following examples are compared with this new "first row" (e.g. row 5 with row 4, row 6 with row 4, ... until the next mismatch occurs). This time the Result attribute should show "2".

And so on, and so on....

It is important not to change the order of the examples, because I need to know how often there is a difference within the ExampleSet.

 

This is how it should look in the end:

 

Row No.   Att1   Att2   Att3   Result
1         A      B      C      1
2         A      B      C      1
3         A      B      C      1
4         D      E      F      2
5         D      E      F      2
6         A      B      C      3
7         D      E      F      4
8         D      E      F      4
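As a rough illustration of the rule described above (not a RapidMiner process), here is a minimal Python sketch. It assumes that "similar" means all attribute values are exactly equal, and the function name `label_runs` is made up for this example.

```python
def label_runs(rows):
    """Assign a run number to each row, keeping the original order.

    A new run starts whenever a row differs from the first row of the
    current run (the "first row" / reference row in the question).
    Assumption: "similar" means exact equality of all attribute values.
    """
    labels = []
    reference = None
    run = 0
    for row in rows:
        if run == 0 or row != reference:
            run += 1          # comparison failed: start a new run
            reference = row   # this row becomes the new "first row"
        labels.append(run)
    return labels


rows = [
    ("A", "B", "C"),
    ("A", "B", "C"),
    ("A", "B", "C"),
    ("D", "E", "F"),
    ("D", "E", "F"),
    ("A", "B", "C"),
    ("D", "E", "F"),
    ("D", "E", "F"),
]
print(label_runs(rows))  # [1, 1, 1, 2, 2, 3, 4, 4]
```

With exact equality, comparing against the first row of the current run gives the same labels as comparing each row with its direct predecessor.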

 

I tried to solve the problem with the Loop Examples and Generate Attributes operators, but it didn't really work.

Does anybody have an idea? I have no clue :)

 

Many thanks and best regards,

Leo

 

Best Answer

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted

    Dear all,

     

    @Thomas_Ott: Nice suggestion!

    Lag Series was indeed the "key operator" for this last task. Thanks for your help.

     

    @Leo_179, here is the new process to apply to your whole dataset to see whether it gives relevant results:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="8.2.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="187">
    <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Compare_Examplesets\Compare_Examplesets.xlsx"/>
    <parameter key="imported_cell_range" value="A1:C9"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Att1.true.polynominal.attribute"/>
    <parameter key="1" value="Att2.true.polynominal.attribute"/>
    <parameter key="2" value="Att3.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.2.000" expanded="true" height="124" name="Multiply (3)" width="90" x="179" y="187"/>
    <operator activated="true" class="loop_examples" compatibility="8.2.000" expanded="true" height="103" name="Loop Examples" width="90" x="313" y="136">
    <process expanded="true">
    <operator activated="true" class="multiply" compatibility="8.2.000" expanded="true" height="124" name="Multiply" width="90" x="45" y="34"/>
    <operator activated="true" class="filter_example_range" compatibility="8.2.000" expanded="true" height="82" name="Filter Example Range (3)" width="90" x="45" y="238">
    <parameter key="first_example" value="1"/>
    <parameter key="last_example" value="1"/>
    </operator>
    <operator activated="true" class="filter_example_range" compatibility="8.2.000" expanded="true" height="82" name="Filter Example Range" width="90" x="179" y="34">
    <parameter key="first_example" value="%{example}"/>
    <parameter key="last_example" value="%{example}"/>
    </operator>
    <operator activated="true" class="append" compatibility="8.2.000" expanded="true" height="103" name="Append" width="90" x="179" y="136"/>
    <operator activated="true" class="filter_example_range" compatibility="8.2.000" expanded="true" height="82" name="Filter Example Range (2)" width="90" x="380" y="136">
    <parameter key="first_example" value="%{example}"/>
    <parameter key="last_example" value="%{example}"/>
    </operator>
    <operator activated="true" class="cross_distances" compatibility="8.2.000" expanded="true" height="103" name="Cross Distances" width="90" x="514" y="85"/>
    <connect from_port="example set" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Append" to_port="example set 2"/>
    <connect from_op="Multiply" from_port="output 3" to_op="Filter Example Range (3)" to_port="example set input"/>
    <connect from_op="Filter Example Range (3)" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Filter Example Range" from_port="example set output" to_op="Cross Distances" to_port="request set"/>
    <connect from_op="Append" from_port="merged set" to_op="Filter Example Range (2)" to_port="example set input"/>
    <connect from_op="Filter Example Range (2)" from_port="example set output" to_op="Cross Distances" to_port="reference set"/>
    <connect from_op="Cross Distances" from_port="result set" to_port="output 1"/>
    <portSpacing port="source_example set" spacing="0"/>
    <portSpacing port="sink_example set" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="8.2.000" expanded="true" height="82" name="Append (2)" width="90" x="447" y="187"/>
    <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID" width="90" x="581" y="187"/>
    <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="313" y="289"/>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="715" y="238">
    <parameter key="remove_double_attributes" value="false"/>
    <list key="key_attributes"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="849" y="187">
    <list key="function_descriptions">
    <parameter key="Result" value="if(round([distance],3)==1.732,1,0)"/>
    </list>
    </operator>
    <operator activated="true" class="loop_examples" compatibility="8.2.000" expanded="true" height="103" name="Loop Examples (2)" width="90" x="983" y="187">
    <process expanded="true">
    <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="85">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="Result"/>
    </operator>
    <operator activated="true" class="series:lag_series" compatibility="7.4.000" expanded="true" height="82" name="Lag Series" width="90" x="380" y="85">
    <list key="attributes">
    <parameter key="Result" value="%{example}"/>
    </list>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (2)" width="90" x="514" y="85">
    <list key="key_attributes"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="648" y="85">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="Result-%{example}"/>
    <parameter key="attributes" value="id"/>
    </operator>
    <operator activated="true" class="transpose" compatibility="8.2.000" expanded="true" height="82" name="Transpose" width="90" x="782" y="85"/>
    <connect from_port="example set" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Lag Series" to_port="example set input"/>
    <connect from_op="Lag Series" from_port="example set output" to_op="Join (2)" to_port="right"/>
    <connect from_op="Lag Series" from_port="original" to_op="Join (2)" to_port="left"/>
    <connect from_op="Join (2)" from_port="join" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Transpose" to_port="example set input"/>
    <connect from_op="Transpose" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_example set" spacing="0"/>
    <portSpacing port="sink_example set" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID (3)" width="90" x="1117" y="136"/>
    <operator activated="true" class="append" compatibility="8.2.000" expanded="true" height="82" name="Append (3)" width="90" x="1117" y="238"/>
    <operator activated="true" class="transpose" compatibility="8.2.000" expanded="true" height="82" name="Transpose (2)" width="90" x="1251" y="238"/>
    <operator activated="true" class="concurrency:loop_attributes" compatibility="8.2.000" expanded="true" height="82" name="Loop Attributes" width="90" x="1385" y="238">
    <parameter key="attribute_filter_type" value="value_type"/>
    <parameter key="value_type" value="numeric"/>
    <parameter key="except_value_type" value="attribute_value"/>
    <process expanded="true">
    <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="380" y="34">
    <list key="aggregation_attributes">
    <parameter key="%{loop_attribute}" value="sum"/>
    </list>
    </operator>
    <operator activated="true" class="transpose" compatibility="8.2.000" expanded="true" height="82" name="Transpose (3)" width="90" x="581" y="34"/>
    <connect from_port="input 1" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Transpose (3)" to_port="example set input"/>
    <connect from_op="Transpose (3)" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="8.2.000" expanded="true" height="82" name="Append (4)" width="90" x="1519" y="238"/>
    <operator activated="true" class="sort" compatibility="8.2.000" expanded="true" height="82" name="Sort" width="90" x="1653" y="238">
    <parameter key="attribute_name" value="id"/>
    <parameter key="sorting_direction" value="decreasing"/>
    </operator>
    <operator activated="false" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="68" name="Execute Python" width="90" x="45" y="34">
    <parameter key="script" value="import pandas as pd&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(data):&#10;    # Running total of the change flags: Result_2[i] = Result_2[i-1] + Result[i]&#10;    data['Result_2'] = data['Result'].cumsum()&#10;    return data"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID (5)" width="90" x="1787" y="238"/>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join (3)" width="90" x="1921" y="187">
    <list key="key_attributes"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="2122" y="187">
    <list key="function_descriptions">
    <parameter key="Final_Result" value="Result+ att_1"/>
    </list>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Multiply (3)" to_port="input"/>
    <connect from_op="Multiply (3)" from_port="output 1" to_op="Loop Examples" to_port="example set"/>
    <connect from_op="Multiply (3)" from_port="output 2" to_op="Generate ID (2)" to_port="example set input"/>
    <connect from_op="Loop Examples" from_port="output 1" to_op="Append (2)" to_port="example set 1"/>
    <connect from_op="Append (2)" from_port="merged set" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Loop Examples (2)" to_port="example set"/>
    <connect from_op="Loop Examples (2)" from_port="example set" to_op="Generate ID (3)" to_port="example set input"/>
    <connect from_op="Loop Examples (2)" from_port="output 1" to_op="Append (3)" to_port="example set 1"/>
    <connect from_op="Generate ID (3)" from_port="example set output" to_op="Join (3)" to_port="left"/>
    <connect from_op="Append (3)" from_port="merged set" to_op="Transpose (2)" to_port="example set input"/>
    <connect from_op="Transpose (2)" from_port="example set output" to_op="Loop Attributes" to_port="input 1"/>
    <connect from_op="Loop Attributes" from_port="output 1" to_op="Append (4)" to_port="example set 1"/>
    <connect from_op="Append (4)" from_port="merged set" to_op="Sort" to_port="example set input"/>
    <connect from_op="Sort" from_port="example set output" to_op="Generate ID (5)" to_port="example set input"/>
    <connect from_op="Generate ID (5)" from_port="example set output" to_op="Join (3)" to_port="right"/>
    <connect from_op="Join (3)" from_port="join" to_op="Generate Attributes (2)" to_port="example set input"/>
    <connect from_op="Generate Attributes (2)" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
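As far as the process above can be read, the running total is obtained by lagging the 0/1 Result flags by 1, 2, ... rows, summing the lagged copies per row, and adding the unlagged flag (the Final_Result expression). The Python sketch below only illustrates that idea and is not a translation of the operators. The function names are made up, and missing values produced by lagging are treated as 0, which is an assumption.

```python
def lag(values, k):
    """Shift values down by k rows; the first k rows are padded with 0.

    Assumption: the padding stands in for the missing values that a
    lag produces at the start of a series.
    """
    n = len(values)
    pad = min(k, n)
    return [0] * pad + list(values[: n - pad])


def running_total_from_lags(flags):
    """Cumulative sum written as the sum of all lagged copies of flags."""
    total = list(flags)                 # lag 0: the flag itself
    for k in range(1, len(flags)):      # lags 1 .. n-1
        shifted = lag(flags, k)
        total = [t + s for t, s in zip(total, shifted)]
    return total


# 1 marks a row that differs from the previous row (flags for the sample data)
flags = [0, 0, 0, 1, 0, 1, 1, 0]
totals = running_total_from_lags(flags)
print(totals)                   # [0, 0, 0, 1, 1, 2, 3, 3]
print([t + 1 for t in totals])  # [1, 1, 1, 2, 2, 3, 4, 4]
```

Adding 1 to the totals gives the Result column requested in the question. Summing n lagged copies grows quadratically with the number of rows, so this route may become slow on large ExampleSets.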

     

    I hope it helps,

     

    Regards,

     

    Lionel

     

     

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @Leo_179 Try using the Aggregate operator for this. Aggregate by all your attributes and then, under the grouping, use the sum method.

  • Leo_179 Member Posts: 5 Contributor I

    Hi Thomas,

     

    Thanks for your fast answer!

    Since I'm new to RapidMiner, could you please explain your solution in a little more detail? I'm not quite sure how to use the Aggregate operator in this case. And which other operators do I need to solve the problem?

     

    Best regards,

    Leo

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @Leo_179,

     

    I was not able to create a process using only RapidMiner operators, so, to my great disappointment, I used a Python script for the last part of the process (I explain why below).

    To run this process, you must install a Python environment on your computer and install the Execute Python operator (from the Marketplace).

    Here is the process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="8.2.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="136">
    <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Compare_Examplesets\Compare_Examplesets.xlsx"/>
    <parameter key="imported_cell_range" value="A1:C9"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Att1.true.polynominal.attribute"/>
    <parameter key="1" value="Att2.true.polynominal.attribute"/>
    <parameter key="2" value="Att3.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.2.000" expanded="true" height="103" name="Multiply (3)" width="90" x="179" y="85"/>
    <operator activated="true" class="loop_examples" compatibility="8.2.000" expanded="true" height="124" name="Loop Examples" width="90" x="313" y="34">
    <process expanded="true">
    <operator activated="true" class="multiply" compatibility="8.2.000" expanded="true" height="103" name="Multiply" width="90" x="45" y="34"/>
    <operator activated="true" class="filter_example_range" compatibility="8.2.000" expanded="true" height="82" name="Filter Example Range" width="90" x="179" y="34">
    <parameter key="first_example" value="%{example}"/>
    <parameter key="last_example" value="%{example}"/>
    </operator>
    <operator activated="true" class="operator_toolbox:create_exampleset" compatibility="1.0.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="187">
    <parameter key="generator_type" value="comma_separated_text"/>
    <parameter key="number_of_examples" value="1"/>
    <list key="function_descriptions"/>
    <list key="numeric_series_configuration"/>
    <list key="date_series_configuration"/>
    <list key="date_series_configuration (interval)"/>
    <parameter key="input_csv_text" value="Att1,Att2,Att3&#10;A,B,C"/>
    </operator>
    <operator activated="true" class="append" compatibility="8.2.000" expanded="true" height="103" name="Append" width="90" x="179" y="136"/>
    <operator activated="true" class="filter_example_range" compatibility="8.2.000" expanded="true" height="82" name="Filter Example Range (2)" width="90" x="380" y="136">
    <parameter key="first_example" value="%{example}"/>
    <parameter key="last_example" value="%{example}"/>
    </operator>
    <operator activated="true" class="cross_distances" compatibility="8.2.000" expanded="true" height="103" name="Cross Distances" width="90" x="514" y="85"/>
    <connect from_port="example set" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Filter Example Range" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Append" to_port="example set 2"/>
    <connect from_op="Filter Example Range" from_port="example set output" to_op="Cross Distances" to_port="request set"/>
    <connect from_op="Create ExampleSet" from_port="output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_op="Filter Example Range (2)" to_port="example set input"/>
    <connect from_op="Filter Example Range (2)" from_port="example set output" to_op="Cross Distances" to_port="reference set"/>
    <connect from_op="Cross Distances" from_port="result set" to_port="output 1"/>
    <portSpacing port="source_example set" spacing="0"/>
    <portSpacing port="sink_example set" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    <portSpacing port="sink_output 3" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="8.2.000" expanded="true" height="82" name="Append (2)" width="90" x="447" y="85"/>
    <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID" width="90" x="581" y="85"/>
    <operator activated="true" class="generate_id" compatibility="8.2.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="313" y="187"/>
    <operator activated="true" class="concurrency:join" compatibility="8.2.000" expanded="true" height="82" name="Join" width="90" x="715" y="136">
    <parameter key="remove_double_attributes" value="false"/>
    <list key="key_attributes"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="8.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="849" y="136">
    <list key="function_descriptions">
    <parameter key="Result" value="if(round([distance],3)==1.732,1,0)"/>
    </list>
    </operator>
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="983" y="136">
    <parameter key="script" value="import pandas as pd&#10;&#10;# rm_main is a mandatory function,&#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(data):&#10;    # Running total of the change flags: Result_2[i] = Result_2[i-1] + Result[i]&#10;    data['Result_2'] = data['Result'].cumsum()&#10;    return data"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Multiply (3)" to_port="input"/>
    <connect from_op="Multiply (3)" from_port="output 1" to_op="Loop Examples" to_port="example set"/>
    <connect from_op="Multiply (3)" from_port="output 2" to_op="Generate ID (2)" to_port="example set input"/>
    <connect from_op="Loop Examples" from_port="output 1" to_op="Append (2)" to_port="example set 1"/>
    <connect from_op="Append (2)" from_port="merged set" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Execute Python" to_port="input 1"/>
    <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Indeed, with RapidMiner I am able to compute the distance between consecutive examples (between example[i] and example[i-1]) and to obtain this:

    [Image: Compare_Examplesets.png]
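A side note on the expression `if(round([distance],3)==1.732,1,0)` used in the process: 1.732 is approximately the square root of 3, which suggests that each of the three nominal attributes contributes 1 to the squared distance when its values differ. That reading is an assumption, not something stated in the thread. Under it, the check only fires when all three attributes change at once, as the hypothetical Python sketch below illustrates.

```python
import math


def nominal_distance(a, b):
    """Euclidean-style distance for nominal rows.

    Assumption: every attribute whose values differ contributes 1
    to the squared distance; equal values contribute 0.
    """
    return math.sqrt(sum(1 for x, y in zip(a, b) if x != y))


def strict_flag(a, b):
    """Mirror of the check in the process: 1 only if the distance is ~sqrt(3)."""
    return int(round(nominal_distance(a, b), 3) == 1.732)


def any_change_flag(a, b):
    """Alternative check: 1 as soon as any attribute differs."""
    return int(nominal_distance(a, b) > 0)


base = ("A", "B", "C")
print(strict_flag(base, ("D", "E", "F")), any_change_flag(base, ("D", "E", "F")))  # 1 1
print(strict_flag(base, ("A", "B", "F")), any_change_flag(base, ("A", "B", "F")))  # 0 1
print(strict_flag(base, ("A", "B", "C")), any_change_flag(base, ("A", "B", "C")))  # 0 0
```

For the sample data this makes no difference, because rows are either identical or differ in all three attributes. If partial changes should also start a new group, a test such as `distance > 0` may be the safer choice.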

    However, with RapidMiner I am not able to perform the very simple last operation, which consists of:

     - creating an attribute 'Total' initialized to 0

     - iterating to sum: Total[i] = Total[i-1] + Result[i]

    to finally obtain this:

    [Image: Compare_Examplesets_2.png]

    So if someone has an idea how to perform this last operation with RapidMiner, I am very curious to hear it

    (and, more generally, how to solve this problem using only RapidMiner, without a script).
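For reference, the two steps described in this post (flag each row that differs from the previous one, then accumulate Total[i] = Total[i-1] + Result[i]) can be sketched in plain Python as follows. This is only an illustration outside RapidMiner. The function names are made up, and rows are compared by exact equality rather than by distance.

```python
from itertools import accumulate


def change_flags(rows):
    """Return 1 for each row that differs from the row before it, else 0.

    The first row has no predecessor and gets 0.
    """
    if not rows:
        return []
    return [0] + [int(cur != prev) for prev, cur in zip(rows, rows[1:])]


def running_total(flags):
    """Total[i] = Total[i-1] + Result[i], starting from Total[0] = Result[0]."""
    return list(accumulate(flags))


rows = [
    ("A", "B", "C"), ("A", "B", "C"), ("A", "B", "C"),
    ("D", "E", "F"), ("D", "E", "F"),
    ("A", "B", "C"),
    ("D", "E", "F"), ("D", "E", "F"),
]
flags = change_flags(rows)
print(flags)                 # [0, 0, 0, 1, 0, 1, 1, 0]
print(running_total(flags))  # [0, 0, 0, 1, 1, 2, 3, 3]
```

Adding 1 to each total yields the group numbers 1, 1, 1, 2, 2, 3, 4, 4 from the original question.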

     

    Anyway, I hope it helps,

     

    Regards,

     

    Lionel

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @lionelderkrikor Would the Lag Series operator from the Series extension help?

  • Leo_179 Member Posts: 5 Contributor I

    Dear all,

     

    Thank you very much for your help! I'm now using the "Lag Series" operator and it works quite well...

     

    Best regards,

    Leo
