Compare CSV files by ID column

giorogo Member Posts: 13 Contributor I
edited November 2018 in Help

Hi, I'm new to this excellent program and need help with the following task. I have two CSV files with two common columns (ID and emotion). I would like to build a process that compares these two files and produces two documents: one listing all the IDs with the same emotion, and another listing the IDs with different emotions. For example, if ID 001 has the emotion felicity in file A and the emotion sadness in file B, it goes into the file with different emotions. Could you tell me, step by step, how to do this? Thank you.

 

 

Best Answer

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted

    Hi again @giorogo,

     

    Here is the complete process for what you want to do, based on @mschmitz's idea:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Compare_csv_files\crowdflower_emozionidominanti.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="text.true.polynominal.attribute"/>
    <parameter key="1" value="annotation.true.polynominal.attribute"/>
    <parameter key="2" value="id.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
    <parameter key="attribute_name" value="id"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV (2)" width="90" x="45" y="187">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Compare_csv_files\video_progettoTSI.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="id.true.integer.attribute"/>
    <parameter key="1" value="video.true.polynominal.attribute"/>
    <parameter key="2" value="text.true.polynominal.attribute"/>
    <parameter key="3" value="faccia.true.polynominal.attribute"/>
    <parameter key="4" value="emotion.true.polynominal.attribute"/>
    <parameter key="5" value="emozioneSecondaria.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role (2)" width="90" x="246" y="187">
    <parameter key="attribute_name" value="id"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.1.000" expanded="true" height="82" name="Join" width="90" x="447" y="85">
    <list key="key_attributes"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="581" y="85">
    <parameter key="parameter_expression" value="[annotation]==[emotion]"/>
    <parameter key="condition_class" value="expression"/>
    <list key="filters_list">
    <parameter key="filters_entry_key" value="annotation.equals.[emotion]"/>
    </list>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Read CSV (2)" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
    <connect from_op="Filter Examples" from_port="unmatched example set" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    I hope this helps,

     

    Regards,

     

    Lionel

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    Hi,

     

    I think you want to start with a Join operator where the key attribute is "id" on both sides.

    The result is a table like this:

    id    annotation    emotion     [.... other attributes]

    Afterwards, use a Filter Examples operator to split the table into two parts: the rows where annotation = emotion, and the rest.
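As an illustration only, the join-then-filter logic described above can be sketched outside RapidMiner in plain Python. This is not the RapidMiner process itself. The sample rows, the helper names (`read_by_id`, `split_by_emotion`, `to_csv`) and the choice of an inner join (IDs missing from either file are dropped) are assumptions made for the example; the column names `id`, `annotation` and `emotion` come from the table above.

```python
import csv
import io


def read_by_id(csv_text, id_col="id"):
    """Parse CSV text into a dict mapping each id to its row (a dict)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_col]: row for row in reader}


def split_by_emotion(left, right, left_col="annotation", right_col="emotion"):
    """Inner-join two id->row mappings, then split rows by label agreement."""
    same, different = [], []
    # Only ids present in both inputs survive (inner join); sorted for stable output.
    for key in sorted(left.keys() & right.keys()):
        joined = {
            "id": key,
            "annotation": left[key][left_col],
            "emotion": right[key][right_col],
        }
        if joined["annotation"] == joined["emotion"]:
            same.append(joined)
        else:
            different.append(joined)
    return same, different


def to_csv(rows, fieldnames=("id", "annotation", "emotion")):
    """Serialize joined rows back to CSV text, header included."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data standing in for the two CSV files.
file_a = "id,annotation\n001,happiness\n002,sadness\n003,anger\n"
file_b = "id,emotion\n001,sadness\n002,sadness\n004,fear\n"

same, different = split_by_emotion(read_by_id(file_a), read_by_id(file_b))
print(to_csv(same))
print(to_csv(different))
```

In this sketch the ids are kept as strings so that leading zeros such as `001` survive. Ids `003` and `004` appear in only one input each, so neither result contains them.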

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • giorogo Member Posts: 13 Contributor I

    First of all, thanks for the reply. Unfortunately, the problem is now in the Filter Examples operator: is the configuration in the images correct?

    1.png 14.2K
    2.png 33.2K
    uno.png 56.9K
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @giorogo,

     

    You have to put two Set Role operators after your two Read CSV operators and, in the parameter panel of each, set the target role of your id attribute to id.

    Here is a screenshot of the process:

    Compare_csv_files.png

    Regards,

     

    Lionel

     

  • giorogo Member Posts: 13 Contributor I

    I've done that, but now I get this error (see images).

     

  • giorogo Member Posts: 13 Contributor I

    Thank you very much for your help! Problem solved! You are very kind.
