
Compare CSV files by ID column

giorogo Member Posts: 13 Learner III
edited November 2018 in Help

Hi, I'm new to this excellent program and need help with the following task. I have two CSV files with two common columns (ID and emotion). I would like to build a process that compares the two files and produces two documents: one listing all the IDs that have the same emotion in both files, and one listing the IDs whose emotions differ. For example, if ID 001 has the emotion felicity in file A and sadness in file B, it should be placed in the file with different emotions. Could you tell me, step by step, how to do this? Thank you.

 

 

Best Answer

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted

    Hi again @giorogo,

     

    Below is the complete process for what you want to do, based on @mschmitz's idea:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Compare_csv_files\crowdflower_emozionidominanti.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="text.true.polynominal.attribute"/>
    <parameter key="1" value="annotation.true.polynominal.attribute"/>
    <parameter key="2" value="id.true.integer.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
    <parameter key="attribute_name" value="id"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV (2)" width="90" x="45" y="187">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Compare_csv_files\video_progettoTSI.csv"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <parameter key="encoding" value="windows-1252"/>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="id.true.integer.attribute"/>
    <parameter key="1" value="video.true.polynominal.attribute"/>
    <parameter key="2" value="text.true.polynominal.attribute"/>
    <parameter key="3" value="faccia.true.polynominal.attribute"/>
    <parameter key="4" value="emotion.true.polynominal.attribute"/>
    <parameter key="5" value="emozioneSecondaria.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role (2)" width="90" x="246" y="187">
    <parameter key="attribute_name" value="id"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="concurrency:join" compatibility="8.1.000" expanded="true" height="82" name="Join" width="90" x="447" y="85">
    <list key="key_attributes"/>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="581" y="85">
    <parameter key="parameter_expression" value="[annotation]==[emotion]"/>
    <parameter key="condition_class" value="expression"/>
    <list key="filters_list">
    <parameter key="filters_entry_key" value="annotation.equals.[emotion]"/>
    </list>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Read CSV (2)" from_port="output" to_op="Set Role (2)" to_port="example set input"/>
    <connect from_op="Set Role (2)" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Join" from_port="join" to_op="Filter Examples" to_port="example set input"/>
    <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
    <connect from_op="Filter Examples" from_port="unmatched example set" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>

    I hope it helps,

     

    Regards,

     

    Lionel

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,529 RM Data Scientist

    Hi,

     

    I think what you want to do is use a Join operator first, with "id" as the key attribute on both sides.

    The result is a table like this:

    id    annotation    emotion     [.... other attributes]

    Afterwards, use a Filter Examples operator to split the table into two parts: the rows where annotation = emotion, and all the others.
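For readers who want to check the logic outside RapidMiner, here is a minimal sketch of the same join-then-filter idea in plain Python. It is not part of the RapidMiner process. The function name `split_by_agreement`, the sample rows, and the assumption that both files have a header row with `id`, `annotation`, and `emotion` columns are illustrative only.

```python
import csv
import io


def split_by_agreement(file_a, file_b):
    """Inner-join two CSV sources on 'id', then split the joined rows
    into those where annotation == emotion and those where they differ."""
    # Index file B by id so each row of file A can look up its partner.
    emotion_by_id = {row["id"]: row["emotion"] for row in csv.DictReader(file_b)}

    same, different = [], []
    for row in csv.DictReader(file_a):
        if row["id"] not in emotion_by_id:
            continue  # inner join: ids missing from file B are dropped
        joined = {
            "id": row["id"],
            "annotation": row["annotation"],
            "emotion": emotion_by_id[row["id"]],
        }
        # Same role as the filter: matching rows in one list, the rest in the other.
        if joined["annotation"] == joined["emotion"]:
            same.append(joined)
        else:
            different.append(joined)
    return same, different


# Hypothetical sample data standing in for the two CSV files.
file_a = io.StringIO("id,annotation\n001,felicity\n002,sadness\n003,anger\n")
file_b = io.StringIO("id,emotion\n001,sadness\n002,sadness\n004,joy\n")

same, different = split_by_agreement(file_a, file_b)
print([r["id"] for r in same])       # ['002']
print([r["id"] for r in different])  # ['001']
```

IDs 003 and 004 appear in only one of the two sample files, so an inner join drops them, which is also what a join on the id attribute does when a row has no partner on the other side.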

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • giorogo Member Posts: 13 Learner III

    First of all, thanks for the reply. Unfortunately, the problem is now in the Filter Examples operator. Is the configuration shown in the images correct?

    1.png 14.2K
    2.png 33.2K
    uno.png 56.9K
  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @giorogo,

     

    You have to put two Set Role operators after your two Read CSV operators and, in the parameter panel of each one, set the role of your id attribute to id.

    Here is a screenshot of the process:

    Compare_csv_files.png

    Regards,

     

    Lionel

     

  • giorogo Member Posts: 13 Learner III

    I've done that, but now I get this error (see images).

     

  • giorogo Member Posts: 13 Learner III

    Thank you very much for your help! Problem solved! You are very kind.
