
Generate new attributes depending on ID

olgakulesza2 Member Posts: 15 Learner III
edited November 2018 in Help

Hello everyone, 

An example of my data is shown below. It is a set of many books. The first column is the ID of a book, and each book has 100 tag_name values. What I would like to obtain is a table like this:

book_id | rating | author | title | userid | tag_name1 | tag_name2 | ... | tag_name100 |

So each book should have a single row that contains all of its tag_names.

Could you please help me?

 

tags_books_ratings_user10.PNG

Best Answer

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted

    Hi again @olgakulesza2,

     

    1. First, here is the new version of the process, which now also renames your tag columns with the stem "tag":

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="8.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="34">
    <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Tag_Name\Tag_name.xlsx"/>
    <parameter key="imported_cell_range" value="A1:D11"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Id.true.integer.attribute"/>
    <parameter key="1" value="Author.true.polynominal.attribute"/>
    <parameter key="2" value="Title.true.polynominal.attribute"/>
    <parameter key="3" value="Tag name.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
    <list key="aggregation_attributes">
    <parameter key="Tag name" value="concatenation"/>
    </list>
    <parameter key="group_by_attributes" value="Author|Title|Id"/>
    </operator>
    <operator activated="true" class="split" compatibility="8.2.000" expanded="true" height="82" name="Split" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="concat(Tag name)"/>
    <parameter key="split_pattern" value="[|]"/>
    </operator>
    <operator activated="true" class="concurrency:loop" compatibility="8.2.000" expanded="true" height="82" name="Loop" width="90" x="581" y="34">
    <parameter key="number_of_iterations" value="10"/>
    <parameter key="reuse_results" value="true"/>
    <process expanded="true">
    <operator activated="true" class="rename_by_generic_names" compatibility="8.2.000" expanded="true" height="82" name="Rename by Generic Names" width="90" x="313" y="85">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="concat.*"/>
    <parameter key="generic_name_stem" value="tag"/>
    </operator>
    <connect from_port="input 1" to_op="Rename by Generic Names" to_port="example set input"/>
    <connect from_op="Rename by Generic Names" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Split" to_port="example set input"/>
    <connect from_op="Split" from_port="example set output" to_op="Loop" to_port="input 1"/>
    <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
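
The renaming step at the end of this process can be sketched in pandas for readers who prefer to see it as code. This is only an analogue, not what RapidMiner runs internally. The sample rows, the split column names ("concat(Tag name)_1" and so on) and the resulting names ("tag1", "tag2") are assumptions made for illustration; the names RapidMiner actually generates may differ.

```python
import pandas as pd

# Hypothetical result of the Aggregate and Split steps: the tag columns
# carry names derived from "concat(Tag name)" (assumed naming).
split_df = pd.DataFrame(
    {
        "Id": [1, 2],
        "Author": ["A. Author", "B. Writer"],
        "Title": ["First Book", "Second Book"],
        "concat(Tag name)_1": ["fantasy", "history"],
        "concat(Tag name)_2": ["classic", "war"],
    }
)

# Select every column whose name starts with "concat", mirroring the
# regular expression "concat.*" used in the process.
tag_cols = [c for c in split_df.columns if c.startswith("concat")]

# Replace those names with a common stem plus a running index.
new_names = {old: f"tag{i}" for i, old in enumerate(tag_cols, start=1)}
renamed = split_df.rename(columns=new_names)

print(list(renamed.columns))
```

The other columns (Id, Author, Title) are left untouched because they do not match the filter.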

    2. "Thanks @lionelderkrikor, but in that case I get split-up letters in each column"

     

    I am surprised, because it works without any problem on my side:

    Tag_Name.png

    Can you post a screenshot of what you get?
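
One thing worth checking, although nothing in this thread confirms it is the cause: in a regular expression a bare | means "or", so a split pattern of | alone can match the empty string between every pair of characters. That is why the process writes the pattern as [|]. The sketch below shows the difference with Python's re module (Python 3.7 or later assumed); RapidMiner uses Java regular expressions, so treat this as an analogy only.

```python
import re

joined = "fantasy|classic"

# "[|]" is a character class containing a literal pipe,
# so the string is cut only at the separator.
parts_literal = re.split(r"[|]", joined)

# A bare "|" is an alternation of two empty patterns. It matches the
# empty string at every position, so the text falls apart into single
# characters (empty pieces at the ends are filtered out here).
parts_bare = [p for p in re.split(r"|", joined) if p]

print(parts_literal)
print(parts_bare)
```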

     

    Regards,

     

    Lionel

Answers

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @olgakulesza2,

     

    Does this process meet your need? You will have to adapt it to your own dataset:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_excel" compatibility="8.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="34">
    <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Tag_Name\Tag_name.xlsx"/>
    <parameter key="imported_cell_range" value="A1:D11"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations">
    <parameter key="0" value="Name"/>
    </list>
    <list key="data_set_meta_data_information">
    <parameter key="0" value="Id.true.integer.attribute"/>
    <parameter key="1" value="Author.true.polynominal.attribute"/>
    <parameter key="2" value="Title.true.polynominal.attribute"/>
    <parameter key="3" value="Tag name.true.polynominal.attribute"/>
    </list>
    </operator>
    <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
    <list key="aggregation_attributes">
    <parameter key="Tag name" value="concatenation"/>
    </list>
    <parameter key="group_by_attributes" value="Author|Title|Id"/>
    </operator>
    <operator activated="true" class="split" compatibility="8.2.000" expanded="true" height="82" name="Split" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="concat(Tag name)"/>
    <parameter key="split_pattern" value="[|]"/>
    </operator>
    <connect from_op="Read Excel" from_port="output" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Split" to_port="example set input"/>
    <connect from_op="Split" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
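
The idea behind this process (Aggregate with concatenation, then Split on the separator) can be sketched roughly in pandas. This is an analogue under stated assumptions, not a description of RapidMiner internals: the sample rows are invented, and only the column names Id, Author, Title and Tag name come from the process.

```python
import pandas as pd

# Hypothetical sample data in the "long" layout: one row per (book, tag).
long_df = pd.DataFrame(
    {
        "Id": [1, 1, 1, 2, 2],
        "Author": ["A. Author"] * 3 + ["B. Writer"] * 2,
        "Title": ["First Book"] * 3 + ["Second Book"] * 2,
        "Tag name": ["fantasy", "classic", "fiction", "history", "war"],
    }
)

# Step 1 (analogue of Aggregate): one row per book, tags joined with "|".
aggregated = (
    long_df.groupby(["Id", "Author", "Title"], sort=False)["Tag name"]
    .agg("|".join)
    .reset_index()
)

# Step 2 (analogue of Split): one column per tag. A single-character
# separator is treated by pandas as a literal string, not a regex.
tag_columns = aggregated["Tag name"].str.split("|", expand=True)

# Put the book columns and the tag columns side by side.
wide = pd.concat([aggregated.drop(columns="Tag name"), tag_columns], axis=1)

print(wide)
```

Books with fewer tags than the maximum simply get missing values in the trailing tag columns.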

    Regards,

     

    Lionel

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    The Pivot operator will do this easily for you: group by book id and index by tag.
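
To give a feel for what a pivot of this kind produces, here is a small pandas sketch. It uses pd.crosstab as a stand-in and does not reproduce the RapidMiner Pivot operator exactly; the sample rows are invented.

```python
import pandas as pd

# Hypothetical long-format data: one row per (book, tag).
long_df = pd.DataFrame(
    {
        "Id": [1, 1, 2],
        "Tag name": ["fantasy", "classic", "fantasy"],
    }
)

# Group by book id (rows) and index by tag (columns).
# Each cell counts how often that tag occurs for that book.
pivoted = pd.crosstab(long_df["Id"], long_df["Tag name"])

print(pivoted)
```

Note that the tag names become the column names and the cells hold numbers, so the result is one column per distinct tag rather than columns named tag1, tag2 and so on.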

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • olgakulesza2 Member Posts: 15 Learner III

    Thanks @Telcontar120, but then I will have just the tag names as column names and some numbers as values. I want the tag names to be the values; the column names can be, for example, tag1.

  • olgakulesza2 Member Posts: 15 Learner III

    Thanks @lionelderkrikor, but in that case I get split-up letters in each column. :(

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    But then you can add a "Loop Attributes" operator and just replace the attribute value with the macro for the attribute name for all your tags, I think :-)
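
The suggestion amounts to looping over the pivoted tag columns and writing the column's own name into every cell where the tag is present. A minimal pandas sketch of that idea follows; the pivoted table is invented, and the loop is only an analogue of what a Loop Attributes operator with a name macro would do.

```python
import pandas as pd

# Hypothetical pivoted table: one column per tag, counts as values.
pivoted = pd.DataFrame(
    {"classic": [1, 0], "fantasy": [1, 1]},
    index=pd.Index([1, 2], name="Id"),
)

# Loop over the tag columns and replace each non-zero value with the
# name of the column itself; absent tags become missing values.
labelled = pd.DataFrame(index=pivoted.index)
for col in pivoted.columns:
    labelled[col] = [col if v > 0 else None for v in pivoted[col]]

print(labelled)
```

The cells now hold tag names instead of numbers, but each tag still sits in its own fixed column, so books with different tags have gaps in different places.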

  • olgakulesza2 Member Posts: 15 Learner III

    @Telcontar120 I think I don't get it :( Could you please explain it in more detail? I'm completely new to RapidMiner and I don't know the things you are talking about :(

  • olgakulesza2 Member Posts: 15 Learner III

    Now it works great, thank you @lionelderkrikor

     
