
Generate new attributes dependent on ID

olgakulesza2 Member Posts: 15 Contributor I
edited November 2018 in Help

Hello everyone, 

An example of my data is shown below. It is a set of many books, and the first column is the book ID. Right now, each book has 100 tag names (tag_name). What I would like to obtain is a table like this:

book_id | rating | author | title | userid | tag_name1 | tag_name2 | ... | tag_name100 |

That is, one row per book that contains all of that book's tag names.

Could you please help me?

[Screenshot: tags_books_ratings_user10.PNG]

Best Answer

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Solution Accepted

    Hi again @olgakulesza2,

    1. First, here is a new version of the process that renames your split columns using the generic stem "tag" (tag1, tag2, ...):

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="8.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
        <process expanded="true">
          <!-- Read Id, Author, Title and Tag name (adapt the file path and cell range to your own data) -->
          <operator activated="true" class="read_excel" compatibility="8.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="34">
            <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Tag_Name\Tag_name.xlsx"/>
            <parameter key="imported_cell_range" value="A1:D11"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Id.true.integer.attribute"/>
              <parameter key="1" value="Author.true.polynominal.attribute"/>
              <parameter key="2" value="Title.true.polynominal.attribute"/>
              <parameter key="3" value="Tag name.true.polynominal.attribute"/>
            </list>
          </operator>
          <!-- One row per book: group by Author, Title and Id and concatenate the tag names into concat(Tag name) -->
          <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
            <list key="aggregation_attributes">
              <parameter key="Tag name" value="concatenation"/>
            </list>
            <parameter key="group_by_attributes" value="Author|Title|Id"/>
          </operator>
          <!-- Split concat(Tag name) into one column per tag; the regex [|] matches a literal "|" -->
          <operator activated="true" class="split" compatibility="8.2.000" expanded="true" height="82" name="Split" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="concat(Tag name)"/>
            <parameter key="split_pattern" value="[|]"/>
          </operator>
          <!-- Rename the split columns (names matching concat.*) using the generic stem "tag" -->
          <operator activated="true" class="concurrency:loop" compatibility="8.2.000" expanded="true" height="82" name="Loop" width="90" x="581" y="34">
            <parameter key="number_of_iterations" value="10"/>
            <parameter key="reuse_results" value="true"/>
            <process expanded="true">
              <operator activated="true" class="rename_by_generic_names" compatibility="8.2.000" expanded="true" height="82" name="Rename by Generic Names" width="90" x="313" y="85">
                <parameter key="attribute_filter_type" value="regular_expression"/>
                <parameter key="regular_expression" value="concat.*"/>
                <parameter key="generic_name_stem" value="tag"/>
              </operator>
              <connect from_port="input 1" to_op="Rename by Generic Names" to_port="example set input"/>
              <connect from_op="Rename by Generic Names" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Loop" to_port="input 1"/>
          <connect from_op="Loop" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
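    For readers who want to see what the process does step by step, here is a minimal plain-Python sketch of the same Aggregate, Split and Rename by Generic Names chain. It is not RapidMiner code: the function name `tags_to_columns`, the sample rows, and the assumption that tags keep their input order are all illustrative.

```python
# Plain-Python sketch (not RapidMiner code) of the Aggregate -> Split ->
# Rename by Generic Names chain. Function name and sample rows are hypothetical.
import re


def tags_to_columns(rows, stem="tag"):
    """rows: (book_id, author, title, tag_name) tuples, one row per tag."""
    # Aggregate: group by (Id, Author, Title), joining the tag names with "|".
    # Assumption: tags keep their input order (dicts preserve insertion order).
    grouped = {}
    for book_id, author, title, tag in rows:
        key = (book_id, author, title)
        grouped[key] = tag if key not in grouped else grouped[key] + "|" + tag

    result = []
    for (book_id, author, title), concat in grouped.items():
        record = {"Id": book_id, "Author": author, "Title": title}
        # Split: the regex [|] matches only a literal pipe character.
        # Rename by Generic Names: the pieces become tag1, tag2, ...
        for i, piece in enumerate(re.split(r"[|]", concat), start=1):
            record[f"{stem}{i}"] = piece
        result.append(record)
    return result


sample_rows = [
    (1, "Author A", "Book One", "fantasy"),
    (1, "Author A", "Book One", "magic"),
    (2, "Author B", "Book Two", "history"),
]
for record in tags_to_columns(sample_rows):
    print(record)
```

    Book 1 ends up with tag1 and tag2, book 2 only with tag1; in a RapidMiner example set the shorter rows would presumably show missing values in the extra tag columns instead. Note that a tag name which itself contains "|" would be split in two by this approach.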

    2. Regarding "Thanks @lionelderkrikor, but in that case I get split-up letters in each column":

    I'm surprised, because it works fine on my side:

    [Screenshot: Tag_Name.png]

    Can you post a screenshot of what you get?

    Regards,

    Lionel

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi @olgakulesza2,

    Does this process meet your needs (you will have to adapt it to your own dataset)?

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="8.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
        <process expanded="true">
          <!-- Read Id, Author, Title and Tag name (adapt the file path and cell range to your own data) -->
          <operator activated="true" class="read_excel" compatibility="8.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="34">
            <parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Tag_Name\Tag_name.xlsx"/>
            <parameter key="imported_cell_range" value="A1:D11"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Id.true.integer.attribute"/>
              <parameter key="1" value="Author.true.polynominal.attribute"/>
              <parameter key="2" value="Title.true.polynominal.attribute"/>
              <parameter key="3" value="Tag name.true.polynominal.attribute"/>
            </list>
          </operator>
          <!-- One row per book: group by Author, Title and Id and concatenate the tag names into concat(Tag name) -->
          <operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
            <list key="aggregation_attributes">
              <parameter key="Tag name" value="concatenation"/>
            </list>
            <parameter key="group_by_attributes" value="Author|Title|Id"/>
          </operator>
          <!-- Split concat(Tag name) into one column per tag; the regex [|] matches a literal "|" -->
          <operator activated="true" class="split" compatibility="8.2.000" expanded="true" height="82" name="Split" width="90" x="447" y="34">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="concat(Tag name)"/>
            <parameter key="split_pattern" value="[|]"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Regards,

    Lionel

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    The Pivot operator will do this easily for you: group by book id and index by tag.
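    To make the shape of a "group by book id, index by tag" pivot concrete, here is a minimal plain-Python sketch. It is not RapidMiner's Pivot operator: what actually lands in the cells depends on how Pivot is configured, so the occurrence count used below is an illustrative assumption, and `pivot_by_tag` is a hypothetical name.

```python
# Plain-Python sketch (not RapidMiner's Pivot operator) of grouping by book id
# and indexing by tag name: every distinct tag becomes its own column.
def pivot_by_tag(pairs):
    """pairs: (book_id, tag_name) tuples. Returns (columns, {book_id: row})."""
    columns = sorted({tag for _, tag in pairs})  # one column per distinct tag
    table = {}
    for book_id, tag in pairs:
        row = table.setdefault(book_id, {col: None for col in columns})
        # Stand-in cell value (assumption): how often the tag occurs for this book.
        row[tag] = (row[tag] or 0) + 1
    return columns, table


columns, table = pivot_by_tag([(1, "fantasy"), (1, "magic"), (2, "history")])
print(columns)
print(table)
```

    The tag names become column names and the cells hold numbers (or missing values), which is the result described in the reply below.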

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • olgakulesza2 Member Posts: 15 Contributor I

    Thanks @Telcontar120, but then I will only get the tag names as column names and some numbers as values. I want the tag names to be the values; the column names could be, for example, tag1.

  • olgakulesza2 Member Posts: 15 Contributor I

    Thanks @lionelderkrikor, but in that case I get split-up letters in each column. :(
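    One plausible cause of the split-up letters (an assumption; the thread never confirms it) is a split pattern of a bare "|" instead of the "[|]" used in the process above. The split pattern is a regular expression, and on its own "|" means "empty string OR empty string", so it matches between every character. RapidMiner runs on Java, whose regular expressions treat a bare "|" the same way; the sketch below demonstrates the effect with Python's `re` module.

```python
# Why a split pattern can produce single letters: the pattern is a regular
# expression, and "|" on its own means "empty OR empty".
import re

concat = "fantasy|magic|dragons"

# [|] and \| both denote a literal pipe, so only the separators are split on.
by_pipe = re.split(r"[|]", concat)
by_escaped = re.split(r"\|", concat)
print(by_pipe)  # ['fantasy', 'magic', 'dragons']

# A bare "|" matches the empty string at every position, so (Python 3.7+)
# the value is cut between every character; drop the empty edge pieces.
by_bare = [piece for piece in re.split("|", "magic") if piece]
print(by_bare)  # ['m', 'a', 'g', 'i', 'c']
```

    If the pattern in your Split operator differs from "[|]", comparing it against this behaviour is a quick first check.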

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    But then you can add a "Loop Attributes" operator and, for each of your tag attributes, just replace the attribute value with the macro that holds the attribute name, I think :-)
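    The Loop Attributes idea can be sketched in plain Python as follows, assuming a pivoted table where each tag is a column and missing cells mean "book does not have this tag". This is not RapidMiner code: in RapidMiner the current attribute name would come from the loop's macro, while here it is simply the dictionary key, and `replace_values_with_names` is a hypothetical name.

```python
# Plain-Python sketch of the "Loop Attributes" idea: visit every tag column of
# a pivoted table and overwrite each non-missing cell with the column's name.
def replace_values_with_names(table):
    """table: {book_id: {column_name: value or None}} as produced by a pivot."""
    return {
        book_id: {col: (col if value is not None else None) for col, value in row.items()}
        for book_id, row in table.items()
    }


pivoted = {
    1: {"fantasy": 1, "history": None, "magic": 1},
    2: {"fantasy": None, "history": 1, "magic": None},
}
print(replace_values_with_names(pivoted))
```

    The cells now hold tag names, but the columns are still named after the tags rather than tag1, tag2, and so on.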

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • olgakulesza2 Member Posts: 15 Contributor I

    @Telcontar120 I don't think I get it :( Could you please explain it in more detail? I'm completely new to RapidMiner and I don't know the things you are talking about :(

  • olgakulesza2 Member Posts: 15 Contributor I

    Now it works great, thank you @lionelderkrikor