Generate new attributes depending on ID
olgakulesza2
Member Posts: 15 Learner III
Hello everyone,
An example of my data is presented below. It is a set of many books. The first column is the ID of a book, and each book has 100 tag_name values. What I would like to obtain is this table:
book_id | rating | author | title | userid | tag_name1 | tag_name2 | ... | tag_name100 |
So I want one row per book that contains all of the tag_names for that book.
Could you please help me?
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi again @olgakulesza2,
1. First, here is the new version of the process, which renames your "tag" columns:
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="8.2.000" expanded="true" height="68" name="Read Excel" width="90" x="179" y="34">
<parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Tag_Name\Tag_name.xlsx"/>
<parameter key="imported_cell_range" value="A1:D11"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<list key="data_set_meta_data_information">
<parameter key="0" value="Id.true.integer.attribute"/>
<parameter key="1" value="Author.true.polynominal.attribute"/>
<parameter key="2" value="Title.true.polynominal.attribute"/>
<parameter key="3" value="Tag name.true.polynominal.attribute"/>
</list>
</operator>
<operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="313" y="34">
<list key="aggregation_attributes">
<parameter key="Tag name" value="concatenation"/>
</list>
<parameter key="group_by_attributes" value="Author|Title|Id"/>
</operator>
<operator activated="true" class="split" compatibility="8.2.000" expanded="true" height="82" name="Split" width="90" x="447" y="34">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="concat(Tag name)"/>
<parameter key="split_pattern" value="[|]"/>
</operator>
<operator activated="true" class="concurrency:loop" compatibility="8.2.000" expanded="true" height="82" name="Loop" width="90" x="581" y="34">
<parameter key="number_of_iterations" value="10"/>
<parameter key="reuse_results" value="true"/>
<process expanded="true">
<operator activated="true" class="rename_by_generic_names" compatibility="8.2.000" expanded="true" height="82" name="Rename by Generic Names" width="90" x="313" y="85">
<parameter key="attribute_filter_type" value="regular_expression"/>
<parameter key="regular_expression" value="concat.*"/>
<parameter key="generic_name_stem" value="tag"/>
</operator>
<connect from_port="input 1" to_op="Rename by Generic Names" to_port="example set input"/>
<connect from_op="Rename by Generic Names" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Split" to_port="example set input"/>
<connect from_op="Split" from_port="example set output" to_op="Loop" to_port="input 1"/>
<connect from_op="Loop" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
2. "Thanks @lionelderkrikor, but in that case I get split letters in each column"
I am surprised, because I have no problem on my side.
Can you post a screenshot of what you get?
Regards,
Lionel
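For readers who want to see the logic of the process outside RapidMiner, here is a minimal Python sketch of the same three steps: group by Id/Author/Title and concatenate the tags with "|", split on "|", then give the resulting columns generic names. The sample rows, the function names and the tag1, tag2, ... naming are assumptions for illustration only; the exact column names produced by Rename by Generic Names may differ.

```python
# Hypothetical sample rows in long format: (Id, Author, Title, Tag name).
rows = [
    (1, "Author A", "Title A", "fantasy"),
    (1, "Author A", "Title A", "classic"),
    (2, "Author B", "Title B", "sci-fi"),
    (2, "Author B", "Title B", "space"),
    (2, "Author B", "Title B", "favorites"),
]


def aggregate_concat(rows, sep="|"):
    """Group by (Id, Author, Title) and join the tags with a separator."""
    groups = {}
    for book_id, author, title, tag in rows:
        groups.setdefault((book_id, author, title), []).append(tag)
    return {key: sep.join(tags) for key, tags in groups.items()}


def split_and_rename(aggregated, sep="|", stem="tag"):
    """Split the joined string into one column per tag, named stem1, stem2, ..."""
    width = max(len(joined.split(sep)) for joined in aggregated.values())
    header = ["Id", "Author", "Title"] + [f"{stem}{i}" for i in range(1, width + 1)]
    table = []
    for key, joined in aggregated.items():
        tags = joined.split(sep)
        tags += [None] * (width - len(tags))  # pad books that have fewer tags
        table.append(list(key) + tags)
    return header, table


header, table = split_and_rename(aggregate_concat(rows))
for line in [header] + table:
    print(line)
```

Note that the split must use the same separator as the concatenation; in this sketch both default to "|".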
Answers
Hi @olgakulesza2,
Does this process meet your need (you will have to adapt it to your own dataset)?
Regards,
Lionel
The Pivot operator will do this easily for you: group by book id and index by tag.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks @Telcontar120, but then I will just have the tag names as column names and some numbers as values. I want the tag_names to be the values; the column name could be, for example, tag1.
Thanks @lionelderkrikor, but in that case I get split letters in each column.
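To make the difference between the two shapes concrete, here is a small Python sketch. It is only an illustration: the sample rows and function names are made up, and the pivot is assumed to use a count aggregation, which may not match how the Pivot operator is configured.

```python
# Hypothetical long-format rows: (book_id, tag_name).
rows = [(1, "fantasy"), (1, "classic"), (2, "fantasy"), (2, "sci-fi")]


def pivot_counts(rows):
    """Pivot-like shape: one column per distinct tag name, cells hold counts."""
    tags = sorted({tag for _, tag in rows})
    table = {}
    for book_id, tag in rows:
        counts = table.setdefault(book_id, dict.fromkeys(tags, 0))
        counts[tag] += 1
    return tags, table


def tags_as_values(rows):
    """Requested shape: generic columns tag1, tag2, ... holding the tag names."""
    per_book = {}
    for book_id, tag in rows:
        per_book.setdefault(book_id, []).append(tag)
    return {
        book_id: {f"tag{i}": tag for i, tag in enumerate(tags, start=1)}
        for book_id, tags in per_book.items()
    }


pivot_columns, pivoted = pivot_counts(rows)
wide = tags_as_values(rows)
print(pivot_columns, pivoted)
print(wide)
```

In the first result the tag names end up in the header; in the second they stay in the cells, which is what the question asks for.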
But then you can add a "Loop Attributes" operator and, for all your tag attributes, just replace the attribute value with the macro that holds the attribute name, I think :-)
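As a rough illustration of the "Loop Attributes" idea (not an actual RapidMiner process), the Python sketch below walks over every tag column of a pivoted table and replaces each non-zero cell with the name of its column. The table contents and the function name are hypothetical, and the cells are assumed to hold counts.

```python
# Hypothetical pivoted table: one column per tag name, cells hold counts.
pivoted = {
    1: {"classic": 1, "fantasy": 1, "sci-fi": 0},
    2: {"classic": 0, "fantasy": 1, "sci-fi": 1},
}


def values_from_column_names(pivoted):
    """Replace every non-zero cell with the name of its column."""
    result = {}
    for book_id, cells in pivoted.items():
        result[book_id] = {
            # The column name plays the role of the loop macro here.
            column: (column if count else None)
            for column, count in cells.items()
        }
    return result


labelled = values_from_column_names(pivoted)
print(labelled)
```

The columns are still named after the tags and contain missing values for books without that tag, so a further step would be needed to compact them into tag1, tag2, and so on.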
@Telcontar120 I don't think I get it. Could you please explain it in more detail? I'm completely new to RapidMiner and I don't know the things you are talking about.
Now it works great, thank you @lionelderkrikor!