De-pivot?

earmijoearmijo Member Posts: 270 Unicorn
edited November 2018 in Help

Imagine that I have an example set (after using clustering) that looks like this:

 

id,cluster,product1,product2
1,clust0,10.5,20.3
2,clust1,15.75,30.12
3,clust0,5.25,27.2
4,clust1,12.75,8.25
5,clust0,18.05,22.3
6,clust1,1.25,3.75

And I need it in the following format:

id,cluster,value,type
1,clust0,10.5,product1
2,clust1,15.75,product1
3,clust0,5.25,product1
4,clust1,12.75,product1
5,clust0,18.05,product1
6,clust1,1.25,product1
1,clust0,20.3,product2
2,clust1,30.12,product2
3,clust0,27.2,product2
4,clust1,8.25,product2
5,clust0,22.3,product2
6,clust1,3.75,product2

How could I do it? If I didn't have id and cluster I could easily use de-pivot. But I don't know how to do it in the presence of the other two attributes.  

 

 

 

Best Answer

  • yyhuangyyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    Solution Accepted

    Hi @earmijo,

    You do not have to get rid of id or cluster columns to have your table de-pivoted,

     

    check out this process:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
    <parameter key="text" value="id,cluster,product1,product2&#10;1,clust0,10.5,20.3&#10;2,clust1,15.75,30.12&#10;3,clust0,5.25,27.2&#10;4,clust1,12.75,8.25&#10;5,clust0,18.05,22.3&#10;6,clust1,1.25,3.75"/>
    </operator>
    <operator activated="true" class="text:write_document" compatibility="7.4.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="34"/>
    <operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="380" y="136">
    <parameter key="column_separators" value=","/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information"/>
    </operator>
    <operator activated="true" class="de_pivot" compatibility="7.3.001" expanded="true" height="82" name="De-Pivot" width="90" x="514" y="136">
    <list key="attribute_name">
    <parameter key="value" value="product.*"/>
    </list>
    <parameter key="index_attribute" value="type"/>
    <parameter key="create_nominal_index" value="true"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
    <connect from_op="Write Document" from_port="file" to_op="Read CSV" to_port="file"/>
    <connect from_op="Read CSV" from_port="output" to_op="De-Pivot" to_port="example set input"/>
    <connect from_op="De-Pivot" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Happy RapidMining!

    YY

Answers

  • AndrewAndrew RapidMiner Certified Expert, RapidMiner Certified Master, Member Posts: 47 Guru

    For fun, I made a process that does what you want but doesn't use de-pivot.  It uses the Loop Attributes operator to iterate over each regular attribute to be selected which allows an example set to be created containing the single regular attribute and all other special ones. Some renaming and generation is then done so that the output from the loop operation is a collection that can be appended together to yield the answer.

     

    Andrew

  • earmijoearmijo Member Posts: 270 Unicorn

    Thank you very much Andrew for taking the time to play with my problem. It works perfectly.

  • earmijoearmijo Member Posts: 270 Unicorn

    Thank you very much Andrew. It works perfectly. 

  • earmijoearmijo Member Posts: 270 Unicorn

    Thank you very much yyhuang. It works perfectly in this example. 

    How would I generalize it when the columns instead of being named "product1, product2, product 3..." were named "supermarket,gasoline,hotels, ...."?  In the first case I can use (as you did) "product.*". I tried ".*" in my case and it doesn't work.

     
  • yyhuangyyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist

    Very good question @earmijo ! Actually you still can use de-pivot for the columns that are not easily selecte by regex. You just need extra step for table join.

    check out this:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.4.001" expanded="true" height="68" name="Create Document" width="90" x="112" y="34">
    <parameter key="text" value="customer_id,cluster,gas,grocery&#10;11,clust0,10.5,20.3&#10;22,clust1,15.75,30.12&#10;33,clust0,5.25,27.2&#10;44,clust1,12.75,8.25&#10;55,clust0,18.05,22.3&#10;66,clust1,1.25,3.75"/>
    </operator>
    <operator activated="true" class="text:write_document" compatibility="7.4.001" expanded="true" height="82" name="Write Document" width="90" x="246" y="34"/>
    <operator activated="true" class="read_csv" compatibility="7.3.001" expanded="true" height="68" name="Read CSV" width="90" x="380" y="136">
    <parameter key="column_separators" value=","/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="7.3.001" expanded="true" height="82" name="Generate ID" width="90" x="514" y="136"/>
    <operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="136">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="cluster|customer_id"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="782" y="187">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="cluster|customer_id|id"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="de_pivot" compatibility="7.3.001" expanded="true" height="82" name="De-Pivot" width="90" x="782" y="34">
    <list key="attribute_name">
    <parameter key="value" value="(?!^id$)(^.*$)"/>
    </list>
    <parameter key="index_attribute" value="type"/>
    <parameter key="create_nominal_index" value="true"/>
    </operator>
    <operator activated="true" class="join" compatibility="7.3.001" expanded="true" height="82" name="Join" width="90" x="983" y="136">
    <parameter key="join_type" value="left"/>
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="id" value="id"/>
    </list>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Write Document" to_port="document"/>
    <connect from_op="Write Document" from_port="file" to_op="Read CSV" to_port="file"/>
    <connect from_op="Read CSV" from_port="output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="original" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="De-Pivot" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
Sign In or Register to comment.