Pivoting a table using Unique/Distinct Values

ChikoChiko Member Posts: 26 Maven
edited November 2018 in Help

I have got data that looks like this table(I have simplified the table, but the idea is to get the number of unique product ids form unique columns and insert discount id where customer id meets product id, otherwise it will be a missing value(?)):

 

CustomerID  Product   Discount ID
Customer1  Product1   D1
Customer1  Product2   D3
Customer2  Product1   D2
Customer3  Product3   D3

 

I would like to turn this into this form:

 

CustomerID  Product1  Product2  Product3 
Customer1  D1  
Customer1 D3 ?
Customer2 D2  ? ?
Customer3 ? D3

 

I have played around with the PIVOT operator but I am struggling to achieve this.

 

Thanks very much in advance!

Answers

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Something like this?

     

    2016-11-29_11-41-29.png

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve Pivot Example" width="90" x="112" y="34">
    <parameter key="repository_entry" value="//Personal/data/Pivot Example"/>
    </operator>
    <operator activated="true" class="pivot" compatibility="7.3.000" expanded="true" height="82" name="Pivot" width="90" x="246" y="34">
    <parameter key="group_attribute" value="CustomerID "/>
    <parameter key="index_attribute" value="Product  "/>
    <parameter key="consider_weights" value="false"/>
    </operator>
    <connect from_op="Retrieve Pivot Example" from_port="output" to_op="Pivot" to_port="example set input"/>
    <connect from_op="Pivot" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
  • ChikoChiko Member Posts: 26 Maven

    Thanks T-Bone, not exactly in that format(see 2 rows for Customer1 in original post), thats how far I got using PIVOT operator, however in my case I'd need Customer1 to have 2 rows/examples; one for Product1 with with a value D1 while the rest are missing values and the second example with a value of D3 on Product2 with the rest as missing values. It is this bit of not being able to GROUP BY CustomerID which is making the whole thing complex.

     

    I appreciate your efforts, thanks.

  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Well @Chiko, this was an interesting problem.  For starters, as already shown, it is quite easy to turn your original dataset into the following table structure, using just the pivot operator and then doing some renaming:

    CustomerID Product1 Product2 Product3
    Customer1   D1            D3             ?
    Customer2   D2            ?                ?
    Customer3   ?               ?                D3

     

    This has one row per customer id, and separate attributes for each product discount combination.  But to get to the next version, to split it into separate rows for each customer id & product id combination, required several additional steps.  The attached process file should do it for you (the initial read csv operator simply starts with your original data provided in the first format), yielding a final table with the structure you want.

     

    However, this was a non-trivial exercise, and if you had  a lot of different products, you would need to do more work to build it out.  If I were you, I would think about whether the structure you are looking for is really necessary compared to the 2nd version above, which is quite simple to produce.

     

    I hope this helps!

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • ChikoChiko Member Posts: 26 Maven

    Thanks very much Telcontar120. Yes this is a hard one. My product run into thousands. I think I will prep the data outside RapidMiner using dynamic SQL.

     

    Thank you very much!

Sign In or Register to comment.