Pivoting a table using Unique/Distinct Values

Chiko · November 2016

I have got data that looks like this table(I have simplified the table, but the idea is to get the number of unique product ids form unique columns and insert discount id where customer id meets product id, otherwise it will be a missing value(?)):

CustomerID	Product	Discount ID
Customer1	Product1	D1
Customer1	Product2	D3
Customer2	Product1	D2
Customer3	Product3	D3

I would like to turn this into this form:

CustomerID	Product1	Product2	Product3
Customer1	D1	?	?
Customer1	?	D3	?
Customer2	D2	?	?
Customer3	?	?	D3

I have played around with the PIVOT operator but I am struggling to achieve this.

Thanks very much in advance!

Thomas_Ott · November 2016

Something like this?

<?xml version="1.0" encoding="UTF-8"?><process version="7.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.3.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="7.3.000" expanded="true" height="68" name="Retrieve Pivot Example" width="90" x="112" y="34">
        <parameter key="repository_entry" value="//Personal/data/Pivot Example"/>
      </operator>
      <operator activated="true" class="pivot" compatibility="7.3.000" expanded="true" height="82" name="Pivot" width="90" x="246" y="34">
        <parameter key="group_attribute" value="CustomerID "/>
        <parameter key="index_attribute" value="Product  "/>
        <parameter key="consider_weights" value="false"/>
      </operator>
      <connect from_op="Retrieve Pivot Example" from_port="output" to_op="Pivot" to_port="example set input"/>
      <connect from_op="Pivot" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Chiko · November 2016

Thanks T-Bone, not exactly in that format(see 2 rows for Customer1 in original post), thats how far I got using PIVOT operator, however in my case I'd need Customer1 to have 2 rows/examples; one for Product1 with with a value D1 while the rest are missing values and the second example with a value of D3 on Product2 with the rest as missing values. It is this bit of not being able to GROUP BY CustomerID which is making the whole thing complex.

I appreciate your efforts, thanks.

Telcontar120 · November 2016

Well @Chiko, this was an interesting problem. For starters, as already shown, it is quite easy to turn your original dataset into the following table structure, using just the pivot operator and then doing some renaming:

CustomerID Product1 Product2 Product3
Customer1 D1 D3 ?
Customer2 D2 ? ?
Customer3 ? ? D3

This has one row per customer id, and separate attributes for each product discount combination. But to get to the next version, to split it into separate rows for each customer id & product id combination, required several additional steps. The attached process file should do it for you (the initial read csv operator simply starts with your original data provided in the first format), yielding a final table with the structure you want.

However, this was a non-trivial exercise, and if you had a lot of different products, you would need to do more work to build it out. If I were you, I would think about whether the structure you are looking for is really necessary compared to the 2nd version above, which is quite simple to produce.

I hope this helps!

Chiko · November 2016

Thanks very much Telcontar120. Yes this is a hard one. My product run into thousands. I think I will prep the data outside RapidMiner using dynamic SQL.

Thank you very much!

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Pivoting a table using Unique/Distinct Values

Answers