Merging double attributes and its examples together

eldenoso Member Posts: 65 Contributor I
edited December 2019 in Help

Hello,

I am pretty new to RapidMiner and can't find a solution to the problem I have.
My example set contains customer IDs, booking years and the booked hotel. Because some customers go on holiday twice or more within a year, duplicate IDs occur. Is it possible with RapidMiner to merge the booked hotels belonging to the same ID into one example? To make it clearer, here is an example of what I mean:

Raw Data Year 20XX:
ID     BookedHotel
12     Laplaza
13     Greengarden
12     Ocean
15     Laplaza

The customer with ID 12 goes on holiday twice this year: once to the Laplaza hotel and once to the Ocean hotel. What I want to achieve should look like this:

ID     BookedHotel
12     Laplaza; Ocean
13     Greengarden
15     Laplaza

So if a customer books twice or more in a year, the hotels are separated by a semicolon in the same "cell". I already tried to achieve this with Pivot and Generate Concatenation, but without success.

Thank you for your help and sorry for any mistakes (I'm German).


Best Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    Hi eldenoso,

     

    The Aggregate operator does the job: use concat(hotel) as the aggregation and group by id. The default delimiter is |, but you can of course change it afterwards with a Replace operator.
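
To make the effect of the aggregation concrete, here is a minimal sketch in plain Python rather than RapidMiner. It groups the rows by ID and joins the hotel names with a delimiter, which is roughly what concat(hotel) grouped by id produces, and then swaps the pipe for a semicolon. The helper name `concat_by_group` and the sample rows are assumptions made for illustration only; they are not part of RapidMiner.

```python
from collections import defaultdict


def concat_by_group(rows, delimiter="|"):
    """Group (id, hotel) pairs by id and join the hotels with `delimiter`.

    Hypothetical helper that imitates Aggregate with concat(hotel) and
    group_by id. It is not RapidMiner code.
    """
    grouped = defaultdict(list)
    for customer_id, hotel in rows:
        grouped[customer_id].append(hotel)
    # Sort by ID so the output order is deterministic.
    return {cid: delimiter.join(hotels) for cid, hotels in sorted(grouped.items())}


# Sample rows taken from the question.
rows = [(12, "Laplaza"), (13, "Greengarden"), (12, "Ocean"), (15, "Laplaza")]

merged = concat_by_group(rows)  # uses the "|" delimiter
# Equivalent of the follow-up Replace step: turn "|" into "; ".
with_semicolon = {cid: hotels.replace("|", "; ") for cid, hotels in merged.items()}
print(with_semicolon)
```

With the sample rows, ID 12 ends up with both hotels in one value while IDs 13 and 15 keep their single hotel.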

     

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist
    Solution Accepted

    Hi Ingo,

     

    You can also use this concat for some fancy tricks. Since | means "or" in regex, you can extract concat(att) into a macro and use it in Select Attributes to select these attributes, or in Filter Examples with a matches expression.
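
The idea behind the trick can be sketched in plain Python: a list of attribute names joined with | is already a regular expression that matches any one of those names. The function `select_attributes` and the attribute list below are hypothetical and only imitate the selection step; they are not RapidMiner's API.

```python
import re


def select_attributes(attribute_names, keep_pattern):
    """Return the names that match the pipe-separated pattern in full."""
    pattern = re.compile(keep_pattern)
    return [name for name in attribute_names if pattern.fullmatch(name)]


# Hypothetical macro value after the names were concatenated with "|".
keep_macro = "|".join(["attribute_1", "attribute_15"])

# Hypothetical attribute names of the data set to filter.
all_attributes = ["attribute_1", "attribute_2", "attribute_15", "attribute_150", "class"]

selected = select_attributes(all_attributes, keep_macro)
print(selected)
```

Note that this only behaves as expected when the names contain no regex metacharacters; names with characters such as ( or . would need escaping first.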

     

    Kudos to @hhomburg for this trick.

     

    ~Martin

     

    Example Process:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="subprocess" compatibility="7.3.001" expanded="true" height="82" name="Subprocess" width="90" x="45" y="238">
    <process expanded="true">
    <operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="45" y="34">
    <list key="attribute_values">
    <parameter key="Keep" value="&quot;attribute_1&quot;"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification (2)" width="90" x="45" y="136">
    <list key="attribute_values">
    <parameter key="Keep" value="&quot;attribute_15&quot;"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="append" compatibility="7.3.001" expanded="true" height="103" name="Append" width="90" x="179" y="34"/>
    <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Append" from_port="merged set" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Create a table of attributes to keep, can be stored in repo or taken from a DB</description>
    </operator>
    <operator activated="true" class="aggregate" compatibility="7.3.001" expanded="true" height="82" name="Aggregate" width="90" x="179" y="238">
    <list key="aggregation_attributes">
    <parameter key="Keep" value="concatenation"/>
    </list>
    </operator>
    <operator activated="true" class="extract_macro" compatibility="7.3.001" expanded="true" height="68" name="Extract Macro" width="90" x="313" y="238">
    <parameter key="macro" value="keep"/>
    <parameter key="macro_type" value="data_value"/>
    <parameter key="attribute_name" value="concat(Keep)"/>
    <parameter key="example_index" value="1"/>
    <list key="additional_macros"/>
    </operator>
    <operator activated="true" class="retrieve" compatibility="7.3.001" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="85">
    <parameter key="repository_entry" value="//Samples/data/Sonar"/>
    </operator>
    <operator activated="true" class="delay" compatibility="7.3.001" expanded="true" height="103" name="Delay" width="90" x="447" y="85">
    <parameter key="delay" value="none"/>
    <description align="center" color="transparent" colored="false" width="126">Ensure Execution Order</description>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="85">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="%{keep}"/>
    <description align="center" color="transparent" colored="false" width="126">The magic happens here!</description>
    </operator>
    <connect from_op="Subprocess" from_port="out 1" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Extract Macro" to_port="example set"/>
    <connect from_op="Extract Macro" from_port="example set" to_op="Delay" to_port="through 2"/>
    <connect from_op="Retrieve Sonar" from_port="output" to_op="Delay" to_port="through 1"/>
    <connect from_op="Delay" from_port="through 1" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    I must admit that I am very impressed.  I did not know about the concat function in the "Aggregate" operator and actually started to build a workflow of at least 10 operators to solve this when I saw your post.  This works like a charm and is so much more elegant!

     

    Here is a small example process showing how this works.

     

    Cheers,

    Ingo

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="112" y="34">
    <list key="attribute_values">
    <parameter key="ID" value="12"/>
    <parameter key="Hotel" value="&quot;Laplace&quot;"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification (2)" width="90" x="112" y="136">
    <list key="attribute_values">
    <parameter key="ID" value="13"/>
    <parameter key="Hotel" value="&quot;Greengarden&quot;"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification (3)" width="90" x="112" y="238">
    <list key="attribute_values">
    <parameter key="ID" value="12"/>
    <parameter key="Hotel" value="&quot;Ocean&quot;"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="generate_data_user_specification" compatibility="7.3.001" expanded="true" height="68" name="Generate Data by User Specification (4)" width="90" x="112" y="340">
    <list key="attribute_values">
    <parameter key="ID" value="15"/>
    <parameter key="Hotel" value="&quot;Laplace&quot;"/>
    </list>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="append" compatibility="7.3.001" expanded="true" height="145" name="Append" width="90" x="246" y="34"/>
    <operator activated="true" class="aggregate" compatibility="7.3.001" expanded="true" height="82" name="Aggregate" width="90" x="380" y="34">
    <list key="aggregation_attributes">
    <parameter key="Hotel" value="concatenation"/>
    </list>
    <parameter key="group_by_attributes" value="ID"/>
    </operator>
    <operator activated="true" class="replace" compatibility="7.3.001" expanded="true" height="82" name="Replace" width="90" x="514" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="concat(Hotel)"/>
    <parameter key="replace_what" value="\|"/>
    <parameter key="replace_by" value=";"/>
    </operator>
    <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 2"/>
    <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
    <connect from_op="Generate Data by User Specification (4)" from_port="output" to_op="Append" to_port="example set 4"/>
    <connect from_op="Append" from_port="merged set" to_op="Aggregate" to_port="example set input"/>
    <connect from_op="Aggregate" from_port="example set output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
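
The Replace operator in the process above uses `\|` rather than a bare `|` for "replace what", because that parameter is treated as a regular expression. The following plain Python sketch shows the same distinction with the standard `re` module; the sample value is an assumption in the style of the concat(Hotel) output, and Python's regex engine is used here only as a stand-in for the one in RapidMiner.

```python
import re

# Sample value in the style of the concat(Hotel) output for ID 12.
aggregated = "Laplace|Ocean"

# Escaped pipe: matches the literal "|" character only.
fixed = re.sub(r"\|", ";", aggregated)
print(fixed)

# Unescaped pipe: an alternation of two empty patterns, so it can only
# match the empty string and never the literal "|" character itself.
literal_hit = re.fullmatch("|", "|")
print(literal_hit)
```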
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Nice one indeed!

  • eldenoso Member Posts: 65 Contributor I

    Thank you all for your help. The aggregation solution actually worked pretty well for my case.
