Newbie - Append Examples to generate a new row

homero_merino Member Posts: 5 Contributor I
edited December 2018 in Help

Hi forum,

 

I'm very new to RapidMiner, but I have quite a bit of experience with Python.

 

The problem that I'm facing is that I have a file with data in this form:

Column                          Value

ContextDataValuesAgeValue       55to64
ContextDataValuesGenderValue    Female
ProductId                       cb4d59cf-c48d-47ef-a943-50b2ae5d01ee
Rating                          5.0
RatingRange                     5.0
SubmissionTime                  2016-09-14T14:39:14.000+00:00
UserLocation                    Southport, United Kingdom
ContextDataValuesAgeValue       45to54
ContextDataValuesGenderValue    Female
ProductId                       cb4d59cf-c48d-47ef-a943-50b2ae5d01ee
Rating                          5.0
RatingRange                     5.0
SubmissionTime                  2017-11-10T09:31:42.000+00:00
UserLocation                    London

 

What I need is a new file in which each group of "columns" becomes a single row, with every column holding its corresponding value. The example above contains 2 such groups, so it should produce 2 new rows.

 

I have tried the PIVOT component, but because the labels (text) of the Columns are the same (repeated in different rows), it throws a "Column name already exists" error. I also tried the Loop component, but I don't know how to tell it "process the first 7 rows, pivot them, generate a new Example (row), and continue grouping the rest of the file". I know this is probably simple, but I really can't find a way to do it.

 

I really appreciate all the help with this.

 

Thanks in advance!

 

Best Answer

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    If you have the same number of attributes every time, you can do this using Pivot, but you need to create an index attribute and a group attribute first. Start by generating a numeric ID, then use Generate Attributes with modulus arithmetic to get each row's position within its group (for your 7 attributes, mod(id,7) gives a number from 0 to 6) and with floor((id-1)/7) to get a group number shared by all rows of one record. You should then be able to use those as the index and group attributes to Pivot your data. Something like the attached process, which uses 3 instead of 7 because its sample data cycles through three attributes.

    <process version="8.0.001">
     <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
    <parameter key="csv_file" value="C:\Users\brian\Downloads\sample.txt"/>
    <parameter key="first_row_as_names" value="false"/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information"/>
    </operator>
    <operator activated="false" breakpoints="after" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="136">
    <parameter key="text" value="Time&#10;Location&#10;Incident&#10;Oct 25th&#10;Tampa&#10;Robbery&#10;Oct 25th&#10;Miami&#10;Theft&#10;Oct 26th&#10;Brandon&#10;Assault"/>
    <description align="center" color="transparent" colored="false" width="126">This contains the type of data which this works on, with each attribute contained in a separate row but cycling through the same attributes in order.</description>
    </operator>
    <operator activated="true" class="generate_id" compatibility="8.0.001" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"/>
    <operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
    <list key="function_descriptions">
    <parameter key="index" value="mod(id,3)"/>
    <parameter key="example" value="floor((id-1)/3)"/>
    </list>
    </operator>
    <operator activated="true" class="pivot" compatibility="8.0.001" expanded="true" height="82" name="Pivot" width="90" x="447" y="34">
    <parameter key="group_attribute" value="example"/>
    <parameter key="index_attribute" value="index"/>
    </operator>
    <operator activated="true" class="rename_by_example_values" compatibility="8.0.001" expanded="true" height="82" name="Rename by Example Values" width="90" x="581" y="34">
    <description align="center" color="transparent" colored="false" width="126">Used if the names of the attributes are in the first set of examples.</description>
    </operator>
    <operator activated="true" class="rename" compatibility="8.0.001" expanded="true" height="82" name="Rename" width="90" x="715" y="34">
    <parameter key="old_name" value="0.0"/>
    <parameter key="new_name" value="Example"/>
    <list key="rename_additional_attributes"/>
    <description align="center" color="transparent" colored="false" width="126">Used to rename attributes manually if needed.</description>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="849" y="34">
    <parameter key="attribute_name" value="Example"/>
    <parameter key="target_role" value="id"/>
    <list key="set_additional_roles"/>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_op="Pivot" to_port="example set input"/>
    <connect from_op="Pivot" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/>
    <connect from_op="Rename by Example Values" from_port="example set output" to_op="Rename" to_port="example set input"/>
    <connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
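
For readers coming from Python, the two expressions in the Generate Attributes step can be sketched roughly as below. This only illustrates the arithmetic and is not RapidMiner code; it assumes Generate ID numbers the rows starting from 1, and the function name `index_and_group` is made up for this example.

```python
def index_and_group(row_id, n_attributes):
    """Mirror mod(id, n) and floor((id - 1) / n) for a 1-based row id."""
    index = row_id % n_attributes          # position inside the record; the last attribute maps to 0
    group = (row_id - 1) // n_attributes   # record number shared by all rows of one record
    return index, group


# Three attributes per record, as in the sample process.
pairs = [index_and_group(row_id, 3) for row_id in range(1, 7)]
print(pairs)
```

Rows 1 to 3 share group 0 and rows 4 to 6 share group 1, which is what lets Pivot place them in the same output row.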
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist

    Hello @homero_merino,

     

    I think you just want to use the Transpose operator, maybe followed by Guess Types?

     

    What function would you use in pandas?

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • homero_merino Member Posts: 5 Contributor I

    Hi Martin,

    Thanks for your reply. The answer is yes and no.

    The problem with the TRANSPOSE function is that it raises a "Duplicate attribute name" error when the same "label" is repeated as stated in the example above.

     

    I want to group the attributes (it's always the same number of attributes: 7) into one single row (TRANSPOSE) in a simple way.

     

    Thanks again, kind regards!

     

  • homero_merino Member Posts: 5 Contributor I

    Thank you for your reply Brian, your solution is correct.

     

    This kind of problem just needs a common "id" for grouping the rows; with the PIVOT component you then just need to select that ID attribute.
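
    For comparison, the same idea can be sketched in plain Python. This is a rough equivalent rather than RapidMiner code: it assumes every record has exactly 7 attributes in a fixed order, and the helper name `pivot_long_to_wide` is hypothetical.

```python
def pivot_long_to_wide(pairs, n_attributes):
    """Turn a flat list of (column, value) pairs into one dict per record."""
    rows = {}
    for row_id, (column, value) in enumerate(pairs, start=1):
        group = (row_id - 1) // n_attributes   # common id shared by one record
        rows.setdefault(group, {})[column] = value
    return [rows[group] for group in sorted(rows)]


# The two groups from the original question.
pairs = [
    ("ContextDataValuesAgeValue", "55to64"),
    ("ContextDataValuesGenderValue", "Female"),
    ("ProductId", "cb4d59cf-c48d-47ef-a943-50b2ae5d01ee"),
    ("Rating", "5.0"),
    ("RatingRange", "5.0"),
    ("SubmissionTime", "2016-09-14T14:39:14.000+00:00"),
    ("UserLocation", "Southport, United Kingdom"),
    ("ContextDataValuesAgeValue", "45to54"),
    ("ContextDataValuesGenderValue", "Female"),
    ("ProductId", "cb4d59cf-c48d-47ef-a943-50b2ae5d01ee"),
    ("Rating", "5.0"),
    ("RatingRange", "5.0"),
    ("SubmissionTime", "2017-11-10T09:31:42.000+00:00"),
    ("UserLocation", "London"),
]

records = pivot_long_to_wide(pairs, 7)
for record in records:
    print(record)
```

    Because the column names themselves act as the index here, only the group id has to be computed.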

     

    Thanks a lot, kind regards!

     
