Newbie - Append Examples to generate a new row

homero_merino · January 2018

Hi forum,

I'm very new to RapidMiner, but I have quite some experience with Python.

The problem that I'm facing is that I have a file with data in this form:

Column Value

ContextDataValuesAgeValue 55to64
ContextDataValuesGenderValue Female
ProductId cb4d59cf-c48d-47ef-a943-50b2ae5d01ee
Rating 5.0
RatingRange 5.0
SubmissionTime 2016-09-14T14:39:14.000+00:00
UserLocation Southport, United Kingdom
ContextDataValuesAgeValue 45to54
ContextDataValuesGenderValue Female
ProductId cb4d59cf-c48d-47ef-a943-50b2ae5d01ee
Rating 5.0
RatingRange 5.0
SubmissionTime 2017-11-10T09:31:42.000+00:00
UserLocation London

What I need is a new file in which each group of "columns" and their corresponding values becomes a single row. The example above contains 2 such "groups", so it should produce 2 new rows.

I have tried the PIVOT component, but because the labels (text) of the columns are the same (repeated in different rows), it throws a "Column name already exists" error. I also tried the Loop component, but I don't know how to tell it "process the first 7 rows, pivot them, generate a new Example (row) and continue grouping the rest of the file". I know it is pretty simple, but I really can't find the way to do it.

I really appreciate all the help with this.

Thanks in advance!

Telcontar120 · January 2018

If you have the same number of attributes every time, you can do this using Pivot, but you need to create a new index variable first. To do that, generate a numeric ID, then use Generate Attributes with modulus arithmetic to get a position index within each group (for your 7 attributes, mod 7 gives a number from 0 to 6) and integer division to get a group number shared by all rows of the same record. You should then be able to use those as the index and group attributes for Pivot. Something like the attached process, which uses a group size of 3 for its sample data.

<process version="8.0.001">
  <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
        <parameter key="csv_file" value="C:\Users\brian\Downloads\sample.txt"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="false" breakpoints="after" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="136">
        <parameter key="text" value="Time&#10;Location&#10;Incident&#10;Oct 25th&#10;Tampa&#10;Robbery&#10;Oct 25th&#10;Miami&#10;Theft&#10;Oct 26th&#10;Brandon&#10;Assault"/>
        <description align="center" color="transparent" colored="false" width="126">This contains the type of data which this works on, with each attribute contained in a separate row but cycling through the same attributes in order.</description>
      </operator>
      <operator activated="true" class="generate_id" compatibility="8.0.001" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"/>
      <operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="34">
        <list key="function_descriptions">
          <parameter key="index" value="mod(id,3)"/>
          <parameter key="example" value="floor((id-1)/3)"/>
        </list>
      </operator>
      <operator activated="true" class="pivot" compatibility="8.0.001" expanded="true" height="82" name="Pivot" width="90" x="447" y="34">
        <parameter key="group_attribute" value="example"/>
        <parameter key="index_attribute" value="index"/>
      </operator>
      <operator activated="true" class="rename_by_example_values" compatibility="8.0.001" expanded="true" height="82" name="Rename by Example Values" width="90" x="581" y="34">
        <description align="center" color="transparent" colored="false" width="126">Used if the names of the attributes are in the first set of examples.</description>
      </operator>
      <operator activated="true" class="rename" compatibility="8.0.001" expanded="true" height="82" name="Rename" width="90" x="715" y="34">
        <parameter key="old_name" value="0.0"/>
        <parameter key="new_name" value="Example"/>
        <list key="rename_additional_attributes"/>
        <description align="center" color="transparent" colored="false" width="126">Used to rename attributes manually if needed.</description>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="849" y="34">
        <parameter key="attribute_name" value="Example"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Pivot" to_port="example set input"/>
      <connect from_op="Pivot" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/>
      <connect from_op="Rename by Example Values" from_port="example set output" to_op="Rename" to_port="example set input"/>
      <connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
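
For readers coming from Python, the index arithmetic in the Generate Attributes step above can be sketched roughly as follows. This is only an illustration and not part of the attached process: the sample list mirrors the text in the disabled Create Document operator, `pivot_by_position` is a hypothetical helper name, and ids are assumed to start at 1, which is what `floor((id-1)/3)` suggests.

```python
from collections import defaultdict

GROUP_SIZE = 3  # attributes per record; matches mod(id,3) in the process above

# Sample with the same shape as the Create Document text: the first
# GROUP_SIZE lines are attribute names, the remaining lines are values.
lines = [
    "Time", "Location", "Incident",
    "Oct 25th", "Tampa", "Robbery",
    "Oct 25th", "Miami", "Theft",
    "Oct 26th", "Brandon", "Assault",
]


def pivot_by_position(values, group_size):
    """Turn a flat list into rows of `group_size` cells using id arithmetic.

    Assumes len(values) is a multiple of group_size; an incomplete final
    group would raise a KeyError when the rows are assembled.
    """
    rows = defaultdict(dict)
    for row_id, value in enumerate(values, start=1):  # ids assumed to start at 1
        index = row_id % group_size           # mod(id, n): 1, 2, 0, 1, 2, 0, ...
        example = (row_id - 1) // group_size  # floor((id-1)/n): 0, 0, 0, 1, 1, 1, ...
        rows[example][index] = value
    # Restore the original cell order within a row: index 1, 2, ..., n-1, 0.
    order = list(range(1, group_size)) + [0]
    return [[rows[e][i] for i in order] for e in sorted(rows)]


table = pivot_by_position(lines, GROUP_SIZE)
header, records = table[0], table[1:]
```

Here `header` plays the role that Rename by Example Values plays in the process: the first pivoted row carries the attribute names, and the remaining rows are the actual examples.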

MartinLiebig · January 2018

Hello @homero_merino,

I think you just want to use the Transpose operator, maybe followed by Guess Types?

What function would you use in pandas?

Best,

Martin

homero_merino · January 2018

Hi Martin,

Thanks for your reply. The answer is yes and no.

The problem with the TRANSPOSE function is that it raises a "Duplicate attribute name" error when the same "label" is repeated as stated in the example above.

I want to group the attributes (it's always the same number of attributes: 7) into one single row (TRANSPOSE) in a simple way.

Thanks again, kind regards!

homero_merino · January 2018

Thank you for your reply Brian, your solution is correct.

This kind of problem just needs a common "id" for grouping the rows; then, in the PIVOT component, you just select that ID as the group attribute.

Thanks a lot, kind regards!
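
Since pandas came up earlier in the thread, here is a rough plain-Python sketch of the same "common id" idea applied to the sample data from the question. It is not taken from any of the posts above. It assumes each line holds a column name and its value separated by whitespace (the real file's delimiter is not stated in the thread) and that every record has exactly 7 rows; the variable names are made up for illustration.

```python
import csv
import io

ROWS_PER_RECORD = 7  # stated in the thread: always 7 attributes per group

# Sample data copied from the question; whitespace delimiter is an assumption.
raw = """\
ContextDataValuesAgeValue 55to64
ContextDataValuesGenderValue Female
ProductId cb4d59cf-c48d-47ef-a943-50b2ae5d01ee
Rating 5.0
RatingRange 5.0
SubmissionTime 2016-09-14T14:39:14.000+00:00
UserLocation Southport, United Kingdom
ContextDataValuesAgeValue 45to54
ContextDataValuesGenderValue Female
ProductId cb4d59cf-c48d-47ef-a943-50b2ae5d01ee
Rating 5.0
RatingRange 5.0
SubmissionTime 2017-11-10T09:31:42.000+00:00
UserLocation London
"""

# Split each non-empty line into (name, value) at the first run of whitespace,
# so values such as "Southport, United Kingdom" stay intact.
pairs = [line.split(None, 1) for line in raw.splitlines() if line.strip()]

records = []
for row_number, (name, value) in enumerate(pairs):
    group = row_number // ROWS_PER_RECORD  # common id shared by one record's rows
    if group == len(records):
        records.append({})                 # first row of a new group
    records[group][name] = value

# Write one CSV row per group, using the first record's names as the header.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```

The integer division plays the same role as the group attribute passed to Pivot: rows that share a group number end up in the same output row, so the repeated column names no longer collide.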

Altair RapidMiner Community