Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.

Creating dummy variables from text columns

jaakko1jaakko1 Member Posts: 8 Contributor II
edited June 2019 in Help
Hello,

is there a more efficient way to create dummy variables using Rapidminer?

Eg, I have to following data
Col1;Col2
1;dog
2;cat
3;zebra
And I would like it to look like this
Col1;dog;cat;zebra
1;1;0;0
2;0;1;0
3;0;0;1
I know I can do this with Generate Attributes' if function, but as I need to create more than 100 dummies, I was wondering if there is a quicker method.

Thanks a lot!

Answers

  • JEdwardJEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    I think either the Pivot or Transpose operators are what you are looking for. 
    Not sure which one at the moment as I'm a little sleepy.  Give both a try.  ::)
  • jaakko1jaakko1 Member Posts: 8 Contributor II
    Hi,

    thanks for the reply. However somehow I still cant manage to have the data converted into dummy variables using neither Pivot or Transpose.

    Maybe the problem is the text format of the variables?
  • frasfras Member Posts: 93 Contributor II
    For pivoting two attributes need to be nominal, one numeric (the value attribute,
    compare with example process).
    If the value attribute is missing, you have to generate it:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="6.0.003" expanded="true" height="60" name="Read CSV" width="90" x="45" y="210">
            <parameter key="csv_file" value="http://pastebin.com/raw.php?i=ASgMba0T"/>
            <parameter key="column_separators" value="\s+"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1252"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Attribute01.true.polynominal.attribute"/>
              <parameter key="1" value="Attribute02.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" breakpoints="after" class="generate_attributes" compatibility="6.0.008" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="179" y="210">
            <list key="function_descriptions">
              <parameter key="quantity" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="pivot" compatibility="6.0.008" expanded="true" height="76" name="Pivot" width="90" x="313" y="210">
            <parameter key="group_attribute" value="Attribute01"/>
            <parameter key="index_attribute" value="Attribute02"/>
            <parameter key="consider_weights" value="false"/>
            <parameter key="weight_aggregation" value="count"/>
            <parameter key="skip_constant_attributes" value="false"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="6.0.008" expanded="true" height="76" name="Rename by Replacing" width="90" x="447" y="210">
            <parameter key="replace_what" value="quantity_(.*)"/>
            <parameter key="replace_by" value="$1"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="6.0.008" expanded="true" height="94" name="Replace Missing Values" width="90" x="581" y="210">
            <parameter key="default" value="value"/>
            <list key="columns"/>
            <parameter key="replenishment_value" value="0"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

Sign In or Register to comment.