RapidMiner

Binominalize multiple polynominal columns together

Learner I ybird

I have several columns, A1 to A10 (10 in this example).

Each column contains polynominal values.

I want to binominalize the columns, but not each one on its own. Instead, I want one attribute for each value that appears in any of the columns A1 to A10.

Example:

Input:

A1       A2     A3
"green"  "red"  ?
"red"    ?      "blue"

Output:

A = green  A = red  A = blue
True       True     False
False      True     True

5 REPLIES
RM Certified Expert

Re: binomalize multiple polynominal columns together

Use the Nominal to Numerical operator and leave the coding type parameter at its default, dummy coding.
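To make the suggestion concrete, here is a minimal Python sketch of what per-column dummy coding produces on the example from the question. It is an illustration outside RapidMiner, not the operator's implementation: the helper name `dummy_code` is hypothetical, `None` stands in for a missing value, and the "attribute = value" column naming is an assumption modeled on the names shown in this thread.

```python
# Rows of the example from the question; None stands for a missing value ("?").
rows = [
    {"A1": "green", "A2": "red", "A3": None},
    {"A1": "red", "A2": None, "A3": "blue"},
]

def dummy_code(rows):
    """Hypothetical helper: one 0/1 column per (attribute, value) pair."""
    names = sorted(
        {f"{col} = {val}" for row in rows for col, val in row.items() if val is not None}
    )
    coded = []
    for row in rows:
        present = {f"{col} = {val}" for col, val in row.items() if val is not None}
        coded.append({name: int(name in present) for name in names})
    return coded

coded = dummy_code(rows)
for row in coded:
    print(row)
```

Each source column gets its own set of dummy columns, so "red" ends up in both `A1 = red` and `A2 = red` rather than in a single shared attribute.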

Learner I ybird

Re: binomalize multiple polynominal columns together

Hi, thanks, but I still get one attribute per column and value:

A1 = green  A2 = green  A1 = red  A2 = red

I want something like this instead:

A = green  A = red

Learner I ybird

Re: binomalize multiple polynominal columns together

I found a solution. First, create an ID for each example. Then unpivot all attributes A1-A10 into a single attribute A. Then apply Nominal to Binominal to A. Finally, pivot back with the group attribute set to your ID and the index attribute set to the index attribute created by the unpivot step.
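The steps above can be sketched in plain Python to show the data flow. This is only an illustration under assumptions, not RapidMiner code: `None` stands in for a missing value, the example's position in the list serves as its ID, and the last step is written as a per-ID aggregation, which is one way to collapse the unpivoted rows back to one row per example.

```python
# Rows of the example from the question; None stands for a missing value ("?").
rows = [
    {"A1": "green", "A2": "red", "A3": None},
    {"A1": "red", "A2": None, "A3": "blue"},
]

# Steps 1 and 2: use the row position as ID and unpivot A1..A3 into
# (id, value) pairs, dropping missing values.
long_form = [
    (i, val) for i, row in enumerate(rows) for val in row.values() if val is not None
]

# Step 3: the distinct values of the single unpivoted column become the
# binominal attributes.
values = sorted({val for _, val in long_form})

# Step 4: collapse back to one row per ID; a value is True if it occurred
# in any of the original columns of that example.
seen = {i: set() for i in range(len(rows))}
for i, val in long_form:
    seen[i].add(val)

result = [{f"A = {v}": v in seen[i] for v in values} for i in range(len(rows))]
for row in result:
    print(row)
```

On the two example rows this gives green and red as True for the first example, and red and blue as True for the second, which matches the output asked for in the question.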

RM Staff

Re: binomalize multiple polynominal columns together

I think that the combination of Nominal to Numerical and Generate Aggregation is an elegant solution.

 

Have a look at this process (an artificial dataset is generated using an R script):

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="112" y="34">
        <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10;    cat1 &lt;- c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, NA)&#10;&#9;&#10;    df &lt;- data.frame(sample(cat1, 2500, replace = T),&#10;&#9;                 sample(cat1, 2500, replace = T),&#10;&#9;                 sample(cat1, 2500, replace = T))&#10;&#9;&#10;    colnames(df) &lt;- paste(&quot;Att&quot;, 1:3)&#10;&#10;    # connect 2 output ports to see the results&#10;    return(df)&#10;}&#10;"/>
      </operator>
      <operator activated="true" class="nominal_to_numerical" compatibility="7.5.003" expanded="true" height="103" name="Nominal to Numerical" width="90" x="380" y="34">
        <list key="comparison_groups"/>
      </operator>
      <operator activated="true" class="generate_aggregation" compatibility="7.5.003" expanded="true" height="82" name="Generate Aggregation" width="90" x="581" y="34">
        <parameter key="attribute_name" value="HasA"/>
        <parameter key="attribute_filter_type" value="regular_expression"/>
        <parameter key="regular_expression" value=".*A"/>
        <parameter key="aggregation_function" value="maximum"/>
      </operator>
      <connect from_op="Execute R" from_port="output 1" to_op="Nominal to Numerical" to_port="example set input"/>
      <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
      <connect from_op="Generate Aggregation" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

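Here are the parameters of the Generate Aggregation operator for a quick view (screenshot: test.png). As set in the process above: attribute name HasA, attribute filter type regular_expression, regular expression .*A, aggregation function maximum.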

Best,

Sebastian

 

Edit: Here is a version that uses Loop Values:

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="112" y="34">
        <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10;    cat1 &lt;- c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, NA)&#10;&#9;&#10;    df &lt;- data.frame(sample(cat1, 2500, replace = T),&#10;&#9;                 sample(cat1, 2500, replace = T),&#10;&#9;                 sample(cat1, 2500, replace = T))&#10;&#9;&#10;    colnames(df) &lt;- paste(&quot;Att&quot;, 1:3)&#10;&#10;    # connect 2 output ports to see the results&#10;    return(df)&#10;}&#10;"/>
      </operator>
      <operator activated="true" class="concurrency:loop_values" compatibility="7.5.003" expanded="true" height="82" name="Loop Values" width="90" x="447" y="34">
        <parameter key="attribute" value="Att 1"/>
        <parameter key="reuse_results" value="true"/>
        <process expanded="true">
          <operator activated="true" class="nominal_to_numerical" compatibility="7.5.003" expanded="true" height="103" name="Nominal to Numerical" width="90" x="246" y="34">
            <list key="comparison_groups"/>
          </operator>
          <operator activated="true" class="generate_aggregation" compatibility="7.5.003" expanded="true" height="82" name="Generate Aggregation" width="90" x="514" y="34">
            <parameter key="attribute_name" value="Has%{loop_value}"/>
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="regular_expression" value=".*%{loop_value}"/>
            <parameter key="aggregation_function" value="maximum"/>
          </operator>
          <connect from_port="input 1" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
          <connect from_op="Generate Aggregation" from_port="example set output" to_port="output 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Execute R" from_port="output 1" to_op="Loop Values" to_port="input 1"/>
      <connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
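
The idea behind the looped version can be sketched in Python: for every value, take the maximum over all dummy-coded columns whose name matches that value. This is an illustration under assumptions, not a translation of the process: the sample rows are made up, the value list is given explicitly, and the pattern is anchored on " = value" instead of ".*value" so that a value such as "BA" would not also be counted for "A". Note that the process above loops over the values of Att 1, so as far as I can tell a value that only occurs in another column would not get its own attribute there.

```python
import re

# Made-up dummy-coded rows in the "attribute = value" naming used above.
coded = [
    {"Att 1 = A": 1, "Att 1 = B": 0, "Att 2 = A": 0, "Att 2 = B": 1},
    {"Att 1 = A": 0, "Att 1 = B": 0, "Att 2 = A": 0, "Att 2 = B": 1},
]

# Values to loop over; listed explicitly here (an assumption of this sketch).
values = ["A", "B"]

for value in values:
    # Match every dummy column that belongs to this value, in any attribute.
    pattern = re.compile(r".* = " + re.escape(value))
    for row in coded:
        matching = [v for name, v in row.items() if pattern.fullmatch(name)]
        # The maximum over 0/1 columns is 1 if the value occurred at least once.
        row[f"Has{value}"] = max(matching, default=0)

for row in coded:
    print(row["HasA"], row["HasB"])
```

Because the new `Has...` columns do not match the " = value" pattern, they are not picked up again in later iterations.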
Learner I ybird

Re: binomalize multiple polynominal columns together

Edit: instead of the final pivot step, you have to do an aggregation grouped by the id attribute. Select all A attributes with a regular expression and set the type to "only occurring".