Binominalize multiple polynominal columns together

ybird Member Posts: 4 Contributor I
edited December 2018 in Help

I have many columns, A1-A10 (10 in this example).

Each column holds polynominal values.

I want to binominalize the columns, but not each on its own. Instead, I want one attribute for each value that appears in any of the columns A1-A10.


Example:

 

Input:

A1         A2       A3
"green"    "red"    ?
"red"      ?        "blue"

Output:

A = green    A = red    A = blue
True         True       False
False        True       True

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Use the Nominal to Numerical operator and leave its coding type parameter at the default, dummy coding.

  • ybird Member Posts: 4 Contributor I

    Hi, thanks, but I still get:

    A1 = green, A2 = green, A1 = red, A2 = red

    I want something like:

    A = green, A = red

  • ybird Member Posts: 4 Contributor I

    I found a solution. First, create an id for each example. Then unpivot all attributes A1-A10 into one attribute A. Then apply Nominal to Binominal to A. Finally, pivot with the group attribute set to your id and the index attribute set to the index produced by the unpivot step.
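
The first two steps above (add an id, then unpivot A1-A10 into one attribute A) can be sketched outside RapidMiner in plain Python. This is only an illustration under assumptions: the `rows` data, the function name `unpivot_with_id`, and the choice to drop missing cells are all hypothetical and are not taken from the RapidMiner operators.

```python
# Hypothetical stand-in for the example table; None marks a missing value ("?").
rows = [
    {"A1": "green", "A2": "red", "A3": None},
    {"A1": "red", "A2": None, "A3": "blue"},
]


def unpivot_with_id(rows, columns):
    """Return one record per non-missing cell: row id, source column, value."""
    long_rows = []
    for row_id, row in enumerate(rows, start=1):  # the id is the 1-based row number
        for column in columns:
            value = row.get(column)
            if value is not None:  # assumption: missing cells are dropped
                long_rows.append({"id": row_id, "index": column, "A": value})
    return long_rows


long_rows = unpivot_with_id(rows, ["A1", "A2", "A3"])
for record in long_rows:
    print(record)
```

After this step every value lives in the single attribute A, so binominalizing A no longer depends on which source column a value came from.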

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    I think that the combination of Nominal to Numerical and Generate Aggregation is an elegant solution.

     

    Have a look at this process (an artificial dataset is generated using an R script):

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="112" y="34">
    <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10; cat1 &lt;- c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, NA)&#10;&#9;&#10; df &lt;- data.frame(sample(cat1, 2500, replace = T),&#10;&#9; sample(cat1, 2500, replace = T),&#10;&#9; sample(cat1, 2500, replace = T))&#10;&#9;&#10; colnames(df) &lt;- paste(&quot;Att&quot;, 1:3)&#10;&#10; # connect 2 output ports to see the results&#10; return(df)&#10;}&#10;"/>
    </operator>
    <operator activated="true" class="nominal_to_numerical" compatibility="7.5.003" expanded="true" height="103" name="Nominal to Numerical" width="90" x="380" y="34">
    <list key="comparison_groups"/>
    </operator>
    <operator activated="true" class="generate_aggregation" compatibility="7.5.003" expanded="true" height="82" name="Generate Aggregation" width="90" x="581" y="34">
    <parameter key="attribute_name" value="HasA"/>
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value=".*A"/>
    <parameter key="aggregation_function" value="maximum"/>
    </operator>
    <connect from_op="Execute R" from_port="output 1" to_op="Nominal to Numerical" to_port="example set input"/>
    <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
    <connect from_op="Generate Aggregation" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

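    Here are the parameters of Generate Aggregation for a quick view (screenshot: test.png):

    attribute name: HasA
    attribute filter type: regular_expression
    regular expression: .*A
    aggregation function: maximum
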

    Best,

    Sebastian

     

    Edit: Here is a version that uses Loop Values:

    <?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Execute R" width="90" x="112" y="34">
    <parameter key="script" value="# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;rm_main = function()&#10;{&#10; cat1 &lt;- c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, NA)&#10;&#9;&#10; df &lt;- data.frame(sample(cat1, 2500, replace = T),&#10;&#9; sample(cat1, 2500, replace = T),&#10;&#9; sample(cat1, 2500, replace = T))&#10;&#9;&#10; colnames(df) &lt;- paste(&quot;Att&quot;, 1:3)&#10;&#10; # connect 2 output ports to see the results&#10; return(df)&#10;}&#10;"/>
    </operator>
    <operator activated="true" class="concurrency:loop_values" compatibility="7.5.003" expanded="true" height="82" name="Loop Values" width="90" x="447" y="34">
    <parameter key="attribute" value="Att 1"/>
    <parameter key="reuse_results" value="true"/>
    <process expanded="true">
    <operator activated="true" class="nominal_to_numerical" compatibility="7.5.003" expanded="true" height="103" name="Nominal to Numerical" width="90" x="246" y="34">
    <list key="comparison_groups"/>
    </operator>
    <operator activated="true" class="generate_aggregation" compatibility="7.5.003" expanded="true" height="82" name="Generate Aggregation" width="90" x="514" y="34">
    <parameter key="attribute_name" value="Has%{loop_value}"/>
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value=".*%{loop_value}"/>
    <parameter key="aggregation_function" value="maximum"/>
    </operator>
    <connect from_port="input 1" to_op="Nominal to Numerical" to_port="example set input"/>
    <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Generate Aggregation" to_port="example set input"/>
    <connect from_op="Generate Aggregation" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Execute R" from_port="output 1" to_op="Loop Values" to_port="input 1"/>
    <connect from_op="Loop Values" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
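
For readers without RapidMiner at hand, the idea behind the two processes above (dummy-code every column, then take the maximum over all dummy attributes that share a value) can be sketched in plain Python. This is a rough illustration, not a translation of the operators: the helper names `dummy_code` and `add_has_flags` and the sample rows are hypothetical, and unlike the Loop Values process, which iterates over the values of Att 1, this sketch collects the values from all columns.

```python
# Hypothetical sample in the shape of the generated dataset; None marks a missing value.
rows = [
    {"Att 1": "A", "Att 2": "B", "Att 3": None},
    {"Att 1": "C", "Att 2": None, "Att 3": "C"},
    {"Att 1": None, "Att 2": "A", "Att 3": "B"},
]


def dummy_code(rows, columns):
    """Per-column dummy coding: one 0/1 field named '<column> = <value>'."""
    values = {
        col: sorted({r[col] for r in rows if r[col] is not None})
        for col in columns
    }
    return [
        {f"{col} = {v}": int(r[col] == v) for col in columns for v in values[col]}
        for r in rows
    ]


def add_has_flags(coded_rows):
    """Add 'Has<value>' as the maximum over all dummy fields ending in ' = <value>'."""
    all_values = sorted({name.split(" = ", 1)[1] for name in coded_rows[0]})
    result = []
    for coded in coded_rows:
        out = dict(coded)
        for v in all_values:
            out[f"Has{v}"] = max(
                flag for name, flag in coded.items() if name.endswith(f" = {v}")
            )
        result.append(out)
    return result


result = add_has_flags(dummy_code(rows, ["Att 1", "Att 2", "Att 3"]))
for r in result:
    print(r["HasA"], r["HasB"], r["HasC"])
```

Taking the maximum of 0/1 dummies acts as a logical OR, which is why a row gets the flag as soon as any one column holds the value.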
  • ybird Member Posts: 4 Contributor I

    Edit: instead of the final Pivot step, you have to aggregate by the id attribute, select all "A = ..." attributes with a regular expression, and set the type to only occurring.
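
The aggregate-by-id step can be sketched in plain Python as follows. This is a minimal illustration under assumptions: the long-format `long_rows` input (one record per id and value, as an unpivot would produce) and the function name `binominalize_by_id` are hypothetical, and the "occurs at least once" rule is my reading of the aggregation described above.

```python
# Hypothetical long-format input: one record per (id, value) pair.
long_rows = [
    {"id": 1, "A": "green"},
    {"id": 1, "A": "red"},
    {"id": 2, "A": "red"},
    {"id": 2, "A": "blue"},
]


def binominalize_by_id(long_rows):
    """Return {id: {'A = <value>': bool}}, True if the value occurs for that id."""
    # Collect the distinct values in order of first appearance.
    values = []
    for record in long_rows:
        if record["A"] not in values:
            values.append(record["A"])

    table = {}
    for record in long_rows:
        # Each id starts with every flag set to False.
        flags = table.setdefault(record["id"], {f"A = {v}": False for v in values})
        flags[f"A = {record['A']}"] = True
    return table


table = binominalize_by_id(long_rows)
for row_id, flags in table.items():
    print(row_id, flags)
```

One caveat of this sketch: an id whose cells are all missing has no long-format records and therefore does not appear in the result.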
