[SOLVED] Convert binominal to numeric?

wesselwessel Member Posts: 537 Maven
edited November 2018 in Help
Dear All,

I have several binominal attributes on which I want to run a linear regression.
To do so, I need to convert these attributes from the values "true" and "false" to real-valued attributes with the values 1 and 0.
How can I do this?

I tried the Generate Attributes operator, but it did not work.
I used the following settings:
attribute name: myNewAtt
functional expression: if(myAtt == true, 1, 0)

Even though this expression looks functionally correct, it always returns 0.

Best regards,

Wessel

Answers

  • wesselwessel Member Posts: 537 Maven
    The following process does work. It uses three operators:
    1. Replace (replace all "true" values with 1)
    2. Replace (replace all "false" values with 0)
    3. Parse Numbers (convert the resulting "1" and "0" strings to numeric values)

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
       <process expanded="true" height="642" width="778">
         <operator activated="true" class="replace" compatibility="5.1.017" expanded="true" height="76" name="Replace" width="90" x="59" y="140">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|cluster_2|cluster_1|cluster_0"/>
           <parameter key="replace_what" value="true"/>
           <parameter key="replace_by" value="1"/>
         </operator>
         <operator activated="true" class="replace" compatibility="5.1.017" expanded="true" height="76" name="Replace (2)" width="90" x="187" y="85">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|cluster_2|cluster_1|cluster_0"/>
           <parameter key="replace_what" value="false"/>
           <parameter key="replace_by" value="0"/>
         </operator>
         <operator activated="true" class="parse_numbers" compatibility="5.1.017" expanded="true" height="76" name="Parse Numbers" width="90" x="315" y="30">
           <parameter key="attribute_filter_type" value="subset"/>
           <parameter key="attributes" value="|cluster_2|cluster_1|cluster_0"/>
         </operator>
         <connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/>
         <connect from_op="Replace (2)" from_port="example set output" to_op="Parse Numbers" to_port="example set input"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
       </process>
     </operator>
    </process>
  • earmijoearmijo Member Posts: 270 Unicorn
    Hi Wessel:

    Two additional solutions:

    1) Use Weka's Linear Regression operator. It codes the binominal attributes for you automatically, which is so convenient.

    2) Use the "Nominal to Numerical" operator and select dummy coding. You then have to define, for each binominal attribute, a "comparison group", which gets coded as 0. Based on your message, the comparison group would be false.

    Regards,

    \E

    Here's an example that uses the Golf dataset:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.017">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.017" expanded="true" name="Process">
        <process expanded="true" height="637" width="950">
          <operator activated="true" class="retrieve" compatibility="5.1.017" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.1.017" expanded="true" height="94" name="Nominal to Numerical" width="90" x="182" y="72">
            <parameter key="coding_type" value="dummy coding"/>
            <parameter key="use_comparison_groups" value="true"/>
            <list key="comparison_groups">
              <parameter key="Wind" value="false"/>
              <parameter key="Outlook" value="sunny"/>
            </list>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
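
The Generate Attributes attempt from the question may also be salvageable. One plausible cause of the constant 0 (an assumption, not confirmed in this thread) is that myAtt stores the nominal strings "true" and "false", so comparing it against the bare keyword true never matches. Comparing against the quoted string might behave as intended. A minimal sketch of the operator, using the placeholder names myAtt and myNewAtt from the question:

```xml
<operator activated="true" class="generate_attributes" compatibility="5.1.017" expanded="true" name="Generate Attributes">
  <list key="function_descriptions">
    <!-- Assumption: myAtt is nominal, so compare against the quoted value "true" -->
    <parameter key="myNewAtt" value="if(myAtt == &quot;true&quot;, 1, 0)"/>
  </list>
</operator>
```

If this works, myNewAtt is created as a new numeric attribute next to myAtt, so the original binominal attribute would still need to be excluded before running the regression.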