[SOLVED] Normalize rows and columns by the same ratio

DerGaertner Member Posts: 3 Contributor I
edited November 2018 in Help
Hello,

I want to normalize my table (range transformation), but the zero values end up different. Before the transformation the first row contains only zeros; after the transformation it contains values like:

0.030 ; 0.028 ; 0.031

Of course, the source table contains some negative values, so zero is no longer mapped to zero. I want the value transformation to be the same bijective mapping for every row and every column.

I could write a Groovy script to fix this, but I hope there is another way.

Thanks for your help!


edit:

Maybe this example demonstrates my problem.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="generate_data" compatibility="5.3.013" expanded="true" height="60" name="Generate Data" width="90" x="112" y="75"/>
     <operator activated="true" class="execute_script" compatibility="5.3.013" expanded="true" height="76" name="Execute Script" width="90" x="246" y="75">
       <parameter key="script" value="ExampleSet exampleSet = input[0];&#10;Attributes attributes = exampleSet.getAttributes();&#10;&#10;&#10;int count = 0;&#10;&#10;Attribute att1 = attributes.get(&quot;att1&quot;);&#10;Attribute att2 = attributes.get(&quot;att2&quot;);&#10;Attribute att3 = attributes.get(&quot;att3&quot;);&#10;Attribute att4 = attributes.get(&quot;att4&quot;);&#10;Attribute att5 = attributes.get(&quot;att5&quot;);&#10;&#10;exampleSet.getExample(0).setValue(att1, count);&#10;exampleSet.getExample(0).setValue(att2, count);&#10;exampleSet.getExample(0).setValue(att3, count);&#10;exampleSet.getExample(0).setValue(att4, count);&#10;exampleSet.getExample(0).setValue(att5, count);&#10;&#10;&#10;return exampleSet;"/>
     </operator>
     <operator activated="true" breakpoints="before" class="normalize" compatibility="5.3.013" expanded="true" height="94" name="Normalize" width="90" x="380" y="75">
       <parameter key="method" value="range transformation"/>
     </operator>
     <connect from_op="Generate Data" from_port="output" to_op="Execute Script" to_port="input 1"/>
     <connect from_op="Execute Script" from_port="output 1" to_op="Normalize" to_port="example set input"/>
     <connect from_op="Normalize" from_port="example set output" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
   </process>
 </operator>
</process>
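
The effect described in the question can be reproduced outside RapidMiner. The following is a minimal Python sketch, not RapidMiner code. It assumes that range transformation means min-max scaling of each column to [0, 1]; the table values and the function name `range_transform_per_column` are made up for illustration.

```python
def range_transform_per_column(rows):
    """Min-max scale each column to [0, 1] using that column's own min and max."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column has no range; mapping it to 0.0 is an arbitrary choice here.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    # Transpose back so the result has the same row layout as the input.
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical table: the first row is all zeros, as in the question.
table = [
    [0.0, 0.0, 0.0],
    [-2.0, -1.0, -5.0],
    [8.0, 9.0, 5.0],
]

scaled = range_transform_per_column(table)
zeros_after = scaled[0]
print(zeros_after)
```

Because every column has its own minimum and maximum, the three zeros in the first row are mapped to three different numbers.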

Answers

  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi,

    What exactly do you mean by "different zeros"? Applying the range transformation will of course shift your zero to something near 0.5, since all original values are distributed over a range from approx. -10 to 10. For each attribute normalization parameters are individually adapted due to their very own distribution. If you want to scale all of them using the same function, you may have a look at operators like "Generate Attributes", where you can define a custom function and apply it to different attributes.

    Cheers,
    Helge
  • DerGaertner Member Posts: 3 Contributor I
    Hello Helge,

    "For each attribute normalization parameters are individually adapted due to their very own distribution"

    I want all attributes to be normalized as if they shared one distribution. Of course the zero is transformed to something near 0.5, but I want all zeros to be transformed to exactly the same number. Within a single attribute this is no problem with the "Normalize" operator, but zeros in different attributes are not transformed to the same number. I want RapidMiner to extract a global max and min for the transformation!

    This is my Groovy code, which does exactly what I need. Here "max" and "min" are the global maximum and minimum.

    for (Attribute attribute : exampleSet.getAttributes()) {
        String name = attribute.getName();
        for (Example example : exampleSet) {
            // Skip nominal values such as the ID, which come back as String.
            if (!(example[name] instanceof String)) {
                // Formula as posted; a textbook range transformation to [0, 1]
                // would be (x - min) / (max - min).
                example[name] = (example[name] + min) / max;
            }
        }
    }
    "Generate Attributes" could do the same, but I would have to do this for every attribute, and my table has 2500 of them.

    Thanks
  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi,

    You can even use the standard normalization. Creative misuse of the pivoting functionality can do the trick:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.008">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.0.008" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="retrieve" compatibility="6.0.008" expanded="true" height="60" name="Retrieve Sonar" width="90" x="45" y="120">
           <parameter key="repository_entry" value="//Samples/data/Sonar"/>
         </operator>
         <operator activated="true" class="select_attributes" compatibility="6.0.008" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="120">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="class"/>
           <parameter key="invert_selection" value="true"/>
           <parameter key="include_special_attributes" value="true"/>
         </operator>
         <operator activated="true" class="generate_id" compatibility="6.0.008" expanded="true" height="76" name="Generate ID" width="90" x="313" y="120"/>
         <operator activated="true" class="numerical_to_real" compatibility="6.0.008" expanded="true" height="76" name="Numerical to Real" width="90" x="447" y="120">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="id"/>
           <parameter key="include_special_attributes" value="true"/>
         </operator>
         <operator activated="true" class="de_pivot" compatibility="6.0.008" expanded="true" height="76" name="De-Pivot" width="90" x="581" y="120">
           <list key="attribute_name">
             <parameter key="value" value="attribute.*"/>
           </list>
           <parameter key="index_attribute" value="index"/>
           <parameter key="create_nominal_index" value="true"/>
         </operator>
         <operator activated="true" class="normalize" compatibility="6.0.008" expanded="true" height="94" name="Normalize" width="90" x="715" y="120">
           <parameter key="attribute_filter_type" value="single"/>
           <parameter key="attribute" value="value"/>
           <parameter key="method" value="range transformation"/>
         </operator>
         <operator activated="true" class="pivot" compatibility="6.0.008" expanded="true" height="76" name="Pivot" width="90" x="849" y="120">
           <parameter key="group_attribute" value="id"/>
           <parameter key="index_attribute" value="index"/>
           <parameter key="consider_weights" value="false"/>
           <parameter key="skip_constant_attributes" value="false"/>
         </operator>
         <connect from_op="Retrieve Sonar" from_port="output" to_op="Select Attributes" to_port="example set input"/>
         <connect from_op="Select Attributes" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
         <connect from_op="Generate ID" from_port="example set output" to_op="Numerical to Real" to_port="example set input"/>
         <connect from_op="Numerical to Real" from_port="example set output" to_op="De-Pivot" to_port="example set input"/>
         <connect from_op="De-Pivot" from_port="example set output" to_op="Normalize" to_port="example set input"/>
         <connect from_op="Normalize" from_port="example set output" to_op="Pivot" to_port="example set input"/>
         <connect from_op="Pivot" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    Cheers,
    Helge
  • DerGaertner Member Posts: 3 Contributor I
    Helge, thank you! That is exactly what I want!
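
The idea behind the accepted answer (De-Pivot into one long `value` column, Normalize that single column, Pivot back) can be sketched in plain Python. This is only an illustration of the principle, not RapidMiner code: the helper names `de_pivot`, `normalize_values` and `pivot` are hypothetical, the table values are made up, and range transformation is assumed to mean min-max scaling to [0, 1].

```python
def de_pivot(rows):
    """Turn a wide table into long records: (row id, column index, value)."""
    return [(rid, cidx, v) for rid, row in enumerate(rows) for cidx, v in enumerate(row)]


def normalize_values(records):
    """Min-max scale the single value column; its min and max are now global."""
    values = [v for _, _, v in records]
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("all values are identical; the range is undefined")
    return [(rid, cidx, (v - lo) / span) for rid, cidx, v in records]


def pivot(records, n_rows, n_cols):
    """Rebuild the wide table from the long records."""
    table = [[None] * n_cols for _ in range(n_rows)]
    for rid, cidx, v in records:
        table[rid][cidx] = v
    return table


# Hypothetical table: the first row is all zeros, as in the question.
table = [
    [0.0, 0.0, 0.0],
    [-2.0, -1.0, -5.0],
    [8.0, 9.0, 5.0],
]

result = pivot(normalize_values(de_pivot(table)), len(table), len(table[0]))
print(result[0])
```

Since all cells pass through one column, they share one minimum (-5) and one maximum (9), so every zero in this table is mapped to the same value, 5/14.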