Normalization

DocMusher Member Posts: 333 Unicorn
edited November 2018 in Help
Hi,

I have a dataset of 20 samples measured at 3 time points (54 attributes per time point).
Should normalization be performed on the subset from each time point, or on the entire dataset?
The aim is to analyse how the attributes change across the 3 time points.

Which Python or R plot best illustrates changes for so few samples and so many attributes? Does anyone know a better way to illustrate this than the following?
https://beckmw.wordpress.com/2013/04/01/a-nifty-line-plot-to-visualize-multivariate-time-series/

Cheers
Sven

Answers

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 284 RM Research
    Hi Sven,

    When to normalize really depends on what you want to compare.
    If you compare values within one sample, then normalizing within that sample is fine.
    But if you compare different samples, normalizing each one separately might give false impressions, because each sample is scaled by its own statistics.
    The process below shows the differences.
    I hope this clarifies your question.

    BTW, I like your plots a lot; they are a good example of data visualization and of telling the story of these data. I would only suggest choosing just a few species: with all 25 (as in the first example), it is simply too much information in a single plot.

    Best,
    David
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process >
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="7.0.000-SNAPSHOT" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="true" class="retrieve" compatibility="7.0.000-SNAPSHOT" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="34">
           <parameter key="repository_entry" value="//Samples/data/Sonar"/>
         </operator>
         <operator activated="true" class="subprocess" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Subprocess" width="90" x="179" y="34">
           <process expanded="true">
             <operator activated="true" class="generate_id" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Generate ID" width="90" x="45" y="34"/>
             <operator activated="true" class="select_attributes" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="34">
               <parameter key="attribute_filter_type" value="single"/>
               <parameter key="attribute" value="attribute_1"/>
             </operator>
             <connect from_port="in 1" to_op="Generate ID" to_port="example set input"/>
             <connect from_op="Generate ID" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
             <connect from_op="Select Attributes" from_port="example set output" to_port="out 1"/>
             <portSpacing port="source_in 1" spacing="0"/>
             <portSpacing port="source_in 2" spacing="0"/>
             <portSpacing port="sink_out 1" spacing="0"/>
             <portSpacing port="sink_out 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="multiply" compatibility="7.0.000-SNAPSHOT" expanded="true" height="103" name="Multiply" width="90" x="346" y="34"/>
         <operator activated="true" class="sample" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Sample" width="90" x="514" y="34">
           <parameter key="sample" value="relative"/>
           <list key="sample_size_per_class"/>
           <list key="sample_ratio_per_class"/>
           <list key="sample_probability_per_class"/>
           <parameter key="use_local_random_seed" value="true"/>
           <parameter key="local_random_seed" value="5"/>
         </operator>
         <operator activated="true" class="normalize" compatibility="7.0.000-SNAPSHOT" expanded="true" height="103" name="Normalize" width="90" x="648" y="34"/>
         <operator activated="true" class="rename" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Rename" width="90" x="782" y="34">
           <parameter key="old_name" value="attribute_1"/>
           <parameter key="new_name" value="Normalized after sampling_Z-Transformation"/>
           <list key="rename_additional_attributes"/>
         </operator>
         <operator activated="true" class="materialize_data" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Materialize Data" width="90" x="514" y="238"/>
         <operator activated="true" class="normalize" compatibility="7.0.000-SNAPSHOT" expanded="true" height="103" name="Normalize (2)" width="90" x="648" y="238"/>
         <operator activated="true" class="sample" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Sample (2)" width="90" x="782" y="238">
           <parameter key="sample" value="relative"/>
           <list key="sample_size_per_class"/>
           <list key="sample_ratio_per_class"/>
           <list key="sample_probability_per_class"/>
           <parameter key="use_local_random_seed" value="true"/>
           <parameter key="local_random_seed" value="5"/>
         </operator>
         <operator activated="true" class="rename" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Rename (2)" width="90" x="849" y="136">
           <parameter key="old_name" value="attribute_1"/>
           <parameter key="new_name" value="Normalized before sampling Z-Transformation"/>
           <list key="rename_additional_attributes"/>
         </operator>
         <operator activated="true" class="join" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Join" width="90" x="916" y="34">
           <list key="key_attributes"/>
         </operator>
         <operator activated="true" class="sample" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Sample (3)" width="90" x="514" y="442">
           <parameter key="sample" value="relative"/>
           <list key="sample_size_per_class"/>
           <list key="sample_ratio_per_class"/>
           <list key="sample_probability_per_class"/>
           <parameter key="use_local_random_seed" value="true"/>
           <parameter key="local_random_seed" value="5"/>
         </operator>
         <operator activated="true" class="normalize" compatibility="7.0.000-SNAPSHOT" expanded="true" height="103" name="Normalize (4)" width="90" x="648" y="442">
           <parameter key="method" value="range transformation"/>
         </operator>
         <operator activated="true" class="rename" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Rename (4)" width="90" x="782" y="442">
           <parameter key="old_name" value="attribute_1"/>
           <parameter key="new_name" value="Normalized after sampling 0-1-Transformation"/>
           <list key="rename_additional_attributes"/>
         </operator>
         <operator activated="true" class="retrieve" compatibility="7.0.000-SNAPSHOT" expanded="true" height="68" name="Retrieve Sonar (2)" width="90" x="112" y="646">
           <parameter key="repository_entry" value="//Samples/data/Sonar"/>
         </operator>
         <operator activated="true" class="subprocess" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Subprocess (2)" width="90" x="313" y="646">
           <process expanded="true">
             <operator activated="true" class="generate_id" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Generate ID (2)" width="90" x="45" y="34"/>
             <operator activated="true" class="select_attributes" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Select Attributes (2)" width="90" x="313" y="34">
               <parameter key="attribute_filter_type" value="single"/>
               <parameter key="attribute" value="attribute_1"/>
             </operator>
             <connect from_port="in 1" to_op="Generate ID (2)" to_port="example set input"/>
             <connect from_op="Generate ID (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
             <connect from_op="Select Attributes (2)" from_port="example set output" to_port="out 1"/>
             <portSpacing port="source_in 1" spacing="0"/>
             <portSpacing port="source_in 2" spacing="0"/>
             <portSpacing port="sink_out 1" spacing="0"/>
             <portSpacing port="sink_out 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="materialize_data" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Materialize Data (2)" width="90" x="514" y="646">
           <parameter key="datamanagement" value="float_array"/>
         </operator>
         <operator activated="true" class="normalize" compatibility="7.0.000-SNAPSHOT" expanded="true" height="103" name="Normalize (3)" width="90" x="648" y="646">
           <parameter key="method" value="range transformation"/>
         </operator>
         <operator activated="true" class="sample" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Sample (4)" width="90" x="782" y="646">
           <parameter key="sample" value="relative"/>
           <list key="sample_size_per_class"/>
           <list key="sample_ratio_per_class"/>
           <list key="sample_probability_per_class"/>
           <parameter key="use_local_random_seed" value="true"/>
           <parameter key="local_random_seed" value="5"/>
         </operator>
         <operator activated="true" class="rename" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Rename (3)" width="90" x="916" y="544">
           <parameter key="old_name" value="attribute_1"/>
           <parameter key="new_name" value="Normalized before sampling 0-1 Transformation"/>
           <list key="rename_additional_attributes"/>
         </operator>
         <operator activated="true" class="join" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Join (2)" width="90" x="983" y="442">
           <list key="key_attributes"/>
         </operator>
         <operator activated="true" class="join" compatibility="7.0.000-SNAPSHOT" expanded="true" height="82" name="Join (3)" width="90" x="1001" y="238">
           <list key="key_attributes"/>
         </operator>
         <connect from_op="Retrieve Sonar" from_port="output" to_op="Subprocess" to_port="in 1"/>
         <connect from_op="Subprocess" from_port="out 1" to_op="Multiply" to_port="input"/>
         <connect from_op="Multiply" from_port="output 1" to_op="Sample (3)" to_port="example set input"/>
         <connect from_op="Multiply" from_port="output 2" to_op="Sample" to_port="example set input"/>
         <connect from_op="Sample" from_port="example set output" to_op="Normalize" to_port="example set input"/>
         <connect from_op="Sample" from_port="original" to_op="Materialize Data" to_port="example set input"/>
         <connect from_op="Normalize" from_port="example set output" to_op="Rename" to_port="example set input"/>
         <connect from_op="Rename" from_port="example set output" to_op="Join" to_port="left"/>
         <connect from_op="Materialize Data" from_port="example set output" to_op="Normalize (2)" to_port="example set input"/>
         <connect from_op="Normalize (2)" from_port="example set output" to_op="Sample (2)" to_port="example set input"/>
         <connect from_op="Sample (2)" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
         <connect from_op="Rename (2)" from_port="example set output" to_op="Join" to_port="right"/>
         <connect from_op="Join" from_port="join" to_op="Join (3)" to_port="left"/>
         <connect from_op="Sample (3)" from_port="example set output" to_op="Normalize (4)" to_port="example set input"/>
         <connect from_op="Normalize (4)" from_port="example set output" to_op="Rename (4)" to_port="example set input"/>
         <connect from_op="Rename (4)" from_port="example set output" to_op="Join (2)" to_port="left"/>
         <connect from_op="Retrieve Sonar (2)" from_port="output" to_op="Subprocess (2)" to_port="in 1"/>
         <connect from_op="Subprocess (2)" from_port="out 1" to_op="Materialize Data (2)" to_port="example set input"/>
         <connect from_op="Materialize Data (2)" from_port="example set output" to_op="Normalize (3)" to_port="example set input"/>
         <connect from_op="Normalize (3)" from_port="example set output" to_op="Sample (4)" to_port="example set input"/>
         <connect from_op="Sample (4)" from_port="example set output" to_op="Rename (3)" to_port="example set input"/>
         <connect from_op="Rename (3)" from_port="example set output" to_op="Join (2)" to_port="right"/>
         <connect from_op="Join (2)" from_port="join" to_op="Join (3)" to_port="right"/>
         <connect from_op="Join (3)" from_port="join" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
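
The process above takes attribute_1 of the Sonar sample data and normalizes it four ways (Z-transformation and range transformation, each once after sampling and once before sampling), then joins the results by id. For readers without RapidMiner, the Z-transformation half of that comparison could be sketched in plain Python roughly as follows. This is only an illustration under stated assumptions: the data are random numbers rather than Sonar, the seed and sizes are arbitrary, and the sample standard deviation (n - 1) is used without checking which divisor RapidMiner's Normalize operator applies.

```python
import random
import statistics


def z_transform(values):
    """Z-transformation: subtract the mean, divide by the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values]


rng = random.Random(5)  # arbitrary seed, only to make the sketch reproducible

# Hypothetical stand-in for one numeric attribute (not the Sonar data).
population = [rng.gauss(10.0, 3.0) for _ in range(200)]
sample_ids = sorted(rng.sample(range(len(population)), 20))

# Variant A: sample first, then normalize with the sample's own statistics.
after_sampling = z_transform([population[i] for i in sample_ids])

# Variant B: normalize the full data first, then pick the same rows.
full_normalized = z_transform(population)
before_sampling = [full_normalized[i] for i in sample_ids]

# Same rows, same raw values, but different normalized values.
for row_id, a, b in zip(sample_ids, after_sampling, before_sampling):
    print(f"id={row_id:3d}  after sampling={a:+.3f}  before sampling={b:+.3f}")
```

Both variants keep the ordering of the rows, but only Variant A is centred on zero within the sample; Variant B keeps each row's position relative to the full data.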
  • DocMusher Member Posts: 333 Unicorn
    David,
    thanks!
    Sven
  • DocMusher Member Posts: 333 Unicorn
    At each time point, every sample yields features that have different units, and the different time points are then compared. Should normalization be done per time point or on the complete dataset? Which is the right choice?
    Thanks
    Sven
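
One way to see what is at stake in this follow-up question is a small Python sketch that applies the reasoning from the answer above to time points. The numbers, the single attribute, and the names `measurements`, `per_time_point` and `pooled_scaled` are all hypothetical. The sketch only shows the arithmetic: a Z-transformation done separately per time point forces every time point to mean 0, while a Z-transformation done per attribute over all time points pooled removes the unit but keeps the shift between time points.

```python
import statistics


def z_transform(values):
    """Z-transformation using the values' own mean and standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical data: one attribute, 4 samples, measured at 3 time points.
measurements = {
    "t1": [10.0, 12.0, 11.0, 13.0],
    "t2": [14.0, 16.0, 15.0, 17.0],
    "t3": [20.0, 22.0, 21.0, 23.0],
}

# Option 1: normalize each time point separately.
per_time_point = {t: z_transform(v) for t, v in measurements.items()}

# Option 2: normalize the attribute once, over all time points pooled.
pooled = [v for values in measurements.values() for v in values]
pooled_mean = statistics.fmean(pooled)
pooled_sd = statistics.stdev(pooled)
pooled_scaled = {
    t: [(v - pooled_mean) / pooled_sd for v in values]
    for t, values in measurements.items()
}

# Compare the mean of the attribute at each time point under both options.
for t in measurements:
    m1 = statistics.fmean(per_time_point[t])
    m2 = statistics.fmean(pooled_scaled[t])
    print(f"{t}: per-time-point mean={m1:+.3f}  pooled mean={m2:+.3f}")
```

In this toy case the per-time-point means all come out at zero, so the increase from t1 to t3 is no longer visible, whereas the pooled version still shows it. Whether that matters depends, as noted above, on what is being compared.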