[SOLVED] Normalise the training set; also normalise test set?

ben_h Member Posts: 17 Contributor II
edited November 2018 in Help
I normalise all of the attributes in a training set, including the label attribute. (Is this recommended or not?)
I train a linear regression model using the training set.
I then apply the model to a second data set (the test set). Does this data set need to have the same normalisation applied as the training set?
If the label attribute has been normalised in the test set, how do I revert it to de-normalised data?
The documentation of the "De-Normalise" operator is empty, and I don't understand it.

Cheers,
Ben

Answers

  • venkatesh Member Posts: 15 Contributor II
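    Yes, you need to apply the same preprocessing model to the test set that you built on the training set. Here is a sample process showing how to do it: the preprocessing model from Normalize is applied to the test data with a first Apply Model, and the regression model is then applied to that normalised data with a second Apply Model.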

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Root">
        <description>This learner creates a linear regression model allowing numerical predictions for the loaded data set.</description>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="../../data/Polynomial"/>
          </operator>
          <operator activated="true" class="normalize" compatibility="5.3.007" expanded="true" height="94" name="Normalize" width="90" x="179" y="165"/>
          <operator activated="true" class="linear_regression" compatibility="5.3.007" expanded="true" height="94" name="LinearRegression" width="90" x="380" y="120"/>
          <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve (2)" width="90" x="179" y="300">
            <parameter key="repository_entry" value="../../data/Polynomial"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.007" expanded="true" height="76" name="Apply Model" width="90" x="380" y="300">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.3.007" expanded="true" height="76" name="Apply Model (2)" width="90" x="581" y="165">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Normalize" to_port="example set input"/>
          <connect from_op="Normalize" from_port="example set output" to_op="LinearRegression" to_port="training set"/>
          <connect from_op="Normalize" from_port="preprocessing model" to_op="Apply Model" to_port="model"/>
          <connect from_op="LinearRegression" from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • ben_h Member Posts: 17 Contributor II
    Thanks Venki, I had no idea.

    Ben.
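
The idea in the answer above (learn the normalisation from the training set only, then reuse exactly the same parameters on the test set) can also be sketched outside RapidMiner. The following is a minimal, illustrative Python sketch and not the RapidMiner implementation: it assumes z-score normalisation, a single input attribute, and made-up data, and all function names (`fit_zscore`, `normalise`, `denormalise`, `fit_line`) are hypothetical helpers introduced for this example. It also shows one way to map predictions back to the original scale when the label was normalised too.

```python
from statistics import mean, pstdev


def fit_zscore(values):
    """Learn mean and standard deviation from the given (training) values."""
    mu = mean(values)
    sigma = pstdev(values)
    # Fall back to 1.0 for a constant column to avoid division by zero.
    return mu, (sigma if sigma > 0 else 1.0)


def normalise(values, params):
    """Apply previously learned z-score parameters to any data set."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


def denormalise(values, params):
    """Invert normalise() using the same parameters."""
    mu, sigma = params
    return [v * sigma + mu for v in values]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one attribute."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


# Made-up example data: the label follows y = 2x + 10.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [12.0, 14.0, 16.0, 18.0, 20.0]
test_x = [6.0, 7.0]

# Normalisation parameters come from the training set only.
x_params = fit_zscore(train_x)
y_params = fit_zscore(train_y)

# Train on normalised attribute and normalised label.
slope, intercept = fit_line(
    normalise(train_x, x_params), normalise(train_y, y_params)
)

# Reuse the training parameters on the test set, never refit them there.
test_x_norm = normalise(test_x, x_params)
pred_norm = [slope * x + intercept for x in test_x_norm]

# Predictions are on the normalised label scale; map them back.
predictions = denormalise(pred_norm, y_params)
print([round(p, 6) for p in predictions])
```

Refitting the normalisation on the test set would give different mean and standard deviation values, so the model would see inputs on a different scale from the one it was trained on. Keeping the label's parameters around is what makes the final de-normalisation step possible.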