[SOLVED] Unexpected output for linear regression operator

dysprosium Member Posts: 7 Contributor II
edited August 2019 in Help
I'm quite new to RapidMiner and am trying linear regression for the first time. I'm applying the Linear Regression operator to a training data set and feeding the resulting regression model into the Apply Model operator. Then I will apply the model to an unlabelled data set.
The training data set has one special attribute (the label) and 3 regular attributes. It has only 7 examples at the moment (which makes it easier for me to see what's happening). The attributes are all integers.
In the attribute weights output by the Linear Regression operator, I expect all 3 regular attributes to have a weight greater than zero. However, when I run the process, only one attribute has a non-zero weight (0.268); the other two attributes have a weight of 0. It seems as if the Linear Regression operator is ignoring those two attributes. Why?
The reason I expect all of the weights from the Linear Regression to be non-zero is that when I feed exactly the same training set into the Vector Linear Regression operator, I get a non-zero (positive or negative) weight for all three regular attributes.

<process version="6.0.002">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" breakpoints="after" class="retrieve" compatibility="6.0.002" expanded="true" height="60" name="Retrieve regression data Oct 13 training set 1 special (label) attribute 3 regular attributes" width="90" x="112" y="75">
       <parameter key="repository_entry" value="../data/regression data Oct 13 training set 1 special (label) attribute 3 regular attributes"/>
     </operator>
     <operator activated="true" breakpoints="after" class="linear_regression" compatibility="6.0.002" expanded="true" height="94" name="Linear Regression" width="90" x="313" y="30"/>
     <operator activated="true" breakpoints="after" class="retrieve" compatibility="6.0.002" expanded="true" height="60" name="Retrieve regression data Oct 13 UNLABELLED set 1 special (label) attribute 3 regular attributes" width="90" x="112" y="255">
       <parameter key="repository_entry" value="../data/regression data Oct 13 UNLABELLED set 1 special (label) attribute 3 regular attributes"/>
     </operator>
     <operator activated="true" breakpoints="after" class="apply_model" compatibility="6.0.002" expanded="true" height="76" name="Apply Model" width="90" x="514" y="165">
       <list key="application_parameters"/>
     </operator>
     <connect from_op="Retrieve regression data Oct 13 training set 1 special (label) attribute 3 regular attributes" from_port="output" to_op="Linear Regression" to_port="training set"/>
     <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
     <connect from_op="Linear Regression" from_port="weights" to_port="result 2"/>
     <connect from_op="Retrieve regression data Oct 13 UNLABELLED set 1 special (label) attribute 3 regular attributes" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
     <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi Dy,

    The Linear Regression operator in RapidMiner offers a few built-in features such as feature selection and collinear feature elimination. Please set feature selection to none (the default is M5 prime) and untick the "eliminate colinear features" check box (a short XML sketch of both settings is at the end of this post). Now the algorithm will use all three of your attributes.

    Cheers,
    Helge
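
    P.S. For reference, here is roughly how the operator could look in the process XML once both settings are changed. This is only a sketch; I'm assuming the parameter keys are feature_selection and eliminate_colinear_features, so compare with the XML of your own Linear Regression operator to confirm the exact keys in your version:

      <operator activated="true" class="linear_regression" compatibility="6.0.002" expanded="true" height="94" name="Linear Regression" width="90" x="313" y="30">
        <!-- switch off automatic attribute selection (the default is "M5 prime") -->
        <parameter key="feature_selection" value="none"/>
        <!-- keep all attributes, even if they are (nearly) collinear -->
        <parameter key="eliminate_colinear_features" value="false"/>
      </operator>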
  • dysprosium Member Posts: 7 Contributor II
    Hi Helge,
    I corrected the feature selection settings and am now getting weights for all the attributes.
    Just one more question: the results I get (for this data set) from the Linear Regression operator are exactly the same (apart from the formatting) as the results from Vector Linear Regression. How are the two algorithms different?
    Thanks!
    Dy
  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi Dy,

    The algorithms only differ in those feature selection options you just disabled. The vector version does the same job as Linear Regression but for a vector label (in this case several numerical label attributes). If you input a single label you will receive more or less the same model (see the sketch at the end of this post).

    Cheers,
        Helge
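
    P.S. In case it helps to see why the results coincide, here is a minimal sketch in standard least-squares notation (not RapidMiner-specific). With design matrix X, a single numerical label y is fitted by

        \hat{\beta} = (X^{\top} X)^{-1} X^{\top} y

    and a vector label, collected as the columns of a matrix Y, is fitted by

        \hat{B} = (X^{\top} X)^{-1} X^{\top} Y

    Each column of \hat{B} depends only on the corresponding column of Y, so with a single label the two solutions are identical, which is why you see the same coefficients once feature selection is switched off.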
  • dysprosium Member Posts: 7 Contributor II
    I'm glad to hear that's the case, as I prefer the equation I get from Vector Linear Regression; it's more intuitive than the coefficient table from Linear Regression. I'll get on with applying the model now.

    Cheers,
    Dy




