Logistic Regression - Normalization does not change Attribute Weights

cem_akyuz Member Posts: 5 Contributor I
edited December 2018 in Help

Hello,

I am new here and new to statistics and data mining in general. Apologies if this is a really stupid question.

My question is about logistic regression and normalizing data. I have a data set in which some columns are skewed and the columns are on different scales. So I wanted to apply normalization (including centering, scaling, and a Box-Cox transformation for skewness) prior to logistic regression. But first I wanted to check to what extent normalization changes the results.

I see that normalization prior to logistic regression changes the coefficients; however, the attribute weights are exactly the same with and without normalization. Am I missing something here?

Attached is my design for the analysis (Logistic Regression and Normalization added with default settings).

Best Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
    Solution Accepted

    Try outputting the PRE port on the Normalization operator; that will tell you how it's normalizing the data.

  • earmijo Member Posts: 270 Unicorn
    Solution Accepted

    By default, the Logistic Regression operator normalizes the data (but it uses the word "standardize" instead of "normalize"). Uncheck the option 'standardize'. Whether or not you normalize does make a difference to the coefficients. Check the process below.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Sonar" width="90" x="246" y="187">
    <parameter key="repository_entry" value="//Samples/data/Sonar"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="447" y="187"/>
    <operator activated="true" class="normalize" compatibility="8.0.001" expanded="true" height="103" name="Normalize" width="90" x="648" y="340"/>
    <operator activated="true" class="h2o:logistic_regression" compatibility="7.6.001" expanded="true" height="124" name="Logistic Regression (2)" width="90" x="849" y="340">
    <parameter key="standardize" value="false"/>
    </operator>
    <operator activated="true" class="h2o:logistic_regression" compatibility="7.6.001" expanded="true" height="124" name="Logistic Regression" width="90" x="849" y="187">
    <parameter key="standardize" value="false"/>
    </operator>
    <connect from_op="Retrieve Sonar" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Logistic Regression" to_port="training set"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Normalize" to_port="example set input"/>
    <connect from_op="Normalize" from_port="example set output" to_op="Logistic Regression (2)" to_port="training set"/>
    <connect from_op="Logistic Regression (2)" from_port="model" to_port="result 2"/>
    <connect from_op="Logistic Regression" from_port="model" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
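
    To see why the coefficients depend on the scale of the inputs, here is a minimal Python sketch (not RapidMiner code). It assumes an unpenalized logistic model and z-score normalization with the sample standard deviation; the data, the coefficient values, and the helper names are hypothetical and not fitted to Sonar. It re-expresses a raw-scale model on the standardized scale and checks that the predicted probabilities stay the same even though the coefficients differ.

```python
import math


def standardize(column):
    """Return (z_scores, mean, std) for one numeric column (sample std, n - 1)."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column], mean, std


def to_standardized_scale(intercept, coefs, means, stds):
    """Re-express raw-scale coefficients on the z-score scale.

    Substituting x_j = mean_j + std_j * z_j into b0 + sum(b_j * x_j) gives
    slope b_j * std_j and intercept b0 + sum(b_j * mean_j).
    """
    std_coefs = [b * s for b, s in zip(coefs, stds)]
    std_intercept = intercept + sum(b * m for b, m in zip(coefs, means))
    return std_intercept, std_coefs


def predict_proba(intercept, coefs, row):
    """Logistic model: sigmoid of the linear predictor."""
    z = intercept + sum(b * x for b, x in zip(coefs, row))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical data: two attributes on very different scales.
x1 = [0.2, 0.5, 0.9, 1.4]
x2 = [1200.0, 3400.0, 1800.0, 5100.0]

# Hypothetical raw-scale model (illustrative values, not a fitted result).
raw_intercept, raw_coefs = -1.0, [2.0, 0.0005]

z1, m1, s1 = standardize(x1)
z2, m2, s2 = standardize(x2)
std_intercept, std_coefs = to_standardized_scale(
    raw_intercept, raw_coefs, [m1, m2], [s1, s2]
)

# Same model, two parameterizations: raw inputs vs. standardized inputs.
raw_probs = [predict_proba(raw_intercept, raw_coefs, r) for r in zip(x1, x2)]
std_probs = [predict_proba(std_intercept, std_coefs, r) for r in zip(z1, z2)]

print("raw coefficients:         ", raw_coefs)
print("standardized coefficients:", std_coefs)
print("largest probability gap:  ", max(abs(a - b) for a, b in zip(raw_probs, std_probs)))
```

    The standardized coefficients are the raw ones multiplied by each attribute's standard deviation, so they change with the scaling while the predictions do not. With regularization switched on, this equivalence would no longer hold exactly.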

Answers

  • cem_akyuz Member Posts: 5 Contributor I

    Thanks a lot. After I removed the Normalize operator (which I no longer need, since Logistic Regression has a standardize option built in), I could repeat the process with and without the standardize option. Now I can see that the attribute weights change between runs.

     

    Thanks a lot!

    Cem

  • wassdullull Member Posts: 12 Contributor I

    Hi, I would like an explanation of the logistic regression results from RapidMiner. I want to know whether the p-values can be used to calculate odds ratios, and how they can be interpreted.
