Linear Regression Coefficients problem

mattmitchell73 Member Posts: 2 Contributor I
edited December 2018 in Help

Hi, I'm relatively new to RapidMiner and have come across something that I do not understand in a linear regression model.


The issue is with the output. The model has 4 predictor variables (Population, Births, Wine Consumption, Liquor Consumption) and the output variable Cirrhosis_DeathRate. Cirrhosis_DeathRate is selected as a label in the Select Attributes operator. However, when I run the process, RapidMiner only produces coefficients for Births, Wine Consumption and Liquor Consumption, but not for Population.


I've run the same analysis with the Data Analysis pack in Excel, and whilst the p-value for Population is not significant, it's no worse than that of Liquor Consumption, which does appear in the RM output. So I'm at a bit of a loss as to why the Population coefficient is not being calculated. In addition, Population ~ Cirrhosis_DeathRate shows a relatively strong correlation (0.7569) in the correlation matrix.


Any suggestions would be gratefully accepted.





<?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
  <operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="8.2.001" expanded="true" height="68" name="Retrieve death by wine1" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Local Repository/processes/death by wine1"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
        <parameter key="attribute_name" value="Cirrhosis_DeathRate"/>
        <list key="set_additional_roles">
          <parameter key="Cirrhosis_DeathRate" value="label"/>
          <parameter key="Obs" value="id"/>
        </list>
      </operator>
      <operator activated="true" class="split_data" compatibility="8.2.001" expanded="true" height="103" name="Split Data" width="90" x="447" y="34">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.6"/>
          <parameter key="ratio" value="0.4"/>
        </enumeration>
      </operator>
      <operator activated="true" class="linear_regression" compatibility="8.2.001" expanded="true" height="103" name="Linear Regression" width="90" x="648" y="34"/>
      <operator activated="true" class="apply_model" compatibility="8.2.001" expanded="true" height="82" name="Apply Model" width="90" x="648" y="238">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance_regression" compatibility="8.2.001" expanded="true" height="82" name="Performance" width="90" x="916" y="34">
        <parameter key="squared_correlation" value="true"/>
      </operator>
      <connect from_op="Retrieve death by wine1" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Linear Regression" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Best Answer

  • earmijo Member Posts: 271 Unicorn
    Solution Accepted

    By default, RapidMiner tries to do some feature selection, so some of the variables may be dropped. That's what is happening to you. In "Feature Selection" choose "None". Then you'll get coefficients for all variables.


    [Screenshot: Screen Shot 2018-06-29 at 5.32.27 PM.png]
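For anyone who prefers to make the change in the process XML rather than in the Parameters panel, here is a rough sketch of how the Linear Regression operator from the question might look with feature selection turned off. The parameter key `feature_selection` and the value `none` are my assumption of how the "Feature Selection" / "None" setting is serialized, so check them against the XML that your own RapidMiner version writes out.

```xml
<operator activated="true" class="linear_regression" compatibility="8.2.001" expanded="true" height="103" name="Linear Regression" width="90" x="648" y="34">
  <!-- Assumed key/value for the "Feature Selection" dropdown set to "None":
       keeps every attribute in the model instead of letting the operator drop some -->
  <parameter key="feature_selection" value="none"/>
</operator>
```

This would replace the self-closing Linear Regression operator element in the posted process. If an attribute is still missing after that, the operator's separate option for eliminating collinear features may also be worth checking, since it can remove attributes too.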


  • mattmitchell73 Member Posts: 2 Contributor I

    Perfect. Many thanks for that. Much appreciated!

