Linear regression predictions don't match the model

PlatyQPlatyQ Member Posts: 4 Contributor I
edited December 2018 in Help

Hello,

This time I have a question about the Linear Regression operator.

Here is my process: I want to predict a value (AverageW) from 3 parameters (Layers, WFS, TS) and inspect the model chosen by the operator.

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<parameter key="random_seed" value="-1"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve TS50+80" width="90" x="45" y="34">
<parameter key="repository_entry" value="../data/TS50+80"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes AW" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Layers|TS|WFS|AverageW"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role AW" width="90" x="313" y="34">
<parameter key="attribute_name" value="AverageW"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="103" name="Split Data" width="90" x="447" y="34">
<enumeration key="partitions">
<parameter key="ratio" value="0.75"/>
<parameter key="ratio" value="0.25"/>
</enumeration>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="1"/>
</operator>
<operator activated="true" class="linear_regression" compatibility="8.2.000" expanded="true" height="103" name="Linear Regression" width="90" x="648" y="34"/>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="648" y="238">
<list key="application_parameters"/>
</operator>
<connect from_op="Retrieve TS50+80" from_port="output" to_op="Select Attributes AW" to_port="example set input"/>
<connect from_op="Select Attributes AW" from_port="example set output" to_op="Set Role AW" to_port="example set input"/>
<connect from_op="Set Role AW" from_port="example set output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
<connect from_op="Apply Model" from_port="model" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>

The problem is that the prediction I get doesn't match the result I compute from the coefficients given by the model.

For example, with these coefficients:

coef_Layer = 0.150;

coef_TS = -0.045;

coef_WFS = 1.150;

intercept = 2.488;

And the example Layer=2; TS=50; WFS=3
I compute coef_Layer*Layer + coef_TS*TS + coef_WFS*WFS + intercept = 3.988

but the model predicts 3.968 for this example!

It is not a big difference, but I need to understand whether I am forgetting a parameter ("epsilon" or something else).
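A quick way to sanity-check a gap like this is to ask whether it could come purely from the coefficients being displayed with only three decimals. The Python sketch below redoes the hand calculation and compares the gap with the worst-case rounding error. It assumes each displayed value may be off by up to 0.0005; the variable names are illustrative only and are not part of RapidMiner.

```python
# Coefficients exactly as displayed in the post (three decimals).
coef = {"Layers": 0.150, "TS": -0.045, "WFS": 1.150}
intercept = 2.488

# The example row from the post and the prediction reported by the model.
example = {"Layers": 2, "TS": 50, "WFS": 3}
reported = 3.968

# Hand calculation: intercept + sum of coefficient * attribute value.
manual = intercept + sum(coef[name] * example[name] for name in coef)
gap = abs(manual - reported)

# Assumption: every displayed number is rounded to three decimals, so each
# may differ from the true value by at most half a unit in the third decimal.
half_unit = 0.0005
max_rounding_error = half_unit * (1 + sum(abs(v) for v in example.values()))

print(f"manual prediction  : {manual:.3f}")
print(f"gap to reported    : {gap:.3f}")
print(f"max rounding error : {max_rounding_error:.3f}")
print("explainable by rounding alone:", gap <= max_rounding_error)
```

Because TS is 50, even a tiny rounding error in coef_TS is multiplied fifty-fold, so a gap of this size is consistent with rounded display values rather than a hidden parameter.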

I hope somebody can help me, because I can't find an answer in the documentation (and it is not the first time I have had a question about the documentation).

My data is in the table below, in case there are problems with the CSV file:

(I removed the unused columns, so Select Attributes is not needed.)

Layers WFS TS AverageW
1 3 50,0 3,0
1 4 50,0 4,0
1 5 50,0 4,1
1 6 50 7,2
2 3 50,0 3,9
2 4 50,0 4,9
2 5 50,0 5,3
2 6 50,0 7,5
3 3 50 4,3
3 4 50,0 5,4
3 5 50,0 5,8
3 6 50,0 7,6
5 3 50,0 4,5
5 4 50 6,3
5 5 50,0 6,9
5 6 50,0 10,8
10 3 50,0 5,0
10 4 50,0 6,7
10 5 50 8,1
20 3 50,0 5,5
20 4 50,0 7,3
20 5 50,0 9,1
1 3 80,0 2,4
1 4 80,0 3,7
1 5 80,0 3,7
1 6 80,0 4,7
2 3 80,0 3,1
2 4 80,0 4,1
2 5 80,0 4,1
2 6 80,0 5,8
3 3 80,0 3,3
3 4 80,0 4,0
3 5 80,0 4,5
3 6 80,0 6,9
5 3 80,0 3,7
5 4 80,0 4,6
5 5 80,0 5,1
5 6 80,0 6,9
10 3 80,0 3,8
10 4 80,0 5,2
10 5 80,0 6,4
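For anyone who wants to experiment with this table outside RapidMiner, here is a minimal Python sketch that parses the decimal-comma values, fits a plain least-squares model, and compares a prediction made with full-precision coefficients against one made with coefficients rounded to three decimals. It is not a reproduction of the process above: it fits all 41 rows instead of the 75% stratified partition, and it does not apply any feature selection or regularisation the Linear Regression operator may use, so the coefficients will not match the ones quoted in the post. The helper names (`parse`, `solve`, `fit_ols`) are hypothetical.

```python
# Table from the post: Layers, WFS, TS, AverageW (decimal commas as posted).
RAW = """
1 3 50,0 3,0
1 4 50,0 4,0
1 5 50,0 4,1
1 6 50 7,2
2 3 50,0 3,9
2 4 50,0 4,9
2 5 50,0 5,3
2 6 50,0 7,5
3 3 50 4,3
3 4 50,0 5,4
3 5 50,0 5,8
3 6 50,0 7,6
5 3 50,0 4,5
5 4 50 6,3
5 5 50,0 6,9
5 6 50,0 10,8
10 3 50,0 5,0
10 4 50,0 6,7
10 5 50 8,1
20 3 50,0 5,5
20 4 50,0 7,3
20 5 50,0 9,1
1 3 80,0 2,4
1 4 80,0 3,7
1 5 80,0 3,7
1 6 80,0 4,7
2 3 80,0 3,1
2 4 80,0 4,1
2 5 80,0 4,1
2 6 80,0 5,8
3 3 80,0 3,3
3 4 80,0 4,0
3 5 80,0 4,5
3 6 80,0 6,9
5 3 80,0 3,7
5 4 80,0 4,6
5 5 80,0 5,1
5 6 80,0 6,9
10 3 80,0 3,8
10 4 80,0 5,2
10 5 80,0 6,4
"""

def parse(raw):
    """Return a list of ([1, Layers, WFS, TS], AverageW) pairs."""
    rows = []
    for line in raw.strip().splitlines():
        layers, wfs, ts, avg_w = (float(t.replace(",", ".")) for t in line.split())
        rows.append(([1.0, layers, wfs, ts], avg_w))
    return rows

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0][0])
    xtx = [[sum(x[i] * x[j] for x, _ in rows) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in rows) for i in range(k)]
    return solve(xtx, xty)

rows = parse(RAW)
beta = fit_ols(rows)                    # [intercept, Layers, WFS, TS]
rounded = [round(b, 3) for b in beta]   # what a three-decimal display would show
x = [1.0, 2.0, 3.0, 50.0]               # Layers=2, WFS=3, TS=50
full_pred = sum(b * v for b, v in zip(beta, x))
rounded_pred = sum(b * v for b, v in zip(rounded, x))
print("full precision:", beta, full_pred)
print("rounded       :", rounded, rounded_pred)
```

Comparing the two printed predictions shows how much of a discrepancy can be attributed to display rounding alone for this particular fit.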

Thank you in advance

 

Answers

  • PlatyQPlatyQ Member Posts: 4 Contributor I

    @lionelderkrikor

    Thanks, I never hovered the mouse over the table long enough to see the precise numbers.

    Thank you!
