Linear regression prediction doesn't match the model
Hello
This time I have a question about the Linear Regression operator.
Here is my process: I want to predict a value (AverageW) from 3 parameters (Layers, WFS, TS) and inspect the model chosen by the operator.
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<parameter key="random_seed" value="-1"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve TS50+80" width="90" x="45" y="34">
<parameter key="repository_entry" value="../data/TS50+80"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes AW" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Layers|TS|WFS|AverageW"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role AW" width="90" x="313" y="34">
<parameter key="attribute_name" value="AverageW"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="103" name="Split Data" width="90" x="447" y="34">
<enumeration key="partitions">
<parameter key="ratio" value="0.75"/>
<parameter key="ratio" value="0.25"/>
</enumeration>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="1"/>
</operator>
<operator activated="true" class="linear_regression" compatibility="8.2.000" expanded="true" height="103" name="Linear Regression" width="90" x="648" y="34"/>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="648" y="238">
<list key="application_parameters"/>
</operator>
<connect from_op="Retrieve TS50+80" from_port="output" to_op="Select Attributes AW" to_port="example set input"/>
<connect from_op="Select Attributes AW" from_port="example set output" to_op="Set Role AW" to_port="example set input"/>
<connect from_op="Set Role AW" from_port="example set output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
<connect from_op="Apply Model" from_port="model" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
The problem is that the prediction I obtain doesn't match the result I compute by hand from the coefficients given by the model.
For example, with these coefficients:
coef_Layer = 0.150;
coef_TS = -0.045;
coef_WFS = 1.150;
intercept = 2.488 ;
and the example Layer = 2, TS = 50, WFS = 3,
I compute coef_Layer*Layer + coef_TS*TS + coef_WFS*WFS + intercept = 3.988
but the model predicts 3.968 for this example!
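For reference, the manual calculation can be reproduced with a short Python sketch. The variable names are illustrative only, and the values are the rounded coefficients as displayed above, not the model's internal ones:

```python
# Rounded coefficients as displayed in the model view (three decimals).
coef = {"Layers": 0.150, "TS": -0.045, "WFS": 1.150}
intercept = 2.488

# Example row from the data: Layers=2, TS=50, WFS=3.
example = {"Layers": 2, "TS": 50, "WFS": 3}

# Linear model: intercept plus the sum of coefficient * attribute value.
manual_prediction = intercept + sum(coef[name] * example[name] for name in coef)

print(round(manual_prediction, 3))  # 3.988
```

This confirms that the arithmetic itself is right; the 0.020 gap to the model's 3.968 has to come from somewhere other than the formula.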
It is not a big difference, but I need to understand whether I am forgetting a parameter such as an "epsilon" or something else.
I hope somebody can help me, because I can't find an answer in the documentation (and it is not the first time I have had a question about the documentation).
My data are in the table below, in case there are problems with the CSV file:
(I removed the unused columns, so Select Attributes is not needed.)
Layers | WFS | TS | AverageW |
1 | 3 | 50,0 | 3,0 |
1 | 4 | 50,0 | 4,0 |
1 | 5 | 50,0 | 4,1 |
1 | 6 | 50 | 7,2 |
2 | 3 | 50,0 | 3,9 |
2 | 4 | 50,0 | 4,9 |
2 | 5 | 50,0 | 5,3 |
2 | 6 | 50,0 | 7,5 |
3 | 3 | 50 | 4,3 |
3 | 4 | 50,0 | 5,4 |
3 | 5 | 50,0 | 5,8 |
3 | 6 | 50,0 | 7,6 |
5 | 3 | 50,0 | 4,5 |
5 | 4 | 50 | 6,3 |
5 | 5 | 50,0 | 6,9 |
5 | 6 | 50,0 | 10,8 |
10 | 3 | 50,0 | 5,0 |
10 | 4 | 50,0 | 6,7 |
10 | 5 | 50 | 8,1 |
20 | 3 | 50,0 | 5,5 |
20 | 4 | 50,0 | 7,3 |
20 | 5 | 50,0 | 9,1 |
1 | 3 | 80,0 | 2,4 |
1 | 4 | 80,0 | 3,7 |
1 | 5 | 80,0 | 3,7 |
1 | 6 | 80,0 | 4,7 |
2 | 3 | 80,0 | 3,1 |
2 | 4 | 80,0 | 4,1 |
2 | 5 | 80,0 | 4,1 |
2 | 6 | 80,0 | 5,8 |
3 | 3 | 80,0 | 3,3 |
3 | 4 | 80,0 | 4,0 |
3 | 5 | 80,0 | 4,5 |
3 | 6 | 80,0 | 6,9 |
5 | 3 | 80,0 | 3,7 |
5 | 4 | 80,0 | 4,6 |
5 | 5 | 80,0 | 5,1 |
5 | 6 | 80,0 | 6,9 |
10 | 3 | 80,0 | 3,8 |
10 | 4 | 80,0 | 5,2 |
10 | 5 | 80,0 | 6,4 |
Thank you in advance
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @PlatyQ,
In fact, the displayed coefficients are rounded.
So if you perform the calculation with the full-precision values, you obtain 3.968.
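As a rough plausibility check, and assuming the displayed values are rounded to three decimals (the full-precision coefficients are not reproduced here), each coefficient and the intercept can be off by up to 0.0005. The sketch below estimates the worst-case gap this can cause for the example row; the names are illustrative:

```python
# Assumption: displayed values are rounded to 3 decimals,
# so each one can differ from the true value by at most 0.0005.
half_step = 0.0005

# Example row from the question: Layers=2, TS=50, WFS=3.
example = {"Layers": 2, "TS": 50, "WFS": 3}

# Each coefficient's rounding error is scaled by its attribute value;
# the intercept contributes its own error once (the "1 +" term).
max_gap = half_step * (1 + sum(abs(v) for v in example.values()))

# Gap between the hand calculation and the model's prediction.
observed_gap = abs(3.988 - 3.968)

print(round(max_gap, 3))        # 0.028
print(observed_gap <= max_gap)  # True
```

The observed 0.020 difference is within that bound. Most of it can come from TS alone, since a rounding error in its coefficient is multiplied by 50.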
Regards,
Lionel
Answers
@lionelderkrikor
Thanks, I had never hovered the mouse over the table long enough to see the precise numbers.
Thank you!