Linear regression predictions don't match the model

PlatyQ Member Posts: 4 Learner III
edited December 2018 in Help

Hello,

This time I have a question about the Linear Regression operator.

Here is my process: I want to predict a value (AverageW) from 3 parameters (Layers, WFS, TS) and inspect the model chosen by the operator.

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<parameter key="random_seed" value="-1"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve TS50+80" width="90" x="45" y="34">
<parameter key="repository_entry" value="../data/TS50+80"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.2.000" expanded="true" height="82" name="Select Attributes AW" width="90" x="179" y="34">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Layers|TS|WFS|AverageW"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role AW" width="90" x="313" y="34">
<parameter key="attribute_name" value="AverageW"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="103" name="Split Data" width="90" x="447" y="34">
<enumeration key="partitions">
<parameter key="ratio" value="0.75"/>
<parameter key="ratio" value="0.25"/>
</enumeration>
<parameter key="sampling_type" value="stratified sampling"/>
<parameter key="use_local_random_seed" value="true"/>
<parameter key="local_random_seed" value="1"/>
</operator>
<operator activated="true" class="linear_regression" compatibility="8.2.000" expanded="true" height="103" name="Linear Regression" width="90" x="648" y="34"/>
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="648" y="238">
<list key="application_parameters"/>
</operator>
<connect from_op="Retrieve TS50+80" from_port="output" to_op="Select Attributes AW" to_port="example set input"/>
<connect from_op="Select Attributes AW" from_port="example set output" to_op="Set Role AW" to_port="example set input"/>
<connect from_op="Set Role AW" from_port="example set output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
<connect from_op="Apply Model" from_port="model" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>

The problem is that the prediction I obtain doesn't match the result I compute from the coefficients given by the model.

For example, with these coefficients:

coef_Layer = 0.150;

coef_TS = -0.045;

coef_WFS = 1.150;

intercept = 2.488;

and the example Layer = 2; TS = 50; WFS = 3,
I compute Layer*coef_Layer + TS*coef_TS + WFS*coef_WFS + intercept = 3.988

but the model predicts 3.968 for this example!

It is not a big difference, but I need to understand whether I have forgotten a parameter, such as an "epsilon" term or something else.
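
The hand calculation above can be reproduced in a few lines of Python. This is only a sketch: the coefficients are the rounded values quoted in this post, the reported prediction is the one quoted above, and the variable names are illustrative rather than anything RapidMiner exposes.

```python
# Coefficients as quoted in the post (displayed with three decimals).
coef = {"Layers": 0.150, "TS": -0.045, "WFS": 1.150}
intercept = 2.488

# The example row from the post.
row = {"Layers": 2, "TS": 50, "WFS": 3}

# Linear model: intercept plus the sum of coefficient * attribute value.
manual = intercept + sum(coef[name] * row[name] for name in coef)

# Prediction shown for this row, as quoted in the post.
reported = 3.968

gap = manual - reported
print(f"manual = {manual:.3f}, reported = {reported:.3f}, gap = {gap:.3f}")
# prints: manual = 3.988, reported = 3.968, gap = 0.020
```

The formula itself has no extra term; the sketch only confirms the 0.020 gap between the hand-computed and the reported value.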

I hope somebody can help me, because I can't find an answer in the documentation (and it is not the first time I have had a question about the documentation).

My data are in the table below in case there are problems with the CSV file:

(I removed the unused columns, so Select Attributes is not needed.)

Layers WFS TS AverageW
1 3 50,0 3,0
1 4 50,0 4,0
1 5 50,0 4,1
1 6 50 7,2
2 3 50,0 3,9
2 4 50,0 4,9
2 5 50,0 5,3
2 6 50,0 7,5
3 3 50 4,3
3 4 50,0 5,4
3 5 50,0 5,8
3 6 50,0 7,6
5 3 50,0 4,5
5 4 50 6,3
5 5 50,0 6,9
5 6 50,0 10,8
10 3 50,0 5,0
10 4 50,0 6,7
10 5 50 8,1
20 3 50,0 5,5
20 4 50,0 7,3
20 5 50,0 9,1
1 3 80,0 2,4
1 4 80,0 3,7
1 5 80,0 3,7
1 6 80,0 4,7
2 3 80,0 3,1
2 4 80,0 4,1
2 5 80,0 4,1
2 6 80,0 5,8
3 3 80,0 3,3
3 4 80,0 4,0
3 5 80,0 4,5
3 6 80,0 6,9
5 3 80,0 3,7
5 4 80,0 4,6
5 5 80,0 5,1
5 6 80,0 6,9
10 3 80,0 3,8
10 4 80,0 5,2
10 5 80,0 6,4

Thank you in advance

 

Best Answer

Answers

  • PlatyQ Member Posts: 4 Learner III

    @lionelderkrikor

    Thanks, I never hovered the mouse over the table long enough to see the precise numbers.

    Thank you!
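
To illustrate why the precise numbers matter, here is a rough worst-case estimate in Python. It assumes the coefficients shown in the table are rounded to three decimals (an assumption based on the values quoted in the question), so each displayed value may differ from the stored one by up to 0.0005. The names below are illustrative only.

```python
# Assumption: displayed coefficients are rounded to 3 decimals,
# so each can be off by at most half a unit in the last digit.
half_unit = 0.0005

# The example row from the question.
row = {"Layers": 2, "TS": 50, "WFS": 3}

# Worst case: every coefficient error is scaled by its attribute value;
# the intercept error counts once (hence the + 1).
bound = half_unit * (sum(abs(v) for v in row.values()) + 1)

# Gap between the hand-computed value and the reported prediction.
observed_gap = abs(3.988 - 3.968)

print(f"worst-case rounding error: {bound:.3f}")
print(f"observed gap: {observed_gap:.3f}")
print("explainable by rounding:", observed_gap <= bound)
```

With TS = 50, a tiny rounding error in coef_TS is multiplied fifty times, so a gap of about 0.02 is within what display rounding alone could produce.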
