Linear regression model: bug in squared error value?
lionelderkrikor
RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi,
There seems to be a bug in the squared error value: with a simple validation (no X-Validation), the squared error is reported as xx +/- yy.
The dataset is in the attached file.
Here is the process:
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
<parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Predictive_Analytics_and_Data_Mining\Dec 15 2014\05_Regression_5.1_bos_housing.csv"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.0.001" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
<parameter key="attribute_name" value="MEDV"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="380" y="187"/>
<operator activated="true" class="linear_regression" compatibility="8.0.001" expanded="true" height="103" name="Linear Regression" width="90" x="447" y="34">
<parameter key="feature_selection" value="none"/>
<parameter key="eliminate_colinear_features" value="false"/>
</operator>
<operator activated="true" class="apply_model" compatibility="8.0.001" expanded="true" height="82" name="Apply Model" width="90" x="581" y="238">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="8.0.001" expanded="true" height="82" name="Performance" width="90" x="715" y="289">
<parameter key="root_relative_squared_error" value="true"/>
<parameter key="squared_error" value="true"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
</operator>
<operator activated="true" class="write_excel" compatibility="8.0.001" expanded="true" height="82" name="Write Excel" width="90" x="916" y="340">
<parameter key="excel_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Linear_regression_MSE.xlsx"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_port="result 1"/>
<connect from_op="Performance" from_port="performance" to_port="result 3"/>
<connect from_op="Performance" from_port="example set" to_op="Write Excel" to_port="input"/>
<connect from_op="Write Excel" from_port="through" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
Thank you,
Regards,
Lionel
Comments
Good catch! There should be no standard deviation for the performance metrics if there is no X-validation.
We are fixing it now. Thanks for the feedback!
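To illustrate the point outside RapidMiner, here is a minimal Python sketch. It assumes the "+/-" part is meant to be the spread of the metric across validation folds; the function names and the numbers are hypothetical, and the population standard deviation is an arbitrary choice for the illustration, not a statement about RapidMiner's internals.

```python
from statistics import mean, pstdev


def squared_error(y_true, y_pred):
    """Mean squared error over one evaluation set (a single number)."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def summarize(fold_scores):
    """Return (mean, spread) of per-fold scores.

    With a single score (no cross-validation) there is no fold-to-fold
    spread to report, so the spread is None.
    """
    if len(fold_scores) < 2:
        return mean(fold_scores), None
    return mean(fold_scores), pstdev(fold_scores)


# Hypothetical labels and predictions, not taken from the attached dataset.
y_true = [24.0, 22.0, 35.0, 33.0]
y_pred = [25.0, 23.0, 34.0, 31.0]

# Simple validation: one evaluation set, one score, no "+/-" part.
print(summarize([squared_error(y_true, y_pred)]))

# Cross-validation: several per-fold scores, so a spread is defined.
print(summarize([1.0, 3.0]))
```

With one evaluation set the summary has only a value; the "+/-" only makes sense once there are several fold scores to compare.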
Reported to the dev team by @yyhuang.
SG