Linear Regression Coefficients problem

mattmitchell73 Member Posts: 2 Learner III
edited December 2018 in Help

Hi, I'm relatively new to RapidMiner and have come across something that I do not understand in a linear regression model.

 

The issue is with the output. The model has four predictor variables (Population, Births, Wine Consumption, Liquor Consumption) and the output variable Cirrhosis_DeathRate, which is set as the label in the Set Role operator. However, when I run the process, RapidMiner only produces coefficients for Births, Wine Consumption and Liquor Consumption, but not for Population.

 

I've run the same analysis with the data analysis pack in Excel, and whilst the p-value for Population is not significant, it's no worse than that of Liquor Consumption, which does appear in the RM output. Consequently, I'm at a bit of a loss as to why the Population coefficient is not being calculated. In addition, Population ~ Cirrhosis_DeathRate shows a relatively strong correlation (0.7569) in the correlation matrix.

 

Any suggestions would be gratefully accepted.

 

Thanks


Matt

 

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.2.001" expanded="true" height="68" name="Retrieve death by wine1" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Local Repository/processes/death by wine1"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
<parameter key="attribute_name" value="Cirrhosis_DeathRate"/>
<list key="set_additional_roles">
<parameter key="Cirrhosis_DeathRate" value="label"/>
<parameter key="Obs" value="id"/>
</list>
</operator>
<operator activated="true" class="split_data" compatibility="8.2.001" expanded="true" height="103" name="Split Data" width="90" x="447" y="34">
<enumeration key="partitions">
<parameter key="ratio" value="0.6"/>
<parameter key="ratio" value="0.4"/>
</enumeration>
</operator>
<operator activated="true" class="linear_regression" compatibility="8.2.001" expanded="true" height="103" name="Linear Regression" width="90" x="648" y="34"/>
<operator activated="true" class="apply_model" compatibility="8.2.001" expanded="true" height="82" name="Apply Model" width="90" x="648" y="238">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="8.2.001" expanded="true" height="82" name="Performance" width="90" x="916" y="34">
<parameter key="squared_correlation" value="true"/>
</operator>
<connect from_op="Retrieve death by wine1" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>

Best Answer

  • earmijo Member Posts: 271 Unicorn
    Solution Accepted

    By default, RapidMiner tries to do some feature selection, so some of the variables may be dropped. That's what is happening to you. In the Linear Regression operator, set "Feature Selection" to "None". Then you'll get coefficients for all variables.

     

    Screen Shot 2018-06-29 at 5.32.27 PM.png
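
For anyone who prefers to edit the process XML directly, the accepted answer's change could look roughly like the fragment below. It is only a sketch: it reuses the Linear Regression operator from the process posted in the question, and the parameter key `feature_selection` with the value `none` is an assumption about how the "Feature Selection" setting is serialized, so check it against the XML your own RapidMiner version exports.

```xml
<!-- Sketch only: the Linear Regression operator from the posted process,
     with the built-in feature selection switched off. -->
<operator activated="true" class="linear_regression" compatibility="8.2.001" expanded="true" height="103" name="Linear Regression" width="90" x="648" y="34">
  <!-- Assumed key/value for the "Feature Selection" setting mentioned in the answer -->
  <parameter key="feature_selection" value="none"/>
</operator>
```

This would replace the self-closing `<operator ... class="linear_regression" .../>` element in the original process; the `<connect>` lines stay the same because the operator name is unchanged.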
