Linear Regression Coefficients problem
Hi, I'm relatively new to RapidMiner and have come across something that I do not understand in a linear regression model.
The issue is with the output. The model has 4 predictor variables (Population, Births, Wine Consumption, Liquor Consumption) and the output variable Cirrhosis_DeathRate. Cirrhosis_DeathRate is selected as the label in the Select Attributes operator. However, when I run the process, RapidMiner only produces coefficients for Births, Wine Consumption and Liquor Consumption, but not for Population.
I've run the same analysis in the data analysis pack in Excel, and whilst the p-value for Population is not significant, it's no worse than that of Liquor Consumption, which does appear in the RM output. Consequently, I'm at a bit of a loss as to why the Population coefficient is not being calculated. In addition, Population ~ Cirrhosis_DeathRate shows a relatively strong correlation (0.7569) in the correlation matrix.
Any suggestions would be gratefully accepted.
Thanks
Matt
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.2.001" expanded="true" height="68" name="Retrieve death by wine1" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Local Repository/processes/death by wine1"/>
</operator>
<operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role" width="90" x="246" y="34">
<parameter key="attribute_name" value="Cirrhosis_DeathRate"/>
<list key="set_additional_roles">
<parameter key="Cirrhosis_DeathRate" value="label"/>
<parameter key="Obs" value="id"/>
</list>
</operator>
<operator activated="true" class="split_data" compatibility="8.2.001" expanded="true" height="103" name="Split Data" width="90" x="447" y="34">
<enumeration key="partitions">
<parameter key="ratio" value="0.6"/>
<parameter key="ratio" value="0.4"/>
</enumeration>
</operator>
<operator activated="true" class="linear_regression" compatibility="8.2.001" expanded="true" height="103" name="Linear Regression" width="90" x="648" y="34"/>
<operator activated="true" class="apply_model" compatibility="8.2.001" expanded="true" height="82" name="Apply Model" width="90" x="648" y="238">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="8.2.001" expanded="true" height="82" name="Performance" width="90" x="916" y="34">
<parameter key="squared_correlation" value="true"/>
</operator>
<connect from_op="Retrieve death by wine1" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Apply Model" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Best Answer
earmijo Member Posts: 271 Unicorn
By default, RapidMiner's Linear Regression operator tries to do some feature selection, so some of the variables may be dropped. That's what's occurring here. In the operator's "Feature Selection" parameter, choose "None". Then you'll get coefficients for all variables.
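In the process XML posted in the question, that change would probably look something like the fragment below. This is only a sketch: the operator line is copied from the original process, while the parameter key `feature_selection` and the value `none` are assumptions based on how this setting usually appears in exported process XML, so it is worth confirming against the XML tab after changing the setting in the Parameters panel.

```xml
<!-- Linear Regression operator from the posted process, with feature selection switched off. -->
<!-- Assumption: the parameter is exported as key="feature_selection" with value="none". -->
<operator activated="true" class="linear_regression" compatibility="8.2.001" expanded="true" height="103" name="Linear Regression" width="90" x="648" y="34">
  <parameter key="feature_selection" value="none"/>
</operator>
```

This replaces the self-closing Linear Regression operator element in the original process; the connections and the other operators stay as they are.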
Answers
Perfect. Many thanks for that. Much appreciated!