Logistic Regression - Normalization does not change Attribute Weights
Hello,
I am new here, and new to statistics and data mining in general. Apologies if I am asking a really stupid question.
My question is about logistic regression and normalizing data. I have a data set in which some columns are skewed and the columns have different scales, so I wanted to apply normalization (centering, scaling and a Box-Cox transformation for skewness) before logistic regression. But first I wanted to check to what extent normalization changes the results.
I see that normalizing before logistic regression changes the coefficients; however, the attribute weights are exactly the same with and without normalization. Am I missing something here?
Attached is my process design for the analysis (Logistic Regression and Normalize operators added with default settings).
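A minimal sketch of how coefficients and weights can behave differently, assuming an unpenalized logistic regression and assuming the "weight" is the coefficient multiplied by the attribute's standard deviation (a common standardized coefficient; RapidMiner's exact weight definition is not given in this thread). It fits a single attribute by Newton-Raphson in plain Python, once on the raw values and once after a z-transform. The data values and function names are made up for illustration:

```python
import math
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_1d(xs, ys, iters=50):
    """Unpenalized logistic regression p = sigmoid(b0 + b1*x), fitted by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Gradient of the log-likelihood and the (negated) Hessian X^T W X
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve the 2x2 system H * delta = g
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < 1e-12 and abs(d1) < 1e-12:
            break
    return b0, b1

# Hypothetical, non-separable data on a large scale
xs = [120, 150, 180, 200, 220, 260, 300, 340, 380, 400]
ys = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

# Fit on the raw attribute
b0_raw, b1_raw = fit_logistic_1d(xs, ys)

# Z-transform the attribute, then fit again
m = statistics.mean(xs)
s = statistics.pstdev(xs)
zs = [(x - m) / s for x in xs]
b0_std, b1_std = fit_logistic_1d(zs, ys)

print("raw coefficient:         ", b1_raw)
print("z-scored coefficient:    ", b1_std)
print("raw coefficient * std:   ", b1_raw * s)  # equals the z-scored coefficient

# The fitted probabilities are identical: only the parametrization changed
max_diff = max(abs(sigmoid(b0_raw + b1_raw * x) - sigmoid(b0_std + b1_std * z))
               for x, z in zip(xs, zs))
print("max probability difference:", max_diff)
```

The two fits report different coefficients, yet coefficient times standard deviation and the predicted probabilities agree, which is one way normalization can change the coefficients while leaving a scale-free weight untouched. If the model is regularized, the two fits are no longer exactly equivalent, because the penalty depends on the attribute scale.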
Best Answers
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
Try outputting the PRE port of the Normalize operator; it will tell you how the data is being normalized.
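A rough sketch of what such a preprocessing model holds, assuming the Z-transformation method: it stores each attribute's mean and standard deviation and re-applies exactly that transformation to any row. The class and method names are hypothetical, and the use of the sample (rather than population) standard deviation is an assumption:

```python
import statistics

class ZTransformModel:
    """Hypothetical stand-in for a preprocessing model: remembers each
    attribute's mean and standard deviation so the same transformation
    can be re-applied to new data."""

    def __init__(self, columns):
        # columns: dict mapping attribute name -> list of numeric values
        self.params = {
            name: (statistics.mean(vals), statistics.stdev(vals))
            for name, vals in columns.items()
        }

    def apply(self, row):
        # row: dict mapping attribute name -> value
        return {name: (row[name] - mean) / sd
                for name, (mean, sd) in self.params.items()}

    def describe(self):
        # Roughly the information you would inspect on the PRE port
        for name, (mean, sd) in self.params.items():
            print(f"{name}: subtract {mean:.4g}, divide by {sd:.4g}")

train = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]}
model = ZTransformModel(train)
model.describe()
print(model.apply({"a": 3.0, "b": 60.0}))
```

Keeping the parameters in a separate model object is what lets the identical transformation be applied later to test or scoring data.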
earmijo Member Posts: 271 Unicorn
By default, the Logistic Regression operator normalizes the data (it uses the word "standardize" instead of "normalize"). Uncheck the 'standardize' option. Whether you normalize or not does make a difference to the coefficients. Check the process below:
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve Sonar" width="90" x="246" y="187">
<parameter key="repository_entry" value="//Samples/data/Sonar"/>
</operator>
<operator activated="true" class="multiply" compatibility="8.0.001" expanded="true" height="103" name="Multiply" width="90" x="447" y="187"/>
<operator activated="true" class="normalize" compatibility="8.0.001" expanded="true" height="103" name="Normalize" width="90" x="648" y="340"/>
<operator activated="true" class="h2o:logistic_regression" compatibility="7.6.001" expanded="true" height="124" name="Logistic Regression (2)" width="90" x="849" y="340">
<parameter key="standardize" value="false"/>
</operator>
<operator activated="true" class="h2o:logistic_regression" compatibility="7.6.001" expanded="true" height="124" name="Logistic Regression" width="90" x="849" y="187">
<parameter key="standardize" value="false"/>
</operator>
<connect from_op="Retrieve Sonar" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Logistic Regression" to_port="training set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Logistic Regression (2)" to_port="training set"/>
<connect from_op="Logistic Regression (2)" from_port="model" to_port="result 2"/>
<connect from_op="Logistic Regression" from_port="model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
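For reference, coefficients fitted on z-scored attributes can be mapped back to the original attribute scale. With z_j = (x_j - m_j) / s_j, each coefficient is divided by s_j and the intercept absorbs the shifted means. A minimal sketch, assuming an unpenalized model and a plain z-transform; `destandardize` is a hypothetical helper, not a RapidMiner or H2O function:

```python
def destandardize(intercept_std, coefs_std, means, sds):
    """Map coefficients fitted on z-scored attributes back to the original scale.

    logit = b0' + sum_j b_j' * (x_j - m_j) / s_j
          = (b0' - sum_j b_j' * m_j / s_j) + sum_j (b_j' / s_j) * x_j
    """
    coefs = [b / s for b, s in zip(coefs_std, sds)]
    intercept = intercept_std - sum(b * m / s for b, m, s in zip(coefs_std, means, sds))
    return intercept, coefs

# Hypothetical values: two attributes with means 10 and 3, std devs 4 and 0.5
print(destandardize(0.5, [2.0, -1.0], [10.0, 3.0], [4.0, 0.5]))  # (1.5, [0.5, -2.0])
```

This is why the coefficients differ with and without standardization even when both models make the same predictions: they describe the same linear predictor in different units.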
Answers
When I removed the Normalize operator (which I no longer need, since Logistic Regression has its own standardize option), I could repeat the process with and without the standardize option, and I can see that the attribute weights change between the two runs.
Thanks a lot!
Cem
Hi, I would like an explanation of the logistic regression results from RapidMiner. Can the p-values be used to calculate odds ratios, and how should they be interpreted?
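A minimal sketch, assuming a standard logistic regression and a coefficient with its standard error as reported by the model: the odds ratio comes from the coefficient itself, exp(coef), not from the p-value. The p-value only tests whether the coefficient differs from 0 (equivalently, whether the odds ratio differs from 1). The function name, the Wald approximation, and the example numbers are illustrative assumptions:

```python
import math

def odds_ratio_summary(coef, std_err, z_crit=1.96):
    """Odds ratio, approximate 95% Wald interval and two-sided p-value for one coefficient."""
    odds_ratio = math.exp(coef)
    lower = math.exp(coef - z_crit * std_err)
    upper = math.exp(coef + z_crit * std_err)
    # Wald z statistic; two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    z = coef / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return odds_ratio, (lower, upper), p_value

# Hypothetical coefficient and standard error
or_, ci, p = odds_ratio_summary(math.log(2.0), 0.3)
print(f"odds ratio {or_:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f}), p = {p:.4f}")
```

An odds ratio of 2 means each one-unit increase in the attribute doubles the odds of the positive class, holding the other attributes fixed; if the attributes were standardized, "one unit" means one standard deviation.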