question marks in linear regression output
I ran a linear regression model with 18 independent variables and feature selection turned off. For some of the independent variables there were question marks for the standard error of the estimate, and therefore for the t-statistic and p-value of the coefficient. I ran the model again with feature selection turned on and got the same question marks. What do these question marks mean? They cannot have anything to do with missing values, as the regression would not have run to completion in that case. I am baffled about what these "?" symbols might mean. Help.....
Best Answers
varunm1
Hello @sgenzer and @AD2019
I tried to look for H2O documentation on linear regression but unfortunately found none. For GLM, however, H2O recommends a specific set of parameter settings so that p-values are reported as values rather than "?" (unknown):
1. You should uncheck the "Use Regularization" option.
2. You should select "Add intercept".
3. You should select "compute p-values".
4. You should select "remove collinear columns".
If these are set, you will get the p-values, standard errors, etc. without question marks. With these settings, you will get question marks only when a coefficient is 0.
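To illustrate why a standard error can come out undefined, here is a minimal sketch in plain Python. It is not RapidMiner or H2O code; `invert` and `ols_standard_errors` are hypothetical helpers written for this example. It computes ordinary least squares standard errors as sqrt(σ² · diag((XᵀX)⁻¹)) and returns `None` when XᵀX is singular, which is what happens when one column is an exact linear combination of others (the collinearity that "remove collinear columns" targets). The assumption, not confirmed in this thread, is that a "?" in the output corresponds to this kind of undefined value.

```python
import math

def invert(m, tol=1e-10):
    """Gauss-Jordan inversion with partial pivoting; None if (numerically) singular."""
    n = len(m)
    # Augment m with the identity matrix.
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            return None  # no usable pivot: the matrix has no inverse
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return [row[n:] for row in a]

def ols_standard_errors(X, y):
    """Return (coefficients, standard errors), or (None, None) if X'X is singular."""
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xtx_inv = invert(xtx)
    if xtx_inv is None:
        return None, None
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - k)  # residual variance estimate
    se = [math.sqrt(sigma2 * xtx_inv[a][a]) for a in range(k)]
    return beta, se

x1 = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
X_ok = [[1.0, v] for v in x1]             # intercept + x1
X_bad = [[1.0, v, 2.0 * v] for v in x1]   # third column = 2 * x1 (perfectly collinear)
print(ols_standard_errors(X_ok, y))
print(ols_standard_errors(X_bad, y))      # (None, None)
```

In a real tool the collinear case is usually handled by dropping one of the redundant columns (or fixing its coefficient at 0) rather than failing outright, which would leave that column's standard error, t-statistic and p-value with nothing to report.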
I will see if I can find any information on linear regression.
Regards,
Varun
Answers
So the simple answer is that question marks are used in RapidMiner when values are missing. The better question is why they are missing. My educated guess here (please correct me @varunm1 @mschmitz if my stats are wrong) is that there can be no standardized coefficient or tolerance for the intercept of a LinReg model, as it's a computed value. All of your actual data (the other attributes) have standardized coefficients, which makes sense. But my stats are a wee bit rusty, so I look to these other smart folks to correct me.
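The guess about the intercept can be made concrete with the usual textbook definitions, assuming (not confirmed in this thread) that RapidMiner's "std. coefficient" and "tolerance" columns are computed this way. For predictor $x_j$ with fitted coefficient $b_j$, sample standard deviations $s_{x_j}$ and $s_y$, and $\hat{x}_{ij}$ the fitted value from regressing $x_j$ on the other predictors:

```latex
\beta_j^{*} = b_j \,\frac{s_{x_j}}{s_y},
\qquad
\mathrm{Tol}_j = 1 - R_j^2,
\qquad
R_j^2 = 1 - \frac{\sum_i \left(x_{ij} - \hat{x}_{ij}\right)^2}{\sum_i \left(x_{ij} - \bar{x}_j\right)^2}
```

For the intercept, the "predictor" is the constant column $x_{i0} = 1$, so $s_{x_0} = 0$ (there is nothing to standardize) and the denominator of $R_0^2$ is $\sum_i (1 - 1)^2 = 0$, making the tolerance $0/0$. Under these definitions, a "?" in those cells for the intercept would be expected.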
Scott
Ah I understand. Good point. It's been a while since I've played with all of this (we normally use the GLM modeler instead of LinReg as it is far more versatile and robust). Let me investigate.
Scott