Running out of features during feature selection
Hi,
I keep running into the same error while using the Feature Selection operator with a GLM learner inside. It starts with 56 features and, every time I run the process, it quickly runs out of features.
These are GLM settings:
These are feature selection settings:
Please advise. I can also provide any additional information if needed.
Thanks!
Best Answer
IngoRM (RM Founder, Community Manager):
Are some of your features constant or highly correlated? The GLM learner unfortunately removes those. The error message coming from H2O is a bit misleading, because the learner actually WAS presented with features (the Feature Selection operator makes sure there is always at least one input column) before it removed them itself :-) If the collinear features are the problem, you can uncheck the corresponding setting in the GLM parameters. If you have constants in your data, it is best to remove them before you start the feature selection to avoid the problem.
Hope that helps,
Ingo
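To make the diagnosis concrete: the two kinds of columns Ingo describes (constants and highly correlated pairs) can be spotted before the process runs. The thread itself uses the RapidMiner GUI, so the following is only an illustrative sketch in pandas; the function name, the example data, and the 0.98 correlation threshold are my own assumptions, not anything from the original process.

```python
import pandas as pd

def find_problem_columns(df, corr_threshold=0.98):
    """Sketch: list columns a GLM-style learner might silently drop,
    i.e. constants and members of highly correlated numeric pairs.
    (Hypothetical helper, not a RapidMiner/H2O API.)"""
    # Constant columns: a single distinct value (NaN counted as a value).
    constant = [c for c in df.columns if df[c].nunique(dropna=False) <= 1]
    # Pairwise absolute correlation on the remaining numeric columns.
    numeric = df.drop(columns=constant).select_dtypes("number")
    corr = numeric.corr().abs()
    correlated = set()
    cols = list(corr.columns)
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] >= corr_threshold:
                correlated.add(cols[j])  # flag the later column of each pair
    return constant, sorted(correlated)

# Tiny made-up example: one constant column, one perfectly collinear column.
df = pd.DataFrame({
    "const": [1, 1, 1, 1],
    "x": [1.0, 2.0, 3.0, 4.0],
    "x_copy": [2.0, 4.0, 6.0, 8.0],  # exactly 2 * x
    "y": [4.0, 1.0, 3.0, 2.0],
})
constant, correlated = find_problem_columns(df)
print(constant)    # ['const']
print(correlated)  # ['x_copy']
```

Running a check like this on the input data before the Feature Selection operator would reveal whether H2O's silent column removal is what empties the feature set.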
Answers
I don't think I should have constant features, as those were removed beforehand while cleaning the data. As for collinearity, I need to re-check that; in any case, I will also try unchecking the corresponding option.
Vladimir
http://whatthefraud.wtf
I am coming back to this thread because I have run into the problem again.
Previously I disabled "remove collinear columns" in the nested GLM, and that fixed it: the process ran fine.
This time I hit the error again and found that there was in fact one constant column in my data after filtering the smaller subset for feature selection.
Hence my question: couldn't the Feature Selection operator simply ignore such columns? This can happen occasionally, as in my case, and the error message itself is quite confusing.
Thanks!
Vladimir
http://whatthefraud.wtf
Yeah, the error message is bad indeed. Unfortunately there is nothing we can do about that, because we do not "own" that particular part of the code... :-( I am personally a bit torn on the handling of constants here, though. If we just keep them in, we avoid the error in this particular case, but it bugs me that a feature selection, which is supposed to get rid of weak features, would be forced to keep a constant column in. That kind of defeats the purpose... and it is really undocumented, special behavior of the H2O learner that we would need to work around.
So I would actually prefer to keep it the way it is, which would require you to use a Remove Useless Attributes operator beforehand. The last option would be to remove all constant features automatically BEFORE the feature selection starts (and throw an error if that removes every column), but that makes the behavior implicit, which is not great either.
Any opinions on this?
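The pre-filtering step suggested above (drop constant columns before feature selection, and fail loudly if nothing survives) can be sketched outside RapidMiner too. This is a minimal pandas illustration of that idea, assuming a DataFrame input; it is not the Remove Useless Attributes operator itself, just an equivalent of its constant-column case.

```python
import pandas as pd

def drop_constant_columns(df):
    """Sketch of a pre-filter like Remove Useless Attributes:
    drop columns with a single distinct value before feature selection,
    and raise if that would remove every column."""
    keep = [c for c in df.columns if df[c].nunique(dropna=False) > 1]
    if not keep:
        raise ValueError("all columns are constant; nothing left to select")
    return df[keep]

# Made-up example: 'a' is constant and gets dropped, 'b' survives.
df = pd.DataFrame({"a": [0, 0, 0], "b": [1, 2, 3]})
print(list(drop_constant_columns(df).columns))  # ['b']
```

Running this right after the subset filter (the step where the constant column appeared in Vladimir's case) would prevent H2O's confusing error from ever being triggered.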
https://community.rapidminer.com/discussion/55910/forward-selection-error-thrown#latest
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing