"Concerning Feature Selection Implementation"
I was wondering how feature selection is implemented in RapidMiner. In the documentation of the feature selection operator I found the step "Evaluate the attribute sets and select only the best k." (in the forward selection description). Does that mean that for each attribute set a classification is performed and, depending on the performance, the best attribute sets are chosen? Or is there an additional criterion used beforehand, such as information gain?
And does the Feature Selection operator really remove all redundant attributes? I found a hint in the Javadoc; however, I would like to know how this is done, just to be sure.
Answers
Both methods are available in RapidMiner. The FeatureSelection operator represents what is called the "wrapper" approach, if I remember correctly. As you said, at least one learning step is involved for each attribute combination, since you will normally use a cross-validation to estimate the performance. The filter approach uses heuristics like information gain to select a number of attributes that seem to be best, but this does not necessarily match the learner's capabilities. If you have the computational power, I would recommend the wrapper approach.
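Just to illustrate the general idea of the wrapper approach, here is a minimal sketch in Python with scikit-learn. It is not RapidMiner code, and details such as the stopping criterion and the learner are my own assumptions; it only shows how forward selection scores each candidate attribute set with a cross-validated learner.

```python
# Sketch of wrapper-style forward selection: every candidate attribute set
# is evaluated by cross-validating a learner on exactly those attributes.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
learner = DecisionTreeClassifier(random_state=0)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Score every candidate set (current selection plus one new attribute).
    scores = {
        f: cross_val_score(learner, X[:, selected + [f]], y, cv=5).mean()
        for f in remaining
    }
    f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
    if s_best <= best_score:   # stop when no attribute improves the estimate
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = s_best

print("selected attribute indices:", selected, "cv accuracy:", round(best_score, 3))
```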
And no, as far as I know, there is no extra code that removes redundant attributes deterministically. They will probably be removed by the forward selection anyway, since their information is already known. At least this holds in theory, unless you have a learner like Naive Bayes, which might profit from the double occurrence of the same information because it weights the attributes in some way. So you never really know what is redundant.
But if you regard highly correlated attributes as redundant, you might start with a RemoveUselessAttributes and a RemoveCorrelatedAttributes operator. This will at least lower the computational cost of a following FeatureSelection, since the number of attributes is reduced. A small sketch of the idea is below.
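Roughly, removing correlated attributes means dropping one of each pair whose absolute correlation exceeds some threshold. The following Python/pandas snippet is only my illustration of that principle, not the RapidMiner operator; the 0.95 threshold is an arbitrary example value.

```python
# Drop one attribute of each highly correlated pair (|Pearson r| > threshold).
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Example: "a_copy" is almost identical to "a" and gets removed.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
data = pd.DataFrame({
    "a": a,
    "b": rng.normal(size=200),
    "a_copy": a + rng.normal(scale=0.01, size=200),
})
print(drop_correlated(data).columns.tolist())   # ['a', 'b']
```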
Greetings,
Sebastian
Kind regards,
Anne
That's of course correct: correlation is not causation. And, vice versa, highly correlated features do not have to be redundant. But if two attributes are just linear combinations of each other, some learners, like LinearRegression, will not use both.
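To make that last point concrete, here is a tiny sketch of my own (plain NumPy, not RapidMiner): an attribute that is an exact linear combination of others adds no rank to the data matrix, which is exactly why a linear model cannot treat it as independent information.

```python
# An exactly linearly dependent attribute does not increase the matrix rank.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 2.0 * x1 - 0.5 * x2          # redundant: a linear combination of x1 and x2

X = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(X[:, :2]))  # 2
print(np.linalg.matrix_rank(X))         # still 2 -> x3 carries no new information
```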
Greetings,
Sebastian