"Concerning Feature Selection Implementation"
I was wondering how feature selection is implemented in RapidMiner. In the documentation of the feature selection operator I found the step "Evaluate the attribute sets and select only the best k." (in the forward selection description). Does that mean that for each attribute set a classification is performed and, depending on the performance, the best attribute sets are chosen? Or is there an additional criterion used beforehand, such as information gain?
And does the Feature Selection operator really remove all redundant attributes? I found a hint in the Javadoc; however, I would like to know how this is done, just to be sure.
Answers
Both methods are available in RapidMiner. The FeatureSelection operator represents what is called the "wrapper" approach, if I remember correctly. As you said, at least one learning step is involved for each attribute combination, since you will normally use a cross-validation to estimate the performance. The filter approach uses heuristics like information gain to select a number of attributes that seem to be best, but this does not necessarily match the learner's capabilities. If you have the computational power, I would recommend the wrapper approach.
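Just to illustrate the general idea of the wrapper approach, here is a minimal sketch in Python with scikit-learn. It is not RapidMiner code, and details such as the stopping criterion and the learner are my own assumptions; it only shows how forward selection scores each candidate attribute set with a cross-validated learner.

```python
# Sketch of wrapper-style forward selection: every candidate attribute set
# is evaluated by cross-validating a learner on exactly those attributes.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
learner = DecisionTreeClassifier(random_state=0)

selected, remaining = [], list(range(X.shape[1]))
best_score = 0.0
while remaining:
    # Score every candidate set (current selection plus one new attribute).
    scores = {
        f: cross_val_score(learner, X[:, selected + [f]], y, cv=5).mean()
        for f in remaining
    }
    f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
    if s_best <= best_score:   # stop when no attribute improves the estimate
        break
    selected.append(f_best)
    remaining.remove(f_best)
    best_score = s_best

print("selected attribute indices:", selected, "cv accuracy:", round(best_score, 3))
```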
And no, as far as I know, there is no extra code that removes redundant attributes deterministically. They will probably be removed by the forward selection anyway, since their information is already known. At least this holds in theory, unless you have a learner like Naive Bayes, which might profit from the double occurrence of the same information because it weights the attributes in some way. So you never really know what is redundant.
But if you regard highly correlated attributes as redundant, you might start with a RemoveUselessAttributes and a RemoveCorrelatedAttributes operator. This will at least lower the computational cost of a following FeatureSelection, since the number of attributes is reduced. A small sketch of the idea is below.
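Roughly, removing correlated attributes means dropping one of each pair whose absolute correlation exceeds some threshold. The following Python/pandas snippet is only my illustration of that principle, not the RapidMiner operator; the 0.95 threshold is an arbitrary example value.

```python
# Drop one attribute of each highly correlated pair (|Pearson r| > threshold).
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Example: "a_copy" is almost identical to "a" and gets removed.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
data = pd.DataFrame({
    "a": a,
    "b": rng.normal(size=200),
    "a_copy": a + rng.normal(scale=0.01, size=200),
})
print(drop_correlated(data).columns.tolist())   # ['a', 'b']
```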
Greetings,
Sebastian
Kind regards,
Anne
That's of course correct: correlation is not causation. And, vice versa, highly correlated features do not have to be redundant. But if two attributes are just linear combinations of each other, some learners, like LinearRegression, will not use both.
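To make that last point concrete, here is a tiny sketch of my own (plain NumPy, not RapidMiner): an attribute that is an exact linear combination of others adds no rank to the data matrix, which is exactly why a linear model cannot treat it as independent information.

```python
# An exactly linearly dependent attribute does not increase the matrix rank.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 2.0 * x1 - 0.5 * x2          # redundant: a linear combination of x1 and x2

X = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(X[:, :2]))  # 2
print(np.linalg.matrix_rank(X))         # still 2 -> x3 carries no new information
```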
Greetings,
Sebastian