"Genetic algorithm for feature selection"
Dear community
I want to use the Optimize Selection (Evolutionary) operator, which is based on a genetic algorithm, for feature selection on a data set with numeric attributes. I would like to know how exactly this algorithm works in theory. More specifically: are features selected independently of the classifier's accuracy, or is a subset of features selected so as not to degrade the classifier's performance? I think it is the second, but I want to be sure.
The XML of my process is attached.
Best Regards
Konstantinos
Best Answers
IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Hi Konstantinos,
I think my PhD thesis might actually be a good place to read up on the feature selection part of RapidMiner. The middle part covers both the single-objective and the multi-objective evolutionary optimization approach. I typically recommend going with a multi-objective approach, where you try to maximize the prediction accuracy on the one hand and minimize the number of features on the other.
The link to my PhD is here: http://www-ai.cs.uni-dortmund.de/PublicPublicationFiles/mierswa_2008a.pdf
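To make the multi-objective idea concrete, here is a minimal Python sketch of how candidate feature subsets can be compared on both objectives at once. It is only an illustration, not RapidMiner's implementation: the `Candidate` tuple, the function names, and the accuracy values are all made up for the example. A subset is kept only if no other subset is at least as accurate with no more features and strictly better on one of the two.

```python
from typing import List, Tuple

# Hypothetical candidate: (accuracy, number of features, label)
Candidate = Tuple[float, int, str]


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse on both objectives and better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Invented numbers, purely for illustration.
candidates = [
    (0.90, 10, "A"),
    (0.88, 4, "B"),
    (0.85, 6, "C"),   # B is more accurate with fewer features
    (0.80, 2, "D"),
    (0.90, 12, "E"),  # A is equally accurate with fewer features
]

front = pareto_front(candidates)
for accuracy, n_features, label in front:
    print(label, accuracy, n_features)
```

The result is a set of trade-offs rather than a single winner, so you can pick the subset size that fits your use case afterwards.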
Whether the Optimize Selection operator bases the selection on a generic feature relevance scheme or on a specific learner depends on how you build the process. If you put a cross-validation with a certain learner, let's say Naive Bayes, inside the Optimize Selection operator, then the feature selection is optimized for the accuracy of this particular learner. This process in the Sample repository delivered with RapidMiner shows how this works in general:
//Samples/processes/04_Attributes/10_EvolutionaryFeatureSelection
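Outside of RapidMiner, the same wrapper idea can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the operator's actual algorithm: the synthetic data (only features 0 and 1 are informative), the nearest-centroid classifier standing in for Naive Bayes, and all function names and parameter values are invented for the example. The key point is that the fitness of each feature subset is the cross-validated accuracy of the learner trained on exactly that subset.

```python
import random


def make_data(n=120, n_features=8, seed=0):
    """Synthetic data; by construction only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] -= 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


def cv_accuracy(X, y, mask, folds=3):
    """Cross-validated accuracy of a nearest-centroid learner on the selected features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot predict anything
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        centroids = {}
        for c in set(y):
            rows = [X[i] for i in train if y[i] == c]
            centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        for i in test:
            pred = min(centroids, key=lambda c: sum(
                (X[i][j] - m) ** 2 for j, m in zip(cols, centroids[c])))
            correct += pred == y[i]
    return correct / len(X)


def evolve(X, y, pop_size=12, generations=15, p_mut=0.1, seed=1):
    """Simple genetic algorithm over boolean feature masks."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return cv_accuracy(X, y, mask)

    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best]  # elitism: the best mask so far always survives
        while len(nxt) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Uniform crossover followed by bit-flip mutation
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best, fitness(best)


X, y = make_data()
best_mask, best_acc = evolve(X, y)
print("selected features:", [j for j, keep in enumerate(best_mask) if keep])
print("cross-validated accuracy:", round(best_acc, 3))
```

Swapping in a different learner inside `cv_accuracy` changes which subset wins, which is exactly why the selection is tied to the learner you place inside the operator.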
Hope that helps,
Ingo
Answers
Ingo
I have just started to dive into your thesis and wanted to congratulate you on the evidently very thorough work. Even though my background in statistics isn't that good (I just learned what was needed to complete my PhD in demography/economics), I am often disappointed by the way many machine learning experts fail to link the algorithms with mathematical and statistical fundamentals. That doesn't help bridge the gap between ML and traditional stats/econometrics. And of course the cherry on top is your statement about meaningless statistical facts on page 7. Thanks for sharing this with us!