feature selection
ramzanzadeh72
Member Posts: 14 Learner III
Hi,
I have a data set with 46 attributes, and I want to select a feature set that has:
1) maximum relevance to the class attribute
2) minimum redundancy
3) the minimum number of features
4) the best performance (e.g. accuracy + F-measure + AUC)
What should I do to achieve this?
Answers
You may want to take a look at this tutorial here, written by @Thomas_Ott.
It gives a good introduction to feature selection in RM, with a focus on the two standard methods: forward selection and backward elimination.
Hi @ramzanzadeh72,
In addition to Thomas's tutorial, you can take a look at this thread.
Regards,
Lionel
Dear FBT
Thanks for your answer.
I use the mRMR method, but it only considers the difference (relevance - redundancy) or the quotient (relevance / redundancy). That means a feature set may still have high redundancy and yet be selected, because it has the best difference or quotient. I want to use the mRMR algorithm to select a feature set that has the highest relevance, the lowest redundancy, and the best performance, with the minimum number of features.
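For readers unfamiliar with the two mRMR criteria mentioned above, here is a minimal sketch of greedy mRMR in plain Python. It is not the RapidMiner operator: it assumes all attributes are discrete, it uses mutual information for both relevance and redundancy, and the names mutual_information, mrmr and the toy columns are made up for illustration.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x,y) * log(p(x,y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr(columns, target, k, scheme="difference"):
    """Greedily pick k feature names from columns (dict: name -> values)."""
    relevance = {name: mutual_information(col, target)
                 for name, col in columns.items()}
    selected, remaining = [], list(columns)

    def score(name):
        if not selected:
            return relevance[name]  # first pick: relevance only
        # Mean mutual information with the features already selected.
        redundancy = sum(mutual_information(columns[name], columns[s])
                         for s in selected) / len(selected)
        if scheme == "difference":
            return relevance[name] - redundancy
        return relevance[name] / (redundancy + 1e-12)  # quotient scheme

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): b duplicates a, c is independent of a.
target = [0, 0, 0, 1, 1, 1, 1, 1]
columns = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
picked_difference = mrmr(columns, target, 2, scheme="difference")
picked_quotient = mrmr(columns, target, 2, scheme="quotient")
```

On this toy data both schemes skip the duplicate b after picking a, even though b is far more relevant than c on its own. Note that mRMR only ranks features; it does not measure model performance or decide how many features to keep, so those two requirements need a separate step.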
Is the order of your requirements in your original post sorted by importance? I.e. do you care more about model training time (i.e. the minimum number of features) or accuracy? I would have thought that 46 attributes are actually not that many in terms of compute time, but this of course depends on the wider context.
Having said that, both responses you received point you in the right direction. Both operators (Backward Elimination and Forward Selection) let you define a maximum number of attributes. Hence, running either of the two inside a Cross Validation will satisfy your requirements 3 and 4 (although you have to decide manually, based on the results, what a sensible minimum number of features is).
In terms of your requirements 1 and 2, I would probably build in a Log operator to see in detail the effect of a specific feature and parameter selection.
Once you have a basic understanding of how your features affect your model, you'll then need to play around a bit and tweak parameters to find the optimum for your circumstances.
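Outside RapidMiner, the idea of forward selection with a cap on the number of attributes plus a per-step log can be sketched roughly as follows. This is only an illustration, not how the Forward Selection operator is implemented: forward_selection, evaluate, max_features and min_gain are hypothetical names, and evaluate stands in for whatever cross-validated performance you measure for a given feature subset.

```python
def forward_selection(features, evaluate, max_features, min_gain=0.0):
    """Greedy forward selection.

    evaluate(subset) must return a score where higher is better,
    e.g. a cross-validated performance value (assumption).
    Returns the selected features and a log of (subset, score) per step.
    """
    selected, best_score, history = [], float("-inf"), []
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Try adding each remaining feature and keep the best one.
        score, feature = max((evaluate(selected + [f]), f) for f in candidates)
        if score - best_score <= min_gain:
            break  # no worthwhile improvement: stop early
        selected.append(feature)
        best_score = score
        history.append((list(selected), score))
    return selected, history

# Toy stand-in for a cross-validated score (made-up numbers).
toy_gain = {"a": 0.3, "b": 0.2, "c": 0.0}

def toy_evaluate(subset):
    return 0.5 + sum(toy_gain[f] for f in subset)

chosen, log = forward_selection(["a", "b", "c"], toy_evaluate,
                                max_features=3, min_gain=0.01)
capped, _ = forward_selection(["a", "b", "c"], toy_evaluate, max_features=1)
```

The log plays the role of the Log operator here: inspecting the score after each added feature is what lets you judge by hand where extra features stop paying off.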
Actually, I'm working on a Twitter data set, and 46 attributes on a data set as large as Twitter's may take a lot of time, so a minimum number of features is important here. Also, not just accuracy but accuracy + F-measure + AUC matters, because I'm working on detecting bot accounts on Twitter, so performance is important too. To select the minimum number of features for detection, we should consider relevance + redundancy + performance metrics. So what can I do in this area?
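One possible way to treat accuracy + F-measure + AUC as a single number when comparing feature sets is sketched below in plain Python. The equal weights, the 0.5 threshold and the function names are assumptions made for illustration, not something the thread or RapidMiner prescribes.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f_measure(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def combined_score(y_true, scores, threshold=0.5, weights=(1/3, 1/3, 1/3)):
    """Weighted mean of accuracy, F-measure and AUC (weights are an assumption)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    parts = (accuracy(y_true, y_pred),
             f_measure(y_true, y_pred),
             auc(y_true, scores))
    return sum(w * p for w, p in zip(weights, parts))

# Toy example: 1 = bot account, scores = predicted bot confidence.
labels = [1, 1, 0, 0]
confidences = [0.9, 0.4, 0.6, 0.1]
total = combined_score(labels, confidences)
```

A score like this could be what a wrapper method optimizes for each candidate feature subset, while a filter such as mRMR handles relevance and redundancy beforehand.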