Attributes with too many possible values
I am a beginner and not yet familiar with all the operators.
I have a dataset with an attribute x (the label I want to predict with a classification technique) that has over 1,000 distinct values, which is too many. I am only interested in the ten values with the highest absolute counts.
So my question is: how can I take a subset of the data that contains only the records whose value of x has an absolute count greater than, say, 50? Is that possible? (Alternatively, how can I keep only the records whose value of x is among the top y values by absolute count?)
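Outside of RapidMiner, the filtering the question asks for can be sketched in plain Python. This is a minimal sketch, assuming the data is a list of dicts (a hypothetical stand-in, not a RapidMiner ExampleSet) and using hypothetical helper names `frequent_values` and `filter_rows`: count each value of the label, keep the values whose count exceeds a threshold or that rank in the top k, then drop every other record.

```python
from collections import Counter


def frequent_values(rows, label, threshold=None, top_k=None):
    """Return the set of `label` values that are frequent enough to keep.

    threshold: keep values that occur strictly more than `threshold` times.
    top_k: keep only the `top_k` most frequent values.
    If both are given, a value must satisfy both; if neither, all values are kept.
    """
    counts = Counter(row[label] for row in rows)
    # most_common(None) returns every value; ties keep first-seen order.
    ranked = counts.most_common(top_k)
    return {value for value, n in ranked if threshold is None or n > threshold}


def filter_rows(rows, label, threshold=None, top_k=None):
    """Keep only the rows whose `label` value is among the frequent values."""
    keep = frequent_values(rows, label, threshold, top_k)
    return [row for row in rows if row[label] in keep]


rows = [{"x": "a"}] * 3 + [{"x": "b"}] * 2 + [{"x": "c"}]
print(len(filter_rows(rows, "x", threshold=1)))  # 5: "c" occurs only once
print(frequent_values(rows, "x", top_k=1))  # {'a'}
```

For the case in the question, `filter_rows(rows, "x", threshold=50)` or `filter_rows(rows, "x", top_k=10)` would apply. With `top_k`, values tied at the cut-off are broken by first occurrence (the documented behavior of `Counter.most_common`), so a different tie-breaking rule would need an explicit sort.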
Best Answer
MartinLiebig (Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; RM Data Scientist)

Hi @Sarah01,

the Operator Toolbox extension has an operator, Replace Rare Values, which does exactly this.

Best,
Martin

- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
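Replacing rare values differs from filtering: instead of dropping records, every infrequent value is mapped to a single placeholder. The operator's actual parameters are not given in this thread, so the following is only a plain-Python sketch of the general idea, using a hypothetical function `replace_rare_values` and a hypothetical placeholder label `"other"`.

```python
from collections import Counter


def replace_rare_values(values, threshold, replacement="other"):
    """Map each value occurring `threshold` times or fewer to `replacement`.

    Values occurring strictly more than `threshold` times are kept unchanged.
    """
    counts = Counter(values)
    return [v if counts[v] > threshold else replacement for v in values]


labels = ["a", "a", "a", "b", "b", "c"]
print(replace_rare_values(labels, threshold=2))
```

Because every record is kept, the rare classes collapse into one catch-all class. If those records are not wanted at all, they can be removed afterwards by filtering out the placeholder value.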