simple operator or method for combining nominal categories?
Telcontar120
RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
in Help
Is there some easy way to combine nominal categories together based on frequency? For example, if I have a nominal attribute with 10 different possible values, but I only want to keep the top 5 (by frequency) and then put the rest into an "Other" category.
This is obviously possible using some manual recoding logic, but I feel like there is a better way that is slipping my mind. Is there some operator for this that I am forgetting? Discretize operators aren't ideal because they only work on numerical attributes, so using them would require recoding and would lose the underlying nominal values.
I have to do this with a large number of attributes/categories so I am looking for a solution that doesn't require manual recoding of the categories.
Thanks in advance!
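For readers who want to see the requested behavior spelled out, here is a minimal sketch in plain Python, outside RapidMiner. It keeps the k most frequent values of a nominal column and maps the rest to "Other", then applies the same rule to several columns at once. The function names (`keep_top_k`, `keep_top_k_columns`) and the dict-of-lists table layout are assumptions made for illustration only; they are not part of any RapidMiner operator.

```python
from collections import Counter


def keep_top_k(values, k=5, other="Other"):
    """Keep the k most frequent categories and map all others to `other`.

    Ties at the cut-off are broken by first appearance in `values`,
    because Counter.most_common keeps insertion order for equal counts.
    """
    counts = Counter(values)
    top = {category for category, _ in counts.most_common(k)}
    return [v if v in top else other for v in values]


def keep_top_k_columns(table, columns, k=5, other="Other"):
    """Apply keep_top_k to the named columns of a dict-of-lists table.

    Columns not listed in `columns` are copied unchanged.
    """
    return {
        name: keep_top_k(col, k, other) if name in columns else list(col)
        for name, col in table.items()
    }


# Hypothetical example data: five categories, keep the three most frequent.
colors = ["red"] * 5 + ["blue"] * 4 + ["green"] * 3 + ["teal"] * 2 + ["pink"]
recoded = keep_top_k(colors, k=3)

table = {"color": colors, "row_id": list(range(len(colors)))}
recoded_table = keep_top_k_columns(table, columns={"color"}, k=3)
```

Because the top-k set is computed per column from the data, the same call works for any number of attributes without listing the categories by hand.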
Best Answer
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
The Replace Rare Values operator in the Operator Toolbox extension is your friend.
BR,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
Answers
If I understood your problem well, I would do something like this:
- Generate a new field containing the frequency, alongside your category.
- Generate a second field by discretizing the frequency itself, not the category values.
- Generate a third field with an expression such as: if(frequency > 50, [Category], "Other").
- Use the third field with the "combined" target.
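The steps above can be sketched in plain Python, outside RapidMiner, roughly as follows. This is only an illustration under assumptions: the function name `recode_by_frequency` and the sample data are hypothetical, the threshold of 50 is taken from the expression in the list, and the optional discretization step is skipped because the threshold comparison already does the split.

```python
from collections import Counter


def recode_by_frequency(values, min_count=50, other="Other"):
    """Replace every category seen min_count times or fewer with `other`."""
    # Step 1: frequency of each category.
    counts = Counter(values)
    # Step 3: keep the category only if its frequency exceeds the threshold,
    # mirroring if(frequency > 50, [Category], "Other").
    return [v if counts[v] > min_count else other for v in values]


# Hypothetical sample: "c" occurs exactly 50 times, so it is NOT kept (50 > 50 is false).
sample = ["a"] * 60 + ["b"] * 51 + ["c"] * 50 + ["d"] * 3
combined = recode_by_frequency(sample)
```

Note that this is a fixed-threshold rule rather than a "top 5" rule: the number of surviving categories depends on the data.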
But now I'm wondering if there is anything I missed about the whole question, as my solution sounds too simplistic to me at least.
All the best, sensei!
Rodrigo.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
On a slightly humorous note: Yes, I have to think before reacting when someone says I am "as rare as a Unicorn", because my first instinct usually tells me that I am "as weird as a Unicorn".