simple operator or method for combining nominal categories?


Is there some easy way to combine nominal categories together based on frequency? For example, if I have a nominal attribute with 10 different possible values, but I only want to keep the top 5 (by frequency) and then put the rest into an "Other" category.
This is obviously possible using some manual recoding logic, but I feel like there is a better way that is slipping my mind. Is there some operator for this that I am forgetting? The Discretize operators aren't ideal because they only work on numerical attributes, which would require recoding and would lose the underlying nominal values.
I have to do this with a large number of attributes/categories so I am looking for a solution that doesn't require manual recoding of the categories.
Thanks in advance!
Best Answer
MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,453
RM Data Scientist
Answers
If I understood your problem well, I would do something like this:
- Generate a new field containing the frequency, alongside your category.
- Generate a second field that discretizes the frequency rather than the category values themselves.
- Generate a third field with an expression such as: if(frequency > 50, [Category], "Other").
- Use the third field as the combined category.
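For anyone who wants to prototype the same idea outside of RapidMiner, here is a rough sketch in Python with pandas. It is not a RapidMiner operator; the function names (lump_by_min_count, lump_top_n, lump_columns) and the "Other" label are made up for illustration. The first function mirrors the frequency-threshold steps above, and the second keeps only the top N categories, which is closer to the original question.

```python
import pandas as pd


def lump_by_min_count(values: pd.Series, min_count: int, other: str = "Other") -> pd.Series:
    """Keep categories that occur more than min_count times; relabel the rest."""
    # Frequency field: for every row, how often that row's category occurs.
    frequency = values.map(values.value_counts())
    # Same rule as if(frequency > min_count, category, "Other").
    return values.where(frequency > min_count, other)


def lump_top_n(values: pd.Series, n: int, other: str = "Other") -> pd.Series:
    """Keep the n most frequent categories; relabel the rest."""
    # value_counts() sorts by descending frequency, so the first n are the top n.
    top = values.value_counts().index[:n]
    return values.where(values.isin(top), other)


def lump_columns(df: pd.DataFrame, columns: list, n: int) -> pd.DataFrame:
    """Apply lump_top_n to each listed column, leaving the input frame untouched."""
    out = df.copy()
    for col in columns:
        out[col] = lump_top_n(out[col], n)
    return out
```

Passing a list of column names to lump_columns covers the "large number of attributes" case without recoding each one by hand. Note that when several categories are tied at the cutoff, which of them lands in the top N depends on how pandas orders the ties.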
But now I'm wondering if there is anything I missed about the whole question, as my solution sounds too simplistic, to me at least.
All the best, sensei!
Rodrigo.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Dortmund, Germany
On a slightly humorous note: Yes, I have to think before reacting when someone says I am "as rare as a Unicorn", because my first instinct usually tells me that I am "as weird as a Unicorn".