Attributes with too many possible values

Sarah01 · February 2020

I am a beginner and not yet familiar with all the operators.
I have a dataset with an attribute x (the label I want to predict using some classification technique) that has over 1,000 possible values, which is far too many. I am only interested in the ten values with the highest absolute counts.
So my question is: how can I get a subset of the data that contains only the records whose value of x has an absolute count greater than, say, 50? Is that possible? (Alternatively, how can I keep only the records belonging to the top y values by absolute count?)

MartinLiebig · February 2020

Hi @Sarah01,

The Operator Toolbox extension has an operator called Replace Rare Values, which does exactly this.

Best,

Martin
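
For anyone who wants to see the underlying logic spelled out, here is a minimal sketch in plain Python (not RapidMiner, and not the operator's actual implementation). The function names `keep_frequent` and `replace_rare`, the parameter names, and the "other" placeholder are all made up for illustration. The first function drops records whose value of x is rare; the second keeps every record but collapses rare values into one catch-all value, which is what the operator's name suggests it does.

```python
from collections import Counter

def keep_frequent(records, attr, min_count=None, top_n=None):
    """Keep only records whose value of `attr` is frequent enough.

    Hypothetical helper: pass top_n to keep the top-n most common values,
    or min_count to keep values occurring more than min_count times.
    """
    counts = Counter(r[attr] for r in records)
    if top_n is not None:
        keep = {value for value, _ in counts.most_common(top_n)}
    elif min_count is not None:
        keep = {value for value, c in counts.items() if c > min_count}
    else:
        keep = set(counts)  # no threshold given: keep everything
    return [r for r in records if r[attr] in keep]

def replace_rare(records, attr, min_count, other="other"):
    """Return copies of the records with rare values of `attr` set to `other`."""
    counts = Counter(r[attr] for r in records)
    return [
        {**r, attr: r[attr] if counts[r[attr]] > min_count else other}
        for r in records
    ]

# Tiny made-up example: "a" occurs 3 times, "b" twice, "c" once.
records = [{"x": v} for v in "aaabbc"]
frequent = keep_frequent(records, "x", min_count=1)   # drops the "c" record
top_one = keep_frequent(records, "x", top_n=1)        # keeps only "a" records
collapsed = replace_rare(records, "x", min_count=1)   # "c" becomes "other"
```

Dropping rare rows shrinks the training set, while collapsing them keeps all rows but adds a catch-all class; which is preferable depends on whether the rare classes still matter for the prediction.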
