Attributes with too many possible values
I have a dataset in which attribute x (the label I want to predict with a classification technique) has over 1,000 possible values, which is far too many. I am only interested in the ten values with the highest absolute counts.
So my question is: how can I get a subset of the data that contains only the records whose value of x has an absolute count greater than, say, 50? Alternatively, is it possible to keep only the records whose value of x is among the top y values by absolute count?
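To make the two filters I have in mind concrete, here is a minimal sketch in plain Python. It is not a RapidMiner process, only an illustration of the logic. It assumes each record is a dictionary, and the function names `filter_by_min_count` and `filter_by_top_k` are made up for this example.

```python
from collections import Counter


def filter_by_min_count(records, attribute, min_count):
    """Keep records whose value of `attribute` occurs more than `min_count` times."""
    counts = Counter(record[attribute] for record in records)
    return [r for r in records if counts[r[attribute]] > min_count]


def filter_by_top_k(records, attribute, k):
    """Keep records whose value of `attribute` is among the k most frequent values."""
    counts = Counter(record[attribute] for record in records)
    top_values = {value for value, _ in counts.most_common(k)}
    return [r for r in records if r[attribute] in top_values]


# Toy data (hypothetical): value "a" occurs 3 times, "b" twice, "c" once.
records = [{"x": "a"}] * 3 + [{"x": "b"}] * 2 + [{"x": "c"}]

frequent = filter_by_min_count(records, "x", 1)  # drops the single "c" record
top_one = filter_by_top_k(records, "x", 1)       # keeps only the "a" records
```

In my real data the threshold would be 50 and k would be 10; the toy values above are only there to keep the example small.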