Attributes with too many possible values

Sarah01 Member Posts: 1 Newbie
edited February 24 in Help
I am a beginner and not yet familiar with all the operators.
I have a dataset with an attribute x (the attribute I want to predict using some classification technique) that has over 1,000 possible values, which is far too many. I am only interested in the ten values with the highest absolute counts.
So my question is: how can I get a subset of the data that contains only the records whose value of x has an absolute count greater than, say, 50? Is that possible? (Alternatively, how can I keep only the records with the top y values by absolute count?)

Best Answer

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,509  RM Data Scientist
    Solution Accepted
    The Operator Toolbox extension has an operator, Replace Rare Values, which does exactly this.

    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
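
For readers who want to see the filtering logic spelled out, here is a minimal sketch in plain Python (not RapidMiner) of the two selections asked about in the question: keeping records whose label value occurs more than a threshold number of times, and keeping records whose label is among the k most frequent values. The function names, the one-dict-per-record layout, and the sample data are assumptions made for illustration; this is not a description of how the Replace Rare Values operator is implemented.

```python
from collections import Counter


def filter_by_min_count(records, label_key, min_count):
    """Keep records whose label value occurs more than min_count times."""
    counts = Counter(r[label_key] for r in records)
    return [r for r in records if counts[r[label_key]] > min_count]


def filter_by_top_k(records, label_key, k):
    """Keep records whose label value is among the k most frequent values.

    Ties at the cut-off are broken by first appearance in the data,
    which is how Counter.most_common orders equal counts.
    """
    counts = Counter(r[label_key] for r in records)
    top_values = {value for value, _ in counts.most_common(k)}
    return [r for r in records if r[label_key] in top_values]


# Hypothetical sample data: label "a" occurs 3 times, "b" twice, "c" once.
records = (
    [{"x": "a"} for _ in range(3)]
    + [{"x": "b"} for _ in range(2)]
    + [{"x": "c"}]
)

frequent = filter_by_min_count(records, "x", min_count=1)  # keeps "a" and "b"
top_one = filter_by_top_k(records, "x", k=1)               # keeps only "a"
print(len(frequent), len(top_one))
```

With the question's numbers, the calls would be `filter_by_min_count(data, "x", 50)` or `filter_by_top_k(data, "x", 10)`. Note that both functions drop the rare records entirely, whereas an operator named Replace Rare Values presumably keeps them under a substitute label; which behavior suits the classification task is a modelling choice.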