Attributes with too many possible values

Sarah01 Member Posts: 1 Newbie
edited February 2020 in Help
I am a beginner and not yet familiar with all the operators.
I have a dataset with an attribute x (the attribute I want to predict using some classification technique) that has over 1,000 possible values, which is far too many. I am only interested in the ten values with the highest absolute counts.
So my question is: how can I get a subset of the data that contains only the records whose value of x has an absolute count greater than, say, 50? Is that possible? (Alternatively, how can I keep only the records whose value of x is among the top y values by absolute count?)

Best Answer

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Solution Accepted
    The Operator Toolbox extension has an operator, Replace Rare Values, which does exactly this.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
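
For readers who want to see the filtering logic from the question spelled out, here is a minimal sketch in plain Python. It is not the Replace Rare Values operator and does not use any RapidMiner API; it only illustrates the two ideas asked about (keep records whose value of x occurs more than a threshold number of times, or keep records whose value is among the top k by count). The function names, the list-of-dicts data layout, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def filter_by_min_count(rows, attribute, min_count):
    """Keep rows whose value of `attribute` occurs more than `min_count` times."""
    counts = Counter(row[attribute] for row in rows)
    return [row for row in rows if counts[row[attribute]] > min_count]


def filter_top_k(rows, attribute, k):
    """Keep rows whose value of `attribute` is among the k most frequent values."""
    counts = Counter(row[attribute] for row in rows)
    top_values = {value for value, _ in counts.most_common(k)}
    return [row for row in rows if row[attribute] in top_values]


# Hypothetical toy data: "a" occurs 3 times, "b" twice, "c" once.
rows = [{"x": "a"}, {"x": "a"}, {"x": "a"}, {"x": "b"}, {"x": "b"}, {"x": "c"}]

frequent = filter_by_min_count(rows, "x", 1)  # drops the single "c" record
top_one = filter_top_k(rows, "x", 1)          # keeps only the "a" records
```

In the original question the threshold would be 50 and k would be 10. Inside RapidMiner itself, the operator named in the answer above is the suggested route.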