Creating model to categorize data

MarlaBot (The Friendly RapidMiner Dog Bot), Community Manager
edited February 8 in Help
A RapidMiner user wants to know the answer to this question: "I have a list of about 120 values that serve as categories. I have to be able to predict which category a value belongs to based on its other attributes. Each value I am training on is associated with one of these categories. I need to create a model that takes the combination of values from the other columns and predicts which category it belongs to. I have tried a decision tree, and it does not seem to be doing very well. There are too many categories and it keeps making poor predictions. Any suggestions? Thank you."

Answers

  • mschmitz, RM Data Scientist
    Hi,
    is there any way to group the 120 classes into a taxonomy?

    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • Telcontar120, Unicorn
    There are probably too many categories, and not enough cases in many of them, for the algorithm to detect patterns all at once. You have a few options:
    • Create groupings of these categories (this is the taxonomy that Martin mentioned above) so you end up with a much smaller number of super-categories and try to build a model to predict those.  Ideally you would have pretty robust counts in each of the super-categories and not too many of them (e.g., 12 would be much better than 120!).
    • Find the dominant categories (once again by count) and create a series of "one vs all other" models.  This would require you to build multiple models but will give you more control over the specific categories selected.
    • Or you could do a hybrid of the two methods above.
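
As a rough illustration of the options above, the label preparation could be sketched in plain Python as follows. This is only a sketch under assumptions: the function names, the `min_count` threshold, the `"OTHER"` catch-all label, and the toy labels are all hypothetical, and the count-based grouping stands in for a real taxonomy, which would normally group categories by meaning rather than by frequency. The resulting label columns would then be fed to whatever learner you use.

```python
from collections import Counter


def dominant_categories(labels, min_count):
    """Return categories with at least min_count cases, most frequent first."""
    counts = Counter(labels)
    return [cat for cat, n in counts.most_common() if n >= min_count]


def collapse_rare(labels, keep, other="OTHER"):
    """Replace every label outside `keep` with one catch-all super-category."""
    keep = set(keep)
    return [lab if lab in keep else other for lab in labels]


def one_vs_rest_targets(labels, keep):
    """Build one binary target list per kept category (True = that category)."""
    return {cat: [lab == cat for lab in labels] for cat in keep}


# Toy labels standing in for the real 120-category column (hypothetical data).
labels = ["A", "A", "A", "B", "B", "C", "D", "A", "B", "E"]

# Threshold of 3 is an arbitrary assumption for this toy example.
keep = dominant_categories(labels, min_count=3)

# Option 1 (count-based stand-in): fewer, better-populated classes.
collapsed = collapse_rare(labels, keep)

# Option 2: one binary target per dominant category, one model each.
targets = one_vs_rest_targets(labels, keep)
```

A hybrid would train one model on `collapsed` and then use the binary targets, or a second model restricted to the rare cases, to refine predictions inside the catch-all group.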
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts