Handling multiple nominal values in one category

e4gle Member Posts: 12 Contributor II
edited July 2019 in Help
Hello,
can I somehow handle data (for instance, with a decision tree model) that has multiple nominal values (separated, let's say, by commas) in one category? For example, category name: tags, values: rapid, miner, datamining, etc.?

Thank you for your help

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi,
    sorry, but I don't understand your question. Could you give an example? What do you mean by "category"?

    Greetings,
      Sebastian
  • e4gle Member Posts: 12 Contributor II
    Well, maybe using the word "category" was unfortunate.

    Let's say I have some files described by some attributes, like "name", "category", "location" and "tags".

    I want to know if I can somehow make this last attribute, "tags", take more than one nominal value.

    For instance:
    name - article1, category - sport, location - New York, tags - knicks, basketball, celtics

    Is it clear enough now? I'm a beginner in data mining and may not express myself clearly.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,749  RM Founder
    Hi,

    you have several options and which one is the best totally depends on what you are planning to do with the data:
    • In general, you could use the operators "Split" and "Merge" to handle those multiple nominal values for one attribute,
    • Sometimes it might be better to handle this attribute with value type "text" and use the text processing operators, e.g. in order to determine how often certain tags are used
    • In some cases, you might simply want to keep the tag collection as it is (maybe sort it) in order to calculate similarities etc. (although even in that case I would probably go for a text processing approach)
    • ...
    Which option is best depends on your goal, but in general you can handle this setting with "Split" and "Merge" and define a separating character like '#' or something else that does not occur in your tags.
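
    To make the "Split" idea concrete outside RapidMiner, here is a minimal plain-Python sketch, assuming the tags arrive as one string per example joined by '#'. It is not RapidMiner's Split/Merge implementation; it only shows the general technique of turning one multi-valued nominal attribute into one binary (true/false) attribute per distinct tag, plus a tag count (the kind of result text processing gives you) and a Jaccard similarity between two tag collections. The rows other than article1 and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical example rows modeled on article1 from this thread;
# article2 and article3 are made up for illustration.
ROWS = [
    {"name": "article1", "category": "sport", "tags": "knicks#basketball#celtics"},
    {"name": "article2", "category": "sport", "tags": "celtics#hockey"},
    {"name": "article3", "category": "tech", "tags": "rapid#miner#datamining"},
]

SEPARATOR = "#"  # assumed: a character that never occurs inside a tag


def split_tags(value, sep=SEPARATOR):
    """Split one multi-valued nominal cell into a set of clean tags."""
    return {t.strip() for t in value.split(sep) if t.strip()}


def multi_hot(rows, sep=SEPARATOR):
    """Replace the 'tags' attribute with one boolean attribute per distinct tag."""
    tag_sets = [split_tags(r["tags"], sep) for r in rows]
    vocabulary = sorted(set().union(*tag_sets))
    encoded = []
    for row, tags in zip(rows, tag_sets):
        # keep every other attribute unchanged
        new_row = {k: v for k, v in row.items() if k != "tags"}
        for tag in vocabulary:
            new_row["tag_" + tag] = tag in tags
        encoded.append(new_row)
    return vocabulary, encoded


def tag_frequencies(rows, sep=SEPARATOR):
    """Count how often each tag is used across all rows."""
    return Counter(t for r in rows for t in split_tags(r["tags"], sep))


def jaccard(a, b):
    """Similarity of two tag collections: |A & B| / |A | B|."""
    return len(a & b) / len(a | b) if a or b else 1.0


if __name__ == "__main__":
    vocab, encoded = multi_hot(ROWS)
    print(vocab)
    print(encoded[0])
    print(tag_frequencies(ROWS).most_common(3))
    print(jaccard(split_tags(ROWS[0]["tags"]), split_tags(ROWS[1]["tags"])))
```

    After this step every example has the same fixed set of attributes, so any standard learner, such as a decision tree, can work with them.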

    Hope that helps at least a bit. Cheers,
    Ingo
  • e4gle Member Posts: 12 Contributor II
    And is there a classification method that can handle multiple values of this "tags" attribute? The problem is not splitting the values of this attribute, but finding a way to handle all of its values.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,749  RM Founder
    Hi,

    well, what's the difference between a classification scheme that can handle this itself and preprocessing the data so that all classification schemes can handle it? Right: with the latter, the more modular approach, you have many more methods to choose from. So I would always go for well-thought-out preprocessing combined with a powerful, already existing classification method.
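
    A minimal plain-Python sketch of that modular approach, assuming the tags have already been split into a set per example: the sets are encoded as binary attributes and handed to a deliberately simple learner, a depth-1 decision tree (a "decision stump"). The training data and function names are hypothetical; in RapidMiner you would instead connect the split attributes to an existing learner such as the Decision Tree operator.

```python
from collections import Counter

# Hypothetical, already split training data: (set of tags, label).
TRAIN = [
    ({"knicks", "basketball", "celtics"}, "sport"),
    ({"celtics", "hockey"}, "sport"),
    ({"rapid", "miner", "datamining"}, "tech"),
    ({"datamining", "python"}, "tech"),
]


def to_features(tags, vocabulary):
    """Preprocessing: one boolean attribute per known tag."""
    return {t: (t in tags) for t in vocabulary}


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(examples):
    """Depth-1 decision tree: choose the tag whose present/absent split
    misclassifies the fewest training examples."""
    vocabulary = sorted(set().union(*(tags for tags, _ in examples)))
    rows = [(to_features(tags, vocabulary), y) for tags, y in examples]
    best = None
    for tag in vocabulary:
        present = [y for x, y in rows if x[tag]]
        absent = [y for x, y in rows if not x[tag]]
        if not present or not absent:
            continue  # this tag does not split the data
        left, right = majority(present), majority(absent)
        errors = sum(y != left for y in present) + sum(y != right for y in absent)
        if best is None or errors < best[0]:
            best = (errors, tag, left, right)
    if best is None:
        # no useful split: always predict the majority label
        label = majority([y for _, y in rows])
        return None, label, label
    _, tag, left, right = best
    return tag, left, right


def predict(stump, tags):
    """Follow the single split of the stump for a new set of tags."""
    tag, if_present, if_absent = stump
    return if_present if tag in tags else if_absent


if __name__ == "__main__":
    stump = fit_stump(TRAIN)
    print(stump)
    print(predict(stump, {"celtics", "nba"}))
    print(predict(stump, {"datamining"}))
```

    The learner itself knows nothing about multi-valued attributes; all of that is handled in the preprocessing step, which is why the stump could be swapped for any other classifier.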

    Cheers,
    Ingo