Handling multiple nominal values in one category

e4gle · December 2010

Hello,
Can I somehow handle (for instance, with a decision tree model) data with multiple nominal values (separated, let's say, by commas) under one category? For example, category name: tags; values: rapid, miner, datamining, etc.?

Thank you for your help.

land · December 2010

Hi,
Sorry, but I don't understand your question. Could you give an example? What do you mean by "category"?

Greetings,
Sebastian

e4gle · December 2010

Well, maybe using the word "category" was unfortunate.

Let's say I have some files described by some attributes, like "name", "category", "location" and "tags".

I want to know if I can somehow make this last attribute, "tags", take more than one nominal value.

For instance:
name: article1, category: sport, location: New York, tags: knicks, basketball, celtics

Is it clear enough now? I'm a beginner in data mining and may not express myself clearly.

IngoRM · December 2010

Hi,

you have several options, and which one is best depends entirely on what you plan to do with the data:

In general, you could use the operators "Split" and "Merge" to handle those multiple nominal values for one attribute.
Sometimes it might be better to give this attribute the value type "text" and use the text processing operators, e.g. in order to determine how often certain tags are used.
In some cases, you might simply want to keep the tag collection as it is (maybe sorted) in order to calculate similarities etc. (although even in that case I would probably go for a text processing approach).
...
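A minimal Python sketch of the second and third options, done outside RapidMiner purely for illustration: it treats the comma-separated tags field as text, counts how often each tag occurs across all records, and compares two records by the Jaccard similarity of their tag sets. The record layout, the comma separator and the function names (`parse_tags`, `tag_frequencies`, `jaccard`) are hypothetical; RapidMiner's own text processing operators work differently and offer more options.

```python
from collections import Counter

def parse_tags(field, sep=","):
    """Split a raw tags field such as 'knicks, basketball, celtics' into clean tags."""
    return [t.strip().lower() for t in field.split(sep) if t.strip()]

def tag_frequencies(records):
    """Count how many times each tag occurs over all records."""
    counts = Counter()
    for rec in records:
        counts.update(parse_tags(rec["tags"]))
    return counts

def jaccard(tags_a, tags_b):
    """Similarity of two tag collections: |A & B| / |A | B| (1.0 if both are empty)."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical example records, modeled on the article example in this thread.
records = [
    {"name": "article1", "tags": "knicks, basketball, celtics"},
    {"name": "article2", "tags": "lakers, basketball"},
    {"name": "article3", "tags": "rapid, miner, datamining"},
]

freq = tag_frequencies(records)
print(freq.most_common(1))  # [('basketball', 2)]

# Sorting gives a canonical form of each tag collection; jaccard() itself
# works on sets, so its result does not depend on the order.
t1 = sorted(parse_tags(records[0]["tags"]))
t2 = sorted(parse_tags(records[1]["tags"]))
print(t1)                          # ['basketball', 'celtics', 'knicks']
print(round(jaccard(t1, t2), 2))   # 0.25
```

The two articles share only "basketball" out of four distinct tags, hence a similarity of 0.25.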

Which option is best depends on your goal, but in general you can handle this setting with "Split" and "Merge" and define a separator character like '#' or something else that does not occur in your tags.
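The split idea can be sketched in plain Python, assuming the tags are joined with '#' as suggested: the tags field is split at the separator and replaced by one true/false attribute per distinct tag, so every record ends up with the same fixed set of columns that any learner can use; a small merge step turns those columns back into one field. This only illustrates the general preprocessing idea and is not a reproduction of what the "Split" and "Merge" operators do internally; the `tag_` column prefix and both function names are hypothetical.

```python
def split_tags_to_binary(records, field="tags", sep="#"):
    """Replace a multi-valued nominal field by one binary attribute per distinct tag."""
    # Collect all distinct tags so every record gets the same set of columns.
    vocabulary = sorted({t.strip() for rec in records
                         for t in rec[field].split(sep) if t.strip()})
    result = []
    for rec in records:
        present = {t.strip() for t in rec[field].split(sep) if t.strip()}
        # Keep all other attributes unchanged, drop the original tags field.
        row = {k: v for k, v in rec.items() if k != field}
        for tag in vocabulary:
            row["tag_" + tag] = tag in present
        result.append(row)
    return vocabulary, result

def merge_binary_to_tags(row, sep="#"):
    """Inverse step: join the tag_* attributes that are True back into one field."""
    return sep.join(k[len("tag_"):] for k, v in row.items()
                    if k.startswith("tag_") and v)

records = [
    {"name": "article1", "category": "sport", "tags": "knicks#basketball#celtics"},
    {"name": "article2", "category": "sport", "tags": "lakers#basketball"},
]
vocab, rows = split_tags_to_binary(records)
print(vocab)    # ['basketball', 'celtics', 'knicks', 'lakers']
print(rows[1])
# {'name': 'article2', 'category': 'sport', 'tag_basketball': True,
#  'tag_celtics': False, 'tag_knicks': False, 'tag_lakers': True}
print(merge_binary_to_tags(rows[1]))  # basketball#lakers
```

Choosing a separator that never occurs inside a tag matters here: a tag containing '#' would silently be cut into two columns.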

Hope that helps at least a bit. Cheers,
Ingo

e4gle · January 2011

And is there a classification method that can handle multiple values of this "tags" attribute? The problem is not splitting the values of this attribute, but finding a way to handle all of its values.

IngoRM · January 2011

Hi,

Well, what's the difference between a classification scheme that can handle this itself and preprocessing the data so that all classification schemes can handle it? Right: with the latter, the more modular option, you have many more options to choose from. So I would always go for well-thought-out preprocessing combined with a powerful, already existing classification method.
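To illustrate the modular approach, here is a minimal Python sketch under stated assumptions: the tag lists are first turned into fixed-length 0/1 vectors (multi-hot encoding), and only then handed to an ordinary learner that knows nothing about multi-valued attributes. The learner here is a deliberately tiny one-level decision stump written for this example; it merely stands in for whatever existing classifier (e.g. a decision tree) you would actually use. The data, labels and function names are all hypothetical.

```python
from collections import Counter

def encode(tag_lists, vocabulary):
    """Multi-hot encoding: one 0/1 feature per vocabulary tag (unknown tags are ignored)."""
    rows = []
    for tags in tag_lists:
        present = set(tags)
        rows.append([1 if tag in present else 0 for tag in vocabulary])
    return rows

def majority(labels, default):
    """Most frequent label, or the default if the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(X, y):
    """Pick the single binary feature whose split gives the highest training accuracy."""
    overall = majority(y, None)
    best = None
    for j in range(len(X[0])):
        on = [lab for row, lab in zip(X, y) if row[j] == 1]
        off = [lab for row, lab in zip(X, y) if row[j] == 0]
        pred_on, pred_off = majority(on, overall), majority(off, overall)
        correct = (sum(lab == pred_on for lab in on)
                   + sum(lab == pred_off for lab in off))
        if best is None or correct > best[0]:
            best = (correct, j, pred_on, pred_off)
    _, j, pred_on, pred_off = best
    return lambda row: pred_on if row[j] == 1 else pred_off

# Hypothetical training data: tag lists with a class label per record.
tag_lists = [["knicks", "basketball", "celtics"], ["lakers", "basketball"],
             ["rapid", "miner", "datamining"], ["datamining", "weka"]]
labels = ["sport", "sport", "software", "software"]

vocabulary = sorted({t for tags in tag_lists for t in tags})
predict = fit_stump(encode(tag_lists, vocabulary), labels)

print(predict(encode([["basketball", "nets"]], vocabulary)[0]))  # sport
print(predict(encode([["weka"]], vocabulary)[0]))                # software
```

Because the preprocessing produces a plain table of binary attributes, the stump could be swapped for any other classifier without changing the encoding step, which is the point of keeping the two parts separate.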

Cheers,
Ingo

Altair RapidMiner Community