# [SOLVED] Transform integer into nominal?

Hi all,

I've got a quick question. Sorry if this is a dumb one:

I'm trying to build an SVM classification process and I'm using the Import Wizard for Excel files. The labels for my examples are '1' and '2'.

So the wizard recognizes them correctly as integers.

Now I'm wondering:

Will I run into any trouble later on with SVMs if I force the wizard to transform the labels '1' and '2' into binominal ones?

Will the SVM work correctly?

Thanks a lot in advance,

Sasch.

## Answers

Happy Mining!

~Marius

Thanks a lot for your helpful answer.

I assume this will also work for labels '1', '2', '3' and '4' when I transform them into polynominal ones.

Thanks again,

Sasch.

I know that standard SVMs are designed for binary tasks only.

But I'm using LibSVM, which supports multiclass learning (?). Can't it handle my four classes with the labels '1', '2', '3' and '4' (transformed into polynominal)?

Do I have to rename my labels to real nominal values like "class_one", "class_two" etc. in that case?

Or should I just let the wizard recognize them as integers? Won't it affect the SVM when the labels are integers?

Sorry if I don't get it...
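For reference, LIBSVM does handle more than two classes: it trains one binary SVM per pair of classes (the "one-against-one" approach) and lets those pairwise models vote. The sketch below shows only that decomposition and the voting step. The pairwise "classifiers" are hypothetical stand-ins (plain nearest-center functions), not trained SVMs and not LibSVM's API.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

# A pairwise decision function: given a feature vector, return one of its two classes.
PairwiseClassifier = Callable[[Sequence[float]], str]


def class_pairs(classes: List[str]) -> List[Tuple[str, str]]:
    """All k*(k-1)/2 unordered class pairs; one binary SVM is trained per pair."""
    return list(combinations(classes, 2))


def one_vs_one_predict(
    x: Sequence[float],
    classifiers: Dict[Tuple[str, str], PairwiseClassifier],
) -> str:
    """Let every pairwise classifier vote and return the class with the most votes.

    Ties go to the class that was voted for first (Counter ordering);
    LIBSVM's own tie-breaking rule may differ.
    """
    votes = Counter(clf(x) for clf in classifiers.values())
    return votes.most_common(1)[0][0]


def toy_nearest_center_classifiers(
    centers: Dict[str, float],
) -> Dict[Tuple[str, str], PairwiseClassifier]:
    """Hypothetical stand-in for trained binary SVMs: each pair picks the class
    whose 1-D center is closer to x[0]."""
    classifiers: Dict[Tuple[str, str], PairwiseClassifier] = {}
    for a, b in class_pairs(list(centers)):
        # Default arguments bind a and b per pair (avoids late-binding closures).
        classifiers[(a, b)] = lambda x, a=a, b=b: (
            a if abs(x[0] - centers[a]) <= abs(x[0] - centers[b]) else b
        )
    return classifiers


if __name__ == "__main__":
    clfs = toy_nearest_center_classifiers({"1": 0.0, "2": 1.0, "3": 2.0, "4": 3.0})
    print(len(clfs))                        # 6 pairwise models for 4 classes
    print(one_vs_one_predict([2.1], clfs))  # 3
```

With four classes this gives six binary models. Note that the class names are only used as dictionary keys, so '1' and 'class_one' would behave identically.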

Nevertheless, it is important that you have a nominal label, so do NOT let the wizard create integer-typed values. Using the strings '1', '2' etc. is fine, however: RapidMiner does not care about the content of strings.
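A minimal sketch of why the content of the strings does not matter, assuming (for illustration only) that a nominal label is handled as an index into its list of distinct values. `encode_nominal` is a hypothetical helper, not RapidMiner code: '1'/'2' and 'class_one'/'class_two' produce exactly the same partition of the examples.

```python
from typing import Dict, List, Tuple


def encode_nominal(values: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each distinct label string to an index, in order of first appearance."""
    mapping: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)  # next free index
        codes.append(mapping[value])
    return codes, mapping


if __name__ == "__main__":
    digits = ["1", "2", "2", "1"]
    names = ["class_one", "class_two", "class_two", "class_one"]
    # Same index codes, so a classifier sees the same problem either way.
    print(encode_nominal(digits)[0])  # [0, 1, 1, 0]
    print(encode_nominal(names)[0])   # [0, 1, 1, 0]
```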

Best regards,

Marius

Thanks again for taking the time to answer me.

Just for the record:

When using SVM:

- for binary tasks: you can transform your real number or integer labels like 1 & 2 into nominal

- for multiclass learning: you can transform your real number or integer labels like 1, 2, 3, 4 etc. into polynominal

=> but NEVER use integers as labels when using SVMs. Always use nominal labels.
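The binary/multiclass rule of thumb in this summary can be written down as a tiny helper. This is a hypothetical sketch that only counts distinct label values; it assumes the labels are already strings and does not query RapidMiner's own type system.

```python
from typing import Iterable


def nominal_label_type(labels: Iterable[str]) -> str:
    """Pick the nominal subtype for a label column from its distinct values.

    Two classes -> binominal, more than two -> polynominal.
    """
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("a classification label needs at least two classes")
    return "binominal" if len(distinct) == 2 else "polynominal"


if __name__ == "__main__":
    print(nominal_label_type(["1", "2", "1"]))       # binominal
    print(nominal_label_type(["1", "2", "3", "4"]))  # polynominal
```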

It seems a bit weird to me, though: I always thought that when I click "label" in the attribute list, RM knows what to do with it and treats the label as a label rather than as a feature that is used for building the classification model.

So what I'm trying to say is: why does the SVM have problems with integer or real labels when the labels are just markers for the classes?

You can use "Generate Attributes" with `att = str(att)` to convert your numerical attribute to a nominal attribute.
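Outside RapidMiner, the same idea as `att = str(att)` looks like this in plain Python. It is a sketch with hypothetical names (rows as dicts, `convert_attribute_to_nominal`), not how Generate Attributes is implemented. One design choice is specific to this sketch: integral floats such as `2.0` (common when numbers come from a spreadsheet) are written as `'2'` rather than `'2.0'`, so the class names stay clean.

```python
from typing import Any, Dict, List


def to_nominal_string(value: Any) -> str:
    """Convert one numeric value to a label string ('2.0' becomes '2')."""
    if isinstance(value, float) and value.is_integer():
        return str(int(value))
    return str(value)


def convert_attribute_to_nominal(rows: List[Dict[str, Any]], att: str) -> None:
    """In-place analogue of att = str(att): overwrite one column with strings."""
    for row in rows:
        row[att] = to_nominal_string(row[att])


if __name__ == "__main__":
    rows = [{"x": 0.3, "label": 1}, {"x": 0.7, "label": 2.0}]
    convert_attribute_to_nominal(rows, "label")
    print([r["label"] for r in rows])  # ['1', '2']
```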

You are suggesting that this conversion could have been part of the SVM operator.

Maybe, but for me it's fine the way it is now.

When it complains that it can only handle a nominal class, there is most likely an error somewhere in my setup.

So the fact that it throws an error is informative.

Best regards,

Wessel

Thanks for your suggestion.

I'm fine with all the answers here right now, and I really appreciate your and Marius' help.

But still (now I'm very interested):

Why does the SVM have problems with integer or real labels when the labels are just markers for the classes?

Why do they have to be nominal (or binominal or polynominal)?

Sorry if this goes way too deep into SVM algorithm understanding or its integration into RM...

That has nothing to do with the SVM algorithm, but with the general classes of problems in machine learning. There are two main ones:

- *Classification problems* describe the case where you have a set of categories and want to create a model that decides which category a new data point/example falls into.

- *Regression problems* include anything where your target variable is a numeric value, e.g. an integer or a real value. Again, you learn a prediction model, which here estimates the numeric target value for new data points.

RapidMiner decides which problem you have by examining the data type of the label: if it's numeric, it assumes a regression task; if it's nominal, it assumes a classification task.
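A minimal sketch of that rule. The `ValueType` enum is hypothetical and only mirrors the type names used in this thread. The point it illustrates is that the decision looks at the label's declared type, not at what the values mean, so integer labels 1 and 2 are treated as a regression target even if they are meant as class markers.

```python
from enum import Enum


class ValueType(Enum):
    INTEGER = "integer"
    REAL = "real"
    NOMINAL = "nominal"
    BINOMINAL = "binominal"
    POLYNOMINAL = "polynominal"


NUMERIC_TYPES = {ValueType.INTEGER, ValueType.REAL}


def task_for_label(label_type: ValueType) -> str:
    """Decide the learning task from the declared label type alone."""
    return "regression" if label_type in NUMERIC_TYPES else "classification"


if __name__ == "__main__":
    print(task_for_label(ValueType.INTEGER))    # regression
    print(task_for_label(ValueType.BINOMINAL))  # classification
```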

Since the SVM is a pure classification algorithm(*), RapidMiner throws an error if you try to apply it on a data set with a numeric label.

(*) There is a special implementation of the SVM which can also solve regression tasks, but I think that should be out of the scope of this thread.
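To illustrate Wessel's point that an explicit error is more useful than a silent conversion, here is a hypothetical capability check a pure classification learner could run before training. The names and the message text are invented for this sketch; RapidMiner's actual check and error message differ.

```python
NOMINAL_TYPES = frozenset({"nominal", "binominal", "polynominal"})


class LabelTypeError(TypeError):
    """Raised when a learner cannot handle the label's type."""


def check_classification_label(learner: str, label_type: str) -> None:
    """Fail early, with an informative message, if the label is not nominal."""
    if label_type not in NOMINAL_TYPES:
        raise LabelTypeError(
            f"{learner} can only handle a nominal label, got {label_type!r}; "
            "convert the label to nominal first"
        )


if __name__ == "__main__":
    check_classification_label("SVM", "binominal")  # passes silently
    try:
        check_classification_label("SVM", "integer")
    except LabelTypeError as err:
        print(err)
```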

~Marius

Thank you both so much for your help!!!

It has to be said that the support, suggestions, ideas & explanations on this board are really awesome.

You guys do a great job!

Best regards,

Sasch