[SOLVED] Transform integer into nominal?

Sasch Member Posts: 23  Maven
edited July 8 in Help
Hi all,
I've got a quick question. Sorry if this is a dumb one:

I'm trying to build an SVM classification process and I'm using the Import wizard for Excel files. The labels for my examples are '1' and '2'.
So the wizard recognizes them correctly as integers.
Now I'm wondering:
Will I run into any trouble later on with SVMs if I force the wizard to transform the labels '1' and '2' into binominal ones?
Will the SVM work correctly?

Thanks a lot in advance,
Sasch.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    The SVM will work correctly if your label really has only the two values 1 and 2.


    Happy Mining!
    ~Marius
  • Sasch Member Posts: 23  Maven
    Hi Marius,

    thanks a lot for your helpful answer :)

    I assume this will also work for labels '1', '2', '3' and '4' when I transform them into polynominal ones.

    Thanks again,
    Sasch.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    Not for the SVM, since it only supports binominal labels ;)
  • Sasch Member Posts: 23  Maven
    Why is that?

    I know that regular SVMs are designed for binary tasks only.
    But I'm using the libSVM, which supports multiclass learning (?). Can't it handle my four classes with the labels '1', '2', '3' and '4' (transformed into polynominal)?

    Do I have to rename my labels to real nominals like "class_one", "class_two" etc. in the case mentioned above?

    Or should I just let them be recognized as integers by the wizard? Won't it affect the SVM when the labels are integers?

    Sorry if I don't get it...
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    Alas, thou art right about the multi-class libSVM! Sorry for the confusion.

    Nevertheless it is important that you have a nominal label, so do NOT let the wizard create integer-typed values. Using the strings '1', '2' etc. is fine, however - RapidMiner does not care about the content of strings.

    Best regards,
    Marius
  • Sasch Member Posts: 23  Maven
    Hi Marius,

    thanks again for taking the time to answer me.

    Just for the record:

    When using SVM:
    - for binary tasks: you can transform your real number or integer labels like 1 & 2 into nominal
    - for multiclass learning: you can transform your real number or integer labels like 1, 2, 3, 4 etc. into polynominal

    => but NEVER use integers as labels when using SVMs. Always use nominal labels.

    It still seems a bit weird to me, though: I always thought that when I click "label" in the attribute list, RM knows what to do with it and treats it as a 'label', not as a feature that is taken into account when building the classification model.
    So what I'm trying to say is: why does the SVM have problems with integer or real labels when the labels are just markers for the classes?
  • wessel Member Posts: 537  Guru
    Hey,

    You can use "Generate Attributes" @ att = str(att) to convert your numerical attribute to a nominal attribute.

    You are suggesting that this conversion could have been part of the SVM operator.
    Maybe, but for me it's fine the way it is now.
    When it complains it can only handle a nominal class, there is most likely an error in my setup somewhere.
    So the fact that it throws an error is informative.

    Best regards,

    Wessel
  • Sasch Member Posts: 23  Maven
    Hey Wessel,
    thanks for your suggestion.

    I'm fine with all the answers here right now and I really appreciate your and Marius' help :)

    But still (now I'm very interested):
    Why does the SVM have problems with integer or real labels when the labels are just markers for the classes?
    Why do they have to be nominal (or binominal or polynominal)?

    Sorry if this goes way too deep into SVM algorithm understanding or its integration into RM...
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    Hi,

    that has nothing to do with the SVM algorithm, but with the general problem classes in Machine Learning. There are two main problem classes:
    - *Classification Problems* denote the case where you have a set of categories, and you want to create a model which decides into which category a new data point/example falls.
    - *Regression Problems* include anything where your target variable is a numeric value, e.g. an integer or a real number. Again you learn a prediction model, which estimates the numeric target value for new data points.

    RapidMiner decides which problem you have by examining the data type of the label - if it's numeric, it assumes a regression task, if it's nominal, it assumes a classification task.

    Since the SVM is a pure classification algorithm(*), RapidMiner throws an error if you try to apply it to a data set with a numeric label.



    (*) There is a special implementation of the SVM which can also solve regression tasks, but I think that should be out of the scope of this thread.
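The rule described above (numeric label means regression, nominal label means classification) can be sketched in a few lines of Python. This is only an illustration of the idea, not RapidMiner's actual implementation; the function names `infer_task` and `check_classifier_label` are hypothetical and do not exist in RapidMiner.

```python
from numbers import Number


def infer_task(label_values):
    """Guess the learning task from the label's data type (illustrative only)."""
    values = list(label_values)
    if not values:
        raise ValueError("label column is empty")
    # bool is a subclass of int in Python; treat it as categorical here.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "regression"
    return "classification"


def check_classifier_label(label_values):
    """Mimic a classification-only learner that rejects a numeric label."""
    values = list(label_values)
    if infer_task(values) != "classification":
        raise TypeError("this learner needs a nominal label, got a numeric one")
    # Return the distinct classes the learner would have to separate.
    return sorted(set(values))


print(infer_task([1, 2, 1, 2]))          # regression
print(infer_task(["1", "2", "1", "2"]))  # classification
```

The same class codes give a different task depending only on their type: the integers 1 and 2 look like a regression target, while the strings '1' and '2' look like two categories.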
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    wessel wrote:
    You can use "Generate Attributes" @ att = str(att) to convert your numerical attribute to a nominal attribute.
    Hi, just as a remark to ease your life: that's the same thing the Numerical to Polynominal operator does, but the latter saves you some typing :)

    ~Marius
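For readers outside RapidMiner, the conversion discussed here (turning numeric class codes into strings so they are read as categories) might look roughly like this in Python. It is a sketch under assumptions: `numerical_to_nominal` and `nominal_kind` are hypothetical helper names, and mapping 2.0 to '2' is a choice made for this example, not documented RapidMiner behavior.

```python
def numerical_to_nominal(values):
    """Turn numeric class codes into string categories (illustrative only)."""
    nominal = []
    for v in values:
        # Assumption of this sketch: 2.0 and 2 should end up as the same class "2".
        if isinstance(v, float) and v.is_integer():
            v = int(v)
        nominal.append(str(v))
    return nominal


def nominal_kind(nominal_values):
    """Name the label type by its number of distinct categories."""
    distinct = set(nominal_values)
    return "binominal" if len(distinct) == 2 else "polynominal"


labels = numerical_to_nominal([1, 2, 3, 4, 2.0])
print(labels)                # ['1', '2', '3', '4', '2']
print(nominal_kind(labels))  # polynominal
```

With only the codes 1 and 2 the result would count as binominal; with 1 to 4 it is polynominal, which matches the summary given earlier in the thread.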
  • Sasch Member Posts: 23  Maven
    Ah, now I get it :)

    Thank you both so much for your help !!!

    It has to be said that the support, suggestions, ideas & explanations in this board are really awesome.
    You guys do a great job!

    Best regards,
    Sasch