Polynomial classification using SVM

Shagu Member Posts: 5 Contributor I
edited November 2018 in Help

I keep getting errors when using SVM for polynomial classification. I am quite new to data analytics. Any help would be appreciated.

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi and welcome to our community,

     

    The message means that you are trying to predict a label (or "target") with more than two categorical values (which is called "polynominal" in RapidMiner). The SVM you are using does not support this type of data. Try the operator "SVM (LibSVM)" instead, which can handle it.

     

    You can check which types of data a machine learning model supports by right-clicking the operator and selecting "Operator Info". You will see a table describing the supported data types.

     

    Another useful resource is the following web page: http://mod.rapidminer.com

     

    There you can describe your data, and it will show you the model types that can be used on that data.

     

    Hope that helps,

    Ingo

  • Shagu Member Posts: 5 Contributor I

    Ingo, thank you very much! This is very helpful. Following your instructions, there seems to be no logistic regression for polynominal labels. Am I missing something needed to use logistic regression for multi-class classification?

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder

    Hi,

     

    This is correct. Logistic regression can only do binominal classification (i.e. for two classes only). However, you can always embed any binominal learner into the ensemble operator "Polynominal by Binominal Classification", which turns the polynominal classification problem into a set of binominal classification problems following either a 1-vs-1 or a 1-vs-all strategy.

     

    Below is a process which shows you how to do that.

     

    Hope this helps,

    Ingo

     

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="7.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.3.001" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="polynomial_by_binomial_classification" compatibility="7.3.001" expanded="true" height="82" name="Polynominal by Binominal Classification" width="90" x="179" y="34">
            <process expanded="true">
              <operator activated="true" class="h2o:logistic_regression" compatibility="7.3.000" expanded="true" height="103" name="Logistic Regression" width="90" x="45" y="34"/>
              <connect from_port="training set" to_op="Logistic Regression" to_port="training set"/>
              <connect from_op="Logistic Regression" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Polynominal by Binominal Classification" to_port="training set"/>
          <connect from_op="Polynominal by Binominal Classification" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
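
The 1-vs-all and 1-vs-1 strategies mentioned in the post above can be illustrated outside RapidMiner with a small Python sketch. This is not how the "Polynominal by Binominal Classification" operator is implemented; the function names `one_vs_all` and `one_vs_one` and the toy labels are hypothetical, and the sketch only shows how one polynominal label is split into several binominal problems.

```python
from itertools import combinations


def one_vs_all(labels):
    """Build one binary labeling per class: that class vs. everything else."""
    classes = sorted(set(labels))
    return {c: [lab == c for lab in labels] for c in classes}


def one_vs_one(labels):
    """Build one binary subproblem per pair of classes.

    Each subproblem keeps only the rows that belong to one of the two
    classes, stored as (row index, is_first_class_of_pair).
    """
    classes = sorted(set(labels))
    problems = {}
    for a, b in combinations(classes, 2):
        problems[(a, b)] = [
            (i, lab == a) for i, lab in enumerate(labels) if lab in (a, b)
        ]
    return problems


# Toy label column with four classes (hypothetical data).
labels = ["a", "b", "c", "d", "a", "b"]
ova = one_vs_all(labels)
ovo = one_vs_one(labels)

print(len(ova))  # one binary problem per class
print(len(ovo))  # one binary problem per pair of classes
```

With k classes, 1-vs-all trains k binominal models on all rows, while 1-vs-1 trains k(k-1)/2 models, each on a smaller subset of the rows.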
  • Shagu Member Posts: 5 Contributor I

    Thanks, Ingo. This is very helpful. I got a better understanding after reading the rapidminer-studio-operator-reference document. Thanks again!

  • Shagu Member Posts: 5 Contributor I

    Ingo,

    A similar question came up with Sample (Bootstrapping). There seems to be no way to define different multipliers for different classes. For example, class 1 has 10 data points and class 2 has 5 data points. I want to duplicate the class 2 data points so that there are 10 of them, the same as class 1. I cannot do this with Sample (Bootstrapping). I don't want to downsample by using the Sample operator with the ratio parameter, because the number of data points is already very small, i.e. 10. I need to use all of the data. Is there any other operator available, or should I duplicate class 2 manually? Thanks!

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    @Shagu, you should consider using the "generate weight" operator instead, which will generate weights to balance the classes and does not discard any data. It is roughly equivalent to duplicating under-represented examples, but not as messy. You just have to check that whatever learning algorithm you are using is able to handle weighted examples. Unfortunately, the native RapidMiner logistic regression operator does not, but the very similar logistic regression operator from the Weka extension does. (You can check this by pressing F1 when selecting any learning operator in your process.)

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
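
To make the weighting idea concrete, here is a minimal Python sketch using the 10-versus-5 example from the question. The formula (total count divided by number of classes times class count) is one common balancing scheme and is an assumption here; the thread does not say which formula the RapidMiner operator uses, and `balancing_weights` is a hypothetical helper name.

```python
from collections import Counter


def balancing_weights(labels):
    """Return one weight per example so that every class has the same total weight.

    Assumed scheme: weight = n / (k * count_of_class), where n is the number
    of examples and k is the number of classes.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    per_class = {c: n / (k * cnt) for c, cnt in counts.items()}
    return [per_class[lab] for lab in labels]


# Example from the thread: class 1 has 10 data points, class 2 has 5.
labels = ["class1"] * 10 + ["class2"] * 5
weights = balancing_weights(labels)

print(weights[0], weights[-1])              # weight of a class1 row, then of a class2 row
print(sum(weights[:10]), sum(weights[10:]))  # total weight per class
```

Each class 2 example counts twice as much as a class 1 example, which has the same effect on the class totals as duplicating every class 2 row once, without changing the number of rows.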
  • Shagu Member Posts: 5 Contributor I

    Thank you, Telcontar120. I work in an engineering field where data are much more limited than in finance or insurance, because every data point is costly to obtain. I feel that Naive Bayes is the best model, because it is simple and stable when the number of data points is small. Is this just my intuition, or is there a mathematical theory behind it? Thanks again!

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    I am not really a statistical theoretician, so I can't say for sure.  My experience is that determining which learning algorithm works best is highly contextual based on the dataset you are working with.  Regardless of the specific algorithm chosen, using standard model validation approaches such as cross-validation will be an important part of ensuring that your final model is robust.  Also choosing a simpler final model will generally help it to be more robust over time.

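
As a rough illustration of the cross-validation idea mentioned above, the following Python sketch splits example indices into k folds so that every example is used for testing exactly once. It is a simplified sketch, not RapidMiner's implementation: `k_fold_indices` is a hypothetical helper, and it does no shuffling or stratification, which you would normally want on a small, class-ordered dataset.

```python
def k_fold_indices(n_examples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_examples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 15 examples (as in the 10 + 5 example from the thread), 5 folds.
folds = list(k_fold_indices(15, 5))
for train, test in folds:
    print(len(train), len(test))
```

A model would be trained on each `train` part and scored on the matching `test` part, and the k scores averaged to estimate how well it generalizes.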