
"SVM Models - Target Variable with more than two levels"

HamsterDR (Member; Posts: 3; Contributor I)
edited June 2019 in Help
I have a data mining problem in which the target variable has four levels. I have used an SVM model in Statistica that works very well for my data and supports the four-level target variable. I am just starting out with RapidMiner, and it looks like all the SVM models in RapidMiner only support binary target variables. Is that the case? I think the libSVM implementation supports more than two levels (that is what Statistica uses), but the description of the LibSVM operator in RapidMiner still seems to say that it only supports binary target variables. If this capability is not available now, is it planned for the future?

David

Answers

  • awchisholm (RapidMiner Certified Expert, Member; Posts: 458; Unicorn)
    Hello

    It works fine with labels that have more than two nominal values.

    Here's an example using the Iris data set, whose label has three classes:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve Iris" width="90" x="112" y="75">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
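          <!-- Cross-validation: the first subprocess trains the model, the second applies and evaluates it -->
          <operator activated="true" class="x_validation" compatibility="5.3.007" expanded="true" height="112" name="Validation" width="90" x="313" y="75">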
            <process expanded="true">
              <!-- LibSVM learner with default parameters; it accepts the three-class Iris label directly -->
              <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.007" expanded="true" height="76" name="SVM" width="90" x="179" y="30">
                <list key="class_weights"/>
              </operator>
              <connect from_port="training" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="5.3.007" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.3.007" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
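The process above works because the libsvm library handles multiclass labels itself: its documented strategy is one-against-one, training one binary SVM for each pair of classes, k(k-1)/2 in total (6 for a four-level target), and predicting by majority vote. A minimal Python sketch of that voting step, assuming a hypothetical `pairwise_decide` function in place of real trained SVMs (the 1-D "nearest centre" rule below is purely illustrative):

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(classes, pairwise_decide, x):
    """Predict a class for x by majority vote over all pairwise binary decisions.

    pairwise_decide(a, b, x) is a hypothetical stand-in for a trained binary
    SVM separating class a from class b; it must return either a or b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(a, b, x)] += 1
    # max() returns the first class with the highest count, so ties go to the
    # class listed earliest in `classes`.
    return max(classes, key=lambda c: votes[c])


# Hypothetical toy "binary classifier": each class has a 1-D centre and the
# pairwise decision picks whichever of the two centres is closer to x.
CENTRES = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}


def nearest_centre(a, b, x):
    return a if abs(x - CENTRES[a]) <= abs(x - CENTRES[b]) else b


labels = list(CENTRES)                         # four target levels
n_pairs = len(list(combinations(labels, 2)))   # k*(k-1)/2 binary problems
print(n_pairs)                                 # 6
print(one_vs_one_predict(labels, nearest_centre, 2.2))  # C
```

With x = 2.2, class C wins three of its pairwise contests and D wins two, so C is predicted; the binary learner never has to see more than two classes at once.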


    regards

    Andrew

  • HamsterDR (Member; Posts: 3; Contributor I)
    I don't think so. This is what I got when I ran the LibSVM operator on a data set with a four-level target variable:

    Apr 16, 2013 8:11:21 PM SEVERE: Process failed: The operator SVM does not have sufficient capabilities for the given data set: polynominal attributes not supported

    David
  • awchisholm (RapidMiner Certified Expert, Member; Posts: 458; Unicorn)
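    You've probably got nominal attributes among the non-target (regular) variables. The message "polynominal attributes not supported" is about those regular attributes, not the label. What does the metadata of the input example set look like before the SVM?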
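If some regular attributes are nominal, one common remedy is to replace each of them with numeric 0/1 indicator (dummy) columns before the SVM; in RapidMiner this is usually done with the Nominal to Numerical operator. A minimal Python sketch of the idea, with hypothetical attribute names and values, and a column-naming scheme chosen only for illustration:

```python
def dummy_code(rows, attribute):
    """Replace one nominal attribute with numeric 0/1 indicator columns.

    rows: list of dicts (one per example); attribute: name of the nominal
    regular attribute to encode. Other attributes, including the label,
    are copied unchanged.
    """
    values = sorted({row[attribute] for row in rows})  # distinct nominal values
    coded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != attribute}
        for value in values:
            # one indicator column per value: 1.0 if the example has it, else 0.0
            new_row[f"{attribute} = {value}"] = 1.0 if row[attribute] == value else 0.0
        coded.append(new_row)
    return coded


# Hypothetical example: "region" is a polynominal regular attribute,
# "label" is the target (left nominal, which LibSVM accepts).
rows = [
    {"region": "north", "income": 42.0, "label": "L1"},
    {"region": "south", "income": 37.5, "label": "L3"},
    {"region": "east", "income": 51.0, "label": "L2"},
]
for row in dummy_code(rows, "region"):
    print(row)
```

Note that every distinct value becomes its own column, so attributes with many levels can noticeably widen the data set.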

    Andrew
  • HamsterDR (Member; Posts: 3; Contributor I)
    I got that message on my home PC, which has 16 GB of RAM (the process was using 12 GB). On my work laptop (4 GB) I can't even read in the data without running out of memory. It looks to me like the system is trying to keep everything in memory. This is not a big data set (9,100 observations and 423 variables), so that is surprising. The original data is in SAS format, but the SAS import step fails (I have reported the bug), so I had to save it as an Excel file to get RapidMiner to read it.
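A quick back-of-the-envelope check supports the point that the table itself is small. The sketch below assumes every value is held as an 8-byte double with no per-value overhead; that is an assumption, and the real in-memory representation plus any copies made during import will add more:

```python
# Back-of-the-envelope size of the data set described above.
rows, columns = 9100, 423
cells = rows * columns              # total number of values in the table
bytes_as_doubles = cells * 8        # ASSUMPTION: 8-byte double per value, no overhead
print(f"{cells:,} values, roughly {bytes_as_doubles / 2**20:.1f} MiB")
# prints: 3,849,300 values, roughly 29.4 MiB
```

Even allowing for generous overhead, that is orders of magnitude below the 12 GB observed, which suggests the memory is going somewhere other than the raw values.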

    I think I am getting way ahead of myself here: I am new to RapidMiner and need to start with some simpler examples. I just got the book "Data Mining for the Masses" by Matthew North and will work through its examples to get started.

    David
  • awchisholm (RapidMiner Certified Expert, Member; Posts: 458; Unicorn)
    1. Select the SVM operator, right-click it and choose Breakpoint Before (Shift+F7).
    2. Run the process.
    3. When it stops at the breakpoint, go to the Meta Data View.
    4. Check the role and type of each attribute:
       - One attribute should have the label role and be of type nominal.
       - All the remaining regular attributes must be numeric (integer or real).

    If this checks out, LibSVM will work.
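The role and type checks above can be written down as a small function. This is a sketch only, assuming the metadata has been copied by hand into a list of dicts with hypothetical keys `name`, `role` and `type` (this is not a RapidMiner API); attributes with other special roles such as id are not checked:

```python
NUMERIC_TYPES = {"numeric", "integer", "real"}
NOMINAL_TYPES = {"nominal", "binominal", "polynominal"}


def check_for_libsvm(attributes):
    """Return a list of problems; an empty list means the checks pass."""
    problems = []
    labels = [a for a in attributes if a["role"] == "label"]
    if len(labels) != 1:
        problems.append(f"expected exactly one label attribute, found {len(labels)}")
    for a in labels:
        # the label may have any number of nominal values
        if a["type"] not in NOMINAL_TYPES:
            problems.append(f"label {a['name']!r} has type {a['type']}, expected nominal")
    for a in attributes:
        # every regular attribute must be numeric for LibSVM
        if a["role"] == "regular" and a["type"] not in NUMERIC_TYPES:
            problems.append(f"regular attribute {a['name']!r} has type {a['type']}, must be numeric")
    return problems


# Hypothetical metadata resembling the failing case: a four-level label is
# fine, but one regular attribute is still polynominal.
meta = [
    {"name": "target", "role": "label", "type": "polynominal"},
    {"name": "age", "role": "regular", "type": "integer"},
    {"name": "region", "role": "regular", "type": "polynominal"},
]
print(check_for_libsvm(meta))
```

Here the only reported problem is the polynominal `region` attribute, matching the "polynominal attributes not supported" message rather than anything about the label.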

    As for the SAS import issue, how big is the raw data file?

    regards

    Andrew