What type of label is supported for LibSVM one-class learning?

Legend Member Posts: 8 Contributor II
edited November 2018 in Help
Dear all,

I tried one-class learning with the SVM (LibSVM) operator, but I got this error message:
"The learning scheme SVM does not have sufficient capabilities for the given data set: binominal label not supported".

I also tried a polynominal label and several other type conversions, but I could not get it to work.


I searched on Google and in this forum, but I could not find the right answer.
Please point me in the right direction :).

Many thanks,
Danny.

Answers

  • earmijo Member Posts: 270 Unicorn
    With one-class learning, the label can only have one class. If the binominal variable you are trying to classify has two values (true/false), the operator will complain. If you are trying to do straight classification, change the SVM type from "one-class" to C-SVC.

  • Legend Member Posts: 8 Contributor II
    Hi, thanks for your response.

    However, I am not new to SVM classification.
    The training data has only one label value, "true", for one-class learning.

    Even after removing the label attribute, I couldn't get it to work.

    BR.
  • fischer Member Posts: 439 Maven
    Hi,

    the capability check is broken for this case. For now, please go to the preferences and check "rapidminer.general.capabilities.warn". This will bypass the check and trigger only a warning (which you can ignore). I will fix this problem.

    Cheers,
    Simon
  • Legend Member Posts: 8 Contributor II
    Dear Simon,

    I very much appreciate your support.
    It will be helpful.

    Kind regards,
    Danny.
  • dragoljub Member Posts: 241 Contributor II
    This will also be very useful to me. LibSVM's one-class model should be able to take a binominal label and predict whether an example falls inside or outside the one class.

    Thanks,
    -Gagi
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    we preferred to force the user to recognize that the one-class option really can't distinguish between two classes in a training set. That's why it will only work without a warning if only one label value is present. If you have more than one label value, please create a new attribute with only one value and use it as the label.

    Greetings,
      Sebastian
  • dragoljub Member Posts: 241 Contributor II
    I understand that for ease of use it makes sense to take one-class labels as training. For prediction, one should be able to compare the performance between 2 classes. How else is performance measured? For example, if I make a one-class model, how can I check whether some of my specific samples fall outside the one-class boundary as expected?

    The power of one-class learning is that it is unsupervised. It would be nice to have the ability to use labeled data to check how well an unsupervised approach can separate 2 classes of data: the one-class (normal samples) and the other-class (outliers).
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi,
    dragoljub wrote:

    The power of one-class learning is that it is unsupervised. It would be nice to have the ability to use labeled data to check how well an unsupervised approach can separate 2 classes of data: the one-class (normal samples) and the other-class (outliers).

    Well, the thing is that one-class learning does not separate multiple classes. It only builds a model of one class and of the extent to which data points are believed to belong to that single class. This says nothing about a second class at all. You may of course define outliers yourself by applying a threshold to the class confidence after you have applied the one-class model. It should be obvious that such a threshold cannot be defined by the learning approach (on what basis should it be chosen?) but must be defined by the user, as it may strongly depend on your data and the class distribution.

    Kind regards,
    Tobias
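A minimal sketch of the user-defined threshold described above, written in Python rather than as a RapidMiner process. The confidence values, the threshold of 0.5, and the function name `flag_outliers` are all hypothetical; in practice the confidences would come from applying the one-class model, and the threshold would be chosen by the user from their own data.

```python
def flag_outliers(confidences, threshold):
    """Label each example 'inside' if its confidence reaches the threshold, else 'outside'."""
    return ["inside" if c >= threshold else "outside" for c in confidences]


# Hypothetical confidences for the single learned class, one per example.
confidences = [0.91, 0.12, 0.55, 0.48, 0.77]

# The threshold is a user choice, not something the learner provides.
labels = flag_outliers(confidences, threshold=0.5)
print(labels)
```

Raising the threshold flags more examples as outliers, so the value is a trade-off that depends on the data and on how costly a missed outlier is.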
  • dragoljub Member Posts: 241 Contributor II
    Tobias Malbrecht wrote:

    It should be obvious that such a threshold cannot be defined by the learning approach (on what basis should it be chosen?) but must be defined by the user, as it may strongly depend on your data and the class distribution.

    The thing is, for one-class SVM, nu sets that threshold. According to Learning with Kernels (by Bernhard Schölkopf and Alex Smola), nu is an upper bound on the fraction of outliers and a lower bound on the fraction of support vectors. So in reality, one-class SVM does predict between 2 classes of data, the in-class and the out-class. All I am saying is that the LibSVM operator would be more useful if, for one-class learning, it allowed you to pass in 2 classes of labels, the 'in-class' and the 'out-class', and see whether the learning algorithm can distinguish between them in an unsupervised way according to your kernel and nu parameter. This is exactly how the C implementation of libSVM works.

    I'm confused about how a one-class model created in RapidMiner can be used to predict outliers versus normal samples.

    Thanks,
    -Gagi
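For reference, the role of nu described above can be written out using the standard one-class SVM formulation. The notation here (training points $x_1,\dots,x_\ell$, feature map $\Phi$, slack variables $\xi_i$, offset $\rho$, dual coefficients $\alpha_i$) follows the usual textbook presentation and is an assumption of this note, not something stated in the thread.

```latex
% Primal problem of the one-class SVM with parameter \nu \in (0, 1]
\min_{w,\,\xi,\,\rho}\quad \frac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu \ell}\sum_{i=1}^{\ell}\xi_i - \rho
\qquad \text{s.t.}\quad \langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,\quad \xi_i \ge 0

% Decision function: +1 inside the learned region, -1 outside
f(x) = \operatorname{sgn}\bigl(\langle w, \Phi(x)\rangle - \rho\bigr)

% \nu-property: fraction of training outliers \le \nu \le fraction of support vectors
\frac{\lvert\{\, i : \xi_i > 0 \,\}\rvert}{\ell}
  \;\le\; \nu \;\le\;
\frac{\lvert\{\, i : \alpha_i > 0 \,\}\rvert}{\ell}
```

The bounds concern the training data only; how many unseen examples end up on the negative side still depends on the kernel and on how well the training sample represents the class.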
  • harri678 Member Posts: 34 Contributor II
    Hello,

    I am working with one-class SVMs too, and I also miss the libsvm behaviour. Nevertheless, based on the comments, I implemented a little example of how you can classify data with one-class models based on thresholds. Hope that helps ;)

    greetings,
    Harald

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="557" width="1090">
          <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="two gaussians classification"/>
            <parameter key="number_examples" value="2000"/>
            <parameter key="number_of_attributes" value="8"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
            <parameter key="attributes_upper_bound" value="1.0"/>
            <parameter key="use_local_random_seed" value="true"/>
          </operator>
          <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="182" y="165"/>
          <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
            <process expanded="true" height="673" width="433">
              <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="label=cluster1"/>
              </operator>
              <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="180" y="30">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="label"/>
                <parameter key="invert_selection" value="true"/>
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="120">
                <list key="function_descriptions">
                  <parameter key="label" value="&quot;cluster1&quot;"/>
                </list>
              </operator>
              <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="180" y="120">
                <parameter key="name" value="label"/>
                <parameter key="target_role" value="label"/>
              </operator>
              <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="313" y="120">
                <parameter key="svm_type" value="one-class"/>
                <parameter key="gamma" value="5.0"/>
                <parameter key="coef0" value="3.0"/>
                <parameter key="nu" value="0.4"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="training" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="673" width="547">
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="find_threshold" expanded="true" height="76" name="Find Threshold" width="90" x="45" y="165"/>
              <operator activated="true" class="apply_threshold" expanded="true" height="76" name="Apply Threshold" width="90" x="179" y="165"/>
              <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="313" y="30">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="prediction(label)"/>
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="447" y="30">
                <parameter key="use_example_weights" value="false"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Find Threshold" to_port="example set"/>
              <connect from_op="Find Threshold" from_port="example set" to_op="Apply Threshold" to_port="example set"/>
              <connect from_op="Find Threshold" from_port="threshold" to_op="Apply Threshold" to_port="threshold"/>
              <connect from_op="Apply Threshold" from_port="example set" to_op="Nominal to Binominal" to_port="example set input"/>
              <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="multiply" expanded="true" height="94" name="Multiply (2)" width="90" x="447" y="30"/>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="581" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="find_threshold" expanded="true" height="76" name="Find Threshold (2)" width="90" x="718" y="165"/>
          <operator activated="true" class="apply_threshold" expanded="true" height="76" name="Apply Threshold (2)" width="90" x="852" y="165"/>
          <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal (2)" width="90" x="986" y="165">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="prediction(label)"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Validation" to_port="training"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Validation" from_port="model" to_op="Multiply (2)" to_port="input"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
          <connect from_op="Multiply (2)" from_port="output 1" to_port="result 1"/>
          <connect from_op="Multiply (2)" from_port="output 2" to_op="Apply Model (2)" to_port="model"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Find Threshold (2)" to_port="example set"/>
          <connect from_op="Find Threshold (2)" from_port="example set" to_op="Apply Threshold (2)" to_port="example set"/>
          <connect from_op="Find Threshold (2)" from_port="threshold" to_op="Apply Threshold (2)" to_port="threshold"/>
          <connect from_op="Apply Threshold (2)" from_port="example set" to_op="Nominal to Binominal (2)" to_port="example set input"/>
          <connect from_op="Nominal to Binominal (2)" from_port="example set output" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
  • dragoljub Member Posts: 241 Contributor II
    Thanks for this example!

    I am however getting this error in RM5:

    "The learning scheme SVM does not have sufficient capabilities for the given data set: polynominal label not supported"

    How are you getting around this?

    Thanks,
    -Gagi
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    if I remember correctly, I have already solved that issue in the current developer version. Since we will publish the final version today, you could simply update afterwards.

    Please tell me if this issue remains.

    Greetings,
      Sebastian
  • harri678 Member Posts: 34 Contributor II
    I solved it by unselecting "..capabilities.warn" in the preferences, but as Sebastian said, the new version should fix this.

    greetings,
    harald
  • Stefan_E Member Posts: 53 Maven
    Hi,

    first, thanks for the example by harri678. It seems to run in 5.0.3 without having to change the default preferences.
    But:
    • Harald has to use three operators in the training part of the validation (Select Attributes / Generate Attributes / Set Role), which really shouldn't be needed if the fix Sebastian promised worked correctly: after Filter Examples, all that is left is one class...
    • I can't see how the example could serve any useful purpose, as the threshold search on the top level requires the label to be present ???
    Specifically, the second point is a killer and essentially renders one-class SVM in RM useless: it appears that whatever data set one applies the one-class model to, the calculated confidence levels span the entire range from 0 to 1, so all one gets is an ordering of the examples.

    I was hoping that the SVM checkbox 'calculate confidences' would do something useful. Well, it shows the message "one-class SVM probability output not supported yet" - not sure whether this is a problem with libSVM or RM?

    Stefan
  • harri678 Member Posts: 34 Contributor II
    Hi Stefan,

    Due to the fact that one-class learning can only learn exactly one nominal class label, the three operators in the training are necessary by concept. Changing this behavior would require some changes in the LibSVMLearner and Model.

    Please have a look at http://rapid-i.com/rapidforum/index.php/topic,1746.0.html

    This patch adds the classic libsvm one-class classification behavior, which predicts 1 or -1 for a sample. You still need to postprocess the labels, but at least you get some kind of binary prediction out of the model. I'd gladly accept feedback on this patch, and maybe one of the devs can have a look at it. ;)

    Greetings, Harald
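The post-processing step mentioned above can be sketched as follows, in Python for brevity. The label names `inside`/`outside`, the sample predictions, and the helper names `to_nominal` and `confusion_counts` are hypothetical; the only thing taken from the thread is that the patched model predicts 1 or -1 per example.

```python
def to_nominal(predictions, in_label="inside", out_label="outside"):
    """Map libsvm-style one-class predictions (+1 / -1) to nominal labels."""
    return [in_label if p > 0 else out_label for p in predictions]


def confusion_counts(predicted, actual, in_label="inside"):
    """Count true/false inliers and outliers, treating in_label as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p == in_label:
            counts["tp" if a == in_label else "fp"] += 1
        else:
            counts["tn" if a != in_label else "fn"] += 1
    return counts


# Hypothetical raw predictions and known labels for a held-out test set.
raw = [1, 1, -1, 1, -1, -1]
actual = ["inside", "inside", "outside", "outside", "inside", "outside"]

predicted = to_nominal(raw)
counts = confusion_counts(predicted, actual)
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
print(counts, accuracy)
```

With labels for both the in-class and the out-class available at test time, this is one way to check how well a model trained on a single class separates the two.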
  • dragoljub Member Posts: 241 Contributor II
    This looks very promising. I am not using one-class SVM for my current project but expect to get back into it soon. I will give this a shot and see how it goes.

    I highly encourage the RM dev team to consider this patch, since LibSVM's one-class algorithm is one of the most useful unsupervised learning methods in practice (no labeled classes).

    -Gagi
  • Stefan_E Member Posts: 53 Maven
    Indeed, this looks promising - so I have to invest in getting a build environment for RM up and running :-\ ...

    Am I then rightly interpreting your patch that libSVM (C version) only gives an in/out classification, but doesn't attribute a continuous confidence level to the result?

    harri678 wrote:

    Due to the fact that one-class learning can only learn exactly one nominal class label, the three operators in the training are necessary by concept.

    I disagree here... RM knows the label types 'polynominal' and 'binominal' - there is no label type 'uninominal'. Hence, if there is a check whether or not there are multiple label values, this has to be inferred from the data. But if you have a filter leaving only one value, the implication for the remaining data is clear.

    (I'm insisting on this, since RM's selling point is to support rapid development - however, detours like the ones needed in the example make the environment very heavy to use and waste user time on training RM rather than training the learner ... I suspect that this is a consequence of creating a view on the data rather than copying it, which seems to be what the Multiply operator on top level is doing, at least 'sometimes' ...)

    Stefan
  • harri678 Member Posts: 34 Contributor II
    Hi Stefan,

    Stefan_E wrote:

    Am I then rightly interpreting your patch that libSVM (C version) only gives an in/out classification, but doesn't attribute a continuous confidence level to the result?

    Yes, you can get either the classic confidence value or the classification behavior from the patch. At first I tried to deliver both confidence and prediction, but it wasn't that easy: the svm_predict function didn't return the confidence values (java libsvm problem?). So to get both confidence and prediction for each example, two svm_* calls are needed; I didn't like the overhead, so I left it out of the patch.

    Stefan_E wrote:

    I disagree here... RM knows the label types 'polynominal' and 'binominal' - there is no label type 'uninominal'.

    Same here. As far as I have seen in the code, many checks and decisions in the LibSVMLearner are based on the label attribute type and the number of distinct label values. Changing this would need quite a big change and lots of testing. What would be the most logical GUI variant for one-class? I think one filter and one learner?

    You can find very good documentation on the development environment (eclipse, subclipse) on the RM website ;)


    Greetings,
    Harald
  • Stefan_E Member Posts: 53 Maven
    ... this is the second time in a week that I have to bump a thread from 2010. :'(

    I have a process which eventually filters a data set down to one class, "around which" I want a one-class SVM to build a model for use on the full data - I want to see whether such an SVM is then able to isolate the samples correctly.

    I set a breakpoint just before the SVM operator and find that I have a binominal label with mode = 1 (3872), least = 0 (0). OK, good.

    Then I get into the SVM: The operator SVM does not have sufficient capabilities ...

    Then I follow Simon's advice to check rapidminer.general.capabilities.warn (now in 5.1.017). This has the simple effect of changing the error message to The attribute Label has 2 different values...

    So, what now?

    Thanks for any help!
    Stefan
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
    Hi Stefan,

    This is because the metadata in RapidMiner needs to refresh. I find that before I send the data into a one-class SVM (if I've filtered the data), I need to save it and then reimport it into the process.
    One way of doing this is Write CSV followed by Read CSV (reading the CSV from the file output).
    That clears the metadata for the binominal label.