
Patch for LibSVM One Class Classification

harri678 Member Posts: 34 Contributor II
edited November 2019 in Help
Hi,

I made some changes to the code to enable the libsvm-style (-1/1) classification based on nu for one-class SVMs instead of the confidence value. For backwards compatibility I added a parameter to the LibSVMLearner with which the desired one-class behavior can be selected. This feature was already discussed in:

http://rapid-i.com/rapidforum/index.php/topic,1599.0.html and http://rapid-i.com/rapidforum/index.php/topic,1596.0.html

Here is the patch for rev. 45:

Index: src/com/rapidminer/operator/learner/functions/kernel/LibSVMLearner.java
===================================================================
--- src/com/rapidminer/operator/learner/functions/kernel/LibSVMLearner.java (revision 45)
+++ src/com/rapidminer/operator/learner/functions/kernel/LibSVMLearner.java (working copy)

/** The parameter name for "Indicates if proper confidence values should be calculated." */
public static final String PARAMETER_CALCULATE_CONFIDENCES = "calculate_confidences";

+ /** The parameter name for "Indicates if the traditional libsvm one-class classification behavior should be used." */
+ public static final String PARAMETER_ONECLASS_CLASSIFICATION = "one_class_classification";
+
/** The parameter name for "Indicates if proper confidence values should be calculated." */
public static final String PARAMETER_CONFIDENCE_FOR_MULTICLASS = "confidence_for_multiclass";



svm_model model = Svm.svm_train(problem, params);

- return new LibSVMModel(exampleSet, model, exampleSet.getAttributes().size(), getParameterAsBoolean(PARAMETER_CONFIDENCE_FOR_MULTICLASS));
+ return new LibSVMModel(exampleSet, model, exampleSet.getAttributes().size(), getParameterAsBoolean(PARAMETER_CONFIDENCE_FOR_MULTICLASS), getParameterAsBoolean("one_class_classification"));
}

@Override

type.setExpert(false);
types.add(type);
types.add(new ParameterTypeBoolean(PARAMETER_CONFIDENCE_FOR_MULTICLASS, "Indicates if the class with the highest confidence should be selected in the multiclass setting. Uses binary majority vote over all 1-vs-1 classifiers otherwise (selected class must not be the one with highest confidence in that case).", true));
+
+ type = new ParameterTypeBoolean(PARAMETER_ONECLASS_CLASSIFICATION, "Indicates if a one-class model should predict the class of an example (integer label: 1 or -1) instead of returning the class confidence. By default a confidence is calculated which can be processed by threshold operators.", false);
+ type.setExpert(false);
+ type.registerDependencyCondition(new EqualTypeCondition(this, PARAMETER_SVM_TYPE, SVM_TYPES, false, SVM_TYPE_ONE_CLASS));
+ types.add(type);
+
return types;
}
}
Index: src/com/rapidminer/operator/learner/functions/kernel/LibSVMModel.java
===================================================================
--- src/com/rapidminer/operator/learner/functions/kernel/LibSVMModel.java (revision 45)
+++ src/com/rapidminer/operator/learner/functions/kernel/LibSVMModel.java (working copy)

import com.rapidminer.example.Example;
import com.rapidminer.example.ExampleSet;
import com.rapidminer.example.FastExample2SparseTransform;
+import com.rapidminer.example.table.AttributeFactory;
import com.rapidminer.operator.UserError;
import com.rapidminer.operator.learner.FormulaProvider;
+import com.rapidminer.tools.Ontology;
import com.rapidminer.tools.Tools;

/**


    private boolean confidenceForMultiClass = true;
   
- public LibSVMModel(ExampleSet exampleSet, svm_model model, int numberOfAttributes, boolean confidenceForMultiClass) {
+    private boolean oneClassClassification = false;
+   
+ public LibSVMModel(ExampleSet exampleSet, svm_model model, int numberOfAttributes, boolean confidenceForMultiClass, boolean oneClassClassification) {
super(exampleSet);
this.model = model;
        this.numberOfAttributes = numberOfAttributes;
        this.confidenceForMultiClass = confidenceForMultiClass;
+        this.oneClassClassification = oneClassClassification;
}
   
    @Override

confidenceAttributes = exampleSet.getAttributes().getSpecial(Attributes.CONFIDENCE_NAME + "_" +  labelName);
}
}
-       
-       
+
        if (label.isNominal() && (label.getMapping().size() == 1)) { // one class SVM
-            double[] allConfidences = new double[exampleSet.size()];
+           
            int counter = 0;
-            double maxConfidence = Double.NEGATIVE_INFINITY;
-            double minConfidence = Double.POSITIVE_INFINITY;
            Iterator<Example> i = exampleSet.iterator();
-            while (i.hasNext()) {
-                Example e = i.next();
-                svm_node[] currentNodes = LibSVMLearner.makeNodes(e, ripper);
+           
+            if (oneClassClassification) {
+            // classification behavior
+            String name = predictedLabel.getName();
+                Attribute newLabel = AttributeFactory.createAttribute(name, Ontology.INTEGER);
+                newLabel.clearTransformations();
               
-                double[] prob = new double[1];
-                Svm.svm_predict_values(model, currentNodes, prob);
-                allConfidences[counter++] = prob[0];
-                minConfidence = Math.min(minConfidence, prob[0]);
-                maxConfidence = Math.max(maxConfidence, prob[0]);
+                exampleSet.getExampleTable().removeAttribute(predictedLabel);
+                exampleSet.getExampleTable().addAttribute(newLabel);
+                exampleSet.getAttributes().setPredictedLabel(newLabel);
+               
+                while (i.hasNext()) {
+                    Example e = i.next();
+                    svm_node[] currentNodes = LibSVMLearner.makeNodes(e, ripper);
+                    e.setPredictedLabel((int) Svm.svm_predict(model, currentNodes));
+                }
+            } else {
+            // classic behavior
+                double[] allConfidences = new double[exampleSet.size()];
+                double maxConfidence = Double.NEGATIVE_INFINITY;
+                double minConfidence = Double.POSITIVE_INFINITY;
+           
+                while (i.hasNext()) {
+                    Example e = i.next();
+                    svm_node[] currentNodes = LibSVMLearner.makeNodes(e, ripper);
+                   
+                    double[] prob = new double[1];
+                    Svm.svm_predict_values(model, currentNodes, prob);
+                    allConfidences[counter++] = prob[0];
+                    minConfidence = Math.min(minConfidence, prob[0]);
+                    maxConfidence = Math.max(maxConfidence, prob[0]);
+                }
+               
+                counter = 0;
+                String className = predictedLabel.getMapping().mapIndex(0);
+
+                i = exampleSet.iterator();
+
+                while (i.hasNext()) {
+                    Example e = i.next();
+                    e.setValue(predictedLabel, 0);
+                    e.setConfidence(className, (allConfidences[counter++] - minConfidence) / (maxConfidence - minConfidence));
+                }
            }
-           
-            counter = 0;
-            String className = predictedLabel.getMapping().mapIndex(0);
-            i = exampleSet.iterator();
-            while (i.hasNext()) {
-                Example e = i.next();
-                e.setValue(predictedLabel, 0);
-                e.setConfidence(className, (allConfidences[counter++] - minConfidence) / (maxConfidence - minConfidence));
-            }
        } else {
            Iterator<Example> i = exampleSet.iterator();
            while (i.hasNext()) {
And here is a little example which uses the functionality for X-Val:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="-20" width="-50">
      <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="target_function" value="two gaussians classification"/>
        <parameter key="number_examples" value="500"/>
        <parameter key="number_of_attributes" value="8"/>
        <parameter key="attributes_lower_bound" value="0.0"/>
        <parameter key="attributes_upper_bound" value="1.0"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="local_random_seed" value="85"/>
      </operator>
      <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
        <process expanded="true">
          <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label=cluster1"/>
          </operator>
          <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="label"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="120">
            <list key="function_descriptions">
              <parameter key="label" value="&quot;cluster1&quot;"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="180" y="120">
            <parameter key="name" value="label"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="313" y="30">
            <parameter key="svm_type" value="one-class"/>
            <parameter key="coef0" value="3.0"/>
            <parameter key="nu" value="0.4"/>
            <list key="class_weights"/>
            <parameter key="one_class_classification" value="true"/>
          </operator>
          <connect from_port="training" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="numerical_to_polynominal" expanded="true" height="76" name="Numerical to Polynominal" width="90" x="179" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="prediction(label)"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="map" expanded="true" height="76" name="Map" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="prediction(label)"/>
            <parameter key="include_special_attributes" value="true"/>
            <list key="value_mappings">
              <parameter key="-1" value="cluster0"/>
              <parameter key="1" value="cluster1"/>
            </list>
          </operator>
          <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="447" y="30"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Numerical to Polynominal" to_port="example set input"/>
          <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Map" to_port="example set input"/>
          <connect from_op="Map" from_port="example set output" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
The returned prediction values are integers coming directly from libsvm, either 1 (same class) or -1 (outlier). The real labels cannot be used because one-class models are trained on only one type of data. Therefore some postprocessing (Numerical to Polynominal, Map) is necessary to allow performance evaluation. It's not pretty, but you can see the usage in the example above. Please feel free to use the code as you like!
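For clarity, a tiny hypothetical helper (plain Java, not part of the patch) that expresses the same mapping the Numerical to Polynominal and Map operators perform in the process above; the class names are simply the ones used in this example:

// Sketch only: map the integer one-class prediction returned by libsvm
// (+1 = belongs to the learned class, -1 = outlier) to the nominal values
// that the Performance operator can compare against the real label.
public final class OneClassPredictionMapper {
    public static String map(double libsvmPrediction) {
        return libsvmPrediction > 0 ? "cluster1" : "cluster0";
    }
}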

Greetings, Harald

Answers

  • dragoljub Member Posts: 241 Contributor II
    Harald,

    How could I install this patch on RM 5.003? Is it a simple plugin?

    -Gagi
  • harri678 Member Posts: 34 Contributor II
    Hi,

    I used Eclipse and Subclipse to check out the latest development version of RM. You can apply the patch via Eclipse and then start RM. The Eclipse/Subversion setup is explained very well on the website: http://rapid-i.com/content/view/25/48/lang,en/

    I hope that helps ;)

    Greetings, Harald
  • sbutler Member Posts: 5 Contributor II
    Is there somewhere I can download a new version with this added?

    If not, any chance someone could upload the .class files for me?
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I will check whether we can include this change in the RapidMiner Core with one of the next updates.

    Greetings,
      Sebastian
  • harri678 Member Posts: 34 Contributor II
    Hi Sebastian,

    I've detected a problem with the functionality of this patch and will try to release a newer one in the next few days. The problem is related to the ROC curve, which cannot be plotted because confidence values are absent in the new classification mode. Therefore some changes are required. I will post the patch here when it is ready!

    greetings,
    Harald
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    ok, thank you. We will wait to include it until you have solved the issue.

    Greetings,
      Sebastian
  • dragoljub Member Posts: 241 Contributor II
    How can we apply this patch once it is available? Will it simply be added to the update agent in RM?

    This will be a very welcome and important patch for practical data mining on unlabeled data.

    Thanks!

    -Gagi
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    yes, there's already a paper on my desk reminding me to include the patch in the core. This will be delivered with the next update of RapidMiner.

    Greetings,
      Sebastian
  • harri678 Member Posts: 34 Contributor II
    Hi,

    sorry guys for taking so long to fix the patch, but I am currently right in the middle of my master's thesis and lack the time to proceed with coding :(

    I think the already posted version works as it is (f: X -> {-1, +1}), but some important features are still missing for a clean RapidMiner integration (e.g. a confidence value is required for the ROC plot, ...).

    @developers:
    I think a ROC plot is absolutely necessary in all modes of the one-class SVM, but this requires a confidence value. The current confidence is calculated by rescaling the raw prediction value from R -> [0,1], as I've seen in the code. A simpler way to obtain the actual classification result than in the posted patch is to take signum(prediction value) before rescaling.
    My problem, which stalled me in writing improved code, was the handling of the Attribute in the example set: the model only knows a nominal label with one value (the class name), but the result of a one-class SVM would be binominal (inside/outside). Replacing the example set's label attribute with a binominal one led to errors in the performance evaluation (conflicting binominal types?). If you need the current (non-working) state of the code, please PM me.
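    A minimal sketch of the idea, written as two helper methods that could live inside LibSVMModel (so the bundled libsvm types Svm, svm_model and svm_node used in the patch are available there); minDecision/maxDecision would come from a first pass over the example set, as in the current code:

    // Sketch only: signum of the raw decision value gives the -1/+1 class,
    // while min-max rescaling of the same value gives the [0,1] confidence.
    static int oneClassLabel(svm_model model, svm_node[] nodes) {
        double[] decision = new double[1];
        Svm.svm_predict_values(model, nodes, decision);  // raw one-class decision value
        return decision[0] >= 0 ? 1 : -1;                // signum(decision) as the class
    }

    static double rescaledConfidence(double decisionValue, double minDecision, double maxDecision) {
        return (decisionValue - minDecision) / (maxDecision - minDecision);  // R -> [0,1]
    }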

    greetings,
    Harald
  • dragoljub Member Posts: 241 Contributor II
    Thanks for the great work Harald!

    I think I can wait for the next release... Wait.. When is the next release?

    -Gagi
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Harald,
    there's no AttributeType that only contains one value, so you might add a second value to any binominal attribute if only one is known so far. Good luck with your master's thesis. What are you writing about?
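    A minimal sketch of that idea, as a few lines that could sit in LibSVMModel.performPrediction(); it assumes the RapidMiner 5 classes already used in the patch (Attribute, AttributeFactory, Ontology) and that the attribute's NominalMapping.mapString adds a value if it is not known yet; the value names are just placeholders:

    // Sketch only: create a binominal predicted label and register both
    // possible values up front, even though training only ever saw one class.
    Attribute predictedLabel = AttributeFactory.createAttribute("prediction(label)", Ontology.BINOMINAL);
    predictedLabel.getMapping().mapString("outside");  // becomes index 0
    predictedLabel.getMapping().mapString("inside");   // becomes index 1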

    @Gagi:
    The next release will be on Friday, but as it seems this patch won't make it until then :)

    Greetings,
      Sebastian
  • dragoljub Member Posts: 241 Contributor II
    We NEED this patch : )  8)

    Thanks for all the hard work guys. Your tool is gonna take off in a big way.

    -Gagi
    Sebastian Land wrote:

    Hi Harald,
    there's no AttributeType that only contains one value, so you might add a second value to any binominal attribute if only one is known so far. Good luck with your master's thesis. What are you writing about?

    @Gagi:
    The next release will be on Friday, but as it seems this patch won't make it until then :)

    Greetings,
      Sebastian
  • harri678 Member Posts: 34 Contributor II
    Hey,

    Sebastian, I'm not sure I have fully understood how the Attribute class works, so I'll try to explain it again in more detail.
    Currently the one-class libsvm learner has a constraint on the label attribute: it must be nominal with only one entry (which is the name of the one class, e.g. "cluster0"). Validations in LibSVMLearner and LibSVMModel use this constraint to distinguish between the libsvm modes, so changing this concept would require changes in the logic of the code. So I tried to replace the nominal predictedLabel with a binominal one during performPrediction(). But I think it needs more than that (a confidence Attribute?)...

    The following things are not working yet:
    - the confidence in the ExampleSet is not displayed; I think this is due to my misunderstanding of the Attribute handling ;). Therefore no ROC yet.
    - strange behavior in the X-Validation due to the filtering: you can reproduce the error message by setting "number of validations" to "20" in the following example.
    - on the plus side, the code is definitely cleaner than in the first posted patch

    This is the latest state of my patch:

    ### Eclipse Workspace Patch 1.0
    #P RapidMiner_Vega
    Index: src/com/rapidminer/operator/learner/functions/kernel/LibSVMLearner.java
    ===================================================================
    --- src/com/rapidminer/operator/learner/functions/kernel/LibSVMLearner.java (revision 69)
    +++ src/com/rapidminer/operator/learner/functions/kernel/LibSVMLearner.java (working copy)

    /** The parameter name for "Indicates if proper confidence values should be calculated." */
    public static final String PARAMETER_CALCULATE_CONFIDENCES = "calculate_confidences";

    + /** The parameter name for "Indicates if the traditional libsvm one-class classification behavior should be used." */
    + public static final String PARAMETER_ONECLASS_CLASSIFICATION = "one_class_classification";
    +
    /** The parameter name for "Indicates if proper confidence values should be calculated." */
    public static final String PARAMETER_CONFIDENCE_FOR_MULTICLASS = "confidence_for_multiclass";


    svm_model model = Svm.svm_train(problem, params);

    - return new LibSVMModel(exampleSet, model, exampleSet.getAttributes().size(), getParameterAsBoolean(PARAMETER_CONFIDENCE_FOR_MULTICLASS));
    + return new LibSVMModel(exampleSet, model, exampleSet.getAttributes().size(), getParameterAsBoolean(PARAMETER_CONFIDENCE_FOR_MULTICLASS), getParameterAsBoolean("one_class_classification"));
    }

    @Override

    type.setExpert(false);
    types.add(type);
    types.add(new ParameterTypeBoolean(PARAMETER_CONFIDENCE_FOR_MULTICLASS, "Indicates if the class with the highest confidence should be selected in the multiclass setting. Uses binary majority vote over all 1-vs-1 classifiers otherwise (selected class must not be the one with highest confidence in that case).", true));
    +
    + type = new ParameterTypeBoolean(PARAMETER_ONECLASS_CLASSIFICATION, "Indicates if a one-class model should predict the class of an example (integer label: 1 or -1) instead of returning the class confidence. By default a confidence is calculated which can be processed by threshold operators.", false);
    + type.setExpert(false);
    + type.registerDependencyCondition(new EqualTypeCondition(this, PARAMETER_SVM_TYPE, SVM_TYPES, false, SVM_TYPE_ONE_CLASS));
    + types.add(type);
    +
    return types;
    }
    }
    Index: src/com/rapidminer/operator/learner/functions/kernel/LibSVMModel.java
    ===================================================================
    --- src/com/rapidminer/operator/learner/functions/kernel/LibSVMModel.java (revision 69)
    +++ src/com/rapidminer/operator/learner/functions/kernel/LibSVMModel.java (working copy)

    import com.rapidminer.example.Example;
    import com.rapidminer.example.ExampleSet;
    import com.rapidminer.example.FastExample2SparseTransform;
    +import com.rapidminer.example.table.AttributeFactory;
    import com.rapidminer.operator.UserError;
    import com.rapidminer.operator.learner.FormulaProvider;
    +import com.rapidminer.tools.Ontology;
    import com.rapidminer.tools.Tools;

    /**


        private boolean confidenceForMultiClass = true;

    - public LibSVMModel(ExampleSet exampleSet, svm_model model, int numberOfAttributes, boolean confidenceForMultiClass) {
    +    private boolean oneClassClassification = false;
    +
    + public LibSVMModel(ExampleSet exampleSet, svm_model model, int numberOfAttributes, boolean confidenceForMultiClass, boolean oneClassClassification) {
    super(exampleSet);
    this.model = model;
            this.numberOfAttributes = numberOfAttributes;
            this.confidenceForMultiClass = confidenceForMultiClass;
    +        this.oneClassClassification = oneClassClassification;
    }

        @Override

    confidenceAttributes = exampleSet.getAttributes().getSpecial(Attributes.CONFIDENCE_NAME + "_" +  labelName);
    }
    }
    -
    -
    +
            if (label.isNominal() && (label.getMapping().size() == 1)) { // one class SVM
    +
    +            int counter = 0;
                double[] allConfidences = new double[exampleSet.size()];
    -            int counter = 0;
    +            int[] allLabels = new int[exampleSet.size()];
                double maxConfidence = Double.NEGATIVE_INFINITY;
                double minConfidence = Double.POSITIVE_INFINITY;
    +            double confidence;
    +
                Iterator<Example> i = exampleSet.iterator();
    +            String className = predictedLabel.getMapping().mapIndex(0);
    +
    +            if (oneClassClassification) {
    +            // classification behavior
    +            predictedLabel.getMapping().getValues().clear();
    +            predictedLabel.getMapping().getValues().add("outside"); //0
    +            predictedLabel.getMapping().getValues().add("inside");  //1
    +            predictedLabel.setBlockType(Ontology.BINOMINAL);
    +            }
    +
                while (i.hasNext()) {
                    Example e = i.next();
                    svm_node[] currentNodes = LibSVMLearner.makeNodes(e, ripper);

                    double[] prob = new double[1];
                    Svm.svm_predict_values(model, currentNodes, prob);
    +                if (oneClassClassification) {
    +                allLabels[counter] = (prob[0] >= 0) ? 1 : -1;
    +                }
                    allConfidences[counter++] = prob[0];
                    minConfidence = Math.min(minConfidence, prob[0]);
                    maxConfidence = Math.max(maxConfidence, prob[0]);
                }

                counter = 0;
    -            String className = predictedLabel.getMapping().mapIndex(0);
    +
                i = exampleSet.iterator();
    +
                while (i.hasNext()) {
                    Example e = i.next();
    -                e.setValue(predictedLabel, 0);
    -                e.setConfidence(className, (allConfidences[counter++] - minConfidence) / (maxConfidence - minConfidence));
    +                confidence = (allConfidences[counter] - minConfidence) / (maxConfidence - minConfidence);
    +                if (oneClassClassification) {
    +                e.setPredictedLabel(allLabels[counter]> 0 ? 1 : 0);
    +                // TODO confidence does not work yet
    +                e.setConfidence(className, confidence);
    +                } else {
    +                e.setValue(predictedLabel, 0);
    +                e.setConfidence(className, confidence);
    +                }
    +                counter++;
                }
    +
            } else {
                Iterator<Example> i = exampleSet.iterator();
                while (i.hasNext()) {


    And here is a little example for debugging:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="449" width="1083">
          <operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="two gaussians classification"/>
            <parameter key="number_of_attributes" value="8"/>
            <parameter key="attributes_lower_bound" value="0.0"/>
            <parameter key="attributes_upper_bound" value="1.0"/>
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="85"/>
          </operator>
          <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="313" y="30">
            <parameter key="number_of_validations" value="3"/>
            <process expanded="true" height="673" width="542">
              <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="label=cluster1"/>
              </operator>
              <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="180" y="30">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="label"/>
                <parameter key="invert_selection" value="true"/>
                <parameter key="include_special_attributes" value="true"/>
              </operator>
              <operator activated="true" class="generate_attributes" expanded="true" height="76" name="Generate Attributes" width="90" x="45" y="120">
                <list key="function_descriptions">
                  <parameter key="label" value="&quot;cluster1&quot;"/>
                </list>
              </operator>
              <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="180" y="120">
                <parameter key="name" value="label"/>
                <parameter key="target_role" value="label"/>
              </operator>
              <operator activated="true" class="support_vector_machine_libsvm" expanded="true" height="76" name="SVM" width="90" x="313" y="120">
                <parameter key="svm_type" value="one-class"/>
                <parameter key="gamma" value="0.09"/>
                <parameter key="coef0" value="3.0"/>
                <parameter key="nu" value="0.01"/>
                <list key="class_weights"/>
                <parameter key="one_class_classification" value="true"/>
              </operator>
              <connect from_port="training" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
              <connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="SVM" to_port="training set"/>
              <connect from_op="SVM" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="673" width="547">
              <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="map" expanded="true" height="76" name="Map" width="90" x="179" y="30">
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="prediction(label)"/>
                <parameter key="include_special_attributes" value="true"/>
                <list key="value_mappings">
                  <parameter key="inside" value="cluster1"/>
                  <parameter key="outside" value="cluster0"/>
                </list>
              </operator>
              <operator activated="true" class="performance_binominal_classification" expanded="true" height="76" name="Performance" width="90" x="313" y="30">
                <parameter key="AUC" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Map" to_port="example set input"/>
              <connect from_op="Map" from_port="example set output" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_port="result 1"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    hehe, my master's thesis ..
    I am working on an anomaly detection prototype for networks. My data collection system IAS is used for visualization of network traffic, but it doesn't detect attacks/anomalies/etc. by itself, and nobody has tried to do anomaly detection based on this system yet. In my master's thesis I want to evaluate whether the IAS is "good enough" to be used as the basis of the anomaly detection prototype. The plan was to use a one-class SVM to learn normal network behavior and detect deviations, but I have switched to supervised algorithms for now (better accuracy, faster to implement in RM). The IAS data is quite challenging ... my training/testing sets are 5 sparse files, one of which has about 7200 examples, 155335 attributes and highly correlated values, quite fun to work with ;)

    greetings, Harald
  • wessel Member Posts: 537 Maven
    Do you have completely labeled data?
    So you have a set where you know where the anomalies are?
    Or should you consider instances before and after an anomaly also as positive instances?

    I have done both unsupervised and supervised anomaly detection myself.
    On my problem the clustering algorithm COBWEB performed best.

    On my supervised problems I found that, on time series, neural networks perform really well on attribute subsets.
    For example embedded attributes (created using windowing).
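    To make the term concrete, a small, RapidMiner-independent sketch (illustration only, not an existing operator) of what such windowing/embedding produces:

    // Sketch only: each example gets the last windowSize values of the series as
    // attributes ("embedded" attributes) and the following value as the target.
    static double[][] embed(double[] series, int windowSize) {
        double[][] examples = new double[series.length - windowSize][windowSize + 1];
        for (int t = 0; t < series.length - windowSize; t++) {
            for (int k = 0; k < windowSize; k++) {
                examples[t][k] = series[t + k];               // lagged attribute k
            }
            examples[t][windowSize] = series[t + windowSize]; // target: next value
        }
        return examples;
    }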

    I'm not sure about SVMs.
    Can SVMs learn relations between attributes?
    For example, can they find an "anomaly pattern" like att1 > att2?
    Or, harder: trend(att1) = increasing && trend(att2) = decreasing?
  • harri678 Member Posts: 34 Contributor II
    Yes, I have a completely labeled data set; the so-called "anomalies" in my case are network attacks whose traces should be visible in the data. At least I hope they are visible ;)
    For the thesis I want to focus on point anomalies and supervised learning only; unsupervised learning and time series analysis will be of interest for the follow-up project.

    Thanks for your ideas on the matter! I will have a look into them.

    I'm sure that SVMs can learn relations between attributes, depending on the training data. I think "trends" belong to the category of time series analysis, and I don't know yet whether SVMs are of use in this area.
  • dragoljub Member Posts: 241 Contributor II
    Is there any way this patch can be applied to the latest version of RM without recompiling? Maybe as a plugin?

    Also, since it didn't make it into the current release, what are the chances it will make the next round?

    Thanks,
    -Gagi
  • dragoljub Member Posts: 241 Contributor II
    Did this patch make it to RM 5.006?

    Thanks,
    -Gagi
  • dragoljub Member Posts: 241 Contributor II
    Just bumping this. Wanted to know if this should be added to feature requests in the bug tracker.  ???

    -Gagi
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Gagi,
    the chances are good, I have it on my list for this week.

    Greetings,
      Sebastian
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    thank you all for your contributions, it finally made it into the core. The final version is slightly different from the patch: the "original" behavior is now the default, and the additional meaning in the label probably won't break any existing processes.
    I also disabled the normalization of the confidence, since it caused the threshold to vary instead of lying at 0.5 as one would expect.
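    To illustrate the point, a small stand-alone sketch (not the actual core code): with min-max rescaling, the libsvm decision boundary (raw value 0) only lands at 0.5 if the observed minimum and maximum decision values happen to be symmetric, so the effective threshold would vary from example set to example set.

    // Sketch only: where the raw decision boundary (0.0) ends up after min-max rescaling.
    public class ThresholdSketch {
        public static void main(String[] args) {
            double min = -2.0, max = 6.0;                 // made-up decision-value range
            double rescaledBoundary = (0.0 - min) / (max - min);
            System.out.println(rescaledBoundary);         // prints 0.25, not a fixed 0.5
        }
    }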

    Greetings,
    Sebastian