Need a working example of Find Threshold (Meta) operator in RapidMiner

kypexin · August 2015

I've been working with text classification processes in RapidMiner and I can't figure out the proper way to use the Find Threshold (Meta) operator for multiclass classification. It seems to be the closest equivalent to the Threshold family of operators used for binary classification.

I am using k-NN models with 11 different classes and a corpus of about 300-500 text documents as the test dataset.

Specifically, I don't see any impact from putting a learner inside the operator: the performance values are always the same, whether or not I assign weights to the classes. Moreover, there is no explanation of what the class weights are. I also don't see any way to extract the (possibly) generated thresholds as an output of this operator in order to apply them to the model. And there is no RapidMiner documentation entry for this operator at all.

Does anyone have a working example of the Find Threshold (Meta) operator?

MartinLiebig · August 2015

Hi there,

I have never used the meta one. What is the reason for not using the standard one?

Best,
Martin

JEdward · August 2015

I actually don't use either Find Threshold operator, as I like to also produce a table showing the various results and have the flexibility to optimise for more than just misclassification costs.

Instead I use Optimize Parameters (Grid) combined with Create Threshold to test a range of threshold values and select the one that delivers the best performance.

Here is a short version of what I use:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.4.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="6.4.000" expanded="true" height="60" name="Retrieve Ripley-Set" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Ripley-Set"/>
      </operator>
      <operator activated="true" class="nominal_to_binominal" compatibility="6.4.000" expanded="true" height="94" name="Nominal to Binominal" width="90" x="179" y="30">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="logistic_regression" compatibility="6.4.000" expanded="true" height="94" name="Logistic Regression" width="90" x="313" y="30"/>
      <operator activated="true" class="apply_model" compatibility="6.4.000" expanded="true" height="76" name="Apply Model" width="90" x="447" y="30">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance_binominal_classification" compatibility="6.4.000" expanded="true" height="76" name="Original Performance" width="90" x="581" y="30">
        <parameter key="main_criterion" value="kappa"/>
        <parameter key="classification_error" value="true"/>
        <parameter key="kappa" value="true"/>
        <parameter key="precision" value="true"/>
        <parameter key="recall" value="true"/>
        <parameter key="lift" value="true"/>
        <parameter key="fallout" value="true"/>
        <parameter key="f_measure" value="true"/>
        <parameter key="false_positive" value="true"/>
        <parameter key="false_negative" value="true"/>
        <parameter key="true_positive" value="true"/>
        <parameter key="true_negative" value="true"/>
        <parameter key="sensitivity" value="true"/>
        <parameter key="specificity" value="true"/>
        <parameter key="youden" value="true"/>
        <parameter key="positive_predictive_value" value="true"/>
        <parameter key="negative_predictive_value" value="true"/>
        <parameter key="skip_undefined_labels" value="false"/>
        <parameter key="use_example_weights" value="false"/>
      </operator>
      <operator activated="true" class="optimize_parameters_grid" compatibility="6.4.000" expanded="true" height="94" name="Optimize Parameters (Grid)" width="90" x="648" y="165">
        <list key="parameters">
          <parameter key="TryThreshold.threshold" value="[0.0;1.0;20;linear]"/>
        </list>
        <process expanded="true">
          <operator activated="true" class="create_threshold" compatibility="6.4.000" expanded="true" height="60" name="TryThreshold" width="90" x="45" y="165">
            <parameter key="threshold" value="1.0"/>
            <parameter key="first_class" value="1"/>
            <parameter key="second_class" value="0"/>
          </operator>
          <operator activated="true" class="apply_threshold" compatibility="6.4.000" expanded="true" height="76" name="Apply Threshold (2)" width="90" x="179" y="30"/>
          <operator activated="true" class="performance_binominal_classification" compatibility="6.4.000" expanded="true" height="76" name="Best Threshold" width="90" x="313" y="30">
            <parameter key="main_criterion" value="kappa"/>
            <parameter key="classification_error" value="true"/>
            <parameter key="kappa" value="true"/>
            <parameter key="precision" value="true"/>
            <parameter key="recall" value="true"/>
            <parameter key="lift" value="true"/>
            <parameter key="fallout" value="true"/>
            <parameter key="f_measure" value="true"/>
            <parameter key="false_positive" value="true"/>
            <parameter key="false_negative" value="true"/>
            <parameter key="true_positive" value="true"/>
            <parameter key="true_negative" value="true"/>
            <parameter key="sensitivity" value="true"/>
            <parameter key="specificity" value="true"/>
            <parameter key="youden" value="true"/>
            <parameter key="positive_predictive_value" value="true"/>
            <parameter key="negative_predictive_value" value="true"/>
            <parameter key="skip_undefined_labels" value="false"/>
            <parameter key="use_example_weights" value="false"/>
          </operator>
          <operator activated="true" class="log" compatibility="6.4.000" expanded="true" height="76" name="Log" width="90" x="447" y="30">
            <list key="log">
              <parameter key="confidence_threshold" value="operator.TryThreshold.parameter.threshold"/>
              <parameter key="accuracy" value="operator.Best Threshold.value.accuracy"/>
              <parameter key="true_negative" value="operator.Best Threshold.value.true_negative"/>
              <parameter key="false_negative" value="operator.Best Threshold.value.false_negative"/>
              <parameter key="true_positive" value="operator.Best Threshold.value.true_positive"/>
              <parameter key="false_positive" value="operator.Best Threshold.value.false_positive"/>
              <parameter key="sensitivity" value="operator.Best Threshold.value.sensitivity"/>
              <parameter key="specificity" value="operator.Best Threshold.value.specificity"/>
              <parameter key="precision" value="operator.Best Threshold.value.precision"/>
              <parameter key="recall" value="operator.Best Threshold.value.recall"/>
            </list>
          </operator>
          <connect from_port="input 1" to_op="Apply Threshold (2)" to_port="example set"/>
          <connect from_op="TryThreshold" from_port="output" to_op="Apply Threshold (2)" to_port="threshold"/>
          <connect from_op="Apply Threshold (2)" from_port="example set" to_op="Best Threshold" to_port="labelled data"/>
          <connect from_op="Best Threshold" from_port="performance" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="performance"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="log_to_data" compatibility="6.4.000" expanded="true" height="94" name="Tested Threshold Table" width="90" x="782" y="120"/>
      <connect from_op="Retrieve Ripley-Set" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
      <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Logistic Regression" to_port="training set"/>
      <connect from_op="Logistic Regression" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Logistic Regression" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Original Performance" to_port="labelled data"/>
      <connect from_op="Original Performance" from_port="performance" to_port="result 1"/>
      <connect from_op="Original Performance" from_port="example set" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
      <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_op="Tested Threshold Table" to_port="through 1"/>
      <connect from_op="Optimize Parameters (Grid)" from_port="parameter" to_port="result 4"/>
      <connect from_op="Tested Threshold Table" from_port="exampleSet" to_port="result 2"/>
      <connect from_op="Tested Threshold Table" from_port="through 1" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="54"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="54"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>
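
For anyone who wants to see the idea outside RapidMiner, here is a rough Python sketch of what the inner loop of the process above does: try a grid of thresholds, record the confusion counts and kappa for each one, and keep the best row. The labels and confidences are made up, the function names are my own, and the rule "predict positive when the confidence is greater than the threshold" is an assumption about how the threshold is applied, so treat it as an illustration rather than a replica of the operators.

```python
def confusion_counts(labels, confidences, threshold):
    """Count TP, FP, TN, FN, predicting positive (1) when confidence > threshold."""
    tp = fp = tn = fn = 0
    for label, conf in zip(labels, confidences):
        predicted = 1 if conf > threshold else 0  # assumed decision rule
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def cohen_kappa(tp, fp, tn, fn):
    """Cohen's kappa for a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: agreement by chance is already total
    return (observed - expected) / (1.0 - expected)


def sweep_thresholds(labels, confidences, steps=20):
    """Evaluate steps + 1 evenly spaced thresholds in [0, 1], one row per threshold."""
    rows = []
    for i in range(steps + 1):
        threshold = i / steps
        tp, fp, tn, fn = confusion_counts(labels, confidences, threshold)
        rows.append({
            "threshold": threshold,
            "tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "accuracy": (tp + tn) / len(labels),
            "kappa": cohen_kappa(tp, fp, tn, fn),
        })
    return rows


def best_by_kappa(rows):
    """Return the first row with the highest kappa (ties go to the lower threshold)."""
    return max(rows, key=lambda row: row["kappa"])


# Made-up data: true labels and the model's confidence for the positive class.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
confidences = [0.04, 0.22, 0.33, 0.62, 0.47, 0.71, 0.83, 0.96]

rows = sweep_thresholds(labels, confidences)   # plays the role of the logged table
best = best_by_kappa(rows)                     # plays the role of the optimised parameter
print(best["threshold"], best["kappa"])
```

The list of rows corresponds to the "Tested Threshold Table", which is the part I find most useful: you can sort it by any column instead of being tied to a single criterion.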

MartinLiebig · August 2015

Hi John,

I think your process is dangerous, because you do not use cross-validation (X-Validation) to ensure quality. This will tend to overestimate your performance.

~Martin

JEdward · August 2015

Yes, I agree; this is a very shortened version of the process.
I removed all the X-Validations, the number formatting and some other stuff. It's just a demo of how to use Create Threshold.
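
To illustrate the validation point, here is a small Python sketch of choosing the threshold inside each fold and scoring it only on the held-out part. Everything in it is hypothetical (the function names, the data, the use of plain accuracy), and it assumes the confidences were already produced by a model that did not see the examples it scores; in a full process the learner would be retrained inside each fold as well.

```python
import random


def accuracy_at(labels, confidences, threshold):
    """Accuracy when predicting positive (1) for confidence > threshold."""
    correct = sum(
        1 for label, conf in zip(labels, confidences)
        if (1 if conf > threshold else 0) == label
    )
    return correct / len(labels)


def pick_threshold(labels, confidences, steps=20):
    """Return the first grid threshold in [0, 1] with the highest accuracy."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: accuracy_at(labels, confidences, t))


def cross_validated_accuracy(labels, confidences, k=5, seed=0):
    """Pick the threshold on k-1 folds, score it on the held-out fold, average."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # The threshold is tuned only on the training part of this split.
        threshold = pick_threshold(
            [labels[i] for i in train], [confidences[i] for i in train]
        )
        # The held-out fold never influenced the threshold choice.
        scores.append(accuracy_at(
            [labels[i] for i in held_out],
            [confidences[i] for i in held_out],
            threshold,
        ))
    return sum(scores) / len(scores)


# Made-up data: true labels and positive-class confidences.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
confidences = [0.05, 0.15, 0.30, 0.45, 0.65, 0.40, 0.55, 0.70, 0.85, 0.95]

tuned_on_everything = accuracy_at(
    labels, confidences, pick_threshold(labels, confidences)
)
held_out_estimate = cross_validated_accuracy(labels, confidences, k=5)
print(tuned_on_everything, held_out_estimate)
```

Comparing the two printed numbers on your own data gives a feel for how optimistic the figure is when the threshold is tuned and scored on the same examples.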
