Questions on the "Apply Model" operator and predicted label

huaiyanggongzi Member Posts: 39 Contributor II
I use the "Apply Model" operator to predict labels for the test data set. The generated results normally include three types of information: confidence (positive class), confidence (negative class), and the predicted label.

Naturally, when confidence (positive class) is larger than confidence (negative class), the predicted label is positive.

But I found many cases (using LibSVM for text classification) where the predicted label is still positive even though confidence (positive class) is smaller than confidence (negative class). Why does this happen?

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Actually, I have never seen such a case with a plain create model/apply model cycle. Anyway, you can define manual thresholds e.g. with Create Threshold and Apply Threshold, or shift the thresholds in a more sophisticated way with e.g. Choose Recall or other cost-sensitive learning schemes.

    Best regards,
    Marius
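
To illustrate what a manual threshold does conceptually, here is a minimal Python sketch. It is not RapidMiner's implementation of Create Threshold / Apply Threshold; the function name `label_from_confidence` and the threshold values are assumptions chosen for illustration. The confidences are taken from the table posted later in this thread.

```python
def label_from_confidence(conf_positive, threshold=0.5,
                          positive="R", negative="NR"):
    """Return the positive label when its confidence reaches the threshold.

    Hypothetical helper: it only illustrates the idea of a decision
    threshold and does not reproduce any RapidMiner operator.
    """
    return positive if conf_positive >= threshold else negative


# confidence(R) values copied from the thread
confidences = [0.528462399, 0.493653526, 0.39865042]

# Default threshold of 0.5: the label follows the larger confidence.
default_labels = [label_from_confidence(c) for c in confidences]

# A lowered (assumed) threshold of 0.39 favors the positive class.
shifted_labels = [label_from_confidence(c, threshold=0.39) for c in confidences]

print(default_labels)
print(shifted_labels)
```

With a threshold below 0.5, a row can be labeled R even though confidence(R) is smaller than confidence(NR). Whether anything like this happens in the process discussed here is not established in the thread.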
  • huaiyanggongzi Member Posts: 39 Contributor II
    Hi, thanks for the reply.

    The following is the result of running the "Apply Model" operator. The model was trained using the LibSVM operator. I have posted only the part of the result that shows the observation from my original post, i.e., even when confidence(R) is smaller than confidence(NR), the prediction is still R.


    confidence(R)  confidence(NR) Prediction(Label)
    0.528462399 0.471537601 R
    0.524106922 0.475893078 R
    0.516740761 0.483259239 R
    0.509868083 0.490131917 R
    0.505252829 0.494747171 R
    0.493653526 0.506346474 R
    0.485416242 0.514583758 R
    0.475031465 0.524968535 R
    0.466340913 0.533659087 R
    0.459370807 0.540629193 R
    0.458747466 0.541252534 R
    0.4577908 0.5422092 R
    0.435570459 0.564429541 R
    0.432716957 0.567283043 R
    0.42963305 0.57036695 R
    0.422826691 0.577173309 R
    0.412345117 0.587654883 R
    0.404687872 0.595312128 R
    0.40221958 0.59778042 R
    0.39865042 0.60134958 R
    0.398228918 0.601771082 R
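
A quick way to find every such row in an exported score file is to compare the predicted label with the class that has the larger confidence. The following is a minimal sketch, assuming a comma-separated export with the column names `confidence(R)`, `confidence(NR)` and `prediction(label)`; the helper name `inconsistent_rows` is hypothetical, and the sample contains only four of the rows posted above.

```python
import csv
import io

# Four rows copied from the table above, written as comma-separated text.
SAMPLE = """confidence(R),confidence(NR),prediction(label)
0.528462399,0.471537601,R
0.505252829,0.494747171,R
0.493653526,0.506346474,R
0.39865042,0.60134958,R
"""


def inconsistent_rows(csv_text, classes=("R", "NR")):
    """Return (row, expected_label) pairs where the predicted label
    differs from the class with the highest confidence."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        confidences = {c: float(row[f"confidence({c})"]) for c in classes}
        expected = max(confidences, key=confidences.get)
        if expected != row["prediction(label)"]:
            mismatches.append((row, expected))
    return mismatches


mismatches = inconsistent_rows(SAMPLE)
for row, expected in mismatches:
    print(row["confidence(R)"], row["confidence(NR)"],
          "predicted", row["prediction(label)"], "expected", expected)
```

For a real export, the file contents can be read with `open(path).read()` and passed to the same function.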
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hm, interesting. Can you please post your process xml as described in my signature?

    Best regards,
    Marius
  • huaiyanggongzi Member Posts: 39 Contributor II
    The following is the process that I have been using for scoring.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.011">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
        <parameter key="parallelize_main_process" value="true"/>
        <process expanded="true" height="386" width="711">
          <operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
            <parameter key="repository_entry" value="SVM_Train_F_words_unigram_tf"/>
          </operator>
          <operator activated="true" class="text:process_document_from_file" compatibility="5.1.002" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="179" y="75">
            <list key="text_directories">
              <parameter key="R" value="E:\R_Validation"/>
              <parameter key="NR" value="E:\NR_Validation"/>
            </list>
            <parameter key="extract_text_only" value="false"/>
            <parameter key="vector_creation" value="Term Frequency"/>
            <parameter key="prune_below_absolute" value="5"/>
            <parameter key="prune_above_absolute" value="5000000"/>
            <parameter key="parallelize_vector_creation" value="true"/>
            <process expanded="true" height="362" width="674">
              <operator activated="true" class="text:tokenize" compatibility="5.1.002" expanded="true" height="60" name="Tokenize (2)" width="90" x="45" y="30"/>
              <operator activated="true" class="text:transform_cases" compatibility="5.1.002" expanded="true" height="60" name="Transform Cases (2)" width="90" x="180" y="30"/>
              <operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.002" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="315" y="73"/>
              <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
              <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
              <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve (2)" width="90" x="179" y="300">
            <parameter key="repository_entry" value="SVM_Train_F_model_unigram_tf"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.011" expanded="true" height="76" name="Apply Model" width="90" x="313" y="300">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="5.1.011" expanded="true" height="76" name="Performance" width="90" x="447" y="75">
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="210">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|confidence(non_res)|confidence(res)|label|prediction(label)"/>
          </operator>
          <operator activated="true" class="write_csv" compatibility="5.1.011" expanded="true" height="60" name="Write CSV" width="90" x="581" y="165">
            <parameter key="csv_file" value="E:\Project\svmscore.csv"/>
            <parameter key="column_separator" value=","/>
            <parameter key="quote_nominal_values" value="false"/>
            <parameter key="format_date_attributes" value="false"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Process Documents from Files (2)" to_port="word list"/>
          <connect from_op="Process Documents from Files (2)" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="result 2"/>
          <connect from_op="Performance" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Write CSV" to_port="input"/>
          <connect from_op="Write CSV" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Whoo, you are using RapidMiner 5.1. In a few days RapidMiner 5.3 will be released - I strongly encourage you to update to the latest version (5.2.8) and try again. Please leave a note in this thread if your problem persists or if everything is working fine now.

    Best regards,
    Marius

  • huaiyanggongzi Member Posts: 39 Contributor II
    Thanks, Marius. I will give it another try after updating RapidMiner.

    By the way, do you know how to output the distance between a given test data point and the hyperplane constructed from the training data set? I am again referring to the LibSVM operator in RapidMiner.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    huaiyanggongzi wrote:
    By the way, do you know how to output the distance between a given test data point and the hyperplane constructed from the training data set? I am again referring to the LibSVM operator in RapidMiner.
    Unfortunately, that's not possible. The confidence is an indicator for that, but the exact distance cannot be output.

    Best regards,
    Marius
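
For reference, the quantity asked about has a simple geometric definition in the linear case: the signed distance of a point x from the hyperplane w·x + b = 0 is (w·x + b) / ||w||. The sketch below only illustrates that formula outside of RapidMiner. The weight vector `w`, the bias `b`, and the function name `signed_distance` are made-up values and names for illustration, not something the LibSVM operator exposes, and a non-linear kernel would need a different computation.

```python
import math


def signed_distance(x, w, b):
    """Signed distance of point x from the hyperplane w.x + b = 0.

    Assumes a linear decision function; w and b are hypothetical
    inputs, not values read from a RapidMiner model.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        raise ValueError("weight vector must be non-zero")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Made-up example values: ||w|| = 5
w = [3.0, 4.0]
b = -5.0

d_far = signed_distance([3.0, 4.0], w, b)     # (9 + 16 - 5) / 5
d_origin = signed_distance([0.0, 0.0], w, b)  # -5 / 5

print(d_far, d_origin)
```

The sign tells which side of the hyperplane the point lies on, and the absolute value is the distance.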