
Precision Recall Curves and auPRC

John_De_Jong Member Posts: 10 Contributor II
edited November 2018 in Help
Folks
I created a process that retrieves data and runs cross-validation with the Fast Large Margin classifier. My process definition is below. I would like to know how to:
1. Log the precision and recall points so that they can be plotted with external software. I used the Log operator, but it only writes the aggregate precision and recall values, not a value for each instance.
2. Get the auPRC (area under the precision-recall curve) metric from the process.
Maybe there is a mistake in what I am doing, so any help from the experts here is appreciated.
Thanks
Johan
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
    <process expanded="true" height="500" width="752">
      <operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve" width="90" x="17" y="58">
        <parameter key="repository_entry" value="Acceptor3KData"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="112" y="165">
        <parameter key="name" value="class"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="replace" compatibility="5.1.001" expanded="true" height="76" name="Replace" width="90" x="313" y="210">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="class"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="replace_what" value="[0]"/>
        <parameter key="replace_by" value="1"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="5.1.001" expanded="true" height="130" name="Validation" width="90" x="447" y="75">
        <parameter key="number_of_validations" value="5"/>
        <process expanded="true" height="500" width="351">
          <operator activated="true" class="fast_large_margin" compatibility="5.1.001" expanded="true" height="76" name="Fast Large Margin" width="90" x="130" y="110">
            <list key="class_weights"/>
          </operator>
          <connect from_port="training" to_op="Fast Large Margin" to_port="training set"/>
          <connect from_op="Fast Large Margin" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="108"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="500" width="351">
          <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model" width="90" x="63" y="25">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_classification" compatibility="5.1.001" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
            <parameter key="main_criterion" value="weighted_mean_recall"/>
            <parameter key="accuracy" value="false"/>
            <parameter key="weighted_mean_recall" value="true"/>
            <parameter key="weighted_mean_precision" value="true"/>
            <list key="class_weights"/>
          </operator>
          <operator activated="true" class="log" compatibility="5.1.001" expanded="true" height="76" name="Log" width="90" x="179" y="165">
            <parameter key="filename" value="C:\Output.log"/>
            <list key="log">
              <parameter key="recall" value="operator.Validation.value.performance1"/>
              <parameter key="precision" value="operator.Validation.value.performance2"/>
            </list>
            <parameter key="persistent" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <connect from_op="Log" from_port="through 1" to_port="averagable 2"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
          <portSpacing port="sink_averagable 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Replace" to_port="example set input"/>
      <connect from_op="Replace" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="training" to_port="result 1"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Johan,

    I don't understand what you mean by "not for each instance". Performance values are aggregates, so they cannot be shown per instance/example. If you still want each single result, you need to turn on leave-one-out cross-validation in the XValidation operator.
    See the process below, which demonstrates how you can then write the logged results to a CSV file and plot them elsewhere.

    Unfortunately, I'm not familiar with auPRC. What is it?

    Greetings,
    Sebastian
  • John_De_Jong Member Posts: 10 Contributor II
    Sebastian
    Thanks for your reply. Maybe I didn't express it well. A ROC curve has TPR and FPR values for each fold, which can be plotted. In the same way, I want the precision and recall values for each fold, so that I can plot a precision-recall curve per fold. And just as there is an area under the ROC curve, I want the area under the precision-recall curve.
    Thanks
    Johan
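For reference, the quantities described in this post can be written out as follows. This is a sketch in notation that is not from the thread: it assumes the classifier gives each example a confidence for the positive class, and that $TP(t)$, $FP(t)$ and $FN(t)$ count true positives, false positives and false negatives when every example with confidence at least $t$ is predicted positive. The step-wise sum is one common estimator of the area (often called average precision), and other interpolation schemes exist.

```latex
\[
P(t) = \frac{TP(t)}{TP(t) + FP(t)}, \qquad
R(t) = \frac{TP(t)}{TP(t) + FN(t)}
\]
\[
\mathrm{auPRC} = \int_0^1 P \, dR \;\approx\; \sum_{k} \left( R_k - R_{k-1} \right) P_k ,
\qquad R_0 = 0
\]
```

Here $(P_k, R_k)$ are the points obtained by lowering the threshold $t$ one distinct confidence value at a time.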
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Johan,

    Sorry, but this isn't possible yet. Of course, it could be added. You might either contribute the code yourself or contact us for an implementation quote.

    Greetings,
    Sebastian
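Until something like this exists inside RapidMiner, the curve and its area can be computed with external software, which is what the original question had in mind. The following is a minimal sketch in plain Python, not a RapidMiner feature. It assumes the labelled test examples of one fold have already been exported with two columns: the true label (1 for the positive class, 0 otherwise) and the confidence for the positive class. The function names `precision_recall_points` and `average_precision` are hypothetical, and the step-wise average precision used here is only one of several ways to estimate the area under the precision-recall curve.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (threshold, precision, recall)


def precision_recall_points(
    labels: Sequence[int], scores: Sequence[float]
) -> List[Point]:
    """Return one (threshold, precision, recall) point per distinct score.

    labels: 1 for the positive class, 0 for the negative class (assumed).
    scores: confidence for the positive class; higher means more positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_pos = sum(1 for label in labels if label == 1)
    if total_pos == 0:
        raise ValueError("at least one positive example is required")

    # Rank examples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points: List[Point] = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example of a tied score group.
        last_of_group = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if last_of_group:
            points.append((score, tp / (tp + fp), tp / total_pos))
    return points


def average_precision(points: Sequence[Point]) -> float:
    """Step-wise area under the precision-recall curve."""
    area = 0.0
    prev_recall = 0.0
    for _, precision, recall in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


if __name__ == "__main__":
    # Made-up toy data for illustration only.
    demo_labels = [1, 0, 1, 1, 0]
    demo_scores = [0.9, 0.8, 0.7, 0.6, 0.2]
    demo_points = precision_recall_points(demo_labels, demo_scores)
    print("threshold,precision,recall")
    for threshold, precision, recall in demo_points:
        print(f"{threshold},{precision:.4f},{recall:.4f}")
    print("average precision:", average_precision(demo_points))
```

Running the two functions once per fold gives one curve and one area value per fold. The printed comma-separated points can be pasted into any plotting tool.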