Precision Recall Curves

John_De_Jong Member Posts: 10 Contributor II
edited November 2018 in Help
With Cross Validation and the Performance operator we get ROC curves (TPR against FPR). How can we get precision-recall curves in the same way? I don't want the averaged precision and recall, but a curve for each fold. Can anyone suggest an approach, please?
Thanks
uday
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <parameter key="parallelize_main_process" value="false"/>
    <process expanded="true" height="500" width="752">
      <operator activated="true" class="retrieve" compatibility="5.1.001" expanded="true" height="60" name="Retrieve" width="90" x="17" y="58">
        <parameter key="repository_entry" value="Acceptor3KData"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.1.001" expanded="true" height="76" name="Set Role" width="90" x="112" y="165">
        <parameter key="name" value="class"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="replace" compatibility="5.1.001" expanded="true" height="76" name="Replace" width="90" x="313" y="210">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="class"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="replace_what" value="[0]"/>
        <parameter key="replace_by" value="1"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="5.1.001" expanded="true" height="130" name="Validation" width="90" x="447" y="75">
        <parameter key="create_complete_model" value="false"/>
        <parameter key="average_performances_only" value="false"/>
        <parameter key="leave_one_out" value="false"/>
        <parameter key="number_of_validations" value="5"/>
        <parameter key="sampling_type" value="stratified sampling"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="parallelize_training" value="false"/>
        <parameter key="parallelize_testing" value="false"/>
        <process expanded="true" height="500" width="351">
          <operator activated="true" class="fast_large_margin" compatibility="5.1.001" expanded="true" height="76" name="Fast Large Margin" width="90" x="130" y="110">
            <parameter key="solver" value="L2 SVM Dual"/>
            <parameter key="C" value="1.0"/>
            <parameter key="epsilon" value="0.01"/>
            <list key="class_weights"/>
            <parameter key="use_bias" value="true"/>
          </operator>
          <connect from_port="training" to_op="Fast Large Margin" to_port="training set"/>
          <connect from_op="Fast Large Margin" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="108"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="500" width="351">
          <operator activated="true" class="apply_model" compatibility="5.1.001" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="performance_binominal_classification" compatibility="5.1.001" expanded="true" height="76" name="Performance" width="90" x="179" y="30">
            <parameter key="main_criterion" value="accuracy"/>
            <parameter key="accuracy" value="true"/>
            <parameter key="classification_error" value="true"/>
            <parameter key="kappa" value="true"/>
            <parameter key="AUC (optimistic)" value="true"/>
            <parameter key="AUC" value="true"/>
            <parameter key="AUC (pessimistic)" value="true"/>
            <parameter key="precision" value="true"/>
            <parameter key="recall" value="true"/>
            <parameter key="lift" value="false"/>
            <parameter key="fallout" value="false"/>
            <parameter key="f_measure" value="true"/>
            <parameter key="false_positive" value="true"/>
            <parameter key="false_negative" value="true"/>
            <parameter key="true_positive" value="true"/>
            <parameter key="true_negative" value="true"/>
            <parameter key="sensitivity" value="true"/>
            <parameter key="specificity" value="true"/>
            <parameter key="youden" value="false"/>
            <parameter key="positive_predictive_value" value="true"/>
            <parameter key="negative_predictive_value" value="true"/>
            <parameter key="psep" value="false"/>
            <parameter key="skip_undefined_labels" value="true"/>
            <parameter key="use_example_weights" value="true"/>
          </operator>
          <operator activated="true" class="log" compatibility="5.1.001" expanded="true" height="76" name="Log" width="90" x="179" y="165">
            <parameter key="filename" value="C:\Output.log"/>
            <list key="log">
              <parameter key="recall" value="operator.Validation.value.performance1"/>
              <parameter key="precision" value="operator.Validation.value.performance2"/>
            </list>
            <parameter key="sorting_type" value="none"/>
            <parameter key="sorting_k" value="100"/>
            <parameter key="persistent" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <connect from_op="Log" from_port="through 1" to_port="averagable 2"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
          <portSpacing port="sink_averagable 3" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Replace" to_port="example set input"/>
      <connect from_op="Replace" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="training" to_port="result 1"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Answers

  • John_De_Jong Member Posts: 10 Contributor II
    Any answers? I would really appreciate any response on getting precision-recall curves from RapidMiner. I am rather stuck, because in bioinformatics it is PR curves, not ROC curves, that matter. I am waiting for some information so that I can run the process and get the output.
    Thanks again
    Johan
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Johan,

    sorry for the delay, but we are very busy right now with many projects. Unfortunately we cannot spend as much time supporting community members as we would like to.
    Currently there is no integrated way of building such curves. There are three possibilities: You could implement it yourself and possibly contribute the code to the public, giving something back to the community. Or you could ask us for a quote for implementing it for you. The last possibility is to build a process that creates such a plot as its outcome.
    If it's completely analogous to the AUROC, you will first have to sort the examples by one of the confidences and then step through them, increasing a counter whenever a prediction was correct. You could derive a data set from this. But this does need some macro generation, Set Data operators and Loop Examples. It will be quite a sophisticated process...

    Greetings,
      Sebastian
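The sort-and-count idea described above can be sketched outside RapidMiner as follows. This is a minimal sketch under assumptions, not a RapidMiner operator: the class name `PrCurveSketch`, the method `prCurve`, and the array-based input are hypothetical, and the counter is interpreted as the number of true positives among the examples ranked so far. Called once per fold, it yields one curve per fold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: precision-recall points for the test examples of one fold. */
final class PrCurveSketch {

    /**
     * confidences[i] is the confidence for the positive class of example i,
     * positives[i] is true if example i really is positive.
     * Returns points as {recall, precision}, ordered from high to low threshold.
     */
    static List<double[]> prCurve(double[] confidences, boolean[] positives) {
        int n = confidences.length;
        Integer[] order = new Integer[n];
        int totalPositives = 0;
        for (int i = 0; i < n; i++) {
            order[i] = i;
            if (positives[i]) totalPositives++;
        }
        List<double[]> points = new ArrayList<>();
        if (totalPositives == 0) return points; // recall is undefined without positives

        // Sort example indices by descending confidence.
        Arrays.sort(order, (a, b) -> Double.compare(confidences[b], confidences[a]));

        int truePositives = 0;
        for (int i = 0; i < n; i++) {
            if (positives[order[i]]) truePositives++;
            // Emit a point only after the last example of a group of tied confidences.
            boolean lastOfTie = i == n - 1
                    || confidences[order[i + 1]] != confidences[order[i]];
            if (lastOfTie) {
                double recall = (double) truePositives / totalPositives;
                double precision = (double) truePositives / (i + 1);
                points.add(new double[] {recall, precision});
            }
        }
        return points;
    }

    public static void main(String[] args) {
        double[] conf = {0.9, 0.8, 0.7, 0.6};
        boolean[] pos = {true, false, true, false};
        for (double[] p : prCurve(conf, pos)) {
            System.out.println("recall=" + p[0] + " precision=" + p[1]);
        }
    }
}
```

Each threshold between two distinct confidence values gives one point; examples with equal confidence are grouped so that the curve does not depend on their arbitrary order.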
  • John_De_Jong Member Posts: 10 Contributor II
    Thanks for your reply!
    I found a way to do it. Here it is for others:
    1. Modify the predict method in the LibLinear Java code so that it takes the true label and prints the decision value together with that label:
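        public static int predict(Model model, FeatureNode[] x, int label) {
            double[] dec_values = new double[model.nr_class];
            int prediction = predictValues(model, x, dec_values);
            try {
                // One line per example: decision value, tab, true label.
                // Multiplying by model.label[0] flips the sign when that label is -1
                // (this assumes the two class labels are +1 and -1).
                predictionOutputWriter.write(dec_values[0] * model.label[0] + "\t" + label + "\n");
            } catch (Exception e) {
                e.printStackTrace();
            }
            return prediction;
        }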
    2. Initialize predictionOutputWriter so that it writes to a data file.
    3. Use http://mark.goadrich.com/programs/AUC/ to get either precision-recall curves or ROC curves from the predictions written above.

    Sebastian,
    I would love to contribute this as a package in the summer.
    Thanks
    Johan
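Step 2 above leaves the writer setup open. One possible way to do it is sketched here, assuming a UTF-8 text file with one "decision value, tab, label" line per example, as in the predict method of the post. The class `PredictionLog` and its methods are hypothetical names, not part of LibLinear.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical wrapper around the writer that collects one line per prediction. */
final class PredictionLog implements AutoCloseable {
    private final BufferedWriter writer;

    /** Opens (and truncates) the given file for writing. */
    PredictionLog(Path file) throws IOException {
        this.writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    /** Appends "decisionValue<TAB>label" followed by a newline. */
    void write(double decisionValue, int label) throws IOException {
        writer.write(decisionValue + "\t" + label + "\n");
    }

    /** Flushes and closes the file; call this after the last prediction. */
    @Override
    public void close() throws IOException {
        writer.close();
    }
}
```

Using one file per fold, opened before the fold's test examples are scored and closed afterwards, keeps the predictions of different folds separate.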
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    contributing this would be great. It would probably be easier to use if this code were built into a single operator that generates an appropriate ResultObject, as the ROC operator does.

    Greetings,
      Sebastian
  • kedypoly Member Posts: 1 Contributor I

    Hi, 5 years later, same question. Has there been any progress on this yet? Is there any way of getting a PR curve out of RapidMiner?

  • RNarayan Member Posts: 4 Contributor I
    Hello,

    Has there been any progress on an operator to get the PRC out of RM?

    Thanks
    Narayan
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    You need to install the Operator Toolbox (a free extension); it contains an operator called Performance (AUPRC).
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts