ROC thresholds but not ROC curve out of feat.selection and Xval

Legacy User Member Posts: 0 Newbie
edited November 2019 in Help

Across a variety of learning processes, whenever I wrap the Performance operator in an X-Validation operator or a backward/forward selection operator, I can't get ROC curves consistently.
All I get in the output are the contingency tables, an estimate of the AUC, and the threshold curve, but not the actual ROC curve.

I compared my processes with the demo ones, but I can't see any difference. I have also tried some of the provided demo data, but to no avail.

Any clues?

A toy example is the following:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="612" width="570">
      <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Ripley-Set"/>
      </operator>
      <operator activated="true" class="weka:W-ChiSquaredAttributeEval" expanded="true" height="76" name="W-ChiSquaredAttributeEval" width="90" x="180" y="30"/>
      <operator activated="true" class="select_by_weights" expanded="true" height="94" name="Select by Weights" width="90" x="315" y="30">
        <parameter key="weight_relation" value="top k"/>
      </operator>
      <operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="450" y="30">
        <parameter key="leave_one_out" value="true"/>
        <process expanded="true" height="612" width="165">
          <operator activated="true" class="weka:W-RandomForest" expanded="true" height="76" name="W-RandomForest" width="90" x="45" y="30"/>
          <connect from_port="training" to_op="W-RandomForest" to_port="training set"/>
          <connect from_op="W-RandomForest" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="612" width="300">
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="180" y="120"/>
          <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="W-ChiSquaredAttributeEval" to_port="example set"/>
      <connect from_op="W-ChiSquaredAttributeEval" from_port="weights" to_op="Select by Weights" to_port="weights"/>
      <connect from_op="W-ChiSquaredAttributeEval" from_port="example set" to_op="Select by Weights" to_port="example set input"/>
      <connect from_op="Select by Weights" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>



    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    The red curve is simply hidden by the blue one, because they match exactly. The problem is that you are using leave-one-out cross-validation, so you are averaging ROC curves that were each computed from a single example. The ROC curve of one example always either goes up to the top on the left or at the end. There is no other possibility, and hence they always match exactly. I would recommend switching LOOCV off and using ten-fold cross-validation to get a proper ROC estimate.
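
    To illustrate why a one-example test set cannot give a meaningful ROC curve, here is a minimal sketch in plain Python. It is not RapidMiner's implementation: the helper `roc_points`, the example scores, and the choice to report an undefined rate as 0.0 are all assumptions made for illustration only.

```python
def roc_points(labels, scores):
    """Sweep a threshold over the scores and return (FPR, TPR) points.

    labels: 1 = positive, 0 = negative.
    scores: confidence for the positive class.
    Assumption of this sketch: a rate whose denominator is zero
    (no positives or no negatives in the test set) is reported as 0.0.
    Tied scores are not treated specially.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    # Visit examples from highest to lowest score, i.e. lower the threshold.
    for _score, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        tpr = tp / pos if pos else 0.0
        fpr = fp / neg if neg else 0.0
        points.append((fpr, tpr))
    return points


# A leave-one-out fold tests on exactly one example, so the "curve"
# is a single step whose direction depends only on the true label.
single_positive = roc_points([1], [0.8])
single_negative = roc_points([0], [0.8])

# A fold with several examples of both classes gives a real curve.
pooled = roc_points([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3])
```

    With one test example the score itself has no influence on the shape, which is why every per-fold curve looks the same; a fold that contains several examples of both classes, as in ten-fold cross-validation, produces intermediate points.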

    Legacy User Member Posts: 0 Newbie
    Oops, that makes absolute sense. I copied the operator from another workflow without ROC and forgot to switch off LOO.

    Thank you, Sebastian.