ROC chart on test data

csoares Member Posts: 13 Contributor II
edited April 2020 in Help
Hi,
I created a process (RapidMiner 5.2) with a Simple Validation operator, and I'm trying to generate a ROC chart for the test set only, so far without success. What should I do?
Any help will be highly appreciated.
Regards,
Carlos

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Carlos,

    I am assuming that you use a Performance operator inside the testing subprocess of the Simple Validation operator and that your problem is actually binominal / binary. In that case the delivered performance object automatically contains the ROC plot as well (select "AUC" in the visualization of the performance), and it has been calculated on the testing data only.

    Here is a process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Root">
        <process expanded="true" height="486" width="299">
          <operator activated="true" class="generate_direct_mailing_data" compatibility="5.2.000" expanded="true" height="60" name="DirectMailingExampleSetGenerator" width="90" x="45" y="30">
            <parameter key="number_examples" value="10000"/>
          </operator>
          <operator activated="true" class="split_validation" compatibility="5.2.000" expanded="true" height="112" name="SimpleValidation" width="90" x="179" y="30">
            <process expanded="true" height="626" width="378">
              <operator activated="true" class="naive_bayes" compatibility="5.2.000" expanded="true" height="76" name="NaiveBayes" width="90" x="144" y="30"/>
              <connect from_port="training" to_op="NaiveBayes" to_port="training set"/>
              <connect from_op="NaiveBayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="626" width="378">
              <operator activated="true" class="apply_model" compatibility="5.2.000" expanded="true" height="76" name="ModelApplier" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.2.000" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
              <connect from_port="model" to_op="ModelApplier" to_port="model"/>
              <connect from_port="test set" to_op="ModelApplier" to_port="unlabelled data"/>
              <connect from_op="ModelApplier" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="DirectMailingExampleSetGenerator" from_port="output" to_op="SimpleValidation" to_port="training"/>
          <connect from_op="SimpleValidation" from_port="averagable 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
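
For readers who want to see what "ROC calculated on the testing data only" amounts to, here is a minimal Python sketch. It is not RapidMiner's implementation; the function names `roc_points` and `auc` are made up for illustration, and the labels and scores are invented toy values that are assumed to come from a model trained on the training partition and applied to the held-out test partition.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR) for held-out test examples.

    labels: 1 for the positive class, 0 for the negative class.
    scores: model confidences for the positive class (higher = more positive).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs both classes in the test set")

    # Rank test examples from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Examples with the same score share one threshold, so step over them together.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy test partition (hypothetical values, not produced by the process above).
test_labels = [1, 1, 0, 1, 0, 0]
test_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(test_labels, test_scores)
print(curve)
print(auc(curve))
```

The key point is simply that only test-set labels and confidences enter the calculation; nothing from the training partition is scored.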

    Things become more difficult if you want to show a Lift chart instead of the ROC curve. Since Simple Validation can only deliver performance vectors to the outside, you have to use the Remember and Recall operators as a pair: Remember stores the chart inside the testing subprocess, and Recall retrieves it after the validation. There is an example of this in the sample repository delivered with RapidMiner under //Samples/processes/03_Validation/14_LiftChart. Alternatively, here is the XML for this process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Root">
        <process expanded="true" height="584" width="962">
          <operator activated="true" class="generate_direct_mailing_data" compatibility="5.2.000" expanded="true" height="60" name="DirectMailingExampleSetGenerator" width="90" x="45" y="30">
            <parameter key="number_examples" value="10000"/>
          </operator>
          <operator activated="true" class="split_validation" compatibility="5.2.000" expanded="true" height="112" name="SimpleValidation" width="90" x="180" y="30">
            <process expanded="true" height="626" width="378">
              <operator activated="true" class="naive_bayes" compatibility="5.2.000" expanded="true" height="76" name="NaiveBayes" width="90" x="144" y="30"/>
              <connect from_port="training" to_op="NaiveBayes" to_port="training set"/>
              <connect from_op="NaiveBayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="626" width="378">
              <operator activated="true" class="create_lift_chart" compatibility="5.2.000" expanded="true" height="94" name="LiftParetoChart" width="90" x="45" y="30">
                <parameter key="target_class" value="response"/>
              </operator>
              <operator activated="true" class="remember" compatibility="5.2.000" expanded="true" height="60" name="IOStorer" width="90" x="180" y="30">
                <parameter key="name" value="Lift Chart"/>
                <parameter key="io_object" value="LiftParetoChart"/>
              </operator>
              <operator activated="true" class="apply_model" compatibility="5.2.000" expanded="true" height="76" name="ModelApplier" width="90" x="45" y="210">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.2.000" expanded="true" height="76" name="Performance" width="90" x="179" y="210"/>
              <connect from_port="model" to_op="LiftParetoChart" to_port="model"/>
              <connect from_port="test set" to_op="LiftParetoChart" to_port="example set"/>
              <connect from_op="LiftParetoChart" from_port="example set" to_op="ModelApplier" to_port="unlabelled data"/>
              <connect from_op="LiftParetoChart" from_port="model" to_op="ModelApplier" to_port="model"/>
              <connect from_op="LiftParetoChart" from_port="lift pareto chart" to_op="IOStorer" to_port="store"/>
              <connect from_op="ModelApplier" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="recall" compatibility="5.2.000" expanded="true" height="60" name="IORetriever" width="90" x="315" y="30">
            <parameter key="name" value="Lift Chart"/>
            <parameter key="io_object" value="LiftParetoChart"/>
          </operator>
          <connect from_op="DirectMailingExampleSetGenerator" from_port="output" to_op="SimpleValidation" to_port="training"/>
          <connect from_op="SimpleValidation" from_port="averagable 1" to_port="result 2"/>
          <connect from_op="IORetriever" from_port="result" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
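
As a rough illustration of the numbers behind a lift chart, the following Python sketch ranks test examples by confidence, splits them into equally sized bins, and reports hits per bin together with cumulative coverage and cumulative lift. It is a sketch under stated assumptions and not the Create Lift Chart operator's actual algorithm; `lift_table`, the bin count, and the toy data are all hypothetical.

```python
def lift_table(labels, scores, bins=5):
    """Return rows of (bin, hits, cumulative coverage, cumulative lift).

    labels: 1 for the target class (e.g. "response"), 0 otherwise.
    scores: model confidences for the target class on the test set.
    """
    # Order the test labels from the most to the least confident prediction.
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda p: -p[0])]
    n = len(ranked)
    total_pos = sum(ranked)
    if total_pos == 0:
        raise ValueError("no examples of the target class in the test set")
    base_rate = total_pos / n  # share of targets when picking at random

    rows = []
    cum_pos = 0
    for b in range(bins):
        start = b * n // bins
        end = (b + 1) * n // bins
        hits = sum(ranked[start:end])
        cum_pos += hits
        # Lift: target rate among the top `end` examples relative to the base rate.
        cum_lift = (cum_pos / end) / base_rate if end else 0.0
        rows.append((b + 1, hits, cum_pos / total_pos, cum_lift))
    return rows


# Toy test partition (hypothetical values).
test_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
test_scores = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
for row in lift_table(test_labels, test_scores):
    print(row)
```

A cumulative lift above 1 in the first bins means the model concentrates target-class examples among its most confident predictions.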

    Hope that helps,
    Ingo
  • csoares Member Posts: 13 Contributor II
    Hi Ingo,
    This is exactly what I wanted. I couldn't get it because I was using the Performance (Classification) operator.
    Thanks,
    Carlos