
"store a ROC plot for each iteration of a subprocess"

Legacy User
edited June 2019 in Help
Hello,

I routinely use ROC plots to compare different learning algorithms and parameter settings. When I'm not using RapidMiner, I generally dump multiple ROC plots to disk, one for every parameter value, feature selection round, etc.

I can't seem to find a way to do the same in RapidMiner. Is it possible?

I would prefer a solution that does not require the GUI: even though I use it to design workflows, I have to run RapidMiner from the command line on a computing cluster whenever I process a full dataset.
Moreover, I usually prefer to store the raw data so that I can reproduce the plots later in my graphics library of choice (R or matplotlib).
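For replotting in R or matplotlib, plain CSV with one row per ROC point is usually the easiest raw format to read back (`read.csv` in R, `numpy.loadtxt` or the `csv` module in Python). Below is a minimal Python sketch using only the standard library; the function name `roc_to_csv` and the column names `fpr` and `tpr` are assumptions for illustration, not anything RapidMiner produces.

```python
import csv
import io
from typing import Iterable, Tuple


def roc_to_csv(points: Iterable[Tuple[float, float]]) -> str:
    """Serialize (false positive rate, true positive rate) points as CSV text.

    The header names the columns fpr and tpr (an assumed convention).
    """
    buffer = io.StringIO()
    # Use "\n" so the output is identical on every platform.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["fpr", "tpr"])
    for fpr, tpr in points:
        writer.writerow([fpr, tpr])
    return buffer.getvalue()
```

Writing the returned string to one file per iteration (for example `roc_1.csv`, `roc_2.csv`, ...) keeps the runs separate and leaves the plotting entirely to R or matplotlib.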

Hence, I was wondering whether there is a way to automatically export or save ROC plots to disk (as images or, even better, as raw data).
For example, in backward/forward attribute selection, I'd like to compare the ROC curves of every generation.
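If the predictions of each generation can be exported as (true label, confidence for the positive class) pairs, for example by writing the scored example set to a file, the raw ROC data can be recomputed outside RapidMiner. Below is a minimal Python sketch under that assumption; `roc_points` is a hypothetical helper, and the 1/0 label encoding is an assumption as well.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points of a ROC curve.

    labels: 1 for the positive class, 0 for the negative class (assumed encoding).
    scores: confidence for the positive class; higher means more likely positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sort by descending confidence so lowering the threshold admits examples in order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example sharing this confidence before emitting a point,
        # so tied scores produce one diagonal step instead of an order-dependent staircase.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points
```

Each generation's point list can then be stored separately and replotted with whatever styling you prefer.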

Things I have considered or tried so far:

- I don't see a 'Write ROC' operator.

- I tried the 'Write Performance' operator, but RapidMiner cannot read the resulting file back, neither by opening it in the GUI nor through the 'Read Performance' operator.

- I have thought of using 'Write Performance' and then parsing the resulting XML file with Python outside RapidMiner, but I still can't figure out how to write a separate file for every iteration of the subprocess. Is there an operator that can append a suffix to the filename and increment it on every loop, or something similar?
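One common way to get one file per iteration in RapidMiner is to put a macro into the file-name parameter; the predefined `%{a}` macro (the number of times the operator has been applied) is often used for this, but check it against your RapidMiner version. Outside RapidMiner, the numbered files then have to be read back in iteration order, and plain alphabetical sorting puts `perf_10` before `perf_2`. Below is a minimal Python sketch, assuming hypothetical file names of the form `perf_<n>.per`; `order_by_iteration` is an illustrative helper, not a RapidMiner feature.

```python
import re
from typing import Iterable, List, Tuple


def order_by_iteration(names: Iterable[str], prefix: str = "perf_",
                       suffix: str = ".per") -> List[Tuple[int, str]]:
    """Return (iteration, file name) pairs for names like perf_1.per, sorted numerically.

    Names that do not match <prefix><digits><suffix> are ignored.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))
    matched = []
    for name in names:
        m = pattern.fullmatch(name)
        if m:
            matched.append((int(m.group(1)), name))
    # Tuples sort on the integer first, so 2 comes before 10.
    matched.sort()
    return matched
```

Called as `order_by_iteration(os.listdir("results"))`, it yields the files in the order the loop produced them, ready to be parsed one by one.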

Many thanks,
eli

Answers

  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert)
    Hi Eli,
    give the Reporting Extension a try. It offers a report generator (the Generate Report operator) that opens a report in one of several file formats. Then insert a Report operator to add a specific IOObject to the report, for example a plot of the ROC chart.
    Of course, you can also add text, for example describing the current parameter setting. Macros help a great deal there.

    Greetings,
      Sebastian
  • Legacy User
    Hi Sebastian,

    I tried the Reporting Extension, but I cannot see an obvious way to output ROC curves (or their data).

    Setting the Report operator to expect anything other than a Performance Vector returns an error. The performance vector, however, contains only the confusion matrix and the AUC value, but no curve data.
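Because the performance vector carries only the AUC, ROC points recovered outside RapidMiner can at least be cross-checked against that number by integrating them with the trapezoidal rule. Below is a minimal Python sketch, assuming the points are (FPR, TPR) pairs sorted by FPR and running from (0, 0) to (1, 1); `auc_trapezoid` is a hypothetical helper, and RapidMiner may handle tied confidences differently, so a small mismatch is not necessarily an error.

```python
from typing import Sequence, Tuple


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a ROC curve given as (FPR, TPR) points sorted by FPR."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x1 < x0:
            raise ValueError("points must be sorted by false positive rate")
        # Area of the trapezoid between two consecutive points.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A random classifier's diagonal, `[(0.0, 0.0), (1.0, 1.0)]`, gives 0.5.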

    Thanks,
    eli
  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert)
    Hi Eli,
    give this process a try:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="476" width="681">
          <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="random classification"/>
          </operator>
          <operator activated="true" class="compare_rocs" compatibility="5.0.8" expanded="true" height="76" name="Compare ROCs" width="90" x="179" y="30">
            <process expanded="true" height="608" width="894">
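              <!-- Learner whose ROC curve is plotted; connect further learners to "model 2", "model 3", ... to compare them -->
              <operator activated="true" class="decision_tree" compatibility="5.0.8" expanded="true" height="76" name="Decision Tree" width="90" x="112" y="30"/>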
              <connect from_port="train 1" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model 1"/>
              <portSpacing port="source_train 1" spacing="0"/>
              <portSpacing port="source_train 2" spacing="0"/>
              <portSpacing port="sink_model 1" spacing="0"/>
              <portSpacing port="sink_model 2" spacing="0"/>
            </process>
          </operator>
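          <!-- Opens a report named "test"; the PDF is written to pdf_output_file (adjust the path for your system) -->
          <operator activated="true" class="reporting:generate_report" compatibility="5.0.2" expanded="true" height="76" name="Generate Report" width="90" x="313" y="30">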
            <parameter key="report_name" value="test"/>
            <parameter key="pdf_output_file" value="c:\test.pdf"/>
          </operator>
          <!-- Adds the ROC Comparison object passed through Generate Report to the open report -->
          <operator activated="true" class="reporting:report" compatibility="5.0.2" expanded="true" height="60" name="Report" width="90" x="447" y="30">
            <parameter key="specified" value="true"/>
            <parameter key="reportable_type" value="ROC Comparison"/>
            <parameter key="renderer_name" value="ROC Comparison"/>
            <list key="parameters"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Compare ROCs" to_port="example set"/>
          <connect from_op="Compare ROCs" from_port="rocComparison" to_op="Generate Report" to_port="through 1"/>
          <connect from_op="Generate Report" from_port="through 1" to_op="Report" to_port="reportable in"/>
          <connect from_op="Report" from_port="reportable out" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Greetings,
      Sebastian
  • Legacy User
    Hi Sebastian,

    Thanks for the example process.
    Is the ROC comparison the only way to get a ROC plot out?

    Thanks,
    eli
  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert)
    Hi,
    Currently, yes.

    Greetings,
      Sebastian