How to plot a Learning Curve for a given model?



Hi all,


I am new to RapidMiner Studio and I am trying to figure out how to plot a learning curve for a given model (basically, plot the training and testing performance as a function of the number of examples). In principle, the learning curve would be a good indicator of the robustness of the model (showing the bias versus variance problem).

I could not find an operator or any video examples for this in RapidMiner. I tried logging some information with the Log operator after my Cross Validation operator in order to plot it afterwards, but without success.

Any guidance would be very much appreciated.






Re: How to plot a Learning Curve for a given model?

The learning curve operator has been deprecated since about v7.3. Let me see if I can find a process that creates this for you.


Re: How to plot a Learning Curve for a given model?

Hi Amaury,

One option is to embed your validation process in a Loop operator that iterates over the number of example chunks you'd like to test. Within the loop you can use Generate Macro to set the number of examples to use in the given iteration, e.g. with the function expression "eval(%{iteration})*100" stored as a macro called "stop", so that the validation runs on growing chunks of 100 examples. Afterwards, select the examples with the Filter Example Range operator, set to start at example 1 and to use %{stop} as the last example. Add a Log operator (as you've tried already) after your validation process and log both %{stop} (the x-axis of your plot) and the desired performance. You can extract the performance from the Cross Validation operator by choosing Cross Validation, value, and performance 1 within the Log operator. This will be the y-axis of your plot.
After running such a process you'll get an example set containing the data you logged. Choose a scatter plot with your score on the y-axis and the number of examples on the x-axis. Done.
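
For readers who think more easily in code, the loop described above can be sketched outside RapidMiner. This is only an illustration of the logic in plain Python, not a RapidMiner process: the toy data, the threshold learner, and all function names are assumptions made up for the example. Only the "stop = iteration * 100" rule and the "first example up to stop" filter come from the description.

```python
import random

def make_data(n, seed=0):
    """Hypothetical toy data (not from the thread): two noisy 1-D classes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data

def train_threshold(train):
    """Stand-in learner: threshold halfway between the two class means."""
    means = []
    for cls in (0, 1):
        xs = [x for x, y in train if y == cls]
        means.append(sum(xs) / len(xs) if xs else 0.0)
    return (means[0] + means[1]) / 2.0

def accuracy(threshold, test):
    """Fraction of test examples on the correct side of the threshold."""
    hits = sum((x > threshold) == (y == 1) for x, y in test)
    return hits / len(test)

def cross_validate(examples, k=5):
    """Plain k-fold cross-validation; returns the mean test accuracy."""
    scores = []
    for fold in range(k):
        test = examples[fold::k]
        train = [e for i, e in enumerate(examples) if i % k != fold]
        scores.append(accuracy(train_threshold(train), test))
    return sum(scores) / k

data = make_data(1000)
log = []  # plays the role of the Log operator
for iteration in range(1, 11):
    stop = iteration * 100    # the "stop" macro
    subset = data[:stop]      # Filter Example Range: first example up to stop
    log.append((stop, cross_validate(subset)))

# Each row is one point of the learning curve: x = stop, y = performance.
for stop, score in log:
    print(stop, round(score, 3))
```

Plotting the first column against the second gives the same kind of scatter plot as the logged example set.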


Your process could look something like this:

[Image: General process]
[Image: Inside the loop operator]



[Image: Plotting of the resulting ExampleSet]


Hope this solves your problem,



Re: How to plot a Learning Curve for a given model?

You can also simply add the number of examples as a Sample parameter (absolute number) in the Optimization operator, set an appropriate range of sample sizes, and then log the performance output for each of the sample runs.



Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts

Re: How to plot a Learning Curve for a given model?



I use the operator "Loop Parameters" for this, and the inner "Sample" operator uses ratios between 5% and 100%. Make sure that you evaluate the model with a cross-validation that uses a fixed local random seed, since otherwise the influence of the data splits might be bigger than that of the additional examples...


Below is a process which you can use as a building block for this.


Hope this helps,



<?xml version="1.0" encoding="UTF-8"?><process version="7.5.001">
  <operator activated="true" class="process" compatibility="7.5.001" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="generate_data" compatibility="7.5.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
        <parameter key="target_function" value="global and local models classification"/>
        <parameter key="number_examples" value="10000"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="add_noise" compatibility="7.5.001" expanded="true" height="103" name="Add Noise" width="90" x="179" y="34">
        <parameter key="random_attributes" value="20"/>
        <list key="noise"/>
      </operator>
      <operator activated="true" class="loop_parameters" compatibility="7.5.001" expanded="true" height="82" name="Loop Parameters" width="90" x="313" y="34">
        <list key="parameters">
          <parameter key="Sample.sample_ratio" value="[0.05;1.0;19;linear]"/>
        </list>
        <process expanded="true">
          <operator activated="true" class="sample" compatibility="7.5.001" expanded="true" height="82" name="Sample" width="90" x="45" y="34">
            <parameter key="sample" value="relative"/>
            <parameter key="sample_ratio" value="1.0"/>
            <list key="sample_size_per_class"/>
            <list key="sample_ratio_per_class"/>
            <list key="sample_probability_per_class"/>
          </operator>
          <operator activated="true" class="concurrency:cross_validation" compatibility="7.5.001" expanded="true" height="145" name="Cross Validation" width="90" x="179" y="34">
            <parameter key="use_local_random_seed" value="true"/>
            <process expanded="true">
              <operator activated="true" class="naive_bayes" compatibility="7.5.001" expanded="true" height="82" name="Naive Bayes" width="90" x="45" y="34"/>
              <connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="7.5.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="7.5.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="log" compatibility="7.5.001" expanded="true" height="82" name="Log" width="90" x="313" y="34">
            <list key="log">
              <parameter key="ratio" value="operator.Sample.parameter.sample_ratio"/>
              <parameter key="performance" value="operator.Cross Validation.value.performance main criterion"/>
            </list>
          </operator>
          <connect from_port="input 1" to_op="Sample" to_port="example set input"/>
          <connect from_op="Sample" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
          <connect from_op="Cross Validation" from_port="performance 1" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="performance"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Add Noise" to_port="example set input"/>
      <connect from_op="Add Noise" from_port="example set output" to_op="Loop Parameters" to_port="input 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
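
The idea behind the process above (sweep the sample ratio, cross-validate each sample with a fixed seed, log ratio and performance) can also be sketched in plain Python. This is a rough illustration only and not a translation of the operators: the toy data, the nearest-mean learner, the seed value 42, and every function name are assumptions invented for the example. It also assumes the ratio range covers the evenly spaced values 0.05, 0.10, ..., 1.0.

```python
import random

def make_data(n, seed=0):
    """Hypothetical toy data (not from the thread): two noisy 1-D classes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data

def fold_accuracy(train, test):
    """Stand-in learner: assign each test point to the nearer class mean."""
    means = {}
    for cls in (0, 1):
        xs = [x for x, y in train if y == cls]
        means[cls] = sum(xs) / len(xs) if xs else 0.0
    hits = 0
    for x, y in test:
        predicted = min((0, 1), key=lambda c: abs(x - means[c]))
        hits += predicted == y
    return hits / len(test)

def cross_validate(examples, k=10, seed=42):
    """k-fold cross-validation whose shuffling uses a fixed local seed."""
    rng = random.Random(seed)
    order = list(range(len(examples)))
    rng.shuffle(order)
    scores = []
    for fold in range(k):
        test_idx = set(order[fold::k])
        train = [e for i, e in enumerate(examples) if i not in test_idx]
        test = [examples[i] for i in order[fold::k]]
        scores.append(fold_accuracy(train, test))
    return sum(scores) / k

def learning_curve(data, ratios, seed=42):
    """One (ratio, performance) row per sample ratio, like the Log operator."""
    rows = []
    for ratio in ratios:
        rng = random.Random(seed)  # same seed each time: reproducible samples
        sample = rng.sample(data, round(ratio * len(data)))
        rows.append((ratio, cross_validate(sample, seed=seed)))
    return rows

ratios = [round(0.05 * i, 2) for i in range(1, 21)]  # 0.05, 0.10, ..., 1.0
data = make_data(2000)
curve = learning_curve(data, ratios)
for ratio, score in curve:
    print(ratio, round(score, 3))
```

Because both the sampling and the fold assignment are seeded, running the sweep twice gives identical rows, so differences along the curve reflect the sample size rather than a different random split.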
