"View training and testing errors in RapidMiner"

rajanm Member Posts: 4 Contributor I
edited May 2019 in Help
Hi,

While training a neural net, it is important not to undertrain or overtrain it. Is there any way we can see the trend of the testing and training errors? That way, the iterations can be stopped when the errors reach the right point.

Thanks.

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hello,

    Yes, this is possible. Combine an evaluation method like cross validation with the neural network learner as the learning scheme, put it inside a parameter optimization operator (e.g. the GridParameterOptimization), and let it optimize for the testing error (delivered by the cross validation).

    Optimizing the training error does not make any sense at all, and I do not get why so many "neural network followers" are so eager to optimize the training error. That is already done by the neural network itself and is exactly the reason why neural networks are so prone to overfitting.
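
    For readers who want the same idea outside RapidMiner, here is a minimal sketch in Python with scikit-learn (an illustration of the concept only, not a RapidMiner process; data and parameter choices are invented): a grid search over a network parameter, selected by cross-validated testing error rather than training error.

    # Sketch: parameter optimization wrapped around cross validation,
    # scored on the held-out folds (scikit-learn, not RapidMiner).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=500, random_state=0)

    grid = GridSearchCV(
        MLPClassifier(max_iter=2000, random_state=0),
        param_grid={"hidden_layer_sizes": [(5,), (20,), (80,)]},
        cv=5,                # 5-fold cross validation
        scoring="accuracy",  # accuracy on the test folds, not the training data
    )
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)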

    Cheers,
    Ingo
  • rajanm Member Posts: 4 Contributor I
    Hi Ingo,

    Thanks for the reply. I will try it out.

    As I mentioned, I wanted to see the trend of errors while training a neural net. I think it is useful to see local minima in the trend so that one knows how many times to train the net.

    I have used cross validation to train and validate the net. However, how do I know when to stop training the net to avoid things like overfitting?
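
    A common answer to the "when to stop" question is early stopping: watch the error on a held-out validation set and stop once it no longer improves. A minimal sketch in Python with scikit-learn (illustrative only; the data set and patience value are invented):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=400, noise=5.0, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    # max_iter=1 plus warm_start=True makes every .fit() call train one more epoch
    net = MLPRegressor(hidden_layer_sizes=(50,), max_iter=1,
                       warm_start=True, random_state=0)
    best, patience, wait = np.inf, 10, 0
    for epoch in range(1000):
        net.fit(X_tr, y_tr)                       # one additional training epoch
        val_err = np.mean((net.predict(X_val) - y_val) ** 2)
        if val_err < best:
            best, wait = val_err, 0               # validation error improved
        else:
            wait += 1
            if wait >= patience:                  # no improvement for 10 epochs
                print(f"stopping at epoch {epoch}, best val MSE {best:.2f}")
                break

    (scikit-learn's MLP learners can also do this internally via early_stopping=True.)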

    Thanks.
  • tek Member Posts: 19 Contributor II
    Hi,

    Actually, that is quite an interesting question, and I wonder whether RM provides this function.

    The goal would be to get a graph that plots training and testing error over, e.g., the number of neurons.
    In Tan, P.-N.; Steinbach, M.; Kumar, V.: Introduction to Data Mining, 2006, they give a pretty good example of this with a decision tree (they used the number of nodes for optimization):
    As the number of nodes in the decision tree increases, the tree will have fewer training and test errors. However, once the tree becomes too large, its test error begins to increase even though its training error rate continues to decrease. This phenomenon is known as model overfitting.
    This picture is taken from wiki. I suppose the red line would be the test error and the blue line the training error:
    [image: training vs. test error over model complexity]
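
    Outside RM, the shape of that figure is easy to reproduce in a few lines of Python/scikit-learn (a sketch on synthetic data, not the book's experiment): grow a decision tree step by step and print training and test error for each size.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # flip_y adds label noise, which an over-grown tree will overfit
    X, y = make_classification(n_samples=600, flip_y=0.2, random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

    print("leaves  train_err  test_err")
    for leaves in (2, 4, 8, 16, 32, 64, 128, 256):
        tree = DecisionTreeClassifier(max_leaf_nodes=leaves,
                                      random_state=1).fit(X_tr, y_tr)
        print(leaves,
              round(1 - tree.score(X_tr, y_tr), 3),   # keeps falling
              round(1 - tree.score(X_te, y_te), 3))   # falls, then rises again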

    Can RM generate such a graph?
  • haddock Member Posts: 849 Maven
    Hi there,

    Yes, this is possible: log the performances and the parameters, then switch the Log to Data, and use the Plot view to get the graph.

    Hope that works for you!
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    Can RM generate such a graph?
    Sure. (The answer to the question "Can RapidMiner do X?" always is "Sure".  ;) )

    I have created a process demonstrating this on myExperiment. You can easily download it from there with the Community Extension for RapidMiner. Search in the forum for "Community Extension" in order to get more information about that.

    The process is named "Plotting Training vs Testing Error (Loop + Log)" and the link on myExperiment is

    http://www.myexperiment.org/workflows/2292.html

    This process increases the parameter C of a support vector machine and hence also the risk of overfitting. It uses an outer loop operator to increase the parameter value and an inner log operator to store the current number of applications together with the current errors on the training and the testing data set. At the end of the process, the log data can be plotted (for example with the "Scatter Multiple" plotter, with "Count" on the x-axis and both "Training Error" and "Testing Error" on the y-axis).
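
    For reference, the same loop-and-log idea expressed as a plain Python/scikit-learn sketch (the authoritative version is the RapidMiner process linked above; the data set and parameter range here are invented):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # flip_y adds label noise, which a large C will happily overfit
    X, y = make_classification(n_samples=400, flip_y=0.1, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    Cs = np.logspace(-2, 4, 20)                       # outer loop over C
    train_err, test_err = [], []
    for C in Cs:
        clf = SVC(C=C, kernel="rbf").fit(X_tr, y_tr)
        train_err.append(1 - clf.score(X_tr, y_tr))   # "Training Error"
        test_err.append(1 - clf.score(X_te, y_te))    # "Testing Error"

    plt.semilogx(Cs, train_err, label="Training Error")
    plt.semilogx(Cs, test_err, label="Testing Error")
    plt.xlabel("C"); plt.ylabel("error"); plt.legend(); plt.show()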

    The result is a picture like this:

    [image: training vs. testing error over parameter C]

    Hope that helps. Cheers,
    Ingo
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Ok, that was pretty fast :-)
  • tek Member Posts: 19 Contributor II
    Hey,

    yes, that did help a lot!

    I modified the process to fit my neural net. Then I ran it in a 25-step loop over the training cycles (10 to 100,000, logarithmic) and it produced this picture:

    [image: training vs. testing error over training cycles]

    My ultimate goal is to produce an optimized set of parameters for a Neural Net using the Grid Optimization Operator and this overfitting process for cross checking.

    Is it safe to say that in this example the overfitting begins to show its effect after the 17th iteration?

    Which neural net parameters are most likely to cause overfitting when modified?

    When I try to run an optimization on "Learning Rate", RM gives me an error message like this: "Cannot reset network to a smaller learning rate". What is this about?

    Thanks for your help!
  • wessel Member Posts: 537 Maven
    tek wrote:

    Is it safe to say that in this example the overfitting begins to show its effect after the 17th iteration?
    No, in this figure the overfitting already starts at 1!

    If you look at this figure your network design is really wrong.
    Do you have lots of input variables?
    Maybe you have too many noisy inputs.
    If this is the case you could try attribute subset selection.

    In a way, deciding which attributes to use can also be viewed as an algorithm parameter.
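
    As a concrete (purely illustrative) example of attribute subset selection in Python/scikit-learn, here is a simple filter method that keeps only the strongest attributes:

    from sklearn.datasets import make_regression
    from sklearn.feature_selection import SelectKBest, f_regression

    # 50 attributes, of which only 5 actually carry signal
    X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)
    X_sel = SelectKBest(f_regression, k=10).fit_transform(X, y)
    print(X_sel.shape)   # (200, 10): only the 10 strongest attributes remain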

    Best regards,

    Wessel
  • tek Member Posts: 19 Contributor II
    wessel wrote:

    Do you have lots of input variables?
    Oh, yes. : )
    Maybe you have too many noisy inputs.
    Nope, no noise whatsoever... but the samples cover a pretty wide range of values (all of which are plausible).

    I guess I have to try harder on attribute reduction. : )
  • tek Member Posts: 19 Contributor II
    Hi there!

    I just created the following picture:

    [image: training vs. testing error over training cycles]

    using this code:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="365" width="750">
          <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" height="60" name="Retrieve" width="90" x="112" y="30">
            <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
          </operator>
          <operator activated="true" class="loop_parameters" compatibility="5.1.006" expanded="true" height="94" name="Loop Parameters" width="90" x="313" y="30">
            <list key="parameters">
              <parameter key="Neural Net.training_cycles" value="[1;1000000;100;logarithmic]"/>
            </list>
            <process expanded="true" height="365" width="882">
              <operator activated="true" class="split_data" compatibility="5.1.006" expanded="true" height="94" name="Split Data" width="90" x="45" y="120">
                <enumeration key="partitions">
                  <parameter key="ratio" value="0.3"/>
                  <parameter key="ratio" value="0.7"/>
                </enumeration>
              </operator>
              <operator activated="true" class="neural_net" compatibility="5.1.006" expanded="true" height="76" name="Neural Net" width="90" x="179" y="30">
                <list key="hidden_layers"/>
                <parameter key="training_cycles" value="1000000"/>
                <parameter key="decay" value="true"/>
                <parameter key="shuffle" value="false"/>
                <parameter key="error_epsilon" value="0.0"/>
              </operator>
              <operator activated="true" class="multiply" compatibility="5.1.006" expanded="true" height="94" name="Multiply" width="90" x="313" y="120"/>
              <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model (2)" width="90" x="447" y="255">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_regression" compatibility="5.1.006" expanded="true" height="76" name="Performance (2)" width="90" x="581" y="255"/>
              <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="447" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_regression" compatibility="5.1.006" expanded="true" height="76" name="Performance" width="90" x="581" y="30"/>
              <operator activated="true" class="log" compatibility="5.1.006" expanded="true" height="94" name="Log" width="90" x="715" y="120">
                <list key="log">
                  <parameter key="Count" value="operator.Neural Net.parameter.training_cycles"/>
                  <parameter key="TrainingError" value="operator.Performance.value.root_mean_squared_error"/>
                  <parameter key="TestError" value="operator.Performance (2).value.root_mean_squared_error"/>
                </list>
              </operator>
              <connect from_port="input 1" to_op="Split Data" to_port="example set"/>
              <connect from_op="Split Data" from_port="partition 1" to_op="Neural Net" to_port="training set"/>
              <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Neural Net" from_port="model" to_op="Multiply" to_port="input"/>
              <connect from_op="Neural Net" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Multiply" from_port="output 1" to_op="Apply Model" to_port="model"/>
              <connect from_op="Multiply" from_port="output 2" to_op="Apply Model (2)" to_port="model"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_op="Log" to_port="through 2"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_op="Log" to_port="through 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
              <portSpacing port="sink_result 3" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Loop Parameters" to_port="input 1"/>
          <connect from_op="Loop Parameters" from_port="result 1" to_port="result 1"/>
          <connect from_op="Loop Parameters" from_port="result 2" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    The parameter loop runs the training cycles from 1 to 1,000,000. The goal is to create overfitting for a neural net and validate it with this picture. The first half of the curve looks okay to me: training and testing error both decrease. But in the second half I am missing the upswing of the testing error. The data set is the "Polynomial" sample set from RM.

    Does anyone have ideas how I can increase the overfitting even more, so that the testing error really goes wrong at the end? (Increasing the number of neurons doesn't work.)
  • wessel Member Posts: 537 Maven
    Hey,

    Add more layers, and even more neurons.
    Increase the network training time.
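
    A quick Python/scikit-learn sketch of why this works (synthetic data, invented sizes): an oversized network with weight decay switched off, trained for a long time on few noisy samples, shows a large train/test gap. Note that the posted process sets decay=true, and weight decay is a regularizer that works against overfitting.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(40, 1))             # very few samples
    y = np.sin(X).ravel() + rng.normal(0, 0.3, 40)   # noisy target
    X_te = rng.uniform(-3, 3, size=(200, 1))
    y_te = np.sin(X_te).ravel()

    net = MLPRegressor(hidden_layer_sizes=(200, 200),  # far more capacity than needed
                       alpha=0.0,                      # no weight decay
                       max_iter=20000,
                       random_state=0).fit(X, y)
    print("train MSE:", np.mean((net.predict(X) - y) ** 2))        # near zero
    print("test  MSE:", np.mean((net.predict(X_te) - y_te) ** 2))  # much larger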

    Best regards,

    Wessel
  • wessel Member Posts: 537 Maven
    Hey,

    Also you should add another loop around the "Loop Parameters" operator.
    And then change your "Log" operator to log the "Loop"-iteration.

    This way you measure the effect of increasing the training time across multiple data splits.
    This will decrease the variability in your results.
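
    In Python terms, the outer loop corresponds to repeating the split, e.g. with ShuffleSplit (a sketch on made-up data):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import ShuffleSplit
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=300, noise=10.0, random_state=0)
    splitter = ShuffleSplit(n_splits=10, train_size=0.3, random_state=0)  # outer loop

    test_errs = []
    for train_idx, test_idx in splitter.split(X):     # 10 different data splits
        net = MLPRegressor(hidden_layer_sizes=(50,), max_iter=2000,
                           random_state=0).fit(X[train_idx], y[train_idx])
        test_errs.append(np.mean((net.predict(X[test_idx]) - y[test_idx]) ** 2))

    print("test MSE: %.1f +/- %.1f" % (np.mean(test_errs), np.std(test_errs)))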

    Best regards,

    Wessel
  • wessel Member Posts: 537 Maven
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.009">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.009" expanded="true" name="Process">
        <process expanded="true" height="409" width="300">
          <operator activated="true" class="retrieve" compatibility="5.1.009" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
          </operator>
          <operator activated="true" class="loop_parameters" compatibility="5.1.009" expanded="true" height="76" name="Loop Parameters" width="90" x="179" y="30">
            <list key="parameters">
              <parameter key="Neural Net.training_cycles" value="[1;1000000;100;logarithmic]"/>
            </list>
            <process expanded="true" height="391" width="165">
              <operator activated="true" class="loop" compatibility="5.1.009" expanded="true" height="94" name="Loop" width="90" x="45" y="30">
                <parameter key="iterations" value="10"/>
                <process expanded="true" height="404" width="830">
                  <operator activated="true" class="split_data" compatibility="5.1.009" expanded="true" height="94" name="Split Data" width="90" x="45" y="165">
                    <enumeration key="partitions">
                      <parameter key="ratio" value="0.3"/>
                      <parameter key="ratio" value="0.7"/>
                    </enumeration>
                  </operator>
                  <operator activated="true" class="neural_net" compatibility="5.1.009" expanded="true" height="76" name="Neural Net" width="90" x="179" y="30">
                    <list key="hidden_layers"/>
                    <parameter key="training_cycles" value="15849"/>
                    <parameter key="decay" value="true"/>
                    <parameter key="shuffle" value="false"/>
                    <parameter key="error_epsilon" value="0.0"/>
                  </operator>
                  <operator activated="true" class="apply_model" compatibility="5.1.009" expanded="true" height="76" name="Apply Model" width="90" x="313" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="apply_model" compatibility="5.1.009" expanded="true" height="76" name="Apply Model (2)" width="90" x="447" y="165">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance_regression" compatibility="5.1.009" expanded="true" height="76" name="Performance (2)" width="90" x="581" y="165"/>
                  <operator activated="true" class="performance_regression" compatibility="5.1.009" expanded="true" height="76" name="Performance" width="90" x="581" y="30"/>
                  <operator activated="true" class="log" compatibility="5.1.009" expanded="true" height="94" name="Log" width="90" x="715" y="30">
                    <list key="log">
                      <parameter key="Count" value="operator.Neural Net.parameter.training_cycles"/>
                      <parameter key="TrainingError" value="operator.Performance.value.root_mean_squared_error"/>
                      <parameter key="TestError" value="operator.Performance (2).value.root_mean_squared_error"/>
                      <parameter key="OuterLoop" value="operator.Loop.value.iteration"/>
                    </list>
                  </operator>
                  <connect from_port="input 1" to_op="Split Data" to_port="example set"/>
                  <connect from_op="Split Data" from_port="partition 1" to_op="Neural Net" to_port="training set"/>
                  <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
                  <connect from_op="Neural Net" from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_op="Neural Net" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Apply Model" from_port="model" to_op="Apply Model (2)" to_port="model"/>
                  <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
                  <connect from_op="Performance (2)" from_port="performance" to_op="Log" to_port="through 2"/>
                  <connect from_op="Performance" from_port="performance" to_op="Log" to_port="through 1"/>
                  <connect from_op="Log" from_port="through 1" to_port="output 1"/>
                  <connect from_op="Log" from_port="through 2" to_port="output 2"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                  <portSpacing port="sink_output 3" spacing="0"/>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Loop" to_port="input 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Loop Parameters" to_port="input 1"/>
          <connect from_op="Loop Parameters" from_port="result 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • wessel Member Posts: 537 Maven
    [image: training vs. testing error plot produced by the process above]
  • wessel Member Posts: 537 Maven
    Here is a plot with 2 hidden layers: 33 nodes in layer 1 and 55 nodes in layer 2.
    I have no idea why the neural network is not overfitting.
    Well, it is overfitting, but only very little.

    [image: training vs. testing error plot for the 33/55 network]
  • tek Member Posts: 19 Contributor II
    wessel wrote:


    Also you should add another loop around the "Loop Parameters" operator.
    And then change your "Log" operator to log the "Loop"-iteration.

    That is actually a pretty good idea. I was doing this by hand for the last couple of hours. : ) Unfortunately, a 1,000,000-training-cycle loop is a pretty big time sink.

    So here are my observations: with this method, overfitting is only recognizable with at least a 50:50 split. A split that strongly favors the training set reduces the overfitting a lot; a split that favors the testing set increases it a lot. The first "real" overfitting curve was reached at a 30:70 split. The overfitting is recognizable with a 50:50 split (that is the code I posted before), but you have to use the error histogram in addition to the scatter plot. Then one can see the increase in the standard deviation of the testing error.
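
    The split-ratio observation can be probed with a small Python/scikit-learn sketch (synthetic data, invented sizes): vary the training fraction and watch the test-minus-train error gap, a rough overfitting signal.

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    X, y = make_regression(n_samples=500, noise=10.0, random_state=0)
    for train_frac in (0.3, 0.5, 0.7):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_frac,
                                                  random_state=0)
        net = MLPRegressor(hidden_layer_sizes=(100,), max_iter=3000,
                           random_state=0).fit(X_tr, y_tr)
        gap = (np.mean((net.predict(X_te) - y_te) ** 2)
               - np.mean((net.predict(X_tr) - y_tr) ** 2))
        print("train fraction %.1f: test-minus-train MSE gap = %.1f"
              % (train_frac, gap))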

    Anyway, the NN function of RM seems very robust... it's quite hard to push it into overfitting without using a 3-day, 100-billion-training-cycle loop. ; ) Does anyone have any idea why that is?
  • wessel Member Posts: 537 Maven
    My guess would be because of the data.
    The polynomial data set is very easy for a neural network to learn.

    Overfitting occurs when the learner is not able to fit the target function and fits random noise instead.

    Best regards,

    Wessel
  • tek Member Posts: 19 Contributor II
    But if the target function is unknown, or let's say of such complexity that it is not (yet) comprehensible to humans, a NN should still be able to make predictions, right?

    I am thinking of a case where the underlying physical principle has not been discovered yet (e.g. experimental data).
  • wessel Member Posts: 537 Maven
    There is a proof (the universal approximation theorem) that a feed-forward network with a single hidden layer and "sufficient" neurons is able to approximate any continuous function arbitrarily well.
    This does not mean it is actually able to learn it, but there is a configuration of weights such that the function is expressed.
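
    The proof is non-constructive, but the flavor is easy to check empirically. A small Python/scikit-learn sketch (target function and sizes invented):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    X = np.linspace(-3, 3, 500).reshape(-1, 1)
    y = np.sin(3 * X).ravel() + 0.3 * np.cos(7 * X).ravel()   # an "arbitrary" target

    net = MLPRegressor(hidden_layer_sizes=(500,), activation="tanh",
                       solver="lbfgs", max_iter=5000,
                       random_state=0).fit(X, y)
    print("fit MSE:", np.mean((net.predict(X) - y) ** 2))     # should come out small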