Options

Classification evaluation of training and testingdata

venven Member Posts: 5 Contributor II
edited November 2018 in Help
Hello,

I train a naive bayes model by using a split validation (first training of naive bayes, then application of the model and evaluation by the performance operator). The result is one performance window with the results of the testing data.

But, as far as I know the software should return two performance windows. One for the results of the training and one for the testingdata. How can I see if my model is overfitting or compare the performance if I see the results of the testingdata without comparison to the training performance?

Thank you

Answers

  • Options
    awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    You can apply the model to the training set immediately after it is created, determine the performance and pass that to the outside using the "through" connections from inside the split validation operator.

    I made a simple example.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="467" width="815">
          <operator activated="true" class="retrieve" compatibility="5.3.000" expanded="true" height="60" name="Retrieve Iris" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Iris"/>
          </operator>
          <operator activated="true" class="split_validation" compatibility="5.3.000" expanded="true" height="130" name="Validation" width="90" x="179" y="30">
            <process expanded="true" height="809" width="442">
              <operator activated="true" class="naive_bayes" compatibility="5.3.000" expanded="true" height="76" name="Naive Bayes" width="90" x="45" y="30"/>
              <operator activated="true" class="apply_model" compatibility="5.3.000" expanded="true" height="76" name="Apply Model (3)" width="90" x="179" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.3.000" expanded="true" height="76" name="Performance (2)" width="90" x="246" y="165"/>
              <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model (3)" to_port="model"/>
              <connect from_op="Naive Bayes" from_port="exampleSet" to_op="Apply Model (3)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Apply Model (3)" from_port="model" to_port="model"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="through 1"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
              <portSpacing port="sink_through 2" spacing="0"/>
            </process>
            <process expanded="true" height="809" width="442">
              <operator activated="true" class="apply_model" compatibility="5.3.000" expanded="true" height="76" name="Apply Model" width="90" x="112" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance" compatibility="5.3.000" expanded="true" height="76" name="Performance" width="90" x="246" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_port="through 1" to_port="averagable 2"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="source_through 2" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
              <portSpacing port="sink_averagable 3" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Iris" from_port="output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
          <connect from_op="Validation" from_port="averagable 2" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>


    regards

    Andrew
Sign In or Register to comment.