
confusion matrix results

warwick Member Posts: 2 Contributor I
edited November 2018 in Help
Hi,

I am new to data mining and RapidMiner.

I have built a k-NN model inside an X-Validation on categorical data. The confusion matrix I get doesn't appear to be standard, and I am having trouble understanding it. Could someone explain what these numbers represent?

PerformanceVector:
accuracy: 72.73%
ConfusionMatrix:
True:   US     CA     FR     ES     IT     GB     NL     AU     DE     PT     other
US:     0.027  0.028  0.062  0.039  0.041  0.038  0.012  0.017  0.019  0.012  0.016
CA:     0.000  0.107  0.001  0.000  0.001  0.001  0.000  0      0.000  0      0.000
FR:     0.001  0.001  0.056  0.002  0.002  0.002  0.000  0.000  0.000  0.001  0.001
ES:     0.000  0.001  0.002  0.087  0.001  0.001  0.001  0.001  0.000  0      0.000
IT:     0.000  0.001  0.003  0.002  0.082  0.002  0.000  0      0.001  0      0.001
GB:     0.000  0.000  0.002  0.001  0.001  0.087  0      0      0.001  0.001  0.001
NL:     0.000  0      0.001  0.001  0.001  0.000  0.117  0      0      0      0.000
AU:     0.000  0      0.001  0.000  0.000  0.000  0      0.115  0.000  0      0.000
DE:     0.000  0      0.001  0.001  0.001  0.001  0      0      0.114  0      0.000
PT:     0.000  0.000  0.000  0      0      0      0      0      0      0.115  0.000
other:  0.002  0.001  0.008  0.004  0.006  0.004  0      0      0.001  0      0.053
absolute_error: 0.273 +/- 0.000

thanks

Warwick

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,408  RM Data Scientist
    Please have a look at this video: http://docs.rapidminer.com/studio/getting-started/5-evaluating-model.html starting at ~12:30

    ~Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • warwick Member Posts: 2 Contributor I
    Hi Martin,

    Thanks for replying. I watched the video, and I understand the concept of the confusion matrix. My question is: why are the values in my confusion matrix so small? In the video, the confusion matrix shows the number of examples that fall into each cell (true positive, false positive, etc.). My values are very small, e.g. 0.027, so they are obviously not example counts. What is it displaying instead?


    Thanks

    Warwick

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 570   Unicorn
    Without being able to see your process, I'm betting that you use example weighting, right?
    See the example below, which uses the Generate Weight (Stratification) operator to produce a confusion matrix similar to yours.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="7.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="7.0.001" expanded="true" height="68" name="Golf" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="7.0.001" expanded="true" height="82" name="Generate ID" width="90" x="179" y="34"/>
          <operator activated="true" class="generate_weight_stratification" compatibility="7.0.001" expanded="true" height="82" name="Generate Weight (Stratification)" width="90" x="246" y="187">
            <description align="center" color="yellow" colored="true" width="126">Cunning use of weights for a confusing confusion matrix.</description>
          </operator>
          <operator activated="true" class="x_validation" compatibility="7.0.001" expanded="true" height="124" name="Validation" width="90" x="313" y="34">
            <parameter key="number_of_validations" value="3"/>
            <parameter key="sampling_type" value="linear sampling"/>
            <process expanded="true">
              <operator activated="true" class="parallel_decision_tree" compatibility="7.0.001" expanded="true" height="82" name="Decision Tree" width="90" x="179" y="34"/>
              <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="7.0.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="30">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" compatibility="7.0.001" expanded="true" height="82" name="Performance (2)" width="90" x="180" y="30">
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="126"/>
            </process>
          </operator>
          <connect from_op="Golf" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Generate Weight (Stratification)" to_port="example set input"/>
          <connect from_op="Generate Weight (Stratification)" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_port="result 1"/>
          <connect from_op="Validation" from_port="training" to_port="result 2"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
    Try putting the weighting inside the training side of the X-Validation, so that the test examples carry no weights.
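    To make the effect concrete, here is a minimal Python sketch of a confusion matrix whose cells sum example weights instead of counting examples. It is an illustration under assumptions, not RapidMiner's actual implementation: the function names and the toy data are hypothetical, and rows are taken to be predictions and columns true labels, as the "True:" header in the posted output suggests. The last part checks the posted matrix: if the cells are summed weights, the diagonal divided by the total should land near the reported accuracy.

```python
from collections import defaultdict


def weighted_confusion(rows):
    """Sum example weights per (predicted, true) cell instead of counting.

    rows: iterable of (true_label, predicted_label, weight) tuples.
    Hypothetical helper for illustration only.
    """
    matrix = defaultdict(float)
    for true_label, predicted, weight in rows:
        matrix[(predicted, true_label)] += weight
    return dict(matrix)


def weighted_accuracy(matrix):
    """Weight on the diagonal divided by the total weight."""
    total = sum(matrix.values())
    correct = sum(w for (pred, true), w in matrix.items() if pred == true)
    return correct / total


# Hypothetical toy data: five examples whose weights sum to 1.0.
examples = [
    ("yes", "yes", 0.25),
    ("yes", "no", 0.25),
    ("no", "no", 0.125),
    ("no", "no", 0.125),
    ("no", "yes", 0.25),
]
matrix = weighted_confusion(examples)
accuracy = weighted_accuracy(matrix)
print(matrix)    # every cell is a fraction, none is an example count
print(accuracy)  # 0.5

# The matrix posted in the question, copied as printed (rounded to 3 places).
posted = [
    [0.027, 0.028, 0.062, 0.039, 0.041, 0.038, 0.012, 0.017, 0.019, 0.012, 0.016],
    [0.000, 0.107, 0.001, 0.000, 0.001, 0.001, 0.000, 0, 0.000, 0, 0.000],
    [0.001, 0.001, 0.056, 0.002, 0.002, 0.002, 0.000, 0.000, 0.000, 0.001, 0.001],
    [0.000, 0.001, 0.002, 0.087, 0.001, 0.001, 0.001, 0.001, 0.000, 0, 0.000],
    [0.000, 0.001, 0.003, 0.002, 0.082, 0.002, 0.000, 0, 0.001, 0, 0.001],
    [0.000, 0.000, 0.002, 0.001, 0.001, 0.087, 0, 0, 0.001, 0.001, 0.001],
    [0.000, 0, 0.001, 0.001, 0.001, 0.000, 0.117, 0, 0, 0, 0.000],
    [0.000, 0, 0.001, 0.000, 0.000, 0.000, 0, 0.115, 0.000, 0, 0.000],
    [0.000, 0, 0.001, 0.001, 0.001, 0.001, 0, 0, 0.114, 0, 0.000],
    [0.000, 0.000, 0.000, 0, 0, 0, 0, 0, 0, 0.115, 0.000],
    [0.002, 0.001, 0.008, 0.004, 0.006, 0.004, 0, 0, 0.001, 0, 0.053],
]
diagonal = sum(posted[i][i] for i in range(len(posted)))
total = sum(sum(row) for row in posted)
posted_ratio = diagonal / total
print(round(posted_ratio, 2))  # close to the reported accuracy of 72.73%
```

    The small gap between this ratio and the reported 72.73% is within what rounding each cell to three decimals can produce, which is consistent with the cells being summed example weights rather than counts.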