
Manual inspection of misclassified examples

Carl_Granström Member Posts: 3 Newbie
Hello,


I'm trying to find out how, after training a classification model, I can look at the examples that were incorrectly classified. For now I can only see how many examples were misclassified in the confusion matrix, but I want to inspect the misclassified examples manually. Since the evaluation vector does not seem to be able to store such information, I guess I need to add another operator to achieve this, if it's even possible (which, in my opinion, feels like a very basic feature, so I'm hoping it's there somewhere).


Kind regards,

Carl

Best Answer

    sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Solution Accepted
    Hi @Carl_Granström, hmm, well, that does sound very basic. Funny thing is that I moderate this forum and have been on it for years, and I cannot recall anyone asking! :smile:

    Anyway, it's pretty easy. I would just put a Filter Examples operator at the end, like this:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.000-BETA4">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.5.000-BETA4" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.5.000-BETA4" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.5.000-BETA4" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
            <parameter key="criterion" value="gain_ratio"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="true"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="true"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="9.5.000-BETA4" expanded="true" height="82" name="Apply Model" width="90" x="380" y="34">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.5.000-BETA4" expanded="true" height="103" name="Filter Examples" width="90" x="514" y="34">
            <parameter key="parameter_expression" value="Survived!=[prediction(Survived)]"/>
            <parameter key="condition_class" value="expression"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list"/>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
            <description align="center" color="yellow" colored="true" width="126">here's where I only find incorrect predictions</description>
          </operator>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Decision Tree" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
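    For readers outside RapidMiner, the Filter Examples condition Survived!=[prediction(Survived)] simply keeps the rows whose true label differs from the "prediction(Survived)" attribute that Apply Model adds. Below is a minimal Python sketch of that same filter, assuming the labelled data is a list of dicts; the misclassified helper and the sample rows are hypothetical and not part of RapidMiner.

```python
def misclassified(rows, label):
    """Return only the rows whose true label differs from the prediction.

    Mirrors the Filter Examples expression  label != [prediction(label)].
    Hypothetical helper for illustration, not a RapidMiner API.
    """
    pred_col = f"prediction({label})"  # name Apply Model gives the prediction attribute
    return [row for row in rows if row[label] != row[pred_col]]


# Hypothetical rows standing in for the "labelled data" output of Apply Model.
labelled = [
    {"Name": "A", "Survived": "Yes", "prediction(Survived)": "Yes"},
    {"Name": "B", "Survived": "No", "prediction(Survived)": "Yes"},
    {"Name": "C", "Survived": "No", "prediction(Survived)": "No"},
    {"Name": "D", "Survived": "Yes", "prediction(Survived)": "No"},
]

for row in misclassified(labelled, "Survived"):
    print(row["Name"], row["Survived"], "->", row["prediction(Survived)"])
```

    Note that in the process above the model is applied to the same example set it was trained on (the Decision Tree's exampleSet output feeds Apply Model), so the filtered rows are training errors rather than errors on unseen data.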
    



    Scott

Answers

    Carl_Granström Member Posts: 3 Newbie
    So I have a further question: can this be done inside the Validation operator somehow?
    varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited November 2019
    Hello @Carl_Granström

    You need to connect the "exa" (example set) port of the Performance operator inside the validation's testing subprocess to the "tes" port of that subprocess. Then connect the "tes" output of the Cross Validation operator to a process result port, or to a Filter Examples operator as Scott did in the earlier example.
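    Conceptually, the "tes" output collects the held-out rows from every fold, each scored by the model that did not see it during training, so filtering it yields out-of-sample errors. A minimal Python sketch of that idea, assuming a toy majority-class learner, an unshuffled interleaved fold split and hypothetical Golf-like rows; train_majority and cross_validate are illustrative names, not RapidMiner APIs:

```python
from collections import Counter


def train_majority(rows, label):
    """Toy stand-in learner: always predicts the most frequent label in its training rows."""
    majority = Counter(row[label] for row in rows).most_common(1)[0][0]
    return lambda row: majority


def cross_validate(rows, label, k, learner):
    """Return every row with the prediction made by the fold model that held it out."""
    pred_col = f"prediction({label})"
    folds = [rows[i::k] for i in range(k)]  # interleaved split, no shuffling
    test_results = []
    for i, test_fold in enumerate(folds):
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = learner(training, label)
        # Score the held-out fold and append it to the collected test results
        test_results.extend({**row, pred_col: model(row)} for row in test_fold)
    return test_results


# Hypothetical rows; "id" plays the role of an ID attribute such as Generate ID creates.
rows = [{"id": i + 1, "Play": play} for i, play in enumerate(["yes", "yes", "no", "yes", "yes", "no"])]

results = cross_validate(rows, "Play", k=2, learner=train_majority)
errors = [row for row in results if row["Play"] != row["prediction(Play)"]]
print(len(results), sorted(row["id"] for row in errors))
```

    Because every row lands in exactly one test fold, the collected results cover the whole data set once, which is what makes them a reasonable place to look for misclassified examples.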
    Regards,
    Varun
    https://www.varunmandalapu.com/


    Carl_Granström Member Posts: 3 Newbie
    Ah, thank you varunm1. Unfortunately, I don't want to use the Cross Validation operator, and neither the Validation nor the Split Validation operator has an outgoing tes port.
    lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi Carl,

    Indeed, the Split Validation operator has no tes output port.
    But you can extract the test set using the Remember/Recall operator pair.

    Take a look at this process:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.5.000" expanded="true" height="68" name="Retrieve" origin="GENERATED_TUTORIAL" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.5.000" expanded="true" height="82" name="Generate ID" origin="GENERATED_TUTORIAL" width="90" x="246" y="30">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="split_validation" compatibility="9.5.000" expanded="true" height="124" name="Validation" origin="GENERATED_TUTORIAL" width="90" x="447" y="30">
            <parameter key="create_complete_model" value="false"/>
            <parameter key="split" value="absolute"/>
            <parameter key="split_ratio" value="0.7"/>
            <parameter key="training_set_size" value="10"/>
            <parameter key="test_set_size" value="-1"/>
            <parameter key="sampling_type" value="linear sampling"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.4.000" expanded="true" height="103" name="Decision Tree" origin="GENERATED_TUTORIAL" width="90" x="112" y="30">
                <parameter key="criterion" value="gain_ratio"/>
                <parameter key="maximal_depth" value="10"/>
                <parameter key="apply_pruning" value="true"/>
                <parameter key="confidence" value="0.1"/>
                <parameter key="apply_prepruning" value="true"/>
                <parameter key="minimal_gain" value="0.01"/>
                <parameter key="minimal_leaf_size" value="2"/>
                <parameter key="minimal_size_for_split" value="4"/>
                <parameter key="number_of_prepruning_alternatives" value="3"/>
              </operator>
              <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" origin="GENERATED_TUTORIAL" width="90" x="45" y="30">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance" compatibility="9.5.000" expanded="true" height="82" name="Performance" origin="GENERATED_TUTORIAL" width="90" x="179" y="30">
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <operator activated="true" class="remember" compatibility="9.5.000" expanded="true" height="68" name="Remember" width="90" x="380" y="85">
                <parameter key="name" value="test_set"/>
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="store_which" value="1"/>
                <parameter key="remove_from_process" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <connect from_op="Performance" from_port="example set" to_op="Remember" to_port="store"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="recall" compatibility="9.5.000" expanded="true" height="68" name="Recall" width="90" x="581" y="136">
            <parameter key="name" value="test_set"/>
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="remove_from_store" value="true"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_port="result 1"/>
          <connect from_op="Validation" from_port="training" to_port="result 2"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 3"/>
          <connect from_op="Recall" from_port="result" to_port="result 4"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
        </process>
      </operator>
    </process>
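    The trick works because Remember, placed in the testing subprocess, puts the labelled test set into a store under the name "test_set", and Recall fetches it by that name after Split Validation has finished, bypassing the missing output port. A minimal Python sketch of that side channel, assuming a dict-backed store, a toy majority-class learner and hypothetical rows; it mirrors the split="absolute" and "linear sampling" settings above with a smaller training size. ObjectStore, split_validation and train_majority are illustrative names, not RapidMiner APIs:

```python
from collections import Counter


class ObjectStore:
    """Illustrative stand-in for the store shared by Remember and Recall."""

    def __init__(self):
        self._objects = {}

    def remember(self, name, obj):
        self._objects[name] = obj

    def recall(self, name, remove_from_store=True):
        # remove_from_store=True drops the object once it has been fetched
        if remove_from_store:
            return self._objects.pop(name)
        return self._objects[name]

    def __contains__(self, name):
        return name in self._objects


def train_majority(rows, label):
    """Toy stand-in learner: always predicts the most frequent training label."""
    majority = Counter(row[label] for row in rows).most_common(1)[0][0]
    return lambda row: majority


def split_validation(rows, label, training_size, learner, store):
    """Absolute, unshuffled split: the first training_size rows train, the rest test.

    Only the model and the accuracy are returned; the labelled test set leaves
    through the store, like the Remember operator in the testing subprocess.
    """
    pred_col = f"prediction({label})"
    training, test = rows[:training_size], rows[training_size:]
    model = learner(training, label)
    labelled = [{**row, pred_col: model(row)} for row in test]
    accuracy = sum(row[label] == row[pred_col] for row in labelled) / len(labelled)
    store.remember("test_set", labelled)
    return model, accuracy


store = ObjectStore()
rows = [{"id": i + 1, "Play": play} for i, play in enumerate(["yes", "yes", "no", "yes", "no", "yes"])]
model, accuracy = split_validation(rows, "Play", training_size=4, learner=train_majority, store=store)

test_set = store.recall("test_set")  # the "Recall" step after validation
errors = [row["id"] for row in test_set if row["Play"] != row["prediction(Play)"]]
print(accuracy, errors, "test_set" in store)
```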
    
    Hope this helps,

    Regards,

    Lionel

    sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    That's pretty clever, @lionelderkrikor. I will say, from a UI/UX standpoint, that this is rather icky. As @Carl_Granström said, it should be easier. But well done on the Remember/Recall. :smile: