RAPIDMINER 9.7 BETA ANNOUNCEMENT

The beta program for the RapidMiner 9.7 release is now available. Lots of amazing new improvements including true version control!

CLICK HERE TO DOWNLOAD

Manual inspection of missclassified examples

Carl_GranströmCarl_Granström Member Posts: 3 Newbie
Hello,


I'm trying to find out how, after training a classification model, I can look at the examples that were incorrectly classified. For now I can only see how many examples were incorrectly classified in the confusion matrix, but I want to inspect the missclassified examples manually. Since evaluation vector does not seem to be able to store such information I guess I need to somehow add another operator to achieve this, if it's even possible (which, in my own opinion, feels like a very basic feature, so I'm hoping it's there somewhere).


Kind regards,

Carl

Best Answer

  • sgenzersgenzer 12Posts: 2,923  Community Manager
    Solution Accepted
    hi @Carl_Granström hmm well that does sound very basic. Funny thing is that I moderate this forum and have been on it for years - I cannot recall anyone asking! :smile:

    Anyway it's pretty easy. I would just put a Filter Examples on the end like this:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.000-BETA4">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.5.000-BETA4" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.5.000-BETA4" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.5.000-BETA4" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
            <parameter key="criterion" value="gain_ratio"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="true"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="true"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="9.5.000-BETA4" expanded="true" height="82" name="Apply Model" width="90" x="380" y="34">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.5.000-BETA4" expanded="true" height="103" name="Filter Examples" width="90" x="514" y="34">
            <parameter key="parameter_expression" value="Survived!=[prediction(Survived)]"/>
            <parameter key="condition_class" value="expression"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list"/>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
            <description align="center" color="yellow" colored="true" width="126">here's where I only find incorrect predictions</description>
          </operator>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Decision Tree" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    



    Scott

Answers

  • Carl_GranströmCarl_Granström Member Posts: 3 Newbie
    So I have a further question: can this be done inside the Validation operator somehow?
  • varunm1varunm1 Moderator, Member Posts: 1,185   Unicorn
    edited November 2019
    Hello @Carl_Granström

    You need to connect the "Exa" port of the "Performance" Operator inside the validation to the "tes" port. Then you connect the "Tes" output of cross-validation operator to the process output or filter examples as Scott did in earlier example.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

    Tghadially
  • Carl_GranströmCarl_Granström Member Posts: 3 Newbie
    Ah, thank you varunm1. Unfortunately I don't want to use the Cross-validation operator, and neither the Validation or Split Validation operators have an outgoing tes port.
  • lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,056   Unicorn
    Hi Carl,

    In deed, Split Validation operator has no tes output port.
    But you can extract the test set using the association Remember/Recall operators.

    Take a look at this process : 

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.5.000" expanded="true" height="68" name="Retrieve" origin="GENERATED_TUTORIAL" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.5.000" expanded="true" height="82" name="Generate ID" origin="GENERATED_TUTORIAL" width="90" x="246" y="30">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="split_validation" compatibility="9.5.000" expanded="true" height="124" name="Validation" origin="GENERATED_TUTORIAL" width="90" x="447" y="30">
            <parameter key="create_complete_model" value="false"/>
            <parameter key="split" value="absolute"/>
            <parameter key="split_ratio" value="0.7"/>
            <parameter key="training_set_size" value="10"/>
            <parameter key="test_set_size" value="-1"/>
            <parameter key="sampling_type" value="linear sampling"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.4.000" expanded="true" height="103" name="Decision Tree" origin="GENERATED_TUTORIAL" width="90" x="112" y="30">
                <parameter key="criterion" value="gain_ratio"/>
                <parameter key="maximal_depth" value="10"/>
                <parameter key="apply_pruning" value="true"/>
                <parameter key="confidence" value="0.1"/>
                <parameter key="apply_prepruning" value="true"/>
                <parameter key="minimal_gain" value="0.01"/>
                <parameter key="minimal_leaf_size" value="2"/>
                <parameter key="minimal_size_for_split" value="4"/>
                <parameter key="number_of_prepruning_alternatives" value="3"/>
              </operator>
              <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" origin="GENERATED_TUTORIAL" width="90" x="45" y="30">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance" compatibility="9.5.000" expanded="true" height="82" name="Performance" origin="GENERATED_TUTORIAL" width="90" x="179" y="30">
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <operator activated="true" class="remember" compatibility="9.5.000" expanded="true" height="68" name="Remember" width="90" x="380" y="85">
                <parameter key="name" value="test_set"/>
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="store_which" value="1"/>
                <parameter key="remove_from_process" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <connect from_op="Performance" from_port="example set" to_op="Remember" to_port="store"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="recall" compatibility="9.5.000" expanded="true" height="68" name="Recall" width="90" x="581" y="136">
            <parameter key="name" value="test_set"/>
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="remove_from_store" value="true"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_port="result 1"/>
          <connect from_op="Validation" from_port="training" to_port="result 2"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 3"/>
          <connect from_op="Recall" from_port="result" to_port="result 4"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
        </process>
      </operator>
    </process>
    
    Hope this helps,

    Regards,

    Lionel

    varunm1sgenzerCarl_Granström
  • sgenzersgenzer 12Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,923  Community Manager
    that's pretty clever, @lionelderkrikor. I will say from a UI/UX standpoint that this is rather icky. As @Carl_Granström said, it should be easier. But well done on the remember/recall. :smile:
    lionelderkrikorCarl_Granström
Sign In or Register to comment.