
Manual inspection of misclassified examples

Carl_Granström Member Posts: 3 Learner I
Hello,


I'm trying to find out how, after training a classification model, I can look at the examples that were incorrectly classified. For now I can only see how many examples were incorrectly classified in the confusion matrix, but I want to inspect the misclassified examples manually. Since the evaluation vector does not seem to be able to store such information, I guess I need to add another operator to achieve this, if it's even possible (to me this feels like a very basic feature, so I'm hoping it's there somewhere).


Kind regards,

Carl

Best Answer

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Solution Accepted
    hi @Carl_Granström hmm well that does sound very basic. Funny thing is that I moderate this forum and have been on it for years - I cannot recall anyone asking! :smile:

    Anyway it's pretty easy. I would just put a Filter Examples on the end like this:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.000-BETA4">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.5.000-BETA4" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="-1"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.5.000-BETA4" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
          </operator>
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.5.000-BETA4" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
            <parameter key="criterion" value="gain_ratio"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="true"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="true"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="9.5.000-BETA4" expanded="true" height="82" name="Apply Model" width="90" x="380" y="34">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.5.000-BETA4" expanded="true" height="103" name="Filter Examples" width="90" x="514" y="34">
            <parameter key="parameter_expression" value="Survived!=[prediction(Survived)]"/>
            <parameter key="condition_class" value="expression"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list"/>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
            <description align="center" color="yellow" colored="true" width="126">here's where I only find incorrect predictions</description>
          </operator>
          <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Decision Tree" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    



    Scott
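
For readers who want to see the logic outside RapidMiner: the filter expression `Survived!=[prediction(Survived)]` boils down to keeping the rows whose label differs from the predicted value. Below is a minimal Python sketch of that idea, assuming the scored examples are available as a list of dictionaries. The sample rows and the `misclassified` helper are made up for illustration and are not part of RapidMiner.

```python
def misclassified(examples, label="Survived", prediction="prediction(Survived)"):
    """Return only the examples whose predicted value differs from the label."""
    return [row for row in examples if row[label] != row[prediction]]


# Hypothetical scored rows, shaped like the output of Apply Model.
scored = [
    {"id": 1, "Survived": "Yes", "prediction(Survived)": "Yes"},
    {"id": 2, "Survived": "No", "prediction(Survived)": "Yes"},
    {"id": 3, "Survived": "No", "prediction(Survived)": "No"},
    {"id": 4, "Survived": "Yes", "prediction(Survived)": "No"},
]

wrong = misclassified(scored)
for row in wrong:
    print(row["id"], row["Survived"], row["prediction(Survived)"])
```

Note that the process above scores the same data the tree was trained on, so the rows it returns are training errors, not errors on unseen data.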

Answers

  • Carl_Granström Member Posts: 3 Learner I
    So I have a further question: can this be done inside the Validation operator somehow?
  • varunm1 Member Posts: 1,207 Unicorn
    edited November 2019
    Hello @Carl_Granström

    You need to connect the "exa" port of the Performance operator inside the validation to the "tes" port. Then connect the "tes" output of the Cross Validation operator to the process output, or to a Filter Examples operator as Scott did in the earlier example.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • Carl_Granström Member Posts: 3 Learner I
    Ah, thank you varunm1. Unfortunately I don't want to use the Cross-validation operator, and neither the Validation nor the Split Validation operator has an outgoing tes port.
  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi Carl,

    Indeed, the Split Validation operator has no tes output port.
    But you can extract the test set by pairing the Remember and Recall operators.

    Take a look at this process:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.5.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.5.000" expanded="true" height="68" name="Retrieve" origin="GENERATED_TUTORIAL" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.5.000" expanded="true" height="82" name="Generate ID" origin="GENERATED_TUTORIAL" width="90" x="246" y="30">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="split_validation" compatibility="9.5.000" expanded="true" height="124" name="Validation" origin="GENERATED_TUTORIAL" width="90" x="447" y="30">
            <parameter key="create_complete_model" value="false"/>
            <parameter key="split" value="absolute"/>
            <parameter key="split_ratio" value="0.7"/>
            <parameter key="training_set_size" value="10"/>
            <parameter key="test_set_size" value="-1"/>
            <parameter key="sampling_type" value="linear sampling"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.4.000" expanded="true" height="103" name="Decision Tree" origin="GENERATED_TUTORIAL" width="90" x="112" y="30">
                <parameter key="criterion" value="gain_ratio"/>
                <parameter key="maximal_depth" value="10"/>
                <parameter key="apply_pruning" value="true"/>
                <parameter key="confidence" value="0.1"/>
                <parameter key="apply_prepruning" value="true"/>
                <parameter key="minimal_gain" value="0.01"/>
                <parameter key="minimal_leaf_size" value="2"/>
                <parameter key="minimal_size_for_split" value="4"/>
                <parameter key="number_of_prepruning_alternatives" value="3"/>
              </operator>
              <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Decision Tree" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" origin="GENERATED_TUTORIAL" width="90" x="45" y="30">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance" compatibility="9.5.000" expanded="true" height="82" name="Performance" origin="GENERATED_TUTORIAL" width="90" x="179" y="30">
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <operator activated="true" class="remember" compatibility="9.5.000" expanded="true" height="68" name="Remember" width="90" x="380" y="85">
                <parameter key="name" value="test_set"/>
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="store_which" value="1"/>
                <parameter key="remove_from_process" value="true"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <connect from_op="Performance" from_port="example set" to_op="Remember" to_port="store"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="recall" compatibility="9.5.000" expanded="true" height="68" name="Recall" width="90" x="581" y="136">
            <parameter key="name" value="test_set"/>
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="remove_from_store" value="true"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="model" to_port="result 1"/>
          <connect from_op="Validation" from_port="training" to_port="result 2"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="result 3"/>
          <connect from_op="Recall" from_port="result" to_port="result 4"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
          <portSpacing port="sink_result 5" spacing="0"/>
        </process>
      </operator>
    </process>
    
    Hope this helps,

    Regards,

    Lionel

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    that's pretty clever, @lionelderkrikor. I will say from a UI/UX standpoint that this is rather icky. As @Carl_Granström said, it should be easier. But well done on the remember/recall. :smile: