Manual inspection of missclassified examples

Carl_Granström · November 2019

Hello,

I'm trying to find out how, after training a classification model, I can look at the examples that were incorrectly classified. For now I can only see how many examples were incorrectly classified in the confusion matrix, but I want to inspect the missclassified examples manually. Since evaluation vector does not seem to be able to store such information I guess I need to somehow add another operator to achieve this, if it's even possible (which, in my own opinion, feels like a very basic feature, so I'm hoping it's there somewhere).

Kind regards,

Carl

sgenzer · November 2019

hi @Carl_Granström hmm well that does sound very basic. Funny thing is that I moderate this forum and have been on it for years - I cannot recall anyone asking!

Anyway it's pretty easy. I would just put a Filter Examples on the end like this:

Spoiler

<?xml version="1.0" encoding="UTF-8"?><process version="9.5.000-BETA4">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.5.000-BETA4" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="-1"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.5.000-BETA4" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
      </operator>
      <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.5.000-BETA4" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
        <parameter key="criterion" value="gain_ratio"/>
        <parameter key="maximal_depth" value="10"/>
        <parameter key="apply_pruning" value="true"/>
        <parameter key="confidence" value="0.1"/>
        <parameter key="apply_prepruning" value="true"/>
        <parameter key="minimal_gain" value="0.01"/>
        <parameter key="minimal_leaf_size" value="2"/>
        <parameter key="minimal_size_for_split" value="4"/>
        <parameter key="number_of_prepruning_alternatives" value="3"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.5.000-BETA4" expanded="true" height="82" name="Apply Model" width="90" x="380" y="34">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.5.000-BETA4" expanded="true" height="103" name="Filter Examples" width="90" x="514" y="34">
        <parameter key="parameter_expression" value="Survived!=[prediction(Survived)]"/>
        <parameter key="condition_class" value="expression"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
        <description align="center" color="yellow" colored="true" width="126">here's where I only find incorrect predictions</description>
      </operator>
      <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Decision Tree" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

Scott

Carl_Granström · November 2019

So I have a further question: can this be done inside the Validation operator somehow?

varunm1 · November 2019

Hello @Carl_Granström

You need to connect the "Exa" port of the "Performance" Operator inside the validation to the "tes" port. Then you connect the "Tes" output of cross-validation operator to the process output or filter examples as Scott did in earlier example.

Carl_Granström · November 2019

Ah, thank you varunm1. Unfortunately I don't want to use the Cross-validation operator, and neither the Validation or Split Validation operators have an outgoing tes port.

lionelderkrikor · November 2019

Hi Carl,

In deed, Split Validation operator has no tes output port.
But you can extract the test set using the association Remember/Recall operators.

Take a look at this process :

<?xml version="1.0" encoding="UTF-8"?><process version="9.5.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.5.000" expanded="true" height="68" name="Retrieve" origin="GENERATED_TUTORIAL" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="generate_id" compatibility="9.5.000" expanded="true" height="82" name="Generate ID" origin="GENERATED_TUTORIAL" width="90" x="246" y="30">
        <parameter key="create_nominal_ids" value="false"/>
        <parameter key="offset" value="0"/>
      </operator>
      <operator activated="true" class="split_validation" compatibility="9.5.000" expanded="true" height="124" name="Validation" origin="GENERATED_TUTORIAL" width="90" x="447" y="30">
        <parameter key="create_complete_model" value="false"/>
        <parameter key="split" value="absolute"/>
        <parameter key="split_ratio" value="0.7"/>
        <parameter key="training_set_size" value="10"/>
        <parameter key="test_set_size" value="-1"/>
        <parameter key="sampling_type" value="linear sampling"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.4.000" expanded="true" height="103" name="Decision Tree" origin="GENERATED_TUTORIAL" width="90" x="112" y="30">
            <parameter key="criterion" value="gain_ratio"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="true"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="true"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
          </operator>
          <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.1.001" expanded="true" height="82" name="Apply Model" origin="GENERATED_TUTORIAL" width="90" x="45" y="30">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="performance" compatibility="9.5.000" expanded="true" height="82" name="Performance" origin="GENERATED_TUTORIAL" width="90" x="179" y="30">
            <parameter key="use_example_weights" value="true"/>
          </operator>
          <operator activated="true" class="remember" compatibility="9.5.000" expanded="true" height="68" name="Remember" width="90" x="380" y="85">
            <parameter key="name" value="test_set"/>
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="store_which" value="1"/>
            <parameter key="remove_from_process" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <connect from_op="Performance" from_port="example set" to_op="Remember" to_port="store"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="recall" compatibility="9.5.000" expanded="true" height="68" name="Recall" width="90" x="581" y="136">
        <parameter key="name" value="test_set"/>
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="remove_from_store" value="true"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Generate ID" to_port="example set input"/>
      <connect from_op="Generate ID" from_port="example set output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="model" to_port="result 1"/>
      <connect from_op="Validation" from_port="training" to_port="result 2"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 3"/>
      <connect from_op="Recall" from_port="result" to_port="result 4"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
    </process>
  </operator>
</process>

Hope this helps,

Regards,

Lionel

sgenzer · November 2019

that's pretty clever, @lionelderkrikor. I will say from a UI/UX standpoint that this is rather icky. As @Carl_Granström said, it should be easier. But well done on the remember/recall.

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Manual inspection of missclassified examples

Best Answer

Answers

Be Safe. Follow precautions and Maintain Social Distancing