How to filter 'wrong predictions' from the output of the Multi Label Modeling operator?

LeMarc · April 2020

Hi,

I usually use the Filter Examples operator to select 'wrong predictions'.

When the Multi Label Modeling operator is used, one can't select label attributes. However, label and prediction attributes are required for the Filter Examples operator to select the 'wrong predictions'.

Therefore, after applying the model and measuring performance, I changed the roles of the chosen attributes to 'label' and 'prediction' in order to filter the 'wrong predictions'. However, this doesn't work either.

Image: https://us.v-cdn.net/6030995/uploads/editor/8k/sope12c3m2nw.png

Does anyone have an idea how to filter wrong predictions when using the Multi Label Modeling operator?

Thank you!

jacobcybulski · April 2020

@LeMarc, I have replicated this with the tutorial process built into the Help (the Titanic example). I set the role of prediction(Survived) to "prediction" and the role of Survived to "label". I had no problems checking the binominal performance on the output and no issues filtering examples. So I suspect that either you misspelled the attribute names in your filter (if you entered them directly), or the input to the filter has no examples.
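For illustration, here is a minimal sketch of that role setup, assuming the Titanic data has already gone through Apply Model and contains an attribute named prediction(Survived). It is only a fragment for the inner process of a RapidMiner process; it reuses the operator classes and parameter keys that appear in the process posted later in this thread, the operator name "Set Label and Prediction" is hypothetical, and the incoming connection from Apply Model is left out.

```xml
<!-- Hypothetical fragment: place inside a process element, downstream of Apply Model -->
<!-- Give Survived the 'label' role and prediction(Survived) the 'prediction' role -->
<operator activated="true" class="set_role" compatibility="9.6.000" expanded="true" height="82" name="Set Label and Prediction" width="90" x="581" y="34">
  <parameter key="attribute_name" value="Survived"/>
  <parameter key="target_role" value="label"/>
  <list key="set_additional_roles">
    <parameter key="prediction(Survived)" value="prediction"/>
  </list>
</operator>
<!-- Keep only the examples whose prediction differs from the label -->
<operator activated="true" class="filter_examples" compatibility="9.6.000" expanded="true" height="103" name="Filter Examples" width="90" x="715" y="34">
  <parameter key="condition_class" value="wrong_predictions"/>
  <parameter key="invert_filter" value="false"/>
</operator>
<connect from_op="Set Label and Prediction" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
```

If the attribute names in set_additional_roles do not match the ExampleSet exactly, the wrong_predictions condition has no label/prediction pair to compare, which is one way to end up with the behavior LeMarc describes.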

Jacob

tftemme · April 2020

Hi @LeMarc

You could also put the Filter Examples operator (with the 'wrong_predictions' condition) inside the Multi Label Performance operator. Multi Label Performance automatically performs the correct Set Role operations for every label and prediction attribute. When you connect the output of Filter Examples to one of the 'output' ports of the inner subprocess, you get a collection of ExampleSets containing the wrong predictions for each label attribute.

Here is the tutorial process of the Multi Label Performance operator, adapted accordingly:

<process version="9.6.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.4.000" expanded="true" name="Process" origin="GENERATED_TUTORIAL">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="subprocess" compatibility="9.6.000" expanded="true" height="103" name="Subprocess" origin="GENERATED_TUTORIAL" width="90" x="179" y="34">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve Titanic" origin="GENERATED_TUTORIAL" width="90" x="45" y="187">
            <parameter key="repository_entry" value="//Samples/data/Titanic"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.6.000" expanded="true" height="82" name="Set Role" origin="GENERATED_TUTORIAL" width="90" x="179" y="187">
            <parameter key="attribute_name" value="Survived"/>
            <parameter key="target_role" value="Survived"/>
            <list key="set_additional_roles">
              <parameter key="Port of Embarkation" value="Port"/>
              <parameter key="Age" value="Age"/>
            </list>
          </operator>
          <operator activated="true" class="split_data" compatibility="9.6.000" expanded="true" height="103" name="Split Data" origin="GENERATED_TUTORIAL" width="90" x="313" y="187">
            <enumeration key="partitions">
              <parameter key="ratio" value="0.7"/>
              <parameter key="ratio" value="0.3"/>
            </enumeration>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <description align="center" color="purple" colored="true" width="126">Split data set into training (0.7 ratio) and test (0.3 ratio) set.</description>
          </operator>
          <operator activated="true" class="time_series:multi_label_model_learner" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Multi Label Modeling" origin="GENERATED_TUTORIAL" width="90" x="581" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="Port of Embarkation|Survived|Age"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="add_macros" value="false"/>
            <parameter key="current_label_name_macro" value="current_label_attribute"/>
            <parameter key="current_label_type_macro" value="current_label_type"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="k_nn" compatibility="9.6.000" expanded="true" height="82" name="k-NN" origin="GENERATED_TUTORIAL" width="90" x="380" y="34">
                <parameter key="k" value="5"/>
                <parameter key="weighted_vote" value="true"/>
                <parameter key="measure_types" value="MixedMeasures"/>
                <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
                <parameter key="nominal_measure" value="NominalDistance"/>
                <parameter key="numerical_measure" value="EuclideanDistance"/>
                <parameter key="divergence" value="GeneralizedIDivergence"/>
                <parameter key="kernel_type" value="radial"/>
                <parameter key="kernel_gamma" value="1.0"/>
                <parameter key="kernel_sigma1" value="1.0"/>
                <parameter key="kernel_sigma2" value="0.0"/>
                <parameter key="kernel_sigma3" value="2.0"/>
                <parameter key="kernel_degree" value="3.0"/>
                <parameter key="kernel_shift" value="1.0"/>
                <parameter key="kernel_a" value="1.0"/>
                <parameter key="kernel_b" value="0.0"/>
                <description align="center" color="green" colored="true" width="126">Train a k-NN for each of the selected Label attributes</description>
              </operator>
              <connect from_port="training set" to_op="k-NN" to_port="training set"/>
              <connect from_op="k-NN" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <description align="center" color="blue" colored="true" height="121" resized="false" width="180" x="32" y="78">In each iteration the corresponding selected label attribute is set to the Label role and provided to the 'training set' port of the inner subprocess</description>
            </process>
            <description align="center" color="green" colored="true" width="126">Train the Multi Label Model for the 3 selected Attributes:&lt;br&gt;&lt;br/&gt;&lt;br&gt;&lt;br&gt;Survived, Port of Embarkation, Age&lt;br&gt;&lt;br&gt;In each iteration the corresponding attribute is set to the Label role and the inner subprocess is executed.&lt;br&gt;&lt;br&gt;The trained prediction models are retrieved and collected to the multi label model wrapper model.&lt;br&gt;</description>
          </operator>
          <connect from_op="Retrieve Titanic" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Split Data" to_port="example set"/>
          <connect from_op="Split Data" from_port="partition 1" to_op="Multi Label Modeling" to_port="training set"/>
          <connect from_op="Split Data" from_port="partition 2" to_port="out 2"/>
          <connect from_op="Multi Label Modeling" from_port="model" to_port="out 1"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
          <portSpacing port="sink_out 3" spacing="0"/>
          <description align="center" color="blue" colored="true" height="159" resized="true" width="238" x="28" y="284">Retrieve the Titanic sample data and set the roles of the following attributes to special ones:&lt;br&gt;&lt;br&gt;Survived: 'Survived'&lt;br&gt;Port of Embarkation: 'Port'&lt;br&gt;Age: 'Age'&lt;br&gt;</description>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.6.000" expanded="true" height="82" name="Apply Model" origin="GENERATED_TUTORIAL" width="90" x="447" y="34">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
        <description align="center" color="green" colored="true" width="126">The Multi Label Model is applied on the test set.&lt;br&gt;&lt;br&gt;For all 3 selected label attributes a Prediction attribute is created.&lt;br&gt;&lt;br&gt;Also for the nominal Attributes Survived and Port of Embarkation the corresponding Confidence Attributes are created</description>
      </operator>
      <operator activated="true" class="time_series:multi_label_performance_evaluator" compatibility="9.7.000-SNAPSHOT" expanded="true" height="145" name="Multi Label Performance" origin="GENERATED_TUTORIAL" width="90" x="715" y="34">
        <parameter key="auto_detect_label_and_prediction_attributes" value="true"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
        <parameter key="add_macros" value="false"/>
        <parameter key="current_label_name_macro" value="current_label_attribute"/>
        <parameter key="current_label_type_macro" value="current_label_type"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="true" class="performance" compatibility="9.6.000" expanded="true" height="82" name="Performance" origin="GENERATED_TUTORIAL" width="90" x="447" y="34">
            <parameter key="use_example_weights" value="true"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.6.000" expanded="true" height="103" name="Filter Examples" width="90" x="581" y="136">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="wrong_predictions"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list"/>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <connect from_port="labelled set" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance"/>
          <connect from_op="Performance" from_port="example set" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="output 1"/>
          <portSpacing port="source_labelled set" spacing="0"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Subprocess" from_port="out 1" to_op="Apply Model" to_port="model"/>
      <connect from_op="Subprocess" from_port="out 2" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Multi Label Performance" to_port="labelled set"/>
      <connect from_op="Multi Label Performance" from_port="collection of performances" to_port="result 1"/>
      <connect from_op="Multi Label Performance" from_port="output 1" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <description align="center" color="blue" colored="true" height="184" resized="true" width="263" x="112" y="162">Retrieve the Titanic Sample Data. Split the data into training and test set.&lt;br/&gt;&lt;br/&gt;Build a Multi Label Model for the attributes: 'Survived', 'Age' and 'Port of Embarkation' on the training set&lt;br/&gt;&lt;br/&gt;Provide Multi Label Model and the test set.</description>
      <description align="center" color="yellow" colored="true" height="316" resized="true" width="403" x="587" y="183">Evaluate the performance of all three predictions.&lt;br&gt;&lt;br&gt;The Multi Label Performance operator auto detects the three Prediction attributes (and the corresponding label attributes) and loops over them.&lt;br&gt;&lt;br&gt;In each iteration the corresponding multi label attribute, the prediction attribute and (if existing) the Confidence Attributes are set to the correct roles.&lt;br&gt;&lt;br&gt;The inner Performance operator evaluates the corresponding Performance. The Multi Label Performance provides a collection of all three Performance Vectors.&lt;br&gt;&lt;br&gt;An averaged Performance Vector cannot be created, because the individual Performance Vectors are of different types.</description>
    </process>
  </operator>
</process>

LeMarc · April 2020

Thanks @jacobcybulski for the remark. I'm going to check it again.

LeMarc · April 2020

@tftemme, thank you as well for your suggestion. Your idea (Filter Examples with 'wrong predictions') works when the training and test data come from the same ExampleSet. Now I would like to apply it to a different data set (with different values) from the one used for training and testing.

However, this does not work. When using the Multi Label Modeling operator, the ExampleSet output by the Apply Model operator only shows the prediction attributes, not the original attributes. So there are no attributes at all whose role I could set. Is there a solution for that?

Altair RapidMiner Community
