
prediction confidence

nav Member Posts: 28 Contributor II
edited October 2019 in Help
Hello, can someone explain to me how the prediction confidence columns work and how they are calculated when I apply a classification model to a test set? Thanks.

Answers

  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    Whoa, that's a pretty broad question with no single answer that covers all learning schemes.

    In general, the prediction confidences state how sure the model was about each of the possible values. This is similar to probabilities ("How large is the probability that the class is 'positive'?") but not necessarily the same.

    How are they calculated? Well, that differs between model types. For schemes like Naive Bayes and Logistic Regression, the confidences are indeed probabilities estimated from the training data. If you use an SVM with a scaling approach like Platt scaling, the confidences are at least pretty close to probabilities. For other schemes, things can be different. For example, the confidence of a decision tree prediction is the fraction of examples of the predicted class among all training examples in the applicable leaf.
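
    To make the decision-tree case concrete, here is a minimal sketch in Python with scikit-learn rather than RapidMiner (synthetic data, purely for illustration); for a tree, predict_proba returns exactly this leaf fraction:

    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic binary classification data, for illustration only.
    X, y = make_classification(n_samples=500, random_state=0)
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    # For a decision tree, predict_proba returns, per class, the fraction of
    # training examples of that class in the leaf each example falls into,
    # i.e. the "confidence" described above. For Naive Bayes or Logistic
    # Regression the same call returns estimated class probabilities instead.
    print(tree.predict_proba(X[:5]))
    print(tree.predict(X[:5]))  # predicted class = argmax of the confidences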

    There are really only two ways to deal with this: simply accept the confidences as a measure of how sure the model is and believe them, or do it the hard way and read the literature on each model type to learn how its confidences are calculated in detail. The source code might also help here.

    Cheers,
    Ingo
  • adaman Member Posts: 17 Contributor II
    Hi all,

    this is all fine for me, no need to understand it all in detail, but I would like to put a threshold on the confidence after the model applier has finished, to keep only the examples whose confidence is below or above some value. But I can't, because the confidence is a special attribute? Or am I doing something completely wrong?
  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    well, you have several options for this.

    You could use
    • the operator "Generate Attributes" (you will have to rename the confidence attributes before since the parentheses would cause problems otherwise...)
    • one of the discretization operators
    • the operator "Drop Uncertain Predictions" (although this one does not exactly divide your data into discrete bins...)
    If the fact that the confidence is a special attribute is a problem somewhere, you could either check the setting "include special attributes" or use the operator "Set Role" before the data transformation is applied.

    Here is an example using the operator "Generate Attributes":

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
        <process expanded="true" height="359" width="815">
          <operator activated="true" class="generate_data" compatibility="5.1.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="sum classification"/>
            <parameter key="number_examples" value="500"/>
          </operator>
          <operator activated="true" class="add_noise" compatibility="5.1.008" expanded="true" height="94" name="Add Noise" width="90" x="179" y="30">
            <parameter key="random_attributes" value="1"/>
            <list key="noise"/>
          </operator>
          <operator activated="true" class="naive_bayes" compatibility="5.1.008" expanded="true" height="76" name="Naive Bayes" width="90" x="313" y="30"/>
          <operator activated="true" class="generate_data" compatibility="5.1.008" expanded="true" height="60" name="Generate Data (2)" width="90" x="179" y="165">
            <parameter key="target_function" value="sum classification"/>
            <parameter key="number_examples" value="200"/>
          </operator>
          <operator activated="true" class="add_noise" compatibility="5.1.008" expanded="true" height="94" name="Add Noise (2)" width="90" x="313" y="165">
            <parameter key="random_attributes" value="1"/>
            <list key="noise"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model" width="90" x="447" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="5.1.008" expanded="true" height="76" name="Rename by Replacing" width="90" x="581" y="30">
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="replace_by" value="_"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.1.008" expanded="true" height="76" name="Generate Attributes" width="90" x="715" y="30">
            <list key="function_descriptions">
              <parameter key="discretized" value="if (confidence_negative_&gt;0.8,&quot;high&quot;,&quot;low&quot;)"/>
            </list>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Add Noise" to_port="example set input"/>
          <connect from_op="Add Noise" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Generate Data (2)" from_port="output" to_op="Add Noise (2)" to_port="example set input"/>
          <connect from_op="Add Noise (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    And here is an example using one of the discretization operators:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
        <process expanded="true" height="359" width="815">
          <operator activated="true" class="generate_data" compatibility="5.1.008" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="sum classification"/>
            <parameter key="number_examples" value="500"/>
          </operator>
          <operator activated="true" class="add_noise" compatibility="5.1.008" expanded="true" height="94" name="Add Noise" width="90" x="179" y="30">
            <parameter key="random_attributes" value="1"/>
            <list key="noise"/>
          </operator>
          <operator activated="true" class="naive_bayes" compatibility="5.1.008" expanded="true" height="76" name="Naive Bayes" width="90" x="313" y="30"/>
          <operator activated="true" class="generate_data" compatibility="5.1.008" expanded="true" height="60" name="Generate Data (2)" width="90" x="179" y="165">
            <parameter key="target_function" value="sum classification"/>
            <parameter key="number_examples" value="200"/>
          </operator>
          <operator activated="true" class="add_noise" compatibility="5.1.008" expanded="true" height="94" name="Add Noise (2)" width="90" x="313" y="165">
            <parameter key="random_attributes" value="1"/>
            <list key="noise"/>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model" width="90" x="447" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="discretize_by_frequency" compatibility="5.1.008" expanded="true" height="94" name="Discretize" width="90" x="581" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="confidence(negative)"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Add Noise" to_port="example set input"/>
          <connect from_op="Add Noise" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Generate Data (2)" from_port="output" to_op="Add Noise (2)" to_port="example set input"/>
          <connect from_op="Add Noise (2)" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Cheers,
    Ingo
  • adaman Member Posts: 17 Contributor II
    Thanks for the hints :-)
  • sinead_bracken Member Posts: 5 Contributor I
    Hi there,

    just to add to this: are the values indicative of how sure we are, in the sense that if the confidence value is 0.785, we could say we are 78.5% confident that this prediction falls into this category?

    Or is it more along the lines of "78.5% of entries like this fall into this category too"?

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Another good, but sometimes difficult, question. Generally the confidence can be interpreted in both senses, because the first sense (confidence in the prediction) is actually derived from the second sense (the distribution of similar records in the training data). However, this number is highly dependent on the specifics of the training dataset, so it is susceptible to "drift" when the model is used to score other datasets. Most of the time the scores are more robust as rank-ordering tools: even if the underlying class distributions change, the confidences tend to preserve the correct ordering of examples while the absolute probabilities shift.
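
    To illustrate the rank-ordering point, here is a small sketch (Python with numpy/scikit-learn, hypothetical scores, not output from any real model): a monotone distortion of the confidences shifts the absolute values, i.e. the calibration, but leaves the ordering, and therefore a ranking metric like AUC, unchanged.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    y_true = rng.integers(0, 2, size=1000)
    # Hypothetical confidences: informative but noisy scores in [0, 1].
    scores = np.clip(0.25 + 0.5 * y_true + rng.normal(0, 0.2, size=1000), 0, 1)

    drifted = scores ** 3  # monotone distortion: absolute values shift
    print(roc_auc_score(y_true, scores))   # AUC of the original confidences
    print(roc_auc_score(y_true, drifted))  # identical AUC: ordering preserved
    print(scores.mean(), drifted.mean())   # but the average "probability" moved
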
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • rafeena Member Posts: 14 Contributor II
    @sinead_bracken and everyone else: regarding this question, does that mean the confidence is like an accuracy measurement of the model's performance?
  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    No, I would say that accuracy is something different, at least as usually defined in a machine learning context. Accuracy is typically a measure of overall model performance, derived from the confusion matrix of a classification problem, as shown in the performance operators in RapidMiner.
    This is related to, but ultimately not the same as, the confidence of an individual prediction (or even a set of predictions). Accuracy is also itself subject to skew based on the confidence threshold selected for classification (see the earlier part of this thread for a discussion of setting thresholds).
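
    As a tiny numeric sketch of that distinction (Python, made-up confidence values): the per-example confidences stay fixed, while the accuracy computed from the resulting confusion matrix moves with the chosen threshold.

    import numpy as np

    y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])  # actual classes
    # Made-up values for confidence(positive), one per example.
    conf_pos = np.array([0.9, 0.8, 0.55, 0.45, 0.3, 0.6, 0.35, 0.1])

    # The confidences never change; only the cut-off that turns them into
    # hard predictions does, and the accuracy changes with it.
    for threshold in (0.4, 0.5, 0.6):
        y_pred = (conf_pos >= threshold).astype(int)
        print(f"threshold={threshold}: accuracy={(y_pred == y_true).mean():.3f}")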

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts