Questions on the "Apply Model" operator and predicted label

huaiyanggongzi · December 2012

I use the "Apply Model" operator to predict the labels of a test data set. The results normally include three types of information: confidence (positive class), confidence (negative class), and the predicted label.

Naturally, when confidence (positive class) is larger than confidence (negative class), the predicted label is positive.

But I found many cases (using LibSVM for text classification) where the predicted label is still positive even though confidence (positive class) is smaller than confidence (negative class). Why does this happen?

MariusHelf · January 2013

Actually, I have never seen such a case with a plain create model/apply model cycle. Anyway, you can define manual thresholds e.g. with Create Threshold and Apply Threshold, or shift the thresholds in a more sophisticated way with e.g. Choose Recall or other cost-sensitive learning schemes.

Best regards,
Marius
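
To make the idea of a manual threshold concrete, here is a minimal Python sketch of the general technique. It is not RapidMiner's implementation of Create Threshold or Apply Threshold; the function name `apply_threshold`, its parameters, and the threshold value 0.39 are assumptions chosen for illustration.

```python
def apply_threshold(confidence_positive, threshold=0.5, positive="R", negative="NR"):
    """Label an example as positive when its positive-class confidence
    reaches the threshold, otherwise as negative (illustrative helper)."""
    return positive if confidence_positive >= threshold else negative


# With the default threshold of 0.5 the label follows the larger confidence.
print(apply_threshold(0.493653526))                   # NR
# With a lower (hypothetical) threshold the same example is labelled R,
# even though its positive-class confidence is below 0.5.
print(apply_threshold(0.493653526, threshold=0.39))   # R
```

The point of the sketch is only that any threshold other than 0.5 can produce a label that differs from the class with the larger confidence.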

huaiyanggongzi · January 2013

Hi, thanks for the reply.

The following is the result of running the "Apply Model" operator. The model was trained using the LibSVM operator. I have posted only the part of the result that shows the observation from my original post: even when confidence(R) is smaller than confidence(NR), the prediction is still R.

confidence(R) confidence(NR) Prediction(Label)
0.528462399 0.471537601 R
0.524106922 0.475893078 R
0.516740761 0.483259239 R
0.509868083 0.490131917 R
0.505252829 0.494747171 R
0.493653526 0.506346474 R
0.485416242 0.514583758 R
0.475031465 0.524968535 R
0.466340913 0.533659087 R
0.459370807 0.540629193 R
0.458747466 0.541252534 R
0.4577908 0.5422092 R
0.435570459 0.564429541 R
0.432716957 0.567283043 R
0.42963305 0.57036695 R
0.422826691 0.577173309 R
0.412345117 0.587654883 R
0.404687872 0.595312128 R
0.40221958 0.59778042 R
0.39865042 0.60134958 R
0.398228918 0.601771082 R
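
One way to count how many rows of such an export are affected is to compare each predicted label with the class that has the larger confidence. The following is a minimal Python sketch, run outside RapidMiner. The function name `find_mismatches` and the column names are assumptions about how the CSV is laid out; the sample uses four of the rows posted above.

```python
import csv
import io


def find_mismatches(rows, class_names=("R", "NR")):
    """Return the rows whose predicted label is not the class
    with the highest confidence (illustrative helper)."""
    mismatches = []
    for row in rows:
        confidences = {name: float(row[f"confidence({name})"]) for name in class_names}
        expected = max(confidences, key=confidences.get)
        if row["prediction(label)"] != expected:
            mismatches.append(row)
    return mismatches


# Four rows copied from the table above; the column names are assumed.
SAMPLE = """confidence(R),confidence(NR),prediction(label)
0.528462399,0.471537601,R
0.505252829,0.494747171,R
0.493653526,0.506346474,R
0.398228918,0.601771082,R
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
bad = find_mismatches(rows)
print(f"{len(bad)} of {len(rows)} rows disagree with the larger confidence")
```

To check a real export, the `io.StringIO(SAMPLE)` source could be replaced with an opened CSV file, provided the header names match.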

MariusHelf · January 2013

Hm, interesting. Can you please post your process xml as described in my signature?

Best regards,
Marius

huaiyanggongzi · January 2013

The following is the process I have been using for scoring.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.011">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.011" expanded="true" name="Process">
    <parameter key="parallelize_main_process" value="true"/>
    <process expanded="true" height="386" width="711">
      <operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
        <parameter key="repository_entry" value="SVM_Train_F_words_unigram_tf"/>
      </operator>
      <operator activated="true" class="text:process_document_from_file" compatibility="5.1.002" expanded="true" height="76" name="Process Documents from Files (2)" width="90" x="179" y="75">
        <list key="text_directories">
          <parameter key="R" value="E:\R_Validation"/>
          <parameter key="NR" value="E:\NR_Validation"/>
        </list>
        <parameter key="extract_text_only" value="false"/>
        <parameter key="vector_creation" value="Term Frequency"/>
        <parameter key="prune_below_absolute" value="5"/>
        <parameter key="prune_above_absolute" value="5000000"/>
        <parameter key="parallelize_vector_creation" value="true"/>
        <process expanded="true" height="362" width="674">
          <operator activated="true" class="text:tokenize" compatibility="5.1.002" expanded="true" height="60" name="Tokenize (2)" width="90" x="45" y="30"/>
          <operator activated="true" class="text:transform_cases" compatibility="5.1.002" expanded="true" height="60" name="Transform Cases (2)" width="90" x="180" y="30"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.002" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="315" y="73"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_op="Transform Cases (2)" to_port="document"/>
          <connect from_op="Transform Cases (2)" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.1.011" expanded="true" height="60" name="Retrieve (2)" width="90" x="179" y="300">
        <parameter key="repository_entry" value="SVM_Train_F_model_unigram_tf"/>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.1.011" expanded="true" height="76" name="Apply Model" width="90" x="313" y="300">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance_classification" compatibility="5.1.011" expanded="true" height="76" name="Performance" width="90" x="447" y="75">
        <list key="class_weights"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.011" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="210">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attributes" value="|confidence(non_res)|confidence(res)|label|prediction(label)"/>
      </operator>
      <operator activated="true" class="write_csv" compatibility="5.1.011" expanded="true" height="60" name="Write CSV" width="90" x="581" y="165">
        <parameter key="csv_file" value="E:\Project\svmscore.csv"/>
        <parameter key="column_separator" value=","/>
        <parameter key="quote_nominal_values" value="false"/>
        <parameter key="format_date_attributes" value="false"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Process Documents from Files (2)" to_port="word list"/>
      <connect from_op="Process Documents from Files (2)" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Retrieve (2)" from_port="output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Performance" from_port="performance" to_port="result 2"/>
      <connect from_op="Performance" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Write CSV" to_port="input"/>
      <connect from_op="Write CSV" from_port="through" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

MariusHelf · January 2013

Whoo, you are using RapidMiner 5.1. In a few days RapidMiner 5.3 will be released - I strongly encourage you to update to the latest version (5.2.8) and try again. Please leave a note in this thread if your problem persists or if everything is working fine now.

Best regards,
Marius

huaiyanggongzi · January 2013

Thanks, Marius. I will give it another try after updating RapidMiner.

By the way, do you know how to output the distance between a given test data point and the hyperplane constructed from the training data set? I am again referring to the LibSVM operator in RapidMiner.

MariusHelf · January 2013

huaiyanggongzi wrote:
By the way, do you know how to output the distance between a given test data point and the hyperplane constructed from the training data set? I am again referring to the LibSVM operator in RapidMiner.

Unfortunately, that's not possible. The confidence is an indicator for that, but the exact distance cannot be output.

Best regards,
Marius
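
For reference, for a linear SVM the signed distance of a point x from the hyperplane w·x + b = 0 is (w·x + b) / ||w||. The sketch below computes this in plain Python, assuming the weight vector and bias have been obtained from some other tool. The function name `signed_distance` and all numeric values are invented for illustration, and nothing here reads a RapidMiner model or applies to non-linear kernels.

```python
import math


def signed_distance(x, w, b):
    """Signed Euclidean distance of point x from the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (dot + b) / norm


# Hypothetical weight vector and bias, not taken from any real model.
w = [3.0, 4.0]
b = -5.0

print(signed_distance([3.0, 4.0], w, b))  # positive side of the hyperplane
print(signed_distance([0.0, 0.0], w, b))  # negative side of the hyperplane
```

The sign indicates the side of the hyperplane and the magnitude indicates how far the point lies from it.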
