"Same accuracy...different predictions"

blueearth · November 2012

Hi, I had an original multi-label data set whose attributes I weighted with two schemes, Gini index and uncertainty. I kept the attributes that received a weight above 0.5 and built two data sets: one from the Gini index weights and the other from the uncertainty weights.
I ran both data sets through an X-Validation that trains a Neural Net operator. The accuracy was the same for both data sets: 99.45%.
But when I applied the model to an unknown data set, once with the Gini index attributes and once with the uncertainty attributes, the predictions were completely different. What's the problem? Did I go wrong somewhere?

MariusHelf · November 2012

Hi,

The accuracy only estimates the probability that a new, unseen example drawn from the same distribution as the training set is classified correctly. It says nothing about which examples are misclassified, so two models with the same accuracy can still give different predictions on unseen data.

To tell whether you did anything wrong, we need more detailed information on what you did.
For example, the data to which you apply a model must have the same attributes as the training data. So you can't apply a model trained on attribute set A to an example set with attribute set B and expect sensible results. From your description, though, we can't see what exactly you have done.

Best, Marius

blueearth · November 2012

Following the Apply Model description, I matched the attributes of my unknown data to the attributes of my training data. Everything, such as the count, label, and order of the attributes, was exactly the same as in the training data set. So I made two unknown data sets, one based on the Gini index attributes and the other based on the uncertainty attributes.
But as I said before, although the accuracy on my two training data sets was the same, the predictions were completely different.
Here is my weighting process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="1475" width="768">
      <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="../../Data/F C Data"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="120">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="numeric"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.2.008" expanded="true" height="94" name="Replace Missing Values" width="90" x="45" y="210">
        <list key="columns"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="5.2.008" expanded="true" height="94" name="Multiply" width="90" x="45" y="345"/>
      <operator activated="true" class="weight_by_gini_index" compatibility="5.2.008" expanded="true" height="76" name="Weight by Gini Index" width="90" x="179" y="210"/>
      <operator activated="true" class="select_by_weights" compatibility="5.2.008" expanded="true" height="94" name="Select by Weights (5)" width="90" x="380" y="210">
        <parameter key="weight" value="0.7"/>
      </operator>
      <operator activated="true" class="store" compatibility="5.2.008" expanded="true" height="60" name="Store (5)" width="90" x="581" y="210">
        <parameter key="repository_entry" value="../../Results/Attribute Weighting/Gini Index"/>
      </operator>
      <operator activated="true" class="weight_by_uncertainty" compatibility="5.2.008" expanded="true" height="76" name="Weight by Uncertainty" width="90" x="171" y="342"/>
      <operator activated="true" class="select_by_weights" compatibility="5.2.008" expanded="true" height="94" name="Select by Weights (7)" width="90" x="380" y="345">
        <parameter key="weight" value="0.7"/>
      </operator>
      <operator activated="true" class="store" compatibility="5.2.008" expanded="true" height="60" name="Store (6)" width="90" x="581" y="345">
        <parameter key="repository_entry" value="../../Results/Attribute Weighting/Uncertainty"/>
      </operator>
      <connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Weight by Gini Index" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Weight by Uncertainty" to_port="example set"/>
      <connect from_op="Weight by Gini Index" from_port="weights" to_op="Select by Weights (5)" to_port="weights"/>
      <connect from_op="Weight by Gini Index" from_port="example set" to_op="Select by Weights (5)" to_port="example set input"/>
      <connect from_op="Select by Weights (5)" from_port="example set output" to_op="Store (5)" to_port="input"/>
      <connect from_op="Store (5)" from_port="through" to_port="result 1"/>
      <connect from_op="Weight by Uncertainty" from_port="weights" to_op="Select by Weights (7)" to_port="weights"/>
      <connect from_op="Weight by Uncertainty" from_port="example set" to_op="Select by Weights (7)" to_port="example set input"/>
      <connect from_op="Select by Weights (7)" from_port="example set output" to_op="Store (6)" to_port="input"/>
      <connect from_op="Store (6)" from_port="through" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="468"/>
      <portSpacing port="sink_result 2" spacing="162"/>
      <portSpacing port="sink_result 3" spacing="126"/>
    </process>
  </operator>
</process>

blueearth · November 2012

And this is my training process. The model application process was done exactly according to the RapidMiner samples. What's the problem? How can I get different predictions when I have the same accuracy? And how can it be fixed?
At the very least, the predictions should not be so different from each other.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="353" width="701">
      <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="gini index" width="90" x="45" y="30">
        <parameter key="repository_entry" value="../../../Results/Attribute Weighting/Gini Index"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.2.008" expanded="true" height="94" name="Replace Missing Values" width="90" x="246" y="30">
        <list key="columns"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="5.2.008" expanded="true" height="112" name="Neural Net a" width="90" x="581" y="30">
        <parameter key="use_local_random_seed" value="true"/>
        <process expanded="true" height="506" width="399">
          <operator activated="true" class="neural_net" compatibility="5.2.008" expanded="true" height="76" name="Neural Net" width="90" x="154" y="30">
            <list key="hidden_layers"/>
          </operator>
          <connect from_port="training" to_op="Neural Net" to_port="training set"/>
          <connect from_op="Neural Net" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true" height="506" width="399">
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.2.008" expanded="true" height="76" name="Performance (2)" width="90" x="226" y="30"/>
          <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Uncertainty" width="90" x="45" y="210">
        <parameter key="repository_entry" value="../../../Results/Attribute Weighting/Gini Index"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.2.008" expanded="true" height="94" name="Replace Missing Values (2)" width="90" x="246" y="210">
        <list key="columns"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="5.2.008" expanded="true" height="112" name="Neural Net a (2)" width="90" x="581" y="210">
        <parameter key="use_local_random_seed" value="true"/>
        <process expanded="true">
          <operator activated="true" class="neural_net" compatibility="5.2.008" expanded="true" name="Neural Net (2)">
            <list key="hidden_layers"/>
          </operator>
          <connect from_port="training" to_op="Neural Net (2)" to_port="training set"/>
          <connect from_op="Neural Net (2)" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" name="Apply Model (3)">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="5.2.008" expanded="true" name="Performance (3)"/>
          <connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (3)" to_port="labelled data"/>
          <connect from_op="Performance (3)" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="gini index" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Neural Net a" to_port="training"/>
      <connect from_op="Neural Net a" from_port="averagable 1" to_port="result 1"/>
      <connect from_op="Uncertainty" from_port="output" to_op="Replace Missing Values (2)" to_port="example set input"/>
      <connect from_op="Replace Missing Values (2)" from_port="example set output" to_op="Neural Net a (2)" to_port="training"/>
      <connect from_op="Neural Net a (2)" from_port="averagable 1" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
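
One detail in the training process above may be worth checking. It is offered as an observation, not a diagnosis. The Retrieve operator named "Uncertainty" points at the same repository entry as the one named "gini index". If that matches what was actually run, both cross-validations read the same data, which on its own would produce identical accuracy. The fragment below sketches the second Retrieve pointing at the uncertainty data instead. The path is an assumption: the entry name is taken from Store (6) in the weighting process, and the relative prefix is copied from the first Retrieve.

```xml
<operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Uncertainty" width="90" x="45" y="210">
  <!-- Assumed path: entry name from Store (6) in the weighting process,
       relative prefix copied from the "gini index" Retrieve. -->
  <parameter key="repository_entry" value="../../../Results/Attribute Weighting/Uncertainty"/>
</operator>
```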

blueearth · November 2012

Can someone please explain whether anything is wrong, or whether there is something I should know about the process?

MariusHelf · November 2012

Hey, you have enabled the shuffle option in the Neural Net operators, which implies the use of random numbers. If you use the same local random seed in both Neural Net operators, you should get identical results.

Best, Marius
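
As a rough illustration of the suggestion above, the two Neural Net operators could be given the same local random seed along these lines. This is a sketch under assumptions: the parameter keys mirror the `use_local_random_seed` key already used on the X-Validation operators in the posted process, and the seed value is arbitrary, so both should be checked against the operator's parameter panel.

```xml
<!-- Sketch only: the seed value 1992 is an arbitrary illustrative choice. -->
<operator activated="true" class="neural_net" compatibility="5.2.008" expanded="true" name="Neural Net">
  <list key="hidden_layers"/>
  <!-- Fix the random numbers used when shuffling and initialising the network -->
  <parameter key="use_local_random_seed" value="true"/>
  <parameter key="local_random_seed" value="1992"/>
</operator>

<operator activated="true" class="neural_net" compatibility="5.2.008" expanded="true" name="Neural Net (2)">
  <list key="hidden_layers"/>
  <!-- Same seed as above, so both learners draw the same random sequence -->
  <parameter key="use_local_random_seed" value="true"/>
  <parameter key="local_random_seed" value="1992"/>
</operator>
```

With identical seeds, identical input data should give identical models. Different attribute sets will still give different models, whatever the seed.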

Altair RapidMiner Community
