Problems discovering which data could not be correctly classified.
the_duckman
G'Day all,
I was trying something quite straightforward and it just won't work for me.
As per the title, I have some data on which I trained a classifier, and I want to understand my situation better by examining the records that were not classified correctly. To (attempt to) do this, I re-apply the trained model to the original training data and use the "Filter Examples" operator to select the misclassified examples. The problem is that the final set always contains 0 examples.
I've spent the whole day trying to discover what I am doing wrong without any progress and could really use some assistance.
Below is a typical example (adapted to use sample data) of my attempts to do this with the "Filter Examples" operator.
On both of my machines this fails to discover the errant classification set.
Thanks for any help or ideas,
-dm
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="424" width="681">
<operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="75">
<parameter key="repository_entry" value="//Samples/data/Sonar"/>
</operator>
<operator activated="true" class="select_by_random" expanded="true" height="76" name="Select by Random" width="90" x="45" y="165">
<parameter key="use_fixed_number_of_attributes" value="true"/>
<parameter key="number_of_attributes" value="20"/>
</operator>
<operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="112" y="255"/>
<operator activated="true" class="x_validation" expanded="true" height="112" name="Validation" width="90" x="246" y="75">
<process expanded="true" height="442" width="268">
<operator activated="true" class="k_nn" expanded="true" height="76" name="k-NN (2)" width="90" x="112" y="75"/>
<connect from_port="training" to_op="k-NN (2)" to_port="training set"/>
<connect from_op="k-NN (2)" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="36"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true" height="442" width="279">
<operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="112" y="165"/>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model (2)" width="90" x="380" y="255">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="514" y="165">
<parameter key="condition_class" value="wrong_predictions"/>
</operator>
<connect from_op="Retrieve (2)" from_port="output" to_op="Select by Random" to_port="example set input"/>
<connect from_op="Select by Random" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Validation" to_port="training"/>
<connect from_op="Multiply" from_port="output 2" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
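For readers without RapidMiner at hand: the Filter Examples step in this process (condition_class set to wrong_predictions) keeps only the examples whose predicted class differs from the true label. The Python sketch below mirrors that idea on hand-made records. The dictionary keys `label` and `prediction` and the `wrong_predictions` helper are hypothetical stand-ins for RapidMiner's label and prediction attributes, not its API.

```python
from typing import Dict, List

# Hypothetical record layout: the true class is stored under "label",
# the model's output under "prediction".
Record = Dict[str, str]


def wrong_predictions(scored: List[Record]) -> List[Record]:
    """Keep only records whose prediction differs from the true label
    (the idea behind Filter Examples with condition_class=wrong_predictions)."""
    return [r for r in scored if r["prediction"] != r["label"]]


scored = [
    {"id": "1", "label": "Rock", "prediction": "Rock"},
    {"id": "2", "label": "Mine", "prediction": "Rock"},
    {"id": "3", "label": "Mine", "prediction": "Mine"},
]

print([r["id"] for r in wrong_predictions(scored)])  # ['2']
```

Whether the filter returns anything depends entirely on whether the model actually makes mistakes on the data it is applied to.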
Answers
Actually, there is nothing wrong with your code! You've applied the model to its own training data, so, surprise surprise, there are no errors; I've stuck a breakpoint in at the crucial juncture to make the point. In the following I've split the data, and errors start creeping in and being flagged... Hope that clears the fog!
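To see why re-applying a model to its own training data can hide every error, here is a minimal Python sketch using a hand-written 1-nearest-neighbour classifier. It is an assumption standing in for the k-NN operator; the toy data and the `predict_1nn` and `count_errors` helpers are illustrative and not from the thread. On its own training examples the model makes no mistakes, while a held-out split exposes one.

```python
import math
from typing import Sequence, Tuple

# (attribute values, class label): a toy stand-in for real examples
Example = Tuple[Tuple[float, ...], str]


def predict_1nn(train: Sequence[Example], x: Tuple[float, ...]) -> str:
    """Label of the stored training example closest to x (Euclidean distance)."""
    _, label = min(train, key=lambda ex: math.dist(ex[0], x))
    return label


def count_errors(train: Sequence[Example], data: Sequence[Example]) -> int:
    """Number of examples in `data` that a 1-NN model built on `train` misclassifies."""
    return sum(1 for x, y in data if predict_1nn(train, x) != y)


train = [((0.0,), "A"), ((1.0,), "A"), ((2.0,), "B"), ((3.0,), "B")]
test = [((1.4,), "B"), ((2.6,), "B"), ((0.2,), "A")]

print(count_errors(train, train))  # 0: each training example is its own nearest neighbour
print(count_errors(train, test))   # 1: the example at 1.4 lies closer to an "A"
```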
The second bit of code works, but it's confusing me.
(I was not able to run the first bit of code; it caused the software to crash.)
What I find confusing is that the model's performance vector shows 79%. If it was trained under cross-validation, how can it have learnt the training data to 100%?
I think I am missing something here; any clarification would be much appreciated.
-dm
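A sketch of the distinction, assuming (as I understand RapidMiner's X-Validation) that the reported performance is the average over the held-out folds, while the model delivered at the operator's model port is trained on the complete example set. The helper names, the toy data and the interleaved fold assignment are assumptions for illustration; RapidMiner assigns folds with its own sampling scheme.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], str]


def predict_1nn(train: Sequence[Example], x: Tuple[float, ...]) -> str:
    """Label of the nearest stored example (Euclidean distance)."""
    _, label = min(train, key=lambda ex: math.dist(ex[0], x))
    return label


def accuracy(train: Sequence[Example], test: Sequence[Example]) -> float:
    correct = sum(1 for x, y in test if predict_1nn(train, x) == y)
    return correct / len(test)


def cross_validated_accuracy(data: List[Example], folds: int) -> float:
    """Mean held-out accuracy; fold i holds every example whose index % folds == i."""
    scores = []
    for i in range(folds):
        test = [ex for j, ex in enumerate(data) if j % folds == i]
        train = [ex for j, ex in enumerate(data) if j % folds != i]
        scores.append(accuracy(train, test))
    return sum(scores) / len(scores)


# Toy data: the "B" example at 1.2 sits among the "A" examples.
data = [((0.0,), "A"), ((1.0,), "A"), ((2.0,), "A"),
        ((10.0,), "B"), ((11.0,), "B"), ((1.2,), "B")]

cv_estimate = cross_validated_accuracy(data, folds=3)  # like the performance vector
final_model = data                     # a 1-NN "model" trained on ALL examples is just the data
training_accuracy = accuracy(final_model, data)  # re-applying it to the same examples

print(round(cv_estimate, 3))  # 0.667
print(training_accuracy)      # 1.0
```

The two numbers answer different questions: the cross-validation figure estimates accuracy on unseen data, whereas re-applying the final model to its own training data only measures how well it memorised them.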
and found out that the whole training dataset is stored by the classifier.
It's all clear now and I am underway again. Thanks a lot for the help, Haddock.
-dm
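The finding above can be illustrated with a toy k-NN in Python, a sketch under stated assumptions rather than RapidMiner's implementation. Here `fit` simply stores the examples, so with k = 1 (which I believe is the k-NN operator's default) every training example is its own nearest neighbour at distance 0 and gets its own label back, unless identical examples carry different labels. With a larger k the neighbours can outvote it, so some training errors can reappear. The `KNN` class and `training_errors` helper are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], str]


class KNN:
    """Minimal k-NN: 'training' just keeps a copy of the examples."""

    def __init__(self, k: int = 1):
        self.k = k
        self.examples: List[Example] = []

    def fit(self, examples: Sequence[Example]) -> "KNN":
        self.examples = list(examples)  # the whole training set IS the model
        return self

    def predict(self, x: Tuple[float, ...]) -> str:
        # Take the k closest stored examples and return their majority label.
        nearest = sorted(self.examples, key=lambda ex: math.dist(ex[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


def training_errors(k: int, data: Sequence[Example]) -> int:
    """Misclassifications when a k-NN model is applied to its own training data."""
    model = KNN(k).fit(data)
    return sum(1 for x, y in data if model.predict(x) != y)


data = [((0.0,), "A"), ((1.0,), "A"), ((1.2,), "B"), ((2.0,), "A")]

print(training_errors(1, data))  # 0
print(training_errors(3, data))  # 1: the "B" at 1.2 is outvoted by two "A" neighbours
```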
Cool, well done for thinking it through; definitely the way to go. Get back when you're next underwhelmed!