"Append-Operator in Testing Phase of X-Validation changes confusion matrix"
Hi,
I am working on a classification problem with three classes: good (180), mediocre (4535) and bad (183). The numbers in parentheses are the number of examples in each class.
In my RapidMiner process I only learn a model for "good" and "bad". In the testing phase I want to modify the prediction depending on the confidence of my classifier: I filter out all examples with low confidence and assign them to the default class "mediocre".
For this reassignment I use a "Filter Examples" operator together with a "Replace" operator.
My problem is:
If I run my process without the reassignment step (i.e. without filtering and replacing), I get the expected values for true good (180), true mediocre (4535) and true bad (183) in my confusion matrix. However, if I do the reassignment, my confusion matrix yields unexpected values for true good, mediocre and bad.
Why is that happening?
My process is as follows:
Through a bit of debugging I found out that if you just add an "Append" operator with only one input (the output of "Apply Model", nothing else) in the testing phase of X-Validation, the confusion matrix yields wrong values for true <classname>.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="5.3.015" expanded="true" height="60" name="Retrieve DataSet-WhiteWine" width="90" x="45" y="30">
<parameter key="repository_entry" value="//Local Repository/data/GroupProject_WineQuality_White"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.3.015" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
<parameter key="attribute_name" value="quality"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="generate_id" compatibility="5.3.015" expanded="true" height="76" name="Generate ID" width="90" x="313" y="30">
<parameter key="create_nominal_ids" value="false"/>
<parameter key="offset" value="0"/>
</operator>
<operator activated="true" class="normalize" compatibility="5.3.015" expanded="true" height="94" name="Normalize" width="90" x="447" y="30">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="method" value="range transformation"/>
<parameter key="min" value="0.0"/>
<parameter key="max" value="1.0"/>
</operator>
<operator activated="true" class="discretize_by_user_specification" compatibility="5.3.015" expanded="true" height="94" name="Discretize" width="90" x="581" y="30">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value="quality"/>
<parameter key="attributes" value="quality"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="numeric"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="real"/>
<parameter key="block_type" value="value_series"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_series_end"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="true"/>
<parameter key="attribute_type" value="nominal"/>
<list key="classes">
<parameter key="bad" value="4.0"/>
<parameter key="mediocre" value="7.0"/>
<parameter key="good" value="10.0"/>
</list>
</operator>
<operator activated="true" class="x_validation" compatibility="5.3.015" expanded="true" height="112" name="Validation" width="90" x="715" y="30">
<parameter key="create_complete_model" value="false"/>
<parameter key="average_performances_only" value="true"/>
<parameter key="leave_one_out" value="false"/>
<parameter key="number_of_validations" value="30"/>
<parameter key="sampling_type" value="shuffled sampling"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1985"/>
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="quality != mediocre"/>
<parameter key="invert_filter" value="false"/>
</operator>
<operator activated="true" class="naive_bayes" compatibility="5.3.015" expanded="true" height="76" name="Naive Bayes" width="90" x="179" y="30">
<parameter key="laplace_correction" value="true"/>
</operator>
<connect from_port="training" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Naive Bayes" to_port="training set"/>
<connect from_op="Naive Bayes" from_port="model" to_port="model"/>
<portSpacing port="source_training" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="5.3.015" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<operator activated="true" class="multiply" compatibility="5.3.015" expanded="true" height="76" name="Multiply" width="90" x="179" y="30"/>
<operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples (3)" width="90" x="112" y="210">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="confidence(bad)&lt;0.999 &amp;&amp; confidence(good)&lt;0.99"/>
<parameter key="invert_filter" value="false"/>
</operator>
<operator activated="true" breakpoints="after" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples (2)" width="90" x="246" y="255">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="confidence(bad)&lt;0.999 &amp;&amp; confidence(good)&lt;0.99"/>
<parameter key="invert_filter" value="true"/>
</operator>
<operator activated="true" breakpoints="after" class="replace" compatibility="5.3.015" expanded="true" height="76" name="Replace (3)" width="90" x="246" y="165">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="prediction(quality)"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="true"/>
<parameter key="replace_what" value="bad|good"/>
<parameter key="replace_by" value="mediocre"/>
</operator>
<operator activated="true" class="union" compatibility="5.3.015" expanded="true" height="76" name="Union" width="90" x="447" y="210"/>
<operator activated="true" class="select_attributes" compatibility="5.3.015" expanded="true" height="76" name="Select Attributes" width="90" x="380" y="30">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="|quality|prediction(quality)"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="5.3.015" expanded="true" height="76" name="Performance" width="90" x="514" y="30">
<parameter key="main_criterion" value="first"/>
<parameter key="accuracy" value="true"/>
<parameter key="classification_error" value="false"/>
<parameter key="kappa" value="false"/>
<parameter key="weighted_mean_recall" value="false"/>
<parameter key="weighted_mean_precision" value="false"/>
<parameter key="spearman_rho" value="false"/>
<parameter key="kendall_tau" value="false"/>
<parameter key="absolute_error" value="false"/>
<parameter key="relative_error" value="false"/>
<parameter key="relative_error_lenient" value="false"/>
<parameter key="relative_error_strict" value="false"/>
<parameter key="normalized_absolute_error" value="false"/>
<parameter key="root_mean_squared_error" value="false"/>
<parameter key="root_relative_squared_error" value="false"/>
<parameter key="squared_error" value="false"/>
<parameter key="correlation" value="false"/>
<parameter key="squared_correlation" value="false"/>
<parameter key="cross-entropy" value="false"/>
<parameter key="margin" value="false"/>
<parameter key="soft_margin_loss" value="false"/>
<parameter key="logistic_loss" value="false"/>
<parameter key="skip_undefined_labels" value="true"/>
<parameter key="use_example_weights" value="true"/>
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Filter Examples (3)" to_port="example set input"/>
<connect from_op="Filter Examples (3)" from_port="example set output" to_op="Replace (3)" to_port="example set input"/>
<connect from_op="Filter Examples (3)" from_port="original" to_op="Filter Examples (2)" to_port="example set input"/>
<connect from_op="Filter Examples (2)" from_port="example set output" to_op="Union" to_port="example set 2"/>
<connect from_op="Replace (3)" from_port="example set output" to_op="Union" to_port="example set 1"/>
<connect from_op="Union" from_port="union" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_averagable 1" spacing="0"/>
<portSpacing port="sink_averagable 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve DataSet-WhiteWine" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Discretize" to_port="example set input"/>
<connect from_op="Discretize" from_port="example set output" to_op="Validation" to_port="training"/>
<connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
In the above process I first used "Append" and then switched to the "Union" operator; however, I still have the same problem.
Am I doing anything wrong?
Thanks in advance for your help!!!
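For readers who want to check the expectation stated in the question outside RapidMiner, here is a minimal Python sketch. It assumes the reassignment rule from the filter condition in the process above (an example is sent to "mediocre" when confidence(bad) < 0.999 and confidence(good) < 0.99). The function names and the five test examples are made up for illustration and are not part of any RapidMiner API. It shows that changing predictions alone should leave the per-class "true" totals of the confusion matrix unchanged.

```python
from collections import Counter

# Thresholds copied from the filter condition in the posted process.
# All names below are illustrative assumptions, not RapidMiner API.
THRESHOLDS = {"bad": 0.999, "good": 0.99}
DEFAULT_CLASS = "mediocre"


def reassign(prediction, confidences):
    """Return DEFAULT_CLASS when every class confidence is below its threshold."""
    low = all(confidences[c] < t for c, t in THRESHOLDS.items())
    return DEFAULT_CLASS if low else prediction


def confusion_matrix(true_labels, predicted_labels):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def true_class_totals(matrix):
    """Sum the confusion matrix per true class."""
    totals = Counter()
    for (true_label, _), count in matrix.items():
        totals[true_label] += count
    return totals


# Made-up test set: (true label, model prediction, confidences).
examples = [
    ("good", "good", {"good": 0.995, "bad": 0.005}),
    ("mediocre", "good", {"good": 0.70, "bad": 0.30}),
    ("mediocre", "bad", {"good": 0.40, "bad": 0.60}),
    ("bad", "bad", {"good": 0.0005, "bad": 0.9995}),
    ("bad", "good", {"good": 0.90, "bad": 0.10}),
]

truth = [t for t, _, _ in examples]
before = [p for _, p, _ in examples]
after = [reassign(p, conf) for _, p, conf in examples]

totals_before = true_class_totals(confusion_matrix(truth, before))
totals_after = true_class_totals(confusion_matrix(truth, after))
```

If the true-class totals differ after a step that only touches the prediction column, the step has probably changed the set of rows or the label column as well, which is worth checking at a breakpoint.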
Answers
I've created an example process with the iris data set where I learn on two classes and assign the "unsure" predictions (confidence between 0.3 and 0.7) to the third. This works quite well for me. I hope you can use this as a template.
Best,
Martin
Dortmund, Germany
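The rule described in this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names "a", "b" and "c" are placeholders, the band limits are treated as inclusive, and the function name is hypothetical. The actual example process is not shown in the thread.

```python
def assign_with_unsure_band(conf_a, class_a="a", class_b="b",
                            unsure_class="c", low=0.3, high=0.7):
    """Map the confidence for class_a of a two-class model to a final label.

    Confidences inside [low, high] count as "unsure" and go to unsure_class.
    Inclusive limits are an assumption; the thread does not specify them.
    """
    if low <= conf_a <= high:
        return unsure_class
    return class_a if conf_a > high else class_b


# Example usage with three hypothetical confidence values for class "a".
labels = [assign_with_unsure_band(c) for c in (0.9, 0.5, 0.1)]
```

Because a two-class model's confidences sum to 1, checking one confidence against a band that is symmetric around 0.5 is enough.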
Thanks for your reply. Could you please elaborate on your process, i.e. why is it necessary to rename the attributes which were generated by RapidMiner itself?
Also, I tried to adapt your approach to my problem. However, I get the same issue.
I found out that it is somehow related to the "Append" operator.
I created an example using the Weighting data. Please look at this process: you will see an "Append" operator in the training phase which has only one input, hence it shouldn't do anything. However, if you compare the confusion matrix of the process with and without the "Append" operator, you will notice a difference.
The correct confusion matrix (in terms of the number of true positives and true negatives) is the one from the process without the "Append" operator. The other one yields a wrong total number of true positives and true negatives.
Any idea why? Also, what do I need to do to use the Append operator on a data set with about 5000 data points in total?
Thanks,
Muhammad
The Append operator modifies the meta data, so there are some changes, but I am currently not sure how this affects the Performance operator.
Regarding my process:
Generate Attributes cannot handle attribute names containing brackets, minus signs, plus signs or whitespace, because these are interpreted as part of the formula. That is why I needed to replace them.
Dortmund, Germany
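The renaming mentioned above amounts to stripping formula characters out of attribute names such as confidence(bad). A minimal Python sketch of that idea follows. The function name and the choice of an underscore as the replacement are assumptions made for illustration; inside RapidMiner the renaming would be done with its own renaming operators.

```python
import re


def sanitize_attribute_name(name):
    """Replace brackets, plus, minus and whitespace with underscores.

    Runs of such characters collapse into a single underscore, and
    leading or trailing underscores are removed.
    """
    cleaned = re.sub(r"[()\[\]+\-\s]+", "_", name)
    return cleaned.strip("_")


# Example usage on names like the ones Apply Model generates.
renamed = [sanitize_attribute_name(n)
           for n in ("confidence(bad)", "confidence(good)", "prediction(quality)")]
```

After this kind of renaming, a name like confidence_bad can be used in an expression without being parsed as a function call.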