T-Test for performance comparison
Hi guys,
I have a question about the standard T-Test operator (not the one from the Statistics extension).
How exactly does it compare performance vectors? Does the result depend on a single main performance criterion, or does it compare all available performance metrics at once (i.e., actually comparing vectors rather than a single value)?
I am asking because I am not able to get anything but 1.000 in the significance matrix for different algorithms and settings, given that they are evaluated on the same dataset. I've been trying different models like GLM, tree models, deep learning, etc., and the result is always the same. Does that mean there is actually no statistically significant difference out there?
Another concern: can I use the T-Test for comparing performance from different folds in a cross-validation, or does that not make sense at all? I am doing 10-fold validation and storing the performance of each fold for later analysis and comparison. Here's what I get:
[Screenshot: significance matrix (shows accuracy values by default)]
[Screenshot: performance metrics for each fold]
[Screenshot: the same metrics plotted on a graph]
I can change the settings of a learner and get a much worse F-score with higher variance, and still the significance matrix stays the same in that case as well.
It's hard to tell visually whether those results are actually close enough or whether there is some difference (for example, the F-score deviates within a visible interval). But all the '1.000's confuse me a bit... so where should a significant difference actually start? Or maybe I am doing something fundamentally wrong here?
Thanks.
Best Answer
IngoRM (RM Founder):
As far as I can tell, your performances do not have standard deviations, which is a requirement for t-tests. I am actually surprised that you are getting any values at all :-)
And yes, it uses the average and standard deviation of the main criterion.
"Can I use the T-Test for comparing performance from different folds in a cross-validation, or does that not make sense at all?"
No, exactly because the standard deviations only become available after the averaging step; the single-fold performances you store have no standard deviation of their own.
Anyway, I have attached a little sample process below.
Hope that helps,
Ingo

<?xml version="1.0" encoding="UTF-8"?>
<process version="9.4.000-SNAPSHOT">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.4.000-SNAPSHOT" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.4.000-SNAPSHOT" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="85">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.4.000-SNAPSHOT" expanded="true" height="103" name="Multiply" width="90" x="179" y="85"/>
      <operator activated="true" class="concurrency:cross_validation" compatibility="9.4.000-SNAPSHOT" expanded="true" height="145" name="Validation DT" width="90" x="313" y="85">
        <parameter key="split_on_batch_attribute" value="false"/>
        <parameter key="leave_one_out" value="false"/>
        <parameter key="number_of_folds" value="10"/>
        <parameter key="sampling_type" value="stratified sampling"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.4.000-SNAPSHOT" expanded="true" height="103" name="Decision Tree" width="90" x="45" y="34">
            <parameter key="criterion" value="gain_ratio"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="true"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="true"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
          </operator>
          <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <description align="left" color="green" colored="true" height="80" resized="true" width="248" x="37" y="158">In the training phase, a model is built on the current training data set. (90 % of data by default, 10 times)</description>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="performance" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <parameter key="use_example_weights" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
          <description align="left" color="blue" colored="true" height="103" resized="true" width="315" x="38" y="158">The model created in the Training step is applied to the current test set (10 %).<br/>The performance is evaluated and sent to the operator results.</description>
        </process>
        <description align="center" color="transparent" colored="false" width="126">A cross-validation evaluating a decision tree model.</description>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="9.4.000-SNAPSHOT" expanded="true" height="145" name="Validation GLM" width="90" x="313" y="340">
        <parameter key="split_on_batch_attribute" value="false"/>
        <parameter key="leave_one_out" value="false"/>
        <parameter key="number_of_folds" value="10"/>
        <parameter key="sampling_type" value="stratified sampling"/>
        <parameter key="use_local_random_seed" value="true"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="true" class="h2o:generalized_linear_model" compatibility="9.3.001" expanded="true" height="124" name="Generalized Linear Model" width="90" x="45" y="34">
            <parameter key="family" value="AUTO"/>
            <parameter key="link" value="family_default"/>
            <parameter key="solver" value="AUTO"/>
            <parameter key="reproducible" value="false"/>
            <parameter key="maximum_number_of_threads" value="4"/>
            <parameter key="use_regularization" value="true"/>
            <parameter key="lambda_search" value="false"/>
            <parameter key="number_of_lambdas" value="0"/>
            <parameter key="lambda_min_ratio" value="0.0"/>
            <parameter key="early_stopping" value="true"/>
            <parameter key="stopping_rounds" value="3"/>
            <parameter key="stopping_tolerance" value="0.001"/>
            <parameter key="standardize" value="true"/>
            <parameter key="non-negative_coefficients" value="false"/>
            <parameter key="add_intercept" value="true"/>
            <parameter key="compute_p-values" value="false"/>
            <parameter key="remove_collinear_columns" value="false"/>
            <parameter key="missing_values_handling" value="MeanImputation"/>
            <parameter key="max_iterations" value="0"/>
            <parameter key="specify_beta_constraints" value="false"/>
            <list key="beta_constraints"/>
            <parameter key="max_runtime_seconds" value="0"/>
            <list key="expert_parameters"/>
          </operator>
          <connect from_port="training set" to_op="Generalized Linear Model" to_port="training set"/>
          <connect from_op="Generalized Linear Model" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <description align="left" color="green" colored="true" height="80" resized="false" width="248" x="36" y="183">In the training phase, a model is built on the current training data set. (90 % of data by default, 10 times)</description>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="performance" compatibility="9.4.000-SNAPSHOT" expanded="true" height="82" name="Performance (2)" width="90" x="179" y="34">
            <parameter key="use_example_weights" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
          <connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance (2)" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
          <description align="left" color="blue" colored="true" height="103" resized="false" width="315" x="38" y="158">The model created in the Training step is applied to the current test set (10 %).<br/>The performance is evaluated and sent to the operator results.</description>
        </process>
        <description align="center" color="transparent" colored="false" width="126">A cross-validation evaluating a GLM model.</description>
      </operator>
      <operator activated="true" class="t_test" compatibility="9.4.000-SNAPSHOT" expanded="true" height="124" name="T-Test" width="90" x="514" y="136">
        <parameter key="alpha" value="0.05"/>
      </operator>
      <connect from_op="Retrieve Sonar" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Validation DT" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Validation GLM" to_port="example set"/>
      <connect from_op="Validation DT" from_port="performance 1" to_op="T-Test" to_port="performance 1"/>
      <connect from_op="Validation GLM" from_port="performance 1" to_op="T-Test" to_port="performance 2"/>
      <connect from_op="T-Test" from_port="significance" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>