Standard deviation on cross-validation

yzan · January 2018

Whenever we look at performance results obtained from cross-validation, we see the mean value of the selected measure and its standard deviation (marked with +/-). However, sometimes the selected measure cannot be calculated on some of the folds (e.g. when all samples are classified as negative and we attempt to calculate precision or F-measure, we get a division by zero and, consequently, the measure is treated as missing). That is perfectly reasonable behaviour. However, the mean value of the measure is still reported in the presence of missing measurements, while the standard deviation is no longer reported.

Proposal: Make the behaviour consistent and report nanmean and nanstd (i.e. ignore missing values and report both statistics).
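
To illustrate what the proposal would mean in practice, here is a minimal sketch in Python with NumPy. It is not RapidMiner code: the function name `summarize_folds` and the fold values are made up for illustration, and the choice of population standard deviation (`ddof=0`) is an assumption, since I do not know which variant RM uses internally.

```python
import numpy as np


def summarize_folds(fold_values):
    """Summarize per-fold measures while ignoring missing (NaN) folds.

    Hypothetical helper for illustration only. Returns the mean and the
    standard deviation over the folds where the measure was defined,
    plus the number of such folds.
    """
    values = np.asarray(fold_values, dtype=float)
    n_valid = int(np.count_nonzero(~np.isnan(values)))
    if n_valid == 0:
        # Nothing to average: both statistics are missing.
        return float("nan"), float("nan"), 0
    mean = float(np.nanmean(values))
    # ddof=0 (population standard deviation) is an assumption here.
    std = float(np.nanstd(values, ddof=0))
    return mean, std, n_valid


# Made-up precision values for 5 folds; NaN marks folds where precision
# was undefined because no sample was predicted as positive (0/0).
fold_precision = [0.80, 0.75, float("nan"), 0.90, float("nan")]

mean, std, n_valid = summarize_folds(fold_precision)
print(f"precision: {mean:.3f} +/- {std:.3f} (from {n_valid} of {len(fold_precision)} folds)")
```

Reporting the number of folds that actually contributed, next to the two statistics, would also make it obvious to the user that some folds were skipped.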

Reasoning: It can be puzzling when you look at the performance results and standard deviations are suddenly missing (without any explanation), even though you are performing cross-validation and you are certain that RM used to report the standard deviation.

sgenzer · March 2018

Declined · Last Updated May 2019
