F-measure should handle edge scenarios
F-measure is undefined when all the samples are negative and they are all predicted to be negative. However, there are situations in which it still makes sense to assign a value to this edge scenario:
In some rare cases, the calculation of Precision or Recall can cause a division by 0. Regarding the precision, this can happen if there are no results inside the answer of an annotator and, thus, the true as well as the false positives are 0. For these special cases, we have defined that if the true positives, false positives and false negatives are all 0, the precision, recall and F1-measure are 1. This might occur in cases in which the gold standard contains a document without any annotations and the annotator (correctly) returns no annotations. If true positives are 0 and one of the two other counters is larger than 0, the precision, recall and F1-measure are 0.
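The quoted convention fits in a few lines of code. This is a minimal Python sketch. The names `Scores` and `scores_with_convention` are hypothetical and are not part of RapidMiner. When TP > 0, both denominators are positive, so the usual formulas apply unchanged.

```python
from typing import NamedTuple


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def scores_with_convention(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall and F1 using the edge-case convention quoted above.

    - TP = FP = FN = 0: all three are defined as 1 (nothing to find, nothing found).
    - TP = 0 but FP or FN > 0: all three are defined as 0.
    - Otherwise TP > 0, so both denominators are positive.
    """
    if tp == 0:
        value = 1.0 if fp == 0 and fn == 0 else 0.0
        return Scores(value, value, value)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return Scores(precision, recall, f1)
```

For example, `scores_with_convention(0, 0, 0)` yields 1 for every score. `scores_with_convention(0, 3, 0)` yields 0 for every score.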
Furthermore, it appears that RapidMiner calculates F-measure as the harmonic mean of precision and recall, so the F-measure is undefined whenever precision or recall is undefined. However, when only the true positives and false negatives are 0 and the other counts are not, precision is 0 and recall is 0/0, yet the formula for F-measure in terms of type I and type II errors (false positives and false negatives) is still defined. An example of such a scenario is in the attachment. Similarly, when only the true positives and false positives are 0 and the other counts are not, F-measure is defined, but RapidMiner returns an error.
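The sketch below contrasts two ways of computing F1. The first is the harmonic mean of precision and recall. The second works directly from counts: F1 = 2TP / (2TP + FP + FN), where FP counts type I errors and FN counts type II errors. NaN stands in for an undefined value. The function names are hypothetical, and this is an illustration, not RapidMiner's actual implementation.

```python
import math


def f1_harmonic(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall; NaN if any step is 0/0."""
    precision = tp / (tp + fp) if tp + fp > 0 else math.nan
    recall = tp / (tp + fn) if tp + fn > 0 else math.nan
    if math.isnan(precision) or math.isnan(recall):
        return math.nan
    if precision + recall == 0:
        # Both are 0, so the harmonic-mean expression itself is 0/0.
        return math.nan
    return 2 * precision * recall / (precision + recall)


def f1_counts(tp: int, fp: int, fn: int) -> float:
    """F1 written directly in counts; undefined only when TP = FP = FN = 0."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator > 0 else math.nan
```

With TP = 0, FN = 0 and FP = 5, `f1_harmonic` returns NaN because recall is 0/0. Under the same counts, `f1_counts` returns 0.0. The same holds with TP = 0, FP = 0 and FN = 5.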
Proposal: Make F-measure handle edge scenarios. Naturally, whenever such an edge scenario is encountered, the performance operator should log a warning.
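One possible shape for the proposal follows, as a minimal sketch. The function `f_measure_with_fallback`, its `all_zero_value` parameter and the logger name are all hypothetical. The function computes F1 from counts, so it only needs a fallback when TP = FP = FN = 0. It logs a warning whenever precision or recall would have been undefined.

```python
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("f_measure_sketch")


def f_measure_with_fallback(tp: int, fp: int, fn: int,
                            all_zero_value: float = 1.0) -> float:
    """Hypothetical F1 that never returns an undefined value.

    all_zero_value is returned when TP = FP = FN = 0, i.e. every sample
    is negative and every prediction is negative.
    """
    if tp == 0 and fp == 0 and fn == 0:
        logger.warning("F-measure undefined (TP = FP = FN = 0); using %s",
                       all_zero_value)
        return all_zero_value
    if tp + fp == 0 or tp + fn == 0:
        # Precision or recall is 0/0 here, but 2TP / (2TP + FP + FN) is defined (it is 0).
        logger.warning("Precision or recall undefined; F-measure computed from counts")
    return 2 * tp / (2 * tp + fp + fn)
```

The default of 1.0 follows the convention quoted earlier. A caller who prefers to treat the all-negative case as a failure can pass `all_zero_value=0.0`.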
Discussion: It is definitely best to avoid these degenerate scenarios. However, whenever I run some wild optimization, I may land in one of the edge scenarios (note that this happens even when both classes are present; it is enough for the classifier to predict all samples as negative). Optimization operators (like "Optimize Selection") do not know how to handle unknown measure values, and undefined values also cause problems in operators that aggregate measures (like "Cross Validation").