Text Categorization and BinominalClassificationPerformance
Hi!
I am experimenting with text categorization on the Reuters dataset. In this dataset, each text may belong to several categories or to no category at all. So I built a binary classifier for each category which predicts whether a text belongs to that category (positive cases) or not (negative cases).
The problem is that some categories contain very few positive cases, so when I measure the performance of a classifier there may be only a few or even no positive examples in the test set. Currently I use BinominalClassificationPerformance to measure performance, and for some categories I get "unknown" values as a measure, for example:
precision: unknown (positive class: barley_pos)
ConfusionMatrix:
True: barley_neg barley_pos
barley_neg: 458 1
barley_pos: 0 0
recall: 0.00% (positive class: barley_pos)
ConfusionMatrix:
True: barley_neg barley_pos
barley_neg: 458 1
barley_pos: 0 0
If I look at the confusion matrix above, I see that all the negative cases were predicted correctly as negative, so I am not sure that I can agree with the results showing poor or unknown performance. The question is, how to correctly measure performance in such cases?
Answers
Accuracy is not the best measure for very unbalanced class distributions. The measures precision and recall therefore try to capture additional information about the quality of text classifiers. In your example, the recall of the minority class was 0%, i.e. no positive example of the rare class was recognized, while the recall of the negative class was 100% and the accuracy was also nearly 100%. Often the rare classes or events are the important ones, and then the recall on these classes matters a lot.
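To make the arithmetic behind these numbers explicit, here is a minimal sketch in plain Python (an illustration only, not a RapidMiner feature) that recomputes the measures from the confusion matrix shown in the question; the positive-class precision divides by the number of predicted positives, which is zero here, which is why the operator reports it as unknown:

```python
# Counts taken from the confusion matrix in the question.
tn, fn = 458, 1   # row "barley_neg": correct negatives, missed positives
fp, tp = 0, 0     # row "barley_pos": nothing was predicted as positive

accuracy      = (tp + tn) / (tp + tn + fp + fn)           # 458/459 ~ 99.8%
recall_pos    = tp / (tp + fn)                            # 0/1 = 0%
recall_neg    = tn / (tn + fp)                            # 458/458 = 100%
precision_pos = tp / (tp + fp) if (tp + fp) else None     # 0/0 -> "unknown"

print(accuracy, recall_pos, recall_neg, precision_pos)
```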
To avoid test sets without cases of the rare class, you can use cross-validation (the XValidation operator in RapidMiner) with stratified sampling. Of course you need to make sure that you provide at least as many examples of the rare class as the number of folds (iterations) in the cross-validation.
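For readers outside RapidMiner, a minimal sketch of the same idea using scikit-learn's StratifiedKFold; the names X and y are assumptions, standing for the document vectors and the barley_pos/barley_neg labels:

```python
# Stratified cross-validation: each test fold keeps roughly the same class
# proportions as y, so every fold contains at least one positive example
# as long as n_splits does not exceed the number of positive examples.
from sklearn.model_selection import StratifiedKFold

n_splits = 5   # must not exceed the number of positive examples
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # ... train on the training fold, evaluate on the test fold
```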
To put more emphasis on rare classes, you can use MetaCost to assign higher costs (importance) to minority classes. MetaCost is a meta-learner and can be wrapped around any classification learner. Alternatively, you can use cost-sensitive learners: JMySVM, for example, supports different weights for the positive and the negative class in binary classification settings. Or you could use learning techniques that can handle weights, such as DecisionTree and NaiveBayes, and assign individual weights to all examples.
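Again only as an illustration outside RapidMiner, a scikit-learn sketch of the class-weight idea; the weight of 20 for the rare positive class is an arbitrary assumption, and X_train/y_train/X_test are assumed data (e.g. one fold from the cross-validation sketch above):

```python
# Give the rare positive class a higher weight so misclassifying it costs
# more during training (JMySVM's per-class weights and MetaCost play the
# analogous role in RapidMiner).
from sklearn.svm import SVC

clf = SVC(kernel="linear",
          class_weight={"barley_pos": 20, "barley_neg": 1})
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
```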
I hope this helps.
Best regards,
Ralf
Thank you for the explanations and suggestions.
Even though recall and precision can be calculated separately for the positive and the negative class from the confusion matrix, the final report for BinominalClassificationPerformance (produced by ResultWriter) still shows only the recall and precision of the positive class. Is there a way to report these measures automatically for both classes (the positive and the negative)?
Another question is about aggregating results. Are there any operators that take, for example, an average over multiple PerformanceVectors, so that I could find the average recall and precision of all the classifiers I am using?
The first problem can be solved with a small trick: you can switch the internal binominal mapping, and hence the value that is treated as positive, using the Remap Binominals operator from RM 5. If you define each of the two values as the positive one in turn, you should be able to calculate both recalls.
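Outside RapidMiner, the same trick amounts to computing the measures twice with the positive label swapped; a minimal scikit-learn sketch, assuming y_true and y_pred hold the actual and predicted labels:

```python
# Compute precision and recall twice, once with each class treated as positive.
from sklearn.metrics import precision_score, recall_score

for positive in ("barley_pos", "barley_neg"):
    p = precision_score(y_true, y_pred, pos_label=positive, zero_division=0)
    r = recall_score(y_true, y_pred, pos_label=positive, zero_division=0)
    print(f"{positive}: precision={p:.2f}, recall={r:.2f}")
```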
Yes, this is possible using the Average operator of RM 5.
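For illustration, averaging a set of performance results by hand looks roughly like this; the numbers below are placeholder values, not real results:

```python
# Average measures across several classifiers (the Average operator does this
# for PerformanceVectors inside RapidMiner).
performances = [
    {"recall": 0.00, "precision": None},   # barley (precision was "unknown")
    {"recall": 0.75, "precision": 0.60},   # wheat
    {"recall": 0.80, "precision": 0.55},   # corn
]

def mean(values):
    known = [v for v in values if v is not None]   # skip "unknown" entries
    return sum(known) / len(known) if known else None

average = {key: mean([p[key] for p in performances])
           for key in ("recall", "precision")}
print(average)
```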
By the way, both operators are available in RM 4.x but are called differently.
Greetings,
 Sebastian