Improve Recall for certain classes / Rescale Confidences in multiclass classification
Dear all,
I'm working on a classification model that predicts multiple classes. In this context I have two questions.
1) Is there a way to force the algorithm/model to focus only on some classes and to improve the recall for those classes? I have learned that the "Threshold" operators are only usable for binomial classification problems. Another way would be to use the "MetaCost" operator, but given the high number of classes (around 30), I would prefer a more automated approach. Do you have any ideas how I can handle this issue?
2) I built a classification model with the "Fast Large Margin" algorithm nested inside the "Polynomial by Binomial Classification" operator. The prediction results list the confidences for the different classes. As far as I know, these confidences are based on the distance to the separating hyperplane. Is there a way to calculate actual probabilities for each class?
I know that for binomial classification problems the "Rescale Confidences" operator can be used, and that LibSVM, as another SVM method, provides the option to estimate probabilities. However, since the Fast Large Margin algorithm gives better results, I would like to calculate these probabilities explicitly. Do you know how to build such a process in RapidMiner?
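For context, the usual way to turn signed hyperplane distances into probabilities is Platt scaling: fit a one-dimensional logistic regression on the raw scores. The sketch below uses scikit-learn outside RapidMiner purely to illustrate the math; the dataset and the `LinearSVC` stand-in for Fast Large Margin are assumptions, not the RapidMiner process itself.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Stand-in for Fast Large Margin: a linear SVM whose decision_function
# returns signed distances to the separating hyperplane.
X, y = make_classification(n_samples=500, random_state=0)
svm = LinearSVC(max_iter=10000, random_state=0).fit(X, y)
scores = svm.decision_function(X).reshape(-1, 1)

# Platt scaling: a 1-D logistic regression that maps raw distances
# to calibrated probabilities P(y = 1 | score).
platt = LogisticRegression().fit(scores, y)
probs = platt.predict_proba(scores)[:, 1]

print(round(float(probs.min()), 3), round(float(probs.max()), 3))
```

In practice the calibration model should be fit on held-out data, not on the same examples the SVM was trained on; for a multiclass setup you would calibrate each binary sub-model separately.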
Thanks in advance
Michel
Answers
1) From what I understand, you just want to consider a few predictors, in which case the "Select Attributes" operator seems to be a good bet. If you want to skim off the more important ones, I would say PCA could do that for you.
2) To find probabilities, you basically need to tag your target attribute as the label in the Read CSV operator, or you can use the "Set Role" operator, and then run any model on the data. The output in the form of confidences is basically the probability.
Hope it helps!
Hi @ME1
Another option besides those already mentioned might be the "Polynomial by Binomial Classification" operator, which enables you to use a binomial classifier to handle a polynomial label.
Vladimir
http://whatthefraud.wtf
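For readers unfamiliar with that operator: it performs a one-vs-all decomposition, training one binary model per class. The same idea can be sketched outside RapidMiner with scikit-learn (the iris dataset and `LinearSVC` are just illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)  # 3-class label

# One binary (binomial) LinearSVC per class; the class whose binary
# model returns the highest decision score wins.
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)
pred = ovr.predict(X)

print(len(ovr.estimators_))  # → 3, one binary model per class
```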
Dear @cbarua2, dear @kypexin,
Thanks for your answers. I probably did not explain my questions clearly. I already selected the attributes using weighting and wrapper methods and built a model with an overall accuracy of 70%. As I mentioned, we used the "Polynomial by Binomial Classification" operator, among others.
1) However, in the performance results of the model, some predicted classes have a high recall and some a very low one. In our use case we only want to predict the classes with a high recall value, i.e. the cases with a high confidence. My question is whether there is a way to force the algorithm to focus only on the classes we would like to predict, those with a high recall value (e.g. 80 %), and to improve the recall for those classes. In the end the overall accuracy would probably go down, but we would be able to predict the cases we care about with higher confidence. As I mentioned, there are ways to do this, such as the Threshold operators and the MetaCost operator, but they are not applicable in our use case.
2) As I described, we already built a classification model with the "Fast Large Margin" algorithm nested inside the "Polynomial by Binomial Classification" operator. There you get the confidence values. For the Fast Large Margin algorithm these values are distances to the separating hyperplane, not real probabilities.
However, to deploy the Fast Large Margin model we need the probabilities. For binomial classification problems the "Rescale Confidences" operator can be used, or LibSVM, another SVM method, which provides the option to estimate probabilities. However, since the Fast Large Margin algorithm gives better results, I would like to calculate these probabilities explicitly. Do you know how to build such a process in RapidMiner?
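The selective behaviour described in point 1 can also be implemented as post-processing on the exported confidences: predict only when the winning class is one of the trusted classes and its confidence clears a cutoff, and abstain otherwise. A sketch in Python, where the confidence matrix, the class set {1, 3}, and the 0.5 cutoff are all made-up placeholders:

```python
import numpy as np

# Hypothetical per-class confidence matrix as exported from the
# prediction output (rows = examples, columns = classes 0..4).
rng = np.random.default_rng(0)
conf = rng.dirichlet(np.ones(5), size=8)

focus_classes = {1, 3}  # classes with acceptably high recall
threshold = 0.5         # minimum confidence to commit to a prediction

predictions = []
for row in conf:
    k = int(row.argmax())
    # Commit only when the winning class is a focus class and its
    # confidence clears the threshold; otherwise abstain (None).
    predictions.append(k if (k in focus_classes and row[k] >= threshold) else None)

print(predictions)
```

The abstained cases would simply be routed to a fallback (e.g. manual review), which matches the stated goal of trading overall coverage for precision on the focus classes.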
I hope I get the points right. Thanks for your help.
Michel
Dear Michel,
quick thoughts from my end:
1. It's "problematic" to call a confidence a probability. I usually would not directly identify one with the other. A confidence just tells you how sure the algorithm is.
2. Algorithms tend to classify majority classes better. You can increase the number of cases of your priority class to get a better classification. This can be done with weights or sampling. With weights you can even scale up individual highly important examples.
3. You might want to consider building your own performance measure reflecting the severity of single examples (hint: Extract Performance is the operator you need).
4. Think about adjusting your thresholds by hand. It's hard in a multiclass scenario, but you can still apply some logic here.
5. Maybe try a "real" SVM and not Fast Large Margin.
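The weighting idea in point 2 can be sketched with scikit-learn (again only as an illustration outside RapidMiner; the toy dataset, the choice of class 2 as priority class, and the weight of 5 are arbitrary assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# Imbalanced 3-class toy problem; class 2 is the small priority class.
X, y = make_classification(n_samples=1500, n_classes=3, n_informative=6,
                           weights=[0.6, 0.3, 0.1], random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X, y)
# Upweighting the priority class makes its errors cost more during
# training, trading overall accuracy for recall on that class.
weighted = LogisticRegression(max_iter=1000,
                              class_weight={0: 1, 1: 1, 2: 5}).fit(X, y)

r_plain = recall_score(y, plain.predict(X), labels=[2], average=None)[0]
r_weighted = recall_score(y, weighted.predict(X), labels=[2], average=None)[0]
print(f"recall class 2: plain={r_plain:.2f}, weighted={r_weighted:.2f}")
```

The same effect can be achieved by oversampling the priority class instead of weighting; in RapidMiner, example weights or a sampling operator before the learner would play the role of `class_weight` here.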
Cheers,
Martin
Dortmund, Germany