Multiclass evaluation problem
Hi!
I'm using cross-validation for a multiclass classification problem, together with a performance evaluation operator for multiclass problems. I'd like to obtain the AUC, but it is only available for binary classes. How can I do this? Can anybody help me?
Thank you,
Silvana.
Answers
I assume you are referring to a nominal classification problem,
as opposed to a bi-nominal classification problem.
The term multi-class is sometimes used for problems where an instance can belong to multiple classes, e.g. an instance belonging to both the sports class and the football class.
AUC: area under the ROC curve.
ROC: Receiver Operating Characteristic, the true positive rate plotted against the false positive rate.
The true positive rate and the true negative rate (and hence the false positive rate) are only defined for bi-nominal classification problems.
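To make these definitions concrete, here is a minimal Python sketch (not RapidMiner code) that builds the ROC points of one bi-nominal problem by lowering a threshold over the confidences, then integrates them with the trapezoidal rule. The function names are hypothetical; examples with tied confidences are admitted together, so a tie produces one diagonal step.

```python
from typing import List, Sequence, Tuple

def roc_points(scores: Sequence[float], is_positive: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points from (0, 0) to (1, 1)."""
    n_pos = sum(1 for p in is_positive if p)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs at least one positive and one negative example")
    # Highest confidence first: lowering the threshold admits examples in this order.
    ranked = sorted(zip(scores, is_positive), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Admit every example tied at this threshold before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.4, 0.3], [True, False, True, False])
    print(pts)                 # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
    print(trapezoid_auc(pts))  # 0.75
```

Every rate here needs a single "positive" class, which is why the multiclass case has to be reduced to binary problems first.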
You could transform your problem into bi-nominal classification problems.
Let's say you have three classes A, B and C.
You can then make three different datasets with the classes A and not A, B and not B, C and not C,
and then obtain the AUC score on each dataset.
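A minimal Python sketch of this one-against-all idea, assuming each example carries a dictionary of per-class confidences (similar in spirit to RapidMiner's confidence columns; the dictionary layout and function names are illustrative assumptions). The binary AUC is computed as the probability that a random positive receives a higher confidence than a random negative, with ties counting one half, which equals the area under the ROC curve.

```python
from typing import Dict, Sequence

def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def one_vs_rest_auc(labels: Sequence[str],
                    confidences: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """For each class c: relabel as 'c' vs 'not c' and rank by the confidence for c."""
    return {
        c: binary_auc([conf[c] for conf in confidences], [y == c for y in labels])
        for c in sorted(set(labels))
    }
```

The pairwise loop is quadratic in the number of examples; a rank-based (Mann-Whitney) formulation gives the same value in O(n log n) for larger datasets.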
I doubt this is a useful measure though.
Have you considered Cohen's kappa coefficient as a measure of performance?
Or maybe the area under a learning curve?
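Cohen's kappa works directly on multiclass (polynominal) predictions, because it only needs the observed agreement p_o and the agreement p_e expected by chance from the class frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal Python sketch (the function name is illustrative):

```python
from collections import Counter
from typing import Sequence

def cohens_kappa(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for any number of classes."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty label sequences of equal length")
    # Observed agreement: fraction of examples where the prediction equals the truth.
    p_o = sum(1 for a, p in zip(actual, predicted) if a == p) / n
    # Chance agreement: sum over classes of P(actual = c) * P(predicted = c).
    actual_counts, predicted_counts = Counter(actual), Counter(predicted)
    p_e = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)
```

For example, with actual labels A, A, B, B, C, C and predictions A, A, B, A, C, B, p_o = 4/6 and p_e = 12/36, so kappa = 0.5.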
This is one way to implement a one-against-all strategy in order to obtain AUC results for a multinomial problem. Nevertheless, this is painful if we are dealing with databases.
Please let me know if this could be solved with a few operators.
Otherwise, you can count on me to collaborate on improving an operator.
Thanks in advance for your help,
I don't think there is an easy way to get AUC for a multiclass problem in RapidMiner.
Strange
I'm able to modify the AUC performance evaluation operator to add a check box like "use one against all strategy", and implement this strategy in order to compute the AUC for multinomial problems.
I'll let you know when done.
Best regards,
There's no need to change any operator, or even migrate to Weka, in order to obtain AUC for polynomial classification problems. As Sebastian Land told me, the RapidMiner team aims to cover most data mining use cases. Just use the Polynomial by Binomial Classification operator, which performs a 1-against-all strategy as well as other strategies.
This topic has an example process showing how to use this operator:
http://rapid-i.com/rapidforum/index.php?topic=2505.0
Hi all,
It's been a long time since I last posted here. In my last comment I said that the Polynomial by Binomial Classification operator would solve this problem, but in fact it didn't.
Even now, with RM 7, it's not possible to obtain a simple AUC or F-measure for a multiclass problem such as the Iris dataset, one of the most fundamental machine learning problems.
As you know, there are mathematical formulations for calculating these and other evaluation measures for multiclass problems (RM distinguishes between binomial and polynomial problems). In RM there is a clear distinction between the operators that can handle binomial and those that can handle polynomial classification problems, but this distinction should no longer exist.
Hi,
I am curious: how do you calculate AUC for a polynominal problem? And why don't you use logloss?
Best,
Martin
Dortmund, Germany
In my experience, AUC for multiclass classification problems is typically computed by looking at the separation of one class vs. all the others. It would certainly be nice if RapidMiner did this automatically for polynominal labels, but I think it can be done manually by remapping the labels after the fact.
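The "remap after the fact" idea can be sketched in Python: both the true and the predicted labels are mapped to c vs. "not c", after which any binominal measure can be computed from the resulting confusion counts. The function names and the "not c" naming are assumptions for illustration, not RapidMiner's Map operator.

```python
from typing import List, Sequence, Tuple

def remap_one_vs_rest(labels: Sequence[str], positive: str) -> List[str]:
    """Map every label other than `positive` to the single negative class 'not <positive>'."""
    negative = f"not {positive}"
    return [positive if y == positive else negative for y in labels]

def binary_confusion(actual: Sequence[str], predicted: Sequence[str],
                     positive: str) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for `positive` after remapping both label columns."""
    a = remap_one_vs_rest(actual, positive)
    p = remap_one_vs_rest(predicted, positive)
    tp = sum(1 for x, y in zip(a, p) if x == positive and y == positive)
    fp = sum(1 for x, y in zip(a, p) if x != positive and y == positive)
    fn = sum(1 for x, y in zip(a, p) if x == positive and y != positive)
    tn = len(a) - tp - fp - fn
    return tp, fp, fn, tn
```

Running this once per class yields one binominal confusion matrix per class, from which per-class precision, recall or F-measure follow.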
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hi Martin,
Thanks for your reply.
I'm a RapidMiner fan, and I'm here to help somehow. Honestly, I thought this problem had been solved in the newest versions, but it's not fair to say an operator does not work without thorough research, so my apologies. Nevertheless, so far I have not found a way to calculate AUC and F-measure for a multiclass problem, for example the Iris dataset with 3 classes. What I noticed in other tools is that for F-measure they compute an average, a weighted average, or another kind of statistical mean. For AUC, they basically follow the formulations in the literature:
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognition Letters 27, 861-874.
Hand, D.J., Till, R.J., 2001. A simple generalization of the area under the ROC curve to multiple class classification problems. Machine Learning 45 (2), 171-186.
If you could point out which operators I should use to do this in RapidMiner, that would be great, as I'm evaluating this for a company.
To overcome this problem in my personal research on multiclass problems, I built an operator that integrates the RM predictions with the Weka evaluator, so all the predictive results from RapidMiner were measured by the Weka classifier evaluator. However, just as a constructive thought: even if the solution for this already exists in RapidMiner, I think it's time to rethink this rather than relying on the concept of binomial and polynomial, as other DM tools are progressing well without it.
Hi again,
I have to disagree. I do think that the difference between polynominal and binominal is pretty clear. There are simply models, like an SVM, which cannot cope with polynominal data. That is a fact. LibSVM simply uses an internal wrapper (one-vs-one, in LIBSVM's case) to make a binominal algorithm runnable on a polynominal data set. That is fully supported by the Polynominal by Binominal Classification operator.
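A minimal Python sketch of the generic one-vs-all wrapping idea (not RapidMiner's or LIBSVM's actual code): one binary scorer is trained per class on c vs. "not c", and the class whose scorer returns the highest score wins. `CentroidScorer` is a hypothetical toy learner standing in for a real binary learner such as an SVM; any object with the same `fit`/`score` methods would do.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]

def _mean(vectors: List[Vector]) -> List[float]:
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]

def _dist(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class CentroidScorer:
    """Hypothetical toy binary learner: more positive when closer to the positive centroid."""
    def fit(self, xs: List[Vector], ys: List[bool]) -> "CentroidScorer":
        self.pos = _mean([x for x, y in zip(xs, ys) if y])
        self.neg = _mean([x for x, y in zip(xs, ys) if not y])
        return self

    def score(self, x: Vector) -> float:
        # Distance to the negative centroid minus distance to the positive centroid.
        return _dist(x, self.neg) - _dist(x, self.pos)

class OneVsRest:
    """Train one binary scorer per class (c vs. not c); predict the top-scoring class."""
    def __init__(self, make_binary: Callable[[], CentroidScorer]):
        self.make_binary = make_binary
        self.models: Dict[str, CentroidScorer] = {}

    def fit(self, xs: Sequence[Vector], labels: Sequence[str]) -> "OneVsRest":
        for c in sorted(set(labels)):
            self.models[c] = self.make_binary().fit(list(xs), [y == c for y in labels])
        return self

    def predict(self, x: Vector) -> str:
        scores = {c: m.score(x) for c, m in self.models.items()}
        return max(scores, key=scores.get)
```

One-vs-one (as LIBSVM does internally) instead trains c(c-1)/2 classifiers on pairs of classes and combines them by voting; the wrapper structure is otherwise similar.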
The point you raise is that you would also like to use a binominal performance measure for a polynominal problem. While I see that this is a possible approach, I would argue that logloss is a better measure. The missing operator in RM would be a Polynominal by Binominal Performance operator, a nested operator similar to the learner. It should not be hard to write.
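Log loss is defined directly for polynominal labels: it is the average of -ln p(true class) over all examples, so no binary reduction is needed. A minimal Python sketch, assuming per-class confidences that sum to one for each example; the clipping constant `eps` is an arbitrary choice to avoid ln(0).

```python
import math
from typing import Dict, Sequence

def multiclass_log_loss(labels: Sequence[str],
                        confidences: Sequence[Dict[str, float]],
                        eps: float = 1e-15) -> float:
    """Mean of -ln(confidence assigned to the true class), clipped to [eps, 1 - eps]."""
    if not labels or len(labels) != len(confidences):
        raise ValueError("need non-empty, equally long labels and confidences")
    total = 0.0
    for y, conf in zip(labels, confidences):
        # A missing entry is treated as confidence 0 and then clipped to eps.
        p = min(max(conf.get(y, 0.0), eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(labels)
```

A model that always outputs 0.5/0.5 on a two-class problem scores ln 2 (about 0.693); confident correct predictions push the value toward 0, while confident wrong ones are penalized heavily.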
~Martin
Hi,
I made a process that calculates AUC and F-measure for the 3 classes of the Iris dataset (thanks to Brian T. for the tip with Map), using Polynomial by Binomial Classification and SVM; find it below. You can also get the ROC plots per class. The Polynomial by Binomial Classification operator is working fine.
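The per-class F-measures and their averages (a plain mean over classes, or a mean weighted by class support) can be sketched in Python as follows. The function names are illustrative, and setting F1 to 0 for a class with no predicted or no actual examples is an assumption.

```python
from typing import Dict, Sequence

def per_class_f1(actual: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """F1 of each class, treating that class as positive and all others as negative."""
    scores = {}
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
        fp = sum(1 for a, p in zip(actual, predicted) if a != c and p == c)
        fn = sum(1 for a, p in zip(actual, predicted) if a == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores

def macro_f1(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Unweighted mean of the per-class F1 values."""
    f1 = per_class_f1(actual, predicted)
    return sum(f1.values()) / len(f1)

def weighted_f1(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Per-class F1 weighted by each class's number of true examples (support)."""
    f1 = per_class_f1(actual, predicted)
    support = {c: sum(1 for a in actual if a == c) for c in f1}
    return sum(f1[c] * support[c] for c in f1) / len(actual)
```

With balanced classes such as Iris the two averages coincide; they diverge when some classes are much more frequent than others.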
The drawback of this process is that, to calculate the performances with Performance Binomial, I need to remap each label against all the others, and to calculate the per-class performance averages I had to transform the performance into data. But it delivers what is wanted.
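The averaging step over per-class AUCs can also be sketched in Python. Following the class-reference formulation described by Fawcett (2006), the multiclass AUC is the sum of the one-vs-rest AUCs weighted by class prevalence; the unweighted mean is returned for comparison. The input format and names are assumptions, and the binary AUC helper repeats the pairwise-comparison definition so the block stands alone.

```python
from typing import Dict, Sequence, Tuple

def _binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def averaged_auc(labels: Sequence[str],
                 confidences: Sequence[Dict[str, float]]) -> Tuple[float, float]:
    """Return (prevalence-weighted one-vs-rest AUC, unweighted mean AUC)."""
    classes = sorted(set(labels))
    per_class = {
        c: _binary_auc([conf[c] for conf in confidences], [y == c for y in labels])
        for c in classes
    }
    n = len(labels)
    # Weight each class's AUC by its share of the examples, p(c).
    weighted = sum(per_class[c] * sum(1 for y in labels if y == c) / n for c in classes)
    macro = sum(per_class.values()) / len(classes)
    return weighted, macro
```

The weighted version lets frequent classes dominate the result, so it is worth reporting both when the classes are imbalanced.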
To help improve this scenario, I'm adapting my Weka Classification Measure operator to output a performance vector (right now it returns data); then I'll offer that code for integration into the Weka extension.
Anyway, I'll package my Weka Classification Measure operator as is in the next few days and post it here.
Regards,