Correlation in classification model - how to sort classes
Hello all,
I have a classification problem to solve. There are 10 classes (1, 2, 3, 4, ..., 10) to be predicted, and I want to optimize my model parameters for the highest correlation, because in real life class 1 should have characteristics relatively similar to class 2 and very little similarity to class 10.
If I understand correctly, the Performance (Classification) operator calculates correlation as follows:
Cov(L,P) / sqrt(V(L)*V(P))
where: P=prediction, L=label, V=Variance, Cov=Covariance.
However, when I treat the label classes 1, 2, 3, etc. as polynominal values, RapidMiner assigns them seemingly arbitrary integer indices that I cannot control, and the correlation is then calculated from those indices. As a result, the correlation does not reflect the real class order.
Is there any way to force RapidMiner to treat polynominal label 1 as 1 (index), label 2 as 2 (index) etc.?
Thanks in advance!
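The index problem described above can be illustrated outside RapidMiner. The following is a minimal Python sketch, not RapidMiner code: the `pearson` helper, the sample labels and predictions, and the `arbitrary` mapping are all made up for illustration, and the real internal indices RapidMiner assigns may differ. It applies the formula Cov(L,P) / sqrt(V(L)*V(P)) twice to the same predictions, once with class "k" mapped to the number k and once with a scrambled mapping.

```python
from math import sqrt

def pearson(xs, ys):
    """Cov(L, P) / sqrt(V(L) * V(P)), using population moments."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / sqrt(vx * vy)

# Hypothetical nominal labels and predictions (most errors are off by one class).
labels = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
predictions = ["2", "2", "3", "5", "5", "6", "8", "8", "9", "9"]

# Natural mapping: nominal value "k" becomes the number k.
natural = {str(k): k for k in range(1, 11)}

# Assumed scrambled mapping, standing in for indices the user cannot control.
arbitrary = {"1": 7, "2": 2, "3": 9, "4": 0, "5": 5,
             "6": 8, "7": 1, "8": 4, "9": 6, "10": 3}

r_natural = pearson([natural[v] for v in labels],
                    [natural[v] for v in predictions])
r_arbitrary = pearson([arbitrary[v] for v in labels],
                      [arbitrary[v] for v in predictions])

print(f"correlation with natural order:    {r_natural:.3f}")
print(f"correlation with scrambled indices: {r_arbitrary:.3f}")
```

With the natural mapping the near-miss predictions score a high correlation, while the scrambled mapping gives a much lower value for exactly the same predictions, which is why the index assignment matters.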
Best Answer
land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
Hi,
It sounds to me as if a cost-based approach with a non-uniform cost matrix would be easier and safer, since it works the way RapidMiner was designed to work. Alternatively, you can replace the nominal values with numbers AFTER prediction and calculate the standard Performance (Regression) correlation.
Greetings,
Sebastian
Answers
I think the Map or Remap operator is what you will need.
Hello Sebastian,
Thank you for suggesting that I convert the nominal label and prediction to numerical values and then proceed with the Performance (Regression) operator. It seems like an immediate solution to the problem. However, could you elaborate on the cost-based approach?
Thanks!
Actually, I found the answer about the cost-based approach myself. Instead of using the Performance (Classification) operator, one can use the Performance (Costs) operator and set up the weights accordingly.
Thank you anyway!
Adam
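For readers who want to see what a non-uniform cost matrix does for ordered classes, here is a minimal Python sketch of the idea. It is not RapidMiner code and does not reproduce the Performance (Costs) operator. The helper names, the sample data, and the choice of cost |i - j| are assumptions made for illustration. Because this matrix is symmetric, it does not matter whether rows stand for true or predicted classes.

```python
def distance_cost_matrix(n_classes):
    """cost[i][j] = |i - j|: the cost grows with the distance between classes."""
    return [[abs(i - j) for j in range(n_classes)] for i in range(n_classes)]

def average_cost(labels, predictions, cost):
    """Mean misclassification cost per example; classes are 1-based integers."""
    total = sum(cost[l - 1][p - 1] for l, p in zip(labels, predictions))
    return total / len(labels)

def accuracy(labels, predictions):
    """Fraction of examples predicted exactly right."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)

cost = distance_cost_matrix(10)

# Hypothetical data: one model is often off by one class,
# the other is usually right but confuses class 1 with class 10.
labels      = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
near_misses = [2, 2, 3, 5, 5, 6, 8, 8, 9, 9]
far_misses  = [10, 2, 3, 4, 5, 6, 7, 8, 9, 1]

for name, preds in [("near misses", near_misses), ("far misses", far_misses)]:
    print(name, "accuracy:", accuracy(labels, preds),
          "average cost:", average_cost(labels, preds, cost))
```

In this made-up example the second model has the higher accuracy but the higher average cost, because its two errors land on the opposite end of the class scale. Minimizing such a cost rewards predictions that stay close to the true class, which is the behaviour the original question was after.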