Interpretation of ROC Analysis
Muhammed_Fatih_
Member Posts: 93 Maven
Hello Community,
I have derived the following ROC curves by considering four classification models:
As you can see, the SVM and k-NN curves each have a shaded band around them.
Would it be correct to conclude from the graph that only k-NN and SVM were able to learn from the given dataset, and that the remaining two (DT and NB) were not?
What exactly does the shaded area mean? I would interpret it as the deviation across the learning runs, with the plotted curve being the mean that lies inside the shaded band.
I thank you in advance for your help!
Best regards,
Fatih
Best Answer
varunm1 Member Posts: 1,207 Unicorn
Hello @Muhammed_Fatih_
"Do you think that the marked ROC course is common if the ROC curve goes hand in hand with the optimum?"
Yep, you got an optimal curve.
Is it common? Not very common in my work, but I have gotten some ideal results in my studies. Most of the time it relates to a strong hypothesis and to what we are looking for in the data.
Can you get an optimal ROC?
Yes, if the data is very well suited for the model to train and predict on.
I understand why you are skeptical, and it is good to be cautious when results look this good. You should analyze them closely. There are many reasons why a model can give very good results, so carefully check your data, your model building, and your hypothesis to make sure there is no conceptual error in how the model was built.
You should consider some common pitfalls in the analysis, such as ignoring temporal relations in the data, or a predictor that is a replica of the target variable. There are many others that you can look up online.
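As a rough illustration of the "replica of the target" pitfall, here is a minimal Python sketch (not RapidMiner code) that scores each predictor on its own and flags any that separates the classes almost perfectly. The function names, the column names, and the 0.99 threshold are all assumptions made up for this example; a flagged column is only a candidate for closer inspection, not proof of leakage.

```python
def single_feature_auc(values, labels):
    """AUC obtained when one raw feature is used directly as the score.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example has the larger value; ties count as half a win.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def suspicious_features(columns, labels, threshold=0.99):
    """Return (name, auc) for features that nearly separate the classes alone.

    An AUC near 0 is as suspicious as one near 1 (the feature is just
    inverted), so the check uses max(auc, 1 - auc).
    """
    flagged = []
    for name, values in columns.items():
        auc = single_feature_auc(values, labels)
        if max(auc, 1.0 - auc) >= threshold:
            flagged.append((name, auc))
    return flagged


if __name__ == "__main__":
    # Hypothetical toy data: "label_copy" duplicates the target on purpose.
    labels = [0, 0, 1, 1, 0, 1]
    columns = {
        "age": [23, 45, 31, 52, 38, 29],
        "label_copy": [0, 0, 1, 1, 0, 1],
    }
    print(suspicious_features(columns, labels))
```

The pairwise loop is quadratic in the number of rows, which is fine for a quick check on a sample but would need a rank-based formula on large data.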
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Answers
You can watch this video; I hope it helps you.
https://academy.rapidminer.com/learn/video/finding-the-right-model
All the best
mbs
Are you sure the decision tree and NB are not learning? Based on the ROC curves, their AUC values are 1 or close to 1. If I am reading the plot correctly, DT and NB are discriminating between the classes with very high accuracy compared to SVM and k-NN.
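To make the link between the curve and the AUC concrete, here is a minimal Python sketch (not RapidMiner's implementation) that builds ROC points from confidence scores and integrates them with the trapezoidal rule. The function names and the toy scores are assumptions for illustration only. A curve that hugs the top-left corner gives an area of about 1, while a diagonal curve gives about 0.5.

```python
def roc_points(labels, scores):
    """Return the ROC curve as a list of (false positive rate, true positive rate).

    The threshold is swept from the highest score to the lowest; examples
    with tied scores are added together so they produce a single point.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 0, 0]
    separable = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # every positive outranks every negative
    print(area_under_curve(roc_points(labels, separable)))
```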
Varun
thank you for the link!
Hello @varunm1,
I am not sure whether they learn or not. But when I see such high values compared to SVM and k-NN, it looks to me like an indicator of overfitting. How do you see that? Would you also interpret DT and NB as appropriate solutions here? If yes, why?
I can only comment on that based on the data and the type of analysis you are doing. If it is a split validation, there is a chance that you got high performance like this by chance from one particular split. There are also other factors, such as temporal characteristics in the data, and many other checks that you need to do when you get very good results like these.
Varun
Thank you for your answer! I used cross-validation because studies have shown that it gives more reliable performance estimates than split validation.
Cross-validation is a good validation method, but if your data has temporal (time-dependent) characteristics or confounding relationships, it can sometimes overestimate performance. If you think there are none, then the models might simply be doing well. Different models work well for different types of data.
You can also split your original data 70:30 or 80:20, depending on the size of your data, then cross-validate on the larger portion and test on the smaller portion to see how the model is doing.
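The hold-out plus cross-validation idea can be sketched in plain Python as index bookkeeping (this is an illustration, not the RapidMiner operators). The function names, the seed, and the choice of 5 folds are assumptions for this example. Note that the random shuffle is only appropriate when the rows have no time order; for temporal data the split would have to respect the order instead.

```python
import random


def holdout_split(n_rows, test_fraction=0.3, seed=0):
    """Shuffle row indices and split them into (train, test) lists.

    Assumes the rows are exchangeable (no temporal order to preserve).
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Every index lands in exactly one validation fold. No stratification
    by class is done in this sketch.
    """
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    # 70:30 split of 100 hypothetical rows; the 30 test rows stay untouched
    # until model selection on the 70 training rows is finished.
    train_idx, test_idx = holdout_split(100, test_fraction=0.3)
    for fold_train, fold_val in k_fold(train_idx, k=5):
        print(len(fold_train), len(fold_val))
    print(len(test_idx))
```

If the cross-validated performance on the larger portion is near perfect but the score on the untouched portion drops noticeably, that gap is a hint that the cross-validation estimate was too optimistic.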
Varun
Hello
Here is some more information: a good article from @sgenzer
https://community.rapidminer.com/discussion/55112/cross-validation-and-its-outputs-in-rm-studio
Good luck
Hello @mbs,
thank you for your answers!
To come back to and refine the initial question: Do you think that the marked ROC course is common if the ROC curve goes hand in hand with the optimum? Is this possible in general?