Inconsistency of ROC curves
bernardo_pagnon
Member, University Professor
Hello,
I generated a ROC curve for a logistic regression on a data set by using the Performance operator and selecting AUC as the criterion. Fine.
Then I used the same data set with the Compare ROCs operator, picking logistic regression and decision tree as the models. The ROC curves appear, and the ROC curve for the logistic regression is different from the one I obtained before! How can this be?
Best,
Bernardo
Best Answer
bernardo_pagnon Member, University Professor
What can I say?
1 - Big, big thanks
2 - I was indeed training and testing on the same data, to illustrate that one should not do that (it is for a class)
3 - Great idea of putting 1-fold to be able to compare both cases.
Best,
Bernardo
Answers
Can you share the process here? You can download it using FILE --> Export Process and attach the .rmp file here. Please also attach the data. I suspect a change in some samples of the test data. Are you using the same type of validation, with a random seed, for both the Compare ROCs operator and the regular model with the performance metric? I will check and let you know if you provide the details of the process and the data.
If you can't share it here, you can send me a PM with the requested files.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Thanks for sharing your process files and data.
I used the complete data sheet in the attached Excel file; I believe it is the correct file. Now coming to the problem.
Case 1 Process: In the Case 1 process, I can see that you are training and testing on the same data. This is not correct, as you need to test on data that is independent of the training data. If you are purposefully doing this for your own requirement, then it's fine.
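RapidMiner is GUI-driven, so there is no code in the original process, but the effect is easy to reproduce elsewhere. Below is a minimal Python/scikit-learn sketch (the synthetic dataset and all parameter values are invented for illustration, not taken from the attached process) showing how scoring a model on its own training data, as in Case 1, yields an optimistic AUC:

```python
# Hypothetical illustration of the Case 1 setup: train and score on the
# same rows. The dataset here is synthetic, not the attached Excel data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Scoring the rows the model was fit on gives an optimistic estimate.
train_auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
print(f"AUC on training data (Case 1): {train_auc:.3f}")
```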
Case 2 Process: In Case 2, you are using the Compare ROCs operator. Based on the parameter settings shown below, it uses 10-fold cross-validation, which divides your dataset into 10 subsets, trains on 9 of them, and tests on the remaining one. This repeats until every subset has been tested, and the final performance is an aggregate of the performance across all subsets.
This is why you are getting different ROC curves: the test data and the processes differ between the two cases, so the results (AUC and ROC) differ as well.
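To make the mechanics concrete, here is a hedged sketch of 10-fold cross-validation in the same Python/scikit-learn setting (again with an invented dataset; the internals of Compare ROCs may differ in detail): every row is scored by a model that never saw it during training, and the out-of-fold scores are aggregated into a single ROC/AUC.

```python
# Hypothetical sketch of 10-fold cross-validation as in Case 2:
# train on 9 subsets, score the held-out subset, repeat for all 10,
# then aggregate the out-of-fold predictions into one ROC/AUC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

oof_scores = np.zeros(len(y))  # out-of-fold predicted probabilities
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for train_idx, test_idx in cv.split(X, y):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    oof_scores[test_idx] = model.predict_proba(X[test_idx])[:, 1]

# Each prediction comes from a model that never saw that row,
# so this AUC is typically lower than the training-data AUC above.
cv_auc = roc_auc_score(y, oof_scores)
print(f"Cross-validated AUC (Case 2): {cv_auc:.3f}")
```

This is why the Case 2 curve tends to sit below the Case 1 curve: it is an honest out-of-sample estimate rather than a fit to data the model has already seen.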
I modified your Case 1 process to use 10-fold cross-validation, and you can now see in the image below that the ROC curves of Case 1 and Case 2 are similar. The left side is Case 1 and the right side is Case 2. I attached the modified process; you can open it in RapidMiner using FILE --> Import Process.
Modified Case 1 process image: I added 10-fold cross-validation with a local random seed in the parameters. I also added a local random seed for the Compare ROCs operator in the Case 2 process, with ROC bias set to neutral.
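For completeness, a small sketch of why the local random seed matters (the seed value 1992 below is arbitrary, chosen only for illustration): with the same fixed seed, two independent runs produce identical fold assignments, so the two ROC curves are computed on the same test partitions and become directly comparable.

```python
# Hypothetical sketch: a fixed random seed makes the fold split reproducible,
# so two separate "processes" evaluate on exactly the same test partitions.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

splits_a = list(StratifiedKFold(n_splits=10, shuffle=True, random_state=1992).split(X, y))
splits_b = list(StratifiedKFold(n_splits=10, shuffle=True, random_state=1992).split(X, y))

# Identical seeds -> identical folds in both runs.
assert all((a[1] == b[1]).all() for a, b in zip(splits_a, splits_b))
print("Both runs use the same folds; the ROC curves are comparable.")
```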
Hope this helps. Please let us know if you need more information.
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing