Leave One Out results in AUC of 0.5
erik_van_ingen
Member Posts: 8 Learner I
My target label is binominal and the number of examples is 553. I am running supervised classification with Deep Learning and cross-validation:
- 10-fold results in AUC = 0.846 and Accuracy = 76%
- Leave 3 Out (180 folds) results in AUC = 0.5 and Accuracy = 42%
Best Answers
rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Hello @erik_van_ingen,
It may sound a bit mind-boggling, but I wouldn't trust either of these results. Why? Because you are using supervised Deep Learning, and your number of examples is not big enough to justify it.
First of all, I would check whether the classes are balanced enough to provide meaningful training, and then repeat the experiment.
Now, this raises another question: what kind of sampling are you using for your cross-validation? Try stratified sampling and check how it performs. If you are using linear sampling, for example, and your data is ordered by the label (target variable), leaving one out will probably not work well. Do this before reaching for SMOTE or your sampling technique du jour; a sketch of the difference follows below.
Hope this helps,
Rodrigo.
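A minimal sketch of the sampling issue Rodrigo describes, in Python/scikit-learn rather than RapidMiner and on synthetic data (so the data set and proportions are illustrative assumptions): when the table is ordered by the label, plain linear splitting produces single-class test folds, while stratified splitting keeps the class proportions in every fold.

```python
# Sketch: linear vs. stratified splitting on data ordered by the label.
# Synthetic stand-in for a 553-example binominal data set (assumption).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold

X, y = make_classification(n_samples=553, weights=[0.7, 0.3], random_state=0)
order = np.argsort(y)          # simulate a table sorted by the target label
X, y = X[order], y[order]

for cv in (KFold(n_splits=10), StratifiedKFold(n_splits=10)):
    # Count how many examples of each class land in every test fold.
    counts = [np.bincount(y[test], minlength=2).tolist()
              for _, test in cv.split(X, y)]
    print(type(cv).__name__, counts)

# KFold without shuffling yields several single-class test folds on the
# sorted table; AUC is undefined on those folds. StratifiedKFold keeps
# roughly the 70/30 class mix in every fold.
```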
varunm1 Member Posts: 1,207 Unicorn
Hello @erik_van_ingen
I would like to point out two things. First, since the data set is small, a higher number of folds gives you less test data per fold, which means a lot of variance in your results; such tiny test sets cannot capture the underlying distribution of the data. I recommend going with 3 or 5 folds for this data set.
Second, I am not sure whether you applied any feature selection techniques (forward, backward, etc.) to your data. You can do that and see which attributes are helpful for prediction; this might improve your performance and reduce computational complexity as well (see the sketches after this post).
Thanks for your understanding.
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
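Two minimal sketches of Varun's points, again in Python/scikit-learn rather than RapidMiner and on assumed synthetic data. First, the fold-count effect: with 553 examples, more folds means tinier test sets and a much noisier per-fold estimate.

```python
# Sketch: variance of the per-fold accuracy grows with the number of folds.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=553, random_state=0)
model = LogisticRegression(max_iter=1000)

for k in (3, 5, 10, 50):
    scores = cross_val_score(model, X, y,
                             cv=StratifiedKFold(n_splits=k),
                             scoring="accuracy")
    print(f"{k:>2} folds: mean={scores.mean():.3f} std={scores.std():.3f}")
```

Second, a forward feature selection pass in the same vein; `n_features_to_select=5` is an assumed value to tune, not a recommendation.

```python
# Sketch: forward feature selection driven by cross-validated AUC.
# (Continues the snippet above: X, y and LogisticRegression already defined.)
from sklearn.feature_selection import SequentialFeatureSelector

selector = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=5,    # assumed value; tune for your data
    direction="forward",       # "backward" works analogously
    scoring="roc_auc",
    cv=5,
)
selector.fit(X, y)
print("kept attribute indices:", selector.get_support(indices=True))
```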
Answers
Furthermore, I used the Generate Weights operator to compensate for the class imbalance. I tested this both outside the cross validation and within it (a sketch of the weighting idea follows below).
I tested other ML operators as well, such as Naive Bayes, Gradient Boosted Trees, and so forth. DL usually performed best.
Yes, I am aware that the sample size is relatively small. Which ML operator would best fit, given the sample size?
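A minimal sketch of the weighting idea outside RapidMiner (Python/scikit-learn; balanced per-example weights are used here as a rough analogy for the Generate Weights step, not as the operator itself):

```python
# Sketch: balanced per-example weights as a stand-in for Generate Weights.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_sample_weight

X, y = make_classification(n_samples=553, weights=[0.8, 0.2], random_state=0)
w = compute_sample_weight("balanced", y)   # rare class gets larger weights
model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=w)

# Inside cross validation, the weights should be recomputed on each
# training fold only; computing them once on the full table leaks a little
# information about the test folds.
```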
I still don't quite get why AUC is close to chance whenever leave-one-out cross-validation is used.
I can see why the accuracy measure has a high standard deviation (in each fold you get either a 100% correct prediction or a 0% correct prediction), but how does that also affect AUC? Is it because of how AUC is actually calculated (can you elaborate on this)?
By the way, class imbalance, modeling techniques, and data size seem to have no effect on this (I tried many variations of the above in RapidMiner), and the same thing is observed for AUC.
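One plausible explanation, sketched in Python/scikit-learn on synthetic data (this is an assumption about how the averaging is done, not a statement about RapidMiner internals): under leave-one-out, every test fold contains a single example, so a per-fold ROC curve is undefined (only one class is present in the fold), and an implementation that averages per-fold AUCs and falls back to 0.5 for such degenerate folds will report AUC = 0.5 overall regardless of the model. Pooling the out-of-fold scores and computing a single AUC over all examples avoids the problem:

```python
# Sketch: per-fold AUC is undefined under leave-one-out; pool scores instead.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict

X, y = make_classification(n_samples=553, random_state=0)
model = LogisticRegression(max_iter=1000)

# Collect every example's out-of-fold score, then compute one AUC over all
# 553 predictions. This is well defined even though each fold holds one row.
probs = cross_val_predict(model, X, y, cv=LeaveOneOut(),
                          method="predict_proba")[:, 1]
print("pooled leave-one-out AUC:", round(roc_auc_score(y, probs), 3))

# By contrast, a per-fold AUC cannot be computed at all:
# roc_auc_score([0], [0.3]) raises "Only one class present in y_true."
```

The same reasoning covers the Leave-3-Out case: with three examples per fold, many folds still contain a single class, so most per-fold AUCs are undefined or degenerate, while pooled scoring remains stable.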