Correlation value at 0 with leave-one-out cross validation
Hello,
I've been noticing a phenomenon I can't quite explain, which seems related to what is described in this previous post. I train a linear regression model on a dataset (the Polynomial sample dataset, for instance, with 200 examples) using cross-validation with shuffled sampling: the training side contains just a Linear Regression operator with default parameters; the testing side contains an Apply Model and a Performance operator. I then vary the number of folds. Here are the values I observed on that dataset (a rough scikit-learn equivalent of the setup is sketched after the list):
- 5-fold CV: correlation = 0.894 +/- 0.026 (micro average: 0.892)
- 10-fold CV: correlation = 0.902 +/- 0.038 (micro average: 0.891)
- 20-fold CV: correlation = 0.909 +/- 0.080 (micro average: 0.894)
- 50-fold CV: correlation = 0.899 +/- 0.174 (micro average: 0.894)
- 100-fold CV: correlation = 0.960 +/- 0.197 (micro average: 0.894)
- 150-fold CV: correlation = 0.300 +/- 0.460 (micro average: 0.894)
- 200-fold CV: correlation = 0.000 +/- 0.000 (micro average: 0.894)
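In case anyone wants to reproduce this outside RapidMiner, here is a minimal sketch of the same kind of experiment in scikit-learn. The synthetic data, seed, and attribute count are placeholders of my own (the Polynomial sample dataset ships with RapidMiner); the point is only the macro-vs-micro bookkeeping, not the exact numbers.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Stand-in for the Polynomial dataset: 200 examples, nonlinear label.
rng = np.random.default_rng(42)
X = rng.uniform(-5, 5, size=(200, 3))
y = X[:, 0] ** 2 + 2 * X[:, 1] + rng.normal(0, 2, size=200)

for k in (5, 10, 20, 50, 100, 200):
    fold_corrs, all_true, all_pred = [], [], []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        all_true.extend(y[test_idx])
        all_pred.extend(pred)
        if len(test_idx) > 1:  # correlation is undefined on a single point
            fold_corrs.append(pearsonr(y[test_idx], pred)[0])
    # Macro: average (and spread) of per-fold correlations.
    macro = (np.mean(fold_corrs), np.std(fold_corrs)) if fold_corrs else (float("nan"),) * 2
    # Micro: pool all held-out predictions first, then one correlation.
    micro = pearsonr(all_true, all_pred)[0]
    print(f"{k:3d}-fold: macro = {macro[0]:.3f} +/- {macro[1]:.3f}, micro = {micro:.3f}")
```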
So does this mean that the "best" number of folds in this case is half the number of examples in the dataset? If so, why? Or should I rely only on the micro averages, which stay quite stable?
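One thing I suspect, though I haven't verified it against how RapidMiner actually computes the performance: the main value looks like a macro average, i.e. the mean of per-fold correlations, and Pearson correlation over a handful of points is extremely unstable. With two points per fold it is always exactly +1 or -1, and with one point per fold (200-fold CV here, i.e. leave-one-out) it is undefined, since both the numerator and denominator of r are 0, which would be consistent with the reported 0.000 +/- 0.000. The micro average pools all 200 held-out predictions first and only then computes a single correlation, which may be why it barely moves. A quick illustration with scipy:

```python
from scipy.stats import pearsonr

# Two points always give |r| = 1 (when the values differ):
print(pearsonr([1.0, 2.0], [5.0, 3.0])[0])  # -1.0
# A single point has no defined correlation at all:
# pearsonr([3.0], [2.7]) raises ValueError (needs at least 2 points)
```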
Answers
https://machinelearningmastery.com/how-to-configure-k-fold-cross-validation/