K-fold crossvalidation

k_vishnu772 · August 2018

Hi all, I have a small data set of 90 rows. I am using cross-validation in my process, but I can't decide on the number of folds (k).

I tried 3, 5 and 10, and the 3-fold cross-validation performed better. Could you please help me choose k? I am a little biased towards choosing 3 as it is small.

SGolbert · August 2018

Hi,

Please post your process XML and describe the problem/data a bit. Your question is way too vague to be answered right now or to be useful to other people.

Best,

Sebastian

kypexin · August 2018

Hi @k_vishnu772

You should keep in mind that cross-validation is intended to estimate an averaged model performance; choosing a different k will not by itself make your model perform better. I think it is always better to look at the performance on a holdout test set, but I am also afraid that a dataset of only 90 rows is still too small to get a good performance estimate.
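To illustrate the point, here is a minimal sketch in plain Python (not a RapidMiner process). Everything in it is an assumption made for illustration: the 90 rows are synthetic, the model is a simple nearest-centroid threshold, and the helper names `make_data`, `k_fold_indices`, `fit_threshold`, `predict` and `cv_accuracy` are hypothetical. It estimates the accuracy of the same model with k = 3, 5 and 10 over 20 different shuffles, so you can see how much the estimate moves even though the model itself never changes.

```python
import random
import statistics


def make_data(n=90, seed=0):
    """Synthetic stand-in for a 90-row data set: one feature, two classes."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2
        rows.append((rng.gauss(1.5 * label, 1.0), label))
    return rows


def k_fold_indices(n, k, seed):
    """Shuffle row indices and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(train):
    """Nearest-centroid rule: split at the midpoint of the two class means."""
    m0 = statistics.mean(x for x, y in train if y == 0)
    m1 = statistics.mean(x for x, y in train if y == 1)
    return (m0 + m1) / 2, m1 >= m0


def predict(model, x):
    """Predict class 1 on the side of the threshold where class 1's mean lies."""
    threshold, one_is_above = model
    return int((x > threshold) == one_is_above)


def cv_accuracy(rows, k, seed):
    """Mean test accuracy over k folds; the model is refit on each training part."""
    scores = []
    for fold in k_fold_indices(len(rows), k, seed):
        held_out = set(fold)
        train = [r for i, r in enumerate(rows) if i not in held_out]
        model = fit_threshold(train)
        hits = sum(predict(model, rows[i][0]) == rows[i][1] for i in fold)
        scores.append(hits / len(fold))
    return statistics.mean(scores)


rows = make_data()
for k in (3, 5, 10):
    # Same data, same model type; only the fold count and the shuffle differ.
    estimates = [cv_accuracy(rows, k, seed) for seed in range(20)]
    print(f"k={k:2d}  mean={statistics.mean(estimates):.3f}  "
          f"min={min(estimates):.3f}  max={max(estimates):.3f}")
```

The spread between `min` and `max` for each k shows how much a single run can differ just because of how the rows were split. One run where a particular k scored higher is therefore weak evidence that this k is "better".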

BalazsBarany · August 2018

Hi!

I agree with kypexin. Cross validation is not about getting the best performance.

Going with 10 is a good approach, as it has been shown again and again to give stable results (the performance differs little from what you would get with 9 or 11 folds).

If you have that little data, you could even try leave-one-out validation, which would be a kind of 90-fold validation on your data set. It won't give you the highest performance number (the 3-fold result most likely came out best by chance), but it will give you the most stable and reliable estimate of the performance you can expect.
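As a rough sketch of what leave-one-out means on 90 rows, here is a small plain-Python example (again not a RapidMiner process). The data is synthetic, the 1-nearest-neighbour model is only a placeholder, and the helper names `make_data`, `leave_one_out_splits`, `predict_1nn` and `loo_accuracy` are hypothetical. Each of the 90 rows is held out once while the model is built on the other 89.

```python
import random


def make_data(n=90, seed=0):
    """Synthetic stand-in for a 90-row data set: one feature, two classes."""
    rng = random.Random(seed)
    return [(rng.gauss(1.5 * (i % 2), 1.0), i % 2) for i in range(n)]


def leave_one_out_splits(n):
    """Yield (train_indices, test_index): n splits, each holding out one row."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


def predict_1nn(train, x):
    """1-nearest-neighbour: return the label of the closest training row."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def loo_accuracy(rows):
    """Fraction of rows predicted correctly when each is held out in turn."""
    hits = 0
    for train_idx, test_idx in leave_one_out_splits(len(rows)):
        train = [rows[j] for j in train_idx]
        x, y = rows[test_idx]
        hits += predict_1nn(train, x) == y
    return hits / len(rows)


rows = make_data()
splits = list(leave_one_out_splits(len(rows)))
print(len(splits), "splits, each training on", len(splits[0][0]), "rows")
print(f"leave-one-out accuracy: {loo_accuracy(rows):.3f}")
```

Because there is only one way to split 90 rows into 90 single-row folds, the result does not depend on a random shuffle, unlike 3-, 5- or 10-fold runs. The price is 90 model fits instead of 10.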

Regards,

Balázs
