K-fold cross-validation

k_vishnu772 Member Posts: 34 Learner III
edited November 2018 in Help

Hi all, I have a small data set of 90 rows. I am using cross-validation in my process, but I am unsure how to decide on the number of folds (k).

I tried 3, 5, and 10, and the 3-fold cross-validation performed better. Could you please help me choose k? I am a little biased toward choosing 3, as it is small.

Answers

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi,

     

    Please post your process XML and describe the problem and data a bit. Your question is too vague to answer right now or to be useful to other people.

     

    Best,

    Sebastian

  • kypexin RapidMiner Certified Analyst, Member Posts: 291 Unicorn

    Hi @k_vishnu772

     

    You should keep in mind that cross-validation is intended to estimate averaged model performance; choosing a different k will not by itself make your model perform better. I think it is always better to look at the performance on a holdout test set, but I am also afraid that a dataset of only 90 rows is still too small to get a good performance estimate.
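
To make the point above concrete, here is a minimal sketch in plain Python (not RapidMiner's Cross Validation operator) of what changing k actually does to 90 rows. The helper name `k_fold_indices` and the seed are assumptions made for illustration. The value of k only changes how the rows are divided between training and testing in each fold; it does not change the learner.

```python
import random

def k_fold_indices(n_rows, k, seed=0):
    """Split row indices 0..n_rows-1 into k shuffled, near-equal folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    Hypothetical helper for illustration only.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits

# How 90 rows are divided for the k values tried in the question.
fold_sizes = {}
for k in (3, 5, 10):
    train, test = k_fold_indices(90, k)[0]
    fold_sizes[k] = (len(train), len(test))
    print(f"k={k}: train on {len(train)} rows, test on {len(test)} rows per fold")
```

With 90 rows, 3 folds means each model is trained on 60 rows and tested on 30, while 10 folds means training on 81 rows and testing on 9. A smaller k therefore evaluates models that have seen less data, so the scores for different k are estimates made under different conditions rather than a ranking of better or worse models.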

  • BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi!

     

    I agree with kypexin. Cross validation is not about getting the best performance.

    Going with 10 is a good approach, as it has been shown again and again to provide stable results (the performance differs little from that with 9 or 11 folds).

    If you have that little data, you could even try leave-one-out validation, which would be a kind of 90-fold validation on your data set. It won't give you the best performance measure (as that came out of the 3-fold validation by chance), but it will give you the most stable and reliable estimate of the performance you can expect.

     

    Regards,

     

    Balázs
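
The leave-one-out idea can be sketched in plain Python as k-fold validation with k equal to the number of rows. Everything below is an assumption made for illustration: the synthetic 90-row, one-attribute data set, the nearest-class-mean learner, and the function name `cv_accuracy`. None of it comes from the original process. The sketch repeats the validation with 20 different shuffles and reports how much the accuracy estimate moves.

```python
import random
import statistics

def cv_accuracy(xs, ys, k, seed):
    """k-fold cross-validated accuracy of a nearest-class-mean classifier (1-D data)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        train = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        # "Train": compute the mean of x for each class on the training rows.
        means = {
            label: statistics.mean(xs[j] for j in train if ys[j] == label)
            for label in (0, 1)
        }
        # "Test": predict the class whose mean is closest to the row's x.
        for j in test:
            pred = min(means, key=lambda label: abs(xs[j] - means[label]))
            correct += pred == ys[j]
    return correct / len(xs)

# Synthetic stand-in for a 90-row data set: two overlapping classes.
rng = random.Random(42)
ys = [i % 2 for i in range(90)]
xs = [rng.gauss(float(y), 1.0) for y in ys]

spread = {}
for k in (3, 10, 90):  # k = 90 is leave-one-out for 90 rows
    scores = [cv_accuracy(xs, ys, k, seed) for seed in range(20)]
    spread[k] = max(scores) - min(scores)
    print(f"k={k}: min={min(scores):.3f} max={max(scores):.3f} "
          f"mean={statistics.mean(scores):.3f}")
```

With k = 90 every row is the test set exactly once, so the shuffle has no effect and the estimate is the same on every run. With smaller k the estimate depends on which rows happen to land in which fold, which is one way a single 3-fold run can come out looking best by chance.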
