How to run a prediction model on a dataset without splitting it into train and test datasets
Hello everyone,
I hope this message finds you well. I am currently working on a project that involves running RapidMiner prediction models on a dataset. Specifically, I am interested in using tree induction, SVM, DM, and other models to predict outcomes and determine prediction accuracy.
However, I face a challenge: my dataset contains only 60 samples, which makes it difficult to split into separate training and testing sets. I am therefore reaching out to see if anyone has suggestions on how I can run the models without splitting the dataset.
I greatly appreciate any insights or advice you may have on this matter.
Thank you,
Mansour
Answers
I might suggest starting with 5 folds, given the dataset size you mentioned, and adjusting from there. Here's a good blog post on cross validation where you can find more information: https://rapidminer.com/blog/validate-models-cross-validation/. I believe we also have a video on the Academy.
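To make the idea concrete, here is a minimal sketch of how 5-fold cross validation partitions the data. This is plain Python for illustration only; RapidMiner's Cross Validation operator does all of this internally, and the model training and accuracy steps are left as placeholder comments.

```python
# Illustrative sketch of 5-fold cross-validation splitting (plain Python).
# The dataset size (60) matches the question; model training is a placeholder.
import random

def k_fold_indices(n_examples, k, seed=42):
    """Shuffle the row indices and split them into k roughly equal folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

folds = k_fold_indices(n_examples=60, k=5)

for i, test_idx in enumerate(folds):
    # Each fold takes one turn as the test set; the remaining folds train.
    train_idx = [idx for fold in folds if fold is not test_idx for idx in fold]
    # Placeholder: train a model on train_idx, evaluate on test_idx,
    # and average the per-fold accuracies at the end.
    print(f"fold {i}: {len(train_idx)} training rows, {len(test_idx)} test rows")
```

With 60 rows and 5 folds, every fold has 12 test rows and 48 training rows, and every example is used for testing exactly once.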
Best,
Roland
You could also look into "leave one out" validation. This is a cross validation with as many iterations as there are data rows - in your case, 60.
The Cross Validation operator has a parameter for switching this on.
This approach takes the first example as the test set and the rest of the data for training, then the second example, and so on. With this method, each example is tested against a model built on all of the remaining data, giving you a robust estimate of model quality.
A final model will be built on all the data if you connect the model output of the Cross Validation operator.
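The loop described above can be sketched in a few lines of plain Python. The tiny 1-nearest-neighbour classifier and the made-up (feature, label) rows below are placeholders standing in for your real model and your 60 examples; in RapidMiner the Cross Validation operator with the leave-one-out parameter enabled handles this for you.

```python
# Leave-one-out cross-validation sketch using a toy 1-nearest-neighbour
# classifier on made-up data (placeholders for the real model and rows).

def nearest_neighbour_predict(train, query):
    """Return the label of the training point closest to the query."""
    return min(train, key=lambda row: abs(row[0] - query))[1]

# Toy dataset: (feature, label) pairs standing in for the 60 real rows.
data = [(0.1, "a"), (0.2, "a"), (0.9, "b"), (1.0, "b"), (0.15, "a"), (0.95, "b")]

correct = 0
for i, (x, y) in enumerate(data):
    # Hold out example i; train on the remaining n - 1 examples.
    train = data[:i] + data[i + 1:]
    if nearest_neighbour_predict(train, x) == y:
        correct += 1

accuracy = correct / len(data)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Each example is scored by a model that never saw it, so the averaged accuracy is an honest estimate even though no separate test set was held back.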
Regards,
Balázs