Cross validation
Hello,
Can anybody help me with this problem?
In the first picture I have this:
Here I measure the performance on the same data, and the accuracy is 87.44%.
When I run the same procedure inside a cross validation, like this:
(inside cross validation)
The accuracy I get here is 82.11%.
It is the same procedure, just inside a Cross Validation operator.
Why is there a difference between the two cases?
What I have understood is that in the second case my model is trained first and its performance is then measured in the testing section, so the result is more accurate.
So more training doesn't always mean greater accuracy?
I hope my question is clear.
Thanks in advance.
Best Answers
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
The first picture measures how well you describe your training data. The second one measures how well you predict unknown (out-of-sample) data. You almost always want to do the second.
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
varunm1 Member Posts: 1,207 Unicorn
Hello @Papad
As Martin said, in the first case you are training and testing the model on the same data, which is not useful for validating your model. In the second case you are cross-validating the model, which means you train on one part of the data and test on another part that the model never saw. This is the best method to validate your model.
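To make the difference concrete, here is a minimal Python sketch (not a RapidMiner process). Everything in it is an assumption for illustration: the synthetic data from `make_data`, the simple 1-nearest-neighbour model, and the helper names are all hypothetical. It scores the same model twice, once on the rows it was trained on and once with 10-fold cross validation.

```python
import random

def make_data(n=200, seed=0):
    """Hypothetical two-class data with one noisy feature per row."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Class means are 0 and 1 with heavy overlap, so no model can be perfect.
        x = rng.gauss(label, 1.0)
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the training row closest to x (1-nearest-neighbour)."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, test):
    """Fraction of test rows whose label is predicted correctly from train."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)

def cross_val_accuracy(data, k=10):
    """Average accuracy over k folds; each fold is tested on rows the model never saw."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(train, test))
    return sum(scores) / k

data = make_data()
train_acc = accuracy(data, data)    # same rows used for training and testing
cv_acc = cross_val_accuracy(data)   # held-out rows used for testing
print(f"accuracy on training data: {train_acc:.2%}")
print(f"10-fold cross validation:  {cv_acc:.2%}")
```

In this sketch the first number is a perfect score only because every row is its own nearest neighbour, so it says nothing about new data. The cross-validated number is lower but is the honest estimate, which is the same pattern as 87.44% versus 82.11% in the question.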
To understand cross-validation, here is an excellent post from Scott.
https://community.rapidminer.com/discussion/55112/cross-validation-and-its-outputs-in-rm-studio
Thanks
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Answers