Cross validation
I have a question about the output of cross validation. If we use 90% of the data for training and 10% for testing, why does the result show the whole data set rather than just the 10% test part?
I'll be thankful if someone answers my question.
Yasmin
Best Answers
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @Yasmin,
Legitimate question!
Here is a possible answer:
In reality, for a 10-fold cross validation, RapidMiner performs 11 iterations.
During the last iteration, RapidMiner applies the model to the whole training data set, so the training set and the test set have the same length.
Regards,
Lionel
NB: You can see this behaviour by setting a "Breakpoint After" on the Apply Model operator (inside the Cross Validation operator).
tftemme Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
Hi @Yasmin,
While it is true that the Cross Validation operator builds the final model on the whole data set (and thus runs an 11th iteration of the Training subprocess if the model port is connected), the Test subprocess is only performed 10 times. That is also why all of your input data appears at the test result port: in every iteration, 10% of your input data is used as the test set, so across the whole Cross Validation every Example of your input data is used exactly once for testing.
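To make the counting concrete, here is a rough sketch in plain Python. It is not RapidMiner code and does not reproduce how the Cross Validation operator samples the data; the helper name `k_fold_test_indices`, the 100-example data set, and the consecutive (unshuffled) split are assumptions chosen only for illustration.

```python
# Illustration only: plain Python, not RapidMiner's implementation.
# Assumption: folds are consecutive index blocks (no shuffling or stratification).

def k_fold_test_indices(n_examples, k=10):
    """Split the indices 0..n_examples-1 into k consecutive test folds."""
    fold_sizes = [n_examples // k + (1 if i < n_examples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

n_examples = 100  # hypothetical data set size
folds = k_fold_test_indices(n_examples, k=10)

appended = []     # (example index, iteration) pairs from all test sets
train_sizes = []  # size of the training part in each iteration
for iteration, test_idx in enumerate(folds, start=1):
    test_set = set(test_idx)
    train_idx = [i for i in range(n_examples) if i not in test_set]
    train_sizes.append(len(train_idx))
    # A model would be trained on train_idx (90%) and applied to test_idx (10%).
    appended.extend((i, iteration) for i in test_idx)

print(len(folds))     # 10 test runs
print(len(appended))  # 100 examples once all test sets are appended
```

Each iteration tests on a different 10% of the examples, so once the ten test sets are put together every example shows up exactly once, tagged with the iteration in which it was tested.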
For the outer result port, all test sets are appended together, so you get your whole input data set back. You can visualize this by adding a Generate Attributes operator in the Test subprocess of the Cross Validation and generating an attribute named iteration with the value eval(%{a}) (the macro %{a} contains the number of times the current operator was applied).
Best regards,
Fabian