Why does RapidMiner Studio reduce the number of rows in the model results?
Best Answer
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi @jsdrew,
a basic principle of predictive modeling is that you shouldn't use the model that was built on a record to predict the outcome of that same record. This would favor overfitted models.
Therefore, AutoModel does a "split validation". It takes about 2/3 of the data for building the model and the rest for evaluating the model by comparing the known label to the predicted one.
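To make the row count concrete, here is a minimal sketch of a split validation in plain Python. It is not what AutoModel runs internally; the function name `split_validation`, the 300 made-up rows and the fixed seed are assumptions for illustration only.

```python
import random

def split_validation(rows, train_fraction=2 / 3, seed=42):
    """Shuffle the row indices and split them into a training part and a hold-out part."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(len(rows) * train_fraction)
    return indices[:cut], indices[cut:]

# Hypothetical data set: 300 rows with a known label.
rows = [{"id": i, "label": i % 2} for i in range(300)]

train_idx, test_idx = split_validation(rows)

# Only the hold-out rows are scored, so only they show up in the results.
print(len(train_idx), "rows used to build the model")
print(len(test_idx), "rows with a prediction in the results")
```

The hold-out part is the only place where predictions are compared to known labels, which is why the results contain roughly a third of the original rows.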
If you open the process created by AutoModel and replace the split validation with a cross validation, the process will take longer, because it builds 10 or 11 models instead of one (which is why AutoModel doesn't use it). In return, you get a prediction for every row in your data.
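As a rough illustration of why cross validation covers every row, here is a small sketch in plain Python. It assumes 10 folds and uses a trivial majority-label "model" as a stand-in for a real learner; the helper names `k_fold_indices` and `majority_label` and the data are made up for this example.

```python
import random
from collections import Counter

def k_fold_indices(n_rows, k=10, seed=42):
    """Shuffle the row indices and deal them into k disjoint folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def majority_label(labels):
    """Stand-in for a real learner: always predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]

# Hypothetical data set: 300 rows, one third of them labeled True.
labels = [i % 3 == 0 for i in range(300)]

predictions = {}
models_built = 0
for test_idx in k_fold_indices(len(labels), k=10):
    held_out = set(test_idx)
    # Build the model only on rows outside the current fold.
    train_labels = [labels[i] for i in range(len(labels)) if i not in held_out]
    model = majority_label(train_labels)
    models_built += 1
    # Score the held-out fold with a model that never saw these rows.
    for i in test_idx:
        predictions[i] = model

print(len(predictions), "of", len(labels), "rows have a prediction")
print(models_built, "models were built")
```

Each row sits in exactly one fold, so it is predicted exactly once, by a model that was not trained on it. The price is one model per fold instead of a single model.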
The Academy has videos for these topics if you need more information.
Regards,
Balázs