Production model vs Model
When I searched for the difference between the model and the production model, I found this: "The ‘production model’ uses exactly the same preprocessing, feature sets, optimized parameters etc. - but it uses ALL labeled data for training. This is the model you should use in production and it makes use of all available information."
But if we use all labeled data in the training phase, how can we tell whether the model overfits? As far as I know, the reason for not using all the labeled data for training is to avoid overfitting, and of course to be able to measure the model's prediction performance.
Best Answer
BalazsBarany (Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert)
Hi!
The general assumption behind cross validation is that a model built on all of the data performs at least as well as the average of the models built on the training subsets during the validation. With 10-fold cross validation, you build a model on 90 % of the data and validate it on the remaining 10 %, then repeat this with a different 10 % held out, until every example has been used for validation exactly once. Because each model is evaluated on data it has never seen, an overfitted model would give you worse results in this scenario than a non-overfitted one.
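To illustrate why cross validation exposes overfitting, here is a minimal plain-Python sketch (not how RapidMiner implements it). It assumes hypothetical toy data (labels 0/1 split at x = 50, with every 7th label flipped as noise), a "memorizer" that returns the label of the nearest training example (1-nearest-neighbour), and a simpler single-threshold model. The memorizer is perfect on its own training data, but its 10-fold cross-validated accuracy should come out lower than the threshold model's, because each fold's model is scored only on rows it never saw. The function names are invented for this example.

```python
from typing import Callable, List, Tuple

Model = Callable[[int], int]


def make_data(n: int = 100) -> Tuple[List[int], List[int]]:
    """Hypothetical toy data: label 1 for x >= n/2, with every 7th label flipped as noise."""
    xs = list(range(n))
    ys = [1 if x >= n // 2 else 0 for x in xs]
    ys = [1 - y if x % 7 == 0 else y for x, y in zip(xs, ys)]
    return xs, ys


def accuracy(model: Model, xs: List[int], ys: List[int]) -> float:
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


def fit_memorizer(xs: List[int], ys: List[int]) -> Model:
    """1-nearest-neighbour: reproduces the training labels exactly, noise included."""
    table = list(zip(xs, ys))

    def predict(x: int) -> int:
        # Closest training point; ties go to the smaller x.
        _, label = min(table, key=lambda p: (abs(p[0] - x), p[0]))
        return label

    return predict


def fit_stump(xs: List[int], ys: List[int]) -> Model:
    """Simpler model: one threshold t (predict 1 if x >= t) chosen on the training data."""
    candidates = sorted(set(xs)) + [max(xs) + 1]
    best_t = max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), xs, ys))
    return lambda x: int(x >= best_t)


def cross_validate(fit: Callable[[List[int], List[int]], Model],
                   xs: List[int], ys: List[int], k: int = 10) -> float:
    """k-fold CV: each model trains on (k-1)/k of the rows and is scored on the held-out 1/k.

    Rows are assigned to folds round-robin (row i goes to fold i % k).
    """
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(accuracy(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / k


xs, ys = make_data()
print("memorizer, training accuracy:", accuracy(fit_memorizer(xs, ys), xs, ys))
print("memorizer, 10-fold CV accuracy:", cross_validate(fit_memorizer, xs, ys))
print("threshold, 10-fold CV accuracy:", cross_validate(fit_stump, xs, ys))
```

The training accuracy alone would make the memorizer look like the better model; the cross-validated numbers are what reveal that it learned the noise.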
When you run 10-fold cross validation and connect the mod (model) output port, an eleventh model is built on all the data. This is the "production model"; the ten fold models are only used to estimate its performance.
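The "eleventh model" step can be sketched in the same plain-Python style. This is a hypothetical illustration, not RapidMiner's code: `fit_midpoint` is an invented toy learner (a threshold halfway between the two class means, assuming both classes are present and class 1 has the larger mean), and `cross_validate_and_fit` runs k-fold cross validation only to produce the performance estimate, then calls the same learner once more on all rows and returns that model as the production model.

```python
from typing import Callable, List, Tuple

Model = Callable[[float], int]
Learner = Callable[[List[float], List[int]], Model]


def fit_midpoint(xs: List[float], ys: List[int]) -> Model:
    """Hypothetical toy learner: threshold halfway between the class-0 and class-1 means.

    Assumes both classes are present and class 1 has the larger mean.
    """
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    t = (mean0 + mean1) / 2
    return lambda x: 1 if x >= t else 0


def cross_validate_and_fit(fit: Learner, xs: List[float], ys: List[int],
                           k: int = 10) -> Tuple[Model, float]:
    """Return (production_model, estimated_accuracy).

    The k fold models exist only to produce the estimate and are discarded;
    the production model is one extra fit on ALL rows (k + 1 fits in total).
    """
    n = len(xs)
    scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(model(xs[i]) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    estimate = sum(scores) / k
    production_model = fit(xs, ys)  # the (k + 1)-th model, trained on all labeled data
    return production_model, estimate


xs = [float(x) for x in range(100)]
ys = [1 if x >= 50 else 0 for x in xs]
production_model, estimate = cross_validate_and_fit(fit_midpoint, xs, ys)
print("estimated accuracy:", estimate)
print("prediction for x = 70:", production_model(70.0))
```

The design choice mirrors the answer: the reported estimate comes from models that never saw their test rows, while the returned model uses every labeled row, on the assumption that it is at least as good as the fold models.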
Regards,
Balázs