Production model vs Model

User36964 · December 2021

When I searched for the difference between a model and a production model, I found this: "The ‘production model’ is using exactly the same preprocessing, feature sets, optimized parameters etc. - but it uses ALL labeled data for training. This is the model you should use in production and it makes use of all available information."

But if we use all the labeled data in the training phase, how can we tell whether the model overfits? As far as I know, the reason for not training on all the labeled data is to detect overfitting, and of course to be able to measure the model's prediction performance.

BalazsBarany · December 2021

Hi!

The general assumption behind cross validation is that a model built from all the data is not worse than the average of the models built on the training subsets. With 10-fold cross validation you build a model on 90 % of the data and validate it on the remaining 10 %, then repeat this with a different 10 % held out each time. An overfitted model would give you worse results on the held-out data than a non-overfitted one, so the cross validation performance already shows whether your setup overfits.
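To make the 90 % / 10 % splitting concrete, here is a minimal sketch in plain Python. The function name `k_fold_indices` and the row count of 100 are only illustrative assumptions, and the split is done in order without shuffling or stratification, which real tools usually add.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (train_idx, test_idx) pairs; each row is validated exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(100, k=10))
for number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(f"fold {number}: train on {len(train_idx)} rows, "
          f"validate on {len(test_idx)} rows")
```

Every row ends up in the validation part of exactly one fold, so the averaged performance is measured only on data the corresponding model has not seen.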

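When you do 10-fold cross validation and connect the mod (model) output, an eleventh model is built on all the data. This is the "production model". Its performance is not measured separately; the cross validation result serves as the estimate for it.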
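The same idea can be sketched outside RapidMiner. This is only an illustration under my own assumptions: a one-feature least-squares line stands in for the learner, the data is synthetic, and the names `fit_line`, `mse` and `cross_validate` are hypothetical helpers, not part of any product.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mse(model, xs, ys):
    """Mean squared error of the line (a, b) on the given rows."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, k=10):
    """Return the k held-out errors and the model trained on all rows."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mse(model, [xs[i] for i in test], [ys[i] for i in test]))
    # The "eleventh" model: same learner and settings, but all labeled data.
    production_model = fit_line(xs, ys)
    return fold_errors, production_model


# Synthetic data (assumption): y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

fold_errors, production_model = cross_validate(xs, ys, k=10)
cv_estimate = sum(fold_errors) / len(fold_errors)
print("estimated MSE from cross validation:", round(cv_estimate, 3))
print("production model (slope, intercept):", production_model)
```

The ten fold models are used only to produce the error estimate and are then discarded. The model you deploy is the one fitted on all rows, and the averaged held-out error is reported as its expected performance.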

Regards,
Balázs
