
Production model vs Model

User36964 (Member, University Professor)
When I searched for the difference between the model and the production model, I found that "The ‘production model’ uses exactly the same preprocessing, feature sets, optimized parameters, etc., but it uses ALL labeled data for training. This is the model you should use in production, and it makes use of all available information."

But if we use all labeled data in the training phase, how can we tell whether the model overfits? As far as I know, the reason for not using all labeled data for training is to avoid overfitting, and of course to be able to measure the model's prediction performance.


Best Answer

  • BalazsBarany (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert)
    Hi!

    The general assumption behind cross validation is that a model built from all the data is not worse than the average of the models built during the validation. With 10-fold cross validation, you build a model on 90 % of the data and validate it on the remaining 10 %, then repeat this with a different subset until each 10 % subset has been used for validation exactly once. An overfitted model would score worse on these held-out subsets than a non-overfitted one, so the averaged validation performance exposes it.

    When you do 10-fold cross validation and connect the mod (model) output, an eleventh model is built on all the data. This is the "production model".

    Regards,
    Balázs 
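
In RapidMiner this happens inside the Cross Validation operator; the Python sketch below is not RapidMiner code but a minimal standard-library reimplementation of the same idea, for illustration only. It assumes a hypothetical toy dataset (y = 2x + 1 plus Gaussian noise), mean squared error as the metric, and two illustrative learners (`fit_linear`, `fit_1nn`); all function names are made up for this example. Each learner gets a 10-fold estimate from ten fold models, then one extra model trained on all rows plays the role of the production model.

```python
import random
from statistics import mean

# Hypothetical toy data (an assumption, not from the thread): y = 2x + 1 plus Gaussian noise.
def make_data(n=300, seed=0):
    rng = random.Random(seed)
    xs = [i / n for i in range(n)]  # distinct x values, so 1-NN never ties
    return [(x, 2 * x + 1 + rng.gauss(0, 0.3)) for x in xs]

def fit_linear(train):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_1nn(train):
    """1-nearest neighbour: memorises the training rows, so it overfits noisy data."""
    rows = list(train)
    return lambda x: min(rows, key=lambda r: abs(r[0] - x))[1]

def mse(model, rows):
    return mean((model(x) - y) ** 2 for x, y in rows)

def cross_validate(data, fit, k=10, seed=1):
    """k-fold CV: train on k-1 folds, score on the held-out fold, average the k scores."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # every row lands in exactly one fold
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in idx if i not in held]  # ~90 % of the rows
        test = [data[i] for i in held_out]               # remaining ~10 %
        scores.append(mse(fit(train), test))
    return mean(scores)

data = make_data()
results = {}
for name, fit in [("linear", fit_linear), ("1-NN", fit_1nn)]:
    cv_mse = cross_validate(data, fit)  # estimate from the 10 fold models
    production = fit(data)              # the "eleventh" model, trained on ALL rows
    train_mse = mse(production, data)   # training error alone cannot reveal overfitting
    results[name] = (cv_mse, train_mse)
    print(f"{name}: CV MSE = {cv_mse:.3f}, training MSE = {train_mse:.3f}")
```

In this sketch the 1-NN production model has zero training error yet the worse cross-validated error. That is the point of the answer: the all-data model cannot be checked on its own training rows, so its expected performance is taken from the cross-validation average.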