Split-Validation Issue

amitd Member, University Professor Posts: 49 Maven
@sgenzer, I believe there may be an issue with the Split Validation operator. The model that the operator outputs does not correspond to the model used to compute the validation performance metrics.
I have attached an Excel spreadsheet that shows the computations with formulas. The RMSE computed for the validation dataset (using the Performance operator) corresponds to the "ValidModel and ApplyModel" model (in the Excel worksheet), which is one of the models produced when the process is dissected using Remember/Recall operators and breakpoints. However, the RapidMiner process outputs a LinearRegression model that is the same as the "TrainModel" (in the Excel worksheet), whose RMSE does not match the one given by the Performance (Regression) operator. Why the discrepancy? Which is the correct model here?

I have reproduced this issue with multiple datasets and have documented it in a process that uses the sample Polynomial dataset. Any ideas on what may be going on here?

Best Answer


  • amitd Member, University Professor Posts: 49 Maven
    Thank you, that makes sense. Ideally, it would have been better to have direct access to the model fit on the training data, since that is the model evaluated on the validation partition.
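
    For anyone who wants to reproduce the effect outside RapidMiner, here is a minimal Python sketch. It assumes that split validation reports performance from a model fit on the training partition only, while the model delivered at the output is refit on the complete dataset. The synthetic data, the 70/30 split, and the names fit_line, train_model, and full_model are illustrative assumptions, not taken from the attached process or spreadsheet.

```python
import math
import random


def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(model, xs, ys):
    """Root mean squared error of a fitted line on the given points."""
    intercept, slope = model
    squared = [(intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic data (an assumption; this is not the sample Polynomial dataset).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [2.0 + 3.0 * x + random.gauss(0, 1) for x in xs]

# Hypothetical 70/30 split into training and validation partitions.
split = 70
train_x, train_y = xs[:split], ys[:split]
valid_x, valid_y = xs[split:], ys[split:]

# Model fit on the training partition: the one whose validation RMSE is reported.
train_model = fit_line(train_x, train_y)

# Model fit on all rows: assumed here to be the one delivered at the output.
full_model = fit_line(xs, ys)

reported_rmse = rmse(train_model, valid_x, valid_y)
recomputed_rmse = rmse(full_model, valid_x, valid_y)

print("training-partition model:", train_model)
print("full-data model:         ", full_model)
print("validation RMSE (training-partition model):", reported_rmse)
print("validation RMSE (full-data model):         ", recomputed_rmse)
```

    The two models have different coefficients, so recomputing the validation RMSE by hand from the delivered model will not reproduce the reported figure. If the assumption above holds, neither model is wrong: one exists to estimate performance, the other to be deployed.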