Training and testing correlation
Hello,
When I test a model on known data, the result is an accuracy value. Is this accuracy always higher than the accuracy I get from cross validation, where I want to see how my model reacts to unknown data?
So, following the steps in the tutorial "Testing a model", if I get bad accuracy, should I try to find another model or prepare my data differently instead of going on to cross validation, or something else?
Thanks in advance.
Best Answer
yyhuang (RM Data Scientist):
Hi @Papad,
Most of the time the training accuracy is higher than the testing accuracy, because the model is fitted on that data. In the worst case, such as k-NN with a very small k, the model essentially memorizes all the training points.
We usually ignore training performance when optimizing models, because it does not help us understand the predictive capability on unseen data. There are some exceptions, such as detecting outliers or verifying the randomness of errors when comparing the predictions with the labels on the training set.
If you get bad accuracy on the testing set, rethink the validation process and check for any concept shift between the training and testing data. An insightful post from Ingo can also help you understand why we need cross validation and how to interpret it (I strongly recommend it as well).
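As a quick illustration of the point about training versus validated accuracy, here is a minimal sketch in Python with scikit-learn (not a RapidMiner process, just an assumed setup on the Iris data) comparing the training accuracy of a 1-NN classifier with its cross-validated accuracy:

```python
# Minimal sketch (assumes scikit-learn is installed): training accuracy vs.
# cross-validated accuracy for 1-NN, which effectively memorizes the training data.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)

# Training accuracy: evaluated on the same data the model was fit on.
train_acc = model.score(X, y)  # typically 1.0 for 1-NN

# Cross-validated accuracy: estimates performance on unseen data.
cv_acc = cross_val_score(model, X, y, cv=10).mean()

print(f"training accuracy:        {train_acc:.3f}")
print(f"cross-validated accuracy: {cv_acc:.3f}")
```

The training accuracy will usually be (near) perfect while the cross-validated accuracy is lower, which is why only the latter is a useful estimate of how the model will behave on unknown data.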
Cheers,
YY