Regarding KNN performance
Hello,
I am applying KNN with k=5. I split the data into two parts: one part is used in cross-validation, and the other is used to test the model produced by the cross-validation.
I see that the cross-validation performance is 0.619 (AUC), while on the test data set I separated it is 0.812.
Is this because cross-validation performance can be lower if some folds don't perform well?
Also, I learned that KNN is basically not a learning algorithm, which means it doesn't learn much from training but just uses the parameters to classify. Can this be the reason?
Thanks,
Varun
Best Answer
IngoRM (RM Founder, Community Manager):

Hi @varunm1,

You should have a look at the performances of the single folds (just place a breakpoint after the performance operator within the cross validation). I would not be surprised if those performances fluctuate quite a bit. If they do, this is probably the reason: you have been "lucky" with the particular test data set, i.e., it was "easier" for the model to predict. This is exactly why we prefer cross validation wherever possible: to reduce the impact of test data bias.

Hope this helps,
Ingo
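Ingo's suggestion of inspecting the single-fold performances can be sketched outside RapidMiner as well. The following is a minimal scikit-learn example on synthetic data (the dataset, fold count, and all names here are illustrative assumptions, not the original RapidMiner process); it shows how per-fold AUC for a k=5 KNN can spread around the cross-validation average:

```python
# Hypothetical sketch (scikit-learn on synthetic data, not the original
# RapidMiner process): inspect per-fold AUC for a k-NN classifier to see
# how much the fold scores fluctuate around their mean.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic binary classification data (assumption, for illustration only)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)  # k=5, as in the question

# One AUC per fold; a wide spread here means a single held-out test set
# can easily land well above (or below) the cross-validation average.
fold_aucs = cross_val_score(knn, X, y, cv=10, scoring="roc_auc")
print("per-fold AUCs:", fold_aucs)
print("mean:", fold_aucs.mean(), "std:", fold_aucs.std())
```

If the standard deviation across folds is large, a single lucky train/test split producing 0.812 against a cross-validation mean of 0.619 is entirely plausible.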
Answers