Question about k-NN (in a production environment)
I have tested k-NN on my dataset and get pretty "good" results: about 85% with Canberra distance and k = 5.
My question now is: is k-NN also well suited to classifying new instances in a, let's say, "real" production environment, where new test data comes in from time to time?
I ask because I have read somewhere that k-NN is supposed to be expensive to compute (although it only takes a few seconds with 4500 records here), and that k-NN has to be computed completely from scratch for every new instance that comes in. Is that true?
I mean, if I have already placed my n training instances in my m-dimensional space and one new test instance comes in, do I have to calculate the distance from this instance to ALL n training instances, or only to the k nearest ones? And if only to the k nearest, how does it know which instances those are? The new instance cannot by itself be "aware" of where it belongs in the m-dimensional space and what its nearest neighbors are. Or is it somehow possible to "remember" the testing instances and choose the k nearest neighbors according to some heuristic?
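To make the question concrete, here is a minimal Python sketch of what I understand plain ("brute-force") k-NN to do for one new instance. It is only an illustration, not how RapidMiner implements it internally: the function names `canberra` and `knn_predict` and the toy data are made up for this example. In this version every query computes the distance to all n stored training instances and then keeps the k smallest.

```python
import heapq
from collections import Counter


def canberra(a, b):
    """Canberra distance: sum of |a_i - b_i| / (|a_i| + |b_i|), skipping 0/0 terms."""
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom != 0:
            total += abs(x - y) / denom
    return total


def knn_predict(train_X, train_y, query, k=5):
    """Classify one query by majority vote among its k nearest training instances."""
    # Brute force: one distance per stored training instance (n distances in total).
    distances = [(canberra(row, query), label) for row, label in zip(train_X, train_y)]
    # Keep the k smallest distances; nothing is cached between queries.
    nearest = heapq.nsmallest(k, distances, key=lambda pair: pair[0])
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes in a 2-dimensional space.
train_X = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [8.0, 9.0], [9.0, 8.5], [8.5, 9.5]]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, [1.1, 2.1], k=3))
```

In this sketch the cost per new instance grows linearly with n, since nothing is learned up front. As far as I know, index structures such as k-d trees or ball trees can reduce the number of distance computations per query, but whether they apply depends on the distance measure and the number of dimensions.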