kNN prediction score

onep Member Posts: 20 Maven
edited November 2018 in Help


I am working with a k-NN model for regression, and I would like to evaluate the predictions the model makes on my test set.


I imagine you could evaluate how close a new point (with unknown label) is to an existing point in the training set. If the new point lies exactly on top of a training point, the prediction score would be 1, and it would fall progressively the farther the new point is from the training points.


Is this possible in RapidMiner?


Best Answer

    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    Hi @onep, you certainly can do something like this.


    If you download the "anomaly detection" extension from the marketplace, there is an operator called "k-NN global anomaly score". It produces a value called "outlier", which is the distance to the k nearest neighbors you specify. So in your case, you would run your k-NN model with k=1, then run the k-NN global anomaly score also with k=1, and then transform the predicted score with the outlier value using whatever function you want (via "Generate Attributes").


    One caution: I am not sure, conceptually, why the prediction would necessarily fall in magnitude the farther it is from its nearest neighbor.  I guess it depends on the structure of your dataset and what you are modeling.    Perhaps a more intuitive representation here is that as the k-NN distance grows, the confidence in the prediction accuracy falls, but that doesn't necessarily mean that the true value is lower than the predicted value.
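To make the idea concrete, here is a minimal sketch in Python with scikit-learn. This is an illustrative stand-in for the RapidMiner workflow above, not the operators themselves; the `exp(-distance)` mapping is just one example of a transform you could apply, chosen so that a point sitting exactly on a training point scores 1 and the score decays towards 0 with distance:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy 1-D training data
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 1.0, 2.0, 3.0])

model = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)

def prediction_score(model, x_new):
    """Map the distance to the nearest training point onto (0, 1]:
    1.0 if the new point sits exactly on a training point,
    decaying towards 0 as the distance grows."""
    dist, _ = model.kneighbors(x_new, n_neighbors=1)
    return np.exp(-dist[:, 0])

print(prediction_score(model, [[1.0]]))  # exactly on a training point -> 1.0
print(prediction_score(model, [[1.5]]))  # between points -> a lower score
```

As noted above, this score is best read as confidence in the prediction, not as a correction to the predicted value itself.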


    I hope this is helpful!




    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts


    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,517 RM Data Scientist

    Hi Mathias,


    maybe the weighted vote option is something for you? It is not exactly what you want to do, but it weights the influence of every neighbour by its distance.


    Another option would be to use an SVM with a radial kernel. Even though the math is different and more complex, it often turns out to behave similarly to a k-NN in terms of decision boundaries.
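For illustration, here is what the distance-weighted vote looks like in scikit-learn, as a sketch outside RapidMiner: the `weights='distance'` parameter plays the role of the weighted vote, so closer neighbours pull the prediction towards their own target value:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 10.0, 20.0])

# weights='distance': each neighbour's vote is weighted by 1/distance
weighted = KNeighborsRegressor(n_neighbors=2, weights='distance').fit(X_train, y_train)
uniform = KNeighborsRegressor(n_neighbors=2, weights='uniform').fit(X_train, y_train)

x_new = [[0.2]]  # much closer to the point at 0.0 than to the one at 1.0
print(weighted.predict(x_new))  # pulled towards y=0.0, gives [2.0]
print(uniform.predict(x_new))   # plain average of the two neighbours, gives [5.0]
```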



    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
    onep Member Posts: 20 Maven

    Thank you for your suggestion to look at k-NN global anomaly score - looks like what I was looking for!

    I totally agree with you that it's the confidence we are looking at and not the prediction accuracy - thanks for making that clear :)
