# How does RapidMiner handle equal distances in the k-NN algorithm?

**Contributor I**

Say k = 5. I classify an unknown object by taking its 5 nearest neighbours.

I'm confused: I did the same calculation in Excel, and for some data the result differs from RapidMiner's.

When several neighbours are at the same distance, how does RapidMiner sort them?

Is something wrong with my data, or does RapidMiner order equal distances randomly?

Thanks in advance
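To see how a spreadsheet and a tool can disagree on the same data, here is a minimal Python sketch of a plain, unweighted k-NN vote. It is not RapidMiner's code. It assumes Euclidean distance and resolves equal distances by the order of the rows, because Python's `sorted()` is stable. The names `euclidean` and `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k):
    """Unweighted majority vote among the k nearest training rows.

    train: list of (features, label) pairs, in dataset order.
    sorted() is stable, so rows at equal distance keep their dataset
    order and the earlier ones take the last free slots.
    """
    ranked = sorted(train, key=lambda row: euclidean(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # On a vote tie, most_common() returns the label encountered first.
    return votes.most_common(1)[0][0]

# Hypothetical toy data: all four rows are at distance 1 from (0, 0).
train = [((1, 0), "LT"), ((0, 1), "LT"), ((-1, 0), "LU"), ((0, -1), "LU")]
print(knn_predict(train, (0, 0), k=3))                  # LT
print(knn_predict(list(reversed(train)), (0, 0), k=3))  # LU
```

Only the row order changed, yet the prediction flipped. Whenever ties straddle the k-th slot, two implementations that break ties differently can legitimately give different answers.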

## Answers

**Community Manager**

https://rapidminer.com/blog/k-nearest-neighbors-laziest-machine-learning-technique/

This link should answer your question, but feel free to reach out if it doesn't!

**Contributor I**

Many thanks for your response.

Based on what I have read in other forums and in the link you provided, there seem to be several ways for k-NN to handle equal distances, such as looking at the average distance.

Which one does RapidMiner use? I can't find or understand which algorithm RapidMiner uses when distances are equal.

Hmm...

Maybe it can be described like this. The distances from the test point to the training data are:

- training rows 1 to 4: distance 0 (4 rows)

- training rows 5 to 10: distance 1 (6 rows)

- training rows 11 to 15: distance 2 (5 rows)

- training rows 16 to 20: distance 3 (5 rows)

- training rows 21 to 25: distance 4 (5 rows)

Sorted in ascending order, there are many equal distances like these.

With k = 5, the classification should use the majority label among the training rows with the 5 smallest distances.
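With the distances listed above, the first four neighbours are fixed (distance 0), but the fifth slot has six equally close candidates (distance 1), so any implementation has to pick one of them somehow. The hypothetical helper `tie_at_kth` below simply counts that situation. It does not claim to reproduce how RapidMiner resolves it.

```python
from math import comb

def tie_at_kth(distances, k):
    """Describe the tie at the k-th nearest neighbour.

    Returns (kth_distance, sure, slots_left, candidates):
    - sure: rows strictly closer than the k-th distance (always kept)
    - slots_left: how many of the tied rows must still be picked
    - candidates: how many rows share the k-th distance
    """
    ordered = sorted(distances)
    kth = ordered[k - 1]
    sure = sum(1 for d in ordered if d < kth)
    candidates = sum(1 for d in ordered if d == kth)
    return kth, sure, k - sure, candidates

# Distance counts from the post: 4 x 0, 6 x 1, 5 x 2, 5 x 3, 5 x 4
distances = [0] * 4 + [1] * 6 + [2] * 5 + [3] * 5 + [4] * 5
kth, sure, slots, candidates = tie_at_kth(distances, 5)
print(kth, sure, slots, candidates)  # 1 4 1 6
print(comb(candidates, slots))       # 6 possible neighbour sets
```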

Does RapidMiner take the majority label of rows 1 to 5? I think not, because some results differ when I compare them with a manual calculation in MS Excel.

Or does it take the majority label of rows 1 to 25?

That would be the case if each distinct distance counted as one rank, because:

- distance 0 is 1st

- distance 1 is 2nd

- distance 2 is 3rd

- distance 3 is 4th

- distance 4 is 5th

Or are the distances averaged?

Or does RapidMiner use another algorithm?
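The readings in this post correspond to different definitions of "the k nearest neighbours". A minimal sketch of three of them, applied to the distance counts above; none of these is confirmed here as RapidMiner's rule, and the function names are hypothetical.

```python
def strict_k(distances, k):
    """Standard k-NN: exactly k rows; ties at the cut-off are cut by sort order."""
    return sorted(distances)[:k]

def with_ties(distances, k):
    """Keep every row at least as close as the k-th nearest one."""
    kth = sorted(distances)[k - 1]
    return [d for d in distances if d <= kth]

def by_distinct_rank(distances, k):
    """Treat each distinct distance value as one rank and keep the first k ranks."""
    kept = set(sorted(set(distances))[:k])
    return [d for d in distances if d in kept]

distances = [0] * 4 + [1] * 6 + [2] * 5 + [3] * 5 + [4] * 5
for rule in (strict_k, with_ties, by_distinct_rank):
    print(rule.__name__, len(rule(distances, 5)))
# strict_k 5
# with_ties 10
# by_distinct_rank 25
```

With these distances the distinct-rank reading keeps all 25 training rows, so the "vote" would become a plain majority over the whole training set, which is why it is not the usual meaning of k.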

And the result changes again if weighted vote is checked.
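Why weighted vote can change the answer: closer neighbours get a larger say, so a minority of very close rows can outvote a majority of farther ones. The exact weighting RapidMiner applies is not stated in this thread; the sketch below assumes a weight of `1 / (1 + distance)` purely for illustration (it also avoids dividing by zero for distance-0 rows). The function names and neighbour list are hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(neighbours):
    """Unweighted vote: every neighbour counts once."""
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def weighted_vote(neighbours):
    """Weighted vote with the ASSUMED weight 1 / (1 + distance).

    neighbours: list of (distance, label) for the chosen k rows.
    """
    scores = defaultdict(float)
    for distance, label in neighbours:
        scores[label] += 1.0 / (1.0 + distance)
    return max(scores, key=scores.get), dict(scores)

neighbours = [(0, "LT"), (0, "LT"), (1, "LU"), (1, "LU"), (1, "LU")]
print(majority_vote(neighbours))  # LU
print(weighted_vote(neighbours))  # ('LT', {'LT': 2.0, 'LU': 1.5})
```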

With distances like the ones above, I have not been able to match RapidMiner's results with my manual calculations.

I hope you understand what I mean.

Thanks in advance for your help.

**Contributor I**

The calculation uses only columns C to L.

The Excel result gives "LU" as the majority label.

The RapidMiner result with weighted vote checked is "LU".

How does RapidMiner handle a case like that?

How does RapidMiner sort equal distances?

Is something wrong with my data, or does RapidMiner order equal distances randomly?

Thank you in advance for your help.

**Contributor I**

Please...

**Unicorn**

So that we can reproduce what you observe and understand what's going on, could you please share:

- your process (XML)

- your dataset

Unfortunately, I have no exact answer to your question... But as a first approximation, with k = 5 and no weighted vote:

Among the four closest neighbours you have 2 "LT" and 2 "LU"...

...but for the fifth closest neighbour there are many candidates at the same distance from your test point (distance = 1).

My hypotheses about how RapidMiner makes the final choice of this fifth neighbour, and thus of the test point's label, are:

- the fifth neighbour is chosen randomly among the candidates (which are all at distance 1 from the test point).

- if the two labels have the same probability (here 50% "LT" / 50% "LU"), the first training point encountered in RapidMiner's internal loop over the dataset is chosen. In other words, it is equivalent to a random choice.

- among equivalent candidates, labels are ordered alphabetically, so "LT" is chosen instead of "LU".

- and finally, the most logical explanation in my view: "LT" is the majority label (and "LU" the minority) among the candidates for the fifth neighbour (all at distance 1 from the test point). So the logical conclusion is label = "LT" for the test point...
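The tie-breaking hypotheses above can be compared side by side. This is a minimal sketch with hypothetical labels (2 "LT" + 2 "LU" among the sure neighbours, and a made-up list of tied candidates at distance 1 in which "LT" is the majority); it does not show which strategy RapidMiner actually uses. The function names and strategy strings are hypothetical.

```python
import random
from collections import Counter

def pick_tied_label(candidates, strategy, seed=None):
    """Pick the label of the last neighbour from equally distant candidates.

    candidates: labels of the tied training rows, in dataset order.
    Each strategy mirrors one hypothesis; none is confirmed as RapidMiner's.
    """
    if strategy == "first_in_dataset":
        return candidates[0]
    if strategy == "alphabetical":
        return min(candidates)
    if strategy == "majority_of_candidates":
        return Counter(candidates).most_common(1)[0][0]
    if strategy == "random":
        return random.Random(seed).choice(candidates)
    raise ValueError(f"unknown strategy: {strategy}")

def predict(closest, candidates, strategy, seed=None):
    """Majority vote over the sure neighbours plus one tie-broken pick."""
    labels = closest + [pick_tied_label(candidates, strategy, seed)]
    return Counter(labels).most_common(1)[0][0]

closest = ["LT", "LU", "LT", "LU"]           # hypothetical: 2 LT + 2 LU
candidates = ["LU", "LT", "LT", "LU", "LT"]  # hypothetical tied rows at distance 1
for s in ("first_in_dataset", "alphabetical", "majority_of_candidates"):
    print(s, predict(closest, candidates, s))
# first_in_dataset LU
# alphabetical LT
# majority_of_candidates LT
```

Because the four sure neighbours are split evenly, the single tie-broken pick decides the label outright, so sharing the process and data is the only way to tell these hypotheses apart.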

Maybe some RapidMiner developers can clear up this mystery?

Thank you,

Regards,

Lionel
