
"The difference between using weighted vote and not using weighted vote learner"

Member Posts: 16 Maven
edited May 2019 in Help
I found that there is a k Nearest Neighbor learner in the group Learner.Supervised.Lazy.
It has a parameter named weighted vote, and I am not sure what the difference is between k-NN with weighted vote and k-NN without it.
Could you let me know what the difference is, and where I can find some information about it?
It seems that there is a class named WeightedObject which holds the weight. But how is the weight calculated?
I'd be very grateful if you could give me a hint.
Thanks a million.

Regards

Amy

Moderator, Employee, Member Posts: 295 RM Product Management
Hi Amy,

Well, the answer is pretty easy. The parameter specifies whether the distances of the nearest neighbors are taken into account in the voting during prediction. If it is disabled, every nearest neighbor has the same influence on the prediction. If it is enabled, neighbors with a lower distance to the example being predicted get a higher influence than those with a higher distance.
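To make the difference concrete, here is a minimal, self-contained sketch of the two voting modes. The class and method names (VoteDemo, unweightedVote, weightedVote) are my own illustration, not RapidMiner API; each neighbor is represented simply as a {distance, labelIndex} pair:

```java
import java.util.List;

public class VoteDemo {
    // Unweighted vote: every neighbour adds 1 to its label's count,
    // regardless of how far away it is.
    static double[] unweightedVote(List<double[]> neighbours, int numLabels) {
        double[] counter = new double[numLabels];
        for (double[] n : neighbours) {
            counter[(int) n[1]] += 1.0;
        }
        return counter;
    }

    // Distance-weighted vote: a neighbour with a lower distance
    // contributes a larger share to its label's count.
    static double[] weightedVote(List<double[]> neighbours, int numLabels) {
        double totalDistance = 0.0;
        for (double[] n : neighbours) {
            totalDistance += n[0];
        }
        if (totalDistance == 0) {
            totalDistance = 1; // all neighbours coincide with the query point
        }
        double[] counter = new double[numLabels];
        double norm = Math.max(neighbours.size() - 1, 1);
        for (double[] n : neighbours) {
            counter[(int) n[1]] += (1.0 - n[0] / totalDistance) / norm;
        }
        return counter;
    }

    public static void main(String[] args) {
        // three neighbours: {distance, label} - one very close label-0
        // neighbour, two distant label-1 neighbours
        List<double[]> neighbours = List.of(
                new double[]{0.1, 0}, new double[]{2.0, 1}, new double[]{2.5, 1});
        System.out.println(java.util.Arrays.toString(unweightedVote(neighbours, 2)));
        System.out.println(java.util.Arrays.toString(weightedVote(neighbours, 2)));
    }
}
```

With unweighted voting, label 1 wins 2:1 by simple majority; with weighted voting, the single very close label-0 neighbour gains a much larger share of the vote.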

Regards,
Tobias
Member Posts: 16 Maven
Hi Tobias,
Thank you so much for your kind reply. I have some idea of it now.
May I ask some further questions here?
I found this topic: http://rapid-i.com/rapidforum/index.php/topic,249.0.html. It discusses how the weight is implemented.
You talked about weighting by the distance; what about similarity measures that are not distances, like cosine similarity? How is the weight calculated then? What formula is used if the measure is not a distance but cosine similarity?

Thanks a million.

Amy
Moderator, Employee, Member Posts: 295 RM Product Management
Hi Amy,

Of course you can ask questions. That is the intention of this forum ...

The weight is calculated in the following lines in the class [tt]com.rapidminer.operator.learner.lazy.KNNClassificationModel[/tt]:
```java
// finding the next k neighbours and their distances
Collection<Tupel<Double, Integer>> neighbours =
        samples.getNearestValueDistances(k, values);
for (Tupel<Double, Integer> tupel : neighbours) {
    totalDistance += tupel.getFirst();
}
double totalSimilarity = 0.0d;
if (totalDistance == 0) {
    totalDistance = 1;
    totalSimilarity = k;
} else {
    totalSimilarity = Math.max(k - 1, 1);
}
// counting the frequency of the labels
for (Tupel<Double, Integer> tupel : neighbours) {
    counter[tupel.getSecond()] += (1d - tupel.getFirst() / totalDistance) / totalSimilarity;
}
```
The weight calculation is pretty straightforward and should be easy to follow from the source code: each neighbour contributes (1 - distance / totalDistance) / totalSimilarity to the counter of its label, so closer neighbours get larger shares. In principle, the weighting scheme is the same for every distance/divergence measure.
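Regarding the cosine-similarity question: the code above only ever sees distances, so a similarity measure first has to be turned into a distance-like value. A common convention (my assumption here, not something confirmed from the RapidMiner source) is to use 1 minus the cosine similarity, after which the same weighting formula applies unchanged:

```java
public class CosineWeight {
    // Plain cosine similarity between two equal-length vectors, in [-1, 1].
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Convert the similarity into a distance-like value in [0, 2]:
    // identical directions give 0, orthogonal vectors give 1.
    // This is one common convention, assumed here for illustration.
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - cosine(a, b);
    }
}
```

Once every similarity is mapped to such a distance, the voting weight for each neighbour is computed exactly as in the KNNClassificationModel snippet above.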

Kind regards,
Tobias