How is Jaccard / Dice similarity defined for numerical variables?

Fred12 Member Posts: 344 Unicorn
edited September 2019 in Help


As stated here: http://www.stata.com/manuals13/mvmeasure_option.pdf

Jaccard is TP/(TP+FP+FN), which, as far as I can tell, is defined for binary variables.
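For reference, the binary definition quoted above can be sketched in a few lines of Python. The function name `jaccard_binary` and the convention of returning 0.0 when both vectors are all zeros are assumptions made for this illustration, not something taken from the Stata manual or from RapidMiner.

```python
def jaccard_binary(x, y):
    """Jaccard similarity TP / (TP + FP + FN) for two equal-length 0/1 vectors."""
    # TP: both 1; FP: only y is 1; FN: only x is 1. Positions where both are 0 are ignored.
    tp = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    fp = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    fn = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    denom = tp + fp + fn
    # Assumed convention: two all-zero vectors get similarity 0.0 (the ratio is undefined).
    return tp / denom if denom else 0.0


# One shared 1, one 1 only in x, one 1 only in y: 1 / (1 + 1 + 1)
print(jaccard_binary([1, 1, 0, 0], [1, 0, 1, 0]))
```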

But how is it defined for numerical values? It can be chosen, e.g., as a numerical distance measure in the k-NN operator.


And similarly, how is Dice similarity defined for numerical values?

edit: I found the implementation here: https://github.com/rapidminer/rapidminer-studio/tree/master/src/main/java/com/rapidminer/tools/math/similarity/numerical


edit2: ok, for Dice it seems it's simply 2 * x*y / (x + y),

where x and y are two vectors with attributes x_i and y_i. In the code this is

2 * wxy / (wx + wy);

where wxy is the sum of the products of the corresponding attributes of the two vectors (sum of x_i * y_i),

and wx, wy are just the sums of the attribute values of x and y respectively...
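To make the formula concrete, here is a minimal Python sketch of the computation as described in this post. It is a re-implementation from the description only, not a copy of the RapidMiner source; the function name `dice_sum_based` is made up for illustration, and a zero denominator is deliberately left unhandled.

```python
def dice_sum_based(x, y):
    """Dice similarity as described in the post: 2 * wxy / (wx + wy)."""
    wxy = sum(a * b for a, b in zip(x, y))  # sum of products x_i * y_i
    wx = sum(x)                             # plain sum of x's attribute values
    wy = sum(y)                             # plain sum of y's attribute values
    return 2 * wxy / (wx + wy)              # raises ZeroDivisionError if wx + wy == 0


# For 0/1 vectors, wxy counts the shared 1s and wx, wy count the 1s in each vector,
# so this reduces to the usual set-based Dice coefficient: 2 * 1 / (2 + 2)
print(dice_sum_based([1, 0, 1], [1, 1, 0]))
```

For binary input the plain sums equal the sums of squares (0 and 1 are their own squares), which is why the formula behaves sensibly there but not necessarily for general numeric values.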


Looks like a weird distance measure to me; I don't know if it makes a lot of sense...



    amei Member Posts: 1 Newbie
    With this definition, both Jaccard and Dice can give a lower similarity for identical vectors than for different vectors: [1,0] comes out as more similar to [2,0] than to [1,0] itself.
    It looks like a bug: the computation for the nominal similarity is used for numerics. The correct definition of numerical Dice similarity would be 2 * |x · y| / (|x|^2 + |y|^2), where x · y is the dot product and |x| is the Euclidean norm.
    You can apply the numerical definition to binary vectors, but not vice versa.
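The [1,0] versus [2,0] example can be checked with a short Python sketch that compares the sum-based formula from the original post with the norm-based definition given in this reply. Both function names are hypothetical and the code is written from the formulas in this thread, not taken from RapidMiner.

```python
def dice_sum_based(x, y):
    """Formula from the original post: 2 * sum(x_i * y_i) / (sum(x_i) + sum(y_i))."""
    wxy = sum(a * b for a, b in zip(x, y))
    return 2 * wxy / (sum(x) + sum(y))


def dice_norm_based(x, y):
    """Definition from this reply: 2 * |x . y| / (|x|^2 + |y|^2)."""
    dot = sum(a * b for a, b in zip(x, y))
    sq_x = sum(a * a for a in x)  # squared Euclidean norm of x
    sq_y = sum(b * b for b in y)  # squared Euclidean norm of y
    return 2 * abs(dot) / (sq_x + sq_y)


a, b = [1, 0], [2, 0]
print(dice_sum_based(a, b), dice_sum_based(a, a))    # about 1.33 vs 1.0: the different vector scores higher
print(dice_norm_based(a, b), dice_norm_based(a, a))  # 0.8 vs 1.0: the identical vector scores highest
```

On 0/1 vectors the two functions agree, since each value equals its own square; they only diverge once attribute values other than 0 and 1 appear.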