"Item Distribution Performance Bug?"
I am working on a small project and found something mysterious. In my opinion, the Item Distribution Performance (Cluster Performance) should be one if the whole data set belongs to exactly one cluster (out of n) and zero if the data is uniformly distributed across the clusters.
Unfortunately, my observations with the GiniCoefficient gave a different result.
To find out what was going wrong, I read the GiniCoefficient.java file and reimplemented the function in OpenOffice Calc.
My example to test the functionality of my implementation:
Three clusters with C1=108, C2=247, and C3=44 members.
My Gini coefficient and the one from RapidMiner both came out as 0.997, but the clusters are nearly uniformly distributed, so the result should not be close to one!
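For comparison, here is a minimal sketch of the textbook Gini coefficient computed over the cluster sizes. It is not the code from GiniCoefficient.java, and the class and method names (`GiniSketch`, `gini`, `normalizedGini`) are made up for illustration. RapidMiner may use a different definition, so the values need not match either implementation.

```java
import java.util.Arrays;

public class GiniSketch {

    /** Textbook Gini coefficient of non-negative values (rank-weighted form). */
    public static double gini(double[] sizes) {
        double[] x = sizes.clone();
        Arrays.sort(x);                      // ascending order
        int k = x.length;                    // number of clusters, not number of rows
        double sum = 0.0;
        double weighted = 0.0;
        for (int i = 0; i < k; i++) {
            sum += x[i];
            weighted += (i + 1) * x[i];      // each value weighted by its 1-based rank
        }
        if (sum == 0.0) {
            return 0.0;                      // no members at all: treat as uniform
        }
        return 2.0 * weighted / (k * sum) - (k + 1.0) / k;
    }

    /** Rescaled by k / (k - 1) so that "everything in one cluster" gives 1. */
    public static double normalizedGini(double[] sizes) {
        int k = sizes.length;
        if (k < 2) {
            return 0.0;
        }
        return gini(sizes) * k / (k - 1.0);
    }

    public static void main(String[] args) {
        System.out.println(gini(new double[] {108, 247, 44}));          // roughly 0.339
        System.out.println(gini(new double[] {133, 133, 133}));         // about 0 (uniform)
        System.out.println(normalizedGini(new double[] {0, 0, 399}));   // about 1 (one cluster)
    }
}
```

Under this definition, the plain coefficient for a single occupied cluster is (k - 1) / k rather than one, which is why the sketch also includes the rescaled variant.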
I then tried to find the error and compared the Gini function to the Squared Error function. I think that the mean in GiniCoefficient.java is computed incorrectly, because it always becomes one:
double mean = sum / n;
The n in line 39 should be x.length; otherwise n is 399 (the number of rows) in my example and not 3 (the number of clusters), so the mean is 399 / 399 = 1. After changing n to 3, my result for the given example is 0.491.
double mean = sum / x.length;
Can anybody else confirm my test? I hope that I have made a mistake :-).
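The effect of the divisor can be checked in isolation. The sketch below is a standalone illustration with hypothetical names (`MeanCheck`, `mean`), not an excerpt from GiniCoefficient.java. It only shows that dividing the summed cluster sizes by the number of rows always gives one, while dividing by the number of clusters gives the average cluster size.

```java
public class MeanCheck {

    /** Sum of the values divided by the given count. */
    public static double mean(double[] x, int n) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / n;
    }

    public static void main(String[] args) {
        double[] clusterSizes = {108, 247, 44};   // example from this post
        int rows = 399;                           // 108 + 247 + 44

        // Dividing by the number of rows: the sizes sum to the row count, so this is 1.
        System.out.println(mean(clusterSizes, rows));                 // prints 1.0

        // Dividing by the number of clusters gives the average cluster size.
        System.out.println(mean(clusterSizes, clusterSizes.length));  // prints 133.0
    }
}
```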