"Item Distribution Performance Bug?"

Fabian_Wewers · April 2011

Hello everybody,

I am working on a small project and found something puzzling. In my opinion, the Item Distribution Performance (Cluster Performance) should be one if the whole data set belongs to exactly one cluster (out of n) and zero if the data is uniformly distributed.
Unfortunately, my observations with the GiniCoefficient gave a different result.
To find out what was going wrong, I read the GiniCoefficient.java file and tried to reimplement the function in OpenOffice Calc.

The example I used to test my implementation:
Three clusters with C1 = 108, C2 = 247 and C3 = 44 members.
Both my Gini coefficient and RapidMiner's came out as 0.997, but the clusters are nearly uniformly distributed, so the result should not be close to one!
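For comparison, here is a minimal Java sketch of the behaviour described at the top of the post. It assumes one common definition of the Gini coefficient (the mean absolute difference divided by twice the mean), rescaled by k/(k-1) so that it is exactly one when a single cluster holds everything and zero when all clusters are equal. GiniSketch and normalizedGini are hypothetical names; this is not the code in RapidMiner's GiniCoefficient.java.

```java
public class GiniSketch {

    /**
     * Normalized Gini coefficient of cluster sizes (hypothetical helper, not RapidMiner's API).
     * Mean-absolute-difference definition, rescaled by k/(k-1) so that
     * all items in one cluster gives 1.0 and equal cluster sizes give 0.0.
     */
    static double normalizedGini(double[] sizes) {
        int k = sizes.length;            // number of clusters, not number of rows
        if (k < 2) {
            throw new IllegalArgumentException("need at least two clusters");
        }
        double sum = 0.0;
        for (double s : sizes) {
            sum += s;
        }
        if (sum == 0.0) {
            throw new IllegalArgumentException("total size must be positive");
        }
        double mean = sum / k;           // average cluster size

        // Sum of |x_i - x_j| over all ordered pairs (i, j).
        double absDiff = 0.0;
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                absDiff += Math.abs(sizes[i] - sizes[j]);
            }
        }
        double gini = absDiff / (2.0 * k * k * mean);
        return gini * k / (k - 1.0);     // rescale so the maximum is exactly 1
    }

    public static void main(String[] args) {
        System.out.println(normalizedGini(new double[] {399, 0, 0}));     // one cluster holds everything
        System.out.println(normalizedGini(new double[] {133, 133, 133})); // equal cluster sizes
        System.out.println(normalizedGini(new double[] {108, 247, 44}));  // the example sizes
    }
}
```

With this definition the example sizes {108, 247, 44} give roughly 0.51. It is not meant to reproduce either 0.997 or 0.491, since the exact formula in GiniCoefficient.java is not quoted in the thread; the point is only that a mean taken over the clusters keeps the value between zero and one.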
I then tried to find the error and compared the Gini function to the squared error function. I think the mean in GiniCoefficient.java is computed the wrong way, because it always comes out as one:

double mean = sum / n;

The n in line 39 should be x.length; otherwise n is 399 (the number of rows) in my example and not 3 (the number of clusters). After changing n to 3, my result for the given example is 0.491.

double mean = sum / x.length;
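A quick standalone check of the mean argument, assuming (as described above) that n in the original code is the number of example rows. MeanDenominatorDemo and its methods are hypothetical and do not call RapidMiner. Because every row belongs to exactly one cluster, the cluster sizes always add up to the row count, so dividing by the row count always yields 1.0.

```java
public class MeanDenominatorDemo {

    // Total of all cluster sizes; equals the number of rows when every row is assigned to one cluster.
    static double total(double[] clusterSizes) {
        double sum = 0.0;
        for (double size : clusterSizes) {
            sum += size;
        }
        return sum;
    }

    // Mean computed with n = number of rows (the suspected bug): 1.0 whenever sizes sum to the row count.
    static double meanOverRows(double[] clusterSizes, int numberOfRows) {
        return total(clusterSizes) / numberOfRows;
    }

    // Mean computed with x.length = number of clusters (the proposed fix): the average cluster size.
    static double meanOverClusters(double[] clusterSizes) {
        return total(clusterSizes) / clusterSizes.length;
    }

    public static void main(String[] args) {
        double[] sizes = {108, 247, 44};   // C1, C2, C3 from the example
        int rows = 399;                    // 108 + 247 + 44
        System.out.println(meanOverRows(sizes, rows));   // prints 1.0
        System.out.println(meanOverClusters(sizes));     // prints 133.0
    }
}
```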

Can anybody else confirm my test? I hope that I have made a mistake :-).

Greetings

Fabian

land · April 2011

Hi Fabian,

I hope so, too, but nevertheless, please post a bug report on bugs.rapid-i.com. We will check it as soon as possible.

Greetings,
Sebastian

Altair RapidMiner Community