Hello friends of the community, I have a question about distances: why are the average distances negative (Avg. within centroid distance for each cluster)? Isn't it a property of distances that they are never negative?
Hello, I wanted to ask whether someone could look into the issue of the negative distances. Is it a bug? Please confirm, because I need to decide whether to keep using this tool to measure distances; it is for an academic project and my deadlines are approaching. Regards
The problem is not in the process but in the data: when there is a null or a zero, it yields seemingly negative distances. I will switch tools just in case XD. Regards
I wanted to send the sample data, but I cannot attach the Excel file, and I cannot paste the data in vector form either because it exceeds the message size limit. Is there an alternative way to send the data? Regards
Answers
Can you please post a process so that we can reproduce this?
Best,
Nils
I could reproduce your negative distances with the Performance (Cluster Distance Performance) operator. This is not a bug, though. It is meant to work this way: the distances are multiplied by -1 so that they can be used for optimization. If you want to see the positive distances, select the 'maximize' parameter. But you should not use the resulting performance objects for optimization if you have selected this parameter!
The reason for multiplying by -1: the Performance (Cluster Distance Performance) operator calculates the average within-centroid distance. The smaller the distances are, the better the clustering works (in theory). But our optimization operators always try to maximize the performance of an algorithm. This means that if you don't multiply by -1, the optimization algorithm would always prefer cluster results with a higher average within-centroid distance.
Best,
Nils
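To make the sign convention above concrete, here is a minimal Python sketch. It is not RapidMiner's implementation: the function names, the plain Euclidean distance, and the toy data are assumptions made only for illustration, and the `maximize` flag merely mimics the behavior described for the 'maximize' parameter.

```python
import math


def avg_within_centroid_distance(clusters):
    """Mean Euclidean distance from every point to the centroid of its own cluster."""
    total, count = 0.0, 0
    for points in clusters:
        dim = len(points[0])
        centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
        for p in points:
            total += math.dist(p, centroid)
            count += 1
    return total / count


def performance(clusters, maximize=False):
    """Hypothetical stand-in for the reported value.

    By default the distance is multiplied by -1, so "higher is better" holds.
    With maximize=True the raw positive distance is returned instead.
    """
    d = avg_within_centroid_distance(clusters)
    return d if maximize else -d


# Two ways to split the same four points into two clusters (toy data).
tight = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
loose = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
candidates = {"tight": tight, "loose": loose}

# An optimizer that always maximizes picks the compact clustering,
# because -1.0 is greater than -5.0.
best = max(candidates, key=lambda name: performance(candidates[name]))

# With positive values the same optimizer would prefer the worse clustering.
best_if_positive = max(
    candidates, key=lambda name: performance(candidates[name], maximize=True)
)

print(best, best_if_positive)
```

In this sketch the negative values are just the ordinary non-negative distances with the sign flipped, which is why a maximizing optimizer ends up choosing the clustering with the smaller distances.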
Now I understand.
Regards