Clustering with GPS coordinates, but now also taking population into account?

CausalityvsCorr · May 2017

Earlier I posted a question about how to cluster buildings by their GPS coordinates. Based on the feedback, I managed to get clustering outputs that also make sense in practice. Of the many methods available in RapidMiner, k-means produced the most useful results.

Now I would like to extend the clustering by taking the population of those buildings into account. Not as a clustering attribute, but by defining a minimum and maximum population for the clusters, so that, for example, the average population across all clusters is between 300 and 500.

Is there any way to define this kind of process?

MartinLiebig · May 2017

Hi,

Have you considered using the population as a weight? This should yield something very similar.

~Martin

CausalityvsCorr · May 2017

Thank you for the reply.

I have not considered it, but I will test how weighting by population works in this specific situation.

To be exact, how should I proceed with "population as a weight", meaning what operator(s) should I use?

MartinLiebig · May 2017

Hi,

Use Set Role and set your population attribute to the role weight. Afterwards, make sure that the clustering algorithm you use supports weights.

Best,

Martin

Telcontar120 · May 2017

K-means clustering does support weights. However, I don't think k-means by itself will do what the original request asked for: as I understand it, k-means does nothing to ensure that the resulting clusters are the same size (whether weighted or unweighted). @mschmitz, am I missing something about the algorithm?

So if you want to constrain each cluster to have a minimum and maximum weighted size, how would you implement those constraints with k-means?

CausalityvsCorr · May 2017

Thank you for the feedback.

Regarding the weighting, I did not see any difference in the clustering results with weighting versus without it. I tested with k-means and k-means (kernel). I think in this case we are talking about sample weighting, not attribute weighting?

CausalityvsCorr · May 2017

I tend to agree with Telcontar120 that sample weighting with k-means clustering is not a fruitful way to "regulate" the clustering results. At least not in my special case, where the buildings are clustered by their GPS coordinates but in such a way that the population in each cluster is, on average, say 300.

Any proposals on how to proceed... or should I give up?

Telcontar120 · May 2017

I am not a clustering expert, so comments from others are welcome here @mschmitz

If you know the total population you have represented, and you know you want each cluster to be between 300 and 500, what does that imply about the total number of clusters you are looking for?
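As a rough illustration of that arithmetic: if the average cluster population has to fall between a lower and an upper bound, the total population fixes the range of feasible cluster counts. The function name and the example total of 12,000 below are made up for illustration, not taken from the actual data.

```python
def cluster_count_range(total_population, min_size=300, max_size=500):
    """Return (k_min, k_max): the cluster counts for which the average
    population per cluster lies between min_size and max_size."""
    # Fewest clusters: every cluster as full as allowed (ceiling division).
    k_min = -(-total_population // max_size)
    # Most clusters: every cluster as empty as allowed (floor division).
    k_max = total_population // min_size
    return k_min, k_max


# Hypothetical total of 12,000 residents across all buildings.
print(cluster_count_range(12000))  # (24, 40)
```

If `k_min` comes out larger than `k_max`, no cluster count satisfies both bounds for that total.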

It is possible that you may be able to approximate a solution using "DBSCAN with weights" from the Mannheim toolbox extension. That operator will at least interpret the weights as instance counts, although you will need to play with the epsilon parameter values to see whether you can get it to produce a set of clusters that satisfy your conditions.

Otherwise, you may need to program your own routine to do this (perhaps in Python?) since it is not really a conventional clustering problem. In this case, you basically want to impose constraints on weighted cluster size, iteratively generating clusters based on proximity that are neither too small nor too large. In theory, this would be something like k-means, but with the size constraints overriding the proximity metrics.
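A minimal sketch of what such a routine might look like, under several assumptions: it uses only the Python standard library, treats the coordinates as planar (real GPS data would need a projection or a haversine distance), and enforces only the maximum population per cluster, not the minimum. The name `capped_kmeans` and its parameters are hypothetical, and the greedy assignment can fail even when a valid solution exists.

```python
import math
import random


def capped_kmeans(points, pops, k, max_pop, iters=20, seed=0):
    """Greedy k-means variant: each building goes to the nearest centroid
    whose cluster still has room under max_pop."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k random buildings
    labels = [-1] * len(points)
    for _ in range(iters):
        load = [0] * k
        new_labels = [-1] * len(points)
        # Place the most populous buildings first; they are hardest to fit.
        order = sorted(range(len(points)), key=lambda i: -pops[i])
        for i in order:
            # Try centroids from nearest to farthest.
            ranked = sorted(range(k),
                            key=lambda c: math.dist(points[i], centroids[c]))
            for c in ranked:
                if load[c] + pops[i] <= max_pop:
                    new_labels[i] = c
                    load[c] += pops[i]
                    break
            else:
                raise ValueError("no cluster has room; raise k or max_pop")
        # Recompute each centroid as the population-weighted mean.
        for c in range(k):
            members = [i for i in range(len(points)) if new_labels[i] == c]
            w = sum(pops[i] for i in members)
            if w > 0:
                centroids[c] = (
                    sum(points[i][0] * pops[i] for i in members) / w,
                    sum(points[i][1] * pops[i] for i in members) / w,
                )
        if new_labels == labels:  # assignments stable, stop early
            break
        labels = new_labels
    return labels, load
```

Here `points` is a list of `(x, y)` tuples and `pops` the matching list of building populations. The function returns one cluster label per building and the total population per cluster. Choosing `k` from the feasible range implied by the total population makes it more likely that the clusters also end up above the minimum, but this sketch does not guarantee that.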

CausalityvsCorr · May 2017

Excellent thinking!

Thanks
