# Clustering methods that are less sensitive to scale than others?

Hi,

I'm working with an unsupervised clustering problem for the first time, and had a conceptual question about how different algorithms handle attributes that are on different scales.

Given the different clustering methods and distance calculations available in RM, are there some that are less sensitive than others to the attributes being on different scales (or normalized, but with applying different attribute weights)? That is, they tend to give the same cluster results whether the attributes are normalized (or weighted) or not.

You probably can't get complete scale-insensitivity because you ultimately have to compute a distance measure of some kind, but in practice do certain approaches give more stable results when given unnormalized (or weighted-attribute) data?

Thanks,

Keith

I'm working with an unsupervised clustering problem for the first time, and had a conceptual question about how different algorithms handle attributes that are on different scales.

Given the different clustering methods and distance calculations available in RM, are there some that are less sensitive than others to the attributes being on different scales (or normalized, but with applying different attribute weights)? That is, they tend to give the same cluster results whether the attributes are normalized (or weighted) or not.

You probably can't get complete scale-insensitivity because you ultimately have to compute a distance measure of some kind, but in practice do certain approaches give more stable results when given unnormalized (or weighted-attribute) data?

Thanks,

Keith

Tagged:

0

## Answers

68RM Founderyou basically already said it yourself: All clustering techniques depend on a distance measure and this distance metric again depends on whether the attributes are normalized or not and on whether they are weighted or not. So, to the best of my knowledge, the results of

allclustering techniques heavily depend on these data preprocessing steps. As a consequence, I would highly recommend to use normalization before clustering and, if the attributes are not equally important, I would also recommend attribute weighting.If somebody has another view on this, I would also be interested in learning more about it.

Best regards,

Ralf

157GuruFWIW, some poking around online with the famous search engine did uncover a few papers that seem to be addressing the same (or similar) problems:

http://jorlin.scripts.mit.edu/docs/publications/115-ScaleInvariantClustering.pdf

http://www.robots.ox.ac.uk/~vgg/publications/papers/fitzgibbon02.pdf

http://www.autonlab.org/autonweb/14651/version/3/part/5/data/wong-efficient.pdf?branch=main&;language=en

http://bioinformatics.oxfordjournals.org/cgi/reprint/19/7/818

http://www.gene-quantification.de/tichopad-bioinf-2006.pdf

http://dimacs.rutgers.edu/Workshops/Depth/meer.pdf

68RM Founderthanks for the links to these research papers: The cited scale-invariant clustering techniques are only invariant to affine transformations, but not invariant to normalization and arbitrary weighitng of attributes, because these are non-affine data transformations.

Best regards,

Ralf

157GuruAfter consulting this geometry book, http://books.google.com/books?id=q49lhAzXTFEC, I see the following definition of an affine transformation: So if I had two attributes X and Y, and transformed them using:

A = [ 1/sigmaX 0

0 1/sigmaY]

b = [-meanX/sigmaX -meanY/sigmaY]

If I did my algebra right (never a sure thing), I think I get the normalized values: (X-meanX)/sigmaX and (Y-meanY)/sigmaY

Matrix A is invertible, having a nonzero determinant = 1/(sigmaX*sigmaY)

Similarly, attribute weights would seemingly be just a diagonal matrix A with no b vector, which would be invertible if all the weights were nonzero.

So my naive guess would have been that normalization is affine. But I trust your knowledge much more than my own, and I'm clearly in over my head. Where did I go astray?

Thanks,

Keith

68RM Founderthanks for pointing this out. The cited definition of affine transformations as well as your example are both correct. Hence both, normalization and attribute weighting, are indeed affine transformations. This makes the cited scaling invariant clustering techniques quite worth-while to look at.

Sorry for the confusion my previous reply may have caused. I had an error in my thinking. I guess I have done to much text mining lately.

In text mining, examples (document vectors, i.e. rows in the data table) are typically normalized to unit lenght, i.e. the normalization there is row-wise, while usually normalizations refer to one attribute (column, variable) at a time over all examples (rows), i.e. in most non-text-mining cases, the normalization is column-wise. While the standard column-wise (attribute-wise) normalization is an affine transformation, the row-wise document length normalization to unit length is a non-affine transformation of the data. Or in other words: For the standard column-wise normalization, each example (row) is re-scaled with the same normalizing matrix A, while in the row-based normalization, each row is re-scaled with its own individual factor, which does not preserve co-linearities across data points (document vectors) and hence is no affine transformation. So, I mixed up a few things in my head...

Sorry for that.

So, the publications you cited are really worth a look. Thanks again for the links and for pointing this out.

Best regards,

Ralf

157GuruThanks,

Keith