Clustering methods that are less sensitive to scale than others?
Hi,
I'm working with an unsupervised clustering problem for the first time, and had a conceptual question about how different algorithms handle attributes that are on different scales.
Given the different clustering methods and distance calculations available in RM, are some less sensitive than others to the attributes being on different scales (or normalized, but with different attribute weights applied)? That is, do they tend to give the same cluster results whether or not the attributes are normalized (or weighted)?
You probably can't get complete scale-insensitivity because you ultimately have to compute a distance measure of some kind, but in practice do certain approaches give more stable results when given unnormalized (or weighted-attribute) data?
Thanks,
Keith
Answers
You basically already said it yourself: all clustering techniques depend on a distance measure, and this distance metric in turn depends on whether the attributes are normalized and on whether they are weighted. So, to the best of my knowledge, the results of all clustering techniques depend heavily on these preprocessing steps. As a consequence, I would highly recommend normalizing before clustering and, if the attributes are not equally important, applying attribute weights as well.
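Not an RM process, but here is a minimal scikit-learn sketch (synthetic, made-up data; the attribute names are invented for illustration) of this effect. Without normalization, the attribute on the larger scale dominates the Euclidean distance and k-means essentially clusters on noise; after z-scoring, the informative attribute drives the result:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Two clusters that differ only in the second attribute; the first attribute
# is pure noise but lives on a much larger scale (tens of thousands vs. units).
income = rng.normal(50_000, 10_000, size=200)        # large-scale noise
score = np.concatenate([rng.normal(0, 1, 100),       # cluster A
                        rng.normal(5, 1, 100)])      # cluster B
X = np.column_stack([income, score])

raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
normed = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))

true = np.array([0] * 100 + [1] * 100)
# Cluster labels are arbitrary, so score both possible label assignments.
agree = lambda labels: max(np.mean(labels == true), np.mean(labels != true))
print(f"agreement with true clusters, raw data:        {agree(raw):.2f}")
print(f"agreement with true clusters, normalized data: {agree(normed):.2f}")
```

Typically the raw run lands near chance level while the normalized run recovers the clusters almost perfectly.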
If somebody has another view on this, I would also be interested in learning more about it.
Best regards,
Ralf
FWIW, some poking around online with the famous search engine did uncover a few papers that seem to be addressing the same (or similar) problems:
http://jorlin.scripts.mit.edu/docs/publications/115-ScaleInvariantClustering.pdf
http://www.robots.ox.ac.uk/~vgg/publications/papers/fitzgibbon02.pdf
http://www.autonlab.org/autonweb/14651/version/3/part/5/data/wong-efficient.pdf?branch=main&language=en
http://bioinformatics.oxfordjournals.org/cgi/reprint/19/7/818
http://www.gene-quantification.de/tichopad-bioinf-2006.pdf
http://dimacs.rutgers.edu/Workshops/Depth/meer.pdf
Thanks for the links to these research papers. The cited scale-invariant clustering techniques are only invariant to affine transformations, but not to normalization and arbitrary weighting of attributes, because these are non-affine data transformations.
Best regards,
Ralf
After consulting this geometry book, http://books.google.com/books?id=q49lhAzXTFEC, I see the following definition of an affine transformation: a map of the form x -> Ax + b, where A is an invertible matrix and b is a translation vector. So if I had two attributes X and Y, and transformed them using:
A = [ 1/sigmaX   0        ]
    [ 0          1/sigmaY ]

b = [ -meanX/sigmaX   -meanY/sigmaY ]
If I did my algebra right (never a sure thing), I think I get the normalized values: (X-meanX)/sigmaX and (Y-meanY)/sigmaY
Matrix A is invertible, having a nonzero determinant = 1/(sigmaX*sigmaY)
Similarly, attribute weights would seemingly be just a diagonal matrix A with no b vector, which would be invertible if all the weights were nonzero.
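As a quick numeric sanity check (a small numpy sketch with made-up numbers), the affine map above does reproduce the column-wise z-scores:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal([10.0, -3.0], [4.0, 0.5], size=(6, 2))   # columns: X and Y
mean, sigma = X.mean(axis=0), X.std(axis=0)

A = np.diag(1.0 / sigma)        # [[1/sigmaX, 0], [0, 1/sigmaY]]
b = -mean / sigma               # [-meanX/sigmaX, -meanY/sigmaY]

affine = X @ A.T + b            # apply x -> Ax + b to every row
zscore = (X - mean) / sigma     # standard column-wise normalization
print(np.allclose(affine, zscore))   # True: the two agree
```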
So my naive guess would have been that normalization is affine. But I trust your knowledge much more than my own, and I'm clearly in over my head. Where did I go astray?
Thanks,
Keith
Thanks for pointing this out. The cited definition of affine transformations and your example are both correct. Hence both normalization and attribute weighting are indeed affine transformations, which makes the cited scale-invariant clustering techniques well worth a look.
Sorry for any confusion my previous reply may have caused; I had an error in my thinking. I guess I have done too much text mining lately.
In text mining, examples (document vectors, i.e. rows in the data table) are typically normalized to unit length, i.e. the normalization there is row-wise. Usually, however, normalization refers to one attribute (column, variable) at a time over all examples (rows), i.e. in most non-text-mining cases the normalization is column-wise.

While the standard column-wise (attribute-wise) normalization is an affine transformation, the row-wise normalization of document vectors to unit length is a non-affine transformation of the data. In other words: in the standard column-wise normalization, every example (row) is re-scaled by the same normalizing matrix A, whereas in the row-wise normalization each row is re-scaled by its own individual factor, which does not preserve collinearities across data points (document vectors) and hence is not an affine transformation. So, I mixed up a few things in my head...
Sorry for that.
So, the publications you cited are really worth a look. Thanks again for the links and for pointing this out.
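For concreteness, here is a small numpy sketch (made-up points) of the difference: three collinear points stay collinear under column-wise z-scoring, because every row goes through the same affine map, but not under row-wise unit-length scaling, where each row gets its own factor:

```python
import numpy as np

P = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 4.0]])      # collinear points: y = x + 1

def collinear(pts):
    # Three points are collinear iff the triangle they span has zero area.
    v1, v2 = pts[1] - pts[0], pts[2] - pts[0]
    return np.isclose(v1[0] * v2[1] - v1[1] * v2[0], 0.0)

colwise = (P - P.mean(axis=0)) / P.std(axis=0)             # one shared map
rowwise = P / np.linalg.norm(P, axis=1, keepdims=True)     # per-row factor

print(collinear(colwise))   # True  -> affine, collinearity preserved
print(collinear(rowwise))   # False -> non-affine, collinearity broken
```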
Best regards,
Ralf
Thanks,
Keith