Clustering methods that are less sensitive to scale than others?
Hi,
I'm working with an unsupervised clustering problem for the first time, and had a conceptual question about how different algorithms handle attributes that are on different scales.
Given the different clustering methods and distance calculations available in RM, are some less sensitive than others to the attributes being on different scales (or normalized, but with different attribute weights applied)? That is, do they tend to give the same cluster results whether or not the attributes are normalized (or weighted)?
You probably can't get complete scale-insensitivity because you ultimately have to compute a distance measure of some kind, but in practice do certain approaches give more stable results when given unnormalized (or weighted-attribute) data?
Thanks,
Keith
Answers
You basically already said it yourself: all clustering techniques depend on a distance measure, and this distance metric in turn depends on whether the attributes are normalized and on whether they are weighted. So, to the best of my knowledge, the results of all clustering techniques depend heavily on these preprocessing steps. As a consequence, I would highly recommend normalizing before clustering and, if the attributes are not equally important, applying attribute weights as well.
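Not an RM process, but here is a minimal scikit-learn sketch (synthetic, made-up data; the attribute names are invented for illustration) of this effect. Without normalization, the attribute on the larger scale dominates the Euclidean distance and k-means essentially clusters on noise; after z-scoring, the informative attribute drives the result:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Two clusters that differ only in the second attribute; the first attribute
# is pure noise but lives on a much larger scale (tens of thousands vs. units).
income = rng.normal(50_000, 10_000, size=200)        # large-scale noise
score = np.concatenate([rng.normal(0, 1, 100),       # cluster A
                        rng.normal(5, 1, 100)])      # cluster B
X = np.column_stack([income, score])

raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
normed = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X))

true = np.array([0] * 100 + [1] * 100)
# Cluster labels are arbitrary, so score both possible label assignments.
agree = lambda labels: max(np.mean(labels == true), np.mean(labels != true))
print(f"agreement with true clusters, raw data:        {agree(raw):.2f}")
print(f"agreement with true clusters, normalized data: {agree(normed):.2f}")
```

Typically the raw run lands near chance level while the normalized run recovers the clusters almost perfectly.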
If somebody has another view on this, I would also be interested in learning more about it.
Best regards,
Ralf
FWIW, some poking around online with the famous search engine did uncover a few papers that seem to be addressing the same (or similar) problems:
http://jorlin.scripts.mit.edu/docs/publications/115-ScaleInvariantClustering.pdf
http://www.robots.ox.ac.uk/~vgg/publications/papers/fitzgibbon02.pdf
http://www.autonlab.org/autonweb/14651/version/3/part/5/data/wong-efficient.pdf?branch=main&language=en
http://bioinformatics.oxfordjournals.org/cgi/reprint/19/7/818
http://www.gene-quantification.de/tichopad-bioinf-2006.pdf
http://dimacs.rutgers.edu/Workshops/Depth/meer.pdf
Thanks for the links to these research papers. The cited scale-invariant clustering techniques are only invariant to affine transformations, but not to normalization and arbitrary weighting of attributes, because these are non-affine data transformations.
Best regards,
Ralf
After consulting this geometry book, http://books.google.com/books?id=q49lhAzXTFEC, I see the following definition of an affine transformation: a map of the form x -> Ax + b, where A is an invertible matrix and b is a translation vector. So if I had two attributes X and Y, and transformed them using:
A = [ 1/sigmaX   0        ]
    [ 0          1/sigmaY ]

b = [ -meanX/sigmaX   -meanY/sigmaY ]
If I did my algebra right (never a sure thing), I think I get the normalized values: (X-meanX)/sigmaX and (Y-meanY)/sigmaY
Matrix A is invertible, having a nonzero determinant = 1/(sigmaX*sigmaY)
Similarly, attribute weights would seemingly be just a diagonal matrix A with no b vector, which would be invertible if all the weights were nonzero.
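As a quick numeric sanity check (a small numpy sketch with made-up numbers), the affine map above does reproduce the column-wise z-scores:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal([10.0, -3.0], [4.0, 0.5], size=(6, 2))   # columns: X and Y
mean, sigma = X.mean(axis=0), X.std(axis=0)

A = np.diag(1.0 / sigma)        # [[1/sigmaX, 0], [0, 1/sigmaY]]
b = -mean / sigma               # [-meanX/sigmaX, -meanY/sigmaY]

affine = X @ A.T + b            # apply x -> Ax + b to every row
zscore = (X - mean) / sigma     # standard column-wise normalization
print(np.allclose(affine, zscore))   # True: the two agree
```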
So my naive guess would have been that normalization is affine. But I trust your knowledge much more than my own, and I'm clearly in over my head. Where did I go astray?
Thanks,
Keith
Thanks for pointing this out. The cited definition of affine transformations and your example are both correct. Hence both normalization and attribute weighting are indeed affine transformations, which makes the cited scale-invariant clustering techniques well worth a look.
Sorry for any confusion my previous reply may have caused; I had an error in my thinking. I guess I have done too much text mining lately.
In text mining, examples (document vectors, i.e. rows in the data table) are typically normalized to unit length, i.e. the normalization there is row-wise. Usually, however, normalization refers to one attribute (column, variable) at a time over all examples (rows), i.e. in most non-text-mining cases the normalization is column-wise.

While the standard column-wise (attribute-wise) normalization is an affine transformation, the row-wise normalization of document vectors to unit length is a non-affine transformation of the data. In other words: in the standard column-wise normalization, every example (row) is re-scaled by the same normalizing matrix A, whereas in the row-wise normalization each row is re-scaled by its own individual factor, which does not preserve collinearities across data points (document vectors) and hence is not an affine transformation. So, I mixed up a few things in my head...
Sorry for that.
So, the publications you cited are really worth a look. Thanks again for the links and for pointing this out.
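For concreteness, here is a small numpy sketch (made-up points) of the difference: three collinear points stay collinear under column-wise z-scoring, because every row goes through the same affine map, but not under row-wise unit-length scaling, where each row gets its own factor:

```python
import numpy as np

P = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 4.0]])      # collinear points: y = x + 1

def collinear(pts):
    # Three points are collinear iff the triangle they span has zero area.
    v1, v2 = pts[1] - pts[0], pts[2] - pts[0]
    return np.isclose(v1[0] * v2[1] - v1[1] * v2[0], 0.0)

colwise = (P - P.mean(axis=0)) / P.std(axis=0)             # one shared map
rowwise = P / np.linalg.norm(P, axis=1, keepdims=True)     # per-row factor

print(collinear(colwise))   # True  -> affine, collinearity preserved
print(collinear(rowwise))   # False -> non-affine, collinearity broken
```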
Best regards,
Ralf
Thanks,
Keith