Clustering - how to determine
Hello,
Could anyone point me to a way of doing unsupervised clustering when I am not sure how many clusters are present in the data (i.e. how to determine k for, e.g., k-means)?
Or is the best way to determine k to do it visually (I have 13 attributes and the data might be quite noisy)?
Thanks for any suggestions,
radone
Answers
Clustering always requires a human to look at and interpret the results, but the various cluster performance operators can lend a helping hand.
Here's an example showing the Cluster Distance Performance operator producing measures for "average within centroid distance" and Davies-Bouldin as k is varied in a k-means clustering experiment. The example data in this case contains 1000 examples that are grouped into 8 neat clusters in a three-dimensional space. At the end of the experiment, look at the Log tab in the results and plot the two recorded measures as a function of k; you should see that something interesting happens at k = 8.
Fortunately, this corresponds to the "correct" answer, but in real life it won't be as easy. The characteristics of the input data, such as cluster shape, noise and data size, will determine which clustering approach to use and which performance measure is appropriate. Guidance is hard to give because a) it depends on the data and b) I probably don't know.
regards,
Andrew
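For anyone who wants to try the same idea outside RapidMiner, here is a rough Python sketch of the k sweep described in the answer above. It is not the original process: Python is swapped in for the RapidMiner operators, all function names (make_blobs, kmeans, sweep_k and so on) are made up for illustration, and the two measures use the plain textbook definitions with Euclidean distance, which may differ in sign or scaling from what the Cluster Distance Performance operator reports. The data is a small synthetic stand-in (200 points in 8 blobs at the corners of a cube), not the 1000-example set from the answer.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def members_of(points, labels, j):
    """All points currently assigned to cluster j."""
    return [p for p, lab in zip(points, labels) if lab == j]


def make_blobs(per_cluster=25, spread=0.5, seed=0):
    """Synthetic stand-in data: 8 Gaussian blobs at the corners of a cube."""
    rng = random.Random(seed)
    corners = [(x, y, z) for x in (0, 10) for y in (0, 10) for z in (0, 10)]
    return [tuple(rng.gauss(c, spread) for c in corner)
            for corner in corners for _ in range(per_cluster)]


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm from the given starting centroids."""
    centroids = list(centroids)
    k = len(centroids)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for j in range(k):
            members = members_of(points, labels, j)
            if members:
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def avg_within_centroid_distance(points, centroids, labels):
    """Mean distance from each point to its own centroid (lower is tighter)."""
    return sum(dist(p, centroids[lab])
               for p, lab in zip(points, labels)) / len(points)


def davies_bouldin(points, centroids, labels):
    """Textbook Davies-Bouldin index (lower is better).

    Assumes no two centroids coincide, otherwise the ratio is undefined.
    """
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = members_of(points, labels, j)
        scatter.append(sum(dist(p, centroids[j]) for p in members)
                       / len(members) if members else 0.0)
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k


def sweep_k(points, k_values, restarts=3, seed=0):
    """For each k, keep the tightest of a few random restarts."""
    rng = random.Random(seed)
    rows = []
    for k in k_values:
        best = None
        for _ in range(restarts):
            centroids, labels = kmeans(points, rng.sample(points, k))
            within = avg_within_centroid_distance(points, centroids, labels)
            if best is None or within < best[0]:
                best = (within, davies_bouldin(points, centroids, labels))
        rows.append((k, best[0], best[1]))
    return rows


if __name__ == "__main__":
    for k, within, db in sweep_k(make_blobs(), range(2, 13)):
        print(f"k={k:2d}  avg_within={within:7.3f}  davies_bouldin={db:6.3f}")
```

Plot the two printed columns against k and look for an elbow in the within-centroid distance and a dip in Davies-Bouldin. On clean, well-separated data like this the two should usually agree; on 13 noisy attributes the picture will probably be much less clear, and since k-means depends on its random starting points, more restarts may be needed before the curves are stable.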