"unsupervised cluster evaluation"
nguyenxuanhau
Member Posts: 22 Contributor II
Hi!
Can I compare unsupervised cluster evaluations (cluster model evaluations) with each other on unlabeled data in RM?
What do I need to do to compare unsupervised cluster evaluations (cluster model evaluations) with each other on unlabeled data in RM?
Best regards
Answers
If you want to compare how closely two cluster outcomes match each other, you can simply rename the first cluster attribute and assign it the role "label" before performing the second one. If you then set the role of the second cluster attribute to "prediction", you can use the standard accuracy measure to quantify the agreement.
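Outside RapidMiner, the same comparison can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name matched_accuracy and the sample id lists are made up for the example. Because cluster ids are arbitrary, the sketch first searches for the renaming of the second clustering's ids that agrees best with the first one (brute force, so only sensible for a handful of clusters) and then reports plain accuracy.

```python
from itertools import permutations


def matched_accuracy(first, second):
    """Share of rows on which two clusterings agree, after renaming the
    ids of `second` so that they match `first` as well as possible."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty id lists of equal length")
    ids_first = sorted(set(first))
    ids_second = sorted(set(second))
    # Pad with None so every id in `second` has something to map to,
    # even when `second` contains more clusters than `first`.
    targets = ids_first + [None] * max(0, len(ids_second) - len(ids_first))
    best_hits = 0
    for perm in permutations(targets, len(ids_second)):
        mapping = dict(zip(ids_second, perm))
        hits = sum(1 for a, b in zip(first, second) if mapping[b] == a)
        best_hits = max(best_hits, hits)
    return best_hits / len(first)


# Hypothetical cluster ids for the same eight rows from two runs.
first_run = [0, 0, 0, 1, 1, 2, 2, 2]   # plays the role of "label"
second_run = [2, 2, 2, 0, 0, 1, 1, 0]  # plays the role of "prediction"
print(matched_accuracy(first_run, second_run))
```

Without the renaming step, two identical partitions whose ids happen to be numbered differently would score as a poor match.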
Greetings,
Sebastian
Please explain that in more detail. How does the method choose the best clustering for my data (my data is large but unlabeled)?
Best regards
It doesn't. How could you know what the best clustering is? By guessing?
There are some cluster evaluation heuristics available, but as their name says, they are just heuristics.
Greetings,
Sebastian
Best regards
For instance, if the data is numeric and tends to form centre-based clusters (data visualisation may give you an indication), then solutions with the same number of clusters can be compared using the so-called squared error, i.e. the sum of squared distances from the data instances to their corresponding cluster centre (which is computed by averaging the column values in each cluster). A smaller squared error means a better clustering. This method is also used to compare repeated runs of the same algorithm when it may lead to more than one solution (as with the K-Means algorithm). The method may be partly extended to mixed (numeric and non-numeric) data, in which case specific metrics replace the Euclidean distance, as for instance in the K-Medoids algorithm, which extends K-Means.
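As a rough illustration of the squared error, here is a minimal Python sketch. The function name sum_squared_error and the four sample rows are invented for the example; it assumes purely numeric rows and the Euclidean distance, and it only makes sense for comparing solutions with the same number of clusters.

```python
def sum_squared_error(rows, assignment):
    """Sum of squared Euclidean distances from each row to the centre
    of its cluster; the centre is the column-wise mean of the cluster."""
    clusters = {}
    for row, cluster_id in zip(rows, assignment):
        clusters.setdefault(cluster_id, []).append(row)
    total = 0.0
    for members in clusters.values():
        centre = [sum(column) / len(members) for column in zip(*members)]
        total += sum((value - c) ** 2
                     for row in members
                     for value, c in zip(row, centre))
    return total


# Hypothetical data: two tight groups, clustered in two different ways.
rows = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
solution_a = [0, 0, 1, 1]  # keeps each group together
solution_b = [0, 1, 0, 1]  # mixes the groups
print(sum_squared_error(rows, solution_a))
print(sum_squared_error(rows, solution_b))
```

The solution with the smaller value is the one to prefer under this heuristic.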
Another solution is based on the idea of evaluating the result of an unsupervised clustering via supervised learning evaluation. You cluster your data, obtaining a new column; let us call it clusterNo. Then you learn a decision tree (or another supervised learning model) using clusterNo as your label/output attribute, and you evaluate this model. The accuracy of the model may give an indication of the quality of the clustering. Obviously, no method based on heuristics is perfect, but such methods may be quite useful in practice.
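The idea can be sketched in Python as follows. Note the assumptions: to keep the example short and free of external libraries, it swaps the decision tree for a 1-nearest-neighbour classifier evaluated with leave-one-out, and the function name loo_1nn_accuracy and the sample rows are invented for the illustration.

```python
def loo_1nn_accuracy(rows, cluster_no):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier that
    predicts each row's cluster id from its closest other row."""
    if len(rows) < 2 or len(rows) != len(cluster_no):
        raise ValueError("need at least two rows and one id per row")

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    hits = 0
    for i, row in enumerate(rows):
        # Nearest neighbour among all rows except the row itself.
        nearest = min((j for j in range(len(rows)) if j != i),
                      key=lambda j: squared_distance(row, rows[j]))
        if cluster_no[nearest] == cluster_no[i]:
            hits += 1
    return hits / len(rows)


# Hypothetical data: two well-separated groups of three rows each.
rows = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
compact = [0, 0, 0, 1, 1, 1]    # clusterNo that follows the groups
scattered = [0, 1, 0, 1, 0, 1]  # clusterNo that cuts across them
print(loo_1nn_accuracy(rows, compact))
print(loo_1nn_accuracy(rows, scattered))
```

A clustering whose ids are easy to predict from the attributes scores high here, while one that cuts across natural groups scores lower. As with the other heuristics, this is an indication only.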
Dan