Cut dendrogram at a certain similarity
Hello everyone,
I am evaluating some data and want to compare different types of clustering. I found the 'Map Clustering on Labels' operator, which lets me compare the clustering results with the labels in my sample data. This all works fine, but I am concerned about the following scenario.
Let's say I have an example set with examples from three different classes. If I run k-means with the expected number of clusters set to 2, it will obviously produce two clusters and probably a higher error in the performance evaluation. Agglomerative clustering, however, has the potential to perform better, as the number of clusters doesn't have to be known beforehand. To make this work, the resulting dendrogram has to be cut at a certain level of similarity.
I have searched the list of operators but have only found the 'Flatten Clustering' operator that lets me choose the number of clusters. Searching the forums I have found this thread from 2010: http://rapid-i.com/rapidforum/index.php/topic,1734.0.html
Has there been any progress on this since 2010? Am I missing an operator to achieve this? Do I have to install a plugin?
Thanks in advance for your help,
Daniel
Answers
Instance selection and Prototype based rules
This operator is available under ISPR -> Clustering and is called Flatten Clusters By Distance. It has a parameter that sets a distance/similarity threshold.
As input it requires the Agglomerative Clustering model and an ExampleSet.
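To illustrate what cutting at a distance threshold means, here is a minimal Python sketch outside RapidMiner. It is not the operator's implementation, and the function name `cluster_until` and its parameters are made up for this example. It merges clusters bottom-up and stops once the closest pair is farther apart than the threshold, which for single, complete, and average linkage gives the same result as cutting the dendrogram at that height. With a similarity measure instead of a distance, the comparison would be reversed.

```python
from itertools import combinations
from math import dist


def cluster_until(points, max_distance, linkage="average"):
    """Merge clusters bottom-up until the closest pair exceeds max_distance.

    Hypothetical helper for illustration only. Returns a list of clusters,
    each a sorted list of indices into `points`.
    """
    if linkage not in ("single", "complete", "average"):
        raise ValueError(f"unknown linkage: {linkage}")

    def cluster_distance(a, b):
        # All pairwise Euclidean distances between the two clusters.
        pair_dists = [dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(pair_dists)
        if linkage == "complete":
            return max(pair_dists)
        return sum(pair_dists) / len(pair_dists)

    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        # Find the closest pair of clusters (naive search, fine for small data).
        d, a, b = min(
            (cluster_distance(clusters[a], clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d > max_distance:
            break  # this is the "cut": no merge happens above the threshold
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # a < b, so index a is unaffected
    return clusters


# Three well-separated groups; the number of clusters is never specified.
points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (0, 10), (1, 10)]
clusters = cluster_until(points, max_distance=3.0)
print(clusters)
```

The number of clusters falls out of the threshold rather than being set in advance: a very small threshold leaves every point on its own, and a very large one merges everything into a single cluster.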
In case of any problems please contact me.
Best
Marcin