"Dynamically determine number of clusters k-means"
Farnoush_r
Hi
I want to build a model in RapidMiner that determines the number of clusters automatically and then passes it on to the k-means algorithm. The post below has some great ideas, but it relies on a log table. Is there any way to do this dynamically, i.e., calculate the number of clusters into a macro and hand that macro to k-means?
http://rapid-i.com/rapidforum/index.php?topic=3447.0
Answers
It is possible to convert a log to an example set with the Log to Data operator. For Davies-Bouldin, where lower values indicate better clustering, you could find the minimum by sorting this example set by the validity measure and then simply using the value of k associated with it.
If you are confident that the data is well behaved in all cases, then you could try that.
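The sweep described above (run k-means for a range of k, record the Davies-Bouldin index for each run, keep the k with the lowest value) can be sketched outside RapidMiner. This is a minimal pure-Python illustration, not RapidMiner's implementation: the helper names (`kmeans`, `davies_bouldin`, `best_k`), the deterministic farthest-first seeding, the 2 to 5 search range and the toy data are all assumptions made for a reproducible example.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def farthest_first(points, k):
    """Hypothetical deterministic seeding (assumed for reproducibility):
    start at the first point, then repeatedly add the point farthest
    from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(mean(members) if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index (lower is better), Euclidean form."""
    k = len(centroids)
    scatter = []
    for i in range(k):
        # S_i: average distance of cluster i's members to its centroid.
        members = [p for p, lab in zip(points, labels) if lab == i]
        scatter.append(sum(dist(p, centroids[i]) for p in members) / len(members) if members else 0.0)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j == i:
                continue
            d = dist(centroids[i], centroids[j])
            # Coincident centroids make the ratio unbounded.
            ratio = math.inf if d == 0 else (scatter[i] + scatter[j]) / d
            worst = max(worst, ratio)
        total += worst
    return total / k


def best_k(points, k_min=2, k_max=5):
    """Run k-means for each k in [k_min, k_max]; return the k with the lowest DB."""
    scores = {}
    for k in range(k_min, k_max + 1):
        labels, centroids = kmeans(points, k)
        scores[k] = davies_bouldin(points, labels, centroids)
    # Taking the minimum mirrors sorting the logged results ascending by DB.
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Made-up toy data: three tight 1x1 squares around well-separated centres.
    centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [(cx + dx, cy + dy) for cx, cy in centres for dx in (-0.5, 0.5) for dy in (-0.5, 0.5)]
    k, scores = best_k(data)
    print(k)  # 3
```

The chosen k is then what would go into the macro for the final k-means run. Taking the minimum of the scores plays the same role as sorting the Log to Data example set by the validity measure and reading off the k in the first row.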
Regards
Andrew
Second, I did not understand your concern about my data, because I am determining k each time from the imported data, so for any data set the process finds the best k. So what is the problem?
For the second question: the Davies-Bouldin validity measure is a mathematical score that favors clusterings whose clusters are individually compact and maximally separated from one another. Who is to say whether this mathematical criterion matches what truly is the best clustering?
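One common definition of the Davies-Bouldin index (the Euclidean form; RapidMiner's exact implementation may differ in details) makes that trade-off explicit. With C_i the members of cluster i, c_i its centroid and k the number of clusters:

```latex
S_i = \frac{1}{|C_i|} \sum_{x \in C_i} \lVert x - c_i \rVert,
\qquad
\mathrm{DB} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{S_i + S_j}{\lVert c_i - c_j \rVert}
```

Compact clusters shrink the numerator and well-separated centroids enlarge the denominator, so lower values are better.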