How to extract distinct features of K-Means Cluster?
Hello all,
I'm doing some cluster analysis and am mainly interested in the features of each cluster. How can each cluster be described by its attributes?
Take a marketing case: it's not enough to just cluster your customers. You also have to know how to treat each group, so you have to know what its main features are.
Is there a way to extract them from the K-Means algorithm or is there even a better approach to this?
Thanks in advance
Answers
Hello!
I think the Extract Cluster Prototypes operator can help you.
Thank you for your answer @kershov!
But I think that's not exactly what I was looking for, since the prototypes don't really describe the clusters. E.g. when you plot the clusters you see that the main group is in Germany, but the prototype says it is Norway, which seems contradictory.
Is there another way to get features extracted? In a decision tree for example it is easier to identify the important features.
Thank you
Hi there, you have a couple of options for this common question.
You could turn your clusters into labels and then attempt to diagnose them with predictive modeling, using simple classifiers such as Naive Bayes or Decision Trees.
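To make the "clusters as labels" idea concrete, here is a minimal sketch in plain Python rather than RapidMiner (inside RapidMiner the equivalent would be to give the cluster attribute the label role and train a Decision Tree on it). It fits a one-level decision tree (a decision stump) per cluster, one-vs-rest, and reports the single attribute and threshold that best separates that cluster from everything else. The attribute names, data values and the helper names gini and best_stump are all made up for illustration.

```python
# Sketch: describe each cluster with a one-level decision tree (decision stump).
# All data and names below are invented for illustration.

def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_stump(rows, is_member, names):
    """Return (weighted impurity, attribute, threshold) of the best single
    split separating cluster members from all other examples."""
    best = None
    n = len(rows)
    for j, name in enumerate(names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [m for row, m in zip(rows, is_member) if row[j] <= t]
            right = [m for row, m in zip(rows, is_member) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, name, t)
    return best

names = ["age", "monthly_spend"]
rows = [
    (22, 40), (25, 55), (27, 45),      # assigned to cluster 0
    (48, 60), (52, 50), (55, 65),      # assigned to cluster 1
    (30, 300), (45, 320), (50, 310),   # assigned to cluster 2
]
clusters = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # cluster ids, e.g. from k-means

stumps = {}
for c in sorted(set(clusters)):
    is_member = [k == c for k in clusters]  # one-vs-rest label
    stumps[c] = best_stump(rows, is_member, names)
    score, name, t = stumps[c]
    print(f"cluster {c}: best split on {name} at {t} (weighted Gini {score:.3f})")
```

A weighted Gini of 0 means a single attribute cleanly separates that cluster from the rest. Higher values suggest the cluster needs more than one attribute to describe it, which is where a deeper tree helps.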
If you already have labels (not the clusters themselves) then you could use "Map Clustering on Labels" and do something similar. Or run a predictive model using only the cluster attribute against your existing labels.
You can also use the centroid output from clusters to determine which attributes score highly for a given cluster but not for other clusters. You could even use "Generate Attributes" to define a new metric of the difference in centroid values between one cluster and another.
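As a rough illustration of the centroid comparison, the sketch below (plain Python, not a RapidMiner process) ranks attributes by how far each cluster's centroid value lies from the mean of the other clusters' centroids. The centroid values, attribute names and the helper distinctive_attributes are hypothetical, and the attributes are assumed to be numeric and normalized to a comparable scale, otherwise the differences are not comparable across attributes.

```python
# Sketch: rank attributes by how far each cluster centroid sits from the
# average of the other centroids. Centroid values are invented and assumed
# to be on a comparable (e.g. normalized) scale.

names = ["age", "monthly_spend", "visits"]
centroids = {
    "cluster_0": [0.1, 0.2, 0.5],
    "cluster_1": [0.9, 0.2, 0.5],
    "cluster_2": [0.5, 0.9, 0.4],
}

def distinctive_attributes(centroids, names):
    """For each cluster, return (attribute, difference) pairs sorted by
    absolute difference to the mean of the other clusters' centroids."""
    result = {}
    for c, center in centroids.items():
        others = [v for k, v in centroids.items() if k != c]
        diffs = []
        for j, name in enumerate(names):
            other_mean = sum(o[j] for o in others) / len(others)
            diffs.append((name, center[j] - other_mean))
        # Largest absolute difference first.
        diffs.sort(key=lambda d: abs(d[1]), reverse=True)
        result[c] = diffs
    return result

ranking = distinctive_attributes(centroids, names)
for c, diffs in ranking.items():
    summary = ", ".join(f"{n} ({d:+.2f})" for n, d in diffs)
    print(f"{c}: {summary}")
```

The sign shows the direction: a negative value means the cluster scores lower on that attribute than the other clusters do on average, and a positive value means it scores higher.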
You might also want to search through the forum on this topic since there are many existing threads that are related, and they might give you even more ideas. Here's one, for example: http://community.rapidminer.com/t5/RapidMiner-Studio-Forum/Cluster-Performance-DBScan-and-agglomerative-Clustering/m-p/40754#M27689
I hope this helps!
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts