When k = 2, 3, 4?

tonyboy9 · December 2020

Below is my customer segmentation data which I ran in AutoModel.

Image: https://us.v-cdn.net/6030995/uploads/editor/j2/03tjqafedr6i.png

Below are screenshots for k = 2, 3 and 4. How can I tell which k is best?
I do not have access to the elbow method or silhouette analysis.
I looked at the three Davies-Bouldin indices, which are 5.415, 3.666 and 4.121 for k = 2, 3 and 4 respectively.
Wikipedia calls this an internal evaluation scheme, where the validation of how well the clustering has been done is made using quantities and features inherent to the dataset. Because the index is defined as the ratio of within-cluster scatter to between-cluster separation, a lower value means the clustering is better.
Should I assume that the lowest index, 3.666, means k = 3 is best?
Thanks for your time.
Tony

Image: https://us.v-cdn.net/6030995/uploads/editor/8q/672ktmfk22gt.png

Image: https://us.v-cdn.net/6030995/uploads/editor/30/oa49kqupbhtd.png

Image: https://us.v-cdn.net/6030995/uploads/editor/gi/qqyhdnp1hjf1.png

jacobcybulski · December 2020

The AutoModel does not optimise the number of clusters for k-Means, so if you run several experiments, the best-separated cluster model is the one whose Davies-Bouldin measure is closest to zero. By that measure, k = 3 (3.666) is the best of your three runs. However, if you select x-Means clustering, it will return the optimum number of clusters within the range you specify between the minimum and maximum.
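For anyone who wants to check the comparison outside RapidMiner, here is a minimal sketch in plain Python of how the Davies-Bouldin index can be computed and how the "closest to zero" rule picks k. It is not AutoModel's implementation: the function name `davies_bouldin` and the four toy points are made up for illustration, and the sketch assumes Euclidean distance, at least two clusters, and distinct centroids. Only the three index values at the bottom come from the question.

```python
import math


def davies_bouldin(points, labels):
    """Davies-Bouldin index for labelled points (lower means better clusters)."""
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    # Centroid and scatter (mean point-to-centroid distance) of each cluster.
    centroids, scatter = {}, {}
    for label, members in clusters.items():
        centre = [sum(coords) / len(members) for coords in zip(*members)]
        centroids[label] = centre
        scatter[label] = sum(math.dist(p, centre) for p in members) / len(members)

    # For each cluster, find its worst ratio of combined scatter to centroid
    # separation against any other cluster, then average those worst cases.
    keys = list(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
        for i in keys
    ]
    return sum(worst) / len(worst)


# Hypothetical toy data: two tight groups that are far apart.
toy_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
good_labels = [0, 0, 1, 1]  # clusters follow the two groups
bad_labels = [0, 1, 0, 1]   # clusters cut across the two groups

print(davies_bouldin(toy_points, good_labels))
print(davies_bouldin(toy_points, bad_labels))

# Index values reported in the question for k = 2, 3 and 4.
reported = {2: 5.415, 3: 3.666, 4: 4.121}
best_k = min(reported, key=reported.get)
print(best_k)
```

The labelling that follows the two groups scores much lower than the one that cuts across them, which is the behaviour the index is meant to capture. Applied to the three reported values, taking the minimum selects k = 3.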

tonyboy9 · December 2020

Thank you for that. A follow-up question, please. I need to interpret the k-means summary, but I have no idea what the jumble of facts under each segment means. How do I find which segment has the problem attribute(s) I need to look at? And once I have found the applicable segment, what does that mean in terms of problem solving?
