"Deviation Chart and its interpretation."
I have a dataset to which K-means clustering has been applied. I am trying to use the various available visualizations to check whether the chosen value of K is justifiable, i.e. whether the clusters are non-overlapping. One chart that shows this well is the parallel chart, but I cannot zoom in on it to check closely whether there is any overlap.
I also found the deviation chart, which shows one line per cluster representing the average of that cluster's data points for every value of x. In addition, there is a shaded region around each cluster's line, and I am unable to understand what this shaded region represents. Can someone please explain it and the significance of this chart? Please find the attached snapshot for reference. Thanks
Best Answer
IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Hi,
The transparent areas around each bold line show the standard deviation of each attribute (column) within that cluster. Imagine starting with a parallel plot, but instead of drawing one line per example (row), you draw only one line per group showing the average, plus a shaded region covering the area where most of that group's lines lie.
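As a rough illustration of what the chart summarizes, here is a minimal sketch in Python with NumPy that computes, for each cluster, the per-column average (the bold line) and a band around it. It is not RapidMiner's own implementation: the band width of mean ± one standard deviation, the `ddof` choice, and the function name `deviation_bands` are assumptions for illustration.

```python
import numpy as np

def deviation_bands(X, labels, ddof=0):
    """Per-cluster mean and standard-deviation band for every column of X.

    Returns {cluster_id: (mean, lower, upper)}, where lower/upper are
    mean -/+ one standard deviation (an assumed band width).
    ddof=0 gives the population std; pass ddof=1 for the sample std.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    bands = {}
    for c in np.unique(labels):
        members = X[labels == c]              # rows assigned to cluster c
        mean = members.mean(axis=0)           # the bold line: per-column average
        std = members.std(axis=0, ddof=ddof)  # spread of the cluster in each column
        bands[c] = (mean, mean - std, mean + std)
    return bands

# Toy data: two columns, two clusters (hypothetical values)
X = [[1.0, 10.0], [3.0, 10.0], [8.0, 11.0], [10.0, 13.0]]
labels = [0, 0, 1, 1]
for c, (mean, lower, upper) in deviation_bands(X, labels).items():
    print(c, mean, lower, upper)
```

Plotting `mean` as a line and shading between `lower` and `upper` for each cluster (for example with matplotlib's `fill_between`) would give a chart of the same kind.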
Such a chart is often much easier to interpret than the full parallel plot, especially in your case: the bold lines represent the cluster centroids (the prototypes), and the transparent areas give you an idea of whether the clusters are well separated. You can also see which columns help most to differentiate between your clusters.
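To make the "which columns differentiate the clusters" reading more concrete, the sketch below counts, per column, how many pairs of clusters have overlapping mean ± std bands. This is a hypothetical visual heuristic (the function name `band_overlaps` and the interval-overlap rule are assumptions), not a statistical test and not something the deviation chart itself computes.

```python
import itertools
import numpy as np

def band_overlaps(means, stds):
    """For each column, count cluster pairs whose [mean - std, mean + std] bands overlap.

    means, stds: arrays of shape (n_clusters, n_columns).
    A column with zero overlapping pairs visually separates every cluster.
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    k, d = means.shape
    overlaps = np.zeros(d, dtype=int)
    for i, j in itertools.combinations(range(k), 2):
        # Two intervals centred at m_i, m_j with half-widths s_i, s_j intersect
        # exactly when their centres are no farther apart than s_i + s_j.
        overlaps += np.abs(means[i] - means[j]) <= stds[i] + stds[j]
    return overlaps

# Three clusters, two columns (hypothetical values)
means = [[2.0, 10.0], [9.0, 11.0], [5.0, 10.5]]
stds = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
print(band_overlaps(means, stds))  # prints [0 3]
```

Here the first column separates all three clusters, while in the second column every pair of bands overlaps, which is what you would read off the chart by eye.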
Hope that helps,
Ingo