"Deviation Chart and its interpretation."

nidhi_s019 Member Posts: 2 Learner III
edited June 2019 in Help

I have a dataset to which K-means clustering has been applied. I am trying to use the available visualizations to check whether the value of K is justifiable, i.e. I am looking for non-overlapping clusters. One chart that shows this well is the parallel chart, but the problem is that I cannot zoom in to check closely whether there is an overlap.

I also found the deviation chart, which shows one line per cluster representing the average of that cluster's data points at every value of x. Apart from this, there is a shaded region around each cluster's line, and I cannot work out what this shaded region represents. Can someone please explain it, and the significance of this chart? Please find the attached snapshot for reference. Thanks

Best Answer

  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Solution Accepted

    Hi,

     

    The transparent areas around each bold line show the standard deviation of each attribute / column within that group.  Imagine that you start with a parallel plot, but instead of drawing all lines (i.e. all examples / rows), you draw only one line per group showing the average, plus a band marking the region where most of that group's lines lie.

     

    This is often much easier to interpret, especially in your case: the bold lines represent the prototypical centroids, and the transparent areas give you some idea of whether the clusters are well separated.  You can also see which columns help most to differentiate between your clusters.

     

    Hope that helps,

    Ingo
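
For anyone who wants to check the same thing numerically outside the chart, here is a minimal Python sketch of the idea described above. It is not RapidMiner's implementation. It assumes the band is the average plus or minus one population standard deviation, which may differ from what the chart actually draws. The function names `deviation_bands` and `band_gaps` and the toy data are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev


def deviation_bands(rows, labels):
    """Return {cluster: [(mean, std), ...]} with one pair per attribute."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    bands = {}
    for label, members in groups.items():
        columns = zip(*members)  # transpose: one tuple per attribute
        # Assumption: population standard deviation (pstdev).
        bands[label] = [(mean(col), pstdev(col)) for col in columns]
    return bands


def band_gaps(bands):
    """Smallest gap between any two clusters' bands, per attribute.

    The gap between two bands is |m1 - m2| - (s1 + s2). A negative value
    means the mean +/- std bands overlap on that attribute.
    Needs at least two clusters.
    """
    n_columns = len(next(iter(bands.values())))
    return [
        min(
            abs(a[col][0] - b[col][0]) - (a[col][1] + b[col][1])
            for a, b in combinations(bands.values(), 2)
        )
        for col in range(n_columns)
    ]


# Hypothetical toy data: two attributes, two clusters.
rows = [(1.0, 10.0), (3.0, 14.0), (7.0, 11.0), (9.0, 13.0)]
labels = ["cluster_0", "cluster_0", "cluster_1", "cluster_1"]

bands = deviation_bands(rows, labels)
gaps = band_gaps(bands)
for col, gap in enumerate(gaps):
    status = "separated" if gap > 0 else "overlapping"
    print(f"attribute {col}: smallest gap {gap:+.2f} ({status})")
```

With this toy data, attribute 0 comes out as separated and attribute 1 as overlapping. On the chart, that corresponds to shaded bands that stay apart on one axis and sit on top of each other on the other. Attributes with the largest positive gap are the ones that do most to tell the clusters apart.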
