
"Deviation Chart and its interpretation."

nidhi_s019 Member Posts: 2 Learner III
edited June 2019 in Help

I have a dataset to which K-means clustering has been applied. I am using the available visualizations to check whether the chosen value of K is justifiable, i.e. whether the clusters are non-overlapping. One chart that shows this well is the parallel chart, but I cannot zoom in on it to check closely whether there is an overlap.

I also found the deviation chart, which shows one line per cluster representing the average of that cluster's data points for every value of x. Apart from this, there is a shaded region around each cluster's line, and I cannot work out what this shaded region represents. Can someone please explain it and the significance of this chart? Please find the attached snapshot for reference. Thanks

Best Answer

  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Solution Accepted

    Hi,

     

    The transparent areas around each bold line show the standard deviation of that group for each attribute / column.  Imagine that you start with a parallel plot, but instead of drawing every line (i.e. every example / row), you draw only one line per group showing the average, plus the region in which most of that group's lines lie.
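
As a rough illustration of what the chart summarizes, here is a minimal Python sketch that computes, for each cluster and each column, the average and a band of one standard deviation on either side of it. The one-standard-deviation width, the use of the population standard deviation, and the helper name `deviation_bands` are assumptions made for this example; the thread does not say exactly how RapidMiner computes the shaded area.

```python
from statistics import mean, pstdev


def deviation_bands(rows, labels):
    """Return {cluster: [(mean, low, high), ...]}, one tuple per column.

    Assumption: the band is mean +/- one population standard deviation.
    """
    # Group the rows by their cluster label.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)

    bands = {}
    for label, members in groups.items():
        per_column = []
        # zip(*members) iterates over the columns of this cluster's rows.
        for column in zip(*members):
            m = mean(column)
            s = pstdev(column)
            per_column.append((m, m - s, m + s))
        bands[label] = per_column
    return bands


# Tiny made-up dataset: four rows, two columns, two clusters.
rows = [[1.0, 10.0], [3.0, 10.0], [7.0, 20.0], [9.0, 20.0]]
labels = [0, 0, 1, 1]
bands = deviation_bands(rows, labels)
```

Each `mean` value corresponds to a point on a cluster's bold line, and each `(low, high)` pair to the edges of the shaded region at that column.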

     

    This is often much easier to interpret, especially in your case, where the bold lines represent the prototypical centroids and the transparent areas give you some idea of whether the clusters are well separated or not.  You can also see which columns help most to differentiate between your clusters.
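
The same reading of the chart can be done numerically. The sketch below assumes you already have, for every cluster, one `(mean, low, high)` tuple per column describing the bold line and the edges of its shaded band; the numbers and the helper name `overlapping_pairs` are made up for illustration. For each column it lists the cluster pairs whose bands overlap, so columns with no overlapping pairs are the ones that separate the clusters best. Note that non-overlapping bands do not guarantee that no individual examples overlap, since the band only covers the region where most of a cluster's values lie.

```python
from itertools import combinations


def overlapping_pairs(bands):
    """For each column index, list the cluster pairs whose bands overlap."""
    n_columns = len(next(iter(bands.values())))
    result = {}
    for j in range(n_columns):
        pairs = []
        for a, b in combinations(sorted(bands), 2):
            _, low_a, high_a = bands[a][j]
            _, low_b, high_b = bands[b][j]
            # Two intervals overlap when each starts before the other ends.
            if low_a <= high_b and low_b <= high_a:
                pairs.append((a, b))
        result[j] = pairs
    return result


# Hypothetical bands: {cluster: [(mean, low, high) for each column]}.
bands = {
    0: [(2.0, 1.0, 3.0), (10.0, 8.0, 12.0)],
    1: [(8.0, 7.0, 9.0), (11.0, 9.0, 13.0)],
}
overlaps = overlapping_pairs(bands)
```

In this made-up example the clusters are cleanly separated on column 0, while on column 1 their bands overlap, so column 0 is the more useful one for telling the clusters apart.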

     

    Hope that helps,

    Ingo
