K means group centroid and visualisation options

timc03 · November 2014

I am running k-means clustering in v6.0.008.

I am looking to visualise the results of the clustering as shown here (k means clustering graph): http://en.wikipedia.org/wiki/K-means_clustering#mediaviewer/File:ClusterAnalysis_Mouse.svg

Any suggestions on how to achieve this? I would be happy to use PCA before K Means clustering if that helps.

Also, as an aside, where is the 'cluster centroid' or the mean for each cluster? I have the centroids for each attribute in each cluster in the Cluster Model's Centroid Table, but cannot find the cluster mean.

Thanks

Marco_Boeck · November 2014

Hi,

I used the following process to import the mouse data taken from here: http://elki.dbs.ifi.lmu.de/wiki/DataSets


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.001-SNAPSHOT">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="6.1.001-SNAPSHOT" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\boeck\Desktop\mouse.csv"/>
        <parameter key="column_separators" value="\s"/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <parameter key="encoding" value="UTF-8"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="att1.true.real.attribute"/>
          <parameter key="1" value="att2.true.real.attribute"/>
          <parameter key="2" value="att3.true.polynominal.label"/>
        </list>
      </operator>
      <operator activated="true" class="k_means" compatibility="6.1.001-SNAPSHOT" expanded="true" height="76" name="Clustering" width="90" x="179" y="30">
        <parameter key="k" value="3"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

You can then simply use the Chart tab of the results to visualize this.

I'm not sure about your bonus question. I don't think there is an explicit option to see that, but I may be wrong.

Regards,
Marco

MartinLiebig · November 2014

Hello timc03!

First, regarding the centroids: if you take a look at the model itself, it has a "Centroid Table" tab. There you can find your centroids.

Furthermore, there is a way to display the "borders" of the clusters. To do this, you apply the clustering model to random values in a given range. The result is the picture below:

I modified Marco's process a bit so that it creates this picture, and connected the model:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="6.1.000" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="C:\Users\Martin\Downloads\mouse.csv"/>
        <parameter key="column_separators" value="\s"/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations"/>
        <parameter key="encoding" value="UTF-8"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="att1.true.real.attribute"/>
          <parameter key="1" value="att2.true.real.attribute"/>
          <parameter key="2" value="att3.true.polynominal.label"/>
        </list>
      </operator>
      <operator activated="true" class="k_means" compatibility="6.1.000" expanded="true" height="76" name="Clustering" width="90" x="380" y="30">
        <parameter key="k" value="3"/>
      </operator>
      <operator activated="true" class="generate_data" compatibility="6.1.000" expanded="true" height="60" name="Generate Data" width="90" x="514" y="255">
        <parameter key="number_examples" value="10000"/>
        <parameter key="attributes_lower_bound" value="0.0"/>
        <parameter key="attributes_upper_bound" value="1.0"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="6.1.000" expanded="true" height="94" name="Multiply" width="90" x="514" y="120"/>
      <operator activated="true" class="apply_model" compatibility="6.1.000" expanded="true" height="76" name="Apply Model" width="90" x="715" y="165">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_op="Multiply" to_port="input"/>
      <connect from_op="Clustering" from_port="clustered set" to_port="result 1"/>
      <connect from_op="Generate Data" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Multiply" from_port="output 1" to_port="result 2"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
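
For anyone who wants to see the idea outside RapidMiner, here is a rough sketch in plain Python. It assumes the model assigns each new point to its nearest centroid by Euclidean distance; the centroid values and the function names are made up for illustration and are not taken from the mouse data or from RapidMiner.

```python
import random


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def label_random_points(centroids, n_points=10000, lower=0.0, upper=1.0, seed=0):
    """Draw uniform random points in [lower, upper]^d and label each with its nearest centroid."""
    rng = random.Random(seed)
    dims = len(centroids[0])
    labelled = []
    for _ in range(n_points):
        p = tuple(rng.uniform(lower, upper) for _ in range(dims))
        labelled.append((p, nearest_centroid(p, centroids)))
    return labelled


# Hypothetical centroids for k = 3 (assumed values, not the real mouse centroids).
centroids = [(0.25, 0.75), (0.75, 0.75), (0.5, 0.4)]
labelled = label_random_points(centroids)
```

Colouring a scatter plot of `labelled` by the assigned index fills the plane with the cluster regions, so the borders show up where the colours meet.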

timc03 · November 2014

Thanks for both answers. However, the mouse data set used has only 2 dimensions, i.e. it has already had its dimensions reduced by PCA or similar. I am looking for a way to visualise k-means clustering results without dimension reduction.

MartinLiebig · November 2014

Hi,

What about a deviation plot? This way you could show in which attributes the clusters differ.
That would look like this for the sonar data set:

I would recommend the local normalization option.

Edit: There is a similar plot for the centroids in the model.

timc03 · November 2014

So maybe I should rephrase, this time using a text mining example. After k-means, every term belongs more or less to a cluster. I want to chart the position of each term relative to each cluster. This should be possible in a low-dimensional graphical space, given that each cluster has a mean centroid. I hope that helps.

MartinLiebig · November 2014

Your cluster centroids are given by an n-dimensional vector. In the case of text mining, the vector most likely has some thousand entries. I guess there is no way to show a 1000-dimensional vector.
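
One possible workaround, sketched here in plain Python rather than as a RapidMiner process: instead of plotting the full vector, describe each term by its distance to each of the k centroids. That gives k numbers per term, which can be charted directly when k is 2 or 3. The vectors and names below are invented for illustration, and whether this matches what is wanted is an assumption.

```python
import math


def distances_to_centroids(point, centroids):
    """Map an n-dimensional point to k coordinates: its Euclidean distance to each centroid."""
    return [math.dist(point, c) for c in centroids]


# Hypothetical 5-dimensional term vectors and two centroids (k = 2).
centroids = [(1.0, 0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0, 1.0)]
terms = {
    "term_a": (1.0, 0.0, 0.0, 0.0, 0.0),
    "term_b": (0.0, 0.0, 0.0, 0.0, 1.0),
    "term_c": (0.5, 0.0, 0.0, 0.0, 0.5),
}

# Each term becomes a 2-D point: (distance to centroid 0, distance to centroid 1).
coords = {name: distances_to_centroids(vec, centroids) for name, vec in terms.items()}
```

In such a chart a term sitting on a centroid has a zero on that axis, and a term between clusters (like `term_c` here) lies on the diagonal. The number of axes grows with k, not with the number of attributes.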

timc03 · November 2014

"First, regarding the centroids: if you take a look at the model itself, it has a "Centroid Table" tab. There you can find your centroids."

This table contains only values for each variable, not the mean group centroid - the mean group centroid is the value I am interested in. Any suggestions?
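
For reference, in standard k-means the centroid of a cluster is the vector of per-attribute means over the cluster's members, so there is one value per variable and no single number. A minimal sketch in plain Python, with made-up data and names, assuming that is what the table holds:

```python
def centroid(points):
    """Return the per-attribute mean of a list of equal-length points."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


# Hypothetical cluster memberships with three attributes per example.
clusters = {
    "cluster_0": [(1.0, 2.0, 3.0), (3.0, 4.0, 5.0)],
    "cluster_1": [(10.0, 0.0, 1.0), (12.0, 2.0, 3.0)],
}

# One centroid vector per cluster, with one entry per attribute.
centroids = {name: centroid(members) for name, members in clusters.items()}
```

If "mean group centroid" refers to something else, such as a single score per cluster on a reduced axis, that would need an extra projection step that this sketch does not cover.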

Altair RapidMiner Community