
"k-means and its centroid table values SOLVED"

John_Davis Member Posts: 9 Contributor II
edited June 2019 in Help
Hi,

The k-means operator in RapidMiner produces a centroid table in which each cluster lists items and their corresponding values. What are these values: TF-IDF, chi-squared, information rate, ...?

Yours

John Davis
Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi John,

    Those are probably columns that were present in your data.

    k-Means defines clusters by their central data point, i.e. the average of all elements in the cluster. These so-called centroids are listed in the centroid table, where each column contains the attribute values of one centroid.

    Best regards,
    Marius
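
To illustrate the point above, here is a minimal Python sketch of how a centroid can be computed as the attribute-wise average of the examples in one cluster. The function name `centroid` and the sample numbers are made up for illustration; this is not RapidMiner code.

```python
def centroid(rows):
    """Return the attribute-wise mean of a list of equal-length numeric rows."""
    n = len(rows)
    # zip(*rows) groups the values of each attribute (column) together.
    return [sum(column) / n for column in zip(*rows)]


# Hypothetical cluster: three examples, three attributes (word x, word y, word z).
cluster_1 = [
    [0.1, 0.5, 0.0],
    [0.3, 0.3, 0.0],
    [0.2, 0.4, 0.0],
]

c = centroid(cluster_1)
print([round(v, 3) for v in c])
```

Each entry of `c` would correspond to one cell in that cluster's column of the centroid table.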
  • John_Davis Member Posts: 9 Contributor II
    Hello,

    I think I was not so clear in my first post.

    I understand that when using the k-means operator, one can look at each cluster's centroid through the example set (i.e. the attribute values of each cluster's centroid). My question is about the values given in the k-means spreadsheet. For example, when applying k-means to textual data (k = 3 clusters), one could end up with a k-means spreadsheet like:

    ATTRIBUTE    cluster_1    cluster_2    cluster_3
    word x       0.2          0.01         0.2
    word y       0.4          0.3          0.01
    word z       0            0.03         0.002

    What do the values in each column represent?

    Yours

    John
                                                                       
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi John,

    Do you mean how to interpret the values, or what they represent? They are the normalized TF-IDF values of the centroids. The TF-IDF values are created by the Process Documents operator, and you will find plenty of information if you search for TF-IDF. Basically, it is a kind of smart counting of words in the documents.

    Best regards,
    Marius
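
As a rough illustration of what "normalized TF-IDF" can mean, the sketch below computes one common variant in plain Python: term frequency times log inverse document frequency, with each document vector scaled to unit length. The exact formula and normalization used by the Process Documents operator are not given in this thread, so treat the weighting scheme, the function name `tfidf_vectors`, and the sample documents as assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors), one L2-normalized TF-IDF row per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word occurs.
    df = Counter(w for doc in tokenized for w in set(doc))
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [(counts[w] / len(doc)) * math.log(n_docs / df[w]) for w in vocab]
        norm = math.sqrt(sum(v * v for v in row))
        # Scale to unit length; leave all-zero rows as zeros.
        vectors.append([v / norm if norm else 0.0 for v in row])
    return vocab, vectors


# Hypothetical toy corpus.
docs = ["apple banana apple", "banana cherry", "cherry cherry apple"]
vocab, vectors = tfidf_vectors(docs)
for word, column in zip(vocab, zip(*vectors)):
    print(word, [round(v, 3) for v in column])
```

Averaging such rows over the documents assigned to a cluster gives per-word centroid values of the kind shown in the spreadsheet above, under the stated assumptions.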
  • John_Davis Member Posts: 9 Contributor II
    Thanks a lot. I am familiar with this numerical statistic.

    Yours
    John 