"k-means and its centroid table values SOLVED"

John_Davis Member Posts: 9 Contributor II
edited June 2019 in Help
Hi,

The k-means operator in RapidMiner produces a centroid table in which each cluster lists items and their corresponding values. What are these values: TF-IDF, chi-squared, information rate, ...?

Yours

John Davis

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi John,

    Those are probably columns that were present in your data.

    k-Means defines each cluster by its central data point, i.e. the average of all elements in the cluster. These so-called centroids are listed in the centroid table, where each column contains the attribute values of one centroid.

    Best regards,
    Marius
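To make the description above concrete, here is a minimal sketch of how a centroid table can be built once every example has a cluster label. It is not RapidMiner's implementation; the function name `centroid_table`, the sample rows, and the labels are all hypothetical and only illustrate that each centroid is the attribute-wise average of its cluster's members.

```python
from collections import defaultdict

def centroid_table(rows, labels):
    """Return {cluster_label: [mean of each attribute]} for a given assignment."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    table = {}
    for label, members in groups.items():
        n = len(members)
        # Average each attribute (column) over the members of this cluster.
        table[label] = [sum(col) / n for col in zip(*members)]
    return table

# Hypothetical data: 4 examples with 2 attributes, already assigned to 2 clusters.
rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0], [20.0, 0.0]]
labels = ["cluster_0", "cluster_0", "cluster_1", "cluster_1"]
centroids = centroid_table(rows, labels)
```

Each entry of `centroids` corresponds to one column of the centroid table: one averaged value per attribute of the input data.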
  • John_Davis Member Posts: 9 Contributor II
    Hello,

    I think I was not so clear in my first post.

    I understand that, when using the k-means operator, one can look through the example set at each cluster's centroid (i.e. the attribute values of each cluster's centroid). My question is about the values given in the k-means spreadsheet. For example, when applying k-means to textual data (k=3 clusters), one could end up with a k-means spreadsheet like:

    ATTRIBUTE    cluster_1    cluster_2    cluster_3
    word x       0.2          0.01         0.2
    word y       0.4          0.3          0.01
    word z       0            0.03         0.002

    What are the values in each column?

    Yours

    John
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi John,

    Do you mean how to interpret the values, or what they represent? They are the normalized TF-IDF values of the centroids. The TF-IDF values are created by the Process Documents operator, and you will find plenty of information if you search for TF-IDF. Basically, it is a kind of smart counting of words in the documents.

    Best regards,
    Marius
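As a rough illustration of the answer above, the sketch below computes normalized TF-IDF vectors for a few tokenized documents and then averages two of them into a centroid. The exact weighting used by the Process Documents operator is not stated in this thread, so the formula here (relative term frequency times log(N / df), followed by L2 normalization) is an assumed textbook variant, and `tfidf_vectors` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalized TF-IDF vectors) for tokenized docs."""
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed weighting: relative term frequency times log(N / df).
        vec = [(counts[w] / len(doc)) * math.log(n_docs / df[w]) for w in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm if norm else 0.0 for v in vec])
    return vocab, vectors

# Hypothetical tokenized documents.
docs = [["apple", "banana"], ["apple", "cherry"], ["banana", "banana", "cherry"]]
vocab, vectors = tfidf_vectors(docs)

# Centroid of a hypothetical cluster holding documents 0 and 1:
# the attribute-wise mean of their normalized TF-IDF vectors.
centroid = [sum(col) / 2 for col in zip(vectors[0], vectors[1])]
```

Under these assumptions, a word shared by both documents in the cluster ("apple") gets a higher centroid value than words found in only one of them, which is how the per-word values in a centroid table can be read.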
  • John_Davis Member Posts: 9 Contributor II
    Thanks a lot. I am familiar with this numerical statistic.

    Yours
    John 