K-Means Clustering for Text

svtorykh · January 2018

Hi RM Team! I have a quick question about applying K-Means clustering to text.

I have a set of ~2000 comments. Once I'm done with text processing (using TF-IDF), I have a word vector matrix of ~30 terms.

I then apply the K-Means operator, but I wonder what actually serves as the input for clustering. Is it the word vector matrix? If so, does the clustering algorithm use the TF-IDF values from the word vectors, or some other values?

Telcontar120 · January 2018

Exactly, it is the word vector matrix that is used. So if you created the vectors using TF-IDF, K-Means will use those values. You also have the option of creating the vectors with other methods, such as binary term occurrences or term frequency percentage.
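To make that concrete, here is a minimal sketch in plain Python (not RapidMiner operators) of how the choice of vector creation changes the numbers a distance-based algorithm like K-Means works with. The toy comments, the function names, and the exact TF-IDF formula (relative term frequency times log(N/df), scaled to unit length) are all assumptions made for illustration; the weighting RapidMiner applies may differ in detail.

```python
import math

# Hypothetical toy comments; the data from the question is not available.
comments = [
    "great product fast delivery",
    "fast delivery great service",
    "refund request broken product",
    "broken item refund please",
]

docs = [c.split() for c in comments]
vocab = sorted({term for doc in docs for term in doc})


def document_frequency(term):
    """Number of documents that contain the term at least once."""
    return sum(1 for doc in docs if term in doc)


def binary_occurrences(doc):
    """One value per vocabulary term: 1.0 if present in the document, else 0.0."""
    present = set(doc)
    return [1.0 if term in present else 0.0 for term in vocab]


def tf_idf(doc):
    """Assumed weighting: (count / doc length) * log(N / df), scaled to unit length."""
    n_docs = len(docs)
    weights = [
        (doc.count(term) / len(doc)) * math.log(n_docs / document_frequency(term))
        for term in vocab
    ]
    norm = math.sqrt(sum(w * w for w in weights))
    return [w / norm if norm else 0.0 for w in weights]


def euclidean(a, b):
    """Euclidean distance between two rows of a word vector matrix."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Each matrix has one row per comment and one column per term.
tfidf_matrix = [tf_idf(doc) for doc in docs]
binary_matrix = [binary_occurrences(doc) for doc in docs]

# The same pair of comments is a different distance apart in each matrix,
# so the clustering can differ depending on the vector creation method.
d_tfidf = euclidean(tfidf_matrix[0], tfidf_matrix[1])
d_binary = euclidean(binary_matrix[0], binary_matrix[1])
print(f"TF-IDF distance: {d_tfidf:.3f}, binary distance: {d_binary:.3f}")
```

The point of the sketch is only that the clustering never sees the text itself. It sees the rows of whichever matrix you built, so switching the vector creation method changes the distances and potentially the clusters.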

svtorykh · January 2018

Thanks much!

Telcontar120 · May 2018

Your clusters will be based on the values in the pruned word vector. If you are interested in the details, you can review the actual values for each cluster in the centroid table output of the K-Means operator.
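For anyone curious what a centroid table contains, the following is a rough sketch in plain Python of a basic k-means loop over a small word vector matrix. It is not the RapidMiner operator. The term names, the matrix values, the `k_means` helper, and the fixed starting centroids are all hypothetical, chosen so the run is deterministic. Each centroid is the per-term mean of the rows assigned to that cluster, which is the kind of value a centroid table lists.

```python
# Hypothetical word vector matrix: one row per document, one column per
# term that survived pruning. The values stand in for TF-IDF weights.
terms = ["delivery", "fast", "refund", "broken"]
matrix = [
    [0.7, 0.7, 0.0, 0.0],
    [0.6, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.6],
    [0.0, 0.1, 0.7, 0.7],
]


def squared_distance(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def k_means(rows, initial_centroids, max_iter=100):
    """Basic k-means: assign rows to the nearest centroid, then recompute means."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(row, centroids[k]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [row for row, label in zip(rows, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = mean_vector(members)
    return labels, centroids


# Fixed starting centroids (an assumption) instead of random initialisation.
labels, centroids = k_means(matrix, initial_centroids=[matrix[0], matrix[2]])

# Print a small centroid table: one row per term, one column per cluster.
header = "term".ljust(10) + "".join(
    f"cluster_{k}".rjust(12) for k in range(len(centroids))
)
print(header)
for j, term in enumerate(terms):
    print(term.ljust(10) + "".join(f"{c[j]:12.3f}" for c in centroids))
```

Reading such a table column by column shows which terms carry the most weight in each cluster, which is usually the quickest way to interpret what a text cluster is about.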
