K-means on CSV file


Hello everyone.

I have a CSV file containing blog posts, including author name, date posted, etc.

Now I want to apply K-means clustering to the content of the blog posts. I am trying to use the RapidMiner text processing tools for tf-idf vectorisation, but I can't figure out how to apply tf-idf to each blog post in the CSV file. Any suggestions?
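To show what "tf-idf for every blog post" means outside RapidMiner, here is a minimal Python sketch that uses only the standard library. It assumes each CSV row is one blog post and that the text is in a column called `content` (a hypothetical name; use whatever your file calls it). Each row becomes its own document. Document frequencies are counted across all rows, and every post gets a sparse, L2-normalised vector.

```python
import csv
import io
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case and keep runs of letters/digits; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_per_row(csv_file, text_column):
    """Return one sparse TF-IDF vector (dict: term -> weight) per CSV row."""
    reader = csv.DictReader(csv_file)
    docs = [Counter(tokenize(row[text_column])) for row in reader]
    n_docs = len(docs)
    # Document frequency: in how many posts each term appears.
    df = Counter()
    for counts in docs:
        df.update(counts.keys())
    vectors = []
    for counts in docs:
        total = sum(counts.values()) or 1
        # tf = count / post length, idf = log(N / df); terms in every post get 0.
        vec = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
        # L2-normalise so long and short posts are comparable.
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


# Hypothetical sample file with the other attributes left in place.
sample = io.StringIO(
    "author,date,content\n"
    "alice,2013-01-01,python pandas dataframes\n"
    "bob,2013-01-02,python scikit clustering\n"
    "carol,2013-01-03,cooking pasta recipes\n"
)
vectors = tfidf_per_row(sample, "content")
print(len(vectors))  # 3: one vector per post
```

Because "python" appears in two of the three posts, it gets a lower weight in the first post than "pandas", which appears in only one post.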


You only need TF-IDF if you have the actual contents of the blog posts, i.e. the text. In that case, you can find some useful video tutorials on text mining with RapidMiner here: http://vancouverdata.blogspot.de/2010/11/text-analytics-with-rapidminer-loading.html

I would first focus on the text and add the other attributes, like author and date, later on. If you need help, feel free to come back to this forum.
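Once each post has a TF-IDF vector, K-means can run on those vectors directly. Below is a plain Lloyd's-algorithm sketch in Python over sparse dict vectors, using a hypothetical `kmeans` helper. It assumes the vectors are already L2-normalised. For simplicity, it also seeds the centroids with the first k posts instead of using random or k-means++ initialisation, so it illustrates the idea and does not reproduce RapidMiner's clustering operator. In line with the advice above, only the text vectors are clustered and author and date are left out.

```python
def dot(a, b):
    # Dot product of two sparse vectors stored as dicts.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def sq_dist(a, b):
    # Squared Euclidean distance between two sparse vectors.
    return dot(a, a) + dot(b, b) - 2 * dot(a, b)


def mean(vectors):
    # Component-wise average of a non-empty list of sparse vectors.
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means; the first k vectors serve as initial centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each post goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its posts.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return labels


# Hypothetical, already-normalised TF-IDF vectors for four posts.
posts = [
    {"python": 0.8, "pandas": 0.6},
    {"pasta": 0.7, "recipes": 0.7},
    {"python": 0.6, "clustering": 0.8},
    {"pasta": 0.6, "cooking": 0.8},
]
labels = kmeans(posts, k=2)
print(labels)  # [0, 1, 0, 1]
```

The two programming posts end up in one cluster and the two cooking posts in the other. With real data, try several values of k and several initialisations, because Lloyd's algorithm only finds a local optimum.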

