K-means on CSV file



Hello everyone.

I have a CSV file containing blog posts, including author name, date posted, etc.

Now I want to apply K-means clustering to the blog posts' content. I am trying to use the RapidMiner text tool to apply TF-IDF vectorisation. However, I can't figure out how to apply TF-IDF to every blog post in the CSV file. Any suggestions?


Re: K-means on CSV file


You need TF-IDF only if you have the actual contents of the blog posts, i.e. the text. In that case you can find some useful video tutorials on text mining here: http://vancouverdata.blogspot.de/2010/11/text-analytics-with-rapidminer-loading.html

I would first focus on the text and add the other attributes, like author and date, later on. If you need more help, feel free to come back to this forum.
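
To make the idea concrete, here is a minimal sketch in plain Python (not RapidMiner) of the text-first approach: each CSV row is treated as one document, its text column is turned into a TF-IDF vector, and K-means runs on those vectors. The sample data, the column name `content`, and the helper names `tokenize`, `tfidf_vectors`, `sq_dist` and `kmeans` are all assumptions made for illustration; the smoothed IDF formula and the farthest-point seeding are just one reasonable choice among several.

```python
import csv
import io
import math
import re
from collections import Counter

# Hypothetical sample standing in for the real file; column names are assumed.
SAMPLE_CSV = """author,date,content
alice,2012-01-03,python code python library code
bob,2012-01-04,code library python function
carol,2012-01-05,recipe pasta tomato recipe
dave,2012-01-06,pasta tomato sauce recipe
"""

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return (vocabulary, one L2-normalised TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Only the text column is used; author and date are ignored for now.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
texts = [row["content"] for row in rows]
vocab, vectors = tfidf_vectors(texts)
labels = kmeans(vectors, k=2)
for row, label in zip(rows, labels):
    print(row["author"], "-> cluster", label)
```

For a real file you would open it with `open(path, newline="")` instead of `io.StringIO`, and pick the column that holds the post text. Attributes such as author and date can be joined back to the cluster labels afterwards, since the row order is preserved.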

Best regards,