Twitter sub-topic cluster calculation
Hi! I am trying to run a cluster analysis on a dataset of about 35,000 tweets to find clusters of tweets that discuss similar sub-topics. I am not entirely sure how to approach this. So far I have tried DBSCAN clustering, but it essentially returned one giant cluster. Do I need to use a different clustering method, or should I somehow pre-define the number of clusters, for example based on the most common words? This is planned as an extension to my thesis, and I am new to both clustering and RapidMiner, so any help would be greatly appreciated.
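One likely reason for the single giant cluster (an assumption, not verified against your process) is that DBSCAN's `eps` radius is hard to tune on sparse, high-dimensional word vectors, so most tweets end up density-connected. A common alternative is to fix the number of clusters k up front and run k-means on TF-IDF vectors with cosine similarity ("spherical" k-means). The sketch below does this in plain Python, outside RapidMiner, purely to show the idea. The stopword list, the farthest-first seeding, and the toy tweets are hypothetical choices for illustration. For 35,000 tweets, a library implementation would be much faster than this.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; a real run would use a full list.
STOPWORDS = {"the", "and", "for", "are", "this", "that", "with", "about", "all", "again"}


def tokenize(text):
    """Lowercase, split on non-word characters, drop stopwords and tokens of 1-2 chars."""
    return [t for t in re.findall(r"[a-z0-9#@]+", text.lower())
            if len(t) > 2 and t not in STOPWORDS]


def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Return one normalised sparse TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1
    return [normalize({t: c * (math.log((1 + n) / (1 + df[t])) + 1)
                       for t, c in Counter(toks).items()}) for toks in tokenized]


def cosine(a, b):
    """Dot product of two normalised sparse vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def spherical_kmeans(vectors, k, max_iter=20):
    """k-means with cosine similarity; returns (labels, centroids)."""
    # Deterministic farthest-first seeding: start from document 0, then add the
    # document least similar to its closest existing centroid.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = min(range(len(vectors)),
                  key=lambda i: max(cosine(vectors[i], c) for c in centroids))
        centroids.append(dict(vectors[far]))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each tweet joins its most similar centroid.
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: centroid = normalised sum of its members' vectors.
        for j in range(k):
            summed = defaultdict(float)
            for v, lab in zip(vectors, labels):
                if lab == j:
                    for t, w in v.items():
                        summed[t] += w
            if summed:
                centroids[j] = normalize(summed)
    return labels, centroids


def top_terms(centroid, n=3):
    """Highest-weighted terms of a centroid (ties broken alphabetically)."""
    return [t for t, _ in sorted(centroid.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]


# Toy data standing in for the real tweets (hypothetical).
tweets = [
    "new phone battery lasts all day",
    "phone battery died again today",
    "battery life on this phone is great",
    "election debate tonight on tv",
    "watching the election debate live",
    "debate about the election results",
]
labels, centroids = spherical_kmeans(tfidf_vectors(tweets), k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
for j, c in enumerate(centroids):
    print(j, top_terms(c, n=2))  # 0 ['battery', 'phone'], then 1 ['debate', 'election']
```

Printing each centroid's top terms gives a quick label for every sub-topic. Re-running with several values of k and checking whether those term lists stay coherent is one practical way to choose the number of clusters, rather than deriving k directly from the most common words.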