X-Means clustering process for text data
I want to run X-Means clustering on text data, but I am very new to RapidMiner. I followed several different tutorials and ended up with this process.
My data looks like the Excel sheet on the left-hand side: a single column containing individual words.
It would be great if someone could confirm whether this process is right or wrong. I want to use X-Means clustering because I want to find the ideal number of clusters. I am using TF-IDF, and inside "Process Documents from Data" there are Tokenize, Transform Cases, Filter Stopwords, and Stem (Porter). For "X-Means", I set k min to 10 and k max to 60, with cosine similarity.
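A rough plain-Python sketch of what this chain does to one-word rows may help in checking the process. It is not RapidMiner's implementation. The stopword list is a tiny hypothetical one. `crude_stem` is a simple suffix stripper standing in for the Porter stemmer (it is NOT the Porter algorithm). The TF-IDF weighting is plain tf × log(N/df), which is an assumption, since RapidMiner's exact normalization may differ. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list; RapidMiner's stopword filter uses a much larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}
# Crude suffix stripper used as a stand-in for Stem (Porter); NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = re.findall(r"[A-Za-z]+", text)             # Tokenize on non-letters
    tokens = [t.lower() for t in tokens]                # Transform Cases (lower case)
    tokens = [t for t in tokens if t not in STOPWORDS]  # Filter Stopwords
    return [crude_stem(t) for t in tokens]              # Stem (stand-in for Porter)

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight) using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)
```

For example, `[preprocess(w) for w in ["Clusters", "clustering", "apple", "the"]]` gives `[["cluster"], ["cluster"], ["apple"], []]`. Every nonempty row becomes a vector with a single nonzero entry. Two rows are therefore either identical after preprocessing (cosine 1) or share nothing (cosine 0), and a row that was only a stopword becomes an all-zero vector. With so little graded similarity between rows, the clustering may have trouble finding balanced groups, which could contribute to most rows landing in one cluster.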
However, the results look weird to me: cluster 0 contains almost all the data. I also expected the results to tell me the ideal number of clusters. Did I make a mistake somewhere in the process?
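X-means, as described by Pelleg and Moore, chooses the number of clusters by comparing models with the Bayesian Information Criterion (BIC). It only searches between k min and k max, so with k min = 10 it should never report fewer than 10 clusters, and the number of clusters in the resulting model is, in effect, its answer. The sketch below is a simplified stand-in, not X-means itself. Instead of X-means' centroid-splitting procedure, it runs plain k-means for every k in the range and keeps the k with the highest spherical-Gaussian BIC. It works on small dense points rather than TF-IDF vectors, and all function names are hypothetical.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    # Deterministic seeding: start at the first point, then keep adding
    # the point that is farthest from its nearest chosen center.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=100):
    centers = farthest_point_init(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Recompute labels so they match the final centers.
    labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
    return centers, labels

def bic(points, centers, labels):
    r, m, k = len(points), len(points[0]), len(centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    variance = max(sse / (m * (r - k)), 1e-12)  # pooled per-dimension variance
    sizes = [labels.count(j) for j in range(k)]
    log_lik = (sum(n * math.log(n) for n in sizes if n > 0)
               - r * math.log(r)
               - r * m / 2 * math.log(2 * math.pi * variance)
               - sse / (2 * variance))
    n_params = (k - 1) + m * k + 1  # mixing weights, centroids, shared variance
    return log_lik - n_params / 2 * math.log(r)

def choose_k(points, k_min, k_max):
    if k_max >= len(points):
        raise ValueError("k_max must be smaller than the number of points")
    scores = {}
    for k in range(k_min, k_max + 1):
        centers, labels = kmeans(points, k)
        scores[k] = bic(points, centers, labels)
    return max(scores, key=scores.get), scores
```

With three well-separated blobs of four 2-D points each, `choose_k(points, 1, 5)` picks k = 3. If k min had been set to 4, k = 3 could never be chosen, which is the same way a k min of 10 bounds the X-Means result.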
Thank you in advance!!!