"advice on which clustering/classification operators to use"
I went through all of the usual document processing: tokenizing, filtering out stop words, etc. My first thought was to use k-NN to see how well its predicted labels match the pre-assigned labels; I could then build an exception set of instances where k-NN thinks the text might be misclassified. However, I'm not crazy about the lack of output/diagnostics from k-NN. I would prefer some additional information about how certain the algorithm is about each label it assigns.
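For illustration, the certainty score and exception set described above can be sketched in plain Python outside RapidMiner. Confidence here is the k-NN vote share, meaning the fraction of the k nearest neighbours that agree with the predicted label. A leave-one-out pass then flags documents whose pre-assigned label disagrees with the prediction, or whose vote share is low. This is a minimal sketch, not RapidMiner's k-NN operator. It assumes each document has already been turned into a sparse term-weight dict (for example TF-IDF weights from the processing steps). The function names `knn_with_confidence` and `exception_set`, `k=3`, and the 0.5 threshold are hypothetical choices.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_with_confidence(train, query, k=3):
    """Predict a label for `query` from (vector, label) pairs in `train`.

    Returns (label, confidence), where confidence is the share of the k
    nearest neighbours that voted for the winning label. Ties in the vote
    go to the label seen first among the neighbours (Counter ordering).
    """
    neighbours = sorted(train, key=lambda ex: cosine(ex[0], query), reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbours)


def exception_set(docs, k=3, threshold=0.5):
    """Hypothetical leave-one-out check over labelled documents.

    Flags a document when k-NN, trained on all the other documents,
    disagrees with its assigned label or is less confident than `threshold`.
    """
    flagged = []
    for i, (vec, assigned) in enumerate(docs):
        others = docs[:i] + docs[i + 1:]
        predicted, confidence = knn_with_confidence(others, vec, k)
        if predicted != assigned or confidence < threshold:
            flagged.append((i, assigned, predicted, confidence))
    return flagged


if __name__ == "__main__":
    # Toy term vectors; the last document looks like sports but is labelled politics.
    docs = [
        ({"ball": 1, "goal": 1}, "sports"),
        ({"goal": 1, "team": 1}, "sports"),
        ({"team": 1, "ball": 1}, "sports"),
        ({"vote": 1, "senate": 1}, "politics"),
        ({"senate": 1, "law": 1}, "politics"),
        ({"law": 1, "vote": 1}, "politics"),
        ({"ball": 1, "team": 1, "goal": 1}, "politics"),
    ]
    for idx, assigned, predicted, conf in exception_set(docs):
        print(f"doc {idx}: labelled {assigned!r}, k-NN says {predicted!r} ({conf:.2f})")
    # prints: doc 6: labelled 'politics', k-NN says 'sports' (1.00)
```

Cosine similarity is used because it ignores document length, which usually suits term vectors. The vote share is a coarse certainty measure: with `k=3` it can only be 1/3, 2/3 or 1. A larger k gives finer-grained scores.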
So, I started to look at some unsupervised methods. I tried k-means, but it doesn't seem to offer much more in the way of diagnostics or output than k-NN. I'm looking at the Expectation Maximization Clustering operator, but it seems to hang and never complete. It sounds like some sort of fuzzy clustering is what I want, but there don't seem to be any operators like that in RapidMiner right now.
So, are there any operators or extensions that offer fuzzy clustering or something similar? What I'm looking for is either a supervised method that returns some info on the certainty of each label assignment, or an unsupervised method that provides info on the certainty of each assignment plus info on the characteristics of each cluster.
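Fuzzy c-means is one standard form of the fuzzy clustering asked about here. Instead of a hard assignment, every point gets a membership degree in each cluster, and the cluster centres can be inspected to see what characterises each cluster. The sketch below is plain Python outside RapidMiner. It makes no claim that any RapidMiner operator or extension provides this. Several choices are assumptions: the fuzzifier `m=2`, the farthest-point initialisation, the convergence tolerance, and the name `fuzzy_c_means`. The input is assumed to be dense numeric vectors, such as TF-IDF rows.

```python
import math


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6):
    """Minimal fuzzy c-means sketch.

    points: list of equal-length numeric sequences (e.g. dense TF-IDF rows).
    Returns (centres, memberships); memberships[i][j] is the degree to which
    point i belongs to cluster j, and each row sums to 1.
    """
    # Farthest-point initialisation (an assumption, chosen for determinism).
    centres = [list(points[0])]
    while len(centres) < c:
        far = max(points, key=lambda p: min(math.dist(p, ctr) for ctr in centres))
        centres.append(list(far))

    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        # Membership step: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centres]
            zeros = dists.count(0.0)
            if zeros:
                # Point sits exactly on one or more centres: split membership there.
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
            memberships.append(row)

        # Centre step: mean of all points weighted by u_ij ** m.
        new_centres = []
        for j in range(c):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            new_centres.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(len(points[0]))
            ])

        # Stop once no centre moves more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, memberships
```

Consider two well-separated groups of 2-D points plus one point at (5, 5) between them. The points inside each group get memberships close to 1 for their own cluster. The point at (5, 5) ends up with its membership split between both clusters, with neither value close to 1. Rows like that are natural candidates for manual review. With term vectors, the largest coordinates of each centre point to the terms that characterise that cluster.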
Any help would be much appreciated. Thanks in advance!