"Text Mining Classification Problem"
I am able to create a model using RM5, but I do not believe the algorithm I chose is working well. I have tried several algorithms, including SVM, NaiveBayes, and W-SMO.
For each document, I Tokenize, Filter Stopwords (English), then Filter Tokens (by Length), and the result is sent to the classification algorithm.
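The three preprocessing steps above amount to roughly the following. This is only a Python sketch for illustration, not the RM5 process itself: the function name `preprocess`, the tiny `STOPWORDS` set, the length bounds, and splitting on non-letter characters are all assumptions made for the example.

```python
import re

# Placeholder stopword set for illustration only; a real English
# stopword list is much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is"}


def preprocess(text, min_len=3, max_len=25):
    """Tokenize, drop stopwords, then drop tokens outside the length bounds."""
    # Step 1: tokenize by splitting on anything that is not a letter (assumed mode).
    tokens = re.findall(r"[A-Za-z]+", text)
    # Step 2: filter stopwords (compared case-insensitively here, an assumption).
    tokens = [t for t in tokens if t.lower() not in STOPWORDS]
    # Step 3: filter tokens by length (bounds are hypothetical).
    return [t for t in tokens if min_len <= len(t) <= max_len]


if __name__ == "__main__":
    print(preprocess("The cat sat on a very large mat in Paris", min_len=4))
```

The token list that comes out of the last step is what gets turned into a word vector and passed to the learner.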
I then take unlabeled data, process it, and apply the model, but every document is classified as the same value.
I have 4 classes, with 500 labeled documents each for training.
Please provide guidance.