Step-by-step optimization for multiclass classification
I am working on a multiclass classification problem with different algorithms (Decision Tree, KNN, Naïve Bayes, SVM, NN) and I am trying to optimize my results. I want to do this step by step so that the improvement process is visible. At first, I only use each algorithm inside the cross validation operator. The next step should be optimization with the grid operator (also inside the cross validation).
Now we come to my first problem:
I am not really sure which parameters I should choose in the grid optimization. For Decision Tree and KNN (Naïve Bayes doesn’t have any parameters to set) I took a few parameters and got better results, so that’s fine for me.
But if I choose the following parameters for SVM, the process doesn’t finish (it runs for many hours without producing a result):
- SVM.gamma: 0-1; 10 steps; linear
- SVM.C: 0-1; 10 steps; linear
- SVM.epsilon: 0.001-1; 10 steps; linear
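To get a feeling for the size of this grid, here is a rough count of how many SVM models it may have to train. This is only a back-of-the-envelope sketch in plain Python, not RapidMiner code. It assumes that 10 steps produce 11 values per parameter (both end points included) and that both the validation inside the grid operator and the outer cross validation use 10 folds; none of these numbers are confirmed settings of my process, and `grid_fits` is just a helper name made up for this example.

```python
# Rough count of SVM trainings for the grid above.
# Assumptions (not confirmed settings): 11 values per parameter,
# 10 inner folds to score each combination, 10 outer folds.

def grid_fits(values_per_param, inner_folds, outer_folds):
    """Return (number of parameter combinations, total model trainings)."""
    combos = 1
    for n in values_per_param:
        combos *= n
    return combos, combos * inner_folds * outer_folds

combinations, total_fits = grid_fits([11, 11, 11], 10, 10)
print(combinations)  # 1331
print(total_fits)    # 133100
```

Under these assumptions the three parameters multiply to 1331 combinations, and the nested folds multiply that again, so fewer steps or fewer parameters would shrink the count quickly.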
I get the same problem with my neural net algorithm:
- learning rate: 0.01-0.4; 10 steps; logarithmic
- momentum: 0.01-0.4; 10 steps; logarithmic
Is there anything wrong with these settings that keeps my process from finishing?
My next step to optimize my results is to use the sample (balance) operator from the marketplace in addition to the grid operator. I placed the operator before the cross validation. It upsamples my minority classes so that the dataset is more balanced. My question here is:
Is it realistic that my recall and precision improve from around 35% to 75%? This happened for Decision Tree, KNN and Naïve Bayes.
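One thing that might explain such a jump: if upsampling works by duplicating examples and happens before the cross validation, copies of the same example can end up in both the training and the test part of a fold. The sketch below illustrates that effect in plain Python with made-up data. It is not RapidMiner code, it assumes upsampling by random duplication (I have not checked how the balance operator actually works), and `upsample` and `leaked_test_rows` are hypothetical helper names.

```python
import random

# Toy data set: each example is (example_id, label). Ids 0-89 are the
# majority class, ids 90-99 the minority class. All values are made up.
random.seed(0)
data = [(i, "major") for i in range(90)] + [(i, "minor") for i in range(90, 100)]

def upsample(rows, label, target):
    """Duplicate random rows of `label` until that class has `target` rows."""
    pool = [r for r in rows if r[1] == label]
    extra = [random.choice(pool) for _ in range(target - len(pool))]
    return rows + extra

def leaked_test_rows(rows, k):
    """Count test rows whose id also occurs in the training part of the same fold."""
    shuffled = rows[:]
    random.shuffle(shuffled)
    leaked = 0
    for fold in range(k):
        test = shuffled[fold::k]
        train_ids = {r[0] for i, r in enumerate(shuffled) if i % k != fold}
        leaked += sum(1 for r in test if r[0] in train_ids)
    return leaked

balanced = upsample(data, "minor", 90)  # upsampling BEFORE splitting into folds
print(leaked_test_rows(data, 10))       # 0, every id is unique
print(leaked_test_rows(balanced, 10))   # greater than 0, copies sit on both sides
```

If this is what happens in my process, the test folds would partly contain examples the model has already seen, which could make recall and precision look better than they are. Upsampling only the training data inside the cross validation would avoid the overlap.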
So we come to my last question:
Is it a good idea to show the improvement process in this order:
1. Only each algorithm
2. Algorithm + grid
3. Algorithm + grid + sample (balance)
4. Algorithm + grid + sample (balance) + bagging/adaboost
Thank you very much.