Interpretation of different classification models - comprehension questions
I have some questions regarding the evaluation of my binary classification analysis. It would be great if you could share your thoughts on my considerations and decisions, because I am pretty new to RapidMiner and to machine learning in general. In addition, I want to clarify that I don't earn money with this analysis. It is just a case study, and I hope that other people who are interested in this topic can learn from this thread and my questions too.
Background: The test and training set contain about 450 weighted examples, the label distribution is 50:50, and I used 20 attributes to set up the model.
I noticed during the analysis that my data is very easy to split. Therefore the resulting decision trees have at most 3 levels and an accuracy of 98%-99%. This result sounds like a clearly overfitted model (marked with a red line in the results screenshot). In my opinion, the result indicates that the models have a small variance and a small bias, which leads to the overfitting. Is that correct? Even if I don't use pruning, the tree has at most 4 levels. I tried lowering the pruning parameters to optimize the model, but this was not successful.
Afterwards I created some models with different classification techniques: some multilayer perceptrons with the Neural Net operator and some SVMs with the LIBSVM operator. Please take a look at the attached screenshots. They show the configuration of each parameter. Every parameter that is not shown in the screenshots is left at its default.
The next screenshot displays the results of my models. I used 10-fold, 5-fold, and leave-one-out cross-validation with stratified sampling.
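To make clear what I mean by stratified sampling, here is a minimal Python sketch of how examples could be assigned to folds so that every fold keeps roughly the same label distribution. It is only an illustration and not what RapidMiner does internally: the function name `stratified_folds`, the label values, and the exact 225:225 split are assumptions, and the example weights are ignored.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign each example index to one of k folds, keeping label ratios similar."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin over the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

# Hypothetical data: 450 examples with a 50:50 label distribution.
labels = ["yes"] * 225 + ["no"] * 225
folds = stratified_folds(labels, k=5)
for fold in folds:
    positives = sum(1 for i in fold if labels[i] == "yes")
    print(len(fold), positives)
```

With these made-up labels, each of the 5 folds holds 90 examples, 45 of them positive. Each fold serves once as the test set while the remaining folds are used for training.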
accuracy of the models
Please take a look at the results above. My next step was to eliminate the models with a deviation higher than 20 in order to choose a stable model (blue lines in the results screenshot). The deviation is the criterion for model stability, isn't it? It looks to me like the models are more stable with 5-fold cross-validation than with 10-fold or leave-one-out cross-validation.
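For reference, this is a minimal Python sketch of how such a deviation might be computed, assuming it is the standard deviation of the per-fold accuracies. That is an assumption on my part (I have not verified how RapidMiner calculates it, or whether it uses the population or the sample formula), and all numbers below are made up rather than taken from my results.

```python
from statistics import mean, pstdev

def summarize(fold_accuracies):
    """Return mean and population standard deviation of per-fold accuracies (in percent)."""
    return mean(fold_accuracies), pstdev(fold_accuracies)

# Made-up numbers for illustration only, not my actual results.
five_fold = [96.0, 94.0, 95.0, 97.0, 93.0]
# Leave-one-out: every "fold" holds a single example, so its accuracy is 0 or 100.
leave_one_out = [100.0] * 95 + [0.0] * 5

m5, s5 = summarize(five_fold)
mloo, sloo = summarize(leave_one_out)
print(f"5-fold:        {m5:.1f} +/- {s5:.1f}")
print(f"leave-one-out: {mloo:.1f} +/- {sloo:.1f}")
```

In this toy case both runs have a mean accuracy of 95%, but the deviation is about 1.4 for the 5-fold run and about 21.8 for leave-one-out. If the deviation really is computed per fold, a leave-one-out value above 20 could arise by construction, so it might not be directly comparable with the 5-fold and 10-fold values.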
The models marked with a yellow line are the ones I prefer, because their accuracy is high and their deviation is small.
Furthermore, I think that SVM 6 is better than SVM 7 because it uses only 48 support vectors, whereas SVM 7 uses 258.
What do you think about my suggestions? At the moment I am searching for the best MLP model, but I don't know how to find it. Is there a way to detect overfitting in neural networks?
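One common way to look for overfitting in a neural network is to compare the training error with the error on held-out validation data after each training cycle: if the training error keeps falling while the validation error starts rising, the network is probably fitting noise. Below is a minimal Python sketch of that check. The helper `best_epoch`, the `patience` parameter, and the error values are all hypothetical and do not come from my models or from the Neural Net operator.

```python
def best_epoch(validation_errors, patience=3):
    """Return the index of the epoch with the lowest validation error,
    stopping the scan once the error has not improved for `patience` epochs."""
    best_idx = 0
    best_err = float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_err:
            best_idx, best_err = epoch, err
        elif epoch - best_idx >= patience:
            break
    return best_idx

# Made-up error curves for illustration only.
training_errors = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04, 0.02, 0.01]
validation_errors = [0.42, 0.28, 0.20, 0.18, 0.19, 0.22, 0.25, 0.30]

stop = best_epoch(validation_errors)
gap = validation_errors[-1] - training_errors[-1]
print("lowest validation error at epoch", stop)
print("final train/validation gap:", round(gap, 2))
```

In this toy example the validation error is lowest at epoch 3 (counting from 0), while the training error keeps decreasing afterwards. The growing gap between the two curves is the sign of overfitting I would like to be able to detect.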
Finally, I have a question regarding the evaluation charts. The following charts display the lift and ROC results of model SVM 6.
Lift chart
ROC chart
Can someone please explain the ROC threshold curve to me? I don't think I understand it properly.
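To show how far my understanding goes, here is a minimal Python sketch of how I think the points of a ROC curve relate to confidence thresholds: every threshold turns the confidences into yes/no predictions, which gives one pair of false positive rate and true positive rate. I assume (without being sure) that the threshold curve in the chart shows the threshold belonging to each ROC point. The function `roc_points`, the confidences, and the labels are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) for every distinct score used as a threshold.

    An example is predicted positive when its score is >= threshold.
    Labels are 1 (positive) or 0 (negative).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((threshold, fp / negatives, tp / positives))
    return points

# Made-up confidences for the positive class, for illustration only.
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

for threshold, fpr, tpr in roc_points(scores, labels):
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold moves along the ROC curve from the lower left (few examples predicted positive) to the upper right (all examples predicted positive). Is this the correct way to read the threshold curve?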
Thanks in advance for your replies! I really appreciate it.