I've been playing around with RM for about two months now, and the more I work with it, the more impressed I am by its functionality and flexibility. Congratulations to all the contributors to this great software package!
One problem I'm currently struggling with is a multinomial classification task with three label classes, of which only two are ever predicted. The dataset consists of performance measures that are categorized by a kind of rating, let's say A, B and C, where A is the top rating and C the lowest. The dataset is fairly balanced (A: 35%, B: 30%, C: 35%); nevertheless, most of the classification learners (except the decision tree with the gini_index criterion) do not predict the B rating at all.
From the contents of the dataset I know that the boundaries between A and B and between B and C are heavily blurred, but I can't explain why no example is assigned to class B. I can influence the range of B's confidence by adding or removing features, but it never exceeds the A or C confidence, so B is never taken as the prediction. What do you think? Is it possible that a single class simply has no predictive power? I've read some posts about binominal classification where only one class was predicted, but they don't seem to relate to this problem.
Well, the process is basically designed so that the learner(s) "care" about the multiple classes. I've also tried the one-vs.-all strategy in order to map the task to a binominal approach. Interestingly, the model that treats class B as the positive target then predicts around 10% of the examples as B, although B's actual share is much higher. By trying out several combinations of features, I've now managed to get around 1% predicted as class B even without one-vs.-all. This leads me to assume that it might rather be a matter of data quality or data preparation. So I'd be interested in possible reasons why a single class's characteristics get "swallowed" to such an extent that its class recall ends up around zero.
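To put numbers on the symptom, it helps to compare each class's recall with the share of predictions that class actually receives. A minimal sketch outside RapidMiner, in plain Python; the helper name `class_report` and the toy labels are made up for illustration:

```python
from collections import Counter


def class_report(y_true, y_pred):
    """Per-class recall and share of predictions for every label in y_true."""
    predicted_counts = Counter(y_pred)
    report = {}
    for label in sorted(set(y_true)):
        # Number of examples that truly belong to `label`.
        total = sum(1 for t in y_true if t == label)
        # How many of them the model actually assigned to `label`.
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        report[label] = {
            "recall": hits / total,
            "predicted_share": predicted_counts[label] / len(y_pred),
        }
    return report


# Toy labels (made up): B is never predicted.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "A", "C", "C", "C"]
for label, stats in class_report(y_true, y_pred).items():
    print(label, stats)
```

A class with recall near zero and a predicted share near zero, even though its true share is around a third, is exactly the "swallowed" pattern described above.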
If you need to run the process to get deeper insight, I'll post it together with some test data.
So I suppose you are using the Polynominal by Binominal Classification operator. That's generally fine, but it has one drawback: it does not change the class distributions for training. If your data contains relatively few examples of class B, the data used to train the B-vs.-all model is highly skewed, which probably results in a bad model. You should rather create a one-vs.-all mechanism manually with some loops, creating one model per class, and implement a mechanism that produces a balanced class distribution in each training set.
For application, apply all models and predict the class with the highest confidence.
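The manual one-vs.-all scheme can be sketched in plain Python: one binary model per class, each trained on a class-vs.-rest set that is undersampled to 50/50, and at application time the class with the highest confidence wins. This is only an illustration of the idea, not RapidMiner's loop operators; the toy k-nearest-neighbour scorer stands in for whatever binominal learner you actually use, and all names and data are hypothetical:

```python
import math
import random


def train_knn_scorer(pos_rows, neg_rows, k=3):
    """Toy binary learner (stand-in for any real one): confidence = share of
    positives among the k nearest rows of the balanced training set."""
    data = [(row, 1) for row in pos_rows] + [(row, 0) for row in neg_rows]

    def confidence(x):
        nearest = sorted(data, key=lambda item: math.dist(x, item[0]))[:k]
        return sum(flag for _, flag in nearest) / len(nearest)

    return confidence


def train_one_vs_all(rows, labels, seed=0):
    """One binary model per class, each trained on a balanced class-vs-rest set."""
    rng = random.Random(seed)
    models = {}
    for cls in sorted(set(labels)):
        pos = [r for r, y in zip(rows, labels) if y == cls]
        neg = [r for r, y in zip(rows, labels) if y != cls]
        n = min(len(pos), len(neg))
        # Undersample both sides to n rows, so positives and negatives are 50/50.
        models[cls] = train_knn_scorer(rng.sample(pos, n), rng.sample(neg, n))
    return models


def predict(models, x):
    """Apply all models and pick the class with the highest confidence."""
    confidences = {cls: model(x) for cls, model in models.items()}
    return max(confidences, key=confidences.get), confidences


# Toy 1-D data (made up): A near 0, B near 101, C near 152.
rows = [[0.0], [1.0], [2.0], [3.0], [100.0], [101.0], [102.0],
        [150.0], [151.0], [152.0], [153.0], [154.0]]
labels = ["A"] * 4 + ["B"] * 3 + ["C"] * 5
models = train_one_vs_all(rows, labels)
print(predict(models, [101.0]))
```

Undersampling is just one way to balance each binary set; oversampling the positive class or weighting examples would serve the same purpose.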
But you say that you let the learners care about multiple classes. What do you mean by this? Do you use models that support multiple classes out of the box? Which one are you using?
[SOLVED] Re: Multiclassification - one class not predicted
I meant that I didn't use a meta learner that maps multiple classes onto binominal learners, but used different learners that support multiple classes out of the box.
I've tried different approaches and I've got it now. The problem was indeed the unequally distributed dataset. I couldn't believe that a distribution of about 35% / 25% / 40% leads to such a distortion in the distribution of predicted classes. I fully balanced the training dataset as described in your FAQs, and the B rating is now predicted to an extent comparable to the other classes.
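Fully balancing the training data by undersampling every class down to the size of the smallest one can be sketched as follows. This is plain Python rather than the RapidMiner operators the FAQ refers to; the function name and the 100 made-up rows are assumptions for illustration:

```python
import random
from collections import Counter


def undersample_to_balance(rows, labels, seed=0):
    """Downsample every class to the size of the smallest class.
    Meant for the training data only."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n = min(len(members) for members in by_class.values())
    balanced = [(row, label)
                for label in sorted(by_class)
                for row in rng.sample(by_class[label], n)]
    rng.shuffle(balanced)  # avoid blocks of one class in the output order
    return [row for row, _ in balanced], [label for _, label in balanced]


# Made-up training set with the 35% / 25% / 40% split from the post.
labels = ["A"] * 35 + ["B"] * 25 + ["C"] * 40
rows = [[i] for i in range(len(labels))]
bal_rows, bal_labels = undersample_to_balance(rows, labels)
print(Counter(bal_labels))  # every class now has 25 examples
```

Undersampling throws data away (here 25 of 100 rows); with small datasets, oversampling the minority classes is the usual alternative.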
Thanks, Marius, for your input, and sorry for the question (I should have tried balancing before asking, but I couldn't believe it would have such an effect).
Unfortunately I have to reopen this thread, as I've run into the same problem again, this time when applying the model.
As you may remember, I had the problem that with a slightly unbalanced dataset (with three different label classes) the trained model was almost unable to predict one of the three classes. When I manually balanced and stratified the dataset (trained and validated via X-validation), the validation gave satisfactory results covering all label classes, as expected. In my opinion, this balancing should be necessary only for training the model (in my case a neural network). Now I've found out that the same problem occurs when the trained NN is applied to an unbalanced dataset: the results are quite strange, and one class is again not predicted. If we know the labels and balance the dataset as was done for training, the results are satisfactory again. So the model's behaviour seems to depend on the data it is applied to, which does not make sense to me. How am I supposed to "balance" a dataset whose label is to be predicted, when by nature we don't know the label?
OK, I think I can narrow the problem down to the NN learner. All other learners behave as expected when their generated models are applied. However, the NN model created from a stratified dataset works properly only if the dataset to be predicted has been stratified, too (which, of course, is not possible in practice).
By "properly" I mean that the classes are predicted in a similar way as during model validation.
What I do now is split the whole dataset into 80% training/validation data and 20% to which the model is finally applied. The 80% part is stratified (leaving about 80% of it as training data), and the NN is trained and tested on it via X-validation. This model shows satisfactory performance and is then applied to the remaining 20% of the original dataset (unstratified). Because this part of the dataset has not been stratified, the same class is predicted for all examples (with almost the same confidence distribution over the three label classes). As soon as I stratify the part to be predicted, the model behaves as the validation suggests.
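The protocol that should work for any well-behaved learner is: split first, balance only the training part, and leave the application part exactly as it comes. A plain-Python sketch of that order of operations, with hypothetical function names and made-up data (model training itself is left out):

```python
import random
from collections import Counter


def holdout_split(rows, labels, train_share=0.8, seed=0):
    """Random split into a training part and an untouched application part."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * train_share)

    def pick(ids):
        return [rows[i] for i in ids], [labels[i] for i in ids]

    return pick(idx[:cut]), pick(idx[cut:])


def balance(rows, labels, seed=0):
    """Undersample every class to the smallest class size (training part only)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n = min(len(members) for members in by_class.values())
    pairs = [(r, c) for c in sorted(by_class) for r in rng.sample(by_class[c], n)]
    return [r for r, _ in pairs], [c for _, c in pairs]


labels = ["A"] * 35 + ["B"] * 25 + ["C"] * 40  # made-up data
rows = [[i] for i in range(len(labels))]
(train_rows, train_labels), (apply_rows, apply_labels) = holdout_split(rows, labels)
fit_rows, fit_labels = balance(train_rows, train_labels)
# The model is trained and validated on fit_rows only; apply_rows keep their
# natural distribution and their labels are used only to score the predictions.
print(len(apply_rows), Counter(fit_labels))
```

If a model trained this way only behaves when the application part is balanced too, the model's output depends on the composition of the scored data, which points to a preprocessing problem rather than to the class imbalance itself.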
I have also tried balancing the dataset via example weights, which lets the NN work properly when applied, but with much lower performance. That is why I've stuck with the stratifying approach so far. Are there any NN experts out there who can explain this behaviour?
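The post doesn't say how the example weights were computed. One common scheme (often called "balanced" class weights) gives each example the weight n_total / (n_classes * n_c), so every class carries the same total weight while the weights still sum to the number of examples. A plain-Python sketch under that assumption, with a hypothetical function name and made-up labels:

```python
from collections import Counter


def balancing_weights(labels):
    """Example weight n_total / (n_classes * n_c): every class ends up with
    the same total weight, and the weights still sum to the number of examples."""
    counts = Counter(labels)
    n_total, n_classes = len(labels), len(counts)
    return [n_total / (n_classes * counts[label]) for label in labels]


labels = ["A"] * 35 + ["B"] * 25 + ["C"] * 40  # made-up data
weights = balancing_weights(labels)
per_class = Counter()
for label, w in zip(labels, weights):
    per_class[label] += w
print(per_class)  # each class sums to about 33.33
```

Unlike undersampling, weighting keeps every example, but the learner has to support example weights for this to have any effect.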
There are at least two possible explanations. You may have run into a bug in the Neural Net operator that can occur when the preprocessing before training differs from the preprocessing before application, which is the case for you because of the stratification. This bug has already been fixed, and the fix will be included in the next RapidMiner release.
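The thread doesn't describe what the fixed bug actually was, so the following is only a generic illustration, not a description of the Neural Net operator's internals. If a preprocessing step such as z-score normalization is recomputed on the data being scored instead of reusing the parameters learned during training, the same example receives a different input depending on what else is in the batch, which would make predictions depend on whether the application data was stratified. Plain Python, with hypothetical function names and made-up values:

```python
from statistics import mean, pstdev


def fit_zscore(values):
    """Learn normalization parameters from the training data only."""
    sigma = pstdev(values)
    return mean(values), (sigma if sigma > 0 else 1.0)


def apply_zscore(values, params):
    """Reuse the stored training parameters on any later data."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


train = [0.0, 2.0, 4.0, 6.0, 8.0]
params = fit_zscore(train)

# Two made-up application batches that both contain the value 4.0.
batch_1 = [4.0, 8.0]
batch_2 = [4.0, 4.0, 4.0, 8.0]

# Consistent: the same example gets the same input, whatever the batch looks like.
print(apply_zscore(batch_1, params)[0], apply_zscore(batch_2, params)[0])
# Inconsistent: refitting on each batch changes the input for the same example.
print(apply_zscore(batch_1, fit_zscore(batch_1))[0],
      apply_zscore(batch_2, fit_zscore(batch_2))[0])
```

With the stored parameters, the value 4.0 maps to 0.0 in both batches; refitted per batch, it maps to -1.0 in one and about -0.58 in the other.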
On the other hand, neural nets are real beasts: at times they are hard to optimize properly, and even then they tend to overfit. If you get good results with other algorithms, just stick to those!
Glad to hear this explanation from you, Marius! I hope I ran into this bug; that sounds promising. I basically come from the programming field (with more than 10 years of experience in Java programming and databases), so I'm not a qualified data miner, but I've been pursuing my personal interest in it more intensively for the last two years. What I've learned so far is how essential data preparation is, and that some algorithms have special behaviours (such as the one you mentioned: the sensitivity to overfitting). For my use case I got the best performance (by a wide margin) with the NN learner, and I hope to get the stratified modeling approach working in the near future.
Thanks again. As soon as the fixed release is out, I'll let you know about further trials! Keep your fingers crossed for me till then! ;D
PS: Since you mentioned bugs: in keeping with my programming background, I enjoy going through RM's code, and I think I've found a bug that occurs in the context of multiclass classification (recorded at bugs.rapid-i.com, ID 1512). Are these (potential) bugs inspected at all? Some of them seem fairly important to me.