Multinomial class prediction: different ways to solve class-imbalance problems?

mafern (Member, Contributor II), edited November 2018 in Help
Hi there!

I was wondering what your general tips are for approaching multinomial prediction when one large category swamps the other categories. In the past, some algorithms almost ignored a relatively low-frequency class; I solved that by switching to other algorithms. Now I have added a new category that accounts for around 50% of the records, while the other four categories each hold roughly 5% to 15%. This new class is wrecking the predictions for the other classes, especially the smaller ones.

As usual, I could fiddle with the individual class scores until I get a desirable distribution of predictions. I have always wondered, though: is there a way to automate this search in RapidMiner? It is a tricky optimization, because the goal is to improve accuracy without sacrificing recall.

How would you approach this problem?

Thank you; I hope we can build some insight together.

Answers

  • MariusHelf (RapidMiner Certified Expert, Member)
    Hi,

    which approach did you take in the first place to tackle the multiclass classification task? I would suggest a so-called one-vs-all (1 vs all) classification: create one model per class, where the training data consists of examples of the current class as positives and examples from all other classes as negatives. For training, use the same number of positives and negatives.

    To get the final prediction, apply all models to the new examples and predict the class with the highest confidence.

    If, however, one class in your training data is significantly larger than the others, your models will be biased towards that majority class, as you have already discovered. In that case, you could experiment with the class ratios used during training.
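    A minimal Python sketch of the one-vs-all scheme, under stated assumptions: `train_binary` is a hypothetical nearest-centroid stand-in for whatever binary learner you would really use (e.g. a linear SVM), negatives are drawn at random from all other classes, and `neg_per_pos` is a hypothetical knob for the class ratio during training (1.0 means equal numbers of positives and negatives). The toy data is made up for illustration.

```python
import math
import random
from collections import defaultdict


def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_binary(pos, neg):
    """Hypothetical stand-in learner: one centroid for positives, one for negatives."""
    return centroid(pos), centroid(neg)


def binary_confidence(model, x):
    """Confidence in [0, 1] that x belongs to the positive class."""
    c_pos, c_neg = model
    z = sq_dist(x, c_pos) - sq_dist(x, c_neg)
    z = max(-50.0, min(50.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))


def train_one_vs_all(X, y, neg_per_pos=1.0, seed=0):
    """Train one binary model per class on a resampled positives/negatives set.

    neg_per_pos (hypothetical parameter) sets how many negatives are drawn per
    positive; 1.0 gives a balanced training set for every class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    models = {}
    for cls, pos in by_class.items():
        # Negatives: every example that does not belong to the current class.
        neg = [row for other, rows in by_class.items() if other != cls for row in rows]
        n_pos = max(1, min(len(pos), int(len(neg) / neg_per_pos)))
        n_neg = min(len(neg), max(1, round(n_pos * neg_per_pos)))
        # Downsample so the requested positive/negative ratio holds.
        models[cls] = train_binary(rng.sample(pos, n_pos), rng.sample(neg, n_neg))
    return models


def predict(models, x):
    """Apply every per-class model and return the class with the highest confidence."""
    confs = {cls: binary_confidence(m, x) for cls, m in models.items()}
    return max(confs, key=confs.get), confs


# Toy data: "big" is the majority class, "a" and "b" are small classes.
X = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [-0.1, 0], [0, -0.1],
     [5, 0], [5.1, 0], [0, 5], [0, 5.1]]
y = ["big"] * 6 + ["a", "a", "b", "b"]
models = train_one_vs_all(X, y)
label, confs = predict(models, [4.9, 0.1])
print(label)  # prints: a
```

    Because every per-class model here comes from the same learner, taking the maximum confidence compares like with like; raising or lowering `neg_per_pos` is one way to experiment with the class ratio.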

    Best regards,
    Marius
  • mafern (Member, Contributor II)
    Hi Marius, thank you for the answer. Yes, the plan is to move to one-vs-all models, of course. So far I have only experimented a little with decision trees and Naive Bayes, which handle polynominal labels directly.

    I was thinking about redefining the score output by generating new scores and using a grid optimizer to try, for example, 100 values between 0 and 1 as a coefficient on each score output. It is like doing a manual regression on the model's different outputs, optimizing for accuracy.
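    A minimal sketch of that coefficient search, assuming you already have per-class confidences for a labelled validation set; the dictionary layout and function names are hypothetical, not RapidMiner operators. One detail helps: multiplying every class's confidence by the same factor never changes which class wins, so one multiplier can be fixed at 1.0 and only the others searched. The grid then needs values both above and below 1, so a class can be pushed up or down relative to that reference.

```python
import itertools


def predict_scaled(confs, weights):
    """Pick the class whose confidence * class multiplier is largest."""
    return max(confs, key=lambda c: confs[c] * weights[c])


def accuracy(conf_rows, labels, weights):
    hits = sum(predict_scaled(row, weights) == y for row, y in zip(conf_rows, labels))
    return hits / len(labels)


def grid_search_weights(conf_rows, labels, classes, grid, score=accuracy):
    """Exhaustively try every combination of multipliers from `grid`.

    classes[0] is the reference class and keeps multiplier 1.0; the search
    covers len(grid) ** (len(classes) - 1) combinations.
    """
    ref, others = classes[0], classes[1:]
    best_w, best_s = None, float("-inf")
    for combo in itertools.product(grid, repeat=len(others)):
        weights = {ref: 1.0, **dict(zip(others, combo))}
        s = score(conf_rows, labels, weights)
        if s > best_s:
            best_w, best_s = weights, s
    return best_w, best_s


# Toy validation data (made up): the majority class "big" is over-confident.
conf_rows = [
    {"big": 0.6, "small": 0.4},
    {"big": 0.7, "small": 0.3},
    {"big": 0.55, "small": 0.45},
    {"big": 0.9, "small": 0.1},
]
labels = ["small", "big", "small", "big"]
best_w, best_s = grid_search_weights(conf_rows, labels, ["big", "small"], [0.5, 1.0, 2.0])
print(best_w, best_s)  # {'big': 1.0, 'small': 2.0} 1.0
```

    Passing a different `score` function, such as balanced accuracy (the mean of per-class recalls), is one way to optimize for something other than plain accuracy.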

    The only problem I see is that recall will naturally be somewhat neglected. What other performance measures are used on confusion matrices?
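    Several standard measures computed from the same confusion matrix are less dominated by the majority class than plain accuracy: per-class precision, recall and F1, their unweighted (macro) averages, balanced accuracy (the mean per-class recall), and Cohen's kappa. A minimal sketch, assuming the matrix is stored as a nested dict `cm[actual][predicted]` (a layout chosen only for this example); the counts are made up for illustration.

```python
def per_class_metrics(cm, classes):
    """Return {class: (precision, recall, f1)} from cm[actual][predicted] counts."""
    out = {}
    for c in classes:
        tp = cm[c][c]
        actual = sum(cm[c][p] for p in classes)     # row total
        predicted = sum(cm[a][c] for a in classes)  # column total
        prec = tp / predicted if predicted else 0.0
        rec = tp / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        out[c] = (prec, rec, f1)
    return out


def balanced_accuracy(cm, classes):
    """Unweighted mean of per-class recall."""
    m = per_class_metrics(cm, classes)
    return sum(rec for _, rec, _ in m.values()) / len(classes)


def macro_f1(cm, classes):
    """Unweighted mean of per-class F1."""
    m = per_class_metrics(cm, classes)
    return sum(f1 for _, _, f1 in m.values()) / len(classes)


def cohen_kappa(cm, classes):
    """Agreement beyond what the row and column totals would give by chance."""
    n = sum(cm[a][p] for a in classes for p in classes)
    observed = sum(cm[c][c] for c in classes) / n
    expected = sum(
        sum(cm[c][p] for p in classes) * sum(cm[a][c] for a in classes)
        for c in classes
    ) / n ** 2
    return (observed - expected) / (1 - expected)


classes = ["big", "small"]
cm = {"big": {"big": 45, "small": 5}, "small": {"big": 8, "small": 2}}
n = sum(cm[a][p] for a in classes for p in classes)
acc = sum(cm[c][c] for c in classes) / n
print(round(acc, 3), round(balanced_accuracy(cm, classes), 3), round(cohen_kappa(cm, classes), 3))
# 0.783 0.55 0.114
```

    The example shows the pattern from the question: accuracy looks acceptable at 0.783, while balanced accuracy and kappa expose that the small class is mostly missed.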

    About 'the highest confidence': shouldn't one be very careful when comparing confidences from different models? If the best learner for one of the per-class sub-models is a decision tree and for another it is an SVM, their scores will never be directly comparable. Even with the same algorithm, shouldn't one be careful here? That is why I think a score-coefficient optimizer is probably a must to maximize overall performance, rather than just a simple vote as in bagging. Isn't it?

    Best regards.
  • MariusHelf (RapidMiner Certified Expert, Member)
    Hey,

    in the one-vs-all approach you should of course use the same model type for every class. For text classification problems this would typically be a linear SVM, but you can choose any other algorithm. The confidences of models created with the same algorithm (e.g. SVMs) are comparable. This is commonly done, and there is even an operator for it: the Polynominal by Binominal Classification meta operator. (This operator, however, does not yet allow balancing the data for training, so you should probably design a custom process instead.)

    Best regards,
    Marius
  • mafern (Member, Contributor II)
    Great!

    Now I am wondering: is there a way to optimize the score values before applying Generate Confidences?

    I built something with macros and grid optimization (Set Macro for the multiplier, then generate a new confidence attribute as original confidence * multiplier, then optimize the Set Macro value), but it has several drawbacks: execution is slow for some reason, it uses a lot of memory, and when adjusting scores for several classes, the exponential growth of the grid makes it useless. If it could be done with an evolutionary optimizer, maybe it could work.
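    A rough sketch of why the grid explodes and what a cheap alternative looks like. With five classes and one multiplier fixed at 1.0 (scaling all confidences by the same factor never changes the winner), a 100-point grid per multiplier still means 100^4 = 100,000,000 combinations. A simple (1+1) evolution strategy instead mutates the current multipliers with Gaussian noise in log space and keeps the candidate whenever the validation score does not get worse. This is a generic random-search technique, not a RapidMiner operator; the data layout (`conf_rows` as per-example dicts of class confidences), the function names and the `iters`/`sigma` defaults are assumptions for illustration.

```python
import math
import random


def predict_scaled(confs, weights):
    """Pick the class whose confidence * class multiplier is largest."""
    return max(confs, key=lambda c: confs[c] * weights[c])


def accuracy(conf_rows, labels, weights):
    hits = sum(predict_scaled(row, weights) == y for row, y in zip(conf_rows, labels))
    return hits / len(labels)


def evolve_weights(conf_rows, labels, classes, score=accuracy, iters=500, sigma=0.3, seed=0):
    """(1+1) evolution strategy over log-multipliers (hypothetical helper).

    classes[0] is the reference class and stays at multiplier 1.0 (log 0).
    Working in log space keeps every multiplier positive and makes halving
    and doubling equally sized steps.
    """
    rng = random.Random(seed)
    log_w = {c: 0.0 for c in classes}

    def evaluate(lw):
        return score(conf_rows, labels, {c: math.exp(v) for c, v in lw.items()})

    best = evaluate(log_w)
    for _ in range(iters):
        cand = dict(log_w)
        for c in classes[1:]:
            cand[c] += rng.gauss(0.0, sigma)  # mutate every non-reference multiplier
        s = evaluate(cand)
        if s >= best:  # accept ties so the search can drift across flat regions
            log_w, best = cand, s
    return {c: math.exp(v) for c, v in log_w.items()}, best


print(100 ** 4)  # grid size for 4 free multipliers with 100 values each

# Toy validation data (made up): the majority class "big" is over-confident.
conf_rows = [
    {"big": 0.6, "small": 0.4},
    {"big": 0.7, "small": 0.3},
    {"big": 0.55, "small": 0.45},
    {"big": 0.9, "small": 0.1},
]
labels = ["small", "big", "small", "big"]
weights, best = evolve_weights(conf_rows, labels, ["big", "small"])
print(weights, best)
```

    The cost grows linearly with `iters` rather than exponentially with the number of classes, at the price of no guarantee of finding the global optimum; the result should still be checked on data not used for the search.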

    I am also wondering: does a polynominal algorithm internally optimize the final score output for accuracy, jointly with the other classes' scores? In other words, is all of this completely unnecessary?

    Even if a polynominal algorithm did that, with one-vs-all models I would still see value in optimizing each class's final score before picking the highest, even when the scores are comparable.

    Am I clear enough? Hope this helps.

    Best regards.

  • MariusHelf (RapidMiner Certified Expert, Member)
    Hm, to be honest, I have no idea what you are doing with the macros, and a "polynominal" algorithm is too general a term to say anything about.

    If, however, you want to optimize the model for each class separately, you have to place the optimization operator (or whatever you use to optimize your model) inside the Polynominal by Binominal Classification operator.

    Of course, that is a time-consuming process, and you should choose the parameters to optimize carefully to keep the search space small. If you do not want the computation to run on your workstation, consider installing RapidAnalytics on a powerful server: that way you can design your processes with the RapidMiner GUI on your machine but run them on the RapidAnalytics server, so they do not disturb your daily work.

    Best regards,
    Marius