Multinomial class prediction: different ways to solve class balance problems?
I was wondering what your general tips are for approaching multinomial prediction when one large category crowds out the others. In the past I had a problem with a relatively low-frequency class being almost ignored by some algorithms; I solved it by switching to other algorithms. Now I have added a new category that accounts for around 50% of the records, while the other 4 categories sit at around 5%-15% each. This new class is destroying the predictions for the other classes, especially the smaller ones.
As usual, I could just fiddle with the individual class scores until I get a desirable distribution of predictions. I have always wondered, though: is there a way to optimize this search with RapidMiner? It's a complex optimization process, because one would be aiming to improve accuracy without sacrificing recall.
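To make the score-fiddling idea concrete, here is a rough sketch of the kind of search I mean, written in plain Python rather than as a RapidMiner process. Everything in it is an assumption for illustration: the data is synthetic (class shares of 50/15/15/10/10), the function names are made up, and the objective is just one possible blend of accuracy and macro-averaged recall controlled by a hypothetical `alpha` parameter. In practice the weights should be tuned on held-out data.

```python
import random

PRIORS = [0.50, 0.15, 0.15, 0.10, 0.10]  # assumed class shares (synthetic)

def make_data(n, rng):
    """Synthetic labels plus scores from a model biased toward frequent classes."""
    labels, scores = [], []
    for _ in range(n):
        y = rng.choices(range(len(PRIORS)), weights=PRIORS)[0]
        raw = [rng.random() + (0.6 if c == y else 0.0) for c in range(len(PRIORS))]
        biased = [r * p for r, p in zip(raw, PRIORS)]  # bias toward big classes
        total = sum(biased)
        labels.append(y)
        scores.append([b / total for b in biased])
    return labels, scores

def predict(scores, weights):
    """Pick, for each record, the class with the highest weighted score."""
    k = len(weights)
    return [max(range(k), key=lambda c: row[c] * weights[c]) for row in scores]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred, n_classes):
    """Mean of per-class recall, so small classes count as much as large ones."""
    recalls = []
    for c in range(n_classes):
        idx = [i for i, y in enumerate(y_true) if y == c]
        if idx:
            recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

def objective(y_true, y_pred, n_classes, alpha):
    """Blend of accuracy and macro recall; alpha is a hypothetical trade-off knob."""
    return (alpha * accuracy(y_true, y_pred)
            + (1 - alpha) * macro_recall(y_true, y_pred, n_classes))

def search_weights(scores, y_true, n_classes, alpha=0.5,
                   grid=(0.5, 1.0, 1.5, 2.0, 3.0, 4.0), sweeps=3):
    """Coordinate search: vary one class weight at a time, keep any improvement."""
    weights = [1.0] * n_classes
    best = objective(y_true, predict(scores, weights), n_classes, alpha)
    for _ in range(sweeps):
        for c in range(n_classes):
            for w in grid:
                trial = weights[:c] + [w] + weights[c + 1:]
                value = objective(y_true, predict(scores, trial), n_classes, alpha)
                if value > best:
                    best, weights = value, trial
    return weights, best

rng = random.Random(0)
labels, scores = make_data(2000, rng)
baseline = objective(labels, predict(scores, [1.0] * 5), 5, alpha=0.5)
weights, tuned = search_weights(scores, labels, 5, alpha=0.5)
print("baseline objective:", round(baseline, 3))
print("tuned objective:   ", round(tuned, 3), "with weights", weights)
```

The search only ever accepts a change that raises the objective, so the tuned value can never be worse than the unweighted baseline on the data it was tuned on. Whether that gain holds up on new data is a separate question.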
How would you approach this problem?
Thank you, I hope we can build some insight together.