Question about Decision Tree / WEKA SimpleCART
I have a large dataset labeled into 4-5 classes: 3-4 positive classes and 1 negative class. I'm mainly interested in classifying the 3-4 positive classes, but the negative class makes up more than 99% of the data. If I optimize for accuracy, I end up with a tree of about 1000 nodes that is essentially overfitting the training data. If I set the minimum number of instances per node very high, then in the extreme case the tree just assigns everything to the negative class.

Does anyone have suggestions for dealing with this class imbalance? And does anyone know of a good guide to tuning WEKA parameters?
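Here is a minimal Python sketch of why plain accuracy is a poor target with this kind of imbalance. It uses a made-up dataset with 99.4% negatives. A classifier that predicts "negative" for everything scores 0.994 accuracy, but its balanced accuracy (the mean of per-class recall) is only 0.25. The sketch also computes inverse-frequency instance weights, which give every class the same total weight. As far as I know, WEKA offers this reweighting idea through the `ClassBalancer` filter, with `CostSensitiveClassifier` as the cost-matrix alternative. Check the WEKA documentation for their exact options. The labels, counts and function names below are hypothetical and only illustrate the technique.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Per-class instance weight so every class gets the same total weight.

    weight[c] = n_total / (n_classes * count[c])
    """
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {c: n_total / (n_classes * k) for c, k in counts.items()}


def accuracy(y_true, y_pred):
    """Fraction of instances predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; a majority-class predictor gets 1 / n_classes."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(y_pred[i] == c for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical data: 9940 negatives and 20 instances of each positive class.
    y_true = ["neg"] * 9940 + ["A"] * 20 + ["B"] * 20 + ["C"] * 20
    # Degenerate "tree" that predicts the majority class for everything.
    y_pred = ["neg"] * len(y_true)

    print(accuracy(y_true, y_pred))           # 0.994
    print(balanced_accuracy(y_true, y_pred))  # 0.25
    print(inverse_frequency_weights(y_true))
```

With these weights, each positive instance counts 125 times and each negative about 0.25 times, so a split that isolates a handful of positives is no longer dwarfed by the negatives. Tuning against balanced accuracy (or per-class recall/precision) rather than raw accuracy also keeps the all-negative tree from looking like a good model.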