Class imbalance & GenerateWeight use

crojasm Member Posts: 1 Contributor I
edited November 2018 in Help

Hi everyone,

I have a highly imbalanced example set and I want to use weighting to improve the performance of the classifiers in my project. The problem is that once I add Generate Weight to my process, some learner operators such as SVM, Logistic Regression, and Random Forest show the message "Input example set has weights, but the learner will ignore them".

Could somebody help me understand how to use Generate Weight with imbalanced example sets?

Any other ideas on how to balance an example set in RM are welcome.

Thanks.

Answers

    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn

    Hi,

    yes, many learning methods do not support weighted examples, so weighting cannot be the way around this problem if you want to use those methods. You can right-click an operator and select Show operator info to see the capabilities of each operator and what it supports (such as weighted examples).

    So you either choose a method that does support example weights, or you cannot use them.

    Anyway, if you sample, weight, or otherwise bias your training data set, be aware that this will shift the class a priori probabilities. Let's say you give each class a 50%/50% weight. Then the algorithm will assume that 50% of any data set belongs to the minority class. I'm not sure that's what you want.
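
    To make the prior shift concrete, here is a minimal Python sketch. It assumes one common "balanced" weighting scheme in which every class receives the same total weight (w = n / (k * n_c)); the function names are hypothetical, and this is not necessarily exactly what RapidMiner's Generate Weight operator computes. It reports the class proportions a weight-aware learner would effectively see before and after weighting a 95/5 data set.

```python
from collections import Counter


def balanced_weights(labels):
    """Hypothetical helper: weight each example by n / (k * n_c) so that
    every class ends up with the same total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


def effective_priors(labels, weights):
    """Class proportions as seen by a learner that honours example weights."""
    totals = Counter()
    for y, w in zip(labels, weights):
        totals[y] += w
    total = sum(totals.values())
    return {y: round(t / total, 3) for y, t in totals.items()}


labels = ["neg"] * 95 + ["pos"] * 5
print(effective_priors(labels, [1.0] * len(labels)))       # {'neg': 0.95, 'pos': 0.05}
print(effective_priors(labels, balanced_weights(labels)))  # {'neg': 0.5, 'pos': 0.5}
```

    After weighting, the learner is trained as if the minority class made up half of the data, which is exactly the prior shift described above.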

    I would recommend the following approach:

    1. Select a useful performance measure (in a highly imbalanced data set, this is most likely not accuracy)

    2. Optimize the classifiers according to that measure

    3. Optimize feature selection

    4. Find a good threshold on the confidence values for the split. By default, the split is always at 50% in a two-class problem. But perhaps you want to detect more true positives and are willing to accept more false positives in return. Then you can shift the threshold. If you have weights, you can calculate it on the training data.
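
    Steps 1 and 4 can be sketched in Python as follows. This assumes a binary problem with label 1 for the minority class and one confidence value for that class per example; balanced accuracy is just one example of a measure that is more informative than accuracy here, and the helper names and toy data are made up for illustration.

```python
def confusion(y_true, conf, threshold):
    """Count TP, FP, TN, FN when predicting the minority class (1)
    whenever its confidence is >= threshold."""
    tp = fp = tn = fn = 0
    for y, c in zip(y_true, conf):
        predicted_pos = c >= threshold
        if predicted_pos and y == 1:
            tp += 1
        elif predicted_pos:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of the recall on the positive and on the negative class."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return (tpr + tnr) / 2


def best_threshold(y_true, conf):
    """Try every observed confidence as a cut-off and keep the one with
    the highest balanced accuracy."""
    return max(sorted(set(conf)),
               key=lambda t: balanced_accuracy(*confusion(y_true, conf, t)))


# Hypothetical toy data: 8 majority examples (0), 2 minority examples (1).
y_true = [0] * 8 + [1] * 2
conf = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.45, 0.40, 0.30]

print(accuracy(*confusion(y_true, conf, 0.5)))           # 0.8
print(balanced_accuracy(*confusion(y_true, conf, 0.5)))  # 0.5
t = best_threshold(y_true, conf)
print(t)                                                 # 0.3
print(accuracy(*confusion(y_true, conf, t)))             # 0.7
print(balanced_accuracy(*confusion(y_true, conf, t)))    # 0.8125
```

    On this toy data, the default 0.5 cut-off looks good by accuracy (0.8) only because it never predicts the minority class; moving the threshold to 0.3 lowers accuracy to 0.7 but raises balanced accuracy from 0.5 to 0.8125.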

    Greetings,

      Sebastian

    CraigBostonUSA Administrator, Employee, Member Posts: 34 RM Team Member

    Here's a good article on how to handle class imbalance:

    http://www.ele.uri.edu/faculty/he/PDFfiles/ImbalancedLearning.pdf

    tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research

    Hi @crojasm,

    Besides the really good tips already given, you can also try the SMOTE Upsampling operator from the Operator Toolbox extension.

    It allows you to upsample your minority class. 
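
    For intuition about what SMOTE does, here is a from-scratch Python sketch of its core idea: each synthetic example lies at a random point on the line segment between a minority example and one of its k nearest minority-class neighbours. It is not the Operator Toolbox implementation; the function name, parameters, and toy data are hypothetical, and it assumes purely numeric attributes.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical helper: return n_new synthetic points, each placed at a
    random position between a minority example and one of its k nearest
    minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority examples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # 0 <= gap < 1: position on the segment base -> nb
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, nb)))
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))  # 6
```

    Re-sorting all minority points for every synthetic example keeps the sketch short; a real implementation would typically precompute the neighbour lists. Synthetic points would normally be added to the training data only, so they do not leak into the evaluation.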

    Best regards,
    Fabian
