Balanced classes in an unbalanced dataset with multiple classes

Liza123 Member Posts: 1 Newbie
edited January 2023 in Help
Hi all, 

I am new to this platform and I am struggling to balance my classes.
When I build a model on a binary dataset, I can use the Sample operator or the SMOTE upsampling operator to balance the classes.
When I build a model on a dataset with three or more classes, neither the Sample operator nor SMOTE upsampling leaves the classes balanced.
Does anyone have suggestions for balancing the classes when there are more than two?

Thank you in advance. 



    MNNikiforos Member Posts: 6 Contributor II
    Hello @Liza123,

    I have run into the same issue when balancing data with more than two classes. Three approaches have usually worked for me, depending on the problem and the data set.

    1. Treat the class with the fewest examples as the minority class and collapse all the other classes into one class, which turns it into a two-class problem.
    2. Apply the SMOTE upsampling operator with auto_detect_minority_class activated as many times as there are classes, each time using the output of the previous run as input. At the end, synthetic examples will have been created for every class except the majority one.
    3. Use the Sample operator with the balance_data parameter set to true, and then define the sample size for each class. This lets you undersample the majority class.
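
    To make option 1 concrete, here is a minimal sketch in plain Python, outside RapidMiner. The function name collapse_to_binary and the "other" label are made up for this illustration; it only shows the relabeling idea, not how any operator implements it.

```python
from collections import Counter


def collapse_to_binary(labels, rest_label="other"):
    """Keep the rarest class and merge every other class into `rest_label`.

    `rest_label` is a hypothetical placeholder name; pick one that does not
    collide with an existing class label.
    """
    counts = Counter(labels)
    # On ties, min() returns the first label seen with the lowest count.
    minority = min(counts, key=counts.get)
    binary = [y if y == minority else rest_label for y in labels]
    return binary, minority


# Toy labels: three classes with 6, 3 and 1 examples.
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
binary_labels, minority = collapse_to_binary(labels)
print(minority, Counter(binary_labels))
```

    Note that the resulting two-class data is still unbalanced (1 versus 9 in the toy example), so a binary balancing step would still be needed afterwards.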

    I usually combine 2 and 3: I undersample the majority class first and then apply SMOTE as needed.
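
    The undersample-then-oversample idea can be sketched in plain Python as follows. This is an illustration only, not the RapidMiner operators: the function balance and its target parameter are hypothetical, and the oversampling step is a simplified SMOTE-like interpolation between two random rows of the same class, not true SMOTE, which interpolates toward one of the k nearest neighbours.

```python
import random
from collections import Counter, defaultdict


def balance(rows, labels, target, seed=0):
    """Bring every class to exactly `target` rows (numeric features only).

    Classes with at least `target` rows are randomly undersampled. Smaller
    classes keep their rows and gain synthetic ones, each interpolated
    between two randomly chosen rows of that class (simplified SMOTE-like
    step, an assumption of this sketch).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)

    out_rows, out_labels = [], []
    for y, members in by_class.items():
        if len(members) >= target:
            # Undersample without replacement.
            chosen = rng.sample(members, target)
        else:
            chosen = list(members)
            while len(chosen) < target:
                a, b = rng.choice(members), rng.choice(members)
                t = rng.random()  # position on the segment from a to b
                chosen.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        out_rows.extend(chosen)
        out_labels.extend([y] * target)
    return out_rows, out_labels


# Toy data: 70 / 25 / 5 rows in classes a / b / c.
rows = [[float(i), float(i % 7)] for i in range(100)]
labels = ["a"] * 70 + ["b"] * 25 + ["c"] * 5
new_rows, new_labels = balance(rows, labels, target=25)
print(Counter(new_labels))
```

    In the toy run, class a is cut from 70 to 25 rows, class b is left as it is, and class c is topped up from 5 to 25 rows, so every class ends with 25.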

    I hope that you will find something that works well for you!

    Best Regards