
Unevenly distributed binominal data

MuehliMan Member Posts: 85 Maven
edited August 2019 in Help
Dear RM community,

I have a problem handling my dataset. I am trying to build a random forest model with a binominal label. The only problem is that the dataset contains 50 positives and 200 negatives. If all examples are predicted as false, the accuracy is still quite OK (200/250 = 80%). And this is exactly what happens: most models I get predict almost everything as false.
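To make the 80% figure concrete, here is a minimal sketch in plain Python (not RapidMiner) that scores a trivial "always predict false" model on hypothetical labels with the same class counts as the question. It shows that accuracy looks reasonable while the recall on the positive class is zero, which is why accuracy alone is misleading here.

```python
# Hypothetical labels matching the counts in the question: 50 positives, 200 negatives.
labels = [True] * 50 + [False] * 200

# A trivial "model" that predicts False for every example.
predictions = [False] * len(labels)

# Plain accuracy: fraction of examples predicted correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the positive class: fraction of actual positives that were predicted positive.
positive_preds = [p for p, y in zip(predictions, labels) if y]
recall_pos = sum(positive_preds) / len(positive_preds)

print(f"accuracy = {accuracy:.2f}")
print(f"positive recall = {recall_pos:.2f}")
```

Running it prints `accuracy = 0.80` and `positive recall = 0.00`: the model never finds a single positive, yet its accuracy equals the share of negatives in the data.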

So my question is: how do I handle unevenly distributed datasets? Is there, for example, a way to weight correct positives more than correct negatives? A correctly predicted positive should then be 200/50 = 4 times more valuable.
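The weighting idea in the question can be sketched as a weighted accuracy, where each positive example counts `pos_weight` times as much as a negative. This is an illustration in plain Python under that assumption; `weighted_accuracy` is a hypothetical helper, not a RapidMiner operator. With a weight of 200/50 = 4, the positives and negatives each contribute the same total weight, so the "always false" model drops from 80% to 50%.

```python
def weighted_accuracy(labels, predictions, pos_weight):
    """Accuracy where each positive example counts pos_weight times as much as a negative."""
    total = 0.0
    correct = 0.0
    for y, p in zip(labels, predictions):
        w = pos_weight if y else 1.0  # positives get the larger weight
        total += w
        if p == y:
            correct += w
    return correct / total

labels = [True] * 50 + [False] * 200
pos_weight = 200 / 50  # = 4.0, the ratio suggested in the question

all_negative = [False] * len(labels)
print(weighted_accuracy(labels, all_negative, pos_weight))
```

This prints `0.5`: the total weight is 50 * 4 + 200 = 400, of which only the 200 correctly predicted negatives are credited.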



    Nils_Woehler Member Posts: 463 Maven
    Hi Markus,

    you can use the 'Sample' operator, set to absolute sampling with the 'balance data' option checked, before starting the X-Validation. This way you can ensure you have equal-sized data for all your classes.
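The balancing step can be illustrated with a small undersampling sketch in plain Python. It assumes that "balance data" with absolute sampling amounts to drawing the same fixed number of examples from each class without replacement; this is not RapidMiner's actual implementation, and `balanced_sample` and `per_class` are hypothetical names.

```python
import random
from collections import defaultdict

def balanced_sample(examples, label_of, per_class, seed=0):
    """Draw per_class examples from each class without replacement (undersampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for ex in examples:
        by_class[label_of(ex)].append(ex)
    sample = []
    for label, group in by_class.items():
        if len(group) < per_class:
            raise ValueError(f"class {label!r} has only {len(group)} examples")
        sample.extend(rng.sample(group, per_class))
    rng.shuffle(sample)  # mix the classes so they are not grouped together
    return sample

# Hypothetical data: (id, label) pairs with 50 positives and 200 negatives.
data = [(i, i < 50) for i in range(250)]
balanced = balanced_sample(data, label_of=lambda ex: ex[1], per_class=50)
print(sum(1 for _, y in balanced if y), sum(1 for _, y in balanced if not y))
```

This prints `50 50`. Note the trade-off: with 50 examples per class, 150 of the 200 negatives are discarded.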
    Furthermore, you can try the 'Generate Weight (Stratification)' operator. Classes with fewer examples will get higher weights than classes with more examples.
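A minimal sketch of stratification-style weighting in plain Python, assuming the weights are chosen so that every class receives the same total weight (inverse class frequency). The exact formula and parameters of RapidMiner's 'Generate Weight (Stratification)' operator may differ; `stratification_weights` and its `total_weight` argument are hypothetical.

```python
from collections import Counter

def stratification_weights(labels, total_weight=None):
    """Give each example a weight so that every class gets the same total weight."""
    counts = Counter(labels)
    if total_weight is None:
        total_weight = float(len(labels))  # keep the overall weight equal to the example count
    per_class_total = total_weight / len(counts)
    return [per_class_total / counts[y] for y in labels]

labels = [True] * 50 + [False] * 200
weights = stratification_weights(labels)
print(weights[0], weights[-1])
```

This prints `2.5 0.625`: each positive weighs four times as much as each negative, which matches the 200/50 ratio from the question, and both classes sum to 125.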
