20160041 Member Posts: 6 Newbie
Could you please tell me how I can downsample imbalanced data in RM? I have used the random sampling and bootstrap sampling operators, and I would also like to know the difference between the two.
  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    In the Mannheim Toolbox extension, there is a Sample - Balance operator that does just this.

    (Opinions and fundamental techniques aside, you might want to work with weighting instead of sampling.)

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    I second that: weighting is my preferred approach too, and downsampling should be used primarily when you have many more cases than needed (either in general, or specifically of the majority class). There are diminishing returns to larger and larger samples, so if your development population is hundreds of thousands of cases, you likely don't need them all. But if the absolute number of minority-class cases is small, you probably don't want to downsample the majority class to match it, as too much information would be lost.
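
    To make the trade-off above concrete, here is a minimal sketch in plain Python (not RapidMiner code, and not the implementation of any RapidMiner operator). The function names `downsample_majority` and `balanced_weights`, the `ratio` parameter, and the 950/50 toy data are all assumptions made up for illustration. The first function keeps every minority case and draws the larger classes without replacement down to a chosen multiple of the minority size; the second keeps all cases and instead gives each class a weight inversely proportional to its frequency.

```python
import random
from collections import Counter


def downsample_majority(rows, label_of, ratio=1.0, seed=0):
    """Keep all rows of the smallest class; cap every other class at
    ratio * (smallest class size), sampling without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Group the rows by their class label.
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    minority_size = min(len(group) for group in by_class.values())
    cap = max(1, int(ratio * minority_size))

    sampled = []
    for group in by_class.values():
        if len(group) > cap:
            sampled.extend(rng.sample(group, cap))  # drop the surplus rows
        else:
            sampled.extend(group)  # small classes are kept whole
    return sampled


def balanced_weights(labels):
    """Per-class weight n / (k * count): every row is kept, and each
    class contributes the same total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Hypothetical toy data: 950 majority ("neg") and 50 minority ("pos") rows.
rows = [("neg", i) for i in range(950)] + [("pos", i) for i in range(50)]


def label(row):
    return row[0]


balanced = downsample_majority(rows, label, ratio=1.0)
milder = downsample_majority(rows, label, ratio=3.0)
weights = balanced_weights([label(r) for r in rows])

print(Counter(label(r) for r in balanced))
print(Counter(label(r) for r in milder))
print(weights)
```

    With `ratio=1.0` the majority class is cut all the way down to the minority size, which is the case where the information loss mentioned above is largest; a larger `ratio` is a milder compromise, and the weights route discards nothing.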
