"Balanced sampling decision trees"

ddr Member Posts: 1 Contributor I
edited June 2019 in Help
Hi everyone,

I'm just starting to use RapidMiner and I have a problem with decision trees. I am working with a fairly large dataset (approximately 500,000 cases) and trying to use decision trees to predict whether users are willing to buy a product. The problem is that the buying rate is very low: 0.5%. When I use stratified sampling with a ratio of 50% with the "Sample" operator, as suggested in a similar thread on the forum, my tree is always biased towards the majority class, so the results are totally useless. Is there any way to balance the outcome variable to a 50/50 rate, do the modeling, and then rebalance the samples to their original rates? I have searched the forum, but trying all the answers and looking through many operators in RapidMiner didn't give me any results.

Thanks a lot in advance!

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    The Sample operator can be used to sample the majority class down (i.e. discarding some of the examples) if you use the balance_data option. Then you can specify how many examples of each class you want to use for learning.

    Is that sufficient for you?

    Best regards,
    Marius
  • abbasi_samira Member Posts: 9 Contributor I

    Hello,
    How can I balance the classes (50/50) for two features?

    Please help me.

    Thanks

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,933 Community Manager
    Answered in the other thread where you posted the same question.

    Scott
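
For readers who want to see the idea outside the RapidMiner GUI, here is a minimal Python sketch of what Marius describes (sampling the majority class down to a 50/50 split), plus one common way to map the resulting model scores back to the original class rates, which is what the original question asks about. This is not RapidMiner's implementation. The function names `downsample_balanced` and `correct_probability` and the `buy` field are hypothetical, and the correction assumes that only the majority class was randomly downsampled and that the model's scores on the balanced data can be read as probabilities, which is only roughly true for decision tree confidences.

```python
import random
from collections import defaultdict


def downsample_balanced(rows, label_of, seed=0):
    """Return a shuffled sample with the same number of rows per class.

    Every class is randomly sampled down to the size of the smallest class,
    so for two classes the result is a 50/50 split. (Hypothetical helper.)
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sampling without replacement; the smallest class is kept whole.
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


def correct_probability(p_balanced, keep_fraction):
    """Map a positive-class score from the balanced model to original rates.

    keep_fraction is the share of majority-class rows that survived the
    downsampling. Downsampling multiplies the odds of the minority class by
    1 / keep_fraction, so the correction multiplies them by keep_fraction.
    """
    return keep_fraction * p_balanced / (
        keep_fraction * p_balanced - p_balanced + 1.0
    )


# Toy data with the 0.5% buying rate from the question (smaller than 500,000 rows).
rows = [{"id": i, "buy": i < 500} for i in range(100_000)]
balanced = downsample_balanced(rows, lambda r: r["buy"], seed=42)

# 500 of the 99,500 non-buyers were kept.
keep_fraction = 500 / 99_500
print(len(balanced), correct_probability(0.5, keep_fraction))
```

With these numbers, a score of 0.5 from the balanced model corresponds to the original base rate of 0.5%, so any threshold chosen on the balanced data should be reinterpreted on the corrected scale before it is applied to the full dataset.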
