"Balanced sampling decision trees"

ddr Member Posts: 1 Contributor I
edited June 2019 in Help
Hi everyone,

I'm just starting to use RapidMiner and I have a problem with decision trees. I am working with a fairly large dataset (approximately 500,000 cases) and trying to use decision trees to predict whether or not users are willing to buy a product. The problem is that the buying rate is very low (0.5%). When I use stratified sampling with a ratio of 50% in the "Sample" operator, as suggested in a similar thread on the forum, my tree is always biased towards the majority class, so the results are useless. Is there any way to balance the outcome variable to a 50/50 ratio, do the modeling, and then rebalance the samples to their original rates? I have searched the forum, but trying all the answers and looking through many operators in RapidMiner didn't give me any results.

Thanks a lot in advance!

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    The Sample operator can be used to sample the majority class down (i.e. discarding some of the examples) if you use the balance_data option. Then you can specify how many examples of each class you want to use for learning.

    Is that sufficient for you?

    Best regards,
    Marius
  • abbasi_samira Member Posts: 9 Contributor I

    Hello,
    How can I make the number of examples in each class equal (50/50) for two features?

    Please help me.

    Thanks

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Answered in the other thread where you posted the same question.

    Scott
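
For readers working outside RapidMiner's GUI, the idea discussed in this thread (downsample the majority class to a 50/50 training set, then map the model's scores back to the original 0.5% rate) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `downsample_balanced` and `correct_for_prior` and the toy data are hypothetical, not RapidMiner operators, and the correction step assumes the model outputs reasonably calibrated probabilities on the balanced sample.

```python
import random
from collections import defaultdict


def downsample_balanced(rows, label_of, per_class, seed=0):
    """Keep at most `per_class` randomly chosen rows of each class.

    Hypothetical helper: mimics the effect of sampling the majority
    class down, it is not RapidMiner's implementation.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    sample = []
    for members in by_class.values():
        k = min(per_class, len(members))
        sample.extend(rng.sample(members, k))
    rng.shuffle(sample)
    return sample


def correct_for_prior(p_balanced, train_rate, true_rate):
    """Rescale a probability learned at `train_rate` positives to `true_rate`.

    Assumes the only difference between training and real data is the
    class ratio (class-conditional distributions unchanged).
    """
    pos = p_balanced * true_rate / train_rate
    neg = (1.0 - p_balanced) * (1.0 - true_rate) / (1.0 - train_rate)
    return pos / (pos + neg)


# Toy data with a 0.5% buying rate (500 buyers in 100,000 rows).
rows = [{"id": i, "buys": i % 200 == 0} for i in range(100_000)]

# 500 buyers and 500 non-buyers for training.
balanced = downsample_balanced(rows, lambda r: r["buys"], per_class=500)

# A score of 0.5 on the balanced set corresponds to the base rate again.
base = correct_for_prior(0.5, train_rate=0.5, true_rate=0.005)
```

A tree trained on `balanced` sees both classes equally often, so it is less likely to collapse to the majority class. Its predicted probabilities will then be inflated relative to the real population, which is what `correct_for_prior` adjusts for before the scores are used on the full dataset.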
