Balanced sampling decision trees
I'm just starting to use RapidMiner and I have a problem with decision trees. I am working with a fairly large dataset (approximately 500,000 cases) and trying to use decision trees to predict whether users are willing to buy a product. The problem is that the buying rate is very low: 0.5%. When I use stratified sampling with a ratio of 50% with the "sample" operator, as suggested in a similar thread on the forum, my tree is always biased towards the majority class, so the results are totally useless. Is there any way to balance the outcome variable to a 50/50 rate, do the modeling, and then rebalance the samples to their original rates? I have searched the forum, but trying all the answers and looking through many operators in RapidMiner didn't give me any results.
Thanks a lot in advance!
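
To make the question concrete, here is a minimal sketch of the two steps being asked about, written in plain Python rather than as a RapidMiner process. It assumes the only thing the sampling changes is the class proportions: stratified sampling keeps the original 0.5% rate, which is why the tree stays biased, so the sketch downsamples the majority class to 50/50 instead, and then maps a score from the balanced model back to the original rate with the standard prior-shift correction. The function names `balance_by_downsampling` and `correct_probability`, the `buys` field, and the synthetic rows are all hypothetical and only there for illustration.

```python
import random


def balance_by_downsampling(rows, is_positive, seed=0):
    """Keep an equal number of positive and negative rows (50/50 sample)."""
    rng = random.Random(seed)
    pos = [r for r in rows if is_positive(r)]
    neg = [r for r in rows if not is_positive(r)]
    k = min(len(pos), len(neg))  # size of the smaller class
    sample = rng.sample(pos, k) + rng.sample(neg, k)
    rng.shuffle(sample)
    return sample


def correct_probability(p_balanced, original_rate, sample_rate=0.5):
    """Rescale a score from a model trained at sample_rate to original_rate.

    Assumes the sampling changed only the class proportions, not the
    distribution of the attributes within each class.
    """
    pos = p_balanced * original_rate / sample_rate
    neg = (1.0 - p_balanced) * (1.0 - original_rate) / (1.0 - sample_rate)
    return pos / (pos + neg)


# Synthetic stand-in for the real data: exactly 0.5% buyers (assumption).
rows = [{"id": i, "buys": i % 200 == 0} for i in range(20000)]

balanced = balance_by_downsampling(rows, lambda r: r["buys"])
n_buyers = sum(1 for r in balanced if r["buys"])
print(len(balanced), n_buyers)

# A score from the balanced model, mapped back to the 0.5% base rate.
print(correct_probability(0.9, original_rate=0.005))
```

The model would be trained on `balanced` and applied to the full data; the corrected score is only needed if calibrated probabilities matter, since the ranking of cases is unchanged by the correction.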