
Classification of highly imbalanced data

xxxyxxxy Member Posts: 1 Learner III
edited November 2018 in Help
Hi guys,

I'm working on a churn prediction problem and struggling with highly imbalanced data (only 0.1% of the records in the data set are churners). I have tried different types of preprocessing and modeling, but I still cannot get decent results: at best, 20% real churners in the 10% of records with the highest propensity.
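Because the goal here is ranking (how many churners land in the top-scored segment), it helps to pin down the metric. The post doesn't say whether "20%" is the share of the top decile that churns (precision) or the share of all churners found in the top decile (capture rate), so this minimal Python sketch reports both, plus lift against the base rate (about 0.1% here). The function name `top_fraction_metrics` and the 1/0 label encoding are illustrative assumptions, not part of RapidMiner.

```python
def top_fraction_metrics(scores, labels, fraction=0.10):
    """Hypothetical helper: precision, capture rate (recall) and lift within
    the top `fraction` of records ranked by predicted churn propensity.
    Labels are assumed to be 1 = churner, 0 = non-churner."""
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal length")
    n = len(scores)
    k = max(1, int(round(n * fraction)))  # size of the top segment
    # Rank records by descending propensity score
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])  # churners in the top segment
    total_churners = sum(labels)
    base_rate = total_churners / n  # e.g. ~0.001 for 0.1% churners
    precision = hits / k  # share of the top segment that churns
    capture_rate = hits / total_churners if total_churners else 0.0
    lift = precision / base_rate if base_rate else 0.0  # 1.0 = no better than random
    return {"precision": precision, "capture_rate": capture_rate, "lift": lift}


# Toy usage: 10 records, 2 churners, evaluate the top 20% (2 records)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(top_fraction_metrics(scores, labels, fraction=0.2))
```

With a random ordering, the top decile would contain churners at roughly the base rate (lift of about 1), so reporting lift makes results easier to compare across data sets with different churn rates.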

I tried upsampling, downsampling, something in between, clustering the data set before classification, normalization, PCA, and feature selection, as well as different modeling techniques (decision trees, neural nets, SVMs), bagging, boosting, and misclassification costs. This helped me improve the accuracy of my model in the highest-propensity segment from 2% to 20%, but that is the best I have achieved.
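The resampling and misclassification-cost ideas listed above can be sketched in plain Python. This is a minimal illustration under assumed names (`undersample_negatives` and `balanced_class_weights` are hypothetical, not RapidMiner operators): the first function keeps every churner and randomly drops non-churners until churners make up a chosen share of the training set (the "something in between" option), and the second derives per-class costs with the same formula scikit-learn documents for `class_weight="balanced"`.

```python
import random


def undersample_negatives(rows, labels, target_positive_share, seed=0):
    """Hypothetical helper: keep all churners (label 1) and randomly sample
    non-churners (label 0) so churners make up ~target_positive_share."""
    if not 0 < target_positive_share < 1:
        raise ValueError("target_positive_share must be in (0, 1)")
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Solve len(pos) / (len(pos) + n_neg) = target for n_neg
    n_neg = round(len(pos) * (1 - target_positive_share) / target_positive_share)
    n_neg = min(n_neg, len(neg))  # cannot keep more negatives than exist
    rng = random.Random(seed)  # fixed seed for reproducible samples
    kept = pos + rng.sample(neg, n_neg)
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def balanced_class_weights(labels):
    """Per-class misclassification weights: n_samples / (n_classes * count_c),
    the formula scikit-learn uses for class_weight="balanced"."""
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


# Toy usage: 10,000 records with 10 churners (0.1%)
rows = list(range(10_000))
labels = [1 if i < 10 else 0 for i in range(10_000)]
X_train, y_train = undersample_negatives(rows, labels, target_positive_share=0.2)
print(len(y_train), sum(y_train))
print(balanced_class_weights(labels))
```

Whichever variant is used, resample only the training data and score an untouched test set at its original 0.1% prevalence; evaluating on a rebalanced test set raises the share of churners in every segment and makes top-decile figures look better than they will be in production.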


Has anyone worked on similar problems? Which techniques did you find most helpful?

Thank you in advance,

Bojana