Setting penalty or prior probabilities

Ras94 (Member)

I have a data set with prior probabilities of 75% and 25%. I would like to set a penalty or the prior probabilities so that the models account for the skewed distribution. Right now my decision tree, for example, just predicts the larger class for every example, resulting in 75% accuracy. As my data set is not very large, I would prefer not to undersample.
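To make the "penalty" idea concrete, here is a minimal Python sketch (not RapidMiner code) of two numbers behind the question: the accuracy of always predicting the larger class, and "balanced" per-class weights that make each class contribute the same total weight during training. The function names and the `"no"`/`"yes"` labels are hypothetical, and the n / (k * count) weighting is one common convention, not necessarily what any particular RapidMiner operator uses.

```python
from collections import Counter


def majority_baseline(labels):
    """Return the majority class and the accuracy of always predicting it."""
    majority, count = Counter(labels).most_common(1)[0]
    return majority, count / len(labels)


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rarer classes get larger weights, so errors on them cost more during
    training: a cost-sensitive alternative to resampling.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


# Hypothetical 75/25 data set with 100 examples.
labels = ["no"] * 75 + ["yes"] * 25
print(majority_baseline(labels))  # ('no', 0.75)
weights = balanced_class_weights(labels)
print(weights)  # 'no' is about 0.667, 'yes' is 2.0
# With these weights both classes carry the same total weight (about 50 each).
print({cls: weights[cls] * labels.count(cls) for cls in weights})
```

The 75% accuracy is exactly the majority baseline, so a model has to beat that number (or, better, the minority-class metrics) to be useful at all.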


  • rfuentealba (Moderator, RapidMiner Certified Analyst, Member, University Professor)
    If you don't want to downsample, you can use the SMOTE Upsampling operator from the Operator Toolbox extension.

    That said, I don't know exactly what you are doing. Could you share a bit more information?
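For readers unfamiliar with SMOTE, the core idea can be sketched in a few lines of Python. This is a simplified illustration under assumptions (numeric features only, Euclidean distance, a hypothetical `smote` helper), not the implementation behind the Operator Toolbox's SMOTE Upsampling operator: each synthetic example is placed at a random point on the line segment between a minority-class example and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class points (simplified SMOTE).

    minority: list of numeric feature tuples from the minority class
    (needs at least two points).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself.
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Hypothetical two-feature minority class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=50, k=2)
print(len(new_points))  # 50
```

With 75 majority and 25 minority rows, generating 50 synthetic rows would give a 75/75 split. Because the new rows are interpolations of real ones, they should only be added to the training data, never to the data used for evaluation.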
  • varunm1 (Moderator, Member)
    Hello @Ras94

    Have you tried any feature selection techniques? If not, I recommend trying them and cross-validating your model to check its performance before resampling your data set. A 75/25 split is not highly imbalanced, and this is the kind of data you will often have to deal with in the real world.

    Also, why are you only trying a decision tree? You could try other algorithms such as logistic regression or SVM, which may give you better classification results. You can then interpret the results with the Explain Predictions operator, which helps with factor analysis.
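Cross-validating on an imbalanced data set usually means using stratified folds, so that every fold keeps the 75/25 ratio. Below is a minimal Python sketch of how such folds can be built; the `stratified_folds` and `cross_validation_splits` names are hypothetical, and RapidMiner's Cross Validation operator provides its own sampling settings, which you would normally use instead.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds that preserve the class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


labels = ["no"] * 75 + ["yes"] * 25
for train, test in cross_validation_splits(labels):
    n_yes = sum(labels[i] == "yes" for i in test)
    print(len(train), len(test), n_yes)  # 80 20 5 for every fold
```

Each test fold holds 15 "no" and 5 "yes" examples, so per-fold precision and recall for the minority class are computed on the same class mix as the full data set.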



  • Ras94 (Member)
    @varunm1 Thank you. I went ahead with it and have been evaluating with precision, recall, and AUC. I have tried plenty of predictive models, but I was wondering whether there is a way to fix the decision tree, since it seems "broken" (it predicts only the majority class, as described in my original post).
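Those metrics also show why the all-majority tree is not useful despite its 75% accuracy. A small pure-Python sketch follows; the helper names are hypothetical, and the AUC is computed through its rank interpretation: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half.

```python
def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Convention: report 0.0 when the denominator is zero
    # (e.g. precision when the class is never predicted).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores, positive):
    """AUC = P(score of random positive > score of random negative), ties = 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# A tree that always predicts "no", with the same confidence for "yes".
y_true = ["no"] * 75 + ["yes"] * 25
y_pred = ["no"] * 100
scores = [0.25] * 100  # hypothetical constant confidence for "yes"
print(precision_recall(y_true, y_pred, "yes"))  # (0.0, 0.0)
print(roc_auc(y_true, scores, "yes"))           # 0.5
```

An AUC of 0.5 and zero recall for the minority class mean the model has learned nothing beyond the class ratio, even though its accuracy is 75%.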
  • IngoRM (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor, RM Founder)
    The tree is not actually "broken"; it tries to generalize from the data but does not succeed. In that case it predicts the majority class for every example, which is the only sensible thing to do. Sometimes a tree-based model is simply not a good fit for your data; sometimes the default parameters are not a good fit. You will probably get different behavior if you change the pruning settings, but that does not mean the resulting model has good predictive power (although it can be better).
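To illustrate this, here is a Python sketch (not RapidMiner's implementation) of a single split decision with a hypothetical minimum-gain threshold, a simple form of pre-pruning. On 75/25 data a weak split may be rejected, and even when it is accepted, both children can still have "no" as their majority, so the predictions do not change. Weighting the classes, as asked in the original question, is what can flip a leaf. The `min_gain` parameter, the "balanced" weights of 2/3 and 2.0, and the split counts are all assumptions chosen for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(left, right):
    """Reduction in Gini impurity from splitting the parent into left/right."""
    parent = left + right
    n = len(parent)
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - children


def majority(labels, weights=None):
    """Most frequent class, optionally counting each label with a class weight."""
    totals = Counter()
    for y in labels:
        totals[y] += 1.0 if weights is None else weights[y]
    return totals.most_common(1)[0][0]


def grow_or_stop(left, right, min_gain, weights=None):
    """Pre-pruning: keep the split only if its gain reaches min_gain."""
    if split_gain(left, right) < min_gain:
        return ("leaf", majority(left + right, weights))
    return ("split", majority(left, weights), majority(right, weights))


# Hypothetical candidate split of a 75/25 node.
left = ["no"] * 40 + ["yes"] * 10
right = ["no"] * 35 + ["yes"] * 15
print(round(split_gain(left, right), 4))          # 0.005
print(grow_or_stop(left, right, min_gain=0.01))   # ('leaf', 'no')
print(grow_or_stop(left, right, min_gain=0.001))  # ('split', 'no', 'no')
weights = {"no": 2 / 3, "yes": 2.0}  # balanced weights for a 75/25 split
print(grow_or_stop(left, right, min_gain=0.001, weights=weights))  # ('split', 'no', 'yes')
```

In this sketch the weights only change the leaf labels; cost-sensitive tree learners usually also use weighted impurity when choosing splits, which can change the tree's structure as well.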