
[SOLVED] evaluation of resampled dataset

makakmakak Member Posts: 13 Contributor II
Hello everyone!

This is my first post, so first things first. This is a great piece of software, and you guys deserve a Nobel Prize for making it free for the community. THANK YOU.

Now, here is my question, probably a little stupid, but I want to be sure. I have an unbalanced dataset, and I deal with this by undersampling the majority class, applying weights, or otherwise balancing it for training. But I must evaluate the performance, by cross-validation or split validation, on the original unbalanced dataset, right? The balanced part is only for training, right?

Thank you in advance.

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    for most classifiers it is indeed important to have a more or less balanced training set. Whether you should use the true distribution for testing depends on the measure: ROC plots, the AUC, and recall are independent of the class distribution, whereas accuracy and precision (and many other measures) depend heavily on it. So to get a good estimate for those, you should test on the true distribution.

    Best regards,
    Marius
  • makakmakak Member Posts: 13 Contributor II
    Hi Marius,

    thank you for the prompt and really helpful answer.

    I would like to ask one more clarifying question, if you don't mind. I am just trying to understand why recall stays the same. Is it because a classifier trained on a balanced dataset predicts the minority class equally well on the unbalanced test set? And it simply gets more "opportunities" to misclassify majority-class examples as minority in the skewed dataset, which is why precision for the minority class is lower?

    Thank you.

    Matus
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Matus,

    exactly right.

    Best regards,
    Marius
  • makakmakak Member Posts: 13 Contributor II
    Thank you Marius, this was really helpful.

    Best regards,
    Matus
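
For anyone who wants to check the point about recall and precision numerically, here is a minimal sketch in plain Python. It is not from the thread: the function name `metrics` and the rates (true positive rate 0.8, false positive rate 0.1) are made-up assumptions, and it assumes the classifier's per-class error rates are the same on both test sets. It evaluates the same hypothetical classifier on a balanced and on a skewed test set.

```python
def metrics(n_pos, n_neg, tpr=0.8, fpr=0.1):
    """Return (recall, precision, accuracy) for the minority (positive) class.

    tpr and fpr are illustrative, assumed rates for a hypothetical classifier:
    the fraction of positives it catches and the fraction of negatives it
    wrongly flags as positive.
    """
    tp = tpr * n_pos   # minority examples predicted correctly
    fn = n_pos - tp    # minority examples missed
    fp = fpr * n_neg   # majority examples misclassified as minority
    tn = n_neg - fp    # majority examples predicted correctly

    recall = tp / (tp + fn)        # uses only the positive class
    precision = tp / (tp + fp)     # mixes both classes
    accuracy = (tp + tn) / (n_pos + n_neg)
    return recall, precision, accuracy


# Same classifier, two test sets: 1:1 and 1:19 minority-to-majority ratio.
balanced = metrics(n_pos=1000, n_neg=1000)
skewed = metrics(n_pos=1000, n_neg=19000)

for name, (rec, prec, acc) in (("balanced", balanced), ("skewed", skewed)):
    print(f"{name:9s} recall={rec:.3f} precision={prec:.3f} accuracy={acc:.3f}")
```

Under these assumed rates, recall is 0.80 on both test sets because it is computed from the minority class alone. Precision falls from about 0.89 to about 0.30, since the larger majority class produces more false positives, and accuracy shifts as well (0.85 versus 0.895). That is why precision and accuracy should be estimated on the original distribution.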