W-NaiveBayesMultiNomial vs NaiveBayes

erkerk Member Posts: 4 Contributor I
edited November 2018 in Help
Does anyone know what the main differences between the two NB classifier implementations are?

There should be some, because I am getting totally different results with them.
Most of the time the Weka implementation yields much better results on my dataset, which has around 200 attributes and 100 samples.



    landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Which RapidMiner version do you use?
    If someone could tell me what the WEKA operator does, I could explain the differences. But it seems to me that it isn't doing just NaiveBayes...

    erkerk Member Posts: 4 Contributor I
    Hi Sebastian,

    I am using RM v4.4. I have already had a look at the source code of RM NaiveBayes and I agree that it implements a pretty straightforward NaiveBayes, whereas my feeling about Weka (without reading the source code) is in line with yours. It seems to implement some other things (at least more numerical tricks), which help quite a lot to make NB more robust.

    I also added a few things to the original RM NB to make it more robust (and/or better suited to my dataset): a homogeneous (uniform) priors assumption, a Poisson distribution assumption instead of a Gaussian one, and calculation of log-likelihoods instead of likelihoods. These helped me get better prediction accuracy, but I am still not as good as the Weka implementation.
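For readers following along, here is a minimal sketch of the three modifications listed above (uniform priors, Poisson likelihoods, log-likelihoods). It is not the RM or Weka code; the function names `fit_poisson_nb`, `log_scores` and `predict`, the toy data, and the small `eps` smoothing term are all assumptions made for illustration.

```python
import math

def fit_poisson_nb(X, y, eps=1e-9):
    """Estimate one Poisson rate per (class, attribute) as the class-wise mean."""
    rates = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n_attrs = len(rows[0])
        # eps (an assumption) keeps log(rate) finite when an attribute
        # is zero for every sample of a class.
        rates[label] = [
            sum(r[j] for r in rows) / len(rows) + eps for j in range(n_attrs)
        ]
    return rates

def log_scores(rates, x):
    """Per-class log-likelihood. With uniform priors the prior term is the
    same constant for every class, so it is simply left out."""
    scores = {}
    for label, lam in rates.items():
        # log of the Poisson pmf: k*log(lambda) - lambda - log(k!)
        scores[label] = sum(
            k * math.log(l) - l - math.lgamma(k + 1) for k, l in zip(x, lam)
        )
    return scores

def predict(rates, x):
    """Return the class with the highest log-likelihood."""
    scores = log_scores(rates, x)
    return max(scores, key=scores.get)

# Toy count data with two attributes and two classes (made up for this sketch).
X = [[5, 0], [6, 1], [0, 4], [1, 5]]
y = ["a", "a", "b", "b"]
rates = fit_poisson_nb(X, y)
```

Because the per-attribute terms are summed in log space, the score stays finite even with around 200 attributes, where a plain product of probabilities can underflow to zero.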

    Does anyone have any idea what exactly Weka NaiveBayes is doing?

    TobiasMalbrechtTobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Erk,

    Actually, there were some issues with the Naive Bayes implementation around the time we released RM 4.4, and we put some effort into stabilising NB numerically. As far as I remember, this was shortly after the release of 4.4. We also added the calculation of log-likelihoods then. If I remember correctly, Weka does not compute log-likelihoods, but rescales the probabilities during the multiplication of the conditional attribute value probabilities if the product becomes too small. Both of these ways to gain numerical stability should work and yield relatively similar results, which is what we observed in the numerous tests we ran on our implementation. To sum up, the current version of NB should be more stable than the 4.4 version, and you might want to have a look at it.
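To illustrate the two stabilisation strategies mentioned above, here is a small sketch that compares a plain product, a sum of logs, and a product that is rescaled whenever it gets too small. This is not taken from the RM or Weka sources; the function names, the `floor` threshold of 1e-75, and the choice to rescale by the largest running product are assumptions for illustration only.

```python
import math

def posterior_naive(likelihoods):
    """Plain product of conditional probabilities; underflows for many attributes."""
    prods = {c: math.prod(ps) for c, ps in likelihoods.items()}
    total = sum(prods.values())
    return {c: (v / total if total > 0 else float("nan")) for c, v in prods.items()}

def posterior_log(likelihoods):
    """Sum of logs, then normalise after subtracting the maximum."""
    logs = {c: sum(math.log(p) for p in ps) for c, ps in likelihoods.items()}
    top = max(logs.values())
    exps = {c: math.exp(v - top) for c, v in logs.items()}
    total = sum(exps.values())
    return {c: v / total for c, v in exps.items()}

def posterior_rescaled(likelihoods, floor=1e-75):
    """Running product, with all classes rescaled by a common factor whenever
    the largest product drops below `floor` (threshold is an assumption)."""
    classes = list(likelihoods)
    prods = {c: 1.0 for c in classes}
    n_attrs = len(likelihoods[classes[0]])
    for j in range(n_attrs):
        for c in classes:
            prods[c] *= likelihoods[c][j]
        biggest = max(prods.values())
        if 0 < biggest < floor:
            # Dividing every class by the same factor keeps the ratios intact.
            for c in classes:
                prods[c] /= biggest
    total = sum(prods.values())
    return {c: v / total for c, v in prods.items()}

# 200 attributes with small conditional probabilities (made-up values).
lk = {"a": [1e-3] * 200, "b": [1e-3] * 199 + [3e-3]}
naive = posterior_naive(lk)
by_log = posterior_log(lk)
by_rescale = posterior_rescaled(lk)
```

In this toy case the plain product underflows to zero for both classes, so no usable posterior comes out, while the log-sum and the rescaled product agree with each other. That is consistent with the point above that the two approaches should give similar results.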

    Hope that helps,
    kind regards,
    erkerk Member Posts: 4 Contributor I
    Hi Tobias,

    Thanks for the reply. Unfortunately, using RM 5.0 I still observe a similar trend: Weka NB significantly outperforms the RM one. I hope a Weka-savvy user can shed some light on this issue.
