Gaussian Naive Bayes Formula on RapidMiner

Member Posts: 17 Contributor II
edited September 2019 in Help

Hello everyone!

I know that RapidMiner uses a Gaussian distribution in Naive Bayes. But when I compare the result I calculated manually with the result from RapidMiner, they are very different. So I am wondering whether RapidMiner uses a different formula or whether I just calculated it incorrectly.

I use this formula to calculate the mean: 1/n * (sum of xi), and this one to calculate the variance: 1/(n-1) * sum of (xi - mean)^2.
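For reference, the two formulas above can be written out as a minimal Python sketch. The sample values are made up for illustration, and the function names `mean` and `sample_variance` are hypothetical helpers, not anything from RapidMiner. The last line checks the result against Python's `statistics.variance`, which also divides by n-1.

```python
import statistics

def mean(xs):
    # 1/n * (sum of xi)
    return sum(xs) / len(xs)

def sample_variance(xs):
    # 1/(n-1) * sum of (xi - mean)^2
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Made-up values, only to exercise the formulas.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(mean(values))                 # arithmetic mean
print(sample_variance(values))      # variance with the n-1 divisor
print(statistics.variance(values))  # standard-library result for comparison
```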

I want to know which formula RapidMiner uses for Gaussian NB. Is it the same as the formulas I use above?
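As a point of comparison, the textbook Gaussian Naive Bayes calculation can be sketched as follows. This is the standard formulation, not a confirmed description of RapidMiner's internals; `gaussian_pdf` and `class_score` are hypothetical names chosen for this example.

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Normal density: 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def class_score(prior, xs, params):
    # Unnormalized posterior for one class:
    # prior * product of per-attribute Gaussian densities.
    # params holds one (mean, standard deviation) pair per attribute.
    score = prior
    for x, (mu, sigma) in zip(xs, params):
        score *= gaussian_pdf(x, mu, sigma)
    return score

# Density of a standard normal at its mean.
print(gaussian_pdf(0.0, 0.0, 1.0))
```

The class with the highest score is the prediction; dividing each score by the sum over all classes gives the confidences.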

Thank you.

Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

You can find an Excel file used to calculate the probabilities from the "Golf" dataset using NB formulas by following this link.

If you obtain different results, it may be because RapidMiner calculates the probabilities with Laplace correction by default, whereas you calculated them without Laplace correction.
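To illustrate the effect, here is a minimal sketch of the common add-one form of Laplace correction for a nominal attribute. Whether RapidMiner uses exactly this variant is an assumption; the function name `conditional_probability` and the counts are hypothetical.

```python
def conditional_probability(count, class_total, n_values, laplace=True):
    # count:       rows of this class that have the attribute value
    # class_total: all rows of this class
    # n_values:    number of distinct values the attribute can take
    if laplace:
        # Add-one smoothing (assumed variant): never returns exactly zero.
        return (count + 1) / (class_total + n_values)
    return count / class_total

# Illustrative counts: a value never seen with this class (0 of 5 rows),
# for an attribute with 3 possible values.
print(conditional_probability(0, 5, 3, laplace=False))  # no correction
print(conditional_probability(0, 5, 3, laplace=True))   # with correction
```

Without the correction, a single unseen value forces the whole product for that class to zero, which is one way manual results can diverge from the tool's output.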

Regards,

Lionel

Member Posts: 17 Contributor II

Hello @lionelderkrikor. It works nicely, thank you.

But some values still differ. For example, RapidMiner displays some of the standard deviations as 0.001, but in MS Excel they come out as 0. I wonder whether RapidMiner and MS Excel calculate them differently.

Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

Have you set the number of digits after the decimal point to 3 or more in Excel?

Regards,

Lionel

Member Posts: 17 Contributor II

@lionelderkrikor Yes, I have set the cell format to Number and added several decimal places, but the result is still the same. I looked up the formula on other websites, and people said that MS Excel uses Bessel's correction to calculate the standard deviation.
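For context, Bessel's correction means dividing by n-1 instead of n. In Excel, STDEV.S uses n-1 and STDEV.P uses n; Python's `statistics.stdev` and `statistics.pstdev` make the same distinction. The sketch below uses made-up values and shows that the two versions differ on ordinary data but both return 0 when every value is identical, so the correction on its own would not turn a 0 into 0.001. Why RapidMiner displays 0.001 is not established in this thread.

```python
import statistics

# Made-up values with some spread.
values = [1.0, 2.0, 3.0, 4.0]
print(statistics.stdev(values))   # n-1 divisor (Bessel's correction)
print(statistics.pstdev(values))  # n divisor (no correction)

# A constant column: both versions give zero.
constant = [3.0, 3.0, 3.0]
print(statistics.stdev(constant))
print(statistics.pstdev(constant))
```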