# Naive Bayes (Kernel) - optimizing parameters for PhD

08-04-2017 07:11 AM

Hi! I've been using the Naive Bayes (Kernel) algorithm for my classification problem, and I chose the greedy estimation mode.

I also used the Optimize Parameters (Grid) operator to find the best combination of bandwidth and number of kernels. I set the bandwidth range from 0.01 to 0.1 and the number of kernels from 1 to 20.

I've been wondering whether these ranges are sensible and what exactly the "number of kernels" parameter stands for. I've spent the past few days searching the literature for recommended ranges and for an explanation of this parameter, without success.

I would appreciate your help and insights.

2 REPLIES

08-04-2017 04:30 PM

Solution

Accepted by topic author MinaT

08-04-2017 06:37 PM

Hi,

Let's start with the meaning of the parameters.

I assume you know how Naive Bayes works in general. If not, I recommend the following blog post:

https://rapidminer.com/naive-bayes-not-naive/

For nominal / categorical attributes, we derive the probability of each attribute value within a class by simply counting how often that value occurs in the class and dividing by the number of examples in that class. But what do we do for numerical values? In a simple implementation, the probabilities for numerical values are derived from a single distribution (usually Gaussian) which is fitted to the data.
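To make the single-distribution case concrete, here is a minimal Python sketch (not RapidMiner code). The attribute values are made up for illustration, and `fit_gaussian` and `gaussian_pdf` are hypothetical helper names. It fits one Gaussian to the values of one numerical attribute within one class and shows where that simple approach can struggle when the data has two separate groups.

```python
import math
import statistics

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fit_gaussian(values):
    """Fit a single Gaussian: return (mean, sample standard deviation)."""
    return statistics.mean(values), statistics.stdev(values)

# Made-up values of one numerical attribute for a single class.
# They form two groups, one around 1.0 and one around 5.0.
values = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]

mean, std = fit_gaussian(values)

# A single Gaussian always has its highest density at the mean,
# even though no observed value lies anywhere near the mean here.
density_at_mean = gaussian_pdf(mean, mean, std)
density_at_group = gaussian_pdf(1.0, mean, std)

print(f"mean={mean:.2f}, std={std:.2f}")
print(f"density at the mean:       {density_at_mean:.4f}")
print(f"density at the left group: {density_at_group:.4f}")
```

On this toy sample the fitted curve assigns its highest density to the empty region between the two groups, which is the kind of mismatch a kernel-based estimate is meant to avoid.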

A kernel-based distribution replaces this simple unimodal distribution with one consisting of an additive overlay of multiple Gaussian distributions. See here for more information: https://en.wikipedia.org/wiki/Kernel_density_estimation

The "number of kernels" is simply the number of distributions used. If the number is high, the distribution becomes more complex / wiggly, which can amount to overfitting your data. If it is too small, you might miss important peaks in your data.

The width (bandwidth) parameter is simply the width of each of those kernels. Wider kernels lead to smoother distribution curves; narrower kernels make the curve wiggle more.
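The effect of both parameters can be sketched in a few lines of Python. This is only an illustration, not RapidMiner's implementation: the rule for placing the kernels (`pick_centers`, which spreads them evenly over the sorted sample) is an assumption made for this example, the data is made up, and all function names are hypothetical. The sketch counts how many peaks the resulting density has for different settings.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def pick_centers(values, n_kernels):
    """Spread n_kernels centers evenly over the sorted sample.

    Assumed placement rule for illustration only.
    """
    ordered = sorted(values)
    if n_kernels >= len(ordered):
        return ordered
    # Take the middle element of each of n_kernels equal-sized slices.
    return [ordered[int((i + 0.5) * len(ordered) / n_kernels)]
            for i in range(n_kernels)]

def kernel_density(x, centers, bandwidth):
    """Additive overlay of equally weighted Gaussians, one per center."""
    return sum(gaussian_pdf(x, c, bandwidth) for c in centers) / len(centers)

def count_peaks(centers, bandwidth, lo=-1.0, hi=7.0, steps=2000):
    """Count local maxima of the density on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [kernel_density(x, centers, bandwidth) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

# Made-up attribute values with two groups, around 1.0 and around 5.0.
values = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]

peaks_one_kernel = count_peaks(pick_centers(values, 1), bandwidth=0.3)
peaks_two_kernels = count_peaks(pick_centers(values, 2), bandwidth=0.3)
peaks_many_narrow = count_peaks(pick_centers(values, 10), bandwidth=0.02)
peaks_many_wide = count_peaks(pick_centers(values, 10), bandwidth=0.3)

print("1 kernel,   width 0.3 :", peaks_one_kernel)
print("2 kernels,  width 0.3 :", peaks_two_kernels)
print("10 kernels, width 0.02:", peaks_many_narrow)
print("10 kernels, width 0.3 :", peaks_many_wide)
```

On this toy sample, a single kernel cannot represent both groups, many very narrow kernels put a separate spike on every data point, and the same number of wider kernels smooths those spikes back into one bump per group.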

Of course, there is no single range that works well for all data sets. I typically try between 1 and 10 for the number of kernels and a width between 0.1 and 0.5, so that the distribution does not get too wiggly.
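As a rough illustration of searching such a grid, here is a small Python sketch. It is not what Optimize Parameters (Grid) does internally: the data is made up, the kernel placement rule (`pick_centers`) is an assumption, and the score (average log-likelihood on held-out values) is a stand-in for the classification performance you would normally optimize, for example with cross-validation.

```python
import math
from itertools import product

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def pick_centers(values, n_kernels):
    """Spread n_kernels centers evenly over the sorted sample (assumed rule)."""
    ordered = sorted(values)
    if n_kernels >= len(ordered):
        return ordered
    return [ordered[int((i + 0.5) * len(ordered) / n_kernels)]
            for i in range(n_kernels)]

def mean_log_likelihood(train, valid, n_kernels, bandwidth):
    """Average log density of the held-out values under the kernel estimate."""
    centers = pick_centers(train, n_kernels)
    total = 0.0
    for x in valid:
        density = sum(gaussian_pdf(x, c, bandwidth) for c in centers) / len(centers)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total / len(valid)

# Made-up training and held-out values of one attribute for one class.
train = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
valid = [0.9, 1.1, 4.9, 5.1]

# The ranges mentioned above: 1 to 10 kernels, widths 0.1 to 0.5.
kernel_grid = range(1, 11)
bandwidth_grid = [0.1, 0.2, 0.3, 0.4, 0.5]

def score(params):
    n_kernels, bandwidth = params
    return mean_log_likelihood(train, valid, n_kernels, bandwidth)

# Evaluate every combination and keep the best-scoring one.
best = max(product(kernel_grid, bandwidth_grid), key=score)

print("best (number of kernels, width):", best)
print("held-out score:", round(score(best), 3))
```

The winning combination depends entirely on the data, so the result on this toy sample says nothing about which values are right for your problem; the point is only that each grid cell is scored on data that was not used to build the estimate.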

Hope this helps,

Ingo

08-04-2017 06:37 PM