# Naive Bayesian models

Hi,

As I understand it, the weight given to a descriptor in a naive Bayesian model is proportional to the enrichment of that descriptor in the "active" or "good" set compared with the "bad" or "inactive" set. I would like to know how descriptors with only a very few instances in the training set are treated. With the approach described, you would end up with certainties one way or the other often (or in the extreme case of only one instance, all the time). Are these simply discarded?

thanks for any enlightenment,

Andy

## Answers

2,531 Unicorn

You seem to be confusing the algorithms somehow. Naive Bayes does not calculate any weights. Naive Bayes assumes that all attributes are independent, so they all have exactly the same weight. And it does not have any bad or good set. And what do you mean by descriptor? Now I'm confused by your question.

Greetings,

Sebastian

6 Contributor II

| member | bits | category |
|---|---|---|
| member1 | 110001001.... | active |
| member2 | 0110000100.... | active |
| member3 | 110001001... | active |
| member4 | 001100000.. | inactive |
| member5 | 001000000.. | inactive |

So in this case, the structural feature represented by bit number 2 is enriched in the active members compared with the inactive ones, so the presence of this feature in any future chemical I see should weight that chemical toward the active category. The size of that weight is (as I understand it) proportional to the enrichment in the active category compared with the inactive one, and these weights are then used to categorise unseen compounds. Is this right?

If so, how are bits with very rare instances treated?
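To make the rare-bit concern concrete, here is a minimal Python sketch (not RapidMiner code) that estimates how often bit number 2 is set in each class of the five-member example. The fingerprints are truncated in the post, so only the visible leading bits are used; `rows` and `bit_rate` are names made up for this illustration.

```python
import math

# Leading bits exactly as shown in the post; the trailing dots there mean
# the fingerprints are truncated, so nothing beyond these bits is assumed.
rows = [
    ("110001001", "active"),
    ("0110000100", "active"),
    ("110001001", "active"),
    ("001100000", "inactive"),
    ("001000000", "inactive"),
]

def bit_rate(rows, bit_number, label):
    """Fraction of members of class `label` with the given bit set (1-indexed)."""
    members = [bits for bits, cls in rows if cls == label]
    hits = sum(1 for bits in members if bits[bit_number - 1] == "1")
    return hits / len(members)

p_active = bit_rate(rows, 2, "active")      # 3 of 3 actives have bit 2 set
p_inactive = bit_rate(rows, 2, "inactive")  # 0 of 2 inactives have bit 2 set

# The raw enrichment ratio is unbounded when the bit never occurs in one class.
enrichment = p_active / p_inactive if p_inactive > 0 else math.inf
print(p_active, p_inactive, enrichment)
```

With raw frequencies, a bit that never appears in one class yields an estimated probability of exactly 0 for that class, which is the "certainty one way or the other" described in the question.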

2,531 Unicorn

Yes, I think this is correct. I would have used different terms, but it comes down to this, I think. The weight is not linear, though; it comes from the ratio of two normal distribution densities. However, I don't know what you mean by "very rare instances". Undersampled classes? Attributes (that is what your descriptors are called in RapidMiner) with only very few 1s and the rest 0s? In the latter case they get no special treatment, because Naive Bayes does not distinguish between them.

Greetings,

Sebastian
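Reading the remark about normal distribution densities as referring to Gaussian class-conditional densities for numerical attributes, the ratio could be sketched as follows. This is an illustration only, not RapidMiner's implementation; the function names and all means and standard deviations are hypothetical.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def density_ratio(x, mean_a, std_a, mean_b, std_b):
    """Ratio of the class-A density to the class-B density at attribute value x."""
    return normal_pdf(x, mean_a, std_a) / normal_pdf(x, mean_b, std_b)

# Hypothetical class statistics: class A centred at 0, class B centred at 1.
ratio_at_zero = density_ratio(0.0, 0.0, 1.0, 1.0, 1.0)  # favours class A
ratio_midway = density_ratio(0.5, 0.0, 1.0, 1.0, 1.0)   # equally likely under both
print(ratio_at_zero, ratio_midway)
```

A ratio above 1 pushes the prediction toward class A and a ratio below 1 toward class B, and the dependence on the attribute value is not linear.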

6 Contributor II

849 Maven

It is not a major intellectual breakthrough to spot that learning will not be brilliant when training and test sets are completely different. So your real point is?

6 Contributor II

"Such a situation might arise, for example, in the case of under-represented bits. Suppose that a given feature occurs only once in a given data set and for a compound in the training set for which the hypothesis is false (e.g., likely to be absorbed in the intestine). The resulting probability that the hypothesis would be true for any test compound having this feature would be 0. (In our trivial example, this would lead to the rather absurd conclusion that no compounds containing the feature will be absorbed in the intestine.) A Laplacian estimator is therefore applied by adding a value of 1 to each Pr[Ei|H] in the numerator and a value of N to the denominator, where N is the total number of pieces of evidence. This gives each E which occurs with a frequency of 0 a small, nonzero value"

849 Maven

I think that is what the Laplace correction is there for: you add a little to the top and bottom to prevent zero probabilities. The code is in SimpleDistributionModel.java.
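As a rough illustration of "adding a little to the top and bottom", the sketch below uses the common add-one form of the Laplace correction for a binary attribute. It is an assumption for illustration and not a transcription of SimpleDistributionModel.java; `smoothed_rate` and `n_values` are hypothetical names. The counts are those for bit number 2 in the five-member example earlier in the thread (set in 3 of 3 actives and 0 of 2 inactives).

```python
import math

def smoothed_rate(hits, class_size, n_values=2):
    """Add-one estimate of P(bit set | class) for an attribute with n_values possible values."""
    return (hits + 1) / (class_size + n_values)

p_active = smoothed_rate(3, 3)    # (3 + 1) / (3 + 2)
p_inactive = smoothed_rate(0, 2)  # (0 + 1) / (2 + 2)

# The log ratio is now finite, so one rare bit can no longer force a certainty.
weight = math.log(p_active / p_inactive)
print(p_active, p_inactive, weight)
```

The fewer examples a bit has, the more the added counts dominate the estimate, so under-represented bits are pulled toward a neutral weight rather than discarded.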

6 Contributor II

849 Maven

RM has many virtues; documentation is not one of them! Luckily there is a forum for the Brave...

2,531 Unicorn

If I didn't spend so much time here in the forum, we would have much better documentation and far fewer users. So there's always a trade-off. But it's the bare truth: we could use a few additional hands down here...

By the way, now I understand what your problem was...

Greetings,

Sebastian

6 Contributor II

7 Contributor I

Hi,

I'm trying to get into the topic of this model, but I can't find a good introduction to it. Does anyone know a good tutorial, video or literature that explains this model for beginners? In German or English?

I'm analysing speeches to determine the tone of the text. I want to compare the dictionary method with the Naive Bayes model. The dictionary model should work, but I have no idea how to handle the other one or what the main difference is.

3,507 RM Data Scientist

Dear Fabian,

What exactly are you interested in: the Naive Bayes classifier or text mining? Naive Bayes is a standard technique for classification and is explained in most textbooks.

For text mining in RapidMiner my old friend @MariusHelf recommended this blog post: http://vancouverdata.blogspot.de/2010/11/text-analytics-with-rapidminer-loading.html

~Martin

Dortmund, Germany

7 Contributor I

Dear Martin,

Thank you for your help. My topic is text mining. I am trying to compare the results of the dictionary approach with those of Naive Bayes text mining. I have a couple of texts and I am trying to find out how many positive, neutral and negative words are in them. I will have a look at the tutorials.
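The dictionary side of that comparison could look roughly like the Python sketch below. The word lists, the sample sentence and the name `tone_counts` are hypothetical and chosen only for illustration; a real tone dictionary would be far larger.

```python
import re
from collections import Counter

# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "hope"}
NEGATIVE = {"bad", "crisis", "fear"}

def tone_counts(text):
    """Count positive, negative and neutral words in a text."""
    counts = Counter(positive=0, negative=0, neutral=0)
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in POSITIVE:
            counts["positive"] += 1
        elif word in NEGATIVE:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts

counts = tone_counts("We face a crisis, but there is great hope.")
print(counts)
```

The dictionary method relies on fixed word lists like these, whereas a Naive Bayes classifier estimates per-word probabilities for each class from texts that have already been labelled.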

2,959 Community Manager

Hi... if you're looking for textbook-type help, you may find this book helpful. It has a whole section on Naive Bayes with screenshots from RapidMiner.

https://www.amazon.com/Predictive-Analytics-Data-Mining-RapidMiner/dp/0128014601/

Scott