# Naive Bayes classification: what does "simple distribution" mean?

Hello,

I followed this tutorial (http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-loading.html) to build a Bayesian classifier on text data.

It works properly, but I would like to understand the meaning of one of the outputs, called "simple distribution (naive Bayes)". It looks like a set of Gaussian curves, one for each word and each category.

My guess is that it represents the probability density of a word's TF-IDF value given the category, but I don't understand why it can be negative. Moreover, for words that are characteristic of a category the Gaussian curve looks flat, whereas for non-meaningful words it is narrow and centered on zero.

Can anyone give me some clues about all that?

Thanks a lot,

Barthélémy

## Answers

Please refer to the Wikipedia article at http://en.wikipedia.org/wiki/Naive_bayes to understand how naive Bayes works.

I think the meaning of the distribution plots should then become clear, if you take into account that a small minimum standard deviation is applied to attributes that are always zero.
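To make this concrete, here is a minimal Python sketch of how a Gaussian naive Bayes model could fit one curve per word and per category. It is an illustration under assumptions, not RapidMiner's actual code: the TF-IDF values are invented, and `MIN_STD`, `fit_gaussian` and `gaussian_pdf` are hypothetical names, with the floor value chosen arbitrarily. It suggests two things. A Gaussian is defined over the whole real line, so the curve extends over negative attribute values even though TF-IDF itself is never negative (assuming "negative" refers to the horizontal axis; the density itself is always positive). And a word that is always zero in a category gets the minimum deviation, which produces a very narrow spike at zero, while a word whose values vary gets a wider, lower curve.

```python
import math

# Assumed floor on the standard deviation. The value RapidMiner uses
# is not stated in this thread, so 1e-3 is only a placeholder.
MIN_STD = 1e-3


def fit_gaussian(values, min_std=MIN_STD):
    """Return (mean, std) of the values, with std floored at min_std."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, max(math.sqrt(variance), min_std)


def gaussian_pdf(x, mean, std):
    """Density of the normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Invented TF-IDF values of one word over the documents of two categories.
tfidf_in_class_a = [0.42, 0.10, 0.55, 0.31, 0.0, 0.27]  # word typical of A
tfidf_in_class_b = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]       # word absent from B

mean_a, std_a = fit_gaussian(tfidf_in_class_a)
mean_b, std_b = fit_gaussian(tfidf_in_class_b)

# Category A: values vary, so the curve is wide and low, and it still
# has positive density at a negative TF-IDF value such as -0.1.
print("A: mean", mean_a, "std", std_a)
print("A: density at -0.1:", gaussian_pdf(-0.1, mean_a, std_a))
print("A: peak height:", gaussian_pdf(mean_a, mean_a, std_a))

# Category B: values are always zero, so std falls back to MIN_STD and
# the curve becomes a tall, narrow spike centered on zero.
print("B: mean", mean_b, "std", std_b)
print("B: peak height:", gaussian_pdf(0.0, mean_b, std_b))
```

The height of each peak is inversely proportional to the standard deviation, which is why the informative word looks flat when plotted on the same scale as the spike of a word that never occurs.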

Greetings,

Sebastian

Barthélémy