
Naive Bayesian models

AndyV Member Posts: 6 Contributor II
edited November 2018 in Help
Hi,
As I understand it, the weight given to a descriptor in a naive Bayesian model is proportional to the enrichment of that descriptor in the "active" or "good" set compared with the "bad" or "inactive" set.  I would like to know how descriptors with only very few instances in the training set are treated.  With the approach described, you would often end up with certainties one way or the other (or, in the extreme case of only one instance, every time).  Are these simply discarded?
thanks for any enlightenment,
Andy

Answers

    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Andy,
    you seem to be mixing up algorithms. Naive Bayes does not calculate any weights: it assumes all attributes are independent, so they all carry exactly the same weight. And it has no "bad" or "good" set. And what do you mean by descriptor? Now I'm confused by your question :)

    Greetings,
      Sebastian
    AndyV Member Posts: 6 Contributor II
    Apologies for the lack of clarity.  What I have is a training set of 50,000 members, each described by 1024 "descriptors" recording the presence or absence of a chemical structural feature.  All 1024 features are treated as independent and, to begin with, all have equal weights.  Then the training data is queried.  I have members classified in two categories: active and inactive.  The presence of a descriptor in the molecule sets the bit at a certain position to 1 and its absence sets it to 0.  So I have a 2D matrix, e.g.:

                              category
    member1  110001001...     active
    member2  0110000100...    active
    member3  110001001...     active

    member4  001100000...     inactive
    member5  001000000...     inactive


    So in this case, the structural feature represented by bit number 2 is enriched in the active members compared with the inactive ones, so the presence of this feature in any future chemical I see should weight that chemical toward the active category.  The size of that weight is (as I understand it) proportional to the enrichment in the active category compared with the inactive one, and these weights are then used to categorise unseen compounds.  Is this right?
    If so, how are bits with very rare instances treated?
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    yes, I think this is correct. I would have used different terms, but it comes down to this, I think. The weighting is not linear, though: it comes from the ratio of two normal distribution densities. But I don't know what you mean by "very rare instances". Undersampled classes? Attributes (that is what your descriptors are called within RapidMiner) with only a very few 1s and the rest 0s? In the latter case they aren't treated specially, because Naive Bayes does not distinguish between them.

    Greetings,
      Sebastian
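
The exchange above (per-class probabilities for each bit, combined under an independence assumption) can be sketched as a Bernoulli-style naive Bayes. This is a generic illustration in Python, not RapidMiner's actual implementation; the toy data mirrors Andy's matrix:

```python
import math

# Toy bit-vector training set mirroring Andy's matrix (hypothetical data).
train = [
    ([1, 1, 0], "active"),
    ([0, 1, 1], "active"),
    ([1, 1, 0], "active"),
    ([0, 0, 1], "inactive"),
    ([0, 0, 1], "inactive"),
]

def fit(train):
    """Estimate P(bit_i = 1 | class) by simple relative frequency."""
    counts, totals = {}, {}
    n_bits = len(train[0][0])
    for bits, label in train:
        totals[label] = totals.get(label, 0) + 1
        c = counts.setdefault(label, [0] * n_bits)
        for i, b in enumerate(bits):
            c[i] += b
    probs = {lab: [c_i / totals[lab] for c_i in counts[lab]] for lab in counts}
    return probs, totals

def log_score(bits, probs, totals, label):
    """Unnormalised log-posterior for one class; note that a zero
    class-conditional probability would make math.log blow up here."""
    n = sum(totals.values())
    s = math.log(totals[label] / n)          # log prior
    for b, p in zip(bits, probs[label]):
        s += math.log(p if b else 1 - p)     # independence assumption
    return s

probs, totals = fit(train)
# Bit 1 is set in every active member and in no inactive member, so the
# raw estimate P(bit1 = 1 | inactive) is exactly 0 -- the over-certainty
# Andy is asking about.
print(probs["inactive"][1])  # 0.0
```

With these raw frequency estimates, any unseen compound that has bit 1 set is assigned probability 0 for the inactive class, which is exactly the zero-probability situation the rest of the thread turns to.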
    AndyV Member Posts: 6 Contributor II
    So if, say, attribute 3 had two bits set to 1 in the active training set and all set to zero in the inactive training set, wouldn't it appear certain that this attribute was associated with activity?  Which seems overly certain.
    haddock Member Posts: 849 Maven
    G'Day,

    It is not a major intellectual breakthrough to spot that learning will not be brilliant when training and test sets are completely different. So your real point is?
    AndyV Member Posts: 6 Contributor II
    Laplacian-modified naive Bayesian models, as it turns out.  I am aware that my point is no intellectual breakthrough, but it relates to a problem I may well encounter with the data I am using, and the LMNB is designed to deal with it automatically.  To use any software, it helps to know how it will handle particular features of my data, and that is what I was looking to clarify.  Below is an extract from a paper which met the same problem, describing how they dealt with 1/0 probabilities.  I'm interested to know whether a similar feature is in RapidMiner.


    "Such a situation might arise, for example, in the case of under-represented bits. Suppose that a given feature occurs only once in a given data set and for a compound in the training set for which the hypothesis is false (e.g., likely to be absorbed in the intestine). The resulting probability that the hypothesis would be true for any test compound having this feature would be 0. (In our trivial example, this would lead to the rather absurd conclusion that no compounds containing the feature will be absorbed in the intestine.) A Laplacian estimator is therefore applied by adding a value of 1 to each Pr[Ei|H] in the numerator and a value of N to the denominator, where N is the total number of pieces of evidence. This gives each E which occurs with a frequency of 0 a small, nonzero value"
    haddock Member Posts: 849 Maven
    Hi again,

    I think that is what the Laplace correction is there for: you add a little to the top and bottom to prevent zero probabilities. The code is in SimpleDistributionModel.java.
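
The correction described in the quoted paper can be sketched in a few lines (a generic add-k estimate for binary attributes, not a quote from SimpleDistributionModel.java):

```python
def laplace_prob(count_with_bit, class_size, k=1, n_values=2):
    """P(bit = 1 | class) with add-k (Laplace) smoothing.

    Adding k to the numerator and k * n_values to the denominator keeps
    every estimate strictly between 0 and 1, so a bit never observed in
    a class can no longer drive the posterior to exactly 0.
    """
    return (count_with_bit + k) / (class_size + k * n_values)

# Andy's case: a bit set twice among 3 actives, never among 2 inactives.
print(laplace_prob(2, 3))  # (2 + 1) / (3 + 2) = 0.6
print(laplace_prob(0, 2))  # (0 + 1) / (2 + 2) = 0.25, not 0
```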
    AndyV Member Posts: 6 Contributor II
    Thank you.  Exactly what I was looking for, and which I had failed to find by searching on "Laplacian" in the documentation.
    haddock Member Posts: 849 Maven
    Hi Andy,

    RM has many virtues; documentation is not one of them! Luckily there is a forum for the Brave...

    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Yes,
    if I didn't spend so much time here in the forum, we would have much better documentation and many fewer users. So there's always a trade-off :) But it's the bare truth: we could use a few additional hands down here...

    By the way, now I understand what your problem was...

    Greetings,
      Sebastian
    AndyV Member Posts: 6 Contributor II
    ...and thanks, Sebastian, for your help.  I will do better at framing my questions next time!
    fabian_preis Member Posts: 7 Contributor I

    Hi,

    I'm trying to get into the topic of this model, but I can't find a good introduction. Does anyone know a good tutorial, video, or book that explains this model for beginners? In German or English?

    I'm analysing speeches to assess the tone of the text. I want to compare the dictionary method with the naive Bayes model. The dictionary model should work, but I have no idea how to handle the other one, or what the main difference is.


    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist

    Dear Fabian,

    what exactly are you interested in? The naive Bayes classifier, or text mining? Naive Bayes is a standard technique for classification and is explained in most textbooks.

    For text mining in RapidMiner, my old friend @MariusHelf recommended this blog post: http://vancouverdata.blogspot.de/2010/11/text-analytics-with-rapidminer-loading.html

    ~Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
    fabian_preis Member Posts: 7 Contributor I

    Dear Martin,

    thank you for your help. My topic is text mining. I am trying to compare the results from the dictionary approach with naive Bayes text mining. I have a couple of texts and I am trying to find out how many positive, neutral, and negative words are in them. I will have a look at the tutorials.

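The dictionary side of the comparison can be sketched as simple lexicon counting. The word lists below are placeholders; a real study would use a published sentiment lexicon (for German, e.g. SentiWS):

```python
# Hypothetical mini-lexicons, for illustration only.
POSITIVE = {"good", "great", "strong"}
NEGATIVE = {"bad", "weak", "crisis"}

def dictionary_tone(text):
    """Count positive and negative lexicon hits and return a tone label."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(dictionary_tone("a strong and great economy"))  # positive
```

The naive Bayes alternative would instead learn per-class word probabilities from labelled speeches, rather than relying on a fixed word list.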
    sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hi...if you're looking for textbook-style help, you may find this book useful.  It has a whole section on naive Bayes with screenshots from RapidMiner.

    https://www.amazon.com/Predictive-Analytics-Data-Mining-RapidMiner/dp/0128014601/

    Scott
