"Text Classification using Text Plugin - NaiveBayes, Updateable Models"

pser Member Posts: 8 Contributor II
edited May 2019 in Help
This post refers to http://rapid-i.com/rapidforum/index.php/topic,368.0.html and http://rapid-i.com/rapidforum/index.php/topic,369.0.html. It addresses the problems I experienced when trying to update models.

Suppose I have created a word list and saved it to disk. I can then run StringTextInput several times, each time loading and vectorizing only part of the database texts. I want to pass the word vectors to a learner that learns to classify texts, and that learner should produce an updatable model. I tried NaiveBayes.

Problem 5: The UpdateModel operator throws an error saying that the corresponding model (a DistributionModel) is not updatable. Adding the line "public boolean isUpdatable() { return true; }" to DistributionModel.java solved the problem. I did not find any learner/model that worked with UpdateModel without modifying the source code. Did I do something wrong?
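
To illustrate the kind of check described in Problem 5, here is a minimal, self-contained sketch. It does not use RapidMiner's classes: SketchModel, RunningGaussian, and updateModel are hypothetical names made up for this example, and how DistributionModel actually stores its statistics is not shown in this thread. The sketch only assumes that an updater refuses any model whose isUpdatable() returns false, and that an updatable model can absorb new batches of values incrementally (here with a running mean and variance for a single numerical attribute).

```java
public class UpdatableModelSketch {

    /** Hypothetical stand-in for a model interface; NOT RapidMiner's actual API. */
    interface SketchModel {
        // Models that cannot absorb new examples keep this default.
        default boolean isUpdatable() {
            return false;
        }

        void update(double[] batch);
    }

    /** Running mean and variance of one numerical attribute (Welford's method). */
    static class RunningGaussian implements SketchModel {
        long count = 0;
        double mean = 0.0;
        double m2 = 0.0; // sum of squared deviations from the current mean

        // The one-line flag that the post adds to the model class.
        @Override
        public boolean isUpdatable() {
            return true;
        }

        @Override
        public void update(double[] batch) {
            for (double x : batch) {
                count++;
                double delta = x - mean;
                mean += delta / count;
                m2 += delta * (x - mean);
            }
        }

        double variance() {
            return count > 1 ? m2 / (count - 1) : 0.0;
        }
    }

    /** Mimics an updater that rejects models not flagged as updatable. */
    static void updateModel(SketchModel model, double[] batch) {
        if (!model.isUpdatable()) {
            throw new UnsupportedOperationException("model is not updatable");
        }
        model.update(batch);
    }

    public static void main(String[] args) {
        RunningGaussian g = new RunningGaussian();
        updateModel(g, new double[] {1.0, 2.0}); // first part of the data
        updateModel(g, new double[] {3.0, 4.0}); // second part, loaded later
        System.out.println("mean=" + g.mean + " variance=" + g.variance());
    }
}
```

Feeding the data in two batches gives the same statistics as feeding it all at once, which is the property an updatable model needs when the texts are vectorized in parts.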

Problem 6: NaiveBayes does not work properly on my examples. It classifies all texts with the same class. I looked into the source code, and I think the problem is that NaiveBayes multiplies all probabilities (when handling numerical attributes). Since I used about 1600 attributes, the product was probably too small for a double and was rounded to 0.
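
The suspected underflow is easy to reproduce outside RapidMiner. The sketch below is not the NaiveBayes source; the class name, the prior of 0.5, and the per-attribute likelihoods of 0.02 and 0.01 are made-up values chosen only to show the effect. It counts how many factors can be multiplied into a double before the running product becomes exactly 0.0. (Real Gaussian densities can also be larger than 1, so the actual behavior depends on the data.)

```java
public class ProductUnderflowSketch {

    /**
     * Multiplies the prior by the same likelihood once per attribute and returns
     * the number of factors after which the product is exactly 0.0, or -1 if the
     * product stays positive for all attributes.
     */
    static int factorsUntilZero(double prior, double likelihood, int attributes) {
        double product = prior;
        for (int i = 1; i <= attributes; i++) {
            product *= likelihood;
            if (product == 0.0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Two hypothetical classes with different per-attribute likelihoods.
        System.out.println("class A hits 0.0 after " + factorsUntilZero(0.5, 0.02, 1600) + " factors");
        System.out.println("class B hits 0.0 after " + factorsUntilZero(0.5, 0.01, 1600) + " factors");
    }
}
```

With these values, both products reach 0.0 long before the 1600th attribute, so every class ends up with the same score and the classifier can no longer tell them apart.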

I decided to use a Bayes operator from Weka that computes probabilities as sums of logarithms instead. I ran into the same UpdateModel problem again; I have not yet tested whether the fix mentioned above works there too.
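
For reference, the sum-of-logarithms idea can be sketched as follows. This is not Weka's code: the class and method names are hypothetical, and the likelihood values (0.02 and 0.01 for each of 1600 attributes, prior 0.5) are invented for illustration. The point is only that log scores stay finite and comparable where a plain product of this many small factors would fall below the smallest positive double.

```java
import java.util.Arrays;

public class LogSpaceScoreSketch {

    /** log(prior) + sum of log(likelihood); a likelihood of 0 yields -Infinity. */
    static double logScore(double prior, double[] likelihoods) {
        double score = Math.log(prior);
        for (double p : likelihoods) {
            score += Math.log(p);
        }
        return score;
    }

    /** Index of the largest score, i.e. the predicted class. */
    static int argMax(double[] scores) {
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int attributes = 1600;
        double[] classA = new double[attributes];
        double[] classB = new double[attributes];
        Arrays.fill(classA, 0.02); // hypothetical per-attribute likelihoods
        Arrays.fill(classB, 0.01);

        double[] scores = {logScore(0.5, classA), logScore(0.5, classB)};
        System.out.println("log scores: " + Arrays.toString(scores));
        System.out.println("predicted class index: " + argMax(scores));
    }
}
```

Because the logarithm is monotonic, picking the class with the largest log score gives the same decision the exact product would give, without ever forming the product.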

Answers

  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management
    Hi,
    pser wrote:

    Problem 5: The UpdateModel operator throws an error saying that the corresponding model (a DistributionModel) is not updatable. Adding the line "public boolean isUpdatable() { return true; }" to DistributionModel.java solved the problem. I did not find any learner/model that worked with UpdateModel without modifying the source code. Did I do something wrong?
    No. As far as I know, the naive Bayes model is the only updatable model at the moment. Unfortunately, we forgot to mark it as updatable by adding the appropriate method. We will add the method isUpdatable().
    pser wrote:

    Problem 6: NaiveBayes does not work properly on my examples. It classifies all texts with the same class. I looked into the source code, and I think the problem is that NaiveBayes multiplies all probabilities (when handling numerical attributes). Since I used about 1600 attributes, the product was probably too small for a double and was rounded to 0.

    I decided to use a Bayes operator from Weka that computes probabilities as sums of logarithms instead. I ran into the same UpdateModel problem again; I have not yet tested whether the fix mentioned above works there too.
    I don't know whether the problem is that the probabilities are too small, but it may be. Have you already tried the learners with the ModelApplier instead of the ModelUpdater?

    Regards,
    Tobias
  • pser Member Posts: 8 Contributor II
    Hi Tobias,
    Tobias Malbrecht wrote:

    Have you already tried the learners with the ModelApplier instead of the ModelUpdater?
    Yes, I did. In fact, I have not done anything with the ModelUpdater so far except test that it does not throw an error.

    Regards,
    Daniel
  • asiulana Member Posts: 6 Contributor II
    Hi everyone!

    Daniel, I don't know whether you solved your problem, but the same thing happens to me.

    I'm classifying microarrays. I have tried several data sets, and every example gets classified with the same class, which I find very odd.

    Do you know what is going on? I also have thousands of numerical attributes (sometimes 22,500), and I enabled laplace_correction (the tutorial says it helps).

    If someone could help me out, I would appreciate it very much.

    All the best
    Ana Luisa
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Daniel, Hi Ana Luisa,
    I think that's a valid drawback of our current implementation of naive Bayes. A double might not be precise enough for this many attributes. I will add this to my (already long) to-do list.
    Especially for Ana, it could be worth checking how an SVM with a linear kernel performs. On gene expression data, this typically works very well.

    Greetings,
    Sebastian
  • asiulana Member Posts: 6 Contributor II
    Hi land,

    Thanks for your quick reply.

    In this topic, http://rapid-i.com/rapidforum/index.php/topic,400.msg1537.html#msg1537, they say that the same thing happens, and they used the Weka version of NaiveBayes.

    I have some questions now:

    1- In the tutorial and in the RapidMiner output window, I get a message saying that using the Weka version is not recommended. Should I use it, or do you really advise against it?

    2- If I use W-NaiveBayes, can't I have negative numeric values for the attributes? I get an error message when I do.
    Can I change a setting so that it works with negative values?

    3- I used W-NaiveBayesUpdateable and got no error messages, but I still get the warning "W-NaiveBayesUpdateable: Deprecated: please use NaiveBayes instead".

    I'll check out the SVM.

    Thanks for your help.

    Greetings
    Ana Luisa