"Text Classification using Text Plugin - NaiveBayes, Updateable Models"

pser · October 2008

This post refers to http://rapid-i.com/rapidforum/index.php/topic,368.0.html and http://rapid-i.com/rapidforum/index.php/topic,369.0.html. It addresses the problems I experienced when trying to update models.

Suppose I have created a word list and saved it to disk. I can then run StringTextInput several times, each time loading and vectorizing only part of the database texts. I want to pass the word vectors to a learner that learns to classify texts, and it should be a learner that produces an updatable model. I tried NaiveBayes.

Problem 5: The UpdateModel operator throws an error saying that the corresponding model (a DistributionModel) is not updatable. Adding the line "public boolean isUpdatable() { return true; }" to DistributionModel.java solved the problem. I did not find any learner/model that worked with UpdateModel without modifying the source code. Did I do something wrong?

Problem 6: NaiveBayes does not work properly on my examples: it assigns all texts to the same class. I looked into the source code, and I think the problem is that NaiveBayes multiplies all probabilities (when handling numerical attributes). Since I used about 1600 attributes, the product was probably too small to be represented as a double and was rounded to 0.

I decided to use a Bayes operator from Weka that computes probabilities as sums of logarithms instead. I ran into the same problem with UpdateModel again; I have not yet tried whether the fix mentioned above works there too.
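The suspected underflow can be reproduced in isolation. The following is a minimal sketch, not code from RapidMiner or Weka: the class name `UnderflowDemo` and the per-attribute likelihoods 0.01 and 0.02 are assumptions chosen only for illustration. It contrasts multiplying 1600 likelihoods directly with summing their logarithms.

```java
import java.util.Arrays;

public class UnderflowDemo {

    /** Multiplies the likelihoods directly; underflows to 0.0 for long inputs. */
    static double naiveProduct(double[] likelihoods) {
        double product = 1.0;
        for (double p : likelihoods) {
            product *= p;
        }
        return product;
    }

    /** Sums the logarithms instead; stays finite as long as every value is > 0. */
    static double logSum(double[] likelihoods) {
        double sum = 0.0;
        for (double p : likelihoods) {
            sum += Math.log(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumed toy values: 1600 attributes with a per-attribute likelihood
        // of 0.01 under class A and 0.02 under class B.
        double[] classA = new double[1600];
        double[] classB = new double[1600];
        Arrays.fill(classA, 0.01);
        Arrays.fill(classB, 0.02);

        // Both products underflow to 0.0, so the two classes tie.
        System.out.println(naiveProduct(classA) == naiveProduct(classB));
        // The log scores stay finite and still distinguish the classes.
        System.out.println(logSum(classA) < logSum(classB));
    }
}
```

When every class score collapses to 0.0, a classifier typically falls back to whichever class it checks first, which would match the "everything gets the same class" symptom. Whether this is what actually happens inside NaiveBayes is not confirmed here.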

TobiasMalbrecht · October 2008

Hi,

pser wrote:

Problem 5: The UpdateModel operator throws an error saying that the corresponding model (a DistributionModel) is not updatable. Adding the line "public boolean isUpdatable() { return true; }" to DistributionModel.java solved the problem. I did not find any learner/model that worked with UpdateModel without modifying the source code. Did I do something wrong?

No. As far as I know, the naive Bayes model is the only updatable model at the moment. Unfortunately, we forgot to mark it as updatable by adding the appropriate method. We will add the method [tt]isUpdatable()[/tt].
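To illustrate why a naive Bayes model lends itself to updating, here is a minimal sketch under stated assumptions. The interface `UpdatableModel`, the class `RunningGaussian`, and the method `update` are hypothetical names invented for this example and are not the RapidMiner API; only the [tt]isUpdatable()[/tt] signature is taken from the post above. The sketch keeps a running mean and variance for one numerical attribute (Welford's online update), so new batches can be folded in without revisiting old data.

```java
public class UpdatableGaussianSketch {

    /** Hypothetical interface for illustration only, NOT the RapidMiner API. */
    interface UpdatableModel {
        boolean isUpdatable();
        void update(double[] batch);
    }

    /** Running mean and variance of one numerical attribute for one class. */
    static class RunningGaussian implements UpdatableModel {
        private long count = 0;
        private double mean = 0.0;
        private double m2 = 0.0; // sum of squared deviations from the current mean

        @Override
        public boolean isUpdatable() {
            return true; // the flag the updating operator is assumed to check
        }

        @Override
        public void update(double[] batch) {
            // Welford's online update: one pass, no stored examples.
            for (double x : batch) {
                count++;
                double delta = x - mean;
                mean += delta / count;
                m2 += delta * (x - mean);
            }
        }

        double mean() {
            return mean;
        }

        /** Sample variance; defined as 0.0 while fewer than two values were seen. */
        double variance() {
            return count > 1 ? m2 / (count - 1) : 0.0;
        }
    }

    public static void main(String[] args) {
        RunningGaussian g = new RunningGaussian();
        g.update(new double[] {1.0, 2.0}); // first batch
        g.update(new double[] {3.0, 4.0}); // second batch, loaded later
        System.out.println(g.mean() + " " + g.variance());
    }
}
```

Feeding the two batches separately gives the same mean and variance as a single pass over all four values, which is the property an updatable model needs.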

pser wrote:

Problem 6: NaiveBayes does not work properly on my examples: it assigns all texts to the same class. I looked into the source code, and I think the problem is that NaiveBayes multiplies all probabilities (when handling numerical attributes). Since I used about 1600 attributes, the product was probably too small to be represented as a double and was rounded to 0.

I decided to use a Bayes operator from Weka that computes probabilities as sums of logarithms instead. I ran into the same problem with UpdateModel again; I have not yet tried whether the fix mentioned above works there too.

I don't know whether the problem is that the probabilities are too small, but it may be. Have you already checked the learners with the [tt]ModelApplier[/tt] instead of the [tt]ModelUpdater[/tt]?

Regards,
Tobias

pser · October 2008

Hi Tobias,

Tobias Malbrecht wrote:

Have you already checked the learners with the [tt]ModelApplier[/tt] instead of the [tt]ModelUpdater[/tt]?

Yes, I did. In fact, I have not done anything with the [tt]ModelUpdater[/tt] so far except test that it does not throw an error.

Regards,
Daniel

asiulana · December 2008

Hi everyone!

Daniel, I don't know if you solved your problem, but the same thing happens to me.

I'm classifying microarrays. I've tried several data sets, and all the examples get classified with the same class, which I find very odd.

Do you know what is going on? I have thousands of numerical attributes too (sometimes 22500), but I enabled laplace_correction (the tutorial says it helps).

If someone could help me out, I would appreciate it very much.

All the best
Ana Luisa

land · December 2008

Hi Daniel, Hi Ana Luisa,
I think that's a valid drawback of our current implementation of naive Bayes. A double value might not be precise enough for this many attributes. I will add this to my (already long) todo list.
Especially for Ana, one option would be to check how an SVM with a linear kernel performs. On gene expression data this typically works very well.
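One common way around the precision limit mentioned above is to keep per-class scores in log space and normalize them only at the end. The following is a minimal sketch under stated assumptions, not the RapidMiner implementation: the class `LogPosteriorSketch` and its method names are hypothetical, and the two log scores in `main` are made-up values of the magnitude that thousands of attributes can produce.

```java
public class LogPosteriorSketch {

    /** Log of the Gaussian density N(x; mean, variance); assumes variance > 0. */
    static double logGaussian(double x, double mean, double variance) {
        double diff = x - mean;
        return -0.5 * Math.log(2.0 * Math.PI * variance)
                - diff * diff / (2.0 * variance);
    }

    /**
     * Turns per-class log scores into confidences that sum to 1.
     * Subtracting the maximum first keeps Math.exp from underflowing
     * for the best class (the "log-sum-exp" trick).
     */
    static double[] normalize(double[] logScores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            max = Math.max(max, s);
        }
        double[] result = new double[logScores.length];
        double sum = 0.0;
        for (int i = 0; i < logScores.length; i++) {
            result[i] = Math.exp(logScores[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= sum;
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed log scores for two classes; exp() of either would be 0.0.
        double[] confidences = normalize(new double[] {-7368.0, -7367.0});
        System.out.println(confidences[0] + " " + confidences[1]);
    }
}
```

In this sketch a class score would be the log prior plus the sum of `logGaussian` over all attributes; only the final `normalize` step leaves log space.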

Greetings,
Sebastian

asiulana · December 2008

Hi land,

Thanks for your quick reply.

In this topic, http://rapid-i.com/rapidforum/index.php/topic,400.msg1537.html#msg1537, they say that the same thing happens, and they used the Weka version of NaiveBayes.

I have some questions now:

1- Both the tutorial and the RapidMiner output window say that using the Weka version is not recommended. Should I use it, or do you really not recommend it?

2- If I use W-NaiveBayes, can't I have negative numeric values for the attributes? I get an error message when I do.
Can I change a setting so that it works with negative values?

3- I've used W-NaiveBayesUpdateable and got no error messages, but I still get the warning message "W-NaiveBayesUpdateable: Deprecated: please use NaiveBayes instead".

I'll check out the SVM.

Thanks for your help.

Greetings
Ana Luisa
