"Text Classification using Text Plugin - NaiveBayes, Updateable Models"
This post refers to http://rapid-i.com/rapidforum/index.php/topic,368.0.html and http://rapid-i.com/rapidforum/index.php/topic,369.0.html. It addresses the problems I experienced when trying to update models.
Suppose I have created a word list and saved it to disk. I can then run StringTextInput several times, each time loading and vectorizing only part of the database texts. I want to pass the word vectors to a learner that learns to classify texts. It should be a learner that produces an updatable model. I tried NaiveBayes.
Problem 5: The UpdateModel operator throws an error saying that the corresponding model (which is a DistributionModel) is not updatable. Adding the line "public boolean isUpdatable() { return true; }" to DistributionModel.java solved the problem. I did not find any learner/model that worked with UpdateModel without modifying the source code. Did I do something wrong?
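As general background on what "updatable" means for a NaiveBayes-style model with numerical attributes, here is a minimal sketch of per-attribute statistics that can absorb new values batch by batch without revisiting the old ones. This is not RapidMiner code: the class RunningGaussian and its methods are hypothetical names made up for illustration, and nothing here is taken from DistributionModel. It uses Welford's online algorithm for mean and variance.

```java
/**
 * Hypothetical illustration only (not part of RapidMiner):
 * running mean and variance for one numerical attribute of one class.
 */
final class RunningGaussian {
    private long count;
    private double mean;
    private double m2; // sum of squared deviations from the current mean

    /** Folds one new value into the statistics (Welford's online algorithm). */
    void update(double value) {
        count++;
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean);
    }

    /** Folds a whole batch in, e.g. one chunk of vectorized texts. */
    void updateAll(double[] values) {
        for (double v : values) {
            update(v);
        }
    }

    long count() {
        return count;
    }

    double mean() {
        return mean;
    }

    /** Sample variance; NaN until at least two values have been seen. */
    double variance() {
        return count > 1 ? m2 / (count - 1) : Double.NaN;
    }
}
```

Feeding the values in several batches gives the same mean and variance as feeding them all at once, which is the property an updatable model relies on.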
Problem 6: NaiveBayes does not work properly on my examples. It assigns all texts to the same class. I looked into the source code and I think the problem is that NaiveBayes multiplies all probabilities (when handling numerical attributes). Since I used about 1600 attributes, the product was probably too small for a double and was rounded to 0.
I decided to use a Bayes operator from Weka instead, which computes probabilities as sums of logarithms. Again I get the problem with UpdateModel. I haven't tried whether the same fix as mentioned above works again.
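To illustrate the underflow suspected in Problem 6, here is a small self-contained sketch. It is not the RapidMiner or Weka implementation: the class LogSpaceDemo, the method names and the likelihood values 0.01 and 0.02 are invented for the example. It compares a plain product of 1600 per-attribute likelihoods with the equivalent sum of logarithms.

```java
import java.util.Arrays;

/** Hypothetical demo (not RapidMiner or Weka code): product vs. log-sum scoring. */
final class LogSpaceDemo {

    /** Class score as prior times the product of per-attribute likelihoods. */
    static double productScore(double prior, double[] likelihoods) {
        double score = prior;
        for (double p : likelihoods) {
            score *= p;
        }
        return score;
    }

    /** The same score in log space: log(prior) plus the sum of log(likelihood). */
    static double logScore(double prior, double[] likelihoods) {
        double score = Math.log(prior);
        for (double p : likelihoods) {
            score += Math.log(p);
        }
        return score;
    }

    public static void main(String[] args) {
        int attributeCount = 1600; // roughly the number of attributes in the post
        double[] classA = new double[attributeCount];
        double[] classB = new double[attributeCount];
        Arrays.fill(classA, 0.01); // made-up likelihoods for class A
        Arrays.fill(classB, 0.02); // made-up likelihoods for class B

        // Both products fall below the smallest positive double and become 0.0,
        // so the two classes can no longer be told apart.
        System.out.println("product A = " + productScore(0.5, classA));
        System.out.println("product B = " + productScore(0.5, classB));

        // The log scores stay finite and still rank class B above class A.
        System.out.println("log A = " + logScore(0.5, classA));
        System.out.println("log B = " + logScore(0.5, classB));
    }
}
```

When every class score is 0.0, whichever class wins the tie is predicted for all examples, which would match the behaviour described above. Comparing log scores avoids this because only their order matters for choosing the class.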
Answers
Regards,
Tobias
Regards,
Daniel
Daniel, I don't know if you solved your problem, but the same thing happens to me.
I'm classifying microarrays. I've tried several data sets, and every time all the examples are assigned to the same class, which I find very odd.
Do you know what is going on? I have thousands of numerical attributes too (sometimes 22500), but I checked laplace_correction (the tutorial says it helps).
If someone could help me out, I would appreciate it very much.
All the best
Ana Luisa
I think that's a valid drawback of our current implementation of naive Bayes. A double value might not be precise enough for this many attributes. I'll add this to my (already long) todo list.
Especially for Ana, one solution could be to check how an SVM with a linear kernel performs. This typically works very well on gene expression data.
Greetings,
Sebastian
Thanks for your quick reply.
In this topic http://rapid-i.com/rapidforum/index.php/topic,400.msg1537.html#msg1537 they say that the same thing happens, and they used the Weka version of NaiveBayes.
I have some questions now:
1- In the tutorial and in the output window of RapidMiner I get the message that it is not recommended to use the Weka version. Should I use it, or do you really not recommend it?
2- If I use W-NaiveBayes, can't I have negative numeric values for the attributes? I get an error message when I do.
Can I change a setting so that it works with negative values?
3- I've used W-NaiveBayesUpdateable and got no error messages, but I still get the warning message "W-NaiveBayesUpdateable: Deprecated: please use NaiveBayes instead".
I'll check out the SVM.
Thanks for your help.
Greetings
Ana Luisa