"Feature selection and SVM modeling for text classification"
Hi!
I'm working on a text classification project with RapidMiner. I want to predict the category of some questions, so my target attribute is a polynominal label with 4 classes.
I tried modeling with Naive Bayes and the process works well, but I have some questions:
1) My dataset (split into training and test sets with cross-validation) consists of .txt documents. Right now every word in these documents is a single attribute ("feature") used for modeling. I've tried "Select Attributes" to improve my feature selection, but I think it is quite limited.
I would like to select parts of speech or bigrams as features. Do you know of any tools or extensions that allow this kind of feature selection?
2) I can't use the Support Vector Machine methods because they don't support polynominal labels. I find that very strange because SVMs are good methods for text classification. How can I use them for multi-class prediction?
Thank you for the help.
Stefano.
Answers
1) You could try playing around with the Tokenize operator.
2) LibSVM supports polynominal labels.
Regards,
Marco
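To make the bigram idea from question 1 concrete: a bigram feature is just a pair of adjacent tokens counted like a single word. Inside RapidMiner the Text Processing extension should cover this with operators such as Generate n-Grams (Terms) and Filter Tokens (by POS Tags) placed after Tokenize, though the exact setup depends on your process. The Python sketch below is only an illustration of what such a step computes, not RapidMiner code; the function names and the underscore separator are choices made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n=2):
    """Return all n-grams over a token list, joined with '_'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text, max_n=2):
    """Count unigrams plus every n-gram up to max_n for one document."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    for n in range(2, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical example document, not taken from the thread's dataset.
question = "How can I use SVM for text classification"
print(features(question))
```

Each document then becomes a bag of unigram and bigram counts, which is the same kind of word vector the learner already receives, just with more columns.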
I have tried LibSVM, but I'm not sure it is reliable.
It supports polynominal labels, but in my project it makes its predictions with an accuracy of 100%, which is probably too high.
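A perfect score often means that the test examples were also seen during training, or that the label leaked into the attributes in some way. That may or may not be what is happening here. As a sanity check, here is a minimal Python sketch (not RapidMiner code) of a k-fold split that keeps training and test examples apart; `k_fold_indices` and `accuracy` are names made up for this illustration.

```python
import random


def k_fold_indices(n_examples, k=5, seed=0):
    """Shuffle example indices and yield one (train, test) pair per fold."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


for train, test in k_fold_indices(8, k=4):
    # No example may be both trained on and tested in the same fold.
    assert set(train).isdisjoint(test)
```

If the accuracy measured this way on held-out folds is still 100%, it is worth checking whether any attribute encodes the category directly.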
I've tried making a prediction with the standard Support Vector Machine, with a binominal label and only two classes, and the accuracy was 96%. So I think it is strange.
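Since the standard Support Vector Machine only accepts a binominal label, one common way to reuse a binary learner for the 4-class label is one-vs-rest: train one binary model per class ("this class" against "everything else") and predict the class whose model gives the highest score. RapidMiner should offer this kind of decomposition through a meta operator (Polynominal by Binominal Classification), and LibSVM handles multiple classes internally. The Python sketch below only illustrates the idea. The `Perceptron` class is a deliberately tiny stand-in for a binary SVM, not an SVM itself, and all names and the toy data are invented for this example.

```python
class Perceptron:
    """Tiny binary linear classifier; a stand-in for a binary SVM."""

    def __init__(self, epochs=10):
        self.epochs = epochs
        self.weights = {}
        self.bias = 0.0

    def score(self, x):
        """Signed score for a sparse feature dict {feature: value}."""
        return self.bias + sum(self.weights.get(f, 0.0) * v for f, v in x.items())

    def fit(self, X, y):
        """Train on feature dicts X with targets y in {+1, -1}."""
        for _ in range(self.epochs):
            for x, target in zip(X, y):
                if target * self.score(x) <= 0:  # misclassified: update
                    for f, v in x.items():
                        self.weights[f] = self.weights.get(f, 0.0) + target * v
                    self.bias += target
        return self


class OneVsRest:
    """Wrap a binary learner so it can handle more than two classes."""

    def __init__(self, make_binary=Perceptron):
        self.make_binary = make_binary
        self.models = {}

    def fit(self, X, labels):
        for cls in sorted(set(labels)):
            # Relabel: the current class is +1, every other class is -1.
            y = [1 if label == cls else -1 for label in labels]
            self.models[cls] = self.make_binary().fit(X, y)
        return self

    def predict(self, x):
        # Pick the class whose binary model is most confident.
        return max(self.models, key=lambda cls: self.models[cls].score(x))


# Toy word-count vectors with four hypothetical question categories.
X = [{"where": 1}, {"who": 1}, {"when": 1}, {"how": 1, "many": 1}]
labels = ["location", "person", "time", "number"]
model = OneVsRest().fit(X, labels)
print([model.predict(x) for x in X])
```

With 4 classes this trains 4 binary models. The alternative, one-vs-one, trains a model for every pair of classes (6 here) and lets them vote.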