Sentiment Analysis with SVM

mtd · November 2015

Hello all, I am a newbie in RapidMiner. I am trying to classify a Twitter data set with a linear SVM, but I get the following error. Can anyone help me, please?

"The input ExampleSet does not match the training ExampleSet. Missing Attribute: 'aaronecarroll'.
The operator expects the input ExampleSet to have a set of Attributes which is equal or a superset of the ExampleSet used for training of the input model. Please make sure that the attributes of the two examples satisfy this condition."
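The condition in this message is a set check: every attribute the model was trained on must also be present in the ExampleSet it is applied to, while extra attributes are allowed. A minimal Python sketch of that check; `missing_attributes` is a hypothetical helper for illustration, not a RapidMiner API.

```python
def missing_attributes(training_attributes, input_attributes):
    """Return the training attributes that are absent from the input ExampleSet.

    Hypothetical helper mirroring the error condition: the input attributes
    must be equal to, or a superset of, the training attributes.
    """
    return sorted(set(training_attributes) - set(input_attributes))


train = ["aaronecarroll", "good", "bad"]
test = ["good", "bad", "great"]  # an extra word is fine, a missing one is not

missing = missing_attributes(train, test)
if missing:
    print("Missing Attribute:", ", ".join(missing))
```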

MartinLiebig · November 2015

Hi mtd,

are you sure you tokenized the training and testing data sets the same way? Have you used the word list from training to ensure that the same words are used in the testing/apply phase?

Best,
Martin
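Why the tokenization has to match: every token becomes an attribute name, so two different schemes can turn the same word into two different attributes. A minimal sketch with two hypothetical tokenizers (lowercasing vs. keeping case); in RapidMiner the scheme is whatever operator chain sits inside Process Documents.

```python
import re


def tokenize(text):
    """Lowercase and split on anything that is not a letter (assumed scheme)."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def tokenize_keep_case(text):
    """A different, hypothetical scheme that keeps case."""
    return [t for t in re.split(r"[^A-Za-z]+", text) if t]


train_attributes = set(tokenize("Happy with the new phone"))

# Same scheme on the test tweet: "HAPPY" maps to the trained attribute "happy".
same = set(tokenize("so HAPPY today")) & train_attributes
# Different scheme: "HAPPY" stays a separate attribute, so "happy" is missing.
different = set(tokenize_keep_case("so HAPPY today")) & train_attributes

print(sorted(same))       # ['happy']
print(sorted(different))  # []
```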

Elisa0815 · November 2015

Hello mtd,

I'm actually doing the same thing, also with Twitter data.

I ran into the same problem when I wanted to use RapidMiner for sentiment analysis. I guess you use TF-IDF for preprocessing the data, right?
You need to connect the word list of the training set (the Process Documents operator has an output port for the word list) to the Process Documents operator that preprocesses the test set. That makes sure that the attributes used to train the classifier are the same attributes that are used when it is applied to the test set.
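The idea behind that connection: the word list (and the IDF weights) come from the training documents only, and the test documents are vectorized against that fixed list instead of building their own. A minimal Python sketch, assuming a simple tokenizer and tf = count / length, idf = log(N / df); the exact TF-IDF formula RapidMiner uses (e.g. any normalization) may differ. `fit_word_list` and `transform` are hypothetical names.

```python
import math
import re


def tokenize(text):
    # Assumed tokenization: lowercase, split on non-letters.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def fit_word_list(train_docs):
    """Build the word list and IDF weights from the training documents only."""
    doc_sets = [set(tokenize(doc)) for doc in train_docs]
    word_list = sorted(set().union(*doc_sets))
    n_docs = len(train_docs)
    idf = {w: math.log(n_docs / sum(w in s for s in doc_sets)) for w in word_list}
    return word_list, idf


def transform(docs, word_list, idf):
    """TF-IDF rows with one column per training word, in training order.

    Test words that are not in the training word list are ignored, and training
    words that do not occur in a document get 0, so every row has the same columns.
    """
    rows = []
    for doc in docs:
        tokens = tokenize(doc)
        row = []
        for w in word_list:
            tf = tokens.count(w) / len(tokens) if tokens else 0.0
            row.append(tf * idf[w])
        rows.append(row)
    return rows


train_docs = ["great phone love it", "bad battery hate it"]
test_docs = ["love the battery", "aaronecarroll is great"]

word_list, idf = fit_word_list(train_docs)
X_test = transform(test_docs, word_list, idf)  # same word list, not refit on test_docs
print(word_list)  # ['bad', 'battery', 'great', 'hate', 'it', 'love', 'phone']
```

Refitting the word list on the test tweets instead would produce different columns, which is exactly the mismatch reported in the error above.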

I also have a question of my own about this topic. I posted it in another thread as well, but maybe we can discuss it here:
The solution I mentioned works, but I don't understand WHY I need to do it.

In the end, a classifier is a mathematical function made of numbers and operators. Once it is trained, it doesn't need the attributes of the training set anymore, right? After training, its parameters (the weights found for the chosen C) are fixed, so it only needs to read the unknown x of the test set and compute the result, which is the label.
So why does processing the test set need a word list at all?

Can someone help me understand that?

MartinLiebig · November 2015

Hi,

You are right about the model. But you need to tokenize your test set first, and to do that you need to know which words (attributes) must be present in the test set so that the model can be applied. If, e.g., the word "RapidMiner" does not exist in the test set, you still need to create its column, filled with 0.

Does this help?

~Martin
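To connect the two posts: a linear SVM's decision function is f(x) = w · x + b, with exactly one weight per training attribute, so x must supply a value for every one of those attributes, even if that value is 0. A minimal sketch using term-occurrence counts instead of TF-IDF for brevity; the attributes, weights, and bias are made-up numbers for illustration, not a trained model, and `to_example` / `decision_value` are hypothetical names.

```python
def to_example(tokens, attributes):
    """Build one example with exactly the training attributes.

    Training words that are absent still get a column with 0; words never
    seen in training are dropped, because the model has no weight for them.
    """
    counts = {a: 0 for a in attributes}
    for token in tokens:
        if token in counts:
            counts[token] += 1
    return [counts[a] for a in attributes]


def decision_value(x, weights, bias):
    # Linear SVM decision function: f(x) = w . x + b; the sign gives the class.
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Hypothetical training attributes and weights, for illustration only.
attributes = ["bad", "great", "love", "rapidminer"]
weights = [-1.5, 1.2, 0.9, 0.3]
bias = -0.1

x = to_example(["love", "this", "love"], attributes)
print(x)  # [0, 0, 2, 0]: "rapidminer" is absent but still gets a 0 column
score = decision_value(x, weights, bias)
print("positive" if score > 0 else "negative")  # positive (score 1.7)
```

Without the zero columns, the vector would be shorter than the weight vector and the values would line up with the wrong weights, which is why RapidMiner refuses to apply the model instead.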

Elisa0815 · November 2015

Yes, that helps. Thank you very much!
