The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Sentiment Analysis with SVM
Hello All, I am newbie in Rapid Miner. I am trying to classify twitter data set with Linear SVM. But I got the following errors. Anyone can help me,please.
"The input Exampleset does not match the training ExampleSet. Missing Attribute: "aaronecarroll".
The operator expects the input Exampleset to have a set of Attributes which is equal or a superset of the Exampleset used for training of the input model. Please make sure that the attributes of the two examples satisfy this condition."
"The input Exampleset does not match the training ExampleSet. Missing Attribute: "aaronecarroll".
The operator expects the input Exampleset to have a set of Attributes which is equal or a superset of the Exampleset used for training of the input model. Please make sure that the attributes of the two examples satisfy this condition."
0
Answers
are you sure you tokenized training and testing data set the same way? Have you used the word vector to assure the same words in the testing/apply phase
Best,
Martin
Dortmund, Germany
I'm actually doing the same, also with twitter data
I've got the same problem when I wanted to use RapidMiner for a sentiment analysis. I guess that you use TF-IDF for preprocessing the data, right?
You need to connect the words of the testset (there's an output-point at the process-document-operator with label "words") to the operator, which preprocess the trainingset. That makes sure that the attributes that are used to train the classifier are the same attributes that are used to apply to testset.
I furthermore have a question myself about this topic. I also posted this question in another theme but maybe we can also discuss my problem here:
This solution that I mentioned works but my problem now is that I don't understand WHY I need to do that.
A classifier is in the end a mathematical function, containing of numbers and operators. After be trained, it doesn't need any attributes of the trainingset anymore, right? After training, the parameters, like C, are set, so it only needs to read the unknown X of the testset and compute the result, which is the label.
So why does it need the words of the testset?
Can someone may help me to understand that?
You are right for the model. But you need to tokenize your test set first. In order to do this you need to know which words need to be present in the test set so the model can be applied. If e.g. the word RapidMiner does not exsist in the test set, you still need to create the col. with 0.
Does this help?
~Martin
Dortmund, Germany