
Sentiment Analysis with SVM

mtd Member Posts: 2 Contributor I
edited November 2018 in Help
Hello all, I am a newbie in RapidMiner. I am trying to classify a Twitter data set with a linear SVM, but I got the following error. Can anyone help me, please?

"The input Exampleset does not match the training ExampleSet. Missing Attribute: "aaronecarroll".
The operator expects the input Exampleset to have a set of Attributes which is equal or a superset of the Exampleset used for training of the input model. Please make sure that the attributes of the two examples satisfy this condition."

Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi mtd,

    Are you sure you tokenized the training and testing data sets the same way? Have you used the word vector to ensure that the same words are used in the testing/apply phase?

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Elisa0815 Member Posts: 10 Contributor II
    Hello mtd,

    I'm actually doing the same, also with twitter data :)

    I had the same problem when I wanted to use RapidMiner for sentiment analysis. I guess you use TF-IDF to preprocess the data, right?
    You need to connect the word list of the training set (there's an output port labeled "words" on the Process Documents operator) to the Process Documents operator that preprocesses the test set. That ensures that the attributes used to train the classifier are the same attributes that are present when the model is applied to the test set.


    I also have a question of my own about this topic. I posted it in another thread as well, but maybe we can discuss it here too:
    The solution I mentioned works, but my problem now is that I don't understand WHY I need to do that.

    A classifier is, in the end, a mathematical function consisting of numbers and operators. After it has been trained, it doesn't need any attributes of the training set anymore, right? After training, the parameters, like C, are set, so it only needs to read the unknown X of the test set and compute the result, which is the label.
    So why does it need the words of the testset?

    Can someone help me understand that?
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,

    You are right about the model. But you need to tokenize your test set first. To do this, you need to know which words (attributes) must be present in the test set so that the model can be applied. If, for example, the word RapidMiner does not exist in the test set, you still need to create the column for it and fill it with 0.

    Does this help?

    ~Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Elisa0815 Member Posts: 10 Contributor II
    Yes, that helps. Thank you very much :)
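
For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of what Martin describes. It is not RapidMiner code and not the operator's actual implementation: the helper names (`tokenize`, `build_word_list`, `vectorize`), the sample texts, and the weights are all made up for illustration, and plain word counts stand in for TF-IDF to keep it short. The word list is built from the training texts only, and the test text is then vectorized against that same list, so a training word that is absent from the test text still gets a column holding 0.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()

def build_word_list(documents):
    """Collect the sorted vocabulary seen in the training documents."""
    return sorted({token for doc in documents for token in tokenize(doc)})

def vectorize(text, word_list):
    """One count per word in word_list, in order.

    Words in word_list that are missing from the text get 0;
    words in the text that are not in word_list are ignored.
    """
    counts = Counter(tokenize(text))
    return [counts[word] for word in word_list]

# Hypothetical training texts; the word list comes from these only.
train_docs = ["rapidminer is great", "bad service today"]
word_list = build_word_list(train_docs)

# Hypothetical test text: it lacks "rapidminer" and adds unseen words.
test_doc = "great service from new staff"
test_vector = vectorize(test_doc, word_list)

# A linear model has one weight per training-time column, so the test
# vector must have exactly those columns. These weights are invented
# for illustration, not the result of any training.
made_up_weights = {"great": 1.0, "bad": -1.0}
weight_vector = [made_up_weights.get(word, 0.0) for word in word_list]
score = sum(w * x for w, x in zip(weight_vector, test_vector))

print(word_list)
print(test_vector)
print(score)
```

If the test text were vectorized with its own word list instead, its columns would not line up with the weights, which is the situation the "Missing Attribute" error in the original question reports.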