SVM Extract keywords used for sentiment

asav_yu · November 2018

Good afternoon,

Hopefully somebody can help. I am playing around with sentiment analysis using SVM, and the results are very promising. My question is: how can I easily extract a list of words from the document I score, to see exactly why the sentiment is negative or positive?

For example, when I score a 100-word paragraph, I want to see all the keywords that the SVM identified as important. It would be great to have counts as well, for example "bad" 4 times, "poor" 3 times.

Any advice is very much appreciated.

B00100719 · November 2018

Assuming you have used the 'Tokenize', 'Filter Stopwords', 'Transform Cases' and perhaps 'Filter by Length' operators (experiment with and without 'Stem' too, though it is probably not useful for such a small document), be sure to check "Create Word Vector" on the 'Process Documents from Data' operator, which contains all of these operators. Also set the lower and upper pruning on that operator; this also requires experimenting with different values. The interesting words will likely be those that appear a medium number of times.
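Outside RapidMiner, the same chain of steps can be sketched in a few lines of plain Python. This is a minimal sketch, not what the 'Process Documents from Data' operator does internally: the short stop-word list, the minimum token length of 3 and the document-frequency pruning thresholds (`prune_below`, `prune_above`) are illustrative assumptions, stemming is left out, and plain term counts are used as vector values (RapidMiner also offers TF-IDF and other schemes).

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption); real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "it", "very", "but", "of", "to"}

def tokenize(text):
    # 'Tokenize' on non-letters, combined with 'Transform Cases' to lower case.
    return re.findall(r"[a-z]+", text.lower())

def preprocess(text, min_len=3):
    # 'Filter Stopwords' plus a length filter (min_len is an assumed value).
    return [t for t in tokenize(text) if t not in STOPWORDS and len(t) >= min_len]

def word_vectors(docs, prune_below=2, prune_above=None):
    """Build term-count word vectors, pruning terms by document frequency.

    prune_below: drop terms that occur in fewer than this many documents.
    prune_above: drop terms that occur in more than this many documents.
    """
    token_lists = [preprocess(d) for d in docs]
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vocab = sorted(
        term for term, df in doc_freq.items()
        if df >= prune_below and (prune_above is None or df <= prune_above)
    )
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({term: counts[term] for term in vocab})
    return vocab, vectors

docs = [
    "The food was bad and the seat was bad.",
    "Poor service, bad food.",
    "Great crew and great service.",
]
vocab, vectors = word_vectors(docs, prune_below=2)
print(vocab)       # ['bad', 'food', 'service']
print(vectors[0])  # {'bad': 2, 'food': 1, 'service': 0}
```

Raising `prune_below` or setting `prune_above` shows the effect described above: very rare and very common words drop out, leaving the mid-frequency terms.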

Telcontar120 · November 2018

You can also build your SVM model on the tokenized words and then apply the Explain Predictions operator, which helps identify the terms most strongly associated with the label prediction for different groups of examples.
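The internal method of the Explain Predictions operator is not reproduced here. As a simpler illustration of the same idea for a linear model: each word's contribution to one document's score is its count in that document times the model's weight for that word, which also gives the "bad" 4 times, "poor" 3 times style of breakdown asked about above. The weights below are hypothetical placeholders, not output from any trained model.

```python
from collections import Counter

# Hypothetical linear-model weights (placeholders only): positive values push
# towards "positive", negative values towards "negative".
WEIGHTS = {"bad": -1.2, "poor": -0.9, "great": 1.1, "service": 0.1}
BIAS = 0.0

def explain(tokens, weights, bias=0.0, top=5):
    """Return (score, rows); each row is (word, count, contribution)."""
    counts = Counter(t for t in tokens if t in weights)
    rows = [(w, c, c * weights[w]) for w, c in counts.items()]
    # Largest absolute contribution first: these words drove the decision.
    rows.sort(key=lambda r: abs(r[2]), reverse=True)
    score = bias + sum(r[2] for r in rows)
    return score, rows[:top]

tokens = ["bad"] * 4 + ["poor"] * 3 + ["service", "great"]
score, rows = explain(tokens, WEIGHTS, BIAS)
print("negative" if score < 0 else "positive", round(score, 2))
for word, count, contrib in rows:
    print(f"{word!r}: {count} times, contribution {contrib:+.2f}")
```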

MartinLiebig · November 2018

Hi,
in a linear SVM you can also use the attribute weights it delivers as a measure of each word's importance for the overall decision.
Best,
Martin
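For a linear SVM the decision value is a weighted sum of the word-vector values, so sorting the learned weights ranks words by how strongly, and in which direction, they push the prediction. The sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method (a solver swapped in for illustration, not RapidMiner's SVM implementation) on four made-up reviews, with no bias term for brevity.

```python
import random
from collections import Counter

def train_linear_svm(examples, lam=0.01, iterations=2000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    examples: list of (feature_dict, label) with label in {+1, -1}.
    Returns a dict mapping each word to its weight (no bias term, for brevity).
    """
    rng = random.Random(seed)
    w = {}
    for t in range(1, iterations + 1):
        x, y = rng.choice(examples)
        eta = 1.0 / (lam * t)
        # Margin is computed with the current weights, before the update.
        margin = y * sum(w.get(f, 0.0) * v for f, v in x.items())
        # Shrink every weight (gradient of the L2 regulariser).
        for f in w:
            w[f] *= 1.0 - eta * lam
        # Hinge-loss sub-gradient step only when the margin is violated.
        if margin < 1:
            for f, v in x.items():
                w[f] = w.get(f, 0.0) + eta * y * v
    return w

# Made-up, already pre-processed reviews (illustration only).
reviews = [
    ("great service great crew", +1),
    ("great food", +1),
    ("bad food poor service", -1),
    ("bad seat bad crew", -1),
]
examples = [(Counter(text.split()), label) for text, label in reviews]
weights = train_linear_svm(examples)

ranked = sorted(weights.items(), key=lambda kv: kv[1])
print("most negative:", ranked[:3])
print("most positive:", ranked[-3:][::-1])
```

Weights are only comparable across words when the word-vector values are on similar scales, which is one reason the normalisation chosen during text processing matters.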

HeikoeWin786 · July 2020

Hello there,

Can I use the word list generated by the Process Documents from Data operator as input to the SVM operator?
My dataset has a binominal label, and the review text is polynominal. I am not sure which column I need to convert to numerical to work with SVM: the sentiment label column or the customer review text column?

Thanks very much in advance.

HeikoeWin786 · July 2020

@Telcontar120
Hi,

Could you please explain the approach you mentioned?
I would like to use SVM to extract the aspects that are associated with the label (e.g. aspect = 'service', label = positive) for each example in the dataset.
I am having an issue using my dataset as the training set: the SVM operator says it cannot accept polynominal data. My dataset has 3 columns: airlines, customer review and sentiment. How can I transform this dataset to work with SVM? Do I need to convert nominal to numeric for all 3 columns? For pre-processing, I am only processing the customer review column, converting it with Nominal to Text.
Could you please advise what I am missing here?

Thanks.

Telcontar120 · July 2020

To really understand what you need to do, I think you should work through some of the text mining tutorials in the RapidMiner Academy. Basically, it sounds like you want to process the text of your reviews into word vectors and then use them to predict the sentiment, which you will set as your label. The text processing is what makes the review text numerical, through the word vector representation.
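To illustrate which column becomes numerical: only the review text is turned into word-vector columns, while the sentiment stays the label. The sketch below uses two hypothetical rows shaped like the airline dataset described above, simply ignores the airline column, and maps the label to +1/-1 only because that is how a hand-written SVM would consume it; the function name `to_svm_table` is hypothetical.

```python
import re
from collections import Counter

# Hypothetical rows shaped like the dataset described above.
rows = [
    {"airline": "AirA", "review": "Bad seat, bad food", "sentiment": "negative"},
    {"airline": "AirB", "review": "Great crew and good food", "sentiment": "positive"},
]

def to_svm_table(rows, label_col="sentiment", text_col="review", positive="positive"):
    """Turn the nominal review text into numeric word-count columns.

    The airline column is ignored here; the label is kept separate from X.
    """
    token_counts = [Counter(re.findall(r"[a-z]+", r[text_col].lower())) for r in rows]
    vocab = sorted(set().union(*token_counts))
    # One numeric column per word: this is the word vector representation.
    X = [[counts[term] for term in vocab] for counts in token_counts]
    # The label stays a class; it is only mapped to +1/-1, not used as a feature.
    y = [1 if r[label_col] == positive else -1 for r in rows]
    return vocab, X, y

vocab, X, y = to_svm_table(rows)
print(vocab)
print(X)
print(y)
```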

Altair RapidMiner Community