Sentiment Analysis using Wordnet Dictionary
RapidMiner's text mining capabilities provide several methods for sentiment analysis. One popular method for English text uses the WordNet dictionary together with the relevant operators from the RapidMiner WordNet extension. This article gives an overview of doing sentiment analysis using RapidMiner and the WordNet dictionary.
Prerequisites
You will need to download and install the "Wordnet Extension" from here
You will also need the "Text Processing" Extension from here
You will need to download the wordnet dictionary from here
Setup steps for wordnet dictionary
The WordNet dictionary download is a compressed file with the extension "gz". Use a utility such as 7-Zip to extract it. This gives you the "WordNet-3.0.tar" file, which you extract again with the same tool. You should then have a folder "WordNet-3.0" containing subfolders such as dict, doc and include.
Once you have done this, you are ready to build a text mining process in RapidMiner that uses the WordNet dictionary.
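If you prefer a script to a GUI tool, the two extraction steps can be done in one pass. The following is a minimal Python sketch, not part of RapidMiner; the function names and the archive name "WordNet-3.0.tar.gz" used in the note below are assumptions for illustration.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive in one step and return the destination folder."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:gz" handles the gzip layer and the tar layer together,
    # so no intermediate .tar file is written to disk.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest)
    return dest


def find_dict_folder(root):
    """Return the first folder named 'dict' below root, or None if there is none."""
    for candidate in sorted(Path(root).rglob("dict")):
        if candidate.is_dir():
            return candidate
    return None
```

Calling extract_archive("WordNet-3.0.tar.gz", "wordnet") and then find_dict_folder("wordnet") should give you the path of the dict folder that the dictionary operator needs later. Only extract archives you obtained from a source you trust.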
In the screenshot below, we search Twitter, change the data type of the column we want to use for text processing, and then pass the dataset (ExampleSet) to "Process Documents from Data". You can replace the Search Twitter step with any data source of your choice, such as a database or Excel files. If you would like to use files from a folder, you can use the "Process Documents from Files" operator instead, or for email, the "Process Documents from Mail Store" operator.
Then double-click the "Process Documents from Data" operator to build your text processing steps. Add your standard text processing steps, such as tokenize, transform cases, filter stop words and filter tokens, based on your specific needs. The two operators you need to get the sentiment score are "Open WordNet Dictionary" and "Extract Sentiment(English)", both of which come from the WordNet extension.
Configure the "Open WordNet Dictionary" operator by selecting "directory" in the "resource type" parameter, then configure the "directory" parameter to point to the ....\WordNet-3.0\dict folder.
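A wrong directory is an easy mistake to make here, so it can help to check the folder before running the process. The sketch below is a standalone Python check, not a RapidMiner feature. The list of file names is an assumption based on the files a standard WordNet 3.0 dict folder contains, and the function name is made up for this example.

```python
from pathlib import Path

# Assumption: a standard WordNet 3.0 "dict" folder holds an index.* and a
# data.* file for each part of speech. Adjust the list for other versions.
EXPECTED_FILES = [
    "index.noun", "data.noun",
    "index.verb", "data.verb",
    "index.adj", "data.adj",
    "index.adv", "data.adv",
]


def missing_dictionary_files(dict_dir):
    """Return the expected WordNet files that are absent from dict_dir."""
    folder = Path(dict_dir)
    if not folder.is_dir():
        # The path does not exist or is not a folder: everything is missing.
        return list(EXPECTED_FILES)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

An empty result suggests the path points at the dict folder itself. A full list usually means the path points one level too high (at WordNet-3.0) or at a folder that was never extracted.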
Please explore the additional help provided with the "Extract Sentiment(Dictionary)" operator to understand the various parameters.
You can also use the WordNet operators for synonyms, hypernyms and hyponyms to improve your process.
This process adds a new column "sentiment" that provides a numeric value for the sentiment. Negative sentiments are scored less than zero and positive sentiments are scored greater than zero.
You can use the sentiment score and the "Generate Attributes" operator to flag documents as Positive, Neutral or Negative based on the score value.
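The flagging rule itself is simple, and a few lines of Python make it concrete. This is an illustration of the logic only, not the expression syntax of "Generate Attributes". The neutral_band parameter is a hypothetical addition for treating scores close to zero as neutral; with the default of 0.0 the rule matches the description above.

```python
def label_sentiment(score, neutral_band=0.0):
    """Map a numeric sentiment score to a Positive, Neutral or Negative label.

    neutral_band is a hypothetical tolerance: scores whose absolute value
    does not exceed it are labelled Neutral.
    """
    if score > neutral_band:
        return "Positive"
    if score < -neutral_band:
        return "Negative"
    return "Neutral"


# Example scores, invented for illustration
for value in (0.4, 0.0, -0.2):
    print(value, label_sentiment(value))
```

A non-zero band is worth considering when many documents score very close to zero, since a strict comparison against zero would label almost nothing as Neutral.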
See the attached process for the complete example.
You can open the process in RapidMiner Studio using File(Menu) >> Import Process.
Comments
I am facing the following issue. If anyone can address it, that would be really great.
Hi,
I also have the same problem. Hope someone can help.
Thx!
So, in the "Open WordNet Dictionary" operator, in the "directory" option, you have to put something like: "C:\...\WordNet-3.0\dict"
Note: you cannot open the WordNet dictionary IN your loop - it tries to open it multiple times and fails. Follow the instructions from @awchisholm in http://community.rapidminer.com/t5/forums/v3_1/forumtopicpage/board-id/Studio/thread-id/15219/page/4 to resolve that issue.
hi...following up on this KB. Can someone explain what "hyponyms" and "hypernyms" are with examples? I'm having a hard time getting my head around them. @bhupendra_patil? @Thomas_Ott ?
So a hypernym groups a word into its higher-level taxonomy. Like knife is part of cutlery, so cutlery is the hypernym and knife is the hyponym. The same goes for spoon, it's part of cutlery. Here's a great example: https://en.wikipedia.org/wiki/Hyponymy_and_hypernymy
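If it helps to see the relation as data, here is a toy Python sketch. The taxonomy is hand-written for illustration (the "tableware" level is my own addition) and is not read from WordNet, and the function names are made up.

```python
# Hand-written toy taxonomy: each word maps to its direct hypernym.
HYPERNYM_OF = {
    "knife": "cutlery",
    "spoon": "cutlery",
    "cutlery": "tableware",
}


def hypernyms(word):
    """Walk upward: return the chain of more general terms for word."""
    chain = []
    while word in HYPERNYM_OF:
        word = HYPERNYM_OF[word]
        chain.append(word)
    return chain


def hyponyms(word):
    """Look downward: return the direct, more specific terms for word."""
    return sorted(child for child, parent in HYPERNYM_OF.items() if parent == word)


print(hypernyms("knife"))   # ['cutlery', 'tableware']
print(hyponyms("cutlery"))  # ['knife', 'spoon']
```

So hypernyms go up the tree toward the general category, and hyponyms go down toward the specific members.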
ah that's perfect - exactly what I'm looking for. Thanks, @Thomas_Ott. Have you used those operators from the Wordnet extension? I'm trying to experiment and I can load the dictionary but cannot get any kind of result. Like this:
@sgenzer yes, I've used this extension quite a bit, but since I've moved machines I haven't had a chance to reinstall the WordNet libraries. This extension is quite nice, it gives users access to some powerful sentiment capabilities, but it's often overlooked and underused IMHO.
Does anyone know how to solve the problem?
Hey,
Is there any way to remove certain words or terms from text contained within an Excel file, then save a new version of the file with the same layout but with these words removed?
I am in the process of analysing the text content of Tweets for language analysis and I want to remove external links (https) and tags (@...) before I run it through a different software.
I have used Data to Documents, Tokenize and Delete Document Parts to find specific word frequencies and remove the above, but I was wondering if I could then generate a new Excel file with these words removed.
Thanks,
Ethan.
Hi,
I have a question concerning Extract Sentiment (WordNet). In the result window, I only found one sentiment score for the whole document. Can I get a score for each row instead of just one score for the whole document?
Thank you.