Sentiment analysis process
Dear all,
There are gems in our community, and it is an excellent idea to share them!
I'll take the liberty of inaugurating this new "ideas section" with a fictional, experimental sentiment analysis process.
Let me explain this process and the associated results:
1. Objective:
The goal of this study is to perform a sentiment analysis of 40 tweets containing the keyword "iPhone".
2. Tools:
For this study, we use RapidMiner 9.0 (Beta) and its "AYLIEN Text Analysis" extension, version 0.20.
3. RapidMiner process:
The RapidMiner process performs the following steps:
- Retrieval of tweets containing the keyword "iPhone" using the Search Twitter operator.
- Analysis of the sentiment of each tweet using the Analyze Sentiment operator of the AYLIEN extension. This operator determines the sentiment polarity of each tweet from its content (negative, neutral, positive).
- Creation of a word matrix using the Process Documents from Data operator. In this step, tweets are tokenized, stop words are filtered out, and the remaining words are filtered by length (we only retain words of at least 4 characters, which is why 4-letter words such as "word" and "auto" appear in the results).
- Aggregation of the word counts by sentiment polarity using the Aggregate operator inside a Loop Attributes subprocess.
- Sorting and filtering of the top 5 words for each sentiment polarity value (negative, neutral, positive) using the Sort and Filter Example Range operators.
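For readers who prefer to see the text-preparation steps as code, here is a minimal Python sketch of what the Replace, Tokenize, Filter Stopwords and Filter Tokens (by Length) steps do conceptually. It is not the RapidMiner implementation: the stop-word list is a tiny hypothetical one, the example tweet is invented, and the length bounds (`min_chars=4`, `max_chars=25`) are assumptions about the operator's defaults, since the process does not set them explicitly.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (RapidMiner ships its own, longer one).
STOPWORDS = {"the", "and", "with", "this", "that", "have", "from", "does", "not", "my", "is", "on"}


def tokenize(text, min_chars=4, max_chars=25):
    """Strip '@' and '#', split on non-letters, drop stop words and tokens outside the length range."""
    cleaned = re.sub(r"[@#]", "", text)        # same pattern as the Replace operator ("@|#")
    tokens = re.split(r"[^A-Za-z]+", cleaned)  # split on runs of non-letter characters
    return [
        t for t in tokens
        if t and t.lower() not in STOPWORDS and min_chars <= len(t) <= max_chars
    ]


def term_occurrences(text, **kwargs):
    """Count how often each retained token occurs in one tweet."""
    return Counter(tokenize(text, **kwargs))


# Invented example tweet, for illustration only.
example = "My #iPhone auto correct never corrects this word correctly"
print(tokenize(example))
print(term_occurrences(example))
```

Raising `min_chars` to 5 in this sketch drops "auto" and "word", which shows how much the length threshold influences the final word lists.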
4. Results
4.1 Sentiment analysis
Of the 40 tweets:
- 4 tweets are classified as "negative".
- 28 tweets are classified as "neutral".
- 8 tweets are classified as "positive".
Adding the neutral and positive tweets together (28 + 8 = 36 of 40), we can consider that 90% of the tweets express an "overall positive" opinion of the iPhone.
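The arithmetic behind these figures can be checked with a few lines of Python. The counts are the ones reported above; the function name is only illustrative. Note that the 90% figure comes from grouping neutral tweets with positive ones, which is the reading chosen in this post.

```python
def polarity_shares(counts):
    """Return the percentage of tweets for each polarity label."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


counts = {"negative": 4, "neutral": 28, "positive": 8}
shares = polarity_shares(counts)
non_negative = shares["neutral"] + shares["positive"]

print(shares)        # {'negative': 10.0, 'neutral': 70.0, 'positive': 20.0}
print(non_negative)  # 90.0
```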
4.2 Top words:
We sorted and filtered the top 5 words for each sentiment polarity value:
We can see that the words "Apple" and "iPhone" are among the top 5 words of both the neutral and the positive tweets.
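The aggregate, sort and filter logic behind these top-5 lists can be sketched in Python as follows. This is an illustration under assumptions, not a translation of the RapidMiner operators: the input format (one `(polarity, word counts)` pair per tweet) and the function name are hypothetical, and the sample rows are invented.

```python
from collections import Counter


def top_words_by_polarity(rows, n=5):
    """rows: iterable of (polarity, {word: occurrences}) pairs, one pair per tweet."""
    totals = {}
    for polarity, word_counts in rows:
        # Sum the occurrences of each word within its polarity group (the Aggregate step).
        totals.setdefault(polarity, Counter()).update(word_counts)
    # Sort by decreasing count and keep the first n rows (the Sort and Filter Example Range steps).
    return {
        polarity: sorted(counter.items(), key=lambda kv: kv[1], reverse=True)[:n]
        for polarity, counter in totals.items()
    }


# Invented rows, for illustration only.
rows = [
    ("negative", {"auto": 1, "word": 2}),
    ("negative", {"auto": 2}),
    ("positive", {"Apple": 1}),
]
print(top_words_by_polarity(rows))
```

With only 40 tweets, many words tie on low counts, so the order among tied words depends on how the sort breaks ties.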
5. Conclusions
After performing the sentiment analysis and studying the top 5 words for each sentiment, we can conclude that Twitter users have an "overall positive" sentiment towards the Apple brand and its flagship product, the iPhone.
However, the most frequent words in the negative tweets include "word", "auto", "correctly" and "corrects". We can interpret this as a sign that iPhone users are not entirely satisfied with the auto-correction tool of Apple's smartphone.
We therefore recommend that Apple work on improving this tool and update the iPhone's firmware.
Beyond this "best of the best" challenge, I hope that this process will inspire or help RapidMiner users who have similar problems or tasks.
Best regards,
Lionel
NB: The process:
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.000-BETA" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="social_media:search_twitter" compatibility="9.0.000-BETA" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="136">
<parameter key="connection" value="dkk"/>
<parameter key="query" value="iphone"/>
<parameter key="limit" value="40"/>
<parameter key="language" value="en"/>
</operator>
<operator activated="true" class="replace" compatibility="9.0.000-BETA" expanded="true" height="82" name="Replace" width="90" x="179" y="136">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Text"/>
<parameter key="replace_what" value="@|#"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.0.000-BETA" expanded="true" height="82" name="Select Attributes" width="90" x="313" y="136">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Text"/>
</operator>
<operator activated="true" class="com.aylien.textapi.rapidminer:aylien_sentiment" compatibility="0.2.000" expanded="true" height="68" name="Analyze Sentiment" width="90" x="447" y="136">
<parameter key="connection" value="Aylien_dkk"/>
<parameter key="input_attribute" value="Text"/>
</operator>
<operator activated="true" class="multiply" compatibility="9.0.000-BETA" expanded="true" height="103" name="Multiply (2)" width="90" x="581" y="136"/>
<operator activated="true" class="nominal_to_text" compatibility="9.0.000-BETA" expanded="true" height="82" name="Nominal to Text" width="90" x="715" y="136">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Text"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="916" y="136">
<parameter key="vector_creation" value="Term Occurrences"/>
<parameter key="keep_text" value="true"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="313" y="34"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="8.1.000" expanded="true" height="68" name="Filter Stopwords (English)" width="90" x="447" y="34"/>
<operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="581" y="34"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:data_to_documents" compatibility="8.1.000" expanded="true" height="68" name="Data to Documents" width="90" x="1050" y="136">
<list key="specify_weights"/>
</operator>
<operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="82" name="Documents to Data" width="90" x="1184" y="136">
<parameter key="text_attribute" value="Category"/>
</operator>
<operator activated="true" class="concurrency:loop_attributes" compatibility="8.2.000" expanded="true" height="103" name="Loop Attributes" width="90" x="1318" y="136">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value="AMOLED"/>
<parameter key="attributes" value="Category|polarity_confidence|subjectivity|subjectivity_confidence|polarity|Id|text"/>
<parameter key="invert_selection" value="true"/>
<process expanded="true">
<operator activated="true" class="handle_exception" compatibility="9.0.000-BETA" expanded="true" height="82" name="Handle Exception" width="90" x="313" y="34">
<process expanded="true">
<operator activated="true" class="aggregate" compatibility="8.2.000" expanded="true" height="82" name="Aggregate" width="90" x="112" y="34">
<list key="aggregation_attributes">
<parameter key="%{loop_attribute}" value="sum"/>
</list>
<parameter key="group_by_attributes" value="polarity"/>
</operator>
<connect from_port="in 1" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<process expanded="true">
<connect from_port="in 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="transpose" compatibility="9.0.000-BETA" expanded="true" height="82" name="Transpose" width="90" x="447" y="34"/>
<operator activated="true" class="rename_by_example_values" compatibility="9.0.000-BETA" expanded="true" height="82" name="Rename by Example Values" width="90" x="581" y="34"/>
<connect from_port="input 1" to_op="Handle Exception" to_port="in 1"/>
<connect from_op="Handle Exception" from_port="out 1" to_op="Transpose" to_port="example set input"/>
<connect from_op="Transpose" from_port="example set output" to_op="Rename by Example Values" to_port="example set input"/>
<connect from_op="Rename by Example Values" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="source_input 3" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
<portSpacing port="sink_output 3" spacing="0"/>
</process>
</operator>
<operator activated="true" class="append" compatibility="9.0.000-BETA" expanded="true" height="82" name="Append" width="90" x="1452" y="136"/>
<operator activated="true" class="parse_numbers" compatibility="9.0.000-BETA" expanded="true" height="82" name="Parse Numbers" width="90" x="1586" y="136">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="positive|neutral|negative"/>
</operator>
<operator activated="true" class="multiply" compatibility="9.0.000-BETA" expanded="true" height="124" name="Multiply" width="90" x="1720" y="136"/>
<operator activated="true" class="sort" compatibility="9.0.000-BETA" expanded="true" height="82" name="Sort (3)" width="90" x="1854" y="238">
<parameter key="attribute_name" value="negative"/>
<parameter key="sorting_direction" value="decreasing"/>
</operator>
<operator activated="true" class="sort" compatibility="9.0.000-BETA" expanded="true" height="82" name="Sort (2)" width="90" x="1854" y="136">
<parameter key="attribute_name" value="neutral"/>
<parameter key="sorting_direction" value="decreasing"/>
</operator>
<operator activated="true" class="sort" compatibility="9.0.000-BETA" expanded="true" height="82" name="Sort" width="90" x="1854" y="34">
<parameter key="attribute_name" value="positive"/>
<parameter key="sorting_direction" value="decreasing"/>
</operator>
<operator activated="true" class="filter_example_range" compatibility="9.0.000-BETA" expanded="true" height="82" name="Negative" width="90" x="1988" y="238">
<parameter key="first_example" value="1"/>
<parameter key="last_example" value="5"/>
</operator>
<operator activated="true" class="filter_example_range" compatibility="9.0.000-BETA" expanded="true" height="82" name="Neutral" width="90" x="1988" y="136">
<parameter key="first_example" value="1"/>
<parameter key="last_example" value="5"/>
</operator>
<operator activated="true" class="filter_example_range" compatibility="9.0.000-BETA" expanded="true" height="82" name="Positive" width="90" x="1988" y="34">
<parameter key="first_example" value="1"/>
<parameter key="last_example" value="5"/>
</operator>
<connect from_op="Search Twitter" from_port="output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Analyze Sentiment" to_port="Example Set"/>
<connect from_op="Analyze Sentiment" from_port="Example Set" to_op="Multiply (2)" to_port="input"/>
<connect from_op="Multiply (2)" from_port="output 1" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Multiply (2)" from_port="output 2" to_op="Loop Attributes" to_port="input 2"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Data to Documents" to_port="example set"/>
<connect from_op="Data to Documents" from_port="documents" to_op="Documents to Data" to_port="documents 1"/>
<connect from_op="Documents to Data" from_port="example set" to_op="Loop Attributes" to_port="input 1"/>
<connect from_op="Loop Attributes" from_port="output 1" to_op="Append" to_port="example set 1"/>
<connect from_op="Loop Attributes" from_port="output 2" to_port="result 4"/>
<connect from_op="Append" from_port="merged set" to_op="Parse Numbers" to_port="example set input"/>
<connect from_op="Parse Numbers" from_port="example set output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Sort" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 2" to_op="Sort (2)" to_port="example set input"/>
<connect from_op="Multiply" from_port="output 3" to_op="Sort (3)" to_port="example set input"/>
<connect from_op="Sort (3)" from_port="example set output" to_op="Negative" to_port="example set input"/>
<connect from_op="Sort (2)" from_port="example set output" to_op="Neutral" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Positive" to_port="example set input"/>
<connect from_op="Negative" from_port="example set output" to_port="result 3"/>
<connect from_op="Neutral" from_port="example set output" to_port="result 2"/>
<connect from_op="Positive" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
</process>
Comments
great addition! Thanks @lionelderkrikor!
Thanks so much for sharing the explanation and this example with the community. It's really helpful. I'm pretty new to RM and tend to learn best by tinkering and tearing apart. That said, I was playing around with the file you created in the community examples section in RM. Aside from changing the two suggested parameters, "Aylien and twitter" the template seems to be kicking out an 'input is missing' error. I was wondering if you might be able to provide some insight with how to go about fixing this.
Thanks so much!
-js
I'm not able to reproduce the error you encountered.
Can you please share your process?
Regards,
Lionel
OK, I understand now: the cause is an inversion of the links between Subprocess (2) and Subprocess (3).
Here is the updated process:
Regards,
Lionel
NB: @sgenzer, can you update the process in the community directory (Community Samples / Community Data Science / Sentiment Analysis)? Thank you.
Thanks for this! I greatly appreciate the quick response and fix.
-js