Operator for removing emoticons from Twitter data
Arupriya_Sen
Member Posts: 21 Contributor II
Which operator should I use to remove emoticons or emoji from Twitter data before conducting sentiment analysis? I use RapidMiner version 9.3.0.
Best Answers
-
varunm1 Member Posts: 1,207 Unicorn
Hello @Arupriya_Sen
I am not sure, but I think tokenization removes these emoticons, since they are made up of symbols and punctuation. Give it a try.
@kayman or @sgenzer, any suggestions here?
Thanks
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
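To illustrate the tokenization idea above outside RapidMiner, here is a minimal Python sketch. It assumes a tokenizer that splits on every run of non-letter characters, which is only an approximation of what a "non letters" tokenization mode does. The function name `tokenize_non_letters` and the sample tweet are made up for the example, and only ASCII letters are treated as letters.

```python
import re

def tokenize_non_letters(text):
    """Split on runs of non-letter characters and keep only the letter runs.

    Assumption: only ASCII letters count as letters in this sketch.
    """
    return [tok for tok in re.split(r"[^A-Za-z]+", text) if tok]

# Hypothetical tweet with a text emoticon and an emoji (U+1F602).
tweet = "Great game tonight :-) \U0001F602 #LakeShow"
print(tokenize_non_letters(tweet))

# Emoticons that contain letters leave a stray token behind.
print(tokenize_non_letters("so funny :D"))
```

Punctuation-only emoticons and emoji disappear because they contain no letters, but an emoticon such as `:D` leaves the token `D`, so it is worth checking the resulting word list.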
-
sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
I would URL-encode the text as UTF-8 (use the Encode URLs operator) like this:
Tweet:
Before encoding:
After encoding:
Look up the emoji in a UTF-8 to Unicode chart (such as https://apps.timwhitlock.info/emoji/tables/unicode):
So the 'face with tears of joy' emoji is %F0%9F%98%82, which matches what you see in the encoded text:
and so on. So then it's just a matter of using Replace to match %F0%9F%98%xx in the encoded text and replace it with nothing, then decoding back. Something like this:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.3.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="-1"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve TwitterSG" width="90" x="45" y="85">
        <parameter key="repository_entry" value=""/>
      </operator>
      <operator activated="true" breakpoints="after" class="social_media:search_twitter" compatibility="9.3.000" expanded="true" height="82" name="Search Twitter" width="90" x="179" y="85">
        <parameter key="connection_source" value="predefined"/>
        <parameter key="query" value="LakeShowYo watch spin"/>
        <parameter key="result_type" value="recent or popular"/>
        <parameter key="limit" value="10"/>
        <parameter key="filter_by_geo_location" value="false"/>
        <parameter key="radius_unit" value="miles"/>
      </operator>
      <operator activated="true" breakpoints="after" class="filter_example_range" compatibility="9.3.001" expanded="true" height="82" name="Filter Example Range" width="90" x="313" y="85">
        <parameter key="first_example" value="1"/>
        <parameter key="last_example" value="1"/>
        <parameter key="invert_filter" value="false"/>
      </operator>
      <operator activated="true" class="web:encode_urls" compatibility="9.0.000" expanded="true" height="82" name="Encode URLs" width="90" x="447" y="85">
        <parameter key="url_attribute" value="Text"/>
        <parameter key="encoding" value="SYSTEM"/>
      </operator>
      <operator activated="true" class="replace" compatibility="9.3.001" expanded="true" height="82" name="Replace" width="90" x="581" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="nominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="file_path"/>
        <parameter key="block_type" value="single_value"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="single_value"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="replace_what" value="[%]F0.*\w[+]"/>
        <parameter key="replace_by" value=""/>
      </operator>
      <operator activated="true" class="web:decode_urls" compatibility="9.0.000" expanded="true" height="82" name="Decode URLs" width="90" x="715" y="85">
        <parameter key="url_attribute" value="Text"/>
        <parameter key="encoding" value="UTF-8"/>
      </operator>
      <connect from_op="Retrieve TwitterSG" from_port="output" to_op="Search Twitter" to_port="connection"/>
      <connect from_op="Search Twitter" from_port="output" to_op="Filter Example Range" to_port="example set input"/>
      <connect from_op="Filter Example Range" from_port="example set output" to_op="Encode URLs" to_port="example set input"/>
      <connect from_op="Encode URLs" from_port="example set output" to_op="Replace" to_port="example set input"/>
      <connect from_op="Replace" from_port="example set output" to_op="Decode URLs" to_port="example set input"/>
      <connect from_op="Decode URLs" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Scott
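For anyone who wants to try the encode, replace, decode idea from the answer above outside RapidMiner, here is a rough Python sketch of the same three steps. It is not the RapidMiner process itself: the function name `strip_emoticons`, the pattern name `EMOJI_BLOCK`, and the sample text are invented for illustration. The pattern only covers the %F0%9F%98%xx range mentioned above (code points U+1F600 to U+1F63F), so other emoji would need additional patterns.

```python
import re
from urllib.parse import quote_plus, unquote_plus

# Matches one percent-encoded 4-byte UTF-8 sequence of the form %F0%9F%98%xx.
# Assumption: only this range (U+1F600..U+1F63F) should be removed.
EMOJI_BLOCK = re.compile(r"%F0%9F%98%[0-9A-F]{2}")

def strip_emoticons(text):
    """Percent-encode as UTF-8, drop %F0%9F%98%xx sequences, decode back."""
    encoded = quote_plus(text, encoding="utf-8")    # spaces become '+'
    cleaned = EMOJI_BLOCK.sub("", encoded)          # remove the emoji bytes
    return unquote_plus(cleaned, encoding="utf-8")  # '+' becomes a space again

# 'Face with tears of joy' (U+1F602) encodes to the value from the chart.
print(quote_plus("\U0001F602"))

# Hypothetical sample text with two emoji in a row.
print(repr(strip_emoticons("watch this spin \U0001F602\U0001F602 wow")))
```

This pattern is deliberately narrower than the `replace_what` value in the process above. The `.*` in `[%]F0.*\w[+]` is greedy, so it can also swallow ordinary words that sit between an emoji and a later space; it is worth checking the result on a few of your own tweets.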