Extracting Emoji from Tweets on Twitter
Hello everyone,
I need help: is it possible to extract just the emoji from tweets that I collected from popular hashtags on Twitter? If so, I would appreciate some tips.
Thanks
Answers
Cross-posting everywhere will not get you an answer any sooner.
I will delete the other topics.
You would need to set your encoding to the appropriate type under Preferences. For example, UTF-8 will extract a lot of emoticon short codes, e.g. ": )" for a smiley face.
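A minimal plain-Python sketch (not RapidMiner itself) of why the encoding setting matters: if text passes through a codec that cannot represent emoji, each emoji code point gets replaced, commonly by "?". Latin-1 is an assumption here, picked only as an example of such a codec, and `roundtrip` is a hypothetical helper name.

```python
def roundtrip(text: str, codec: str) -> str:
    """Encode text with `codec`, replacing characters it cannot represent, then decode."""
    return text.encode(codec, errors="replace").decode(codec)

# Example tweet: red heart (U+2764 + variation selector U+FE0F), grinning face, and ":)"
tweet = "Great game \u2764\ufe0f \U0001F600 :)"

print(roundtrip(tweet, "utf-8"))    # UTF-8 can represent every emoji, so nothing is lost
print(roundtrip(tweet, "latin-1"))  # Latin-1 cannot, so each emoji code point becomes "?"
```

Note that the plain-ASCII emoticon ":)" survives either way; only the real emoji depend on the encoding.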
But I don't need a specific code. I am trying to study how emoji are used in tweets, so I expect all kinds of emoji. Does that mean I have to add the Unicode code point of every emoji?
Thanks
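Rather than listing every emoji individually, one option is to match whole Unicode blocks where most emoji live. The sketch below uses Python's standard `re` module with a hypothetical, deliberately incomplete set of ranges; it will miss some emoji and split multi-code-point ones (skin tones, ZWJ sequences, flags). For fuller coverage, the third-party `regex` module's `\X` (extended grapheme cluster) pattern or a maintained emoji library may serve better.

```python
import re

# Hypothetical, incomplete selection of emoji-heavy Unicode blocks.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001F5FF"  # Miscellaneous Symbols and Pictographs
    "\U0001F600-\U0001F64F"  # Emoticons
    "\U0001F680-\U0001F6FF"  # Transport and Map Symbols
    "\U0001F900-\U0001F9FF"  # Supplemental Symbols and Pictographs
    "\u2600-\u27BF"          # Miscellaneous Symbols and Dingbats
    "]\ufe0f?"               # optionally keep a trailing variation selector-16
)

def extract_emoji(text: str) -> list[str]:
    """Return every emoji matched by EMOJI_PATTERN, in order of appearance."""
    return EMOJI_PATTERN.findall(text)

print(extract_emoji("Love it \u2764\ufe0f\u2728 so good \U0001F602\U0001F602 #win"))
```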
If you want to do text processing and extract the emoji and hashtags, you'll have to transform them into something that won't be destroyed during tokenization. For example, the smiley emoticon is typically represented as ": )" (space and quotes added for clarity). If you use the default tokenization settings, it will be wiped out and you won't be able to extract any information from it.
What I typically do is use a few Replace operators to replace ": )" with "smiley_face" and "#myawesomehashtag" with "hashtag_myawesomehashtag". Then, when you tokenize the text, those tokens survive the text processing.
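The effect of those Replace steps can be sketched in Python. The tokenizer here is an assumption (it keeps runs of word characters and drops punctuation such as ":", ")" and "#"), not RapidMiner's exact Tokenize behavior, and the replacement patterns are hypothetical examples.

```python
import re

# Hypothetical replacement table standing in for a chain of Replace operators.
REPLACEMENTS = [
    (re.compile(r":\s?\)"), " smiley_face "),    # ":)" or ": )"
    (re.compile(r"#(\w+)"), r" hashtag_\1 "),   # "#tag" -> "hashtag_tag"
]

def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: keep runs of letters, digits and '_'; discard everything else.
    return re.findall(r"\w+", text)

def protect_then_tokenize(text: str) -> list[str]:
    # Rewrite emoticons and hashtags into word-like tokens before tokenizing.
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return tokenize(text)

tweet = "Loved it :) #myawesomehashtag"
print(tokenize(tweet))               # the smiley and the '#' marker are lost
print(protect_then_tokenize(tweet))  # smiley_face and hashtag_... survive as tokens
```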
Hello! Let's say I have a large set of examples that includes a 'comment' attribute, and that attribute's original data (.xlsx) looks like this:
What I'd like as a result is a set in which each example is a unique emoji together with the number of times that emoji appears in the 'comment' attribute across all examples in the set, something like:
✨ - 1
❤️ - 5
? - 5
? - 1
? - 1
? - 1
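One way to produce such a table, sketched in Python rather than RapidMiner: treat every character in Unicode category "So" (Symbol, other) as an emoji and keep a trailing variation selector-16 (U+FE0F) attached, so "❤️" is counted as the user entered it. This is a heuristic, not a definitive emoji detector: non-emoji symbols such as © also match, and multi-code-point emoji (ZWJ sequences, flags, skin tones) get split or partially dropped. The comments below are made up for illustration, and the helper names are hypothetical.

```python
import unicodedata
from collections import Counter

VS16 = "\ufe0f"  # variation selector-16: requests emoji-style rendering

def emoji_like(text: str) -> list[str]:
    """Heuristic: every 'Symbol, other' (So) character counts as an emoji,
    with a directly following VS16 kept attached."""
    found = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "So":
            found.append(ch + VS16 if text[i + 1 : i + 2] == VS16 else ch)
    return found

def count_emoji(comments: list[str]) -> Counter:
    """Count each distinct emoji across all comments."""
    counts = Counter()
    for comment in comments:
        counts.update(emoji_like(comment))
    return counts

# Made-up comments for illustration only.
comments = [
    "Amazing \u2728",
    "Love this \u2764\ufe0f\u2764\ufe0f",
    "\u2764\ufe0f \U0001F602",
]
for emoji, n in count_emoji(comments).most_common():
    print(f"{emoji} - {n}")
```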
This is a data prep step for some other processing I (am pretty sure) know how to perform in RapidMiner. Note that I need to see the actual emoji as entered by the user for my use case.
I've tried a lot of Google-fu and RapidMiner trial-and-error (and more error), but I'm still stumped. Any thoughts to guide a relative newcomer? Thank you for your consideration.
Hello @gjagiello, welcome to the community. I love this kind of ETL ju-jitsu. The trick I always use in situations like this is to convert the text to UTF-8 hex, replace the hex sequences with something recognizable as @Thomas_Ott suggested, and convert back. For example, your heart emoji gets converted by "Encode URL" into "%E2%9D%A4%EF%B8%8F" (look at the data after setting a breakpoint on Encode URL). Then I use Replace to convert it to something normal, and then find word occurrences. If you have a lot of emoji, you can use a replace dictionary.
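A Python sketch of the same encode, replace, decode idea: `urllib.parse.quote` percent-encodes the UTF-8 bytes, which gives the same "%E2%9D%A4%EF%B8%8F" for the heart emoji. The replace dictionary and its `emoji_*` token names are hypothetical; the final step simply counts whitespace-separated words.

```python
from collections import Counter
from urllib.parse import quote, unquote

# Hypothetical replace dictionary: percent-encoded UTF-8 bytes -> readable token.
HEX_TO_TOKEN = {
    "%E2%9D%A4%EF%B8%8F": " emoji_red_heart ",  # U+2764 U+FE0F
    "%E2%9C%A8": " emoji_sparkles ",            # U+2728
}

def encode_replace_decode(text: str) -> str:
    encoded = quote(text)  # non-ASCII characters become %XX byte sequences
    for hex_seq, token in HEX_TO_TOKEN.items():
        encoded = encoded.replace(hex_seq, token)
    return unquote(encoded)  # decode the rest back; emoji are now plain words

def word_counts(text: str) -> Counter:
    return Counter(encode_replace_decode(text).split())

text = "Love this \u2764\ufe0f\u2764\ufe0f so much \u2728"
print(quote("\u2764\ufe0f"))
print(word_counts(text))
```

Emoji missing from the dictionary come back unchanged after `unquote`, so the dictionary only needs the emoji you care about.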
Thank you for the entertainment. I love this stuff.
Scott
[EDIT: Oh, sorry: if you only want a list of occurrences of the emoji instead of all the tokens, you could simply filter for them.]
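Assuming the emoji were replaced with tokens that share a common prefix (the hypothetical "emoji_" used here), filtering the word counts down to just the emoji is a simple prefix check:

```python
from collections import Counter

def only_emoji(counts: Counter, prefix: str = "emoji_") -> Counter:
    """Keep only the tokens that start with the (hypothetical) emoji prefix."""
    return Counter({tok: n for tok, n in counts.items() if tok.startswith(prefix)})

counts = Counter(["Love", "this", "emoji_red_heart", "emoji_red_heart", "so", "emoji_sparkles"])
print(only_emoji(counts))
```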
Scott, thanks for the reply and the great suggestion! I'm going to try this out and report back...you gave me an idea I'll share if I can get it to work. Glad you enjoy this data sparring!