Supervised Sentiment Analysis - Removing @
Hi there,
I'm currently working on doing a supervised sentiment analysis with Instagram comments. One of the issues I'm having is that there are a lot of comment replies, which start by mentioning the name of the person that the reply is directed at.
So one person comments on something and another person replies to this comment by starting their reply with @nameofthecommenter. Because this name is part of the Excel sheet, and thus of the data I'm analyzing, it is rated along with the rest of the text and influences the outcome. I know that I can remove whole cells containing an @, but that would also remove the rest of the comment and thus valuable data.
Is there any way to only remove what follows the @ right away, thus only removing the name of the person that is being replied to, without deleting the whole comment?
Thanks in advance!
Anna May
Best Answer
Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member)
This appears to be the same question as the one in this other thread: https://community.rapidminer.com/discussion/58087/removing-mentions-with-and-emojis-from-excel-data#latest
I think the solution should work here as well.
Answers
There are multiple ways to achieve this.
Do you have the data in a table, with the comment in one nominal column? In that case, use Blending/Values/Replace with a regular expression. E.g., you would replace "^\@[a-zA-Z0-9_]+ *" (without the quotes) by nothing. This expression means:
^ Begin of the string
\@ The at sign, escaped with a backslash to remove any special meaning
[a-zA-Z0-9_]+ One or more of the mentioned character classes, following the @ sign.
" *" A space followed by an asterisk: zero or more spaces (so the remaining text won't start with a space)
The regular expression editor window has a drop-down menu with hints for these and other parts of regular expressions.
You can leave the replacement empty, because you replace the user name with nothing.
If you work with already tokenized data (split to single words), you can use Replace Tokens with the same regular expression.
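To check the expression outside RapidMiner before putting it into the Replace operator, here is a minimal Python sketch. It is only an illustration: the function name and the sample comments are made up, and it uses the expression as broken down above (with the underscore in the character class). In Python the @ sign needs no backslash.

```python
import re

# Same expression as above: start of string, "@", one or more
# letters/digits/underscores, then zero or more spaces.
LEADING_MENTION = re.compile(r"^@[a-zA-Z0-9_]+ *")


def strip_leading_mention(comment: str) -> str:
    """Remove one @username at the very start of a comment (hypothetical helper)."""
    return LEADING_MENTION.sub("", comment)


# Made-up sample comments, not taken from the attached data.
comments = [
    "@anna_may love this!",
    "no mention here",
    "thanks @bob_99 for sharing",
]
cleaned = [strip_leading_mention(c) for c in comments]
print(cleaned)
```

Note that because of the ^ anchor, only a mention at the very beginning of the comment is removed; a mention in the middle of the text (third sample) stays. If the user names in your data contain other characters, the character class would need to be widened accordingly.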
Best regards,
Balázs
Thanks a lot for your reply. I have tried your suggestions, but sadly they didn't work for me. I'm not sure whether I did it the right way.
I have attached my process as well as the raw data.
The goals I'm trying to achieve are:
- remove any word (not the whole row) starting with "@"
- remove empty rows
- remove duplicates
- remove emojis (right now, with this process, I end up with question marks instead of the emojis in the output, so I'd rather remove the emojis right away)
Do you have any input for me as to how to achieve that?
Have a lovely day!
Kind regards
Anna May