Cleaning Twitter data
I'm new to RapidMiner, and I am struggling to understand how the Filter operators can be used to clean up Twitter feeds. I am importing the tweets from a CSV file and am trying to create sub-processes within the Process Documents operator to remove Twitter handles (@), RT markers and hashtags. For example, I have tried Filter Tokens by Content with the condition set to "contains" and the string set to @. The process runs without errors, but the Twitter handles still appear in the results. Can anybody please advise on how to go about cleaning up the data?
Answers
When you load the tweets from a CSV file, they come in as the Nominal data type. To use Filter Tokens by Content, you first need to convert those tweets to the Text data type with a Nominal to Text operator.
Here's a sample using the Search Twitter operator that does some cleaning.
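To spell out the cleaning rules themselves, here is a minimal Python sketch. It is separate from the sample process mentioned above and is not a RapidMiner process. The patterns and the `clean_tweet` helper are assumptions made for illustration: they strip a standalone "RT" marker, @handles and #hashtags, then collapse the leftover whitespace. Similar patterns may carry over to operators that accept regular expressions, though that is untested here.

```python
import re

# Hypothetical patterns for illustration; adjust them to your own data.
RETWEET = re.compile(r"\bRT\b:?\s*")   # standalone "RT" marker, optional colon
HANDLE = re.compile(r"@\w+:?")         # @handle, optional trailing colon
HASHTAG = re.compile(r"#\w+")          # #hashtag
WHITESPACE = re.compile(r"\s+")        # runs of whitespace left by the removals


def clean_tweet(text: str) -> str:
    """Remove retweet markers, handles and hashtags from one tweet."""
    text = RETWEET.sub(" ", text)
    text = HANDLE.sub(" ", text)
    text = HASHTAG.sub(" ", text)
    # Collapse the gaps left behind and trim the ends.
    return WHITESPACE.sub(" ", text).strip()


if __name__ == "__main__":
    print(clean_tweet("RT @user: Loving #RapidMiner for text mining"))
```

Note that the handle pattern also matches the "@domain" part of an email address, so texts containing email addresses would need a stricter rule.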