How can I extract the unique URLs from a set of tweets for each user in a Twitter dataset with RapidMiner?
ramzanzadeh72
Member Posts: 14 Learner III
Hi,
I have a Twitter dataset and I want to extract the URLs from the tweets and count the unique URLs for each user. Can I do this in RapidMiner, and if so, how?
I am sharing the tweets sent by one user from my dataset.
Thank you.
Answers
Hi @ramzanzadeh72,
To extract URLs from tweets, you can use the Extract Entities operator from the Aylien extension (download it from the Marketplace; you also have to obtain an API key on the Aylien site).
However, in your case you would have to purchase a paid license, because the free license is limited to 1000 examples per day.
Then you can use the Aggregate operator to count the unique URLs per user.
And be patient: the Extract Entities operator takes a long time to compute.
Here is the process:
Regards,
Lionel
If you don't want to pay for the Aylien plan, you could also try to extract URLs with specific regular expressions. Search the forum for several examples of how to do this (it has been mentioned in a couple of other threads). The manual method is a bit more cumbersome but should be able to extract any URL with the standard format of http://... or https://... or www....
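As a rough illustration of the regular-expression route, here is a minimal Python sketch, not a RapidMiner process. The pattern and the function name `extract_urls` are assumptions made for this example; the pattern simply grabs anything that starts with http://, https:// or www. and runs up to the next whitespace, so trailing punctuation may need trimming on real tweets.

```python
import re

# Assumed pattern for illustration: http://..., https://... or www....
# It stops at the first whitespace character after the prefix.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def extract_urls(text):
    """Return every URL-like substring in text, in order of appearance."""
    return URL_PATTERN.findall(text)


print(extract_urls("see http://t.co/abc and www.example.com now"))
# prints ['http://t.co/abc', 'www.example.com']
```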
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
My problem is that some tweets contain two or more URLs. What can I do in that case? I need to first store the URLs of each user and then count the unique URLs. Is this possible in RapidMiner?
My problem is that some tweets contain two or more URLs, and when I extract the URLs from these tweets and then use Aggregate, only the first URL is considered. What can I do?
@ramzanzadeh72,
As mentioned by @Telcontar120, a free method to extract URLs is to use specific regular expressions.
However, I don't know whether it is possible to do what you want with RapidMiner's native operators.
So I propose a process with 2 branches using 2 Python scripts:
- one branch to extract all the URLs:
- one branch to extract the URLs and count them:
In your dataset, the URLs seem to be very simple, so I chose a simple regex to extract them (but you can look up a better pattern
and set it in the Set Macro operator):
Here is the process:
To execute this process, you need to:
- install Python on your computer.
- install the Execute Python operator (from the Marketplace).
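The scripts themselves are not reproduced in this thread, so the following is only a sketch of the idea and not the original process. It collects every URL in each tweet (not just the first), stores them per user in a set, and counts the set sizes. The function names, the `(user, text)` row layout and the simple pattern are all assumptions. Inside the Execute Python operator the same logic would be wrapped in the script's entry function, which, as far as I know, is called `rm_main` and receives the example set as a pandas DataFrame.

```python
import re
from collections import defaultdict

# Assumed simple pattern; swap in a stricter one if your URLs are messier.
URL_PATTERN = re.compile(r"https?://\S+")


def unique_urls_per_user(rows):
    """rows: iterable of (user, tweet_text) pairs. Returns {user: set of URLs}."""
    urls_by_user = defaultdict(set)
    for user, text in rows:
        # findall returns all matches, so tweets with several URLs are covered;
        # the set drops duplicates across all tweets of the same user.
        urls_by_user[user].update(URL_PATTERN.findall(text))
    return dict(urls_by_user)


def count_unique_urls(rows):
    """Return {user: number of distinct URLs that user posted}."""
    return {user: len(urls) for user, urls in unique_urls_per_user(rows).items()}


# Hypothetical sample data, only to show the shape of the input.
tweets = [
    ("alice", "two links http://t.co/a and http://t.co/b"),
    ("alice", "same link again http://t.co/a"),
    ("bob", "no link in this one"),
]
print(count_unique_urls(tweets))
# prints {'alice': 2, 'bob': 0}
```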
I hope it helps,
Regards,
Lionel
Hi @ramzanzadeh72,
As @Telcontar120 and @lionelderkrikor mentioned, you may want to use regular expressions to identify your matches. A few days ago I wrote about identifying and removing URLs through regular expressions here. Long story short, you can use the Replace operator to apply a regular expression. This was the final expression:
https?://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]
However, I've been playing with the most common patterns I know for at least 30 minutes now, and couldn't find a way to match everything that isn't this pattern (so that you can remove the rest and keep only the URLs). It appears that in Java (hence, in RapidMiner) there is no straightforward way to match the inverse of a pattern; the idea is instead to write the pattern you want matched and then either replaceAll("") the matches or find() the next one and do something with it (among other methods).
Sorry I couldn't come up with a solution, but at least you know that regular expressions in pure RapidMiner might not be the place to look for what you want (and btw, this looks like a nice-to-have feature, doesn't it?).
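For what it's worth, the "find the matches and keep them" idea is easy to sketch outside RapidMiner. The snippet below is Python rather than Java, uses the expression quoted above unchanged, and the helper name `keep_only_urls` is made up for this example. Instead of trying to delete everything that is not a URL, it collects the matches and joins them back together.

```python
import re

# The expression from this post, unchanged.
URL_PATTERN = re.compile(
    r"https?://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"
)


def keep_only_urls(text, separator=" "):
    """Return only the URLs found in text, joined by separator."""
    return separator.join(URL_PATTERN.findall(text))


print(keep_only_urls("Check https://example.com/a?b=1, and http://t.co/xyz."))
# prints https://example.com/a?b=1 http://t.co/xyz
```

Because the last character class excludes punctuation such as commas and full stops, the trailing "," and "." in the sample are not swallowed into the URLs.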
All the best,
How can I do this in RapidMiner?
For example, correcting a word:
meeseg -> message
or
veeeery gooood -> very good
Does anyone know?
This last post should be in a new thread.
You can use "replace token" to swap a misspelling for a correct one.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hello. I know, but my words are not fixed; those were just examples. Is there no way?
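One possible direction for the stretched-letter case, sketched in Python and not as a RapidMiner operator: shrink each run of a repeated letter to one or two letters, and keep a candidate only if it appears in a word list. The tiny `VOCAB` set and the function names are hypothetical; a real setup would need a proper dictionary. This handles elongations such as "veeeery", but not genuine misspellings such as "meeseg", which would need a spell checker.

```python
import re
from itertools import product

# Hypothetical word list, for illustration only.
VOCAB = {"very", "good", "message"}


def candidates(word):
    """All variants of word with each repeated-letter run cut to 1 or 2 letters."""
    runs = [m.group(0) for m in re.finditer(r"(.)\1*", word)]
    options = [(r[0], r[0] * 2) if len(r) > 1 else (r,) for r in runs]
    return {"".join(parts) for parts in product(*options)}


def normalize(word, vocab=VOCAB):
    """Return a vocabulary word reachable by shrinking runs, else word unchanged."""
    matches = sorted(candidates(word) & vocab)
    return matches[0] if matches else word


print([normalize(w) for w in "veeeery gooood meeseg".split()])
# prints ['very', 'good', 'meeseg']
```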
Hi there @jozeftomas_2020,
Please search first, as there are a few posts in the Community on replacing text.
Can you please post this question in a new thread under the Getting Started Forum? This way others that have the same question will be able to find it at a later date.
Thanks,
Allie Tamulewicz