New: How can I filter keywords after getting Twitter user statuses?
Dear All,
First of all, please forgive the interruption. I am 100% new to RapidMiner. I'm doing a Twitter content analysis for an urgent paper. After obtaining data through the "Get Twitter User Statuses" function, I hope to continue with the data analysis. I want to collect the relevant topic texts by setting some keywords on the data I obtained.
But I have been searching for a long time and do not know how to do this.
I need to collect content posted by specific Twitter users and then look for specific topics in this data for analysis. I tested "Search Twitter", and I could use a query to search for different posts, but I couldn't set a specific Twitter account. Maybe you could give me some advice or a solution.
I'm waiting for your suggestions,
Thank you all,
Best regards,
Z. H
Answers
When you refer to keywords, do you mean the hashtags each user tends to tweet out? Like #soda, #beer, etc.? Or just keywords in general?
I think a lot of this is going to depend on how you tokenize each status. Did you see my video and process here? http://www.neuralmarkettrends.com/use-rapidminer-discover-twitter-content/
Dear Thomas,
Yes. I want to find some special events, in the form of hashtags, that each user tends to tweet about, but I think it will work the same way as what you mentioned. I will check your video as soon as possible!
Thank you for sharing,
Best,
Z.H
@zhao_huang I use the Specify Characters parameter in the Tokenize operator and set it to
.!;:[,' ?]
That helps me preserve hashtags when tokenizing.
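For anyone who wants to see the effect of that character list outside RapidMiner, here is a minimal Python sketch. It assumes that splitting on exactly those characters behaves like the Tokenize setting above; the function names `tokenize` and `hashtags` are made up for illustration. Because `#` is not in the split list, hashtags survive as whole tokens.

```python
import re

# The same characters as in the Tokenize setting above: . ! ; : [ , ' space ? ]
SPLIT_CHARS = ".!;:[,' ?]"

# Character class that matches one or more of the split characters in a row.
SPLIT_PATTERN = re.compile("[" + re.escape(SPLIT_CHARS) + "]+")


def tokenize(status):
    """Split a status on the listed characters and drop empty tokens."""
    return [tok for tok in SPLIT_PATTERN.split(status) if tok]


def hashtags(status):
    """Keep only tokens that look like hashtags."""
    return [tok for tok in tokenize(status) if tok.startswith("#") and len(tok) > 1]


tokens = tokenize("Love #soda, hate #beer!")
tags = hashtags("Love #soda, hate #beer!")
```

If `#` were added to the split list, the tokens would come out as plain `soda` and `beer` and could no longer be told apart from ordinary words.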
I read your post, and I'm trying to create the process, but I think I do not really understand how these things work. I'm sorry about my ignorance... Maybe I will have more questions to ask you in the future...
hello @zhao_huang welcome to the community! I'd recommend posting your XML process here (see https://youtu.be/KkgB5QXWXJ8 and "Read Before Posting" on right when you reply) and attach your dataset. This way we can replicate what you're doing and help you better.
Scott
Dear Thomas,
Thank you for your reply. I'm sorry about my late response; I was taking an international flight to Nairobi for my fieldwork research.
I watched your tutorial video, and I followed it to set macros in order to find my target tweets in three specific accounts. But I don't know how to run the search.
If I use your XML, I have a question: I only focus on three Twitter accounts, and I just need to find interesting posts in these three accounts. If so, how can I do that? How can I focus on three Twitter accounts with these keywords? Should I change "search twitter for keyword" to "Get Twitter User Statuses"? Or do I have to do both? And for the time period, do you have some advice? Sorry about my thousands of questions...
I tried to do it, but I'm not so sure:
I copied my XML into the attachment:
I look forward to hearing from you,
Best,
ZH
Dear All,
A bit desperate...
I spent all night on my Twitter search with RapidMiner, but failed...
Here's what I need: I want to retrieve tweets from a particular Twitter account that contain specific phrases (such as sport OR Football OR Swimming OR Pingpang) during a particular period (between 01/01/2016 and 02/28/2018).
So, I tried to get the tweets this way, but failed... Do you have a solution? THANK YOU VERY MUCH!!! You will save me...
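To make the requirement concrete, here is a rough Python sketch of the filtering step, applied to statuses that have already been downloaded from one account. It is not a RapidMiner process. The field names `created` and `text`, the sample tweets, and the function name `filter_tweets` are all assumptions for illustration.

```python
import re
from datetime import date


def filter_tweets(tweets, keywords, start, end):
    """Return tweets dated within [start, end] whose text contains any keyword.

    Matching is case-insensitive and on whole words, so "sport" does not
    match "transport".
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return [
        t for t in tweets
        if start <= t["created"] <= end and pattern.search(t["text"])
    ]


# Hypothetical statuses, standing in for one account's downloaded timeline.
sample = [
    {"created": date(2016, 3, 5), "text": "Great #Football match today"},
    {"created": date(2015, 12, 31), "text": "Football season preview"},
    {"created": date(2017, 7, 1), "text": "Budget statement released"},
    {"created": date(2018, 2, 28), "text": "swimming finals tonight"},
]

matches = filter_tweets(
    sample,
    ["sport", "Football", "Swimming", "Pingpang"],
    date(2016, 1, 1),
    date(2018, 2, 28),
)
```

The point of the sketch is the order of operations: first get the statuses of the account, then filter by date and keyword locally, instead of trying to do everything in a single search query.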
hi @zhao_huang - I just looked at your process and honestly, at a quick glance, it looks fine EXCEPT I am almost certain you're going to hit the API quota limit for Twitter. No question. See this page for Twitter REST API rate limits for free tier users...
Scott
Well I'm glad I saw this @zhao_huang, if you want to get someone's attention in the forums, you should use the '@' symbol.
That said, @sgenzer is probably right: you're probably blocked by the rate limit. If that's the case, you have to create a whole new Twitter connection.
Dear @Thomas_Ott and @sgenzer , thank you all for the help.
Yes, I found that Twitter limits data capture. I could collect all the data from the three diplomatic accounts that I'm focusing on. But I'm not able to collect the data that I need from three media accounts: every time, RapidMiner tells me "error on connecting to API", or I receive only part of the tweets.
Do you have any solution for that?
I found another way to collect the data that I need; here is my code:
@zhao_huang you're trying to collect too many tweets OR the account doesn't have that many tweets available. It's throwing errors because of that.
You will be limited as to how many tweets you can extract for free, that's just how Twitter does things. You have to pay for the entire tweet history.
The workaround is to start collecting tweets on a daily basis and append them to one big data file over time.
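As a rough illustration of that daily-append idea, here is a small Python sketch that adds each day's download to a CSV archive and skips tweets that are already stored. The column names, the file name `tweets.csv`, and the function `append_new_tweets` are assumptions for the example; in RapidMiner you would build the equivalent with your own operators.

```python
import csv
import os
import tempfile

FIELDS = ["id", "created", "text"]  # hypothetical column names


def append_new_tweets(path, tweets):
    """Append tweets whose id is not yet in the CSV at `path`.

    Returns the number of rows actually added.
    """
    file_exists = os.path.exists(path)
    seen = set()
    if file_exists:
        with open(path, newline="", encoding="utf-8") as f:
            seen = {row["id"] for row in csv.DictReader(f)}

    new_rows = []
    for tweet in tweets:
        tweet_id = str(tweet["id"])
        if tweet_id not in seen:  # skip ids already archived or repeated in this batch
            seen.add(tweet_id)
            new_rows.append(tweet)

    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if not file_exists:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)


# Two overlapping daily downloads, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "tweets.csv")
    day1 = [
        {"id": 1, "created": "2018-03-01", "text": "first"},
        {"id": 2, "created": "2018-03-01", "text": "second"},
    ]
    day2 = [
        {"id": 2, "created": "2018-03-01", "text": "second"},
        {"id": 3, "created": "2018-03-02", "text": "third"},
    ]
    added_day1 = append_new_tweets(archive, day1)
    added_day2 = append_new_tweets(archive, day2)
```

Deduplicating on the tweet id matters because consecutive daily pulls of the same timeline usually overlap.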
@Thomas_Ott Yes, Thomas, I have already realized that. These three accounts have lots of tweets; that's why I couldn't collect the data... Twitter wants to do some business...
Dear @Thomas_Ott,
I tried to use your XML to analyze my case, but it doesn't work. I ran it as a test, but some potential problems were flagged for "determining influence factor" and "sort", and in the ETL process, "The retweet account is unknown". So, I just want to know if you have another video with more details about these questions?
I'd also like to know how I can analyze which Twitter account has been most frequently reposted by the account that I observed, by filtering on keywords such as "RT" or "@the name of account".
Thank you !
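One possible way to approach the "most frequently reposted account" question, sketched in Python under the assumption that retweets appear in the status text in the classic "RT @name: ..." form. The sample texts and the function name `count_retweeted_accounts` are hypothetical.

```python
import re
from collections import Counter

# Matches the classic retweet prefix at the start of a status: "RT @name"
RT_PATTERN = re.compile(r"^RT @(\w+)")


def count_retweeted_accounts(texts):
    """Count how often each account appears as the source of a retweet."""
    counts = Counter()
    for text in texts:
        match = RT_PATTERN.match(text)
        if match:
            # Lowercase so "@BBCWorld" and "@bbcworld" count as one account.
            counts[match.group(1).lower()] += 1
    return counts


# Hypothetical statuses from the observed account.
statuses = [
    "RT @BBCWorld: Breaking news",
    "RT @bbcworld: Follow-up story",
    "RT @CNN: Another headline",
    "Our own statement, mentioning @CNN",
]

retweet_counts = count_retweeted_accounts(statuses)
top_account = retweet_counts.most_common(1)
```

Plain mentions such as the last status are not counted, because the pattern only matches "RT @name" at the very start of the text.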
@zhao_huang That's because you just can't attach your subprocess to the process I created and expect it to work. So you're going to have to 1) understand how the data flows through my process and 2) modify it to make it work with what you want to do.
@Thomas_Ott I'm sorry about my many questions...
But when I re-run your XML, it seems to have the same problems that I met in "determine influence factors" and "sort"...
@zhao_huang that's not a problem, the metadata didn't propagate all the way through. It should run fine.