Web scraping & sentiment analysis in non-English language
linn_ansved_636
Hi,
I'm new to RapidMiner and I'm hoping to use RapidMiner and Aylien to scrape many different news pages and perform sentiment analysis on them. The problem is that the articles I want to gather are written in Swedish. Does anyone know if this is possible and, if so, where I can find more information? I've already checked out these tutorials:
https://docs.aylien.com/textapi/rapidminer-extension/#step-3-categorizing-tweets
I've also looked at Aylien's News API, but I don't know if that could help.
Would really appreciate some guidance on this!
Answers
Hi @linn_ansved_636 - welcome to the community. Web scraping websites in Swedish is no problem at all. Just use the various operators in the Web Mining extension as you would for English.
The sentiment analysis is the more interesting question. Aylien does not appear to support native sentiment analysis in Swedish (see https://docs.aylien.com/textapi/#language-support), and it does not seem that IBM Watson Tone Analyzer supports Swedish either. So if you want to use one of these tools, I'd recommend pre-processing the text through a translation engine first (although some of the "tone" will likely be lost or distorted by the translation).
Scott
Hi @linn_ansved_636,
Most of the steps in text processing are language-agnostic. The only language-specific steps are stop word filtering and stemming. In both cases you can use the Filter Stopwords (Dictionary) and Stemming (Dictionary) operators with external dictionaries.
I hope that helps!
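To illustrate what dictionary-based stop word filtering and stemming do to Swedish text, here is a minimal Python sketch of the idea. It is not how the RapidMiner operators are implemented; the word lists `STOPWORDS` and `STEM_DICT` are tiny, hypothetical examples, and real dictionaries would be loaded from external files.

```python
import re

# Hypothetical, very small Swedish dictionaries for illustration only.
STOPWORDS = {"och", "att", "det", "är", "en", "i", "på"}
STEM_DICT = {
    "bilarna": "bil",
    "bilar": "bil",
    "bilen": "bil",
    "nyheterna": "nyhet",
    "nyheter": "nyhet",
}

def tokenize(text):
    # Lowercase, then split on non-word characters.
    # In Python 3, \W is Unicode-aware, so å, ä and ö stay inside tokens.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def preprocess(text):
    tokens = tokenize(text)
    # Stop word filtering: drop every token found in the dictionary.
    kept = [t for t in tokens if t not in STOPWORDS]
    # Dictionary stemming: map known word forms to their stem,
    # and leave unknown words unchanged.
    return [STEM_DICT.get(t, t) for t in kept]

result = preprocess("Bilarna är nya och nyheterna är bra")
print(result)
```

Everything before and after these two steps (tokenizing, counting, building word vectors) works the same way regardless of language.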
Great, thanks for the reply! Do you know if RapidMiner has a built-in translate function? If so, I could scrape websites written in Swedish, translate them into English, and then perform the sentiment analysis. My hope is that all of this could be done in RapidMiner. Any thoughts?
Thanks,
Linn
Hi,
RM itself does not have this built in yet - maybe a nice feature to add?
Maybe @koen can help?
Best,
Martin
Dortmund, Germany
There is currently no built-in feature, but hopefully our Google Cloud custom operators will improve over time so that we can include Google Translate. Meanwhile, I wrote this KB article a while back that will do the trick (albeit without an "out-of-the-box" custom operator).
https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/How-to-interact-with-Google-Cloud-APIs-with-the-Web-Mining/ta-p/35280
Scott
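As a rough illustration of what a call to a translation web service involves, the Python sketch below builds a request for the Google Cloud Translation v2 REST endpoint and parses a response in that API's JSON shape. It is an assumption-laden sketch, not the process from the KB article: the helper names are hypothetical, `DUMMY_KEY` is a placeholder, no network call is made, and the sample response is hand-written rather than real output.

```python
import json
from urllib.parse import urlencode

# Google Cloud Translation v2 REST endpoint.
ENDPOINT = "https://translation.googleapis.com/language/translate/v2"

def build_translate_request(text, api_key, source="sv", target="en"):
    # Hypothetical helper: returns the URL and JSON body for a POST request.
    url = ENDPOINT + "?" + urlencode({"key": api_key})
    body = json.dumps(
        {"q": text, "source": source, "target": target, "format": "text"}
    )
    return url, body

def extract_translations(response_text):
    # Hypothetical helper: pulls the translated strings out of a v2 response.
    data = json.loads(response_text)
    return [t["translatedText"] for t in data["data"]["translations"]]

url, body = build_translate_request("Hej världen", "DUMMY_KEY")

# Hand-written sample in the v2 response shape, not actual API output.
sample_response = '{"data": {"translations": [{"translatedText": "Hello world"}]}}'
translated = extract_translations(sample_response)
print(url)
print(translated)
```

The translated English text could then be passed on to whichever sentiment analysis step is used downstream.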
Great, thanks. Finally, do you know if there are any tutorials on how to web scrape and perform sentiment analysis in English using RapidMiner?
/Linn
Hi again,
I'm also interested in getting a graph of how the sentiment changes over time, e.g. in May the number of positives is X, in June it is Y, and so on.
Any guidance?
Thanks,
/Linn
Hi @linn_ansved_636 - sure, there are lots of resources on that. Have you checked out our YouTube channel yet?
https://www.youtube.com/channel/UCxneJBWWNLs-A6ckls1Rrug?view_as=subscriber
Scott
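On the question of how sentiment changes over time: once each article has a publication date and a sentiment label, the numbers behind such a graph come from grouping by month and counting labels. Here is a minimal Python sketch of that aggregation, assuming the scoring step has already produced the labels; the dates and labels below are made-up sample data.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical scored articles: (publication date, sentiment label).
articles = [
    (date(2018, 5, 3), "positive"),
    (date(2018, 5, 17), "negative"),
    (date(2018, 5, 21), "positive"),
    (date(2018, 6, 2), "neutral"),
    (date(2018, 6, 9), "positive"),
]

def sentiment_by_month(rows):
    # Group by "YYYY-MM" and count how often each label occurs.
    counts = defaultdict(Counter)
    for published, label in rows:
        counts[published.strftime("%Y-%m")][label] += 1
    # Sort by month so the result is in chronological order.
    return dict(sorted(counts.items()))

monthly = sentiment_by_month(articles)
for month, c in monthly.items():
    print(month, "positive:", c["positive"], "negative:", c["negative"])
```

Plotting the positive and negative counts per month as two lines or stacked bars then gives the trend over time.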