Is it possible to crawl the links on the "IBM Watson News Explorer"?
jonas_boersch
Member Posts: 1 Learner I
Hello Community,
I can't manage to crawl the links to the news articles on IBM Watson's News Explorer. The "Crawl Web" operator stops after crawling the header of the web page, but the links to the articles are in the "details" window on the left side of the page.
Could someone help me find a solution? I would be very thankful. The link to the webpage is: http://news-explorer.mybluemix.net/?query=ipcc&type=unconstrained
Kind regards,
Jonas
Best Answer
rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Good Sir @jonas_boersch, I regret to inform you that your requirement cannot currently be met with RapidMiner's tooling, because the "Get Page" and "Crawl Web" operators were developed before the proliferation of JavaScript-built, API-driven websites made with Vue.js, Angular.js, Ember.js or React.js. These operators fetch the HTML the server returns, but they do not run the JavaScript that builds the rest of the page, such as the list of articles. The sun has not set, though, and to my knowledge there are two other options:
- Explore the page's code and find the original data sources. After a quick inspection I made for you, it seems feasible to locate the REST services that IBM Watson's code calls.
- Use Selenium to drive a headless web browser, which executes the page's JavaScript and then gives you the fully rendered page. I would call this the hard way, because it is not easy to set up, but it is worth the time if you retrieve pages frequently.
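A minimal Python sketch of the first option, under stated assumptions. `links_in_static_html` illustrates why a crawler that does not run JavaScript misses the articles: it only sees `<a href>` links already present in the raw HTML. `article_links_from_json` pulls URLs out of a JSON response like the ones a JavaScript front end typically requests. All function names and the response shape (an `articles` list whose items have a `url` key) are hypothetical; look up the real endpoint and field names in your browser's developer tools (Network tab) before adapting it.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags present in the raw (unrendered) HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def links_in_static_html(html: str) -> list[str]:
    """Return the links a non-JavaScript crawler would see in this HTML."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def article_links_from_json(payload: str) -> list[str]:
    """Extract article URLs from a JSON response.

    ASSUMPTION: the shape {"articles": [{"url": ...}, ...]} is hypothetical;
    adjust the keys to match the response the real service returns.
    """
    data = json.loads(payload)
    return [item["url"] for item in data.get("articles", []) if "url" in item]


if __name__ == "__main__":
    # A page whose article list is filled in by JavaScript at runtime:
    page = (
        '<html><body><a href="/about">About</a>'
        '<div id="details"></div>'
        "<script>/* article links are added here at runtime */</script>"
        "</body></html>"
    )
    print(links_in_static_html(page))  # prints ['/about']
```

Against a real service you would fetch the JSON text (for example with `urllib.request.urlopen(url).read().decode("utf-8")`) and pass it to `article_links_from_json`; the article pages themselves can then be retrieved one by one.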
Have a good day,
Rodrigo.
Answers
Scott