Web crawling of https pages - not working by using

Move_on2 · March 2019

Hey everybody in the community :-)

I have just started using RapidMiner, and I would like to crawl the web using the web crawling process in RapidMiner 9.2.

Unfortunately, I do not get any results.

I tried crawling the URL https:xxxx (I am not yet allowed to include links; I got an error message even when posting here in the community). The URL can be found in the attachment.

Did I enter anything incorrectly, or are any input values missing?

In some user communities I read that the web crawler in RapidMiner does not work for https URLs. Is that correct?

Is there a workaround available?

Thanks a lot in advance for your kind support. I am really eager to learn how to use RapidMiner, and I am curious to see results.

Tanja @Move_on2

varunm1 · March 2019

Hello @Move_on2

A similar question was asked in this community recently. Please take a look at the threads linked below for a workaround provided by @Telcontar120.

https://community.rapidminer.com/discussion/54662/how-can-i-crawl-more-than-one-web-page
https://community.rapidminer.com/discussion/54656/crawl-web-operator-does-not-return-any-results

Move_on2 · March 2019

Hey @varunm1

Thanks for the links; I had not found the first one so far. I will try the loop using Get Pages, too.

sgenzer · March 2019

This is a known issue. Please see https://community.rapidminer.com/discussion/54662/how-can-i-crawl-more-than-one-web-page

Scott
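
For readers who want to see the idea behind the "loop using Get Pages" workaround mentioned above, here is a minimal sketch in plain Python. It is not the RapidMiner process from the linked threads, only an illustration of fetching a known list of https URLs one at a time instead of relying on a crawler. The names `fetch` and `get_pages` are made up for this example, and the URL in the usage part is a placeholder.

```python
from urllib.request import urlopen
from urllib.error import URLError


def fetch(url, timeout=10):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def get_pages(urls, fetch_page=fetch):
    """Fetch each URL in turn; failed downloads are recorded as None.

    fetch_page is a hypothetical hook so the download step can be swapped
    out (for example with a stub when no network is available).
    """
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch_page(url)
        except (URLError, ValueError, OSError) as error:
            # Keep going: one unreachable page should not stop the loop.
            print(f"skipped {url}: {error}")
            pages[url] = None
    return pages


if __name__ == "__main__":
    # Placeholder URL; replace it with the pages you actually want.
    result = get_pages(["https://example.com/"])
    for url, html in result.items():
        size = len(html) if html is not None else 0
        print(url, size, "characters")
```

The loop only retrieves pages you already know the addresses of. Following links found on those pages, as a real crawler does, would need an extra link-extraction step that is left out here.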
