Get Pages with Pagination
Legacy User
Hi everyone,
I am desperately trying to crawl a list of websites that each have a different number of pages.
I read the tutorial at http://www.simafore.com/blog/bid/112223/Text-mining-How-to-fine-tune-job-searches-using-web-crawling-2-of-4 and I can now store and process multiple pages of one URL.
As the next step I want to process a list of URLs using the Get Pages operator, but I can't get it to also process the additional pages of each site.
I know this is probably hard to understand, so here is an example ;-)
I want to extract customer reviews from yelp.com, for example: http://www.yelp.com/biz/hertz-san-francisco-9
This listing has 5 pages with 40 reviews each. Using a Loop operator I am able to extract all of these reviews. So far so good. ;-)
But how can I crawl multiple URLs with multiple pages each? For example:
http://www.yelp.com/biz/hertz-san-francisco-9
http://www.yelp.com/biz/hertz-philadelphia
As you will see, I have already tried to work with macros that collect the number of pages for each URL, but I am missing something.
Any help would be greatly appreciated.
Thank you ;D
Answers
Any update on this topic?
Normally you will need to find the root URL of the next page and then write a regular expression for it.
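To illustrate the idea outside of RapidMiner, here is a rough Python sketch of the nested loop: an outer loop over the start URLs and an inner loop that keeps following the "next page" link found by a regular expression. This is not the Get Pages operator and not a tested Yelp crawler. The function names, the `rel="next"` link convention, and the `example.com` pages are all assumptions made up for the example; on a real site you would have to inspect the HTML and adjust the pattern.

```python
import re
from urllib.parse import urljoin

# Assumption: the "next page" link is an <a> tag carrying rel="next".
# Real sites may mark it differently, so these patterns need adjusting.
A_TAG = re.compile(r'<a\b[^>]*>', re.IGNORECASE)
REL_NEXT = re.compile(r'\brel="next"', re.IGNORECASE)
HREF = re.compile(r'\bhref="([^"]+)"', re.IGNORECASE)


def find_next_url(html, current_url):
    """Return the absolute URL of the next page, or None if there is none."""
    for tag in A_TAG.findall(html):
        if REL_NEXT.search(tag):
            match = HREF.search(tag)
            if match:
                # Resolve relative links against the page they were found on.
                return urljoin(current_url, match.group(1))
    return None


def crawl_all(start_urls, fetch, max_pages=50):
    """For each start URL, follow next-page links and collect every page.

    `fetch` is any callable that takes a URL and returns the page HTML.
    """
    pages = {}
    for start in start_urls:            # outer loop: one entry per listing
        collected = []
        seen = set()
        url = start
        # inner loop: stop on a missing link, a repeated URL, or the page cap
        while url and url not in seen and len(collected) < max_pages:
            seen.add(url)
            html = fetch(url)
            collected.append((url, html))
            url = find_next_url(html, url)
        pages[start] = collected
    return pages


# Made-up pages standing in for real HTTP requests (hypothetical URLs).
FAKE_SITE = {
    "http://example.com/biz/a": '<a rel="next" href="/biz/a?start=40">Next</a>',
    "http://example.com/biz/a?start=40": "<p>last page</p>",
    "http://example.com/biz/b": "<p>only page</p>",
}

result = crawl_all(
    ["http://example.com/biz/a", "http://example.com/biz/b"],
    FAKE_SITE.__getitem__,
)
for start, collected in result.items():
    print(start, "->", len(collected), "page(s)")
```

Because the inner loop stops as soon as no next link is found, you do not need to know the number of pages per URL in advance, which is the part the macro approach was trying to solve. To run this against a live site, you would pass a real download function as `fetch` in place of the dictionary lookup.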