Web crawling a difficult webpage (Airbnb)
Hello,
I need to scrape the Airbnb website. I need to get all the ratings for all the accommodations in a city ("Veracidad":5,"Comunicacion":5, etc.).
First, I thought about getting the URLs of all the accommodations in a city, for example . Then I would have the web crawler scrape all those links and get the individual ratings.
But when I use a max crawl depth of 1 with the URL in the example link, I don't get the accommodations' URLs ...
Could you help me, please? :womanhappy:
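The two-step plan described above (collect the listing URLs from a search page, then read the ratings from each listing page) can be sketched in Python as follows. This is only a minimal sketch under stated assumptions, not a working scraper for any particular site: it runs on in-memory sample HTML, the `example.com` base URL and the `/rooms/<id>` link pattern are hypothetical, and it assumes the ratings appear in the page source as `"key":number` pairs like the ones quoted in the question. One possible reason a depth-1 crawl returns no listing URLs is that the links are built by JavaScript after the page loads, in which case they would not be present in the static HTML at all. Any real use would first need to be permitted by the site's Terms of Service.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a> tags whose href matches a pattern."""

    def __init__(self, base_url, pattern):
        super().__init__()
        self.base_url = base_url
        self.pattern = re.compile(pattern)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.pattern.search(href):
            url = urljoin(self.base_url, href)
            if url not in self.links:  # keep first occurrence only
                self.links.append(url)


def extract_listing_urls(search_html, base_url, pattern=r"^/rooms/\d+"):
    """Step 1: pull listing URLs out of a search-results page.

    The default pattern is a hypothetical example, not a documented format.
    """
    collector = LinkCollector(base_url, pattern)
    collector.feed(search_html)
    return collector.links


def extract_ratings(listing_html, keys=("Veracidad", "Comunicacion")):
    """Step 2: read `"key":number` pairs from a listing page's source."""
    ratings = {}
    for key in keys:
        match = re.search(
            r'"%s"\s*:\s*(\d+(?:\.\d+)?)' % re.escape(key), listing_html
        )
        if match:
            ratings[key] = float(match.group(1))
    return ratings


# Made-up sample pages standing in for downloaded HTML.
SEARCH_HTML = (
    '<a href="/rooms/101">A</a>'
    '<a href="/help">Help</a>'
    '<a href="/rooms/202">B</a>'
)
LISTING_HTML = '<script>{"Veracidad":5,"Comunicacion":4}</script>'

urls = extract_listing_urls(SEARCH_HTML, "https://example.com")
ratings = extract_ratings(LISTING_HTML)
print(urls)
print(ratings)
```

If `extract_listing_urls` returns an empty list for a saved copy of the real search page, that suggests the links are not in the static HTML, and increasing the crawl depth would not help.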
Answers
Hello @21763289, please note that web scraping commercial websites is generally illegal and/or violates the Terms of Service of these companies. Here is the specific language from airbnb.com:
(source: https://www.airbnb.com/terms)
I STRONGLY advise all RapidMiner users to check the Terms of Service of any website before scraping it with our software or any other tool.
Scott
Ok, thanks, I understand.
So, if someone would like to reply to me privately about how to do it hypothetically... It is just for research at my university.
Hi @21763289,
Have you checked whether you can do it legally through the Airbnb API? It looks like they do have one:
https://www.airbnb.com/partner?c=tumblr&af=746240
I haven't worked with it, but this might be a good place to start.
All the best,
Rodrigo.
Nice idea. Thanks!!