I have a list of URLs, and data should be crawled only from those URLs using XPath
Dear Team,
I am very confused and stuck.
I have 1000 URLs and I need to extract data from these 1000 URLs.
I have stored the 1000 URLs in a CSV file.
I have also seen the tutorials at http://vancouverdata.blogspot.com/2011/04/rapidminer-web-crawling-rapid-miner-web.html and http://vancouverdata.blogspot.com/2011/04/web-scraping-rapidminer-xpath-web.html. They are excellent, but I am not sure where I am getting lost.
I have enabled all extensions.
Is there a video tutorial that explains the process of importing URLs and getting the data?
I must learn this and I am very interested. Please guide me.
I have been trying this for the past 2 days, but I am missing something.
Answers
I am not sure where exactly you got stuck, but if your problem is accessing the URLs stored in your file in the first place, the Get Pages operator is for you. Just load your CSV file containing the URLs, then pass that data to Get Pages and specify in the link_attribute parameter which column contains the URLs.
Best regards,
Marius
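For readers who want to see the idea outside RapidMiner, here is a rough Python sketch of the same flow: read the CSV, take the URL from one named column, and fetch each page. It is not the Get Pages operator itself, only an analogue. The function names `fetch` and `get_pages`, the `fetcher` argument, the file name `urls.csv` and the column name `url` are all assumptions made for illustration.

```python
import csv
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def get_pages(csv_file, link_attribute, fetcher=fetch):
    """Fetch the page behind every URL in the `link_attribute` column.

    `csv_file` is an open text file (or any iterable of CSV lines) with a
    header row. Returns one dict per non-empty URL: the original row plus
    a "page" entry holding the page text, or None if the download failed.
    """
    results = []
    for row in csv.DictReader(csv_file):
        url = (row.get(link_attribute) or "").strip()
        if not url:
            continue  # skip rows without a URL
        try:
            page = fetcher(url)
        except (OSError, ValueError):
            # Network errors and malformed URLs: keep the row, mark as failed.
            page = None
        results.append({**row, "page": page})
    return results


# Hypothetical usage (file and column names are assumptions):
# with open("urls.csv", newline="", encoding="utf-8") as f:
#     pages = get_pages(f, link_attribute="url")
```

The `link_attribute` argument plays the same role as the parameter of that name mentioned above: it only says which column holds the URLs. Extracting fields with XPath would be a separate step applied to each returned page.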
Can RapidMiner run an automated, regular (say daily) search for a list of words across a list of URLs, and return each matching page link?
I have a list of words, and I want to regularly get every web link where any of these words appears on any of the URLs from my predefined URL list.
E.g. word list: qwe, rty
URL list: www.asd.com, www.zxc.com
What is the process path to get, daily and automatically, each web link where the words "qwe" and/or "rty" appear on www.asd.com and/or www.zxc.com?
Many thanks
Dan
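To make the request above concrete, here is a minimal Python sketch of the matching step only, not a RapidMiner process. It checks each URL in a fixed list and reports which of the words appear on that page. The names `fetch`, `find_matches` and `fetcher` are hypothetical. The sketch assumes the URLs carry a scheme such as `http://`, and that the daily run would be handled by an external scheduler, which is not shown.

```python
import re
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def find_matches(urls, words, fetcher=fetch):
    """Return {url: [matched words]} for pages containing any of `words`.

    Matching is case-insensitive and on whole words. It runs over the raw
    page source, so a word that occurs only inside markup also counts.
    """
    patterns = {
        word: re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        for word in words
    }
    matches = {}
    for url in urls:
        try:
            text = fetcher(url)
        except (OSError, ValueError):
            continue  # unreachable or malformed URL: skip it
        found = [word for word, pattern in patterns.items() if pattern.search(text)]
        if found:
            matches[url] = found
    return matches


# Hypothetical usage with the lists from the question:
# find_matches(["http://www.asd.com", "http://www.zxc.com"], ["qwe", "rty"])
```

This only inspects the listed pages themselves. Following links within each site would need a crawling step on top of it.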