Need specific data from a website in categories
Hello,
I have to collect specific data about webshops from a website that lists about 7000 webshops in different categories.
What I need exactly is the name, e-mail address, street, zip code and town. Until now I have always copied and pasted the data into an Excel document. This work is horrible.
My question is: how can I do that automatically with RapidMiner?
Thanks for your help...
Answers
You can request the HTML pages using one of the operators [tt]Get Page[/tt], [tt]Get Pages[/tt] or [tt]Crawl Web[/tt] from the RM Web Mining extension. Which one works best depends on the particular website and on how its pages can be reached, for example whether you already have a list of URLs or need to follow links from the category pages.
Once you have the pages in an example set or in a document, you can extract particular parts of each page using the [tt]Generate Extract[/tt] or [tt]Extract Information[/tt] operator, respectively. For that, you may have to use regular expressions or XPath expressions. Which expressions you need depends on the content you would like to extract and on the structure of the page you are extracting it from.
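To give a rough idea of what such expressions look like, here is a small Python sketch of the extraction step outside of RapidMiner. It is only an illustration under assumptions: the page fragment, its tag and class names, and the helper extract_shop are all made up, and a real webshop page will need its own XPath and regular expressions. The same kind of expressions could then be adapted for the extraction operators.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment. Tag and class names are
# assumptions for this sketch; a real shop page will be structured differently.
SAMPLE_PAGE = """
<div class="shop">
  <h2 class="name">Example Shop</h2>
  <p class="street">Musterstrasse 12</p>
  <p class="city">12345 Musterstadt</p>
  <a href="mailto:info@example.com">Contact</a>
</div>
"""

# Regular expression for an e-mail address inside a mailto: link.
EMAIL_RE = re.compile(r"mailto:([\w.+-]+@[\w-]+(?:\.[\w-]+)+)")
# Regular expression splitting a line like "12345 Musterstadt" into zip code and town.
CITY_RE = re.compile(r"^\s*(\d{4,5})\s+(.+?)\s*$")


def extract_shop(page: str) -> dict:
    """Extract name, e-mail, street, zip code and town from one page fragment."""
    root = ET.fromstring(page)

    def text_of(path: str) -> str:
        # ElementTree supports a limited XPath subset, enough for this sketch.
        node = root.find(path)
        return node.text.strip() if node is not None and node.text else ""

    city_match = CITY_RE.match(text_of(".//p[@class='city']"))
    email_match = EMAIL_RE.search(page)
    return {
        "name": text_of(".//h2[@class='name']"),
        "email": email_match.group(1) if email_match else "",
        "street": text_of(".//p[@class='street']"),
        "zip": city_match.group(1) if city_match else "",
        "town": city_match.group(2) if city_match else "",
    }


if __name__ == "__main__":
    print(extract_shop(SAMPLE_PAGE))
```

Note that the XPath part only works here because the fragment is well-formed XML. Real HTML pages often are not, so they may need a more tolerant parser or regular expressions alone.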
Kind regards,
Tobias