Answers
The question is unclear. What exactly do you mean by "contents"? Do you want only a specific web page (or list of web pages)? Or do you want to extract information from the web page?
Please specify.
Cheers,
Simon
I want to extract information from a web page. If I can copy the contents of the web page into a text file, I can then apply text mining algorithms. So for now I need to copy the web page into a text file.
Thanks
Ratheesan.
I guess you could set the "max_depth" parameter to zero. The crawler should then not follow any links.
With RapidMiner 5 there will soon be a web mining extension that makes this easier.
Greetings,
Sebastian
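
To illustrate what a depth limit of zero means for a crawler, here is a minimal Python sketch. It is not RapidMiner's implementation: the `crawl` function and the in-memory `SITE` dictionary are hypothetical stand-ins for a real crawler and real web pages, and the parameter is only named `max_depth` to mirror the one mentioned above.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, list of outgoing links).
SITE = {
    "http://example.com/": ("home page", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/b"]),
    "http://example.com/b": ("page b", []),
}


def crawl(start_url, max_depth, site=SITE):
    """Breadth-first crawl that follows links at most max_depth hops from start_url."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = {}
    while queue:
        url, depth = queue.popleft()
        text, links = site[url]
        pages[url] = text
        if depth >= max_depth:
            continue  # depth limit reached: do not follow this page's links
        for link in links:
            if link in seen or link not in site:
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return pages
```

With `max_depth=0` the sketch returns only the start page; with `max_depth=1` it also returns the pages linked directly from it.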
I tried the method above and saved the result as a text file. The saved text contains HTML tags, image URLs, etc. Is there any way to save only the text (what a user sees when opening the web page)?
Thanks,
Ratheesan
With 5.0 this would be easy. In 4.x you can only set the TextInput to contenttype html, so that all tags are filtered out.
Greetings,
Sebastian
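
For readers who want to do the tag filtering outside RapidMiner, the idea can be sketched in Python with the standard library's `html.parser`. This is only an approximation of "the text a user sees": the class name `VisibleTextExtractor` and the helper `html_to_text` are made up for this example, and the sketch does not handle text hidden by CSS or generated by JavaScript.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string, one text node per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The returned string can then be written to a text file and passed on to the text mining step; tags and attribute values such as image URLs are dropped because only text nodes are collected.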