Web page selection.

ratheesan · January 2010

Hi,
How can I select the contents of a particular web page using RapidMiner (RM)? I tried it with the crawler, but I am getting more pages than I specified.

Thanks,
Ratheesan

fischer · January 2010

Hi,

The question is unclear. What exactly do you mean by "contents"? Do you want only a specific web page (or list of web pages)? Or do you want to extract information from the web page?
Please specify.

Cheers,
Simon

ratheesan · January 2010

Hi Simon,
I want to extract information from a web page. If I can copy the contents of the web page to a text file, I will then apply text mining algorithms. So for now I need to copy the web page into a text file.

Thanks
Ratheesan.

land · January 2010

Hi,
I guess you could set the "max_depth" parameter to zero. The crawler then shouldn't follow any links.

For RapidMiner 5 there will soon be a web mining extension that makes this easier.

Greetings,
Sebastian
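
For readers who want to see what a depth limit does, here is a minimal sketch in plain Python, outside RapidMiner. It is not the crawler operator's implementation; the function `crawl`, its `fetch` argument, and the tiny in-memory `SITE` are all hypothetical names made up for illustration. It only shows why a depth of zero returns the start page alone.

```python
import re
from collections import deque

# Naive href matcher, good enough for a sketch (not a full HTML parser).
HREF_RE = re.compile(r'href="([^"]+)"')

def crawl(start_url, fetch, max_depth=0):
    """Breadth-first crawl returning {url: html} for pages at most
    max_depth links away from start_url.

    `fetch` is any callable mapping a URL to its HTML string
    (a hypothetical helper, injected so the sketch needs no network).
    """
    pages = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in pages:
            continue  # already visited
        pages[url] = fetch(url)
        # Only queue outgoing links while we are below the depth limit.
        if depth < max_depth:
            for link in HREF_RE.findall(pages[url]):
                queue.append((link, depth + 1))
    return pages

# Tiny in-memory "web" so the example runs offline (assumed data).
SITE = {
    "a": '<a href="b">B</a> <a href="c">C</a>',
    "b": '<a href="c">C</a>',
    "c": "leaf",
}

only_start = crawl("a", SITE.__getitem__, max_depth=0)  # start page only
one_hop = crawl("a", SITE.__getitem__, max_depth=1)     # start page plus its direct links
```

With `max_depth=0` the loop never queues a link, so only the page that was asked for comes back.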

ratheesan · January 2010

Hi,

I tried the above method and saved the result as a text file. The saved text contains HTML tags, image URLs, etc. Is there any way to save only the text (the text a user sees when opening the web page)?

Thanks,
Ratheesan

land · January 2010

Hi,
With 5.0 this will be easy. In 4.x, the only option is to set the content type of TextInput to html, so that all tags are filtered out.

Greetings,
Sebastian
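
As an illustration of what "filtering out all tags" amounts to, here is a rough sketch using only Python's standard `html.parser` module. This is not what TextInput does internally; the class `VisibleText` and the function `visible_text` are hypothetical names, and the handling of `script` and `style` is an assumption about what counts as text a user sees.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}  # assumed set of non-visible elements

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def visible_text(html):
    """Return the text content of an HTML string, tags removed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

sample = (
    "<html><head><style>p{color:red}</style></head><body>"
    '<p>Hello <b>world</b></p><img src="x.png">'
    "<script>var a=1;</script></body></html>"
)
text = visible_text(sample)
```

The tags, the image URL, and the script and style bodies are dropped, and the remaining text can be written to a file for text mining.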
