How to read text in RapidMiner after a certain word
Hi,
I am new to RapidMiner. I need to parse a web page (which I can already do) and then read the content that follows a certain word. For example, the web page contains:
Heading
Paragraph 1
Automobiles stocks are A1,A2,A3,A4.
I want to read A1,A2,A3,A4, which come after the string "Automobiles stocks are".
Please help!!
Thanks
Answers
If you have no experience with XPath (it's less complex than it looks), you can instead use a regex in combination with Generate Attributes.
In both cases, you open your crawled web page with the Read Document operator. For XPath, keep the tags. For regex, you are probably better off selecting 'text only' in the operator's settings.
Thanks a lot for your response. The position of the paragraph can change, so I feel regex is the better option. Please let me know which operators (ETL stages) I should use to implement the regex and attributes so that I can fetch the required information.
Thanks
Read URL -> Read Document (extract text only) -> Documents to Data -> Generate Attributes (using regex)
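To check the logic of that chain outside RapidMiner, here is a rough Python equivalent. It is a minimal sketch and does not reproduce RapidMiner's internals. It assumes the page HTML is already in a string, standing in for Read URL or Get Pages. The standard-library html.parser plays the part of Read Document's 'text only' setting, and a regex search replaces Generate Attributes. SAMPLE_HTML, TextOnly, html_to_text and extract_after are hypothetical names made up for this illustration.

```python
import re
from html.parser import HTMLParser
from typing import Optional


class TextOnly(HTMLParser):
    """Collects the character data of an HTML page and drops every tag."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(html: str) -> str:
    # Rough stand-in for Read Document with "text only": keep text, drop markup.
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(part.strip() for part in parser.parts if part.strip())


def extract_after(text: str, phrase: str) -> Optional[str]:
    # Stand-in for the regex step: capture lazily from just after `phrase`
    # up to the first dot. (?s) lets "." cross line breaks if any remain.
    match = re.search(r"(?s)" + re.escape(phrase) + r"\s*(.*?)\.", text)
    return match.group(1) if match else None


# Hypothetical page modelled on the example in the question.
SAMPLE_HTML = """<html><body>
<h1>Heading</h1>
<p>Paragraph 1</p>
<p>Automobiles stocks are A1,A2,A3,A4.</p>
</body></html>"""

text = html_to_text(SAMPLE_HTML)
print(text)  # Heading Paragraph 1 Automobiles stocks are A1,A2,A3,A4.
print(extract_after(text, "Automobiles stocks are"))  # A1,A2,A3,A4
```

The sketch finds the value by the phrase rather than by the paragraph's position. That is why it keeps working if the paragraph moves on the page.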
I am using Get Page --> Read Document but get the error "Expected File Object but received Document". Please help. I am also not using Read URL because it expects a CSV file with comma-separated values. Please correct my understanding if I am wrong.
If you already have a webpage in document format you can skip the first step and attach it directly to Documents to Data.
Thanks for the reply.
I have created a process:
Read Excel -> Get Pages -> Data to Doc -> Documents to Data -> Generate Attributes
Can you please tell me which attribute holds the HTML body (the content of the web page) so that I can parse it?
Regards,
Karun
The output contains these attributes:
1. URL
2. Response Code
3. Response Message
...
But I cannot find the attribute that holds the HTML body (the content of the web page) to parse the data.
I am able to read the attributes now. Lastly, please help me with the regex: how do I get the data between two words in RapidMiner?
Word 1 : " Automobile stocks are"
Word 2 : "."
Thanks
(?s)^.*stocks are (.*?)\..*$
So: start at the beginning and skip everything until you find 'stocks are'. The (?s) flag lets . match line breaks as well. Then capture everything up to the first dot and let .*$ consume the rest.
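You can check that pattern quickly in Python. Its re module treats (?s), the lazy .*? and the ^/$ anchors the same way as Java's java.util.regex, which RapidMiner, being a Java application, presumably uses. This sketch assumes the regex is applied as a replace-all that keeps only group 1. The page text is made up.

```python
import re

# Pattern from the answer: skip everything up to "stocks are ", capture lazily
# up to the first dot, then consume the rest so a replace-all keeps only group 1.
PATTERN = r"(?s)^.*stocks are (.*?)\..*$"

# Hypothetical page text after "text only" extraction (note the line breaks).
page_text = "Heading\nParagraph 1\nAutomobiles stocks are A1,A2,A3,A4.\nMore text."

# Replace-all style: the whole text matches, so it is replaced by group 1.
# (Python writes the back-reference as \1; Java-style replaceAll uses $1.)
only_stocks = re.sub(PATTERN, r"\1", page_text)
print(only_stocks)             # A1,A2,A3,A4
print(only_stocks.split(","))  # ['A1', 'A2', 'A3', 'A4']

# Without (?s), "." stops at line breaks, the pattern cannot match from the
# start of the text, and the substitution leaves the input unchanged.
print(re.sub(r"^.*stocks are (.*?)\..*$", r"\1", page_text) == page_text)  # True

# Because ^.* is greedy, the LAST "stocks are" in the text wins.
two = "Bank stocks are B1,B2. Automobiles stocks are A1,A2."
print(re.sub(PATTERN, r"\1", two))  # A1,A2
```

If a page can mention "stocks are" more than once, make the leading part more specific, for example by including the word before it.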
Please help.
Regards,
Karun
I have tried these patterns:
1. Where(.*)Learn
2. (?=Where).*(?=Learn)
to fetch the string "Where developers learn", but with no luck.
Please help
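Here is one plausible explanation, sketched in Python. It assumes the page text really is spelled "Where developers learn" with a lowercase "learn", and that the regex is applied as a replace-all. Under those assumptions, two things go wrong. First, regex matching is case-sensitive, so "Learn" never matches "learn". Second, an unanchored pattern only rewrites the part of the text it matches. The sample text is hypothetical.

```python
import re

# Hypothetical page text; the target phrase is spelled with a lowercase "learn".
text = "Intro line\nWhere developers learn and share.\nFooter"

# Attempts 1 and 2 from the post: regexes are case-sensitive, so "Learn" never
# matches "learn" and both searches find nothing.
attempt1 = re.search(r"Where(.*)Learn", text)
attempt2 = re.search(r"(?=Where).*(?=Learn)", text)
print(attempt1, attempt2)  # None None

# (?i) makes the match case-insensitive; group(0) is the whole matched phrase.
found = re.search(r"(?i)Where(.*)Learn", text)
print(found.group(0))  # Where developers learn

# The lookahead version is zero-width at the end, so it stops BEFORE "learn".
look = re.search(r"(?i)(?=Where).*(?=Learn)", text)
print(repr(look.group(0)))  # 'Where developers '

# A replace-all with an unanchored pattern only rewrites the matched slice,
# so the surrounding text survives.
partial = re.sub(r"(?i)Where(.*)Learn", r"\1", text)
print(repr(partial))  # 'Intro line\n developers  and share.\nFooter'

# Anchoring with (?s)^.* ... .*$, as in the earlier answer, makes the pattern
# cover the whole text, so only the captured phrase is left.
phrase = re.sub(r"(?is)^.*(Where.*?learn).*$", r"\1", text)
print(phrase)  # Where developers learn
```

If the page actually capitalises "Learn", the (?i) flag is harmless. In that case the anchoring is the part that matters.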