[Solved] Syntax XPath
I can't find the right XPath syntax to extract data.
Right now I'm experimenting in Google Docs to find the right syntax. I'm trying to pull the review text from the following URL: http://www.tripadvisor.nl/ShowUserReviews-g188590-d2333086-r155685828-EasyHotel_Amsterdam-Amsterdam_North_Holland_Province.html#REVIEWS
With this syntax I get one specific review: //*[@id="review_155685828"]/text()
I want to extract all the reviews on that page, but I can't find the right syntax. Does anybody know what syntax I have to use to retrieve all the review text from that page?
The next step is to use the XPath in RapidMiner.
Thanks, Arno
Answers
This is what I was looking for but couldn't figure out myself, so thank you very much. I tried to use it in RapidMiner, but I don't get any results. Do you know what I'm doing wrong?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.000" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="30">
<list key="text_directories">
<parameter key="All" value="C:\Improve Your Business\Qing\Pilot\test\crawl"/>
</list>
<parameter key="create_word_vector" value="false"/>
<process expanded="true">
<operator activated="true" class="text:extract_information" compatibility="5.3.000" expanded="true" height="60" name="Extract Information" width="90" x="112" y="30">
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="id=&quot;REVIEWS&quot;" value="//h:div[@id=&quot;REVIEWS&quot;]//h:p[starts-with(@id, &quot;review_&quot;)]/text()"/>
</list>
<list key="namespaces"/>
<list key="index_queries"/>
</operator>
<connect from_port="document" to_op="Extract Information" to_port="document"/>
<connect from_op="Extract Information" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
P.S. I added the h: prefix in RapidMiner.
Thanks, Arno
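The h: prefix matters when the page is parsed as XHTML, because its elements then live in the http://www.w3.org/1999/xhtml namespace and an unprefixed path matches nothing. A minimal Python sketch of that effect, assuming a simplified, hypothetical page fragment (the SAMPLE markup and the helper name review_texts are illustrative and not taken from the real page). The standard library's ElementTree does not support starts-with(), so the id filter is applied in Python instead:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified XHTML standing in for the review page.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="REVIEWS">
      <p id="review_1">First review.</p>
      <p id="review_2">Second review.</p>
      <p id="other">Not a review.</p>
    </div>
  </body>
</html>"""

# Map the h: prefix to the XHTML namespace (assumed to match what the prefix means in the process above).
NS = {"h": "http://www.w3.org/1999/xhtml"}


def review_texts(xhtml):
    """Return the text of every <p> under div#REVIEWS whose id starts with 'review_'."""
    root = ET.fromstring(xhtml)
    paragraphs = root.findall(".//h:div[@id='REVIEWS']//h:p", NS)
    # ElementTree's XPath subset has no starts-with(), so filter on the id here.
    return [p.text for p in paragraphs if p.get("id", "").startswith("review_")]


all_reviews = review_texts(SAMPLE)

# The same path without the prefix finds nothing, because the elements are namespaced.
unprefixed = ET.fromstring(SAMPLE).findall(".//div[@id='REVIEWS']//p")

print(all_reviews)
print(unprefixed)
```

If the prefixed query returns rows and the unprefixed one returns none, the namespace is the likely cause of an empty result.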
Much better. The only thing is that with the XPath syntax in RapidMiner I get 1 review, while the same syntax in Google Docs gives me all 6 reviews. Do you know how that is possible?
Thanks, Arno
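One possible explanation, stated as an assumption rather than confirmed behavior: a tool that stores a single value per query keeps only the first matching node, while a tool that returns a node list shows every match. The Python sketch below illustrates that difference on a hypothetical six-review fragment (PAGE, QUERY and the review texts are made up for illustration), using ElementTree's find (first match) and findall (all matches):

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with six reviews, mirroring the count mentioned above.
items = "".join(f'<p id="review_{i}">Review {i}</p>' for i in range(1, 7))
PAGE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    f'<div id="REVIEWS">{items}</div>'
    "</body></html>"
)

NS = {"h": "http://www.w3.org/1999/xhtml"}
QUERY = ".//h:div[@id='REVIEWS']//h:p"

root = ET.fromstring(PAGE)

# find() stops at the first node that matches the path.
first_match = root.find(QUERY, NS)

# findall() returns every matching node in document order.
all_matches = root.findall(QUERY, NS)

print(first_match.text)
print(len(all_matches))
```

If this is what happens, the query itself is fine and the difference lies in how each tool consumes the result set.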