[Solved] XPath syntax
I can't find the right XPath syntax to extract data.
Right now I'm experimenting in Google Docs to find the right syntax. I'm trying to pull the review text from the following URL: http://www.tripadvisor.nl/ShowUserReviews-g188590-d2333086-r155685828-EasyHotel_Amsterdam-Amsterdam_North_Holland_Province.html#REVIEWS
With this syntax I get one specific review: //*[@id="review_155685828"]/text()
I want to extract all the reviews on that page, but I can't find the right syntax. Does anybody know what syntax I have to use to retrieve all the review text from that page?
The next step is to use the XPath in RapidMiner.
Thanks, Arno
Answers
This is what I was looking for but couldn't figure out myself, so thank you very much. I tried to use it in RapidMiner, but I don't get any results. Do you know what I'm doing wrong?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.008">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.000" expanded="true" height="76" name="Process Documents from Files" width="90" x="45" y="30">
<list key="text_directories">
<parameter key="All" value="C:\Improve Your Business\Qing\Pilot\test\crawl"/>
</list>
<parameter key="create_word_vector" value="false"/>
<process expanded="true">
<operator activated="true" class="text:extract_information" compatibility="5.3.000" expanded="true" height="60" name="Extract Information" width="90" x="112" y="30">
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="id=&quot;REVIEWS&quot;" value="//h:div[@id=&quot;REVIEWS&quot;]//h:p[starts-with(@id, &quot;review_&quot;)]/text()"/>
</list>
<list key="namespaces"/>
<list key="index_queries"/>
</operator>
<connect from_port="document" to_op="Extract Information" to_port="document"/>
<connect from_op="Extract Information" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
P.S. I added the h: prefix in RapidMiner.
Thanks, Arno
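For anyone who wants to check the query outside RapidMiner or Google Docs, here is a minimal Python sketch of what the intended XPath selects: every p element under the div with id "REVIEWS" whose id starts with "review_". The sample markup is a hypothetical, simplified stand-in and not the real TripAdvisor page. It also assumes the h: prefix stands for the XHTML namespace. Python's built-in ElementTree only supports a subset of XPath and has no starts-with(), so the id prefix is checked in plain Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the review page (not the real markup).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="REVIEWS">
      <p id="review_155685828">Clean room, tiny bathroom.</p>
      <p id="review_155685829">Good value for Amsterdam.</p>
      <p id="other">Not a review.</p>
    </div>
    <p id="review_999">Outside the REVIEWS block.</p>
  </body>
</html>"""

# Assumption: the h: prefix maps to the XHTML namespace.
NS = {"h": "http://www.w3.org/1999/xhtml"}


def extract_reviews(xhtml):
    """Return the text of every review paragraph inside the REVIEWS div."""
    root = ET.fromstring(xhtml)
    # Equivalent of //h:div[@id="REVIEWS"]//h:p, minus the starts-with() test.
    paragraphs = root.findall(".//h:div[@id='REVIEWS']//h:p", NS)
    # Emulate starts-with(@id, "review_") in Python.
    return [
        p.text.strip()
        for p in paragraphs
        if p.get("id", "").startswith("review_") and p.text
    ]


reviews = extract_reviews(SAMPLE)
print(reviews)
```

Without the namespace mapping the same path matches nothing on namespaced XHTML, which is one common reason for an empty result.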
Much better. The only thing is that with the XPath syntax in RapidMiner I get 1 review, while the same syntax in Google Docs gets me all 6 reviews. Do you know how that is possible?
Thanks, Arno
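One possible explanation, offered as an assumption and not confirmed anywhere in this thread: a tool that stores a single value per query keeps only the first matching node, while a tool that returns a list keeps them all. The Python sketch below only illustrates that difference on hypothetical sample markup; it does not reproduce RapidMiner's or Google Docs' actual behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with three review paragraphs.
SAMPLE = """<div id="REVIEWS">
  <p id="review_1">First review.</p>
  <p id="review_2">Second review.</p>
  <p id="review_3">Third review.</p>
</div>"""

root = ET.fromstring(SAMPLE)

# Single-valued lookup: only the first matching node is returned.
first_only = root.find("./p").text

# Multi-valued lookup: every matching node is returned, in document order.
all_matches = [p.text for p in root.findall("./p")]

print(first_only)
print(all_matches)
```

If that is what happens here, the query itself is fine and the difference lies in how each tool consumes the result.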