Xpath returning ?

b00122599 · November 2019

Hey folks,

I am using XPath for the first time with RapidMiner, via the Extract Information operator. However, I keep getting "?" as the output for my attribute. I have checked in Chrome that the XPath is correct, and I've tried placing variants of h: /h: //h: at the start of the XPath in the query expression field, but no matter how I edit this field I still get ? as the result for the attribute.

Any pointers would be much appreciated.

Cheers,

Neil.

b00122599 · November 2019

Hey folks, thanks for the kind replies. I think I need to go do some more learning and come back with a better question. Thanks again!

b00122599 · November 2019

Hey folks, problem solved: Excel had added formatting to my URLs when I was importing the links! All working now! Thanks again for the help!
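For anyone landing here with the same symptom, here is a minimal sketch of a sanity check for links exported from a spreadsheet. The post does not say what formatting Excel added, so the cases handled here (non-breaking spaces, zero-width spaces, stray whitespace and quotes) are assumptions, and `clean_url` is a hypothetical helper name, not part of RapidMiner.

```python
from urllib.parse import urlsplit


def clean_url(raw: str) -> str:
    """Strip common invisible debris from a URL cell and check it still parses.

    Assumption: the damage is limited to non-breaking spaces, zero-width
    spaces, surrounding whitespace, or surrounding quotes.
    """
    # Replace non-breaking spaces with plain spaces and drop zero-width spaces
    cleaned = raw.replace("\u00a0", " ").replace("\u200b", "")
    # Trim surrounding whitespace, then any surrounding quote characters
    cleaned = cleaned.strip().strip("'\"")
    parts = urlsplit(cleaned)
    # Reject anything that is not an absolute http(s) URL
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a usable URL: {raw!r}")
    return cleaned
```

Running each value of the link column through a check like this before importing it can show whether a "?" comes from the page never being fetched rather than from the XPath itself.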

sgenzer · November 2019

Hi @b00122599, hmm, I think we really need the XML that you're trying to parse, and your RapidMiner process XML.

MarcoBarradas · November 2019

Hi @b00122599, take a look at this thread:
https://community.rapidminer.com/discussion/14888/xpath-commands-working-in-google-docs-but-not-in-rapidminer
The next recommended step would be to take the free courses on the RM Academy:
https://academy.rapidminer.com/

If you have more questions, feel free to ask us. We do have the answer you are searching for, but I'm trying to show you the next steps for answering all the questions that will come after you figure out how to get past the "?" that is currently bothering you.

Best regards!

b00122599 · November 2019

Hey folks,

Sorry for reopening, but I'm still stuck. I am getting the correct results with XPath in Google Sheets using `//*[@id="centerFrameWhite"]/p[1]/b` on the website https://www.ntfa.net/universe/english/index.php?act=view&char=Afterburner .

However, I have tried this multiple different ways in RapidMiner with no success. Any help is much appreciated. I tried to follow the other link above but couldn't get it working.

XML is below:

<?xml version="1.0" encoding="UTF-8"?><process version="9.5.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.5.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="read_excel" compatibility="9.5.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="136"> <parameter key="excel_file" value="D:\OneDrive\College\profilessmall.xlsx"/> <parameter key="sheet_selection" value="sheet number"/> <parameter key="sheet_number" value="1"/> <parameter key="imported_cell_range" value="A1"/> <parameter key="encoding" value="SYSTEM"/> <parameter key="first_row_as_names" value="true"/> <list key="annotations"/> <parameter key="date_format" value=""/> <parameter key="time_zone" value="SYSTEM"/> <parameter key="locale" value="English (United States)"/> <parameter key="read_all_values_as_polynominal" value="false"/> <list key="data_set_meta_data_information"/> <parameter key="read_not_matching_values_as_missings" value="true"/> <parameter key="datamanagement" value="double_array"/> <parameter key="data_management" value="auto"/> </operator> <operator activated="true" class="web:retrieve_webpages" compatibility="9.0.000" expanded="true" height="68" name="Get Pages" width="90" x="246" y="136"> <parameter key="link_attribute" value="LINKS"/> <parameter key="random_user_agent" value="false"/> <parameter key="user_agent" value="googlebot"/> <parameter key="connection_timeout" value="10000"/> <parameter key="read_timeout" value="10000"/> <parameter key="follow_redirects" value="true"/> <parameter key="accept_cookies" value="none"/> <parameter key="cookie_scope" value="global"/> <parameter 
key="request_method" value="GET"/> <parameter key="delay" value="none"/> <parameter key="delay_amount" value="1000"/> <parameter key="min_delay_amount" value="0"/> <parameter key="max_delay_amount" value="1000"/> </operator> <operator activated="true" class="text:process_document_from_data" compatibility="8.2.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="447" y="136"> <parameter key="create_word_vector" value="true"/> <parameter key="vector_creation" value="TF-IDF"/> <parameter key="add_meta_information" value="true"/> <parameter key="keep_text" value="false"/> <parameter key="prune_method" value="none"/> <parameter key="prune_below_percent" value="3.0"/> <parameter key="prune_above_percent" value="30.0"/> <parameter key="prune_below_rank" value="0.05"/> <parameter key="prune_above_rank" value="0.95"/> <parameter key="datamanagement" value="double_sparse_array"/> <parameter key="data_management" value="auto"/> <parameter key="select_attributes_and_weights" value="false"/> <list key="specify_weights"/> <process expanded="true"> <operator activated="true" class="text:extract_information" compatibility="8.2.000" expanded="true" height="68" name="Extract Information" width="90" x="246" y="34"> <parameter key="query_type" value="XPath"/> <list key="string_machting_queries"/> <parameter key="attribute_type" value="Nominal"/> <list key="regular_expression_queries"/> <list key="regular_region_queries"/> <list key="xpath_queries"> <parameter key="Robotname" value="h://*[@id=&quot;centerFrameWhite&quot;]/h:p[1]/h:b"/> </list> <list key="namespaces"/> <parameter key="ignore_CDATA" value="true"/> <parameter key="assume_html" value="true"/> <list key="index_queries"/> <list key="jsonpath_queries"/> </operator> <operator activated="true" class="web:extract_html_text_content" compatibility="9.0.000" expanded="true" height="68" name="Extract Content" width="90" x="447" y="34"> <parameter key="extract_content" value="true"/> <parameter 
key="minimum_text_block_length" value="500"/> <parameter key="override_content_type_information" value="true"/> <parameter key="neglegt_span_tags" value="true"/> <parameter key="neglect_p_tags" value="true"/> <parameter key="neglect_b_tags" value="true"/> <parameter key="neglect_i_tags" value="true"/> <parameter key="neglect_br_tags" value="true"/> <parameter key="ignore_non_html_tags" value="true"/> </operator> <connect from_port="document" to_op="Extract Information" to_port="document"/> <connect from_op="Extract Information" from_port="document" to_op="Extract Content" to_port="document"/> <connect from_op="Extract Content" from_port="document" to_port="document 1"/> <portSpacing port="source_document" spacing="0"/> <portSpacing port="sink_document 1" spacing="0"/> <portSpacing port="sink_document 2" spacing="0"/> </process> </operator> <connect from_op="Read Excel" from_port="output" to_op="Get Pages" to_port="Example Set"/> <connect from_op="Get Pages" from_port="Example Set" to_op="Process Documents from Data" to_port="example set"/> <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator></process>
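A note on the `h:` prefix in the query above: in XPath, a namespace prefix belongs on element names (`h:p`, `h:b`), not in front of the leading `//`. The sketch below illustrates this with Python's standard library rather than RapidMiner. It assumes the page ends up as XHTML with the usual XHTML namespace bound to `h`, which is what the `h:` convention suggests. The sample markup is invented for illustration and is not the real page.

```python
import xml.etree.ElementTree as ET

# Invented stand-in for the page, already in the XHTML namespace
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div id="centerFrameWhite">
      <p><b>Afterburner</b> is a sample name.</p>
      <p><b>Second</b> paragraph.</p>
    </div>
  </body>
</html>"""

# Assumption: the prefix "h" is bound to the XHTML namespace
NS = {"h": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(XHTML)

# Unprefixed element names look for elements in no namespace, so nothing matches
unprefixed = root.find(".//*[@id='centerFrameWhite']/p[1]/b")

# Prefixing each named element step makes the same path match
prefixed = root.find(".//*[@id='centerFrameWhite']/h:p[1]/h:b", NS)

print(unprefixed)     # None
print(prefixed.text)  # Afterburner
```

By the same logic, a query of the form `//*[@id="centerFrameWhite"]/h:p[1]/h:b` would be the thing to try in the Extract Information operator. Whether that alone resolves the "?" depends on the page actually being retrieved, so treat it as a starting point rather than a confirmed fix.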
