XPath returning "?"
Hey folks,
I am using XPath for the first time in RapidMiner, with the Extract Information operator, but I keep getting "?" as the output for my attribute. I have checked in Chrome that the XPath is correct, and I've tried placing variants of h:, /h: and //h: at the start of the XPath in the query expression field, but no matter how I edit this field I still get "?" as the result for the attribute.
Any pointers would be much appreciated.
Cheers,
Neil.
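RapidMiner displays missing values as "?", so that output means the query matched nothing. A common cause is namespaces: assuming that, with "assume html" enabled, the page is converted to XHTML and its elements live in the XHTML namespace under the prefix h, every element step of the path needs h:, not just the start of the expression. The sketch below illustrates this with Python's built-in ElementTree on a made-up page (the id "content" and the helper `texts` are hypothetical). ElementTree implements only a subset of XPath, so the paths start with .// instead of //.

```python
import xml.etree.ElementTree as ET

# Assumption: the page is treated as XHTML, so every element is in the
# XHTML namespace, bound here to the prefix "h".
XHTML_NS = "http://www.w3.org/1999/xhtml"
NAMESPACES = {"h": XHTML_NS}

# A tiny, made-up XHTML page standing in for the downloaded document.
PAGE = f"""<html xmlns="{XHTML_NS}">
  <body>
    <div id="content"><p><b>Example</b></p></div>
  </body>
</html>"""

root = ET.fromstring(PAGE)

def texts(path, namespaces=None):
    """Return the text of every element matched by an ElementTree path."""
    return [el.text for el in root.findall(path, namespaces)]

# No prefix: unprefixed names mean "no namespace", so p and b never match.
print(texts(".//*[@id='content']/p[1]/b"))                    # []
# Prefix only on the first step: p and b are still unprefixed, no match.
print(texts(".//h:*[@id='content']/p[1]/b", NAMESPACES))      # []
# Prefix on every element step: the namespaced elements are found.
print(texts(".//h:*[@id='content']/h:p[1]/h:b", NAMESPACES))  # ['Example']
```

The first two results are the Python analogue of getting "?" in RapidMiner: the expression is syntactically fine but selects nothing.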
Best Answers
- b00122599: Hey folks, thanks for the kind replies. I think I need to go do some more learning and come back with a better question. Thanks again!
- b00122599: Hey folks, problem solved! Excel had added formatting to my URLs when I imported the links. All working now. Thanks again for the help!
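Since the fix turned out to be in the link column rather than in the XPath, it can be worth checking the links before they reach Get Pages. The thread does not say what formatting Excel added, so the sketch below is an assumption: it strips surrounding whitespace, wrapping quotes and a few invisible characters (non-breaking and zero-width spaces), then flags cells that do not parse as absolute http(s) URLs. The names `clean_url` and `looks_fetchable` are hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: typical characters a spreadsheet can carry along with a link.
INVISIBLE = "\u00a0\u200b\u200c\u200d\ufeff"

def clean_url(raw: str) -> str:
    """Return a link cell with invisible characters, whitespace and quotes removed."""
    # Drop invisible characters first so they cannot hide quotes or spaces.
    for ch in INVISIBLE:
        raw = raw.replace(ch, "")
    return raw.strip().strip("\"'").strip()

def looks_fetchable(url: str) -> bool:
    """True if the string parses as an absolute http(s) URL with a host part."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

cell = '\u00a0"https://www.ntfa.net/universe/english/index.php?act=view&char=Afterburner"\u200b '
url = clean_url(cell)
print(url)                                        # https://www.ntfa.net/universe/english/index.php?act=view&char=Afterburner
print(looks_fetchable(url))                       # True
print(looks_fetchable("www.ntfa.net/universe/"))  # False (no scheme)
```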
Answers
https://community.rapidminer.com/discussion/14888/xpath-commands-working-in-google-docs-but-not-in-rapidminer
The next recommended step would be to take the free courses at the RapidMiner Academy:
https://academy.rapidminer.com/
If you have more questions, feel free to ask us. We do have the answer you are searching for, but I'm trying to show you the next steps for answering all the questions that will come after you get past the "?" that is currently bothering you.
Best regards!
Sorry for reopening, but I'm still stuck. I am getting the correct results with XPath in Google Sheets using "//*[@id="centerFrameWhite"]/p[1]/b" on the website https://www.ntfa.net/universe/english/index.php?act=view&char=Afterburner.
However, I have tried this several different ways in RapidMiner with no success. I tried to follow the link above but couldn't get it working. Any help is much appreciated.
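The Google Sheets query uses unprefixed element names. Assuming RapidMiner's "assume html" option places every element in the XHTML namespace bound to h, each element step of that query would need the prefix. A prefix written in front of the slashes, as in h://*, is not valid XPath, because a prefix belongs to an element name. The hypothetical helper below performs the rewrite with a regular expression; it is only a sketch for simple location paths like this one and does not handle axes, functions, or predicates that contain "/".

```python
import re

def prefix_steps(xpath: str, prefix: str = "h") -> str:
    """Prefix every element name test in a simple location path with `prefix:`.

    Hypothetical helper: steps must be a name or *, optionally followed by
    [...] predicates without "/". Already-prefixed steps are left alone.
    """
    # A step starts right after "/" or "//"; the negative lookahead skips
    # names that are followed by ":" (i.e. that already carry a prefix).
    step = re.compile(r"(/+)(\*|[A-Za-z_][\w.-]*)(?![\w.-]*:)")
    return step.sub(lambda m: f"{m.group(1)}{prefix}:{m.group(2)}", xpath)

sheets_query = '//*[@id="centerFrameWhite"]/p[1]/b'
print(prefix_steps(sheets_query))  # //h:*[@id="centerFrameWhite"]/h:p[1]/h:b
```

Running it a second time on its own output returns the same string, since already-prefixed steps are skipped.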
XML is below:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.5.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.5.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="read_excel" compatibility="9.5.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="136">
        <parameter key="excel_file" value="D:\OneDrive\College\profilessmall.xlsx"/>
        <parameter key="sheet_selection" value="sheet number"/>
        <parameter key="sheet_number" value="1"/>
        <parameter key="imported_cell_range" value="A1"/>
        <parameter key="encoding" value="SYSTEM"/>
        <parameter key="first_row_as_names" value="true"/>
        <list key="annotations"/>
        <parameter key="date_format" value=""/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="locale" value="English (United States)"/>
        <parameter key="read_all_values_as_polynominal" value="false"/>
        <list key="data_set_meta_data_information"/>
        <parameter key="read_not_matching_values_as_missings" value="true"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <operator activated="true" class="web:retrieve_webpages" compatibility="9.0.000" expanded="true" height="68" name="Get Pages" width="90" x="246" y="136">
        <parameter key="link_attribute" value="LINKS"/>
        <parameter key="random_user_agent" value="false"/>
        <parameter key="user_agent" value="googlebot"/>
        <parameter key="connection_timeout" value="10000"/>
        <parameter key="read_timeout" value="10000"/>
        <parameter key="follow_redirects" value="true"/>
        <parameter key="accept_cookies" value="none"/>
        <parameter key="cookie_scope" value="global"/>
        <parameter key="request_method" value="GET"/>
        <parameter key="delay" value="none"/>
        <parameter key="delay_amount" value="1000"/>
        <parameter key="min_delay_amount" value="0"/>
        <parameter key="max_delay_amount" value="1000"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="8.2.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="447" y="136">
        <parameter key="create_word_vector" value="true"/>
        <parameter key="vector_creation" value="TF-IDF"/>
        <parameter key="add_meta_information" value="true"/>
        <parameter key="keep_text" value="false"/>
        <parameter key="prune_method" value="none"/>
        <parameter key="prune_below_percent" value="3.0"/>
        <parameter key="prune_above_percent" value="30.0"/>
        <parameter key="prune_below_rank" value="0.05"/>
        <parameter key="prune_above_rank" value="0.95"/>
        <parameter key="datamanagement" value="double_sparse_array"/>
        <parameter key="data_management" value="auto"/>
        <parameter key="select_attributes_and_weights" value="false"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:extract_information" compatibility="8.2.000" expanded="true" height="68" name="Extract Information" width="90" x="246" y="34">
            <parameter key="query_type" value="XPath"/>
            <list key="string_machting_queries"/>
            <parameter key="attribute_type" value="Nominal"/>
            <list key="regular_expression_queries"/>
            <list key="regular_region_queries"/>
            <list key="xpath_queries">
              <parameter key="Robotname" value="h://*[@id=&quot;centerFrameWhite&quot;]/h:p[1]/h:b"/>
            </list>
            <list key="namespaces"/>
            <parameter key="ignore_CDATA" value="true"/>
            <parameter key="assume_html" value="true"/>
            <list key="index_queries"/>
            <list key="jsonpath_queries"/>
          </operator>
          <operator activated="true" class="web:extract_html_text_content" compatibility="9.0.000" expanded="true" height="68" name="Extract Content" width="90" x="447" y="34">
            <parameter key="extract_content" value="true"/>
            <parameter key="minimum_text_block_length" value="500"/>
            <parameter key="override_content_type_information" value="true"/>
            <parameter key="neglegt_span_tags" value="true"/>
            <parameter key="neglect_p_tags" value="true"/>
            <parameter key="neglect_b_tags" value="true"/>
            <parameter key="neglect_i_tags" value="true"/>
            <parameter key="neglect_br_tags" value="true"/>
            <parameter key="ignore_non_html_tags" value="true"/>
          </operator>
          <connect from_port="document" to_op="Extract Information" to_port="document"/>
          <connect from_op="Extract Information" from_port="document" to_op="Extract Content" to_port="document"/>
          <connect from_op="Extract Content" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Get Pages" to_port="Example Set"/>
      <connect from_op="Get Pages" from_port="Example Set" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
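Inside a process XML file, every double quote in a parameter value has to be written as &quot;, the closing one included; an unescaped " ends the attribute early and the file is no longer well-formed. The Python sketch below uses the standard library's ElementTree to show the escaping a serializer produces for an xpath_queries entry. The query string puts h: on every element step, which is an assumption about what RapidMiner expects rather than something confirmed in this thread. Writing the predicate with single quotes, [@id='centerFrameWhite'], avoids the escaping altogether.

```python
import xml.etree.ElementTree as ET

# Assumed query form: h: on every element step, double quotes in the predicate.
query = '//h:*[@id="centerFrameWhite"]/h:p[1]/h:b'

# Build a <parameter> element like the one inside <list key="xpath_queries">.
param = ET.Element("parameter", key="Robotname", value=query)
serialized = ET.tostring(param, encoding="unicode")
print(serialized)
# <parameter key="Robotname" value="//h:*[@id=&quot;centerFrameWhite&quot;]/h:p[1]/h:b" />

# Parsing the serialized element back restores the original query text.
print(ET.fromstring(serialized).get("value") == query)  # True
```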