FeatureExtraction - xpath - span
Hi,
I am trying to use FeatureExtraction to extract some text from a web page (XHTML), but it does not seem to work.
The XPath location I get from Mozilla Firebug is: /html/body/span/div/div/h2
In RapidMiner, the XPath query I am using is: /h:html/h:body/h:span/h:div/h:div/h:h2/text()
This query does not seem to work in RapidMiner.
But if I remove the span tag from the web page, the resulting XPath query seems to work: /h:html/h:body/h:div/h:div/h:h2/text()
So my question is: how do I extract text from a web page that has a span tag?
Thanks
Answers
I don't really have an idea right now, but the problem could be that "span" is an inline element and is not actually allowed to contain block-level container tags like div, right? Maybe that is why the XPath query fails here.
Cheers,
Ingo
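For anyone who wants to check this outside RapidMiner, here is a minimal sketch in Python, assuming the page is well-formed XHTML in the standard XHTML namespace. The sample page, the heading text, and the `extract_h2` helper are made up for illustration. It shows that a strict XML parser does match the path through span, and that a descendant query does not depend on the span step at all. Whether RapidMiner's HTML cleanup moves or drops the span before the query runs is only an assumption here, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Prefix "h" bound to the XHTML namespace, mirroring the h: prefix in the queries above.
XHTML_NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical sample page: a span wrapping nested divs, as described in the question.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>demo</title></head>
  <body>
    <span>
      <div>
        <div>
          <h2>Target heading</h2>
        </div>
      </div>
    </span>
  </body>
</html>"""


def extract_h2(xhtml, path):
    """Return the text of every element matched by `path`, relative to the root html element."""
    root = ET.fromstring(xhtml)  # strict XML parse, no HTML cleanup step
    # ElementTree has no text() step, so read .text from each matched element instead.
    return [el.text for el in root.findall(path, XHTML_NS)]


# Full path including the span step (equivalent of /h:html/h:body/h:span/h:div/h:div/h:h2).
with_span = extract_h2(PAGE, "./h:body/h:span/h:div/h:div/h:h2")

# Same path without the span step: matches nothing in this unmodified tree.
without_span = extract_h2(PAGE, "./h:body/h:div/h:div/h:h2")

# Descendant query (equivalent of //h:h2): independent of the ancestors.
anywhere = extract_h2(PAGE, ".//h:h2")

print(with_span)
print(without_span)
print(anywhere)
```

If the document really is rearranged before the query is evaluated in RapidMiner, a descendant query such as //h:h2/text() may be worth trying, since it does not rely on the exact chain of ancestors.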