[SOLVED] xpath
Below is an example XML.
<p>
Thisisgood
</p>
<p>
Thisisbad
</p>
<p>
This
<br>
is
<br>
acceptable
</p>
<p>
Thisisfine
</p>
I want the result:
Thisisgood
Thisisbad
Thisisacceptable
Thisisfine
I use the XPath //p/text() in Google Docs (=importXML). Ultimately, I will use //h:p/text() in RapidMiner (with the Extract Information operator). This results in:
Thisisgood
Thisisbad
This is acceptable (appearing in different cells)
Thisisfine
What XPath would give me the result I need? Thank you.
Best regards,
Marius
Thisisgood
Thisisbad
Thisisacceptable
Thisisfine
I DO NOT want:
This is acceptable (appearing in different cells)
Thanks.
This is the community forum. For guaranteed response times, please consider getting a support contract. During the holidays our main focus is not on free support.
However, let's focus on your issue: which versions of RapidMiner and the Text and Web extension are you using? I can't reproduce the behavior with text in different cells with Extract Information. In the latest versions, Extract Information delivers only the first result node; in the case of //h:p/text() that would be "This" in the "This is acceptable" case, which is surely not what you want either. So in your case the approach would be to cut the document into its p tags and then extract the content of each p node with Extract Content. Optionally, you can then use Replace to remove the spaces.
Please see the process below for details.
Best regards,
Marius
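For readers who want to see the idea outside RapidMiner, here is a minimal Python sketch of the approach described in the reply above: split the document into its p elements, collect all text inside each one (ignoring nested tags such as br), then strip the whitespace. It is not the RapidMiner process from this thread. The class name ParagraphText and the function extract_paragraphs are made up for illustration, and the sample input is the snippet from the question, treated as HTML because the unclosed br tags are not well-formed XML.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text of each <p> element, ignoring nested tags such as <br>.

    Hypothetical helper for illustration; assumes <p> elements are not nested.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # one cleaned string per <p>
        self._parts = None    # text pieces of the <p> currently open, else None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "p" and self._parts is not None:
            # Join every text node of this paragraph, then drop all whitespace
            # (the equivalent of the optional Replace step in the reply).
            joined = "".join(self._parts)
            self.paragraphs.append("".join(joined.split()))
            self._parts = None

    def handle_data(self, data):
        # Only keep text that sits inside an open <p>.
        if self._parts is not None:
            self._parts.append(data)


def extract_paragraphs(html):
    """Return one whitespace-free string per <p> element in the input."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


sample = """
<p>
Thisisgood
</p>
<p>
Thisisbad
</p>
<p>
This
<br>
is
<br>
acceptable
</p>
<p>
Thisisfine
</p>
"""

print(extract_paragraphs(sample))
```

The key point is that the text is gathered per p element rather than per text node. In XPath 1.0 terms this corresponds to taking the string value of each p node instead of selecting its text() children, which is why //p/text() splits "This", "is" and "acceptable" into separate results. Whether a given tool can evaluate a string value per node is something to check for that tool.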