Split a single XML file into several docs or an example set
mohammadreza
Member Posts: 23 Contributor II
Hi. I am new to RapidMiner text plugin.
I have an XML file consisting of <document> elements. Each document tag contains one document as follows:
<documents>
<document>
<id> 1 </id>
<text>...............</text>
</document>
<document>
<id> 1 </id>
<text>...............</text>
</document>
...
</documents>
I think I have to split them first and extract documents to be able to construct the word vector. Is there any way to do that?
Answers
Dortmund, Germany
I think the Read XML operator is the wise option, but I need to do some text classification after that. That's why I wanted to work with documents through the text plugin. Assuming that, following your explanation, I use Read XML, is there any way to work with the text plugin? I mean, how should I connect the output of Read XML to an operator like "Process Document", or any other operator, so that I can do the tokenization and stemming and build the word vector?
Thanks in advance.
Looks to me like an XPath expression can solve this.
Have you tried the import wizard?
Sadly, I have no time to try it myself, but I guess it works.
Best,
Martin
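To illustrate the XPath idea outside RapidMiner, here is a minimal Python sketch using the standard library's ElementTree. It is not the import wizard's own mechanism. The function name split_documents and the sample contents are made up for illustration, and it assumes every <document> sits directly under <documents> with an <id> and a <text> child, as in the question.

```python
import xml.etree.ElementTree as ET


def split_documents(xml_string):
    """Return one (id, text) pair per <document> element.

    Hypothetical helper: assumes the layout shown in the question.
    """
    root = ET.fromstring(xml_string)
    pairs = []
    # "./document" is a simple XPath expression selecting the
    # <document> elements that are direct children of the root.
    for doc in root.findall("./document"):
        # findtext returns None when the child is missing, hence the "or".
        doc_id = (doc.findtext("id") or "").strip()
        text = (doc.findtext("text") or "").strip()
        pairs.append((doc_id, text))
    return pairs


# Made-up sample data in the same shape as the file in the question.
sample = """<documents>
<document><id> 1 </id><text>first text</text></document>
<document><id> 2 </id><text>second text</text></document>
</documents>"""

for doc_id, text in split_documents(sample):
    print(doc_id, text)
```

Each pair corresponds to one row (or one document) that could then go on to tokenization and word vector creation.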
Dortmund, Germany
The wizard might get slow because it caches the file at some point, but it still works.
Dortmund, Germany
http://stackoverflow.com/questions/700213/xml-split-of-a-large-file/7823719#7823719
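If the file is too large to load in one go, a streaming parser is one way to split it. The sketch below is a Python illustration, not the approach from the linked answer and not a RapidMiner feature. The name iter_documents is hypothetical, and the <document>/<id>/<text> layout from the question is assumed.

```python
import io
import xml.etree.ElementTree as ET


def iter_documents(source):
    """Yield (id, text) for each <document>, parsing incrementally.

    `source` may be a file name or a binary file object.
    Hypothetical helper: assumes the layout shown in the question.
    """
    # An "end" event fires once an element has been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "document":
            doc_id = (elem.findtext("id") or "").strip()
            text = (elem.findtext("text") or "").strip()
            yield doc_id, text
            # Drop the finished element's children and text so that
            # already-processed documents do not pile up in memory.
            elem.clear()


# Made-up sample data; a real call would pass the path of the XML file.
data = io.BytesIO(
    b"<documents>"
    b"<document><id> 1 </id><text>first text</text></document>"
    b"<document><id> 2 </id><text>second text</text></document>"
    b"</documents>"
)

for doc_id, text in iter_documents(data):
    print(doc_id, text)
```

Each yielded pair could be written to its own file, which gives one small document per <document> element.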