"Traversing an XML DOM object (Using process documents?)"
Hello,
My goal is to traverse an XML document that has various attributes associated with features. Here is an example:
<attribute1 type="typeGoesHere">no consensus has yet emerged on the question of whether a dividend tax
penalty is capitalized into the return on a firm's common stock.</attribute1> <attribute3 type="typeGoesHere">The purpose of
this paper is to provide additional evidence on this question.</attribute3>
The tags follow this format. I have several hundred of these files and my goal is to traverse these files and assemble a 'bag of words' associated with each attribute and, if possible, each type as well.
I have, so far, tried:
'process documents from files'
> Extract content (WebMining/HTMLprocessing/)
> Tokenize
> Stemmer (snowball)
store example set
store word list
Any advice would be helpful. Thank you.
Answers
I would use the following approach: In the first step, only extract the content and do not generate a bag of words yet (switch vector creation off). You then have an example set with one attribute for each attribute in your XML. You can then process these attributes separately, using another Process Documents operator per attribute. Of course, you can also loop over all attributes using the Loop Attributes operator.
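For anyone who wants to check the result outside RapidMiner, the same two-step idea (extract the text per XML attribute first, then build one bag of words per attribute) can be sketched in plain Python. This is only a rough illustration, not the RapidMiner process itself: the function name `bags_from_fragment` is made up, it assumes each file is a sequence of sibling elements without a single root (as in the example above, and without an XML declaration), and it uses a simple regex tokenizer with no stemming.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def bags_from_fragment(xml_text):
    """Return {(tag, type): Counter of lowercase tokens} for one document."""
    # Assumption: the file holds sibling elements with no single root,
    # as in the example, so wrap it in a synthetic root before parsing.
    root = ET.fromstring("<root>" + xml_text + "</root>")
    bags = defaultdict(Counter)
    for elem in root:
        # Collect all text inside the element, including nested children.
        text = "".join(elem.itertext())
        # Simple tokenizer: runs of letters and apostrophes, lowercased.
        tokens = re.findall(r"[a-z']+", text.lower())
        # Key on tag name and "type" so both levels can be grouped later.
        bags[(elem.tag, elem.get("type"))].update(tokens)
    return bags


sample = (
    '<attribute1 type="typeGoesHere">no consensus has yet emerged on the '
    "question of whether a dividend tax penalty is capitalized into the "
    "return on a firm's common stock.</attribute1> "
    '<attribute3 type="typeGoesHere">The purpose of this paper is to '
    "provide additional evidence on this question.</attribute3>"
)

bags = bags_from_fragment(sample)
for (tag, type_), counts in bags.items():
    print(tag, type_, counts.most_common(3))
```

To cover several hundred files, you would call the function once per file and add the returned counters into one running total per `(tag, type)` key. Stemming is left out here on purpose; it would slot in between tokenizing and counting.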
Greetings,
Sebastian