Meta data problem
Benedict_von_Ah
Member Posts: 8 Contributor I
Hey,
I'm using the "Dictionary Based Sentiment Analysis" extension and have a problem with some of the meta data output at the end. Everything else works fine, but I cannot see the token number. What I want to do: screen the texts and score each one, so that the output is negative/positive plus the number of uncovered tokens. To make use of the "number of uncovered tokens", I need to know the total number of tokens in each text. I'm using "Extract Token Number", but its result is not displayed at the end.
Thanks for your help
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:filter_stopwords_german" compatibility="7.5.000" expanded="true" height="68" name="Filter Stopwords (German)" width="90" x="447" y="34">
<parameter key="stop_word_list" value="Standard"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:filter_by_length" compatibility="7.5.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="179" y="187">
<parameter key="min_chars" value="4"/>
<parameter key="max_chars" value="40"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:transform_cases" compatibility="7.5.000" expanded="true" height="68" name="Transform Cases (2)" width="90" x="313" y="187">
<parameter key="transform_to" value="lower case"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<operator activated="true" class="text:extract_token_number" compatibility="7.5.000" expanded="true" height="68" name="Extract Token Number" width="90" x="514" y="187">
<parameter key="metadata_key" value="token_number"/>
<parameter key="condition" value="all"/>
<parameter key="case_sensitive" value="false"/>
<parameter key="invert_condition" value="false"/>
</operator>
</process>
Answers
Your process XML appears to be malformed and won't render. Are you sure this is the XML from a single complete process?
In the meantime, "Extract Token Number" is meant to be used inside "Process Documents" so you'll need to incorporate it there in your workflow.
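As a rough sketch of that nesting (an illustration, not your actual process): the token operators from your post sit in the inner subprocess of "Process Documents", with "Extract Token Number" last in the chain. The `text:process_documents` class key, the `add_meta_information` parameter, and the port names (`document`, `document 1`) are written from memory and are assumptions here, so compare them with a process exported from your own Studio. The connection from your document source into "Process Documents" is left out.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<process version="7.6.001">
  <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
      <!-- Assumed wrapper operator; class key not taken from the original post -->
      <operator activated="true" class="text:process_documents" compatibility="7.5.000" expanded="true" name="Process Documents">
        <!-- Assumption: this keeps document meta data (e.g. token_number) in the result -->
        <parameter key="add_meta_information" value="true"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" name="Tokenize">
            <parameter key="mode" value="non letters"/>
          </operator>
          <operator activated="true" class="text:extract_token_number" compatibility="7.5.000" expanded="true" name="Extract Token Number">
            <parameter key="metadata_key" value="token_number"/>
            <parameter key="condition" value="all"/>
          </operator>
          <!-- Inner wiring: document in, Tokenize, Extract Token Number, document out -->
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Extract Token Number" to_port="document"/>
          <connect from_op="Extract Token Number" from_port="document" to_port="document 1"/>
        </process>
      </operator>
    </process>
  </operator>
</process>
```

Note that this is a single XML declaration with one root `<process>` element, which is what a complete exported process should look like.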
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Sorry but I have a tangential question... @Telcontar120 - this thing seems to happen a lot. Any idea why people's XML gets corrupted in this way? @Benedict_von_Ah if you could help me understand how you pasted the XML, this would be helpful. Thanks!
Scott
@sgenzer I really don't know about the XML corruption. I remember discussing this at one point in the past with @Thomas_Ott, and I think he thought it was some kind of problem with the Lithium site backend.
Lindon Ventures
Yeah, that's what I'm worried about, but I don't see that problem when experienced users post code, only with new users. I assume this is not corrupted for you?
Scott
@sgenzer yep, that one's fine for me (nice Keras model, btw)
Lindon Ventures
Now that more people are posting XML, I think I might have to rethink my original hypothesis. It appears that it is mostly new users who post corrupted XML.