Problem with StopwordFilterFile
nguyenxuanhau
My process XML is:
<process version="4.6">
  <operator name="Root" class="Process" expanded="yes">
    <description text="Text Hau"/>
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <operator name="TextInput" class="TextInput" expanded="yes">
      <list key="texts">
        <parameter key="graphics" value="dulieu"/>
      </list>
      <parameter key="default_content_type" value=""/>
      <parameter key="default_content_encoding" value="utf-8"/>
      <parameter key="default_content_language" value=""/>
      <parameter key="prune_below" value="-1"/>
      <parameter key="prune_above" value="-1"/>
      <parameter key="vector_creation" value="TermOccurrences"/>
      <parameter key="use_content_attributes" value="false"/>
      <parameter key="use_given_word_list" value="false"/>
      <parameter key="return_word_list" value="false"/>
      <parameter key="id_attribute_type" value="short"/>
      <list key="namespaces">
      </list>
      <parameter key="create_text_visualizer" value="false"/>
      <parameter key="on_the_fly_pruning" value="-1"/>
      <parameter key="extend_exampleset" value="false"/>
      <operator name="StringTokenizer" class="StringTokenizer">
      </operator>
      <operator name="StopwordFilterFile" class="StopwordFilterFile">
        <parameter key="file" value="dulieu/stopword.txt"/>
        <parameter key="case_sensitive" value="true"/>
      </operator>
    </operator>
  </operator>
</process>
When I run this process, it doesn't filter out the stopwords that are encoded in UTF-8.
Answers
If you switch RapidMiner to expert mode in the Parameters view, you will see that the StopwordFilterFile operator has an additional encoding parameter. If you set this parameter to UTF-8, the process will work.
Greetings,
Sebastian
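Applied to the process in the question, the fix would look roughly like the fragment below. This is a minimal sketch that assumes the expert-mode encoding parameter is stored in the process XML under the key `encoding`; the answer names the parameter but not its XML key, so it is worth checking the XML view after setting it in the GUI.

```xml
<operator name="StopwordFilterFile" class="StopwordFilterFile">
  <parameter key="file" value="dulieu/stopword.txt"/>
  <parameter key="case_sensitive" value="true"/>
  <!-- Assumed XML key for the expert-mode encoding parameter: read dulieu/stopword.txt as UTF-8 -->
  <parameter key="encoding" value="UTF-8"/>
</operator>
```

Judging from the answer, neither the Root process's `encoding` setting nor TextInput's `default_content_encoding` (both already UTF-8 in the question's process) is applied when the stopword file is read. That is why the file needs its own encoding setting.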