"[solved] struggling with word list feature"
Hi,
Can anyone help me, please? I am struggling with the text processing section. I am able to tokenize, but I never get any results when I try to create a word frequency list. It can't be that difficult, as there are plenty of simple tools on the web for calculating word frequency lists.
The only thing that has worked for me so far is the Process Documents from Files operator, and even that shows results for only one of the two directories I chose.
Just now I ran Process Documents again with a mix of operators (Tokenize, Transform Cases and Filter Stopwords (English)), but got no output. Here is the process flow. I don't have any programming background, so I am just following the way other posts have been filed.
I hope someone can help me out here.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
    <parameter key="logfile" value="C:\Users\user3\Desktop\dir text\5.txt"/>
    <parameter key="resultfile" value="C:\Users\user3\Desktop\dir text\New Text Document.txt"/>
    <process expanded="true" height="100" width="145">
      <operator activated="true" class="text:process_documents" compatibility="5.1.004" expanded="true" height="76" name="Process Documents" width="90" x="45" y="30">
        <process expanded="true" height="414" width="762">
          <operator activated="true" class="text:tokenize" compatibility="5.1.004" expanded="true" height="60" name="Tokenize" width="90" x="31" y="27"/>
          <operator activated="true" class="text:transform_cases" compatibility="5.1.004" expanded="true" height="60" name="Transform Cases" width="90" x="181" y="26"/>
          <operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.004" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="345" y="25"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
          <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
          <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
      <connect from_op="Process Documents" from_port="word list" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Answers
Is the XML you posted the whole process? If so, you aren't getting any results because you aren't providing any documents to the "Process Documents" operator.
You can use the "Read Document" operator to load documents. Connect its output to the input of "Process Documents" and your results should no longer be empty.
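As a rough sketch of what that could look like in the process XML: the fragment below would sit inside the outer <process> element, next to the existing "Process Documents" operator. The operator class "text:read_document", its "file" parameter and the port names "output" and "documents 1" are assumptions based on the Text Processing extension and may differ in your version, and the file path is only a placeholder. Dragging the operator in and wiring it up in the GUI is the safer route.

```xml
<!-- Sketch only: class name, parameter key and port names are assumptions; check them in your version -->
<operator activated="true" class="text:read_document" compatibility="5.1.004" expanded="true" height="60" name="Read Document" width="90" x="45" y="120">
  <!-- Hypothetical path: point this at one of your own text files -->
  <parameter key="file" value="C:\path\to\your\document.txt"/>
</operator>
<!-- The "Process Documents" operator stays exactly as posted; this only feeds it a document -->
<connect from_op="Read Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
```

With a document arriving at its input, the "word list" port that is already connected to "result 2" should have something to report.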
Greetings
Nils
Thanks a lot,
I guess I'll have to play around with it a lot now.
It's nice to have people to count on, and great work is happening here.
Please do keep it up,
Sana