Open WordNet Dictionary and Extract Sentiment
Hi there,
I am using the Open WordNet Dictionary operator together with the Extract Sentiment (English) operator, and I have set the dictionary path to point to the correct folder. I have a couple of hundred text files that I want to analyze. If I analyze just 8 of those files, the process runs fine; however, if I try to run it against 9 or more of the text files, I get an I/O error saying that the resource can't be read and parsed.
Is it possible to have the WordNet operator run once and remember the list of words, instead of running again each time the Process Documents from Files operator reads a new file?
If this is not possible, is there another way to overcome this issue?
Many thanks.
Answers
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:process_document_from_file" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Files" width="90" x="246" y="34">
<list key="text_directories">
<parameter key="fx_bukley_reviews" value="C:Tripadvisor Text Files 1"/>
</list>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="112" y="85"/>
<operator activated="true" class="text:transform_cases" compatibility="8.1.000" expanded="true" height="68" name="Transform Cases" width="90" x="246" y="85"/>
<operator activated="true" class="text:filter_by_length" compatibility="8.1.000" expanded="true" height="68" name="Filter Tokens (by Length)" width="90" x="380" y="85">
<parameter key="min_chars" value="3"/>
</operator>
<operator activated="true" class="text:stem_porter" compatibility="8.1.000" expanded="true" height="68" name="Stem (Porter)" width="90" x="581" y="85"/>
<operator activated="true" class="wordnet:open_wordnet_dictionary" compatibility="5.3.000" expanded="true" height="68" name="Open WordNet Dictionary" width="90" x="581" y="187">
<parameter key="directory" value="C:\Desktop\WordNet-3.0\dict"/>
</operator>
<operator activated="true" class="wordnet:find_sentiment_wordnet" compatibility="5.3.000" expanded="true" height="82" name="Extract Sentiment (English)" width="90" x="782" y="85"/>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Stem (Porter)" to_port="document"/>
<connect from_op="Stem (Porter)" from_port="document" to_op="Extract Sentiment (English)" to_port="document"/>
<connect from_op="Open WordNet Dictionary" from_port="dictionary" to_op="Extract Sentiment (English)" to_port="dictionary"/>
<connect from_op="Extract Sentiment (English)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_attributes" compatibility="9.0.003" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="34">
<list key="function_descriptions">
<parameter key="Recommend" value="if(sentiment&lt;0,&quot;NO&quot;,&quot;YES&quot;)"/>
</list>
</operator>
<connect from_port="input 1" to_op="Process Documents from Files" to_port="word list"/>
<connect from_op="Process Documents from Files" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Does the error always occur on the same .txt file? Can you perform some tests?
So that we can reproduce what you observe, can you share:
- your .txt files (at least 9 of them)
- your dictionary
Regards,
Lionel
I executed your process on 12 of my own .txt files (with the WordNet 3.0 dictionary) and had no problem: your process works fine.
So my hypothesis is that one of your .txt files is causing the problem.
Regards,
Lionel
A possible lead:
Try setting file pattern = *.txt (instead of file pattern = *) in the Process Documents from Files parameters.
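In the process XML, the same change would look roughly like the fragment below. This is a sketch only: just the relevant part of the Process Documents from Files operator is shown, the inner subprocess (Tokenize through Extract Sentiment) is left out, and the parameter key `file_pattern` is assumed to be the XML name of the "file pattern" field shown in the GUI.

```xml
<operator activated="true" class="text:process_document_from_file" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Files" width="90" x="246" y="34">
  <!-- keep your existing text_directories list here, unchanged -->
  <!-- assumed key name for the GUI's "file pattern" field: only read files ending in .txt -->
  <parameter key="file_pattern" value="*.txt"/>
  <!-- the inner <process> subprocess (Tokenize ... Extract Sentiment) is omitted here -->
</operator>
```

With `*`, every file in the folder is read as a document, including any non-text file that happens to sit there; `*.txt` restricts the input to the review files.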
Hope it helps,
Regards,
Lionel
I'm not able to reproduce the I/O error: your process works fine on my computer with the 16 .txt files you shared.
Can you give more details about the I/O error you encountered? Can you share the RapidMiner log file?
Regards,
Lionel
Which OS are you running?
Regards,
Lionel
I'm running Windows 10 too, and with your files and your WordNet dictionary the process works fine here.
I don't know what to think...
Try updating RapidMiner to the latest release (RM 9.1) and, if needed, the WordNet extension.
Does anyone have an idea?
Regards,
Lionel
If anyone else has suggestions, they would be appreciated. Thanks.