"FP-Growth and Create Association Operators Hanging"
TyrellCorp
Member Posts: 3 Contributor I
It's been a week of working with RapidMiner Studio 5.3, and I have yet to successfully process text files with the FP-Growth and Create Association Rules operators. I even tried a one-page text document: it processes continuously without stopping, then freezes. Sometimes it generates a "Process Failed" message. What am I doing wrong? I'm running RapidMiner on OS X 10.8.5, 2.3 GHz Intel Core i7, 8 GB RAM.
I found two tutorials on YouTube, both of which run a similar process successfully:
https://www.youtube.com/watch?v=HBrYuV8eWjc
https://www.youtube.com/watch?v=oXrUz5CWM4E
I have also read about similar issues (http://rapid-i.com/rapidforum/index.php/topic,7572.0.html, http://rapid-i.com/rapidforum/index.php/topic,7541.0.html).
My code is below:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="loop_files" compatibility="5.3.015" expanded="true" height="76" name="Loop Files" width="90" x="45" y="75">
<parameter key="directory" value="/Users/ddavis/Desktop/Emotion"/>
<process expanded="true">
<operator activated="true" class="text:read_document" compatibility="5.3.002" expanded="true" height="60" name="Read Document" width="90" x="112" y="75"/>
<connect from_port="file object" to_op="Read Document" to_port="file"/>
<connect from_op="Read Document" from_port="output" to_port="out 1"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:process_documents" compatibility="5.3.002" expanded="true" height="94" name="Process Documents" width="90" x="246" y="210">
<parameter key="vector_creation" value="Binary Term Occurrences"/>
<parameter key="prune_below_absolute" value="1"/>
<parameter key="prune_above_absolute" value="999"/>
<process expanded="true">
<operator activated="true" class="text:transform_cases" compatibility="5.3.002" expanded="true" height="60" name="Transform Cases" width="90" x="45" y="120"/>
<operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize" width="90" x="179" y="210"/>
<operator activated="true" class="text:filter_stopwords_english" compatibility="5.3.002" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="313" y="120"/>
<operator activated="true" class="text:filter_by_length" compatibility="5.3.002" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="447" y="210">
<parameter key="min_chars" value="2"/>
<parameter key="max_chars" value="9999"/>
</operator>
<operator activated="true" class="text:stem_snowball" compatibility="5.3.002" expanded="true" height="60" name="Stem (Snowball)" width="90" x="648" y="120"/>
<connect from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_op="Stem (Snowball)" to_port="document"/>
<connect from_op="Stem (Snowball)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="numerical_to_binominal" compatibility="5.3.015" expanded="true" height="76" name="Numerical to Binominal" width="90" x="447" y="120"/>
<operator activated="true" class="fp_growth" compatibility="5.3.015" expanded="true" height="76" name="FP-Growth" width="90" x="581" y="255"/>
<operator activated="true" class="create_association_rules" compatibility="5.3.015" expanded="true" height="76" name="Create Association Rules" width="90" x="782" y="120"/>
<connect from_port="input 1" to_op="Loop Files" to_port="in 1"/>
<connect from_op="Loop Files" from_port="out 1" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="example set" to_op="Numerical to Binominal" to_port="example set input"/>
<connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
What does the error say when it fails? What does the log say (user home/.RapidMiner5/rm.log)?
How do you start RapidMiner, and how much memory is available (see "View" -> "Show View" -> "System Monitor")?
Regards,
Marco
This comes up rather often, and has been recognised as a weakness since April 2011. The problem is that RM 5.3 can only handle small numbers of attributes, because the association rule generator tries to generate the powerset of each frequent itemset found. You may find this link illuminating.
https://rapid-i.com/rapidforum/index.php/topic,6837.0.html
I'm not sure whether this got fixed in the new version; perhaps an RM staffer could let us know?
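To make the powerset point concrete, here is a minimal Python sketch of how many candidate rules a single frequent itemset produces when every split into premise and conclusion is tried. It only illustrates the growth described above; it is not RapidMiner's code, and the names `candidate_rules` and `count_candidate_rules` are made up for this example.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield every (premise, conclusion) split of one frequent itemset.

    Both sides must be non-empty, so an itemset of n items gives 2**n - 2 splits.
    """
    items = sorted(itemset)
    for k in range(1, len(items)):
        for premise in combinations(items, k):
            conclusion = tuple(i for i in items if i not in premise)
            yield premise, conclusion

def count_candidate_rules(n):
    """Number of non-empty premise/conclusion splits for an itemset of n items."""
    return 2 ** n - 2

if __name__ == "__main__":
    # The count doubles with every extra item in the itemset.
    for n in (3, 10, 20, 30):
        print(n, count_candidate_rules(n))
```

A text process with hundreds of word attributes can easily yield itemsets of 20 or more items, at which point a single itemset already accounts for over a million candidate splits.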
Best
H
I use RapidMiner on Mac and Windows. Below is some of the information requested.
Mac, 8 GB
System monitor:
Max: 1.7 GB
Total: 1.7 GB
Windows, 16 GB
System monitor:
Max: 10 GB
Total: 10 GB
The Log has the message:
Mar 25, 2014 10:17:48 AM INFO: No filename given for result file, using stdout for logging results!
Mar 25, 2014 10:17:48 AM INFO: Process starts
Mar 25, 2014 10:17:48 AM INFO: Loading initial data.
Mar 25, 2014 10:18:52 AM INFO: Process stopped. Completing current operator.
Mar 25, 2014 10:18:52 AM INFO: FP-Growth: Process stopped.
Mar 25, 2014 10:18:52 AM INFO: Process stopped in FP-Growth
I just downloaded RM 6 and will see if this is still an issue.
Funnily enough I've just attended a webinar which showed a word associator, not in RM, but interesting nevertheless ( datamonkees.wpengine.com ), so I sympathise, as I too have trod this path. As to your work, the problem could be one or more of the following.
Pre-pain checks...
Memory - are you really sure that you've given Java enough space?
Data - put a break in after each of the pre-processing operators, especially the last (poly -> binominal). Is it always showing what you expect? No missings, all just clean and hunky-dory?
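Outside RapidMiner, the same data check can be sketched in a few lines of Python. This assumes the binominal table has been exported as a list of rows mapping each term to True or False; the function name `check_binary_table` and the sample rows are hypothetical.

```python
def check_binary_table(rows):
    """Return (row_index, column, value) for every cell that is not a clean boolean.

    A column that is absent from a row is reported with the value None,
    which is how a missing value would show up in this representation.
    """
    columns = set()
    for row in rows:
        columns.update(row)

    problems = []
    for i, row in enumerate(rows):
        for col in sorted(columns):
            value = row.get(col)  # None when the column is missing in this row
            if value is not True and value is not False:
                problems.append((i, col, value))
    return problems

if __name__ == "__main__":
    rows = [
        {"happi": True, "sad": False},
        {"happi": True},  # "sad" is missing here
    ]
    print(check_binary_table(rows))
```

An empty result means every cell is a plain True or False, which is what the itemset miner expects to see.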
In situ hints...
KISS - Keep It Simple Sometimes, at least to start. High frequency threshold, look for specific stems, only short sets, or specific items etc... Do whatever you can to produce a few short itemsets. So a break after FP-Growth, just to admire your handiwork.
FART - Finally Analyse Real Things. Don't worry about making rules until your itemsets roll through. Then you'll see my point kicking in: unless the code has changed, long itemsets will choke the association rules generator because of the powerset approach. Only RM staffers can tell you about RM6.
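The "few short itemsets first" idea can be sketched in Python as below. This is a brute-force search over a toy set of stemmed documents, not FP-Growth and not RapidMiner's implementation; `frequent_itemsets`, `min_support`, `max_len` and the sample `docs` are all illustrative assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_len):
    """Brute-force search for itemsets of at most max_len items.

    transactions is a list of sets of items; an itemset is kept when the
    fraction of transactions containing it is at least min_support.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, max_len + 1):
        found_any = False
        for candidate in combinations(items, k):
            cand = set(candidate)
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                result[candidate] = support
                found_any = True
        if not found_any:
            break  # no frequent set of size k, so none of size k + 1 either
    return result

if __name__ == "__main__":
    docs = [
        {"happi", "love", "joy"},
        {"happi", "love"},
        {"sad", "love"},
        {"happi", "joy"},
    ]
    # High threshold and short sets: inspect these before generating any rules.
    for itemset, support in frequent_itemsets(docs, min_support=0.5, max_len=2).items():
        print(itemset, support)
```

Raising `min_support` or lowering `max_len` shrinks the output quickly, which is the point of starting simple: once the itemsets look sensible and short, rule generation has far fewer splits to enumerate.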
Good luck with your project, I spend a lot of time in this "meme machine" space, and find it very rewarding as an exploratory tool.
Best
H
Tyrell Corporation