Basics of FP-Growth
University Professor
Hello all,
I am struggling quite a bit with the FP-Growth operator. I get all sorts of errors (a "no binominal attributes" error even after I manually set them to binominal, outputs that I cannot understand, etc.). I am trying to run the smallest possible example: 2 transactions, 3 products (juice, meat and milk). My Excel file looks like this:
0 0 1
0 0 1
What am I doing wrong? What are the basic errors one should avoid when using FP-Growth? I read the RM help page on this operator and also found it extremely confusing. Any help is appreciated; I just want to use the operator in the simplest possible way.
Regards,
Bernardo
Best Answer
bernardo_pagnon
Member, University Professor Posts: 64
Oh, now I see: this option has two modes, and when "find min number of itemsets" is checked, it ignores this minimum value. Solved!
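For anyone landing here later, the behaviour described above can be sketched roughly as follows. This is a toy brute-force miner in Python, not RapidMiner's implementation. The function names, the toy transactions and the starting threshold of 0.9 are assumptions for illustration; only the parameter names (`min_number_of_itemsets`, `max_number_of_retries`, `requirement_decrease_factor`) come from the operator's process XML, and the exact rule the operator uses to relax its threshold may differ between versions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute force: return {itemset: support} for itemsets meeting min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            support = count / n
            if support >= min_support:
                result[candidate] = support
    return result


def mine_with_min_number(transactions, start_support, min_number_of_itemsets,
                         max_number_of_retries, requirement_decrease_factor):
    """Assumed mechanism: relax the threshold until enough itemsets are found."""
    threshold = start_support
    found = frequent_itemsets(transactions, threshold)
    retries = 0
    while len(found) < min_number_of_itemsets and retries < max_number_of_retries:
        threshold *= requirement_decrease_factor
        found = frequent_itemsets(transactions, threshold)
        retries += 1
    return found, threshold


# Hypothetical toy data: four receipts over juice, meat and milk.
transactions = [
    {"juice", "milk"},
    {"milk"},
    {"meat", "milk"},
    {"juice", "meat", "milk"},
]

# Fixed-threshold mode: only itemsets with support >= 0.9 are returned.
strict = frequent_itemsets(transactions, 0.9)

# "Find min number" mode: the threshold the user typed no longer decides
# the result, because it is lowered until at least 3 itemsets are found.
relaxed, final_threshold = mine_with_min_number(transactions, 0.9, 3, 15, 0.9)

print(len(strict), len(relaxed), round(final_threshold, 4))
```

The point of the sketch is only that, in the second mode, the number of itemsets requested drives the result, so changing the support value appears to have no effect.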
Answers
I think there is something weird going on. Following exactly the same steps as the author suggests, I got the same result as he did: for instance, the frequency of "juices" as a single item was 0.780, while the one for desserts was 0.312. Then I rebuilt the same process, this time using the "Read CSV" and "Numerical to Binominal" operators. The resulting frequencies were 0.220 for juices and 0.312 for desserts. I checked in Excel using COUNTIF, and the second set of results seems to be the correct one. Strange. It seems that either RM is not counting those singletons properly, or some operator inverts a few of the values. I would appreciate it if someone could check that.
Best,
Bernardo
I tested on the same market data downloaded from http://rapidminerbook.com/index.php/chapter-downloads/chapter-8/
The frequency output for "juices" is shown as 0.219613, which matches your Excel COUNTIF result.
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve Supermarket_Extracted" width="90" x="313" y="85"> <parameter key="repository_entry" value="//demo/FP-Growth/Supermarket_Extracted"/> </operator> <operator activated="true" class="set_role" compatibility="9.6.000" expanded="true" height="82" name="Set Role" width="90" x="447" y="85"> <parameter key="attribute_name" value="receipt_id"/> <parameter key="target_role" value="id"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="numerical_to_binominal" compatibility="9.6.000" expanded="true" height="82" name="Numerical to Binominal" width="90" x="648" y="85"> <parameter key="attribute_filter_type" value="all"/> <parameter key="attribute" value=""/> <parameter key="attributes" value=""/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="numeric"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="real"/> <parameter key="block_type" value="value_series"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_series_end"/> <parameter key="invert_selection" value="false"/> <parameter key="include_special_attributes" value="false"/> <parameter key="min" value="0.0"/> <parameter key="max" value="0.0"/> </operator> <operator activated="true" class="concurrency:fp_growth" 
compatibility="9.6.000" expanded="true" height="82" name="FP-Growth" origin="GENERATED_SAMPLE" width="90" x="782" y="85"> <parameter key="input_format" value="items in dummy coded columns"/> <parameter key="item_separators" value="|"/> <parameter key="use_quotes" value="false"/> <parameter key="quotes_character" value="&quot;"/> <parameter key="escape_character" value="\"/> <parameter key="trim_item_names" value="true"/> <parameter key="positive_value" value="true"/> <parameter key="min_requirement" value="support"/> <parameter key="min_support" value="0.005"/> <parameter key="min_frequency" value="100"/> <parameter key="min_items_per_itemset" value="1"/> <parameter key="max_items_per_itemset" value="0"/> <parameter key="max_number_of_itemsets" value="1000000"/> <parameter key="find_min_number_of_itemsets" value="false"/> <parameter key="min_number_of_itemsets" value="100"/> <parameter key="max_number_of_retries" value="15"/> <parameter key="requirement_decrease_factor" value="0.9"/> <enumeration key="must_contain_list"/> </operator> <operator activated="true" class="create_association_rules" compatibility="9.6.000" expanded="true" height="82" name="Create Association Rules" origin="GENERATED_SAMPLE" width="90" x="916" y="34"> <parameter key="criterion" value="confidence"/> <parameter key="min_confidence" value="0.1"/> <parameter key="min_criterion_value" value="0.8"/> <parameter key="gain_theta" value="2.0"/> <parameter key="laplace_k" value="1.0"/> </operator> <connect from_op="Retrieve Supermarket_Extracted" from_port="output" to_op="Set Role" to_port="example set input"/> <connect from_op="Set Role" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/> <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/> <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/> <connect from_op="Create Association Rules" 
from_port="rules" to_port="result 1"/> <connect from_op="Create Association Rules" from_port="item sets" to_port="result 2"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> </process> </operator> </process>
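As a side note on the two numbers reported earlier in the thread: 0.780 and 0.220 add up to 1.000, which is what you would see if one setup counted the rows where the item is absent instead of present. That is only a possible explanation, not something confirmed here. The Python sketch below illustrates the arithmetic under that assumption. The helper names and the toy column are hypothetical; the `[0.0, 0.0]` range mirrors the `min` and `max` values of the Numerical to Binominal operator in the process above, assuming values inside the range become false and all others true, and `positive_value` mirrors the FP-Growth parameter of the same name.

```python
def numerical_to_binominal(value, lower=0.0, upper=0.0):
    """Assumed mapping: values inside [lower, upper] become False, others True."""
    return not (lower <= value <= upper)


def support(column, positive_value=True):
    """Fraction of rows whose binominal value equals positive_value."""
    return sum(1 for v in column if v == positive_value) / len(column)


# Hypothetical toy column: purchase counts of juice on 9 receipts.
juice_counts = [0, 1, 0, 0, 0, 2, 0, 0, 0]
juice = [numerical_to_binominal(v) for v in juice_counts]

# Support when "bought" (True) is treated as the positive value.
correct = support(juice, positive_value=True)

# Support when "not bought" (False) is mistakenly treated as positive.
inverted = support(juice, positive_value=False)

print(round(correct, 3), round(inverted, 3), round(correct + inverted, 3))
```

If the two frequencies you get from two versions of a process sum to 1 for a single item, it may be worth checking which value each version treats as positive before suspecting the counting itself.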
You have opened duplicate threads on the same question. To keep the discussion in one place and make the issue easier to track down, please continue at
https://community.rapidminer.com/discussion/45849/fp-growth-itemset-one-of-the-items-is-oversupported#latest