FP-Growth results inconsistent
Hi,
I'm a student using RapidMiner in a university data mining course. So far it has gone without any issues, and now I've arrived at association rules. Here I've hit a bump in the road, and I hope someone can point me in the right direction. I've chosen a dataset with a bunch of transactions (21 000). Every item has its own example (many examples can belong to the same transaction), so I've filtered out the irrelevant attributes and then converted the data to binominal to get the items as attributes.
I've tried many different ways to use the Aggregate operator. I ended up using concatenation and two Replace operators, the first converting everything containing true into true and the second converting everything containing false into false. Then I remove the transaction number with Select Attributes and convert the result into binominal.
This was a messy way to solve it, but it appeared to work. I then wanted to do it more cleanly, so I started experimenting with different approaches. One was converting the data to numerical instead of binominal, so that I could use the sum function in the Aggregate operator. I then remove the transaction attribute and convert the result into binominal.
As far as I can tell, the results are the same: 9531 examples, 95 attributes. Comparing them side by side, the true and false values also look the same for every attribute. However, the results from the FP-Growth operators that follow differ: one shows Bread with a support of 0.675 and the other gives Bread 0.325. I calculated it manually and the correct result is 0.325, which means the concatenation approach is incorrect. Now my question is: why? What am I missing? As far as I can tell, the input to the FP-Growth operator is the same in both cases. I'm aware that there must be much better ways to solve this problem, but what I'm most interested in is why the results differ between these two methods.
Thankful for any help.
Best Regards
David
It seems I'm not allowed to make links, but my dataset was downloaded from github.com/viktree/curly-octo-chainsaw
<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001"><br> <context><br> <input/><br> <output/><br> <macros/><br> </context><br> <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process"><br> <parameter key="logverbosity" value="init"/><br> <parameter key="random_seed" value="2001"/><br> <parameter key="send_mail" value="never"/><br> <parameter key="notification_email" value=""/><br> <parameter key="process_duration_for_mail" value="30"/><br> <parameter key="encoding" value="SYSTEM"/><br> <process expanded="true"><br> <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve Chapter5_Assignment" width="90" x="45" y="85"><br> <parameter key="repository_entry" value="../data/Chapter5_Assignment"/><br> </operator><br> <operator activated="true" class="select_attributes" compatibility="9.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="85"><br> <parameter key="attribute_filter_type" value="subset"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value="Item|Transaction"/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="attribute_value"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="time"/><br> <parameter key="block_type" value="attribute_block"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_matrix_row_start"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> </operator><br> <operator activated="true" class="nominal_to_binominal" compatibility="9.3.001" expanded="true" height="103" name="Nominal to Binominal" width="90" x="313" y="85"><br> <parameter key="return_preprocessing_model" value="false"/><br> <parameter key="create_view" value="false"/><br> 
<parameter key="attribute_filter_type" value="all"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="nominal"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="file_path"/><br> <parameter key="block_type" value="single_value"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="single_value"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> <parameter key="transform_binominal" value="false"/><br> <parameter key="use_underscore_in_name" value="false"/><br> </operator><br> <operator activated="true" class="aggregate" compatibility="9.3.001" expanded="true" height="82" name="Aggregate" width="90" x="447" y="85"><br> <parameter key="use_default_aggregation" value="true"/><br> <parameter key="attribute_filter_type" value="subset"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value="Item = Adjustment|Item = Afternoon with the baker|Item = Alfajores|Item = Argentina Night|Item = Art Tray|Item = Bacon|Item = Baguette|Item = Bakewell|Item = Bare Popcorn|Item = Basket|Item = Bowl Nic Pitt|Item = Bread|Item = Bread Pudding|Item = Brioche and salami|Item = Brownie|Item = Cake|Item = Caramel bites|Item = Cherry me Dried fruit|Item = Chicken sand|Item = Chicken Stew|Item = Chimichurri Oil|Item = Chocolates|Item = Christmas common|Item = Coffee|Item = Coffee granules |Item = Coke|Item = Cookies|Item = Crepes|Item = Crisps|Item = Drinking chocolate spoons |Item = Duck egg|Item = Dulce de Leche|Item = Eggs|Item = Ella's Kitchen Pouches|Item = Empanadas|Item = Extra Salami or Feta|Item = Fairy Doors|Item = Farm House|Item = Focaccia|Item = Frittata|Item = Fudge|Item = Gift voucher|Item = Gingerbread syrup|Item = Granola|Item = Hack 
the stack|Item = Half slice Monster |Item = Hearty & Seasonal|Item = Honey|Item = Hot chocolate|Item = Jam|Item = Jammie Dodgers|Item = Juice|Item = Keeping It Local|Item = Kids biscuit|Item = Lemon and coconut|Item = Medialuna|Item = Mighty Protein|Item = Mineral water|Item = Mortimer|Item = Muesli|Item = Muffin|Item = My-5 Fruit Shoot|Item = Nomad bag|Item = NONE|Item = Olum & polenta|Item = Panatone|Item = Pastry|Item = Pick and Mix Bowls|Item = Pintxos|Item = Polenta|Item = Postcard|Item = Raspberry shortbread sandwich|Item = Raw bars|Item = Salad|Item = Sandwich|Item = Scandinavian|Item = Scone|Item = Siblings|Item = Smoothies|Item = Soup|Item = Spanish Brunch|Item = Spread|Item = Tacos/Fajita|Item = Tartine|Item = Tea|Item = The BART|Item = The Nomad|Item = Tiffin|Item = Toast|Item = Truffles|Item = Tshirt|Item = Valentine's card|Item = Vegan Feast|Item = Vegan mincepie|Item = Victorian Sponge"/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="attribute_value"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="time"/><br> <parameter key="block_type" value="attribute_block"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_matrix_row_start"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> <parameter key="default_aggregation_function" value="concatenation"/><br> <list key="aggregation_attributes"/><br> <parameter key="group_by_attributes" value="Transaction"/><br> <parameter key="count_all_combinations" value="false"/><br> <parameter key="only_distinct" value="false"/><br> <parameter key="ignore_missings" value="true"/><br> </operator><br> <operator activated="true" class="replace" compatibility="9.3.001" expanded="true" height="82" name="Replace" width="90" x="581" y="85"><br> <parameter key="attribute_filter_type" 
value="all"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="nominal"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="file_path"/><br> <parameter key="block_type" value="single_value"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="single_value"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> <parameter key="replace_what" value=".*true.*"/><br> <parameter key="replace_by" value="true"/><br> </operator><br> <operator activated="true" class="replace" compatibility="9.3.001" expanded="true" height="82" name="Replace (2)" width="90" x="715" y="85"><br> <parameter key="attribute_filter_type" value="all"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="nominal"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="file_path"/><br> <parameter key="block_type" value="single_value"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="single_value"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> <parameter key="replace_what" value=".*false.*"/><br> <parameter key="replace_by" value="false"/><br> </operator><br> <operator activated="true" class="select_attributes" compatibility="9.3.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="581" y="238"><br> <parameter key="attribute_filter_type" value="single"/><br> <parameter key="attribute" value="Transaction"/><br> <parameter key="attributes" value=""/><br> 
<parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="attribute_value"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="time"/><br> <parameter key="block_type" value="attribute_block"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_matrix_row_start"/><br> <parameter key="invert_selection" value="true"/><br> <parameter key="include_special_attributes" value="false"/><br> </operator><br> <operator activated="true" class="nominal_to_binominal" compatibility="9.3.001" expanded="true" height="103" name="Nominal to Binominal (2)" width="90" x="715" y="238"><br> <parameter key="return_preprocessing_model" value="false"/><br> <parameter key="create_view" value="false"/><br> <parameter key="attribute_filter_type" value="all"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="nominal"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="file_path"/><br> <parameter key="block_type" value="single_value"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="single_value"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> <parameter key="transform_binominal" value="false"/><br> <parameter key="use_underscore_in_name" value="false"/><br> </operator><br> <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve Chapter5_Assignment (2)" width="90" x="45" y="391"><br> <parameter key="repository_entry" value="../data/Chapter5_Assignment"/><br> </operator><br> <operator activated="true" class="select_attributes" compatibility="9.3.001" expanded="true" 
height="82" name="Select Attributes (3)" width="90" x="179" y="391"><br> <parameter key="attribute_filter_type" value="subset"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value="Item|Transaction"/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="attribute_value"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="time"/><br> <parameter key="block_type" value="attribute_block"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_matrix_row_start"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> </operator><br> <operator activated="true" class="nominal_to_numerical" compatibility="9.3.001" expanded="true" height="103" name="Nominal to Numerical" width="90" x="313" y="391"><br> <parameter key="return_preprocessing_model" value="false"/><br> <parameter key="create_view" value="false"/><br> <parameter key="attribute_filter_type" value="all"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="nominal"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="file_path"/><br> <parameter key="block_type" value="single_value"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="single_value"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> <parameter key="coding_type" value="dummy coding"/><br> <parameter key="use_comparison_groups" value="false"/><br> <list key="comparison_groups"/><br> <parameter key="unexpected_value_handling" value="all 0 and warning"/><br> <parameter 
key="use_underscore_in_name" value="false"/><br> </operator><br> <operator activated="true" class="aggregate" compatibility="9.3.001" expanded="true" height="82" name="Aggregate (2)" width="90" x="447" y="391"><br> <parameter key="use_default_aggregation" value="true"/><br> <parameter key="attribute_filter_type" value="subset"/><br> <parameter key="attribute" value="Transaction"/><br> <parameter key="attributes" value="Item = Adjustment|Item = Afternoon with the baker|Item = Alfajores|Item = Argentina Night|Item = Art Tray|Item = Bacon|Item = Baguette|Item = Bakewell|Item = Bare Popcorn|Item = Basket|Item = Bowl Nic Pitt|Item = Bread|Item = Bread Pudding|Item = Brioche and salami|Item = Brownie|Item = Cake|Item = Caramel bites|Item = Cherry me Dried fruit|Item = Chicken sand|Item = Chicken Stew|Item = Chimichurri Oil|Item = Chocolates|Item = Christmas common|Item = Coffee|Item = Coffee granules |Item = Coke|Item = Cookies|Item = Crepes|Item = Crisps|Item = Drinking chocolate spoons |Item = Duck egg|Item = Dulce de Leche|Item = Eggs|Item = Ella's Kitchen Pouches|Item = Empanadas|Item = Extra Salami or Feta|Item = Fairy Doors|Item = Farm House|Item = Focaccia|Item = Frittata|Item = Fudge|Item = Gift voucher|Item = Gingerbread syrup|Item = Granola|Item = Hack the stack|Item = Half slice Monster |Item = Hearty & Seasonal|Item = Honey|Item = Hot chocolate|Item = Jam|Item = Jammie Dodgers|Item = Juice|Item = Keeping It Local|Item = Kids biscuit|Item = Lemon and coconut|Item = Medialuna|Item = Mighty Protein|Item = Mineral water|Item = Mortimer|Item = Muesli|Item = Muffin|Item = My-5 Fruit Shoot|Item = Nomad bag|Item = NONE|Item = Olum & polenta|Item = Panatone|Item = Pastry|Item = Pick and Mix Bowls|Item = Pintxos|Item = Polenta|Item = Postcard|Item = Raspberry shortbread sandwich|Item = Raw bars|Item = Salad|Item = Sandwich|Item = Scandinavian|Item = Scone|Item = Siblings|Item = Smoothies|Item = Soup|Item = Spanish Brunch|Item = Spread|Item = Tacos/Fajita|Item = 
Tartine|Item = Tea|Item = The BART|Item = The Nomad|Item = Tiffin|Item = Toast|Item = Truffles|Item = Tshirt|Item = Valentine's card|Item = Vegan Feast|Item = Vegan mincepie|Item = Victorian Sponge"/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="attribute_value"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="time"/><br> <parameter key="block_type" value="attribute_block"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_matrix_row_start"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> <parameter key="default_aggregation_function" value="sum"/><br> <list key="aggregation_attributes"/><br> <parameter key="group_by_attributes" value="Transaction"/><br> <parameter key="count_all_combinations" value="false"/><br> <parameter key="only_distinct" value="false"/><br> <parameter key="ignore_missings" value="true"/><br> </operator><br> <operator activated="true" class="select_attributes" compatibility="9.3.001" expanded="true" height="82" name="Select Attributes (4)" width="90" x="581" y="391"><br> <parameter key="attribute_filter_type" value="single"/><br> <parameter key="attribute" value="Transaction"/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="attribute_value"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="time"/><br> <parameter key="block_type" value="attribute_block"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_matrix_row_start"/><br> <parameter key="invert_selection" value="true"/><br> <parameter key="include_special_attributes" value="false"/><br> </operator><br> <operator 
activated="true" class="numerical_to_binominal" compatibility="9.3.001" expanded="true" height="82" name="Numerical to Binominal" width="90" x="715" y="391"><br> <parameter key="attribute_filter_type" value="all"/><br> <parameter key="attribute" value=""/><br> <parameter key="attributes" value=""/><br> <parameter key="use_except_expression" value="false"/><br> <parameter key="value_type" value="numeric"/><br> <parameter key="use_value_type_exception" value="false"/><br> <parameter key="except_value_type" value="real"/><br> <parameter key="block_type" value="value_series"/><br> <parameter key="use_block_type_exception" value="false"/><br> <parameter key="except_block_type" value="value_series_end"/><br> <parameter key="invert_selection" value="false"/><br> <parameter key="include_special_attributes" value="false"/><br> <parameter key="min" value="0.0"/><br> <parameter key="max" value="0.0"/><br> </operator><br> <operator activated="true" class="concurrency:fp_growth" compatibility="9.3.001" expanded="true" height="82" name="FP-Growth" width="90" x="983" y="85"><br> <parameter key="input_format" value="items in dummy coded columns"/><br> <parameter key="item_separators" value="|"/><br> <parameter key="use_quotes" value="false"/><br> <parameter key="quotes_character" value="""/><br> <parameter key="escape_character" value="\"/><br> <parameter key="trim_item_names" value="true"/><br> <parameter key="min_requirement" value="support"/><br> <parameter key="min_support" value="0.95"/><br> <parameter key="min_frequency" value="100"/><br> <parameter key="min_items_per_itemset" value="1"/><br> <parameter key="max_items_per_itemset" value="0"/><br> <parameter key="max_number_of_itemsets" value="1000000"/><br> <parameter key="find_min_number_of_itemsets" value="true"/><br> <parameter key="min_number_of_itemsets" value="100"/><br> <parameter key="max_number_of_retries" value="15"/><br> <parameter key="requirement_decrease_factor" value="0.9"/><br> <enumeration 
key="must_contain_list"/><br> </operator><br> <operator activated="true" class="concurrency:fp_growth" compatibility="9.3.001" expanded="true" height="82" name="FP-Growth (2)" width="90" x="849" y="391"><br> <parameter key="input_format" value="items in dummy coded columns"/><br> <parameter key="item_separators" value="|"/><br> <parameter key="use_quotes" value="false"/><br> <parameter key="quotes_character" value="""/><br> <parameter key="escape_character" value="\"/><br> <parameter key="trim_item_names" value="true"/><br> <parameter key="min_requirement" value="support"/><br> <parameter key="min_support" value="0.95"/><br> <parameter key="min_frequency" value="100"/><br> <parameter key="min_items_per_itemset" value="1"/><br> <parameter key="max_items_per_itemset" value="0"/><br> <parameter key="max_number_of_itemsets" value="1000000"/><br> <parameter key="find_min_number_of_itemsets" value="true"/><br> <parameter key="min_number_of_itemsets" value="100"/><br> <parameter key="max_number_of_retries" value="15"/><br> <parameter key="requirement_decrease_factor" value="0.9"/><br> <enumeration key="must_contain_list"/><br> </operator><br> <connect from_op="Retrieve Chapter5_Assignment" from_port="output" to_op="Select Attributes" to_port="example set input"/><br> <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/><br> <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Aggregate" to_port="example set input"/><br> <connect from_op="Aggregate" from_port="example set output" to_op="Replace" to_port="example set input"/><br> <connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/><br> <connect from_op="Replace (2)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/><br> <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Nominal to Binominal (2)" 
to_port="example set input"/><br> <connect from_op="Nominal to Binominal (2)" from_port="example set output" to_op="FP-Growth" to_port="example set"/><br> <connect from_op="Retrieve Chapter5_Assignment (2)" from_port="output" to_op="Select Attributes (3)" to_port="example set input"/><br> <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/><br> <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Aggregate (2)" to_port="example set input"/><br> <connect from_op="Aggregate (2)" from_port="example set output" to_op="Select Attributes (4)" to_port="example set input"/><br> <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/><br> <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth (2)" to_port="example set"/><br> <connect from_op="FP-Growth" from_port="example set" to_port="result 1"/><br> <connect from_op="FP-Growth" from_port="frequent sets" to_port="result 2"/><br> <connect from_op="FP-Growth (2)" from_port="example set" to_port="result 3"/><br> <connect from_op="FP-Growth (2)" from_port="frequent sets" to_port="result 4"/><br> <portSpacing port="source_input 1" spacing="0"/><br> <portSpacing port="sink_result 1" spacing="0"/><br> <portSpacing port="sink_result 2" spacing="0"/><br> <portSpacing port="sink_result 3" spacing="0"/><br> <portSpacing port="sink_result 4" spacing="0"/><br> <portSpacing port="sink_result 5" spacing="0"/><br> <portSpacing port="sink_result 6" spacing="0"/><br> <portSpacing port="sink_result 7" spacing="0"/><br> </process><br> </operator><br></process>
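For readers without RapidMiner at hand, the second branch of the process above (dummy coding, sum per transaction, then binarising) can be sketched in plain Python. This is only an illustration under assumptions: the rows are made-up toy data, not the poster's dataset, and the names `rows`, `counts`, `baskets` and `support` are hypothetical helpers rather than anything from RapidMiner.

```python
from collections import defaultdict

# Toy long-format data (transaction id, item); NOT the real dataset.
rows = [
    (1, "Bread"),
    (2, "Scandinavian"),
    (2, "Scandinavian"),
    (3, "Coffee"),
    (3, "Bread"),
    (4, "Coffee"),
]

items = sorted({item for _, item in rows})

# Summing dummy-coded columns per transaction is the same as counting
# how often each item occurs in that transaction.
counts = defaultdict(lambda: dict.fromkeys(items, 0))
for tid, item in rows:
    counts[tid][item] += 1

# Binarise: a count of 0 means "item absent", anything else "item present".
baskets = {
    tid: {item: n > 0 for item, n in per_item.items()}
    for tid, per_item in counts.items()
}

def support(item):
    """Share of transactions that contain the item at least once."""
    return sum(b[item] for b in baskets.values()) / len(baskets)

print({item: support(item) for item in items})
```

Because the flag is derived from a count, there is no ambiguity about which value means "present", which is the property the concatenation branch lacks.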
Best Answer
gmeier Employee-RapidMiner, Member Posts: 25 RM Engineering
Hi @zept,
please set the parameter "positive value" of your first FP-Growth operator to "true". If you cannot see this parameter, first click on "Show advanced parameters" at the bottom of the Parameters panel. Then both operators yield the same result.
This is necessary because the Nominal to Binominal operator before FP-Growth does not correctly recognize that "true" should be the positive value everywhere, since you created the true and false values by replacing something else instead of using a Numerical to Binominal operator as in the alternative below.
Hope that helps!
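The two reported supports add up to exactly 1 (0.675 + 0.325), which is what one would expect if one branch counts "false" as the positive value for Bread. A minimal Python sketch of that effect follows; the column is made-up toy data and `support` is a hypothetical helper, not RapidMiner's implementation.

```python
# Toy "Bread" column, one entry per transaction; NOT the poster's data.
bread = ["false", "true", "false", "false", "true", "false", "false", "false"]

def support(column, positive_value):
    """Fraction of rows whose value equals the one treated as 'item present'."""
    return sum(1 for v in column if v == positive_value) / len(column)

# Intended reading: "true" marks a transaction that contains Bread.
support_true = support(bread, "true")

# Swapped reading: "false" is taken as the positive value instead.
support_false = support(bread, "false")

print(support_true, support_false)
```

On a two-valued column the two readings are always complements of each other, so fixing the positive value explicitly removes the ambiguity.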
Answers