FP-Growth produces infinite confidence values for association rules
Hello, all
I ran into memory problems with an older version of FP-Growth, so I am now using FP-Growth version 8.2. It's great that I can limit the results with the min and max items per itemset parameters (here, min items per itemset = 2 and max items per itemset = 2) so that the process completes within my memory limits. Everything runs fine and the frequent itemsets (FI) pass through Create Association Rules, but the confidence is shown as infinity. How can I get the real confidence in this situation?
(When I run the process without setting min and max items per itemset, i.e., with the defaults of 1 and 0, the confidence values are correct.)
Thank you very much in advance.
aKe.
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read Pivot" width="90" x="45" y="136">
<parameter key="csv_file" value="I:\Google Drive\iAnA\elderly\ana_data\elderly_P1_F551_test.csv"/>
<parameter key="column_separators" value=",\s*|;\s*"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="TIS-620"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="ID.true.polynominal.attribute"/>
<parameter key="1" value="A1.true.integer.attribute"/>
<parameter key="2" value="A3.true.integer.attribute"/>
<parameter key="3" value="A4.true.integer.attribute"/>
<parameter key="4" value="A5.true.integer.attribute"/>
<parameter key="5" value="A6.true.integer.attribute"/>
<parameter key="6" value="A6_1.true.attribute_value.attribute"/>
<parameter key="7" value="A7.true.integer.attribute"/>
<parameter key="8" value="A8.true.integer.attribute"/>
<parameter key="9" value="A9.true.integer.attribute"/>
<parameter key="10" value="A10.true.integer.attribute"/>
<parameter key="11" value="A11.true.integer.attribute"/>
<parameter key="12" value="A12.true.integer.attribute"/>
<parameter key="13" value="F55.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="numerical_to_binominal" compatibility="8.2.000" expanded="true" height="82" name="Numerical to Binominal" width="90" x="45" y="238"/>
<operator activated="true" class="remove_useless_attributes" compatibility="8.2.000" expanded="true" height="82" name="Remove Useless Attributes" width="90" x="45" y="340">
<parameter key="nominal_useless_below" value="0.2"/>
<description align="left" color="transparent" colored="false" width="126">Remove 1-itemset that support &lt; &quot;nominal useless below&quot;</description>
</operator>
<operator activated="true" breakpoints="after" class="concurrency:fp_growth" compatibility="8.2.000" expanded="true" height="82" name="FP-Growth" width="90" x="179" y="340">
<parameter key="positive_value" value="true"/>
<parameter key="min_support" value="0.2"/>
<parameter key="min_items_per_itemset" value="2"/>
<parameter key="max_items_per_itemset" value="2"/>
<enumeration key="must_contain_list"/>
</operator>
<operator activated="true" breakpoints="after" class="create_association_rules" compatibility="8.2.000" expanded="true" height="82" name="Create Association Rules" width="90" x="313" y="340">
<parameter key="min_confidence" value="0.5"/>
</operator>
<connect from_op="Read Pivot" from_port="output" to_op="Numerical to Binominal" to_port="example set input"/>
<connect from_op="Numerical to Binominal" from_port="example set output" to_op="Remove Useless Attributes" to_port="example set input"/>
<connect from_op="Remove Useless Attributes" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<description align="left" color="yellow" colored="false" height="76" resized="true" width="559" x="34" y="23">Problem (R4): Generate Report didn't work because the xls output file is zero bytes. Solution: use Item Sets to Data and Association Rules to Example to convert FI and AR to example sets in order to write them to a file with Write Excel or Write CSV</description>
</process>
</operator>
</process>
Comments
Hi,
Good find.
We are investigating the matter. Stay tuned for updates.
Best,
David
Thank you so much, David, for your quick response.
I'm looking forward to your update.
Hi @knichcha!
Sorry for the long wait. We looked into this, and everything works as expected.
The thing about association rules is that they are calculated in an iterative manner; see Wikipedia for an explanation. The confidence of a rule X → Y is the support of X and Y together divided by the support of X alone, and for a rule built from an itemset of size 2, X is a single item. If there is no support for single items (i.e., FP-Growth does not return itemsets of size 1), you cannot calculate confidences for itemsets of size 2. So if you want to use the Create Association Rules operator, you need to have FP-Growth calculate those single-item itemsets. I hope this helps!
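To make the dependency concrete, here is a minimal Python sketch of the standard confidence formula, conf(X → Y) = supp(X ∪ Y) / supp(X). It is not RapidMiner's implementation; the `confidence` helper and the support values are hypothetical, and the item names A1 and A3 are borrowed from the process above purely for illustration. It only shows that the denominator is missing once the size-1 itemsets are filtered out.

```python
from typing import Dict, FrozenSet, Optional

Itemset = FrozenSet[str]


def confidence(
    supports: Dict[Itemset, float], premise: Itemset, conclusion: Itemset
) -> Optional[float]:
    """Return supp(premise | conclusion) / supp(premise).

    Returns None when either support is absent from the table, because
    the ratio is then undefined (hypothetical helper, not a RapidMiner API).
    """
    joint = supports.get(premise | conclusion)
    base = supports.get(premise)
    if joint is None or base is None:
        return None
    return joint / base


# Hypothetical frequent itemsets with made-up support values.
full = {
    frozenset({"A1"}): 0.5,
    frozenset({"A3"}): 0.4,
    frozenset({"A1", "A3"}): 0.25,
}

# Same table restricted to itemsets of size 2, mimicking
# min items per itemset = 2 and max items per itemset = 2.
only_pairs = {items: s for items, s in full.items() if len(items) == 2}

conf_full = confidence(full, frozenset({"A1"}), frozenset({"A3"}))
conf_pairs = confidence(only_pairs, frozenset({"A1"}), frozenset({"A3"}))

print(conf_full)   # 0.25 / 0.5
print(conf_pairs)  # premise support is missing, so no confidence
```

With the full table the rule A1 → A3 gets a well-defined confidence; with only the pairs, the support of A1 is not available, so no meaningful value can be produced.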
Cheers
Jan
Thank you for looking into this.
I understand how support and confidence are calculated.
Your team has improved the FP-Growth operator so that it can limit the size of the itemsets, which is very helpful for reducing memory use.
It would be great if you did the same for the Create Association Rules operator.
I mean that even when we limit the itemsets to size 2, association rules should be created from the 2-itemsets and the confidence should be calculated correctly.
Thank you in advance.
Moving to Product Ideas.