
"Market Basket Analysis"

lawrence_slj Member Posts: 2 Contributor I
edited May 2019 in Help
Hi,

I am working on Market Basket Analysis.
I import the data from Excel (each brand is a column with "yes"/"no" values) and use the FP-Growth and Association Rule Generator operators to get the results. I set the minimum support to 0.50.

Frequency Table
Brand Count Percentage
Brand 1 237 11%
Brand 2 220 10%
Brand 3 1702 81%
Brand 4 1242 59%
Brand 5 727 34%
Brand 6 316 15%
Brand 7 1182 56%
Brand 8 154 7%
Brand 9 142 7%
Brand 10 449 21%
Brand 11 135 6%
Brand 12 69 3%
Brand 13 44 2%
Brand 14 41 2%
Brand 15 84 4%
Brand 16 32 2%
Brand 17 72 3%
Brand 18 235 11%
Brand 19 18 1%
Brand 20 78 4%
Brand 21 113 5%
Brand 22 1586 75%
Brand 23 1504 71%
Brand 24 1045 50%
Brand 25 631 30%
Brand 26 37 2%
Brand 27 326 15%
Brand 28 86 4%
Brand 29 99 5%
Brand 30 557 26%
Brand 31 264 13%
Brand 32 183 9%
Brand 33 1705 81%
Brand 34 864 41%
Brand 35 56 3%
Brand 36 1244 59%
Brand 37 539 26%
Brand 38 821 39%
Brand 39 529 25%
Brand 40 64 3%
Brand 41 61 3%
Brand 42 233 11%
Total 2110

Support Table
Brand Support
Brand 39 0.749
Brand 37 0.745
Brand 30 0.736
Brand 25 0.701
Brand 5 0.655
Brand 38 0.611
Brand 34 0.591
Brand 7 0.560
Brand 24 0.505
In the results I am getting very low support for Brand 3, which most respondents selected (please refer to the tables above). The same happens for Brand 33, Brand 22, etc. What could be the reason? Is this correct, or am I making a mistake?

Logically, the brands with the highest frequency should get the highest support.

Also, in my data most of the brands have a frequency below 20% of the total. Will that affect the results?

Please help me to understand.


Thanks,
Lawrence
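For reference, the support of a single item is simply its "yes" count divided by the number of respondents. The sketch below is plain Python, not RapidMiner; it uses a subset of the counts from the frequency table above to list which single brands would be expected to pass a minimum support of 0.50. The function and variable names are illustrative only.

```python
# "Yes" counts copied from the frequency table in the question
# (only the brands at or near 50% are included here).
TOTAL_RESPONDENTS = 2110
YES_COUNTS = {
    "Brand 3": 1702, "Brand 4": 1242, "Brand 7": 1182, "Brand 22": 1586,
    "Brand 23": 1504, "Brand 24": 1045, "Brand 33": 1705, "Brand 36": 1244,
}


def support(count, total=TOTAL_RESPONDENTS):
    """Support of a single item: share of respondents who answered yes."""
    return count / total


def frequent_single_items(counts, min_support):
    """Items whose support reaches the threshold, highest support first."""
    kept = {
        brand: round(support(count), 3)
        for brand, count in counts.items()
        if support(count) >= min_support
    }
    return dict(sorted(kept.items(), key=lambda item: item[1], reverse=True))


expected = frequent_single_items(YES_COUNTS, min_support=0.5)
for brand, value in expected.items():
    print(f"{brand}: {value:.3f}")
```

Brand 24 is shown as 50% in the table, but 1045 / 2110 is slightly below 0.5, so it would not be expected to pass the threshold on its "yes" answers.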

Answers

  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    The problem might be that RapidMiner does not recognize "yes" as the internally used "true" value and instead treats "no" as the internal true (although it's a great program, it's not able to guess from natural language  ;) ). You can use the "Remap Binominals" operator to ensure that "yes" is actually taken as the "true" value.

    Cheers,
    Ingo
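The numbers posted in the question can be used to check this explanation. The sketch below is plain Python with illustrative names, not a RapidMiner feature. For each brand in the support table, it compares the reported support with the share of "yes" answers and with the share of "no" answers (one minus the "yes" share), and reports which one matches within rounding.

```python
TOTAL_RESPONDENTS = 2110

# "Yes" counts from the frequency table for the brands in the support table.
YES_COUNTS = {
    "Brand 39": 529, "Brand 37": 539, "Brand 30": 557, "Brand 25": 631,
    "Brand 5": 727, "Brand 38": 821, "Brand 34": 864, "Brand 7": 1182,
    "Brand 24": 1045,
}

# Support values exactly as reported in the support table.
REPORTED_SUPPORT = {
    "Brand 39": 0.749, "Brand 37": 0.745, "Brand 30": 0.736, "Brand 25": 0.701,
    "Brand 5": 0.655, "Brand 38": 0.611, "Brand 34": 0.591, "Brand 7": 0.560,
    "Brand 24": 0.505,
}


def matched_answer(brand, tolerance=0.001):
    """Return 'yes', 'no' or 'unclear' depending on which share the support matches."""
    yes_share = YES_COUNTS[brand] / TOTAL_RESPONDENTS
    reported = REPORTED_SUPPORT[brand]
    if abs(reported - yes_share) <= tolerance:
        return "yes"
    if abs(reported - (1 - yes_share)) <= tolerance:
        return "no"
    return "unclear"


matches = {brand: matched_answer(brand) for brand in REPORTED_SUPPORT}
for brand, answer in matches.items():
    print(f"{brand}: reported support matches the share of '{answer}' answers")
```

With these inputs, every brand except Brand 7 matches the share of "no" answers, which is consistent with "no" being treated as the true value for those columns.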
  • lawrence_slj Member Posts: 2 Contributor I
    Thanks a lot Ingo.

    I am able to get results after using the Nominal to Binominal operator. Here is the process XML:

      <operator name="Root" class="Process" expanded="yes">
        <!-- Read the yes/no survey data from the Excel sheet -->
        <operator name="ExcelExampleSource" class="ExcelExampleSource">
            <parameter key="excel_file" value="C:\Users\Lawerence\Desktop\Mystudy - 040711.xls"/>
            <parameter key="first_row_as_names" value="true"/>
            <parameter key="create_label" value="true"/>
        </operator>

        <!-- Split each nominal attribute into one binominal attribute per value (e.g. Brand 3_Yes, Brand 3_No) -->
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
            <parameter key="return_preprocessing_model" value="true"/>
            <parameter key="create_view" value="true"/>
            <parameter key="use_underscore_in_name" value="true"/>
        </operator>

        <!-- Find frequent item sets with a support of at least 0.2 -->
        <operator name="FPGrowth" class="FPGrowth">
            <parameter key="find_min_number_of_itemsets" value="false"/>
            <parameter key="min_support" value="0.2"/>
        </operator>

        <!-- Generate association rules with a confidence of at least 0.2 -->
        <operator name="AssociationRuleGenerator" class="AssociationRuleGenerator">
            <parameter key="min_confidence" value="0.2"/>
        </operator>
    </operator>


    However, it also gives results for brands that respondents did not select, e.g. Brand 3 (answered "No") and Brand 5 (answered "No"):
    Premises     Conclusion   Support  Confidence
    Brand 3_No   Brand 5_No   0.479    0.608
    Brand 2_Yes  Brand 1_Yes  0.492    0.609

    By default, the algorithm should pick up only the "Yes" values for all the brands and calculate support, confidence, etc. from those.

    After adding the Nominal to Binominal operator, it gives results for "No" as well. Is there a way to exclude the "No" items from the results?

    Thanks in advance,
    Lawrence
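One possible workaround, sketched here as post-processing outside RapidMiner, is to drop every rule that mentions a "_No" item. The example is plain Python; the rule tuples are typed in by hand from the two rows posted above, and `keep_yes_only` is a hypothetical helper, not a RapidMiner operator.

```python
# Rules copied from the post: (premises, conclusions, support, confidence).
RULES = [
    (["Brand 3_No"], ["Brand 5_No"], 0.479, 0.608),
    (["Brand 2_Yes"], ["Brand 1_Yes"], 0.492, 0.609),
]


def keep_yes_only(rules, negative_suffix="_No"):
    """Keep only rules in which no premise or conclusion item ends with the negative suffix."""
    kept = []
    for premises, conclusions, support, confidence in rules:
        items = premises + conclusions
        if not any(item.endswith(negative_suffix) for item in items):
            kept.append((premises, conclusions, support, confidence))
    return kept


yes_rules = keep_yes_only(RULES)
for premises, conclusions, support, confidence in yes_rules:
    print(f"{premises} -> {conclusions}  support={support}  confidence={confidence}")
```

The same idea could presumably be applied inside the process by removing the "_No" attributes before FP-Growth runs; which operator does that depends on the RapidMiner version, so it is not shown here.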