
Market basket analysis data preprocess

newjop228 Member Posts: 3 Learner I
edited October 2020 in Help
Hi all! I am extremely new to RapidMiner (I was only introduced to it recently). I am trying to conduct a market basket analysis on a given CSV file. The data looks something like this:

ID,Item
C1,yogurt,cheese roll,cat pork
C2,chicken,yogurt,pork,soda,whipped/sour cream
C3,beef
C4,onions,liquor
C5,soda,whipped/sour cream
...
C2000,soda,cheese roll,yogurt

I understand I will need to apply FP-Growth followed by Create Association Rules in RapidMiner. I have read through a few previous posts but can't work out how to preprocess this data set.

I am trying to get it into a binary matrix and then apply the association rules.

Is there an easy way in RapidMiner 9.8 to quickly transform this data set into a binary matrix, or any other way to preprocess this kind of data for market basket analysis? If so, how should I do it?

End result should be something like:
      yogurt   cheese roll   ...
C1    1        1
C2    1        0
C3    0        0
...
where the x axis (columns) is all the unique products in the baskets
and the y axis (rows) is the customer ID

Thanks!
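
For readers who want to see the target layout concretely, here is a minimal Python sketch of the transformation described above, done outside RapidMiner. It is not a RapidMiner process and not the answer given in this thread. The helper names `read_baskets` and `to_binary_matrix` are made up for illustration, and the sample rows are the ones quoted in the question. Because each row has a different number of items, the sketch uses the standard `csv` module and treats every field after the ID as one item.

```python
import csv
import io

# Sample rows copied from the question; the real file would have ~2000 rows.
SAMPLE = """ID,Item
C1,yogurt,cheese roll,cat pork
C2,chicken,yogurt,pork,soda,whipped/sour cream
C3,beef
C4,onions,liquor
C5,soda,whipped/sour cream
"""


def read_baskets(handle):
    """Read ragged rows: first field is the customer ID, the rest are items."""
    reader = csv.reader(handle)
    next(reader)  # skip the "ID,Item" header line
    baskets = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        customer_id, *items = row
        baskets[customer_id] = {item.strip() for item in items if item.strip()}
    return baskets


def to_binary_matrix(baskets):
    """Return (sorted product names, {customer ID: list of 0/1 flags})."""
    products = sorted(set().union(*baskets.values()))
    matrix = {
        customer_id: [1 if product in items else 0 for product in products]
        for customer_id, items in baskets.items()
    }
    return products, matrix


baskets = read_baskets(io.StringIO(SAMPLE))
products, matrix = to_binary_matrix(baskets)

# Print the matrix as CSV: one row per customer, one column per product.
print(",".join(["ID"] + products))
for customer_id, flags in matrix.items():
    print(",".join([customer_id] + [str(flag) for flag in flags]))
```

To use it on a real file, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open("baskets.csv", newline="")`, where the file name is a placeholder.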

Best Answer

Answers

  • newjop228 Member Posts: 3 Learner I
    Hey @mschmitz , thanks for the reply!

    Cool, that is great! I have tested it out, but my results show nothing.

    Operators:
    Retrieve -> FP-Growth -> Create association rules -> output

    Results:
    FrequentItemSet(FP-growth) - no itemset found
    AssociationRules(Create association rules) - No rules found

    Any idea?
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    did you try lowering the min_support setting of FP-Growth? It basically controls how frequently an itemset has to occur in the data to be found. The lower the value, the more results you get.

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • newjop228 Member Posts: 3 Learner I
    Thanks @mschmitz. May I ask one last thing - for association rules, do I have to arrange the products in a particular order to get an effective analysis? That is, column 1 would be "yogurt" for all customer IDs, column 2 would be "chicken", and so on. Or do the association rules not require me to arrange the products?
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    the order should not make a difference.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
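
The two points made in the replies (a high minimum support can leave no itemsets at all, and ordering does not matter) can be illustrated with a small Python sketch. This is plain brute-force counting of itemsets up to size two, not the FP-Growth algorithm and not RapidMiner's implementation. The function name `frequent_itemsets` and the `max_size` parameter are assumptions made for this example, and the baskets are the five sample rows from the question.

```python
from collections import Counter
from itertools import combinations

# The five sample baskets quoted in the question.
baskets = [
    {"yogurt", "cheese roll", "cat pork"},
    {"chicken", "yogurt", "pork", "soda", "whipped/sour cream"},
    {"beef"},
    {"onions", "liquor"},
    {"soda", "whipped/sour cream"},
]


def frequent_itemsets(baskets, min_support, max_size=2):
    """Return {itemset: support} for itemsets whose support >= min_support.

    Support is the fraction of baskets that contain every item of the itemset.
    """
    counts = Counter()
    for basket in baskets:
        for size in range(1, max_size + 1):
            # Sorting makes each itemset a canonical tuple, so the order in
            # which items were listed in the basket has no effect.
            for itemset in combinations(sorted(basket), size):
                counts[itemset] += 1
    total = len(baskets)
    return {
        itemset: count / total
        for itemset, count in counts.items()
        if count / total >= min_support
    }


strict = frequent_itemsets(baskets, min_support=0.9)
relaxed = frequent_itemsets(baskets, min_support=0.4)

print(len(strict), "itemsets at min_support=0.9")
print(len(relaxed), "itemsets at min_support=0.4")
for itemset, support in sorted(relaxed.items()):
    print(itemset, support)
```

With sparse basket data like this, most items appear in only a small share of baskets, so a high threshold filters everything out and lowering it lets itemsets through. Reversing the list of baskets, or listing the items of a basket in a different order, gives the same result, because only membership is counted.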