Market basket analysis data preprocess
Hi all! I am extremely new to RapidMiner (I was only introduced to it recently). I am trying to conduct a market basket analysis on a given CSV file. The data looks something like this:
ID,Item
C1,yogurt,cheese roll,cat pork
C2,chicken,yogurt,pork,soda,whipped/sour cream
C3,beef
C4,onions,liquor
C5,soda,whipped/sour cream
...
C2000,soda,cheese roll,yogurt
I understand I will need to apply FP-Growth followed by Create Association Rules in RapidMiner. I have read through a few previous posts but can't work out how to process this data set.
I am trying to get it into a binary matrix and then apply the association rules.
Is there an easy way in RapidMiner 9.8 to quickly transform this data set into a binary matrix, or any other way to preprocess this kind of data for market basket analysis? If yes, how should I do it?
End result should be something like:
      yogurt   cheese roll   ...
C1    1        1
C2    1        0
C3    0        0
...
where the x axis (columns) holds all the unique products across the baskets
and the y axis (rows) holds the customer IDs.
Thanks!
Best Answer
MartinLiebig (Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor; Posts: 3,533; RM Data Scientist)
Hi @newjop228,
For a while now you have not needed to change the format, since we changed the FP-Growth operator to also work with list data like yours. The format you have is called 'item list in column'. It is the first parameter of FP-Growth.
Cheers,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
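For anyone who still wants the binary matrix from the question (for example, to inspect the data outside RapidMiner), here is a minimal sketch in plain Python. It is not part of RapidMiner and not the method recommended in the answer; the function names `read_baskets` and `to_binary_matrix` are made up for illustration, and the sample rows are copied from the question.

```python
import csv
import io

# Sample rows copied from the question; real data would be read from the CSV file.
SAMPLE = """ID,Item
C1,yogurt,cheese roll,cat pork
C2,chicken,yogurt,pork,soda,whipped/sour cream
C3,beef
C4,onions,liquor
C5,soda,whipped/sour cream
"""

def read_baskets(text):
    """Parse ragged CSV rows into {customer_id: set of items}.

    Assumes the first field is the customer ID and every remaining
    field on the row is one item (hypothetical helper).
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the "ID,Item" header line
    baskets = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        customer_id, items = row[0], row[1:]
        baskets[customer_id] = {item.strip() for item in items if item.strip()}
    return baskets

def to_binary_matrix(baskets):
    """Return (sorted item names, {customer_id: list of 0/1 flags})."""
    items = sorted(set().union(*baskets.values()))
    matrix = {
        customer_id: [1 if item in basket else 0 for item in items]
        for customer_id, basket in baskets.items()
    }
    return items, matrix

baskets = read_baskets(SAMPLE)
items, matrix = to_binary_matrix(baskets)

# Print one row per customer, one column per unique product.
print("ID", *items, sep="\t")
for customer_id, flags in matrix.items():
    print(customer_id, *flags, sep="\t")
```

Each row is a customer ID and each column is one unique product, which matches the layout asked for in the question.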
Answers
Cool, that is great! I have tested it out, but my results show nothing.
Operators:
Retrieve -> FP-Growth -> Create association rules -> output
Results:
FrequentItemSet(FP-growth) - no itemset found
AssociationRules(Create association rules) - No rules found
Any idea?
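The thread does not say what caused the empty result. One generic check, sketched here in plain Python outside RapidMiner, is to compute the support of each single item: if no item reaches the minimum support configured on FP-Growth, no frequent item set can be found and no rules can be created. The helper names `item_support` and `frequent_items` and the threshold values are assumptions for illustration only; the baskets are the five sample rows from the question.

```python
from collections import Counter

# The five sample baskets from the question, one set of items per customer.
baskets = [
    {"yogurt", "cheese roll", "cat pork"},
    {"chicken", "yogurt", "pork", "soda", "whipped/sour cream"},
    {"beef"},
    {"onions", "liquor"},
    {"soda", "whipped/sour cream"},
]

def item_support(baskets):
    """Return {item: fraction of baskets that contain the item}."""
    counts = Counter(item for basket in baskets for item in basket)
    total = len(baskets)
    return {item: count / total for item, count in counts.items()}

def frequent_items(baskets, min_support):
    """Return the sorted items whose support is at least min_support."""
    support = item_support(baskets)
    return sorted(item for item, value in support.items() if value >= min_support)

# Illustrative thresholds only; they are not taken from the thread.
print(frequent_items(baskets, 0.4))  # items present in at least 40% of baskets
print(frequent_items(baskets, 0.9))  # items present in at least 90% of baskets
```

On this sample, only yogurt, soda, and whipped/sour cream reach a support of 0.4, and nothing reaches 0.9, so a high threshold on sparse basket data would leave no item sets at all.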