Simple Market basket Analysis
Hi Community,
I am a German student and have been given the task of doing a market basket analysis (I hope that is translated correctly). It sounds very simple (and maybe it is), so I will start by explaining.
My Data:
BonID, BonNr, ArtikelBez
1, 1, Feinwaschmittel
2, 1, Kaugummi
3, 1, Vollwaschmittel
4, 2, Feinwaschmittel
5, 2, Hose
6, 2, Vollwaschmittel
7, 3, Kaugummi
8, 3, Hose
9, 3, Schuhe
The BonID is unimportant, the BonNr is the number of a Bon (I think that is a bill), and the ArtikelBez is the name of an article on that bill.
For example, the first bill contains Feinwaschmittel, Kaugummi and Vollwaschmittel.
My task is now to "see" "association purchases". For example, 'Feinwaschmittel' is always bought together with 'Vollwaschmittel'.
I have done tutorials and tested RM for several hours, but I don't get it, maybe because of my bad English. Can someone please explain which RM components I need? Of course Apriori and/or FPGrowth, but in which order, with which settings, and why? ^^ And which other components?
Thanks a lot and excuse my bad English.
Best Regards
Marianne Rose
Answers
[I'm using version 4.6. These things might be different in version 5. I haven't made the transition yet].
You have the classical case of MBA. There are 2 ways in which your data might be formatted:
1) A binary matrix: one variable that uniquely identifies the transaction, plus n columns that represent the different products available at the store.
Each row is a transaction. You identify the products a customer buys by entering 1s in the corresponding columns. Example:
tid, bananas, apples, pears, grapes
1, 1, 0, 0, 1
2, 0, 1, 0, 1
3, 1, 1, 1, 0
The first transaction includes bananas and grapes, the second apples and grapes, and the third bananas, apples and pears.
(To obtain rules from this format, you would read the data into Rapid-I, transform the 1s and 0s into true and false values with the Numeric2Binomial operator, and then apply FPGrowth + Rules Generator.)
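Outside of the tool, the same idea can be sketched in plain Python. This is only an illustration of the two steps (0/1 to true/false, then counting frequent itemsets). It is a brute-force count, not the FP-Growth algorithm, and the function name `frequent_itemsets` and the threshold of 0.6 are my own assumptions, not RapidMiner names or defaults.

```python
from itertools import combinations

# The binary matrix from the example above: one row per transaction.
header = ["tid", "bananas", "apples", "pears", "grapes"]
rows = [
    [1, 1, 0, 0, 1],
    [2, 0, 1, 0, 1],
    [3, 1, 1, 1, 0],
]

# Step 1: read the 1s and 0s as True/False and keep, per transaction,
# the set of products that are True (analogous to the numeric-to-binominal step).
items = header[1:]
baskets = [
    {item for item, flag in zip(items, row[1:]) if bool(flag)}
    for row in rows
]

def frequent_itemsets(baskets, min_support):
    """Brute force: return {itemset: support} for all itemsets with
    support >= min_support. Hypothetical helper, not a RapidMiner API."""
    n = len(baskets)
    all_items = sorted(set().union(*baskets))
    result = {}
    for size in range(1, len(all_items) + 1):
        found = False
        for combo in combinations(all_items, size):
            count = sum(1 for b in baskets if set(combo) <= b)
            support = count / n
            if support >= min_support:
                result[frozenset(combo)] = support
                found = True
        if not found:
            break  # no frequent set of this size, so no larger one either
    return result

# Step 2: find itemsets that occur in at least 60% of the transactions.
itemsets = frequent_itemsets(baskets, min_support=0.6)
for itemset, support in itemsets.items():
    print(sorted(itemset), round(support, 2))
```

With only three transactions, just the single products bananas, apples and grapes reach that threshold; lowering `min_support` lets pairs through as well.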
2) Two columns: one for the unique transaction ID, the other for the product (there may be further columns with other info: items bought, date, discounts, etc.). This is obviously the most efficient way to store the information. You would represent the example above in the following way:
tid, product
1, bananas
1, grapes
2, apples
2, grapes
3, bananas
3, apples
3, pears
Your data are formatted in the second way. The nice people at Rapid-I have written the code to read and process that info in Rapid-I. Take a look at the sample code Transaction2Basket.xml that you can find in the folder \samples\Preprocessing\.
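To make clear what kind of transformation is meant, here is a small Python sketch that pivots the two-column format into the binary matrix. It is not the content of the sample file and not how RapidMiner implements it; the function name `to_binary_matrix` is just an assumption for illustration.

```python
from collections import defaultdict

# Two-column format: (transaction id, product), as in the example above.
records = [
    (1, "bananas"), (1, "grapes"),
    (2, "apples"), (2, "grapes"),
    (3, "bananas"), (3, "apples"), (3, "pears"),
]

def to_binary_matrix(records):
    """Pivot (tid, product) rows into one True/False row per transaction.
    Hypothetical helper for illustration only."""
    # Collect the set of products for every transaction id.
    baskets = defaultdict(set)
    for tid, product in records:
        baskets[tid].add(product)
    # One column per distinct product, in alphabetical order.
    products = sorted({product for _, product in records})
    matrix = {
        tid: [product in bought for product in products]
        for tid, bought in sorted(baskets.items())
    }
    return products, matrix

products, matrix = to_binary_matrix(records)
print("tid", products)
for tid, row in matrix.items():
    print(tid, row)
```

Each transaction ends up in exactly one row, which is the shape the frequent itemset operators expect.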
Apriori and FPGrowth are two different algorithms for finding frequent itemsets. From these you can construct association rules. Take a look also at the sample code \samples\Learner\AssociationRules.xml.
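If it helps to see the two measures behind the rules, here is a rough Python sketch on the data from the question. It only looks at pairs of articles and counts by brute force, so it is neither Apriori nor FPGrowth; the helper names `support` and `pair_rules` and the two thresholds are assumptions chosen for this example.

```python
from itertools import combinations

# The three bills from the question, keyed by BonNr.
baskets = {
    1: {"Feinwaschmittel", "Kaugummi", "Vollwaschmittel"},
    2: {"Feinwaschmittel", "Hose", "Vollwaschmittel"},
    3: {"Kaugummi", "Hose", "Schuhe"},
}

def support(itemset, baskets):
    """Fraction of bills that contain every article in itemset."""
    hits = sum(1 for items in baskets.values() if itemset <= items)
    return hits / len(baskets)

def pair_rules(baskets, min_support, min_confidence):
    """Return rules (premise, conclusion, support, confidence) for pairs.
    Hypothetical helper, limited to one article on each side of the rule."""
    all_items = sorted(set().union(*baskets.values()))
    rules = []
    for a, b in combinations(all_items, 2):
        pair_support = support({a, b}, baskets)
        if pair_support < min_support:
            continue  # the pair is not frequent, so no rule is built from it
        for premise, conclusion in ((a, b), (b, a)):
            # confidence = support(premise and conclusion) / support(premise)
            confidence = pair_support / support({premise}, baskets)
            if confidence >= min_confidence:
                rules.append((premise, conclusion, pair_support, confidence))
    return rules

rules = pair_rules(baskets, min_support=0.5, min_confidence=0.9)
for premise, conclusion, s, c in rules:
    print(f"{premise} -> {conclusion}  support={s:.2f}  confidence={c:.2f}")
```

On these three bills the only pair that survives is Feinwaschmittel and Vollwaschmittel, in both directions, with a confidence of 1.0, which matches the "always bought together" observation in the question.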
The best way to learn this program is to go through the examples provided by Rapid-I in the samples folder.
This should get you started.
If I have other questions for the next tasks, I know where to ask, once I have made it through the examples.
But I think the examples will tell me everything I want to know.
If anyone needs a simple tutorial to get started:
http://rapid-i.com/videos/rapidminer_tour_3_4_de.html (ger)
http://rapid-i.com/videos/rapidminer_tour_3_4_en.html (eng)
Thanks a lot jlo.
Until next time
Greetings Marianne
I have a similar MBA problem. I am using RapidMiner 5. My data is structured in the second way (four columns in total, but the only interesting ones should be bonID and articleNR). I selected bonID as "ID", and articleNR is just "regular" (or should it be "label"?).
After loading the data, I added the "Nominal to Binominal" operator and connected it to the "FP-Growth" operator (pre to exa). RM complains: "Meta data is underspecified. Cannot check precondition". Which attribute role must be set for "articleNR"? Or is something else wrong? I am also not sure about the "Nominal to Binominal" operator. What has to be set there?
To set the record straight, my aim is just a simple MBA (Apriori or FP-Growth). This is what my data looks like:
attrib_1 attrib_2 transactionNR articleNR
abc yxz 1 321
kdd dms 1 654
bic fsf 1 789
osi fpg 2 258
ais mss 2 159
=> two baskets with three and two articles ([321,654,789] and [258,159])
thanks in advance!
Sorry, but I don't see any difference between the format of your data and this format:
tid, item
1, bananas
1, grapes
2, apples
2, grapes
3, bananas
3, apples
3, pears
You should be able to use the sample program to perform the transformation to a binary matrix and then find the itemsets and association rules.
Filter out the variables you don't need (like attrib_1 and attrib_2) and declare your variable transactionNR as TID and articleNR as ITEM. I don't know whether your problems are related to the version of the program you are using, but I truly doubt it. I use version 4.6.
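In plain Python, the filtering and grouping would look roughly like this. It is only a sketch of the idea: the column names are spelled transactionNR and articleNR here, and the whitespace-separated layout is taken from the post, not from any RapidMiner file format.

```python
from collections import defaultdict

# The example data from the post, whitespace separated.
raw = """\
attrib_1 attrib_2 transactionNR articleNR
abc yxz 1 321
kdd dms 1 654
bic fsf 1 789
osi fpg 2 258
ais mss 2 159
"""

lines = raw.strip().splitlines()
header = lines[0].split()

# Keep only the two columns that matter: the transaction id and the item.
tid_col = header.index("transactionNR")
item_col = header.index("articleNR")

# Group the article numbers by transaction. They stay strings on purpose,
# because an article number is a name, not a quantity.
grouped = defaultdict(list)
for line in lines[1:]:
    fields = line.split()
    grouped[fields[tid_col]].append(fields[item_col])

baskets = dict(grouped)
print(baskets)
```

The result is the two baskets described in the post, [321, 654, 789] and [258, 159], with attrib_1 and attrib_2 simply ignored.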
id label a1 a2 a3 a4
id1 iris-seto 5.1 3.5 1.4 0.2
id2 iris-seto 4.9 3.0 1.4 0.2
id3 iris-vers 4.7 3.2 1.3 0.2
There, the preprocessing operator chain consists of the frequency discretization operator, which discretizes numerical attributes by putting the values into bins of equal size, and the conversion of those bins into true and false. You get a different type of output from the preprocessing; it looks something like this:
id label a1range1 a1range2 a1range3 a1range4 a2range1 a2range2 a2range3 a2range4 etc....
id1 iris-se false true false false false false false false
id2 iris-se true false false false false false true false
That means that you have every basket in one row. The type of data I am using (similar to the type you described) has the baskets spread over several rows. On top of that, I have a few thousand different articles. Which preprocessing steps have to be done to get frequent patterns created?
thanks again!
It's hard to transpose this data to a binary matrix.
Any help?
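One way to think about it, sketched in Python rather than in RapidMiner: with a few thousand articles you do not need the full binary matrix to count co-occurrences. You can keep each basket as a set and count only the pairs that actually occur. The records below are made up for illustration (the first basket reuses the article numbers from the earlier post), and `frequent_pairs` and `min_count` are assumed names, not operators or parameters of the tool.

```python
from collections import Counter, defaultdict
from itertools import combinations

def frequent_pairs(records, min_count):
    """Count article pairs per basket without building a dense matrix.
    Hypothetical helper; records are (transaction id, article) tuples."""
    # Baskets spread over several rows are gathered into one set each.
    baskets = defaultdict(set)
    for tid, article in records:
        baskets[tid].add(article)
    # An article that is itself rare cannot be part of a frequent pair,
    # so drop it before pairing (the pruning idea behind Apriori).
    item_counts = Counter(a for items in baskets.values() for a in items)
    frequent_items = {a for a, c in item_counts.items() if c >= min_count}
    pair_counts = Counter()
    for items in baskets.values():
        kept = sorted(items & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {pair: c for pair, c in pair_counts.items() if c >= min_count}

# Made-up rows: three baskets spread over seven rows.
records = [
    (1, "321"), (1, "654"), (1, "789"),
    (2, "321"), (2, "654"),
    (3, "258"), (3, "159"),
]
pairs = frequent_pairs(records, min_count=2)
print(pairs)
```

Memory use then grows with the number of pairs that really appear together, not with the number of transactions times the number of articles.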