Market Basket Analysis - Operators to change data layout
Hi,
I am new to RapidMiner and I am trying to create a Market Basket Analysis model using the FP-Growth operator and the Create Association Rules operator.
I am reading a CSV file into RapidMiner, and it looks like Figure 1 below, with attributes running out to att32 and 9,835 observations. Each row represents a transaction.
Figure 1
Row | att1 | att2 | att3 | att4 | att5
1 | tropical fruit | yogurt | coffee | ? | ?
2 | whole milk | ? | ? | ? | ?
3 | pip fruit | yogurt | cream cheese | meat spreads | ?
4 | other vegetables | whole milk | condensed milk | long life bakery product | ?
5 | whole milk | butter | yogurt | rice | abrasive cleaner
I believe the FP-Growth operator expects an example set like Figure 2 below, where each id corresponds to an item. The table expands to 167 items and 43,367 rows.
Figure 2
Row | id_1.0 | id_10.0 | id_11.0 | id_12.0 | id_13.0 | id_14.0 | Tran_ID
1 | true | false | false | false | false | false | 1
2 | false | false | true | false | false | false | 1
3 | false | false | false | false | true | false | 2
4 | false | true | false | false | false | false | 2
5 | false | false | false | false | false | true | 2
Are there operators within RapidMiner that can transform the data from the layout in Figure 1 to something like Figure 2 that the FP-Growth operator will accept? Doing it with item names instead of item numbers would be even better. So far I have had to transform the layout outside of RapidMiner to make it work.
Thanks for any help or guidance you can provide.
Answers
Mh,
good question! I would try De-Pivot on att.* to get a SQL-ish table, and then use Pivot and Numerical to Binominal to get what you want. I'm not sure it really works, but I think it does.
~Martin
Dortmund, Germany
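For anyone who wants to see the data flow of the De-Pivot, Pivot, and Numerical to Binominal chain suggested above, here is a minimal sketch in plain Python. It is not RapidMiner code and does not claim to match how those operators work internally; the function names `de_pivot` and `pivot_to_binominal` are made up for illustration. It uses the five sample transactions from Figure 1, treats "?" as missing, and produces one row per transaction with item names as column headers (an assumption about the desired layout; Figure 2 instead shows one row per item).

```python
MISSING = "?"  # missing-value marker as shown in Figure 1

# First five transactions from Figure 1 (att1..att5 only).
wide_rows = [
    ["tropical fruit", "yogurt", "coffee", "?", "?"],
    ["whole milk", "?", "?", "?", "?"],
    ["pip fruit", "yogurt", "cream cheese", "meat spreads", "?"],
    ["other vegetables", "whole milk", "condensed milk",
     "long life bakery product", "?"],
    ["whole milk", "butter", "yogurt", "rice", "abrasive cleaner"],
]


def de_pivot(rows):
    """Wide -> long: one (tran_id, item) pair per non-missing cell."""
    return [
        (tran_id, item)
        for tran_id, row in enumerate(rows, start=1)
        for item in row
        if item != MISSING
    ]


def pivot_to_binominal(pairs):
    """Long -> one row per transaction, one true/false column per item."""
    items = sorted({item for _, item in pairs})
    baskets = {}
    for tran_id, item in pairs:
        baskets.setdefault(tran_id, set()).add(item)
    table = [
        {"Tran_ID": tran_id, **{item: item in basket for item in items}}
        for tran_id, basket in sorted(baskets.items())
    ]
    return items, table


long_pairs = de_pivot(wide_rows)            # the "SQL-ish" table
items, basket_table = pivot_to_binominal(long_pairs)

# Print each transaction with the items flagged true.
for row in basket_table:
    print(row["Tran_ID"], [name for name in items if row[name]])
```

The intermediate `long_pairs` list corresponds to the "SQL-ish" table in the answer: a transaction id next to a single item name. Because the columns are built from the item names themselves, no separate item-number lookup is needed.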
Martin,
I will play with the operators you mentioned and see if I can get them to work. Thanks for the feedback.
--Alan--