"Market Basket not getting results"
Hi everyone!
This is my first post in this board so I gotta tell you that I love rapidminer.
I think I'm going to use it very often in the future.
At the moment I'm trying to create a market basket analysis for the following data set:
About 350.000 transactions
Transaction-Id;Item-Id;Sales Value (I also inserted an "amount" value on how many pieces of a product were bought)
An example:
Transaction-Id;Item-Id;Sales Value
525344;585555;24,80
525344;158065;12,85
524634;158065;12,85
...
I went through all the templates and tutorials in RM 7 and also tried several solutions from the board and from external pages (always renaming the column titles and choosing the correct attribute type, even though I can't find all of the suggested types in RM 7), but I can't get any results because either:
1. The process runs out of memory (tried it on 4 GB and 16 GB Macs as well as on a 4 GB, 64-bit Windows 10 machine)
2. The process ends but doesn't show any results
For 1: I also tried splitting the data so the number of rows gets smaller
Does anyone have an idea on how to get this done?
Thank you very much in advance!
Answers
Is it the conversion to binominal?
Yes, most of the time it happened there but I think also sometimes at fp-growth or at create attribute sets.
At the moment, it always runs all the way to the results view but there is no result shown.
First get the dataset with the binominal conversion stored. (Use the Store operator).
This has two advantages. First, it saves memory by breaking the process up.
Second, it saves time: if there is a problem with the way your FP Growth has been set up, so that it isn't actually finding associations, you don't need to wait for the binominal conversion before you try again.
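One way an FP Growth setup can "not find associations" is a minimum support threshold that is too high for the data. The sketch below is not FP Growth itself, just brute-force pair counting in plain Python, and the baskets are made-up illustration data. It only shows how the same data can give results at one threshold and an empty result at another.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (one set of item ids per invoice), for illustration only.
baskets = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A"},
    {"B", "C"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose share of baskets is at least min_support.

    Brute-force counting, not FP Growth; fine for a small sanity check.
    """
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a stable order so ("A", "B") is counted once
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

loose = frequent_pairs(baskets, min_support=0.5)
strict = frequent_pairs(baskets, min_support=0.95)
print("pairs at 0.5:", loose)
print("pairs at 0.95:", strict)
```

If the stored binominal data looks right but the result is still empty, lowering the support threshold is a cheap thing to try before re-running the whole conversion.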
The saved data set only shows three columns (Row No.; Invoice; sum(Orders)), where every value in sum(Orders) is "true".
Also, it only shows 68.515 examples out of over 300.000 in the original data set.
What does that mean?
Thanks!
It sounds like there is something not quite right there. Possibly using an aggregate operator in the wrong place.
(Don't worry about the data, just the process XML is fine, you can get this by going to View -> Show Panel -> XML )
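To illustrate the kind of aggregation problem suspected here, this is a rough sketch in plain Python (not RapidMiner, and not the poster's actual process). It assumes rows of (transaction id, item id) like the sample in the first post, with a count of 1 per row standing in for the "Orders" attribute. Grouping by invoice alone leaves one row per invoice and throws the item ids away, which would match seeing far fewer examples and only a sum(Orders) column.

```python
from collections import defaultdict

# Sample rows in the thread's layout: (transaction_id, item_id)
rows = [
    ("525344", "585555"),
    ("525344", "158065"),
    ("524634", "158065"),
]

def aggregate_by_invoice_only(rows):
    """Group by invoice alone: one output row per invoice, item ids are lost."""
    totals = defaultdict(int)
    for invoice, _item in rows:
        totals[invoice] += 1  # stands in for sum(Orders) with Orders = 1
    return dict(totals)

def aggregate_by_invoice_and_item(rows):
    """Group by (invoice, item): keeps the item ids needed for a later pivot."""
    totals = defaultdict(int)
    for invoice, item in rows:
        totals[(invoice, item)] += 1
    return dict(totals)

by_invoice = aggregate_by_invoice_only(rows)
by_pair = aggregate_by_invoice_and_item(rows)
print("input rows:", len(rows))
print("rows after grouping by invoice only:", len(by_invoice))
print("rows after grouping by invoice and item:", len(by_pair))
```

Whether this is what happened in the process above can only be confirmed from the process XML, so treat it as one possible explanation.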
Here is the xml, hope this helps: Thanks
Thanks!
What about after?
Row No.; Invoice; sum(Orders)
Orders is automatically generated using the operator "generate attributes", setting each value to "1".
Why is "Orders" important?
Wouldn't it be sufficient to only have the invoice- and product-number?
Thanks
Run this process (from the RapidMiner 7 templates) and have a look at the breakpoint before FP Growth.
I would expect in your process to see similarly:
Invoice - Product1 - Product2 - Productetc
999999 - True - False - False
etc
Have a check over your process and data again to convert it into this format just before FP Growth.
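As a reference for what that format means, here is a minimal sketch in plain Python (not a RapidMiner process) that turns long rows of (invoice, product) into one row per invoice with a true/false value per product. The function name `pivot_to_basket_table` is made up for this example, and the last input row is a hypothetical repeat added to show that duplicates within an invoice collapse to a single "true".

```python
def pivot_to_basket_table(rows):
    """Turn (invoice, product) rows into {invoice: {product: bool}}."""
    # Every distinct product becomes one column
    products = sorted({product for _invoice, product in rows})
    # Collect the set of products per invoice; a set ignores repeated lines
    baskets = {}
    for invoice, product in rows:
        baskets.setdefault(invoice, set()).add(product)
    # One row per invoice, True where the product appears on that invoice
    table = {
        invoice: {p: (p in items) for p in products}
        for invoice, items in baskets.items()
    }
    return products, table

rows = [
    ("525344", "585555"),
    ("525344", "158065"),
    ("524634", "158065"),
    ("524634", "158065"),  # hypothetical repeated line, not from the thread
]

products, table = pivot_to_basket_table(rows)
print("Invoice", *products)
for invoice, flags in table.items():
    print(invoice, *(flags[p] for p in products))
```

Note that only the invoice and product columns are used here; presence on the invoice is all the true/false table records.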
I just created the attribute "Orders" on the fly using an operator.
I have now created the attribute "Orders" in Excel and imported the data again so it looks exactly like the sample provided by RM (including the same attribute types and roles).
Unfortunately, I still only get three columns, where the last one (sum(Orders)) says "true" in each row.
Is "Orders" really necessary?
What I understand from this column is that it shows how many pieces of the specific product were bought in one order, correct?
Or is it about how many products are in an order? (This would make a huge difference.)
Thanks!
Invoice "647991", 4x "Product 15", each set to Orders "1".
Why is one product even listed several times in one Invoice?
Does anyone have an idea on how to get this analysis done?
Thanks!