Market Basket Analysis: First Timer
WindsAloft
Member Posts: 11 Contributor II
Okay, so first of all, the tutorials are nice. I watched them all but still cannot figure out how to do a Market Basket Analysis.
I got so frustrated with the error messages that I deleted everything I created and I'm starting over and typing this step by step so maybe someone can point out my mistake.
1. Open RapidMiner
2. File, Import Data, Import CSV file
3. I selected a .csv file which I am using as a sample. It has 3 headers and sample data
CustomerID, itemID, itemCount
4. Wizard suggests CustomerID to be Nominal, itemID to be Nominal, itemCount to be integer
Here is a sample row of my data: CustomerID, D21953; itemID, E3; itemCount, 1;
5. Wizard suggests I set all roles as Regular
6. I choose my Local Repository as the location and name it DATA
7. I go to File, Open Template, Market Basket Analysis, Next
8. I leave the Values the same, since I made my example headers to match perfectly.
9. For Retrieve.repository_entry, I manually type in //My Repository/DATA, because when I click the little folder and select DATA in my repository, the field stays blank.
I show 3 red errors.
"The Attribute customerIDAttributeName is missing in the input example set" - from Pivot
"The Attribute itemIDAttributeName is missing in the input example set" - from Pivot
"The Attribute customerIDAttributeName is missing in the example set" - from Set Role (quickfix)
Now what?
Answers
Looks like the lights are on but nobody is at home, so let me confuse you further...
I've used the same template, and it needs some attention. Specifically, it uses macros (the RM equivalent of variables, which show as %{XXXX} in parameters) but does not assign values to them, so no wonder it confuses you! I've butchered a template by replacing the data call with a generator, like this... Actually there are also relevant samples (1-25 and 2-23), and this subject has raised its ugly head before, as a quick search for "Market Basket" shows.
Pip Pip ;D
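For anyone following along, the unassigned-macro problem can be pictured with a toy sketch in Python. This is only an illustration of the idea, not how RapidMiner expands macros internally. The macro name is taken from the error messages quoted in the first post; `expand_macros` and the column list are names made up for this example.

```python
import re

# Toy model: a parameter value may contain %{name} placeholders.
MACRO_PATTERN = re.compile(r"%\{(\w+)\}")


def expand_macros(value, macros):
    """Replace each %{name} with macros[name]; collect names that have no value."""
    missing = []

    def substitute(match):
        name = match.group(1)
        if name in macros:
            return macros[name]
        missing.append(name)
        return match.group(0)  # leave the placeholder untouched

    return MACRO_PATTERN.sub(substitute, value), missing


# Column names from the CSV in the first post.
columns = ["CustomerID", "itemID", "itemCount"]
parameter = "%{customerIDAttributeName}"

# No value assigned: the placeholder survives and matches no column.
resolved, missing = expand_macros(parameter, {})
print(resolved in columns, missing)

# Value assigned: the parameter now names a real column.
resolved, missing = expand_macros(parameter, {"customerIDAttributeName": "CustomerID"})
print(resolved in columns, missing)
```

The point of the sketch is only that a parameter built from a macro cannot match any column until the macro has a value.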
I tried your process but I didn't necessarily get any results that I could see.... but I am going to mess around with this.
Thanks for the reply!
Still any problems?
Greetings,
Sebastian
What's weird is, I actually *get* results with the process above (it generates its own recordset). My OWN recordset has an ID field which is text, so when I replace the first process with a retrieve, everything transitions fine except I don't get any results. And I'm betting the nominal field is the problem.
I've tried adding a Type Conversion process in between: Nominal to Binomial. But that didn't work either.
Mea maxima culpa :-[ I pasted in completely the wrong code.. this is what should have been there... Hope that goes a bit better!
And is it okay if I still get the caution for FP-Growth that regular attributes must be binomial? I will experiment with this and see if I can put my own dataset into the input, and see if it works.
Perhaps that is my problem. I get the warning with the process you posted, but it actually is successful despite the warning, probably because the IDs are numbers?
Yep, you can bin the gray jobs, and you can ignore the warning, especially as it all runs OK. So all you need to do is replace the example generator, and all should be well....
Make sure that your meta-data matches on attribute Name and Content:
Role     Name    Content
id       Id      nominal
regular  Item    nominal
regular  Amount  integer
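If you want to check a CSV against that table before importing it, a rough sketch in Python could look like the following. It runs outside RapidMiner and only approximates the meta-data view. The type guess ("integer" if every value parses as an int, otherwise "nominal") is an assumption of this sketch, and `check_metadata` is a made-up helper name.

```python
import csv
import io

# Expected attribute names and content types, from the table above.
EXPECTED = {"Id": "nominal", "Item": "nominal", "Amount": "integer"}


def guess_type(values):
    """Assumed rule: 'integer' if every value parses as int, else 'nominal'."""
    try:
        for value in values:
            int(value)
    except ValueError:
        return "nominal"
    return "integer"


def check_metadata(csv_text):
    """Return a list of mismatches between the CSV and EXPECTED."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fieldnames = reader.fieldnames or []
    problems = []
    for name, expected in EXPECTED.items():
        if name not in fieldnames:
            problems.append(f"missing column {name!r}")
            continue
        found = guess_type([row[name] for row in rows])
        if found != expected:
            problems.append(f"column {name!r} looks {found}, expected {expected}")
    return problems


good = "Id,Item,Amount\nE30098AE,Product0001,1\nE230289D,Product0002,2\n"
original = "CustomerID,itemID,itemCount\nD21953,E3,1\n"

print(check_metadata(good))      # no mismatches reported
print(check_metadata(original))  # all three expected names are missing
```

Note that attribute names are compared exactly, including case, so `CustomerID` does not count as `Id`.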
Don't think so, concentrate first on loading the data and seeing ( from the meta-data ) that RM thinks it has data as I described.
Instead of doing that, I'll simply create a new set of data which has the names and content you describe above. That should eliminate the possibility that I was making mistakes while reconfiguring the different processes.
I've imported this CSV format as a data repository, and substituted that repository for the generator
Id,Item,Amount
E30098AE,Product0001,1
E230843F,Product0001,1
E230289D,Product0002,2
E30098AE,Product0002,1
E230843F,Product0001,1
E230289D,Product0002,2
And it works ( in the sense it doesn't fall over ).
;D
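To see what the frequent-itemset step gets to work with, here is a small Python sketch that turns the long Id/Item/Amount rows above into one true/false row per Id. It assumes that this is roughly what the pivot and binomial conversion in the process do; the process itself is not reproduced here, and `pivot_to_baskets` is a name invented for the example.

```python
import csv
import io
from collections import defaultdict

# The CSV rows posted above.
CSV_TEXT = """Id,Item,Amount
E30098AE,Product0001,1
E230843F,Product0001,1
E230289D,Product0002,2
E30098AE,Product0002,1
E230843F,Product0001,1
E230289D,Product0002,2
"""


def pivot_to_baskets(csv_text):
    """Sum Amount per (Id, Item), then mark each item as bought or not per Id."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Id"]][row["Item"]] += int(row["Amount"])
    items = sorted({item for basket in totals.values() for item in basket})
    table = {
        customer: {item: basket.get(item, 0) > 0 for item in items}
        for customer, basket in totals.items()
    }
    return items, table


items, table = pivot_to_baskets(CSV_TEXT)
print("Id".ljust(10), *items)
for customer, flags in table.items():
    print(customer.ljust(10), *(str(flags[item]).ljust(11) for item in items))
```

In this sample only E30098AE has both products, so there is very little co-occurrence for any rule to be built from.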
Now it doesn't break. But my association rules are blank. However, this might mean I'm filtering out rules that might have existed in my data but didn't meet a criterion.
To get the maximum number of results, I set:
FP-Growth
  min number of items = 0
  positive value = [blank]
  min support = 0
  max items = -1
  must contain = [blank]
Create Association Rules
  Min Conf = 0
  Gain theta = 0
  laplace k = 0
But still can't see rules.
Some real rows from my data, for which I would expect some sort of rule:
Id      Item  Amount
D11131  E1    1
D11131  E5    1
D11124  E5    1
D11125  E5    1
I should see a rule appearing for E1 -- E5, right?
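As a sanity check on those four rows, the textbook support and confidence values can be worked out by hand in Python. This uses the standard definitions only and makes no claim about what RapidMiner reports for the same data; the function names are made up for the sketch.

```python
from collections import defaultdict

# The four rows quoted above: (Id, Item, Amount).
ROWS = [
    ("D11131", "E1", 1),
    ("D11131", "E5", 1),
    ("D11124", "E5", 1),
    ("D11125", "E5", 1),
]


def baskets_from_rows(rows):
    """Group items per Id, keeping only items with a positive amount."""
    baskets = defaultdict(set)
    for customer, item, amount in rows:
        if amount > 0:
            baskets[customer].add(item)
    return dict(baskets)


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    wanted = set(itemset)
    return sum(wanted <= basket for basket in baskets.values()) / len(baskets)


def confidence(premise, conclusion, baskets):
    """support(premise and conclusion together) / support(premise)."""
    both = set(premise) | set(conclusion)
    return support(both, baskets) / support(premise, baskets)


baskets = baskets_from_rows(ROWS)
print("support {E1, E5}:", support({"E1", "E5"}, baskets))
print("confidence E1 -> E5:", confidence({"E1"}, {"E5"}, baskets))
print("confidence E5 -> E1:", confidence({"E5"}, {"E1"}, baskets))
```

Only one of the three Ids (D11131) holds both items, so the pair has a support of one third. E1 -> E5 has full confidence because the only basket with E1 also has E5, while E5 -> E1 is much weaker.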
Now we're on the right track. I'm thinking my example data isn't very good.
Happy dredging!
Could you help me find the criteria that would give the maximum results? Or did I have it right in my previous post?
Take it step by step. First thing is to understand about frequent item sets, and the parameters for their generation. If in doubt, as always, check out Wikipedia. Then do the rule building end.
pip pip
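To get a feel for the frequent item set step on its own, here is a brute-force Python sketch over the three baskets from the rows quoted earlier. It is not FP-Growth, just a plain enumeration of every combination, which is only workable on tiny data. The `min_support` and `max_items` arguments are loosely modelled on the parameters listed above, and their exact semantics here are an assumption of the sketch.

```python
from itertools import combinations

# Baskets built from the rows quoted earlier in the thread.
BASKETS = {
    "D11131": {"E1", "E5"},
    "D11124": {"E5"},
    "D11125": {"E5"},
}


def frequent_itemsets(baskets, min_support, max_items=None):
    """Brute force: test every item combination against min_support."""
    items = sorted(set().union(*baskets.values()))
    limit = len(items) if max_items is None else min(max_items, len(items))
    total = len(baskets)
    result = {}
    for size in range(1, limit + 1):
        for combo in combinations(items, size):
            count = sum(set(combo) <= basket for basket in baskets.values())
            if count and count / total >= min_support:
                result[combo] = count / total
    return result


# A threshold of 0 keeps every itemset that occurs at least once.
print(frequent_itemsets(BASKETS, min_support=0.0))

# Raising the threshold drops anything seen in fewer than half the baskets.
print(frequent_itemsets(BASKETS, min_support=0.5))
```

Rules can only be built from itemsets that survive this step, so it helps to confirm that the itemsets you expect are present before adjusting the rule parameters.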
I am just reading this post, and it is very good. I have a question: how would I modify the code to include a zip code, so that it provides association rules for each zip code?
Thank you
Do you want a rule set per zip code? Then you would have to split your data according to the zip codes and perform the process on each of these subsets. You could do this with a Filter Examples and a Loop Values operator.
Greetings,
Sebastian
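The split-then-loop idea can be sketched in Python as follows. This is not the RapidMiner process, only the same logic written out: filter the rows for one group value, run the analysis on that subset, and repeat for each value. The rows below are hypothetical (no zip code or State data appears in this thread), and for brevity the per-group analysis is reduced to pair support.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows: (group value such as State or zip code, customer Id, item).
ROWS = [
    ("NY", "C1", "E1"), ("NY", "C1", "E5"), ("NY", "C2", "E5"),
    ("CA", "C3", "E1"), ("CA", "C3", "E2"),
    ("CA", "C4", "E1"), ("CA", "C4", "E2"),
]


def baskets_by_group(rows):
    """Split the rows per group value, then collect items per customer."""
    groups = defaultdict(lambda: defaultdict(set))
    for group, customer, item in rows:
        groups[group][customer].add(item)
    return groups


def pair_support(baskets):
    """Support of every item pair within one group's baskets."""
    total = len(baskets)
    counts = defaultdict(int)
    for items in baskets.values():
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: count / total for pair, count in counts.items()}


def pair_support_per_group(rows):
    """Loop over the group values and analyse each subset separately."""
    groups = baskets_by_group(rows)
    return {group: pair_support(groups[group]) for group in sorted(groups)}


for group, supports in pair_support_per_group(ROWS).items():
    print(group, supports)
```

Each group gets its own result, printed one after another, and the support values are relative to the baskets in that group only.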
Yes, it is per zip code.
I am using the code as shown. Where exactly can I put these operators in it? Rather than zip code, I am looking at State.
Thank you
I want to show the associations by State as results one after another.