association rules data preparation help needed
Hi all,
I am new to RapidMiner and need some help. I am working on basket analysis and found this dataset
http://www.sci.csueastbay.edu/~esuess/classes/Statistics_6620/Presentations/ml13/groceries.csv
but I am struggling with the data preparation. The attributes should be constructed from all possible item names, and each transaction should have either TRUE or FALSE in the corresponding column. E.g., I need to convert this:
1 | milk | pastry |
2 | milk | sausage |
Into this:
  | milk | pastry | sausage |
1 | TRUE | TRUE | FALSE |
2 | TRUE | FALSE | TRUE |
I will appreciate any help.
Regards
Rob
Answers
Hi @piernik
You need to add an attribute name (a header) for your grocery list column in the .csv file (I named it "Att1"; see the attached file).
Here is the process for the data preparation:
I hope it helps,
Regards,
Lionel
Hi again @piernik
Here is the process with the renamed attributes (e.g., Att1_butter --> butter):
Regards,
Lionel
Hi Lionel,
Many thanks for the solution. Your idea of renaming the columns is pretty smart! I did not know this operator.
The solution works well if there is only one attribute. However, each transaction line in the source .csv file contains multiple items, and I noticed that RapidMiner reads only the first one. For example, some transactions may look like this:
1. milk, egg, sausage
2. butter
3. egg, sugar, cake, water
When I read the csv file, RapidMiner sees only the first attribute:
1. milk
2. butter
3. egg
But this is not an issue. I have reformatted the file as a proper comma-delimited file and added the column headers 'PurchaseLine01'... (see the attached file). Now RapidMiner sees all the attributes.
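For readers who want to check the raw file outside RapidMiner, here is a minimal sketch, using only Python's standard csv module, of reading lines that contain a varying number of items. The inline SAMPLE text and the read_transactions helper are assumptions made for illustration; they are not part of the original file or of any RapidMiner process.

```python
import csv
import io

# Hypothetical sample standing in for the groceries file: one transaction
# per line, a varying number of comma-separated items, no header row.
SAMPLE = "milk,egg,sausage\nbutter\negg,sugar,cake,water\n"


def read_transactions(handle):
    """Return one list of item names per non-empty line."""
    transactions = []
    for row in csv.reader(handle):
        # Trim whitespace and drop empty cells (e.g. from trailing commas).
        items = [cell.strip() for cell in row if cell.strip()]
        if items:
            transactions.append(items)
    return transactions


transactions = read_transactions(io.StringIO(SAMPLE))
print(transactions)
```

For a real file, the same helper should work with a handle opened via open(path, newline=""), where path is wherever the .csv file was saved.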
The problem is how to create a column for each item from all PurchaseLines. So for the example, there are three transactions below, where some customers purchased two, three or four products:
The output should be:
I think it could be something along these lines:
1. Create a list of all distinct items from all transactions and purchase lines
2. Using the above list, create a column for each product
3. Map items with TRUE/FALSE depending on what was purchased in a given transaction
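The three steps above can be sketched in plain Python, independent of RapidMiner. This is only an illustration under assumptions: the one_hot_transactions function name and the hard-coded example transactions are hypothetical, and the items are sorted alphabetically simply to give the columns a stable order.

```python
def one_hot_transactions(transactions):
    """Turn lists of purchased items into one TRUE/FALSE row per transaction."""
    # Step 1: collect every distinct item across all transactions.
    items = sorted({item for basket in transactions for item in basket})
    # Steps 2 and 3: one column per item, True if the basket contains it.
    rows = []
    for basket in transactions:
        purchased = set(basket)
        rows.append({item: item in purchased for item in items})
    return items, rows


# Hypothetical example data matching the three transactions in the post.
transactions = [
    ["milk", "egg", "sausage"],
    ["butter"],
    ["egg", "sugar", "cake", "water"],
]

items, rows = one_hot_transactions(transactions)

# Print the result in the same pipe-separated layout used in this thread.
print(" | ".join([" "] + items))
for number, row in enumerate(rows, start=1):
    print(" | ".join([str(number)] + [str(row[item]).upper() for item in items]))
```

The resulting rows could then be written back to a .csv file and loaded into RapidMiner for the association rule step.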
I hope this makes sense.
Many thanks
Rob
Hi @piernik,
Unfortunately, I could not achieve the transformation of your dataset with RapidMiner's native operators.
So I used a Python script:
To execute this process, you have to install a Python environment on your computer and install the Execute Python operator (from the Marketplace).
The process:
I hope it helps,
Regards,
Lionel
NB: I strongly believe that this dataset transformation is possible with RapidMiner alone (without scripts), so if someone has an idea, I would be curious to hear it.