Preprocessing market basket data
Hi,
I'm a student from Pakistan and I am not very familiar with RapidMiner. I have been given a market basket analysis task and have almost 10,000 rows of data to which I need to apply FP-Growth and Apriori.
My given data is in the format:
1 cheese, bread, milk
2 milk, cake
3 cake, cheese, milk
For the Apriori algorithm, I need to convert the data into a binary matrix format like:
TID | cheese bread milk cake
1 | 1 1 1 0
2 | 0 0 1 1
3 | 1 0 1 1
How can I preprocess my data in RapidMiner to get this format?
Thanks in advance.
Answers
Hi @RobotGirl,
For the moment, I don't know how to perform your data transformation with RapidMiner's native operators (I will think about it).
In the meantime, I propose a Python script:
I assume that your initial dataset is in this form:
By executing the process, you obtain a dataset like this:
The process:
To execute this process you need:
- to install Python on your computer
- to install the Execute Python operator (from the Marketplace)
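The script itself did not survive in this thread, so the following is only a minimal sketch of the transformation described in the question, not Lionel's original code. It uses plain Python, and the function name `baskets_to_binary` and the `(TID, basket string)` input shape are assumptions made for illustration. Inside the Execute Python operator the same logic would have to be adapted to the table the operator passes in.

```python
def baskets_to_binary(rows, sep=","):
    """Convert (tid, "item, item, ...") pairs into a 0/1 matrix.

    Hypothetical helper: returns (header, matrix), where header is
    ["TID", item1, item2, ...] and each matrix row is [tid, 0/1, ...].
    """
    # Split every basket string into a list of trimmed item names.
    parsed = [
        (tid, [item.strip() for item in basket.split(sep) if item.strip()])
        for tid, basket in rows
    ]

    # Collect the distinct items in order of first appearance.
    items = []
    for _, basket in parsed:
        for item in basket:
            if item not in items:
                items.append(item)

    # One row per transaction: 1 if the item is in the basket, else 0.
    header = ["TID"] + items
    matrix = [
        [tid] + [1 if item in basket else 0 for item in items]
        for tid, basket in parsed
    ]
    return header, matrix


# The three example transactions from the question.
rows = [(1, "cheese, bread, milk"), (2, "milk, cake"), (3, "cake, cheese, milk")]
header, matrix = baskets_to_binary(rows)

print(" ".join(header))
for row in matrix:
    print(" ".join(str(value) for value in row))
```

For 10,000 rows the list-based membership checks are still fast enough. With a much larger item catalogue, a set per basket would be the more natural choice.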
I hope it helps,
Regards,
Lionel
Thanks for your response, @lionelderkrikor, but my task is to use RapidMiner itself, without any external coding.
You can do it directly with the new version of the FP-Growth operator.
Your dataset (a CSV file) should look like this:
id;basket
1;cheese,bread,milk
2;milk,cake
3;cake,cheese,milk
Please note the ';'. This is the column separator, so this dataset has only two columns: 'id' and 'basket'.
Read it into your repository. It should look like the image below:
Set the first column to the role of ID.
When you use the FP-Growth operator, make sure that for Input format you select 'items list in a column' and that the item separators parameter is set to ','.
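To make the two separators concrete, here is a small Python illustration of how such a file is read: ';' splits the columns and ',' splits the items inside the 'basket' column. This is not what RapidMiner runs internally and it is not FP-Growth. It only counts in how many baskets each single item occurs, and the helper name `read_baskets` is an assumption made for this sketch.

```python
import csv
import io
from collections import Counter

# The sample dataset from this answer, as it would appear in the CSV file.
sample = "id;basket\n1;cheese,bread,milk\n2;milk,cake\n3;cake,cheese,milk\n"


def read_baskets(text, column_sep=";", item_sep=","):
    """Hypothetical helper: map each id to its list of items."""
    reader = csv.DictReader(io.StringIO(text), delimiter=column_sep)
    return {
        row["id"]: [item.strip() for item in row["basket"].split(item_sep)]
        for row in reader
    }


baskets = read_baskets(sample)

# Count the number of baskets that contain each item (single-item support).
support = Counter(item for items in baskets.values() for item in set(items))

for item, count in support.most_common():
    print(item, count)
```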
Now run the process below:
I got curious about this question.
How would you preprocess the original CSV to replace the first ',' with a ';'?
After a few minutes of googling, the answer:
1) Open the CSV in any decent editor (Atom, UltraEdit, Notepad++, etc.)
2) Find:
3) Replace
Regex, of course. I should learn more Regex.
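The exact Find and Replace patterns are not shown in this post, so the following is only one possible way to express the same idea in Python. It assumes each line starts with the ID followed by a comma (for example `1,cheese,bread,milk`). The pattern and the function name `first_comma_to_semicolon` are illustrative assumptions, not the patterns the poster found.

```python
import re


def first_comma_to_semicolon(line):
    """Replace only the first ',' on a line with ';' (hypothetical helper)."""
    # ^([^,]*), matches everything up to and including the first comma.
    # The captured prefix is kept and the comma is swapped for ';'.
    return re.sub(r"^([^,]*),", r"\1;", line, count=1)


original = ["1,cheese,bread,milk", "2,milk,cake", "3,cake,cheese,milk"]
converted = [first_comma_to_semicolon(line) for line in original]

for line in converted:
    print(line)
```

Lines that contain no comma at all are returned unchanged, so a header line without commas would pass through untouched.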
And you don't need to use an editor; you can use RapidMiner's Replace operator for it.
Cheers,
Martin
Dortmund, Germany