"What is the best dataset form for mining with the FP-Growth algorithm in RM?"
brenda_natasha
Member Posts: 1 Learner I
Does anyone know the best criteria, or at least the rules, for a dataset that is going to be mined using FP-Growth?
And about the form, which one is better?
1. order_id | item1 | item2 | item3
or
2. order_id | item {}
or
3. order_id | book (T/F) | pencil (T/F) | bag (T/F)
I ask because every example I read uses form #2, but what about forms #1 and #3?
Answers
It would not affect the outcome, as long as you have the order ID and the item IDs for each order.
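To illustrate why the form does not change the result, here is a minimal Python sketch (not RapidMiner itself) that reduces each of the three forms to the same set of items per order. The toy orders, the helper names, and the reading of form 2 as one row per (order_id, item) pair are all assumptions made for this example; a delimited item list in a single column would be split the same way.

```python
# Hypothetical toy data: the same two orders written in each of the three forms.
form1 = [  # order_id | item1 | item2 | item3 (None pads shorter orders)
    (1, "book", "pencil", None),
    (2, "bag", "book", "pencil"),
]
form2 = [  # assumed layout: one row per (order_id, item) pair
    (1, "book"), (1, "pencil"),
    (2, "bag"), (2, "book"), (2, "pencil"),
]
form3_columns = ("book", "pencil", "bag")
form3 = [  # order_id | book | pencil | bag, as True/False flags
    (1, True, True, False),
    (2, True, True, True),
]


def from_form1(rows):
    """Collect the non-empty item cells of each row into a set."""
    return {oid: frozenset(i for i in items if i is not None)
            for oid, *items in rows}


def from_form2(rows):
    """Group the (order_id, item) pairs by order_id."""
    baskets = {}
    for oid, item in rows:
        baskets.setdefault(oid, set()).add(item)
    return {oid: frozenset(items) for oid, items in baskets.items()}


def from_form3(rows, columns):
    """Keep the column names whose flag is True."""
    return {oid: frozenset(c for c, flag in zip(columns, flags) if flag)
            for oid, *flags in rows}


t1 = from_form1(form1)
t2 = from_form2(form2)
t3 = from_form3(form3, form3_columns)
print(t1 == t2 == t3)  # all three forms describe the same transactions
```

Since a frequent-itemset miner only needs to know which items occur together in each order, any of the three layouts carries the same information.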
The real difference is in performance when you explore your data.
In forms 1 and 3 you may have a column for each product. Depending on your use case that could be any number of columns, and as it grows the table gets wider and uses more of your computer's resources.
The main difference between 1 and 3 is binary encoding versus storing a value (such as the item name or the quantity ordered) in each column. Since the quantity ordered of each product doesn't affect the resulting rules, either way is OK.
In the end, the process transforms the data set (DS) into a binary matrix.
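As a rough picture of that binary matrix, the following Python sketch builds one row per order and one column per distinct item from two-column data. It is only an illustration under assumptions: the toy rows and the function name `to_binary_matrix` are hypothetical, and it does not reproduce what RapidMiner does internally.

```python
# Hypothetical two-column data: one row per (order_id, item) pair.
rows = [
    (1, "book"), (1, "pencil"),
    (2, "bag"), (2, "book"), (2, "pencil"),
    (3, "bag"),
]


def to_binary_matrix(rows):
    """Return (order_ids, items, matrix), where matrix[i][j] is True
    when order_ids[i] contains items[j]."""
    order_ids = sorted({oid for oid, _ in rows})
    items = sorted({item for _, item in rows})
    present = set(rows)  # fast membership test for (order_id, item)
    matrix = [[(oid, item) in present for item in items]
              for oid in order_ids]
    return order_ids, items, matrix


order_ids, items, matrix = to_binary_matrix(rows)

# Support of a single item = fraction of orders that contain it.
support = {item: sum(row[j] for row in matrix) / len(matrix)
           for j, item in enumerate(items)}

print(items)
for oid, row in zip(order_ids, matrix):
    print(oid, row)
```

The matrix gets one column per distinct item, so its width grows with the size of the product catalogue, while the two-column input only grows in length.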
I prefer form 2, since you only need two columns in your DS and it's easier to obtain that structure from any transactional software.
Hope this answers your question.
Best regards.