The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Pivoting a table using Unique/Distinct Values
I have got data that looks like this table(I have simplified the table, but the idea is to get the number of unique product ids form unique columns and insert discount id where customer id meets product id, otherwise it will be a missing value(?)):
CustomerID | Product | Discount ID |
---|---|---|
Customer1 | Product1 | D1 |
Customer1 | Product2 | D3 |
Customer2 | Product1 | D2 |
Customer3 | Product3 | D3 |
I would like to turn this into this form:
CustomerID | Product1 | Product2 | Product3 |
---|---|---|---|
Customer1 | D1 | ? | ? |
Customer1 | ? | D3 | ? |
Customer2 | D2 | ? | ? |
Customer3 | ? | ? | D3 |
I have played around with the PIVOT operator but I am struggling to achieve this.
Thanks very much in advance!
0
Answers
Something like this?
Thanks T-Bone, not exactly in that format(see 2 rows for Customer1 in original post), thats how far I got using PIVOT operator, however in my case I'd need Customer1 to have 2 rows/examples; one for Product1 with with a value D1 while the rest are missing values and the second example with a value of D3 on Product2 with the rest as missing values. It is this bit of not being able to GROUP BY CustomerID which is making the whole thing complex.
I appreciate your efforts, thanks.
Well @Chiko, this was an interesting problem. For starters, as already shown, it is quite easy to turn your original dataset into the following table structure, using just the pivot operator and then doing some renaming:
CustomerID Product1 Product2 Product3
Customer1 D1 D3 ?
Customer2 D2 ? ?
Customer3 ? ? D3
This has one row per customer id, and separate attributes for each product discount combination. But to get to the next version, to split it into separate rows for each customer id & product id combination, required several additional steps. The attached process file should do it for you (the initial read csv operator simply starts with your original data provided in the first format), yielding a final table with the structure you want.
However, this was a non-trivial exercise, and if you had a lot of different products, you would need to do more work to build it out. If I were you, I would think about whether the structure you are looking for is really necessary compared to the 2nd version above, which is quite simple to produce.
I hope this helps!
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thanks very much Telcontar120. Yes this is a hard one. My product run into thousands. I think I will prep the data outside RapidMiner using dynamic SQL.
Thank you very much!