How to "unjoin" a dataset?
Hi,
I have a dataset that is the result of a join and I want to break up, back into the 2 source tables with a foreign key id in one of the tables.
For example the table I have looks a bit like this:
Invoice_id | Product_weight | Product_Color | Product_size
1 | 25 | blue | Large
2 | 5 | red | Small
3 | 17 | green | Large
4 | 15 | blue | Medium
5 | 25 | blue | Large
6 | 25 | blue | Large
I want to break it back in to 2 tables like this:
Invoice_id | Product_id
1 |1
2 |2
3 |3
4 |4
5 |1
6 |1
and
Product_id | Product_weight | Product_Color | Product_size
1 | 25 | blue | Large
2 | 5 | red | Small
3 | 17 | green | Large
4 | 15 | blue | Medium
How do I do this?
Answers
hi @mdale9 - so I assume you created this via a left-hand-join like this, joining on "Product_id" as your key:
Hence the result is what you say:
So if this is the case, to "unjoin" is to simply select the first two attributes (Invoice_id and Product_id) from the result as this is exactly what join did to make them in the first place (that is left join):
As for the other side, when you join you create duplicates. So removing duplicates should do the trick:
Here's the process from soup to nuts:
Example sets are attached.
Scott
Hi Scott,
Not quite. If you look at my original dataset, I don't have a product_id in the joined dataset and hence can't extract it, unlike in your Exampleset(Join). I need to remove duplicates, generate a new product_id and then figure out a way of getting that original link between invoice and product.
thanks for taking the time to look at this though.
Martin