The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Joining tables based on closest date value
data1maths
Member Posts: 27 Contributor II
Hello everyone,
I possess tow different tables that i want to join, the problem is that there no IDs from both of them to equal and get the join, the only way is to join them based on the closest value from two date columns.
The fisrt table contains this date column 1 :
01/01/2016 00:00
01/01/2016 00:10
01/01/2016 00:20
01/01/2016 00:30
01/01/2016 00:40
01/01/2016 00:50
01/01/2016 01:00
01/01/2016 01:10
01/01/2016 01:20
01/01/2016 01:30
01/01/2016 01:40
01/01/2016 01:50
01/01/2016 02:00
01/01/2016 02:10
01/01/2016 02:20
01/01/2016 02:30
01/01/2016 02:40
01/01/2016 02:50
01/01/2016 03:00
01/01/2016 03:10
01/01/2016 03:20
01/01/2016 03:30
01/01/2016 03:40
01/01/2016 03:50
01/01/2016 04:00
01/01/2016 04:10
01/01/2016 04:20
01/01/2016 04:30
01/01/2016 04:40
01/01/2016 04:50
01/01/2016 05:00
01/01/2016 05:10
01/01/2016 05:20
01/01/2016 05:30
ETC
And in the other table the column corresponds to :
01/01/2016 05:07
01/01/2016 07:10
01/01/2016 08:19
01/01/2016 08:27
01/01/2016 09:18
01/01/2016 10:13
01/01/2016 10:23
01/01/2016 10:30
01/01/2016 10:57
01/01/2016 12:20
01/01/2016 14:50
01/01/2016 14:54
01/01/2016 15:00
01/01/2016 15:20
01/01/2016 16:12
01/01/2016 18:26
01/01/2016 19:08
01/01/2016 20:00
01/01/2016 21:15
01/01/2016 21:20
01/01/2016 22:10
01/01/2016 22:13
01/01/2016 22:18
ETC
I DO NEED YOUR HELP.
Thank you in advance.
Best regards
0
Answers
Hi!
If your tables are not too large, you could do a Cartesian join (everything with everything), calculate the difference, take the absolute value (so -7 = 7), group by one of the timestamps, calculate the minimum per group, and join on (timestamp, minimum difference) to only keep this example.
If your tables are larger, you might want to do this in batches, e. g. only select data from an hour (+/- 10 minutes) for one round.
Regards,
Balázs
Another approach would be to convert the second series to numerics, and then do some mod arithmetic to round to the nearest 10 minutes, and then turn them back into date/times to match the first series. That should enable you to do the join pretty smoothly.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Hi BalazsBarany and Telcontar120
If only you can detail your solutions a bit more !!! I've tried Telcontar's suggestion but couldn't get it right !
Best regards
Here's a simple process doing what I described, it will take a date/time stamp and round it to the nearest 10 minute interval. That can then be used to join to your first dataset. The first subprocess just creates the data, and the date transformation could actually all be done inside a single Generate Attributes operator, but I split it out using the Date to Numericals so you could better see what was happening.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts