Analyzing data from two related data sets
jeganathanvelu
Member Posts: 17 Contributor II
Hi,
I have two data sets. The first data set has an application id and a complaint registration time. The second data set has an application id, complaint responses and a response registration time. The second table will have multiple entries for each application.
My requirement is to identify the latest response, based on the response registration time in the second table, and map it against the application id in the first table.
For the mapping I can use the Join operator. But I don't know how to identify the latest response from the second data set using RapidMiner.
Thanks for your help in advance,
Jegan
Answers
Maybe you could provide more information regarding your second table; otherwise it is pretty hard to give you a hint about what to do next. Have you considered adding an id to your table or generating one using RapidMiner?
Cheers,
Helge
Thanks for the reply. My second table already had an ID for each entry and also a foreign key (as in RDBMS) to be used for look-up with the first table. The second table has multiple entries with the same foreign key.
While doing the join I wanted to refer to the entry with the latest time-stamp for each foreign key. I solved the issue by sorting the second table in descending order on the time-stamp and then using the Remove Duplicates operator on the foreign key. This retained only the entry with the latest time-stamp for each foreign key, since the Remove Duplicates operator always retains the first entry for a given attribute value and removes the others. After that I was able to do a join to get the desired result :-)
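
For anyone who wants to check the logic outside RapidMiner, the same sort, remove duplicates, then join idea can be sketched in pandas. This is only an illustration and not the RapidMiner process itself: the column names (`application_id`, `response_time`, and so on), the sample rows and the helper `latest_response_per_application` are all made up for the example, and the left join (which keeps applications with no response) is an assumption rather than something stated above.

```python
import pandas as pd

# Hypothetical sample data; all column names and values are assumptions.
complaints = pd.DataFrame({
    "application_id": [1, 2, 3],
    "complaint_time": pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-02 10:00", "2024-01-03 11:00"]
    ),
})
responses = pd.DataFrame({
    "response_id": [10, 11, 12, 13],
    "application_id": [1, 1, 2, 2],  # foreign key into complaints
    "response": ["received", "resolved", "escalated", "received"],
    "response_time": pd.to_datetime(
        ["2024-01-01 12:00", "2024-01-04 15:00",
         "2024-01-06 08:00", "2024-01-05 08:00"]
    ),
})


def latest_response_per_application(complaints, responses):
    """Keep the newest response per application, then join it to complaints."""
    latest = (
        responses
        # Newest response first, like sorting descending on the time-stamp.
        .sort_values("response_time", ascending=False)
        # Keep only the first (newest) row per foreign key.
        .drop_duplicates(subset="application_id", keep="first")
    )
    # Left join so applications without any response are still listed.
    return complaints.merge(latest, on="application_id", how="left")


result = latest_response_per_application(complaints, responses)
print(result)
```

If two responses for the same application share exactly the same time-stamp, which one survives depends on the sort, so a second sort key (for example the response id) may be worth adding in that case.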