The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
how do we join multiple datasets by using both ID and date together to stitch into one database
i have historical stock data which I am trying to put together in the same row with historical financial statements, both quarterly and yearly, to match them I want to use the ID first which will match both financial and historical data of a stock and then match it by date accordingly, for example i would want yearly reports of IBM launched on 1st Jan 2019 to be added in the same row with IBM's stock price data for that whole year starting from Jan 1 2019, and then match next year's yearly reports with next year's stock price, so instead of using one ID for joining datasets i want to use two IDs that is the stock id and date range.
0
Best Answer
-
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 UnicornHi!
If you're already using a database or planning to use it, that's also the best place for joining information.
If you want to do it in RapidMiner (or even in the database), you first need to define the join criteria. For example, if the yearly report affects the year 2019 and you want to join it with daily stock data of the same year, the common criterion would be the year 2019. Similarly with a quarterly report.
RapidMiner provides some tools for this. Date to Numerical helps you with extracting components of a date (year, month, day, quarter). Generate Attributes has many functions for date/time handling. And Join merges different data sets based on one or more common attributes. For example, if you have "Company Name" and "Period" as the prepared common attributes, you would specify these to join the two tables.
If you need more complex joins, or joins on inequality, you could check out the Database Envy extension.
If the data are already in a database, and it is supported by the In-Database Processing extension, you can comfortably join the data inside the database using RapidMiner operators (no SQL knowledge needed), giving you the best performance. (It's hard to beat modern databases in join speed.)
Regards,
Balázs5
Answers