🎉 🎉. RAPIDMINER 9.8 IS OUT!!! 🎉 🎉

RapidMiner 9.8 continues to innovate in data science collaboration, connectivity and governance


how do we join multiple datasets by using both ID and date together to stitch into one database

hellomotohellomoto Member Posts: 5 Learner I
i have historical stock data which I am trying to put together in the same row with historical financial statements, both quarterly and yearly, to match them I want to use the ID first which will match both financial and historical data of a stock and then match it by date accordingly, for example i would want yearly reports of IBM launched on 1st Jan 2019 to be added in the same row with IBM's stock price data for that whole year starting from Jan 1 2019, and then match next year's yearly reports with next year's stock price, so instead of using one ID for joining datasets i want to use two IDs that is the stock id and date range.


  • hellomotohellomoto Member Posts: 5 Learner I
    my brother wrote code that does exactly what I want but it takes hours to just match data just for one stock, I have at least 15 thousand of them to match, and I am confident that if there is a solution in rapidminer for this, it will take a fraction of time compared to what the python code is taking.
  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 485   Unicorn

    If you're already using a database or planning to use it, that's also the best place for joining information.

    If you want to do it in RapidMiner (or even in the database), you first need to define the join criteria. For example, if the yearly report affects the year 2019 and you want to join it with daily stock data of the same year, the common criterion would be the year 2019. Similarly with a quarterly report.

    RapidMiner provides some tools for this. Date to Numerical helps you with extracting components of a date (year, month, day, quarter). Generate Attributes has many functions for date/time handling. And Join merges different data sets based on one or more common attributes. For example, if you have "Company Name" and "Period" as the prepared common attributes, you would specify these to join the two tables.

    If you need more complex joins, or joins on inequality, you could check out the Database Envy extension.

    If the data are already in a database, and it is supported by the In-Database Processing extension, you can comfortably join the data inside the database using RapidMiner operators (no SQL knowledge needed), giving you the best performance. (It's hard to beat modern databases in join speed.)

Sign In or Register to comment.