
Analyzing data from two related data sets

jeganathanvelu Member Posts: 17 Contributor II
edited November 2018 in Help
Hi,

I have two data sets. The first has an application ID and a complaint registration time. The second has an application ID, complaint responses, and a response registration time, and it can contain multiple entries for each application.

My requirement is to identify the latest response in the second table, based on the response registration time, and map it to the matching application ID in the first table.

For the mapping I can use the Join operator, but I don't know how to identify the latest response in the second data set using RapidMiner.
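Stated compactly, the requirement is "for each application ID, keep the response with the greatest registration time, then join it to the first table." The sketch below illustrates that logic outside RapidMiner, as a single pass that tracks the newest row per key. It is only an illustration under assumptions: the field names (`app_id`, `registered`, `response`, `responded_at`), the sample rows, and the helper `latest_per_key` are all hypothetical, and a left join is chosen so complaints without any response are kept.

```python
from datetime import datetime

# Hypothetical sample rows standing in for the two data sets described above.
# All field names are illustrative assumptions, not taken from the original data.
complaints = [
    {"app_id": 1, "registered": datetime(2018, 11, 1, 9, 0)},
    {"app_id": 2, "registered": datetime(2018, 11, 2, 14, 30)},
    {"app_id": 3, "registered": datetime(2018, 11, 5, 11, 0)},
]
responses = [
    {"response_id": 10, "app_id": 1, "response": "Acknowledged", "responded_at": datetime(2018, 11, 1, 10, 0)},
    {"response_id": 11, "app_id": 1, "response": "Resolved", "responded_at": datetime(2018, 11, 3, 16, 0)},
    {"response_id": 12, "app_id": 2, "response": "Acknowledged", "responded_at": datetime(2018, 11, 2, 15, 0)},
]


def latest_per_key(rows, key, ts):
    """Hypothetical helper: one pass, keeping the row with the greatest timestamp per key value."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[ts] > current[ts]:
            latest[row[key]] = row
    return latest


latest = latest_per_key(responses, "app_id", "responded_at")

# Left join on app_id: every complaint is kept; None marks a complaint with no response yet.
joined = [
    {**c, "latest_response": latest[c["app_id"]]["response"] if c["app_id"] in latest else None}
    for c in complaints
]

for row in joined:
    print(row["app_id"], row["latest_response"])
```

This avoids sorting the whole response table, at the cost of holding one row per application in memory.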

Thanks for your help in advance,
Jegan

Answers

  • homburg Employee, Member Posts: 114 RM Data Scientist
    Hi Jegan,

    Maybe you could provide more information about your second table; otherwise it is pretty hard to suggest what to do next. Have you considered adding an ID to your table, or generating one using RapidMiner?

    Cheers,
    Helge
  • jeganathanvelu Member Posts: 17 Contributor II
    Hi,

    Thanks for the reply. My second table already has an ID for each entry, plus a foreign key (as in an RDBMS) for looking up the first table. The second table has multiple entries with the same foreign key.

    When joining, I wanted to use only the entry with the latest timestamp for each foreign key. I solved this by sorting the second table in descending order of timestamp and then applying the Remove Duplicates operator on the foreign key. Because Remove Duplicates keeps only the first entry for each value of the chosen attribute and removes the rest, only the entry with the latest timestamp survived for each foreign key, and I was able to do a join to get the desired result :-)
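The Sort → Remove Duplicates → Join pipeline described in the reply can be mirrored in plain Python to show why it works: after a descending sort on the timestamp, the first row seen for each foreign key is the newest one. This is a sketch, not RapidMiner's implementation; the tuple layout and sample rows are hypothetical, and the final step behaves like an inner join (complaints without a response are dropped), which is an assumption about the join type used.

```python
from datetime import datetime

# Hypothetical response rows: (response_id, app_id, response, responded_at).
responses = [
    (10, 1, "Acknowledged", datetime(2018, 11, 1, 10, 0)),
    (11, 1, "Resolved", datetime(2018, 11, 3, 16, 0)),
    (12, 2, "Acknowledged", datetime(2018, 11, 2, 15, 0)),
    (13, 2, "Escalated", datetime(2018, 11, 4, 8, 0)),
]
# Hypothetical complaint rows: (app_id, registered_at).
complaints = [
    (1, datetime(2018, 11, 1, 9, 0)),
    (2, datetime(2018, 11, 2, 14, 30)),
]

# Step 1, like Sort in descending order: newest responses first.
sorted_responses = sorted(responses, key=lambda r: r[3], reverse=True)

# Step 2, like Remove Duplicates on the foreign key: keep the first row seen per app_id,
# which after the descending sort is the one with the latest timestamp.
seen = set()
deduped = []
for r in sorted_responses:
    if r[1] not in seen:
        seen.add(r[1])
        deduped.append(r)

# Step 3, like an inner Join on app_id: attach the surviving response to each complaint.
by_app = {r[1]: r for r in deduped}
result = [
    (app_id, registered, by_app[app_id][2])
    for app_id, registered in complaints
    if app_id in by_app
]

for row in result:
    print(row[0], row[2])
```

The approach relies on the "keep the first entry" behaviour. If two responses for the same application share an identical timestamp, which one survives depends on their order after sorting, so a tie-breaker (for example the response ID) may be worth adding to the sort key.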