Analyzing data from two related data sets

jeganathanvelu Member Posts: 17 Contributor II
edited November 2018 in Help
Hi,

I have two data sets. The first has an application ID and a complaint registration time. The second has an application ID, the complaint responses, and a response registration time, and it can contain multiple entries for each application.

My requirement is to identify the latest response for each application, based on the response registration time in the second table, and map it to the matching application ID in the first table.

For the mapping I can use the Join operator, but I don't know how to identify the latest response in the second data set using RapidMiner.

Thanks for your help in advance,
Jegan

Answers

  • homburg Moderator, Employee, Member Posts: 114 RM Data Scientist
    Hi Jegan,

    Could you provide more information about your second table? Otherwise it is pretty hard to give you a hint about what to do next. Have you considered adding an ID to your table, or generating one using RapidMiner?

    Cheers,
    Helge
  • jeganathanvelu Member Posts: 17 Contributor II
    Hi,

    Thanks for the reply. My second table already had an ID for each entry, and also a foreign key (as in an RDBMS) to be used for look-up against the first table. The second table has multiple entries with the same foreign key.

    When doing the join, I wanted to refer to the entry with the latest timestamp for each foreign key. I solved the issue by sorting the second table in descending order by timestamp and then using the Remove Duplicates operator on the foreign key. Because Remove Duplicates always keeps only the first entry for a given attribute value and removes the others, this retained only the entry with the latest timestamp for each foreign key, and I was then able to do a join to get the desired result :-)
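
For readers who want to check the logic outside RapidMiner, the three steps described above (sort descending by timestamp, keep the first entry per foreign key, then join) can be sketched in plain Python. This is only an illustration under assumptions: the attribute names, the sample rows, and the helper functions `latest_per_key` and `left_join` are hypothetical and do not come from the thread, and the thread does not say which join type was used, so a left join is assumed here.

```python
from datetime import datetime

# Hypothetical sample data; attribute names are assumptions, not from the thread.
complaints = [
    {"application_id": 101, "complaint_time": datetime(2018, 11, 1, 9, 0)},
    {"application_id": 102, "complaint_time": datetime(2018, 11, 2, 10, 30)},
    {"application_id": 103, "complaint_time": datetime(2018, 11, 3, 8, 15)},
]
responses = [
    {"response_id": 1, "application_id": 101, "response": "received",
     "response_time": datetime(2018, 11, 1, 12, 0)},
    {"response_id": 2, "application_id": 101, "response": "in progress",
     "response_time": datetime(2018, 11, 2, 12, 0)},
    {"response_id": 3, "application_id": 102, "response": "received",
     "response_time": datetime(2018, 11, 2, 15, 0)},
    {"response_id": 4, "application_id": 101, "response": "resolved",
     "response_time": datetime(2018, 11, 4, 9, 0)},
]


def latest_per_key(rows, key, time_field):
    """Keep only the most recent row for each value of `key`."""
    # Step 1: sort in descending order by timestamp (newest first).
    ordered = sorted(rows, key=lambda r: r[time_field], reverse=True)
    # Step 2: keep the first row seen per key and drop later (older) ones,
    # mirroring a "keep first" duplicate removal on the foreign key.
    seen = set()
    kept = []
    for row in ordered:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


def left_join(left, right, key):
    """Attach the matching right-hand row, if any, to every left-hand row."""
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup.get(l[key], {})} for l in left]


latest = latest_per_key(responses, "application_id", "response_time")
result = left_join(complaints, latest, "application_id")

for row in result:
    # Applications without any response have no "response" field.
    print(row["application_id"], row.get("response"), row.get("response_time"))
```

With a left join, an application that has no response at all (103 in the sample data) is still present in the result, just without response fields. An inner join would drop it instead.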