Sample One row within a group

yrgowtham · May 2018

Hi Experts,
I have a table with PatientID, the day of their stay and max vital signs for the day.
I want to create a process that randomly samples one day for each patient.
Table structure:
PatientID  Day Number  Max_Temp  Max_Resp  Max_SBP  Max_HR
ABC        1           98.7      32        90       72
ABC        2           98.8      33        95       75
ABC        3           95        35        90       78
DEF        1           98.7      32        90       72
DEF        2           95        35        90       78
The output of my process should contain one randomly picked day for each patient and should look like this:

PatientID  Day Number  Max_Temp  Max_Resp  Max_SBP  Max_HR
ABC        2           98.8      33        95       75
DEF        1           98.7      32        90       72

Methods I have tried:

I tried the Sample operator with the balance data option, but it requires me to list each PatientID in
the parameter list (sample size per class). That is impractical because there are more than 50,000 PatientIDs.
R code (Execute R) would solve this, but I am trying to find out whether there is a way to do it in RapidMiner itself.

I am looking for a more automated way to achieve this in RapidMiner.

Please let me know if you need more info.
Thanks in advance

Telcontar120 · May 2018

You can sort your dataset by a random variable (which you can add with "Generate Attributes" if you need to) and then simply use "Remove Duplicates" to get rid of records based on the patient ID. This should give you one random day per patient in the resulting dataset.
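
For anyone who wants to see the logic outside of RapidMiner, here is a minimal Python sketch of the same idea: attach a random number to every row, sort by it, then keep the first row seen per patient. This is not what the operators do internally; it assumes that duplicate removal keeps the first occurrence, which is not confirmed in this thread. The function name `sample_one_per_group` and the shortened column name `Day` are made up for the illustration.

```python
import random

# Sample data from the question ("Day Number" shortened to "Day" for this sketch).
rows = [
    {"PatientID": "ABC", "Day": 1, "Max_Temp": 98.7, "Max_Resp": 32, "Max_SBP": 90, "Max_HR": 72},
    {"PatientID": "ABC", "Day": 2, "Max_Temp": 98.8, "Max_Resp": 33, "Max_SBP": 95, "Max_HR": 75},
    {"PatientID": "ABC", "Day": 3, "Max_Temp": 95.0, "Max_Resp": 35, "Max_SBP": 90, "Max_HR": 78},
    {"PatientID": "DEF", "Day": 1, "Max_Temp": 98.7, "Max_Resp": 32, "Max_SBP": 90, "Max_HR": 72},
    {"PatientID": "DEF", "Day": 2, "Max_Temp": 95.0, "Max_Resp": 35, "Max_SBP": 90, "Max_HR": 78},
]


def sample_one_per_group(rows, key, rng=None):
    """Hypothetical helper: sort rows by a random number, keep the first row per key."""
    rng = rng or random.Random()
    # Step 1: give every row a random sort value (the "Generate Attributes" step).
    keyed = [(rng.random(), index, row) for index, row in enumerate(rows)]
    # Step 2: sort by the random value; the index only breaks ties.
    keyed.sort(key=lambda item: (item[0], item[1]))
    # Step 3: keep the first row seen for each group
    # (assumed behaviour of the duplicate-removal step).
    seen = set()
    picked = []
    for _, _, row in keyed:
        if row[key] not in seen:
            seen.add(row[key])
            picked.append(row)
    return picked


sample = sample_one_per_group(rows, "PatientID")
for row in sorted(sample, key=lambda r: r["PatientID"]):
    print(row)
```

Each run prints one row per patient, and which day is chosen changes from run to run. Passing a seeded `random.Random(...)` as `rng` makes the pick reproducible.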

kypexin · May 2018

@Telcontar120 - pretty elegant solution! However, why would you want to sort the dataset by a random variable beforehand?

Telcontar120 · May 2018

@kypexin Sorting by a random variable should help ensure it doesn't systematically keep the same day for each patient. (I'm not 100% sure what the internal logic is for removing duplicates, but it might conceivably be related to the order in which they appear, so if your dataset is sorted by patient/day, that could lead to non-random sampling results.)
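
To make the concern concrete, here is a small Python sketch, again under the unconfirmed assumption that duplicate removal keeps the first occurrence. On data sorted by patient and day, a keep-first rule always returns day 1; after a random shuffle, each day of a patient is picked about equally often. The helper `keep_first` is hypothetical and only mimics that assumed behaviour.

```python
import random
from collections import Counter

# (PatientID, Day) pairs, sorted by patient and day as in the question.
rows = [("ABC", 1), ("ABC", 2), ("ABC", 3), ("DEF", 1), ("DEF", 2)]


def keep_first(rows):
    """Hypothetical stand-in for duplicate removal: keep the first row per patient."""
    seen = set()
    kept = []
    for patient_id, day in rows:
        if patient_id not in seen:
            seen.add(patient_id)
            kept.append((patient_id, day))
    return kept


# Without a random sort, the earliest day always wins.
unshuffled = keep_first(rows)
print(unshuffled)  # [('ABC', 1), ('DEF', 1)]

# With a random order first, count how often each (patient, day) pair is kept.
rng = random.Random(0)
counts = Counter()
for _ in range(3000):
    shuffled = rows[:]
    rng.shuffle(shuffled)
    counts.update(keep_first(shuffled))

for pair in rows:
    print(pair, counts[pair])
```

Over the 3000 repetitions, each of the three ABC days should be kept roughly 1000 times and each of the two DEF days roughly 1500 times.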
