
Breaking an attribute into further attributes

arsalan_karim Member Posts: 14 Contributor II
edited November 2018 in Help

Hi All

I am stuck with something that is giving me nightmares.

I have a medical data set that consists of a list of medications which look similar to the below: 

 

FENTANYL 1 Patch(es), Q3D, 6 Mth30
FENTANYL 1 Patch(es), Q3D, 36 Day(s)
FENTANYL 1 Patch(es), Q2D, 100 Day(s)
FENTANYL 1 Patch(es), Q3D, 9 Day(s)
FENTANYL 1 Patch(es), Q3D, 9 Day(s)
FENTANYL 1 Patch(es), 2x/week, 30 Day(s)
FENTANYL 1 Patch(es), Q3D, 30 Day(s)
FENTANYL 1 Patch(es), 2x/week, 100 Day(s)

 

The second column consists of the dose, unit of measure, frequency and duration, all in one string. How can I break this attribute into the 4 separate attributes, as below:

 

Dose = 1
Unit of Measure = Patch
Frequency = Q2D
Duration = 9 Days

 

Thanks

Arsalan

 

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761   Unicorn

    Have you tried the Split operator and split them on the comma?  If that doesn't work you could use the RegEx function on the Split operator to do it. It's a bit more complicated but doable.

  • arsalan_karim Member Posts: 14 Contributor II

    Thanks T-Bone

    Some of the entries in my list are without a comma. Is there any way to handle that?

     

    Thanks

    Arsalan

     

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761   Unicorn

    OK, then you'll have to use a RegEx, something like .*\W.*,.*
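
To make the regex idea concrete, here is a minimal sketch in Python rather than in RapidMiner itself. It assumes each string holds only the second column (dose, unit, frequency, duration), that the dose is a number, and that the comma-less entries use whitespace where the commas would be; the thread does not show what those entries actually look like, so the pattern and the name `parse_medication` are illustrative guesses, not a tested recipe for the real data.

```python
import re

# Hypothetical pattern: commas between fields are optional, whitespace is enough.
MEDICATION_PATTERN = re.compile(
    r"^\s*(?P<dose>\d+(?:\.\d+)?)"   # numeric dose, e.g. "1"
    r"\s+(?P<unit>[^\s,]+)"          # unit of measure, e.g. "Patch(es)"
    r"\s*,?\s*(?P<frequency>[^\s,]+)"  # frequency, e.g. "Q3D" or "2x/week"
    r"\s*,?\s*(?P<duration>.+?)\s*$"   # everything left over, e.g. "36 Day(s)"
)


def parse_medication(text):
    """Return the four fields as a dict, or None if the string does not fit."""
    match = MEDICATION_PATTERN.match(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for sample in ["1 Patch(es), Q3D, 36 Day(s)", "1 Patch(es) 2x/week 30 Day(s)"]:
        print(parse_medication(sample))
```

Strings that do not start with a number come back as None, which gives you a list of rows to inspect by hand instead of silently wrong values.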

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,277   Unicorn

    Or you could just use two sequential Split operators.  From your examples it looks like dose is separated by a space but the others are done via commas.  So you could first split on comma and then take the first attribute generated (which should contain both dose and unit of measure) and split again on space.  That might be less elegant than the regex method but more robust if you have more variations of delimiters.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
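
The two-step split described above can be sketched outside RapidMiner as follows. This is only an illustration in Python, assuming the string contains just the dose/unit/frequency/duration column and exactly two commas; `split_medication` is a hypothetical helper name, and entries without commas are deliberately rejected rather than guessed at.

```python
def split_medication(text):
    """Split on commas first, then split the first piece on the first space."""
    parts = [part.strip() for part in text.split(",")]
    if len(parts) != 3:
        # Not the "dose unit, frequency, duration" shape; leave it for manual review.
        return None
    dose_and_unit, frequency, duration = parts
    # Second split: the dose is separated from the unit by a space.
    dose, _, unit = dose_and_unit.partition(" ")
    return {
        "dose": dose,
        "unit": unit.strip(),
        "frequency": frequency,
        "duration": duration,
    }


if __name__ == "__main__":
    print(split_medication("1 Patch(es), 2x/week, 100 Day(s)"))
```

Returning named fields at the point of splitting avoids having to rename generated columns afterwards, which is the bookkeeping that gets tedious when several splits are chained.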
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761   Unicorn

    If you do that, you just have to be diligent with renaming the attribute columns. I did one once with 4 Split operators and went nuts. :)
