
Breaking an attribute into further attributes

arsalan_karim Member Posts: 14 Contributor II
edited November 2018 in Help

Hi All

I am stuck with something that is giving me nightmares.

I have a medical data set that consists of a list of medications which look similar to the below: 

 

FENTANYL 1 Patch(es), Q3D, 6 Mth30
FENTANYL 1 Patch(es), Q3D, 36 Day(s)
FENTANYL 1 Patch(es), Q2D, 100 Day(s)
FENTANYL 1 Patch(es), Q3D, 9 Day(s)
FENTANYL 1 Patch(es), Q3D, 9 Day(s)
FENTANYL 1 Patch(es), 2x/week, 30 Day(s)
FENTANYL 1 Patch(es), Q3D, 30 Day(s)
FENTANYL 1 Patch(es), 2x/week, 100 Day(s)

 

The second column holds the dose, unit of measure, frequency, and duration all in one string. How can I break this attribute into four separate attributes, as below?

 

Dose = 1
Unit of Measure = Patch
Frequency = Q2D
Duration = 9 Days

 

Thanks

Arsalan

 

Answers

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Have you tried the Split operator, splitting on the comma? If that doesn't work, you could use a RegEx pattern in the Split operator instead. It's a bit more complicated but doable.

  • arsalan_karim Member Posts: 14 Contributor II

    Thanks T-Bone

    Some of the entries in my list have no commas. Is there any way to handle those?

     

    Thanks

    Arsalan

     

     

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    OK, then you'll have to use a RegEx, something like .*\W.*,.*

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Or you could just use two sequential Split operators.  From your examples it looks like dose is separated by a space but the others are done via commas.  So you could first split on comma and then take the first attribute generated (which should contain both dose and unit of measure) and split again on space.  That might be less elegant than the regex method but more robust if you have more variations of delimiters.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    If you do that, you just have to be diligent with renaming the attribute columns. I did one once with 4 Split operators and went nuts. :)
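
For anyone who wants to prototype the parsing logic outside RapidMiner first, here is a rough Python sketch of the two ideas discussed in this thread: splitting twice (comma, then space) and a single regex with optional commas. It is not a RapidMiner process and not the pattern suggested above. The function names, the field names, and the comma-free sample string are assumptions made up for illustration, since the thread does not show what the comma-free entries actually look like.

```python
import re

# One pattern for all four fields. Commas between fields are optional,
# which is an assumption about what the comma-free entries look like.
DOSAGE_PATTERN = re.compile(
    r"^\s*(?P<dose>\d+(?:\.\d+)?)\s+"   # dose: a number such as 1 or 2.5
    r"(?P<unit>[^\s,]+)\s*,?\s*"        # unit: one token such as Patch(es)
    r"(?P<frequency>[^\s,]+)\s*,?\s*"   # frequency: one token such as Q3D or 2x/week
    r"(?P<duration>.+?)\s*$"            # duration: whatever is left, e.g. 30 Day(s)
)


def parse_with_regex(text):
    """Return a dict of the four fields, or None if the text does not match."""
    match = DOSAGE_PATTERN.match(text)
    return match.groupdict() if match else None


def parse_with_two_splits(text):
    """Split on commas first, then split the first piece on the first space.

    Returns None unless the text has exactly three comma-separated pieces.
    """
    parts = [piece.strip() for piece in text.split(",")]
    if len(parts) != 3:
        return None
    dose, _, unit = parts[0].partition(" ")
    return {
        "dose": dose,
        "unit": unit.strip(),
        "frequency": parts[1],
        "duration": parts[2],
    }


if __name__ == "__main__":
    samples = [
        "1 Patch(es), Q3D, 36 Day(s)",      # from the original post
        "1 Patch(es), 2x/week, 100 Day(s)", # from the original post
        "1 Patch(es) Q3D 30 Day(s)",        # hypothetical comma-free entry
    ]
    for sample in samples:
        print(sample, "->", parse_with_regex(sample))
```

The two-split version mirrors the sequential Split operators and gives up on comma-free rows, while the regex version handles both layouts as long as the unit and frequency are each a single token. Neither version cleans the values (for example, turning Patch(es) into Patch), so that would still need a separate replace step.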
