Replace single characters without changing the string values of the attribute

f_laperna Member Posts: 13 Contributor II
edited December 2018 in Help

Hi, I want to replace some values in an attribute of my dataset. In particular, I have some single characters (like "C", "P", "A") and some strings (like "SPAIN", "ITALY", etc.).

I want to modify the value A without changing the string SPAIN. For example, by replacing A with "Other" I always obtain SPOtherIN.

I tried with A, with "A", with (A) but without success. Does anyone know how to achieve that? Thank you!

Best Answer

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted

    Perhaps a more direct solution would be to use the lookaround syntax with regex.  Here's what you want, I think:

    (?<!\w)A(?!\w)

    This will match "A" only when it is neither preceded nor followed by another word character (thus it will skip an "A" inside a word).
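
As a rough illustration outside RapidMiner, the same lookaround pattern can be tried in Python, whose `re` module supports this syntax. This is only a sketch: the sample values and the helper name `replace_standalone_a` are assumptions made for the example, not part of the original process.

```python
import re

# Pattern from the answer above: an "A" with no word character on either side.
STANDALONE_A = re.compile(r"(?<!\w)A(?!\w)")


def replace_standalone_a(value: str, replacement: str = "Other") -> str:
    """Replace only a free-standing "A"; an "A" inside a word is left alone."""
    return STANDALONE_A.sub(replacement, value)


# Hypothetical sample values modeled on the question.
values = ["A", "SPAIN", "ITALY", "C", "P"]
replaced = [replace_standalone_a(v) for v in values]
print(replaced)  # ['Other', 'SPAIN', 'ITALY', 'C', 'P']
```

Because the lookarounds consume no characters, only the "A" itself is replaced, and no surrounding whitespace is needed for the match.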

    Try it and see if that works.

    But kudos to @FBT for a creative solution that would also work, albeit a more complex one.

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts

Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hello @f_laperna - have you tried doing this with a RegEx like \sA\s ?

     

    Scott

  • f_laperna Member Posts: 13 Contributor II
    I tried but the problem is I don’t have a space character before and after A. So it’s not working
  • FBT Member Posts: 106 Unicorn

    If your example is representative of your data, you could do the following:

     

    1. Generate an attribute "length" with the "Generate Attributes" operator, using length() in the function expression.

    2. Duplicate your data with the "Multiply" operator.

    3. Filter both threads that come out of the Multiply output ports. You can use "Filter Examples" and the newly created length attribute. As per your example, one thread would keep length > 1 and the other length <= 1.

    4. Now you can replace the values in the filtered thread with length <= 1, e.g. with some RegEx.

    5. Glue the two threads back together with the "Append" operator.

     

    There is probably a more elegant way of doing the above, but sometimes it helps to break things down into small steps.
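
The five steps above can be sketched in plain Python to show the logic. This is not RapidMiner code: the sample values and variable names are assumptions for illustration, and a plain string replace stands in for the RegEx of step 4.

```python
# Hypothetical sample values modeled on the question.
values = ["A", "SPAIN", "C", "ITALY", "P"]

# Step 1: compute a "length" for every value.
with_length = [(v, len(v)) for v in values]

# Steps 2-3: copy the data and filter each copy on the length.
short_branch = [v for v, n in with_length if n <= 1]
long_branch = [v for v, n in with_length if n > 1]

# Step 4: replace only in the branch holding single characters.
short_replaced = [v.replace("A", "Other") for v in short_branch]

# Step 5: append the two branches back together.
result = long_branch + short_replaced
print(result)  # ['SPAIN', 'ITALY', 'Other', 'C', 'P']
```

Note that appending the branches does not restore the original row order in this sketch.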

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    yes well done @FBT - that will work nicely.  Sorry @f_laperna I did not realize that the content looked like this:

     

    A

    ALIEN

    B

    BROKEN

    C

    CHARLIE

    etc...

     

    so perhaps the easiest way is to use the Map operator and create a lookup table.  That will only make changes for a true string match - not partial like Replace.  Otherwise you can use @FBT 's idea - even in one Generate Attributes operator like this:

     

    att1         if(att1=="A", "foo", att1)

     

    or something like that.  It's pretty much what Map does but you can be more specific.
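
To illustrate why a whole-value match avoids the SPOtherIN problem, here is a small Python sketch of a lookup-table replacement. It only mimics the idea of an exact-match mapping; the table contents and the helper name `map_value` are assumptions for the example, not the Map operator's actual implementation.

```python
# Hypothetical lookup table: whole value -> replacement.
lookup = {"A": "foo"}


def map_value(value: str, table: dict) -> str:
    """Return the mapped value on an exact match, else the value unchanged."""
    return table.get(value, value)


# Sample values from the post above.
values = ["A", "ALIEN", "B", "BROKEN", "C", "CHARLIE"]
mapped = [map_value(v, lookup) for v in values]
print(mapped)  # ['foo', 'ALIEN', 'B', 'BROKEN', 'C', 'CHARLIE']
```

Since the comparison is against the entire value, "ALIEN" is never touched even though it contains an "A".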

     

    Scott

     

  • f_laperna Member Posts: 13 Contributor II

    Yes, it worked, and it's exactly the solution I was looking for. Thank you very much!

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    yes well done @Telcontar120.  I always get tangled up with RegEx lookarounds.  :)

     

    Scott
