Hi, I want to replace some values in an attribute of my dataset. In particular, the attribute contains some single characters (like "C", "P", "A") and some longer strings (like "SPAIN", "ITALY", etc.).
I want to modify the value A without changing the string SPAIN. For example, by replacing A with "Other" I always obtain SPOtherIN.
I tried with A, with "A", and with (A), but without success. Does anyone know how to achieve that? Thank you!
hello @f_laperna - have you tried doing this with a RegEx like \sA\s ?
If your example is representative of your data, you could do the following:
1. Generate an attribute "length" with the "Generate Attributes" operator, using length() in the function expression.
2. Duplicate your data with the "Multiply" operator.
3. Filter both threads that come out of the Multiply output ports. You can use "Filter Examples" and the newly created length attribute. As per your example, one thread would keep length > 1 and the other length <= 1.
4. Now you can replace the values in the filtered thread with length <= 1, e.g. with a RegEx.
5. Glue the two threads back together with the "Append" operator.
There is probably a more elegant way of doing the above, but sometimes it helps to break things down into small steps.
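To make the logic of those steps concrete outside of RapidMiner, here is a minimal Python sketch of the same split, replace, and append idea. The function name and the sample values are hypothetical and only stand in for the attribute in the question; this is not RapidMiner code.

```python
def split_replace_append(values, target="A", replacement="Other"):
    """Mimic the length-based split, replace, and append steps on a plain list."""
    # Steps 1-3: compute each value's length and split into two "threads".
    short = [v for v in values if len(v) <= 1]
    long_values = [v for v in values if len(v) > 1]

    # Step 4: replace only inside the short thread, so "SPAIN" is never touched.
    short = [v.replace(target, replacement) for v in short]

    # Step 5: glue the threads back together (original row order is not kept).
    return long_values + short


# Hypothetical sample data taken from the values mentioned in the question.
result = split_replace_append(["C", "P", "A", "SPAIN", "ITALY"])
print(result)
```

Note that appending puts all long values before all short ones, so the original order is lost unless an id is carried along and sorted on afterwards.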
Perhaps the easiest way is to use the Map operator with a lookup table. It only changes a value on a full string match, not on a partial match like Replace does. Otherwise you can use @FBT's idea, even in a single Generate Attributes operator, like this:
att1 if(att1=="A", "foo", att1)
or something like that. It's pretty much what Map does but you can be more specific.
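The difference between a whole-value match and a partial replace can be sketched in a few lines of Python. This only illustrates the idea; the lookup table and helper name are hypothetical and this is not how the Map operator is implemented.

```python
# Hypothetical lookup table: whole values on the left, replacements on the right.
lookup = {"A": "Other"}


def map_values(values, table):
    """Replace a value only when the entire string is a key in the table."""
    return [table.get(v, v) for v in values]


values = ["C", "P", "A", "SPAIN", "ITALY"]

# Whole-value match: only the standalone "A" changes.
mapped = map_values(values, lookup)

# Partial (substring) replace, shown for contrast: every "A" changes.
naive = [v.replace("A", "Other") for v in values]

print(mapped)
print(naive)
```

The substring version is what produces SPOtherIN, while the lookup leaves SPAIN and ITALY untouched.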
Perhaps a more direct solution would be to use the lookaround syntax with regex. Here's what you want, I think:
This will only match "A" when it is neither preceded nor followed by another word character (thus it will skip an "A" in the middle of a word).
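That description corresponds to a negative lookbehind combined with a negative lookahead. Here is a minimal Python sketch, assuming the pattern `(?<!\w)A(?!\w)`, which is one expression that fits the description and not necessarily the exact one intended. Java-style regex, which RapidMiner uses as far as I know, accepts the same lookaround syntax. The helper name is hypothetical.

```python
import re

# Assumed pattern: "A" with no word character directly before or after it.
STANDALONE_A = re.compile(r"(?<!\w)A(?!\w)")


def replace_standalone(value, replacement="Other"):
    """Replace "A" only where it stands alone, not inside a longer word."""
    return STANDALONE_A.sub(replacement, value)


for sample in ["A", "SPAIN", "ITALY", "A B"]:
    print(sample, "->", replace_standalone(sample))
```

Unlike a pattern that requires whitespace on both sides, the lookarounds consume no characters, so a value that consists of "A" alone is matched as well.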
Try it and see if that works.
But kudos to @FBT for a creative solution that would also work, albeit a more complex one.
yes well done @Telcontar120. I always get tangled up with RegEx lookarounds.