Replacing Miscoded Values/terms

jla Member Posts: 11 Learner I
Hi! My data set should contain only two values: Positive and Negative. However, there are miscoded entries such as Neg, NEG, and Pos. I have tried the "Replace" operator a couple of times, but some correct entries change too. For instance, when I replace Neg with Negative, the entries that already contain the correct word Negative become Negativeative. How can I correct this?

Best Answer

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted

    ^neg.* would be a good expression, especially if you switch on case-insensitive matching.

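    ^ means "beginning of the string". This makes sure that your text starts with "neg". The .* after it is "any sequence of characters". This will match neg, negative, negativeative etc. Because the pattern matches the whole value rather than just the "neg" part, the replacement overwrites the entire entry, so a correct Negative stays Negative instead of becoming Negativeative.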
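
The effect of the anchored pattern can be tried outside RapidMiner. The following is a minimal Python sketch, not the Replace operator itself: the sample values are made up, and the `^pos.*` pattern for the positive class is an assumption added by analogy with `^neg.*`. Python's `re` module is used here only because `^`, `.*`, and case-insensitive matching behave the same way for this simple pattern.

```python
import re

# Hypothetical sample values; the real column is not shown in the thread.
values = ["Negative", "Neg", "NEG", "Positive", "Pos"]

# Unanchored, case-sensitive replacement reproduces the problem from the
# question: the "Neg" inside an already correct "Negative" is replaced too.
naive = [re.sub("Neg", "Negative", v) for v in values]

# Anchored, case-insensitive patterns match the whole value from its start,
# so the replacement overwrites the entire entry.
NEG = re.compile(r"^neg.*", re.IGNORECASE)
POS = re.compile(r"^pos.*", re.IGNORECASE)  # assumed counterpart for "Pos"


def normalize(value: str) -> str:
    """Return the canonical label for a value starting with neg/pos."""
    value = NEG.sub("Negative", value)
    return POS.sub("Positive", value)


cleaned = [normalize(v) for v in values]

print(naive)    # the first entry shows the "Negativeative" problem
print(cleaned)  # every entry is now either "Negative" or "Positive"
```

Note that `^neg.*` also rewrites any other value that happens to start with "neg", so it is only safe when the column is known to hold nothing but variants of the two labels.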

    Replace is good for replacing text that you can express as a regular expression. If you have a list of values, Map can be easier to use without regular expressions.
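
For comparison, the list-of-values approach can be sketched as a plain lookup table. This is a Python analogue of the idea, not the Map operator's actual configuration; the `CORRECTIONS` table and the sample values are hypothetical.

```python
# Hypothetical lookup table: miscoded value -> correct value.
CORRECTIONS = {
    "Neg": "Negative",
    "NEG": "Negative",
    "Pos": "Positive",
}


def map_value(value: str) -> str:
    """Replace a value only when it matches a table entry exactly."""
    # Whole-value lookup: "Negative" is not in the table, so it is left
    # unchanged and can never turn into "Negativeative".
    return CORRECTIONS.get(value, value)


values = ["Negative", "Neg", "NEG", "Positive", "Pos"]
mapped = [map_value(v) for v in values]
print(mapped)
```

The trade-off is that every miscoded variant has to be listed explicitly, whereas the regular expression also catches variants nobody has seen yet.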



  • jla Member Posts: 11 Learner I
    I think I just figured it out, but I still don't know if it's right. I used the "multiple arbitrary characters" and it worked for me, though I'm not sure whether it really is the correct way to do it.
  • jla Member Posts: 11 Learner I
    The second process I used is just the regular expression "negativeative", and it also worked somehow. I compared the results of the two processes and have not found any differences so far. Which method do you think is more appropriate to use?
  • jla Member Posts: 11 Learner I
    Just tried it on my previous processes and it worked! Thank you so much, @BalazsBarany!