Why am I getting "inadmissible input" for regex in "finds" expression?

LearnAW Member Posts: 4 Newbie
I have a Generate Attributes operator with an expression that uses the "finds" function.  It takes the existing nominal attribute Link_prefix, which contains URL strings, and I want to check for the existence of an IP address in the URL.  My expression looks like this:

finds(Link_prefix, "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

i.e. a regex looking for four numbers of 1-3 digits each, separated by periods. However, it gives me the following error message:  Error: Inadmissible input at "".  It has an arrow pointing at the opening quote of the regex expression.  What am I doing wrong?  Why is the opening quote of the regex producing an error?  I've looked at several questions in RapidMiner Community with examples of regex expressions, and they look like they're formatted the same way mine is.

My XML details are as follows.  (Please don't ask for the entire process XML.  It's lengthy, and sensitive.)

- Version:
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">

- Operator:
      <operator activated="true" class="generate_attributes" compatibility="9.0.003" expanded="true" height="82" name="Generate Attributes (4)" width="90" x="581" y="34">
        <list key="function_descriptions">
          <parameter key="IP_in_link" value="finds(Link_prefix, &quot;\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}&quot;)"/>
        </list>
      </operator>

Thanks for any help!!  I'm getting frustrated with repeated roadblocks in simple attribute extraction, before I even get to the modeling.

Best Answer

  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    Solution Accepted
    Try changing your backslashes into double backslashes. The first backslash escapes the second, so the second survives as part of your regular expression; otherwise a single backslash before "d" would simply escape "d" at the string level, making it a "normal" character (which it already is). In other words, use the following function with a regular expression:
    • finds(URL,"\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}")
    Jacob

Answers

  • LearnAW Member Posts: 4 Newbie
    No way! That actually worked!   :)  If I understand this correctly, the first slash is string control, which escapes to make its follower a part of the string itself.  Then once the string is evaluated as a regular expression, the second slash is regular expression control, which escapes to make its follower one of the regular expression control characters...  I think.  Correct me if I'm wrong.  And it feels like a confusing way for RapidMiner to interpret things, which ought to be clearly called out.  But thanks for the solution!
  • jacobcybulski Member, University Professor Posts: 391 Unicorn
    Basically, you need to 'construct' your syntactically correct regular expression using RapidMiner strings, which interpret backslashes in a special way. Because regular expressions also use backslashes, each regex backslash has to be written as a double backslash.
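
The same two-layer behaviour can be reproduced outside RapidMiner. The sketch below uses Python purely as an illustration; it is not RapidMiner's expression parser, and the sample URLs and the helper name contains_ip are made up for the example. It shows that a string written with doubled backslashes reaches the regex engine with single backslashes, which is the pattern the original question intended.

```python
import re

# What gets typed between the quotes: every regex backslash is doubled,
# because the string layer consumes one backslash of each pair.
typed = "\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"

# What the regex engine actually receives. A Python raw string spells the
# same characters without any doubling, so the two values are identical.
received = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"


def contains_ip(url: str) -> bool:
    """Return True if url contains four dot-separated groups of 1-3 digits."""
    return re.search(typed, url) is not None


# Hypothetical values standing in for the Link_prefix attribute.
samples = ["http://192.168.0.1/login", "https://example.com/index.html"]
results = [contains_ip(u) for u in samples]

print(typed)    # the pattern as the regex engine sees it (single backslashes)
print(results)  # one boolean per sample URL
```

Note that this pattern only checks the shape of the address (it would also accept 999.999.999.999), which is the same behaviour as the expression in the accepted answer.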