
"[SOLVED] regex processing bug?"

tennenrishin Member Posts: 177 Contributor II
edited June 2019 in Help
Why would the following process output 'a/b/c' rather than just 'c'?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
   <process expanded="true" height="659" width="1042">
     <operator activated="true" class="set_macro" compatibility="5.2.008" expanded="true" height="76" name="Set Macro" width="90" x="45" y="75">
       <parameter key="macro" value="path"/>
       <parameter key="value" value="a/b/c"/>
     </operator>
     <operator activated="true" class="generate_macro" compatibility="5.2.008" expanded="true" height="76" name="Generate Macro" width="90" x="179" y="75">
       <list key="function_descriptions">
         <parameter key="path_ending" value="replaceAll(&quot;%{path}&quot;,&quot;^.*/(?=[^/]+^)&quot;,&quot;&quot;)"/>
       </list>
     </operator>
     <operator activated="true" class="print_to_console" compatibility="5.2.008" expanded="true" height="76" name="Print to Console" width="90" x="313" y="75">
       <parameter key="log_value" value="OUTPUT: %{path_ending}"/>
     </operator>
     <connect from_op="Set Macro" from_port="through 1" to_op="Generate Macro" to_port="through 1"/>
     <connect from_op="Generate Macro" from_port="through 1" to_op="Print to Console" to_port="through 1"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
   </process>
 </operator>
</process>
Am I missing something obvious or might this be a bug?

Answers

  • haddock Member Posts: 849 Maven
    Hi there,

    I think you are missing two things, '[' and ']', and I don't think this is a bug. Check this out:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
        <process expanded="true" height="383" width="763">
          <operator activated="true" class="set_macro" compatibility="5.2.003" expanded="true" height="76" name="Set Macro" width="90" x="45" y="75">
            <parameter key="macro" value="path"/>
            <parameter key="value" value="a/b/c"/>
          </operator>
          <operator activated="true" class="generate_macro" compatibility="5.2.003" expanded="true" height="76" name="Generate Macro" width="90" x="179" y="75">
            <list key="function_descriptions">
              <parameter key="path_ending" value="replaceAll(&quot;%{path}&quot;,&quot;[/]&quot;,&quot;oops &quot;)"/>
            </list>
          </operator>
          <operator activated="true" class="provide_macro_as_log_value" compatibility="5.2.003" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="313" y="75">
            <parameter key="macro_name" value="path_ending"/>
          </operator>
          <operator activated="true" class="log" compatibility="5.2.003" expanded="true" height="76" name="Log" width="90" x="447" y="75">
            <list key="log">
              <parameter key="m" value="operator.Provide Macro as Log Value.value.macro_value"/>
            </list>
          </operator>
          <connect from_op="Set Macro" from_port="through 1" to_op="Generate Macro" to_port="through 1"/>
          <connect from_op="Generate Macro" from_port="through 1" to_op="Provide Macro as Log Value" to_port="through 1"/>
          <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Log" to_port="through 1"/>
          <connect from_op="Log" from_port="through 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Best wishes,

    H
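For readers without RapidMiner at hand, the replacement in the process above can be approximated outside the tool. This is only a minimal sketch in Python, assuming its `re.sub` treats the character class `[/]` the same way as the `replaceAll` function in Generate Macro; the variable names are illustrative.

```python
import re

path = "a/b/c"  # same value the Set Macro operator assigns to %{path}

# [/] is a character class containing only "/", so every slash is replaced.
path_ending = re.sub(r"[/]", "oops ", path)
print(path_ending)  # aoops boops c
```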
  • tennenrishin Member Posts: 177 Contributor II
    Hi H

    Thank you for your response, but I don't understand. regexpal.com agrees that ^.*/(?=[^/]+^) matches the a/b/ in a/b/c

    So I would expect the output in the OP to be c rather than a/b/c

    Can you explain why you hold that a/b/c is the correct output?

    Thanks
    Isak
  • haddock Member Posts: 849 Maven
    Hi,

    A regex problem rarely has a single solution; different patterns can also work. For your example you can get a/b/c -> c with a regex match of [/ab], a character class that lists the characters to match, like this:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.003">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.2.003" expanded="true" name="Process">
       <process expanded="true" height="383" width="763">
         <operator activated="true" class="set_macro" compatibility="5.2.003" expanded="true" height="76" name="Set Macro" width="90" x="45" y="75">
           <parameter key="macro" value="path"/>
           <parameter key="value" value="a/b/c"/>
         </operator>
         <operator activated="true" class="generate_macro" compatibility="5.2.003" expanded="true" height="76" name="Generate Macro" width="90" x="179" y="75">
           <list key="function_descriptions">
             <parameter key="path_ending" value="replaceAll(&quot;%{path}&quot;,&quot;[/ab]&quot;,&quot;&quot;)"/>
           </list>
         </operator>
         <operator activated="true" class="provide_macro_as_log_value" compatibility="5.2.003" expanded="true" height="76" name="Provide Macro as Log Value" width="90" x="313" y="75">
           <parameter key="macro_name" value="path_ending"/>
         </operator>
         <operator activated="true" class="log" compatibility="5.2.003" expanded="true" height="76" name="Log" width="90" x="447" y="75">
           <list key="log">
             <parameter key="m" value="operator.Provide Macro as Log Value.value.macro_value"/>
           </list>
         </operator>
         <connect from_op="Set Macro" from_port="through 1" to_op="Generate Macro" to_port="through 1"/>
         <connect from_op="Generate Macro" from_port="through 1" to_op="Provide Macro as Log Value" to_port="through 1"/>
         <connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Log" to_port="through 1"/>
         <connect from_op="Log" from_port="through 1" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    cf. http://www.lunametrics.com/blog/2006/10/22/regular-expressions-part-viii-square-brackets-and-dashes/

    Best

    H

    PS I'm a great fan of RegexBuddy, which I think is related to the site you used. It has an explain feature: you put in your regex, and out comes an interpretation. Here's what it said about your first regex:
    regexbuddy says..

    ^.*/(?=[^/]+^)

    Options: ^ and $ match at line breaks

    Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»
    Match any single character that is not a line break character «.*»
      Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
    Match the character “/” literally «/»
    Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=[^/]+^)»
      Match any character that is NOT a “/” «[^/]+»
          Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
      Assert position at the beginning of a line (at beginning of the string or after a line break character) «^»


    Created with RegexBuddy
    And here's what it said about my second regex:
    regexbuddy says..

    [/ab]

    Options: ^ and $ match at line breaks

    Match a single character present in the list “/ab” «[/ab]»


    Created with RegexBuddy
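The character-class reading above can be checked with a short Python sketch. It assumes Python's `re` module handles `[/ab]` the same way as the regex engine behind RapidMiner's `replaceAll`; the helper name `strip_chars` and the second path `x/y/z` are hypothetical and not taken from the thread. The second call shows that a class listing specific characters depends on the input containing exactly those characters.

```python
import re

def strip_chars(path: str) -> str:
    """Delete every '/', 'a' and 'b' from the path (hypothetical helper)."""
    return re.sub(r"[/ab]", "", path)

print(strip_chars("a/b/c"))  # c
print(strip_chars("x/y/z"))  # xyz  (only the slashes are removed)
```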
  • tennenrishin Member Posts: 177 Contributor II
    I am not asking for a regex that matches a/b/. (That's trivial - a/b/ is a regex that matches a/b/.)
    a/b/c was just an example string that I happened to use, but it could just as well have been
    i/know/what/my/regex/does in which case I want a match on i/know/what/my/regex/

    But thanks anyway for the discussion, because typing out these responses revealed a typo in my regex. The regex I intended was
    ^.*/(?=[^/]+$) rather than ^.*/(?=[^/]+^)

    Now it all works.
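To see why the one-character typo matters, here is a minimal Python sketch comparing the two patterns. It assumes Python's `re.sub` behaves like RapidMiner's `replaceAll` for these constructs; `path_ending` is a hypothetical helper named after the macro in the original process. With `^` inside the lookahead, the engine would have to be at the start of the string after `[^/]+` has already consumed at least one character, so the typo pattern can never match and the input comes back unchanged.

```python
import re

TYPO = r"^.*/(?=[^/]+^)"   # lookahead ends in ^ and can never succeed
FIXED = r"^.*/(?=[^/]+$)"  # lookahead ends in $: only non-slashes up to the end

def path_ending(path: str, pattern: str) -> str:
    """Remove whatever the pattern matches from the path (hypothetical helper)."""
    return re.sub(pattern, "", path)

print(path_ending("a/b/c", TYPO))   # a/b/c  (no match, nothing removed)
print(path_ending("a/b/c", FIXED))  # c
print(path_ending("i/know/what/my/regex/does", FIXED))  # does
```

The greedy `.*` first runs to the end of the string and then backtracks to the last `/`, which is the only position where the corrected lookahead holds.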