finds() returns false when text contains line feed

MaxF Member Posts: 5 Contributor II
Hi there,

I've been working a lot with regex and texts lately and have come across an unexpected behaviour of the finds() expression. Whenever the text contains a line feed, finds() without modifiers always returns false, even finds(TextAttribute,"."). With dotall mode, finds(TextAttribute, "(?s)."), it matches any substring of the first line but none of the following lines.

So what apparently happens in the background is that finds(TextAttribute,"(?s)expression") is translated to matches(TextAttribute,".*(?s)expression.*"). The leading .* would then sit before the (?s) modifier, so it could not cross a line feed, which would explain why only the first line is searched. Can anyone reproduce this behaviour and confirm that this is what is happening? If so, I think there is a find() method in Java that could be used to solve this problem. I don't have any experience in Java though, and there might be a reason for not using it.

So far I've simply used matches() with dotall mode instead of finds(), which works fine for me. But if anyone can reproduce this behaviour, it might be useful for others who rely on finds() to know about it.
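
The suspected translation can be checked outside RapidMiner with plain java.util.regex. This is only a minimal sketch: it assumes finds() wraps the pattern in .* on both sides and calls matches(), which is a guess and not confirmed RapidMiner behaviour. The class and method names (FindsVsMatches, wrappedMatches, substringFind) are made up for illustration.

```java
import java.util.regex.Pattern;

class FindsVsMatches {

    // ASSUMPTION: mimics what finds() appears to do, i.e. a full match
    // against ".*" + regex + ".*". Not taken from RapidMiner's source.
    static boolean wrappedMatches(String text, String regex) {
        return Pattern.compile(".*" + regex + ".*").matcher(text).matches();
    }

    // Plain substring search with Matcher.find(); no wrapping needed.
    static boolean substringFind(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line";

        // Without dotall, "." cannot match the line feed, so a full match fails.
        System.out.println(wrappedMatches(text, "."));           // false

        // (?s) only takes effect from its position onwards: the trailing .*
        // may cross the line feed, the leading .* may not.
        System.out.println(wrappedMatches(text, "(?s)first"));   // true
        System.out.println(wrappedMatches(text, "(?s)second"));  // false

        // find() scans for a matching substring and is not affected.
        System.out.println(substringFind(text, "second"));       // true

        // The workaround from above: matches() with (?s) at the very start.
        System.out.println(Pattern.compile("(?s).*second.*")
                .matcher(text).matches());                       // true
    }
}
```

The three wrappedMatches results line up with what finds() returns for the same text, which is consistent with the guess but does not prove it.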

Regards
Max

Declined · Last Updated

No activity or votes since March 2019. Please comment and cc sgenzer if this should be reopened. IC-1270

Comments

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,958  Community Manager
    Hi @MaxF - can you please post an example of what you're describing? It's hard to understand in the abstract.

    Scott

  • MaxF Member Posts: 5 Contributor II
    Hi Scott,
    here is an example process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="246" y="136">
            <parameter key="text" value="first line&#10;second line"/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="82" name="Documents to Data" width="90" x="380" y="136">
            <parameter key="text_attribute" value="Text"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="514" y="136">
            <list key="function_descriptions">
              <parameter key="FindsAnything" value="finds(Text,&quot;.*&quot;)"/>
              <parameter key="FindsWithModifier1" value="finds(Text,&quot;(?s)first&quot;)"/>
              <parameter key="FindsWithModifier2" value="finds(Text,&quot;(?s)second&quot;)"/>
            </list>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Documents to Data" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>



  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,958  Community Manager
    edited March 2019
    hi @MaxF - yep that's a weird one. Pushing to Product Feedback. Here is a more illustrative version of the bug:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="136">
            <parameter key="text" value="two lines first line&#10;two lines second line"/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
          </operator>
          <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document (2)" width="90" x="45" y="238">
            <parameter key="text" value="one line first line"/>
            <parameter key="add label" value="false"/>
            <parameter key="label_type" value="nominal"/>
          </operator>
          <operator activated="true" class="text:documents_to_data" compatibility="8.1.000" expanded="true" height="103" name="Documents to Data" width="90" x="179" y="187">
            <parameter key="text_attribute" value="Text"/>
            <parameter key="add_meta_information" value="true"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.2.000" expanded="true" height="103" name="Multiply" width="90" x="313" y="289"/>
          <operator activated="true" class="text_to_nominal" compatibility="9.2.000" expanded="true" height="82" name="Text to Nominal" width="90" x="447" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Text"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="text"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="text"/>
            <parameter key="block_type" value="value_matrix"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.2.000" expanded="true" height="82" name="Rename" width="90" x="581" y="187">
            <parameter key="old_name" value="Text"/>
            <parameter key="new_name" value="Nominal"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="operator_toolbox:merge" compatibility="1.8.000" expanded="true" height="103" name="Merge Attributes" width="90" x="715" y="289">
            <parameter key="handling_of_duplicate_attributes" value="rename"/>
            <parameter key="handling_of_special_attributes" value="keep_first_special_other_regular"/>
            <parameter key="handling_of_duplicate_annotations" value="rename"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.2.000" expanded="true" height="82" name="Generate Attributes" width="90" x="849" y="289">
            <list key="function_descriptions">
              <parameter key="textFindsAnything" value="finds(Text,&quot;.*&quot;)"/>
              <parameter key="textFindsWithModifier1" value="finds(Text,&quot;(?s)first&quot;)"/>
              <parameter key="textFindsWithModifier2" value="finds(Text,&quot;(?s)second&quot;)"/>
              <parameter key="nominalFindsAnything" value="finds(Nominal,&quot;.*&quot;)"/>
              <parameter key="nominalFindsWithModifier1" value="finds(Nominal,&quot;(?s)first&quot;)"/>
              <parameter key="nominalFindsWithModifier2" value="finds(Nominal,&quot;(?s)second&quot;)"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
          <connect from_op="Create Document (2)" from_port="output" to_op="Documents to Data" to_port="documents 2"/>
          <connect from_op="Documents to Data" from_port="example set" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Text to Nominal" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Merge Attributes" to_port="example set 2"/>
          <connect from_op="Text to Nominal" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Merge Attributes" to_port="example set 1"/>
          <connect from_op="Merge Attributes" from_port="merged set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Just curious - why are you still on RM 8.0?

    Scott
     
  • MaxF Member Posts: 5 Contributor II
    I switch between versions a lot, and it was a coincidence that I used 8.0 to create the example process. I usually only use 8.0 for projects that are in production on 8.0 servers and that I don't want to upgrade right now.