Regular expression in Replace: replacement character 'New Line'

kludikovsky Member Posts: 30 Maven
edited December 2019 in Help

How can I use a 'New Line' character as the replacement string (or character) in the Replace operator?

 

Background:
I need to eliminate empty lines from an attribute.

So, replace multiple consecutive 'New Lines' with one 'New Line'.

My search so far is

(?s)((\n*)(\n))

which locates those multiple empty lines, with a replacement string of '$3'.

 

I could have used also a simple '\n{2,}'.

 

But regardless of what I do, I cannot get a 'New Line' (or any other special character) as a replacement string.

Neither '\n' nor '\013' worked. (Nor did '\t', which I tried for test purposes.)

 

I have already read through http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html, but it says nothing about replacement strings or any limitations on them.
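
One possible explanation for the literal '\n' not working, sketched under an assumption that this thread does not confirm: that the Replace operator passes its "replace by" value unchanged to Java's replaceAll. In a Java replacement string a backslash only makes the next character literal, so the two characters \n come out as a plain n, while $1, $2, ... insert captured groups. The class and method names below are hypothetical.

```java
public class ReplacementEscapes {
    // Collapse runs of two or more LF characters using the given replacement string.
    static String collapse(String text, String replacement) {
        return text.replaceAll("\\n{2,}", replacement);
    }

    public static void main(String[] args) {
        String text = "first\n\n\nsecond";

        // Replacement is the two characters backslash + n:
        // Java treats this as an escaped literal 'n', not as a line feed.
        System.out.println(collapse(text, "\\n").equals("firstnsecond"));

        // Replacement is a real LF character: this collapses the run as intended.
        System.out.println(collapse(text, "\n").equals("first\nsecond"));
    }
}
```

If that assumption holds, it would also explain why a captured group such as '$3' is the practical way to get a line break into the replacement.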

 

Any ideas appreciated.

Best Answer

  • kludikovsky Member Posts: 30 Maven
    Solution Accepted

    Solution:

    New Line <> New Line !

     

    Instead of just the NL ('\n'), '\r\n' has to be used.

    '\n' (the \x0A or LF character) is what a 'Create Document' - 'Documents to Data' sequence actually produces, while in my data imported from disk the line end was the \r\n sequence.

     

    So '((\r\n)(\r\n)*)' will find those multiple New Lines, which can then be replaced by $2.
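
The same idea can be sketched in plain Java so that it works whichever line ending the data uses. This is only an illustration: the class name and the pattern are my own, not taken from the operator. The first line break of each run is captured and put back through $1, so the original line-ending style is preserved.

```java
public class CollapseLineBreaks {
    // Capture the first line break (CRLF, lone CR, or LF) and swallow any that follow.
    // "\r\n" is listed first so a CRLF pair is always matched as one unit.
    private static final String RUN = "(\\r\\n|\\r|\\n)(?:\\r\\n|\\r|\\n)*";

    static String collapse(String text) {
        // $1 re-inserts the captured line break, so no escape is needed in the replacement.
        return text.replaceAll(RUN, "$1");
    }

    public static void main(String[] args) {
        // Windows-style data, e.g. imported from disk.
        System.out.println(collapse("one\r\n\r\n\r\ntwo").equals("one\r\ntwo"));
        // LF-only data, e.g. typed into Create Document.
        System.out.println(collapse("one\n\n\ntwo").equals("one\ntwo"));
    }
}
```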

     

     

     


Answers

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Hello @kludikovsky - yes, I have encountered this many times before. To my knowledge there is no good replace function that replaces regex with more regex. What I do is a workaround using Encode URLs / Decode URLs:

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="7.5.000" expanded="true" height="68" name="Create Document" width="90" x="45" y="34">
    <parameter key="text" value="This is the first line&#10;&#10;This is the third line&#10;&#10;&#10;This is the sixth line&#10;&#10;&#10;&#10;This is the tenth line"/>
    </operator>
    <operator activated="true" class="text:documents_to_data" compatibility="7.5.000" expanded="true" height="82" name="Documents to Data" width="90" x="179" y="34">
    <parameter key="text_attribute" value="text"/>
    <parameter key="add_meta_information" value="false"/>
    </operator>
    <operator activated="true" class="web:encode_urls" compatibility="7.3.000" expanded="true" height="82" name="Encode URLs" width="90" x="313" y="34">
    <parameter key="url_attribute" value="text"/>
    </operator>
    <operator activated="true" class="replace" compatibility="7.6.001" expanded="true" height="82" name="Replace" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="text"/>
    <parameter key="replace_what" value="([%]0A)+"/>
    <parameter key="replace_by" value="%0A"/>
    </operator>
    <operator activated="true" class="web:decode_urls" compatibility="7.3.000" expanded="true" height="82" name="Decode URLs" width="90" x="581" y="34">
    <parameter key="url_attribute" value="text"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
    <connect from_op="Documents to Data" from_port="example set" to_op="Encode URLs" to_port="example set input"/>
    <connect from_op="Encode URLs" from_port="example set output" to_op="Replace" to_port="example set input"/>
    <connect from_op="Replace" from_port="example set output" to_op="Decode URLs" to_port="example set input"/>
    <connect from_op="Decode URLs" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Scott
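
For readers without RapidMiner at hand, the encode / replace / decode steps of the process above can be approximated in plain Java. This is a rough sketch only: it assumes the Encode URLs and Decode URLs operators behave like java.net.URLEncoder and URLDecoder, which the thread does not state, and the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UrlEncodeWorkaround {
    static String collapse(String text) {
        try {
            // After encoding, every LF is the plain three-character text "%0A".
            String encoded = URLEncoder.encode(text, "UTF-8");
            // Same pattern idea as the Replace operator above: a run of %0A becomes one.
            String collapsed = encoded.replaceAll("(%0A)+", "%0A");
            return URLDecoder.decode(collapsed, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = "This is the first line\n\nThis is the third line";
        System.out.println(collapse(text).equals("This is the first line\nThis is the third line"));
    }
}
```

Note that data with \r\n line ends encodes to %0D%0A, so the pattern would have to be (%0D%0A)+ in that case.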

  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    huh!  Well glad you found a solution as well!  :) 

     

    Scott

     
