Tokenize last paragraph

hbuggled Member Posts: 5 Contributor I
edited December 2018 in Help

Hi everyone,

I want to tokenize only the last paragraph of each document in my text collection. So far I have used the Tokenize operator with the regular expression \n, but that splits every paragraph, and I only need the last one. Is there another expression that tokenizes just the last paragraph, or can I filter out all paragraphs except the last one? Do you have an idea?

 

Thank you for your help in advance.

Answers

  • kayman Member Posts: 662 Unicorn

    Use a regular expression replacement with the following pattern:

     

    (?s).*\n{2,}(.*)

     

    and replace with

     

    $1

     

    What this does is as follows: (?s) makes the dot also match line breaks, so the pattern can span multiple lines. The first .* is greedy and takes everything up to the last place where there are 2 or more consecutive linefeeds (\n{2,}). Whatever comes after that is captured in group 1 by (.*). The replacement value $1 then keeps only that captured text, which is the last paragraph.
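To try the pattern outside RapidMiner, here is a minimal sketch in Python. It is not part of the original answer: the helper name last_paragraph is hypothetical, Python writes the group reference as \1 where the answer uses $1, and stripping trailing newlines first is an added assumption so that a document ending in blank lines does not return an empty string.

```python
import re

# Pattern from the answer: (?s) lets "." match newlines, the greedy ".*"
# runs up to the last run of two or more newlines, and group 1 captures
# whatever follows that run.
LAST_PARAGRAPH = re.compile(r"(?s).*\n{2,}(.*)")


def last_paragraph(text: str) -> str:
    """Return the text after the last blank-line break.

    If the text contains no blank-line break, the pattern does not match
    and the text is returned unchanged (apart from trailing newlines).
    """
    # Assumption: trailing newlines are dropped first, otherwise the
    # captured group would be empty for texts that end in blank lines.
    return LAST_PARAGRAPH.sub(r"\1", text.rstrip("\n"))


if __name__ == "__main__":
    sample = "First paragraph.\n\nSecond paragraph.\n\nLast paragraph."
    print(last_paragraph(sample))
```

The pattern only looks for \n, so documents with Windows line endings (\r\n) would probably need to be normalised before it can match.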
