
Tokenize last paragraph

hbuggled Member Posts: 5 Contributor I
edited December 2018 in Help

Hi everyone,

I want to tokenize the last paragraph of each text in my collection. I used the Tokenize operator with the regular expression \n, but I only need the last paragraph. Is there another expression that tokenizes only the last paragraph, or can I filter out all paragraphs except the last one? Do you have an idea?

Thank you in advance for your help.

Answers

  • kayman Member Posts: 662 Unicorn

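    Use a regular expression as follows:

    (?s).*\n{2,}(.*)

    and replace the match with

    $1

    What this does: (?s) switches on "dot matches all" mode, so that . also matches line breaks. The greedy .* then takes everything up to the last place where two or more line feeds appear in a row (\n{2,}), and (.*) stores whatever follows in capture group 1. The replacement $1 swaps the whole text for that group, so only the last paragraph is left. A text without a blank line does not match at all and stays unchanged. If your paragraphs are separated by single line breaks (as your \n tokenization suggests), use \n+ instead of \n{2,}.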
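    RapidMiner is written in Java, so the pattern can be tried outside RapidMiner with java.util.regex, assuming the operator you use follows Java's regex syntax. The sketch below is not a RapidMiner operator; the class and method names are hypothetical. It applies the same pattern and the $1 replacement to a single string. Two preprocessing steps are assumptions added for robustness: Windows line endings (\r\n) are normalized, because "\r\n\r\n" does not contain two consecutive \n characters, and trailing line breaks are stripped, because otherwise the pattern would treat the empty text after them as the "last paragraph".

```java
import java.util.regex.Pattern;

public class LastParagraph {
    // The pattern from the answer, as a Java string literal (each backslash doubled).
    private static final Pattern LAST_PARAGRAPH = Pattern.compile("(?s).*\\n{2,}(.*)");

    /** Hypothetical helper: returns the text after the last run of two or more line feeds. */
    static String lastParagraph(String text) {
        // Assumption: normalize Windows line endings, otherwise \n{2,} never matches "\r\n\r\n".
        // Trailing whitespace is removed so it is not captured as an empty final paragraph.
        String normalized = text.replace("\r\n", "\n").stripTrailing();
        // Replace the whole match with capture group 1; text without a blank line is returned unchanged.
        return LAST_PARAGRAPH.matcher(normalized).replaceAll("$1");
    }

    public static void main(String[] args) {
        String doc = "First paragraph.\n\nSecond paragraph.\n\nLast paragraph,\nstill the same one.\n";
        System.out.println(lastParagraph(doc));
    }
}
```

    Running main prints "Last paragraph," and "still the same one." on two lines: the single line break inside the last paragraph is kept, because only a run of two or more line feeds counts as a separator.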
