Tokenize last paragraph

hbuggled


Hi everyone,

I want to tokenize my text collection so that I only get the last paragraph of each text. I used the Tokenize operator with the regular expression \n, but that splits the text into every paragraph, and I only need the last one. Is there another expression that tokenizes just the last paragraph, or can I filter out all paragraphs except the last one? Do you have any ideas?


Thank you for your help in advance.


Re: Tokenize last paragraph

Use a regular expression as follows:

(?s).*\n{2,}(.*)

and replace with

$1

What this does: (?s) makes . match line breaks too, so the pattern is not stopped at the end of a line. The greedy .* then takes everything up to the last place where two or more line feeds (\n{2,}) occur, and whatever comes after that is captured in a group by (.*). The replacement $1 puts back only that captured group, i.e. the last paragraph.
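If you want to check the pattern outside RapidMiner, here is a minimal Java sketch. It assumes RapidMiner's regex handling follows java.util.regex syntax, that paragraphs are separated by blank lines, and that the text uses Unix \n line endings. The class name LastParagraph and both method names are made up for illustration. The trailing-whitespace trim and the split-based variant (which follows the "filter all paragraphs except the last one" idea from the question) are my additions, not part of the answer above.

```java
public class LastParagraph {

    // The pattern from the answer: (?s) lets '.' match line breaks, the greedy '.*'
    // backtracks to the last run of two or more '\n', and (.*) captures the rest.
    private static final String LAST_PARAGRAPH_REGEX = "(?s).*\\n{2,}(.*)";

    /** Keeps only the text after the last run of two or more newline characters. */
    public static String lastParagraph(String text) {
        // Addition (not in the original answer): drop trailing whitespace so that a text
        // ending in a blank line does not produce an empty "last paragraph".
        String trimmed = text.replaceAll("\\s+$", "");
        // If the text contains no blank line, nothing matches and it is returned unchanged.
        return trimmed.replaceAll(LAST_PARAGRAPH_REGEX, "$1");
    }

    /** Alternative: split into paragraphs on blank lines, then keep only the last one. */
    public static String lastParagraphBySplit(String text) {
        // String.split drops trailing empty strings, so trailing blank lines are ignored.
        String[] paragraphs = text.split("\\n{2,}");
        return paragraphs.length == 0 ? "" : paragraphs[paragraphs.length - 1];
    }

    public static void main(String[] args) {
        String doc = "First paragraph.\n\nSecond paragraph.\n\nLast paragraph,\nwith two lines.";
        System.out.println(lastParagraph(doc));
        System.out.println(lastParagraphBySplit(doc));
    }
}
```

Both methods print the same thing here ("Last paragraph," followed by "with two lines."), since the single \n inside the last paragraph is not a paragraph break. Note that with Windows line endings (\r\n) the \n{2,} part will not match, because a \r sits between the two \n characters; something like (?:\r?\n){2,} would be needed there.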