"whitespace regular expression"

student24 Member Posts: 7 Contributor II
edited June 2019 in Help
Hello everybody,

I want to search for words in documents. I use the operator Filter Tokens (by Content) with a regular expression. To search for more than one word, I use word1|word2|...|wordn. My question is: how can I search for an expression that contains whitespace, for example "Research and Development|Word2|Word3"? Is there a wildcard for whitespace?

Thanks for your help

Answers

    RalfKlinkenberg Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member, Unconfirmed, University Professor Posts: 68 RM Founder
    You can use
    • [tt]\s[/tt] as a placeholder for a single whitespace character,
    • [tt]\s+[/tt] for one or more whitespace characters, and
    • [tt]\s*[/tt] for zero or more whitespace characters.
    • [tt]\t[/tt] matches a tab character.
    • [tt].[/tt] stands for an arbitrary character (by default, any character except a line terminator).
    RapidMiner regular expressions use the Java syntax for regular expressions. If you search for "[tt]Java regular expressions[/tt]" with Google or another search engine, you will find a lot of documentation.

    Example: "[tt]Research\sand\sDevelopment[/tt]" for "Research and Development".
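
    To see how such an expression behaves, here is a minimal sketch in plain Java, since the same regex syntax applies. The class name, method names, and sample strings are made up for illustration and are not part of RapidMiner. Note that in Java source code each backslash has to be doubled ([tt]\\s[/tt]), whereas in an operator's parameter field you would type a single backslash.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of RapidMiner.
class WhitespaceRegexDemo {
    // Alternation of a multi-word phrase and two single words.
    // \\s+ allows one or more whitespace characters between the words.
    static final Pattern PHRASES =
            Pattern.compile("Research\\s+and\\s+Development|Word2|Word3");

    // True if the entire text matches one of the alternatives.
    static boolean matchesWhole(String text) {
        return PHRASES.matcher(text).matches();
    }

    // True if one of the alternatives occurs anywhere in the text.
    static boolean containsPhrase(String text) {
        return PHRASES.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("Research and Development"));    // true
        System.out.println(matchesWhole("Research  and\tDevelopment"));  // true
        System.out.println(matchesWhole("ResearchandDevelopment"));      // false
        System.out.println(containsPhrase("Our Research and Development team")); // true
    }
}
```

    Using [tt]\s+[/tt] instead of [tt]\s[/tt] is a design choice in this sketch: it also accepts double spaces or tabs between the words.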

    Best wishes,
    Ralf