"Extract Information Regular Expression query type failed (Text Processing)"

CharlieFirpo Member Posts: 48 Contributor II
edited June 2019 in Help
Dear All!

I have a simple process: Create Document + Extract Information. I create a simple text, "string1 string2 string3 string4", and use a simple regular expression, ^\S*, to extract the first string from my document. RapidMiner gives the following error: Process Failed. No group 1.
If I use the String Matching query type instead of Regular Expression and set string2 and string4 as the query expression, I get string3 as the result. So String Matching works well, but Regular Expression does not.
Can anybody check why this query type does not work? Or did I make a mistake, and if so, what?

If I use the Regular Region query type and set e.g. ^\S* and .* as the region delimiters, RapidMiner gives the correct result: string1 string2 string3 string4.
Only the plain Regular Expression query type does not work.
Of course, if I use Regular Region with ^\S* and '\ ' as the two delimiters, I get the result I want: string1
But why does the Regular Expression query type not work?

Thank you for reading it and trying to help me!

Answers

  • CharlieFirpo Member Posts: 48 Contributor II
    My second question: if I put a Remove Document Parts operator after the Extract Information operator and remove the first string plus the space after it, I get the following document: 'string2 string3 string4'. Then I use another Extract Information operator to extract the tail, i.e. the rest of the original document (the original was 'string1 string2 string3 string4'). So I want to extract first the first string (string1) and then the tail (string2 string3 string4).
    But the second Extract Information operator works on the whole original document, even though I checked that its input document is 'string2 string3 string4'.
    So why does the second Extract Information operator extract not this but the original 'string1 string2 string3 string4'?

    Thank you!
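A possible way to get both pieces without chaining a second operator is to use one query per piece, each with exactly one capturing group. The sketch below uses Python's `re` module as a stand-in for RapidMiner's regex engine. The two patterns and the `extract` helper are hypothetical, and the assumption that the operator returns group 1 of each query is inferred from the "No group 1" error, not confirmed. It does not explain why the second operator sees the original document.

```python
import re

text = "string1 string2 string3 string4"

# Hypothetical queries, each with exactly one capturing group.
FIRST_QUERY = r"^(\S+)"        # group 1 = first whitespace-delimited token
REST_QUERY = r"^\S+\s+(.*)$"   # group 1 = everything after the first token


def extract(pattern, document):
    """Return capturing group 1 of the first match, or None if nothing matches.

    Assumption: this mirrors an extractor that reports group 1 of each query.
    """
    match = re.search(pattern, document)
    return match.group(1) if match else None


first = extract(FIRST_QUERY, text)
rest = extract(REST_QUERY, text)
print(first)
print(rest)
```

Both queries run against the same original document, so no intermediate Remove Document Parts step is needed in this sketch.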
  • CharlieFirpo Member Posts: 48 Contributor II
    I found the solution:
    You have to use brackets when using the Regular Expression query type in the Extract Information operator.
    So e.g.:
    wrong: ^\S*
    good: (^\S*)

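    These brackets do not change what the regular expression matches. They only define capturing group 1, which, judging by the "No group 1" error, is what the operator reads.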

    Nice day!
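The "No group 1" error is consistent with the operator asking the regex engine for capturing group 1 instead of the whole match. Here is a minimal sketch of that behaviour, assuming group-1 lookup and using Python's `re` module in place of RapidMiner's engine (the `first_group` helper is hypothetical, and Python's error wording differs):

```python
import re

text = "string1 string2 string3 string4"


def first_group(pattern, document):
    """Return capturing group 1 of the first match.

    Assumption: the extractor reads group 1, not the whole match (group 0).
    Raises IndexError when the pattern defines no capturing group.
    """
    match = re.search(pattern, document)
    if match is None:
        return None
    return match.group(1)


# Without brackets the pattern matches, but group 1 does not exist.
try:
    first_group(r"^\S*", text)
    no_group_error = False
except IndexError:
    no_group_error = True

# With brackets the same match is exposed as group 1.
result = first_group(r"(^\S*)", text)

# The whole match (group 0) is identical in both cases.
whole_match = re.search(r"^\S*", text).group(0)

print(no_group_error, result, whole_match)
```

In this sketch the brackets change only which groups are available, not which text is matched.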
  • cissy0201 Member Posts: 1 Contributor I
    I had the same problem and also got it to work by adding brackets. Thank you Charlie for the solution!

    Does anyone know why this 'bracket solution' is required in RapidMiner? As far as I know, a regular expression does not need brackets unless it uses capturing groups.

    Thank you