Text Tokenization Using Regular Expression For Text Mining

edited November 2019 in Help
I have a problem and I need your help, please.
I want to tokenize an unstructured document using regular expressions. I have a text file where each row contains a sentence such as:

1. String1 String2 String3 String4 String5
2. String6      -      String7    -           -
n. String8    -        String9 String10   -               (a dash marks a position where the string doesn't exist, here the second and fifth.)

What I want is for the tokenization to extract each word and output the results as a table in Excel format, such as:

      S1        S2        S3        S4        S5
1.    String1   String2   String3   String4   String5
2.    String6   -         String7   -         -
n.    String8   -         String9   String10  -

Which operators and which regular expression structure can I use in RapidMiner?
Thank you for your help in advance.


    MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    If your original document already contains the dashes, you can simply read it with the Read CSV operator and specify all blanks (space, tab, etc.) as the column separator.
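
To illustrate the idea outside RapidMiner, here is a minimal Python sketch of the same approach: split each row on runs of whitespace with the regular expression `\s+` and write the result as CSV, which Excel can open. The sample text mirrors the rows in the question; the function names `tokenize_rows` and `to_csv` and the header labels are assumptions made for this example, not part of any RapidMiner API.

```python
import csv
import io
import re

# Sample rows copied from the question; dashes mark missing strings.
SAMPLE = """\
1. String1 String2 String3 String4 String5
2. String6      -      String7    -           -
n. String8    -        String9 String10   -
"""


def tokenize_rows(text):
    """Split every non-empty line into tokens on runs of whitespace."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # \s+ matches one or more spaces, tabs, etc., so the uneven
        # spacing between columns collapses into a single separator.
        rows.append(re.split(r"\s+", line))
    return rows


def to_csv(rows, header):
    """Render the header and rows as CSV text (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


rows = tokenize_rows(SAMPLE)
print(to_csv(rows, ["Row", "S1", "S2", "S3", "S4", "S5"]))
```

This only works because the dashes keep every row at the same number of tokens. If the missing values were simply blank, splitting on whitespace could not tell which column was empty.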

    Best regards,
