"Split Text"

fmueller Member Posts: 4 Contributor I
edited May 23 in Help
Hi guys,

I have several text files with different numbers of words, and I want to split each of them into text files with a maximum of 500 words.

How can I segment these text files in RapidMiner? I tried the SplitSegmenter operator, but I have no idea how to set the regular expression.

Can anybody help me?



  • Ryujakk Member Posts: 17  Maven

    I'm not really sure what you want...

    BUT! You can always try this regex:
    It searches for any number of non-whitespace characters followed by a whitespace character, with this pattern repeated 500 times. It works on the site http://www.regexplanet.com/simple/index.html at least (credits to Sebastian for the URL  ;) )!

    - R
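A pattern of the kind described in the reply above could look roughly like the following. This is a minimal sketch in Python, not the regex from the post: the pattern string and the function name `split_by_word_count` are assumptions made for illustration. It uses `\S+` (one or more non-whitespace characters) and a `{1,n}` repeat instead of exactly `n`, so that a shorter final remainder is still matched.

```python
import re


def split_by_word_count(text, n=500):
    """Split text into chunks of at most n whitespace-separated words.

    Assumed pattern: a run of non-whitespace characters followed by
    whitespace (or the end of the text), repeated between 1 and n times.
    """
    pattern = re.compile(r"(?:\S+(?:\s+|$)){1,%d}" % n)
    # Each match is one chunk; strip the trailing whitespace the pattern consumed.
    return [match.group(0).strip() for match in pattern.finditer(text)]


if __name__ == "__main__":
    for chunk in split_by_word_count("one two three four five", n=2):
        print(repr(chunk))
```

The `\S`, `\s` and `{m,n}` constructs are also part of Java's regular expression syntax, so a similar pattern may be usable in a RapidMiner operator parameter, though how the operator applies it is not covered here.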
  • fmueller Member Posts: 4 Contributor I
    Thanks for your answer...

    So I will clarify my problem a little bit. For example:
    Text1.txt (Total: 110 words)
    Text2.txt (Total: 410 words)
    Text3.txt (Total: 50 words)

    I need the text files split into 50-word blocks. The result should be:
    Text1.txt -> 3 segment files: Text1_Seg1.txt (50 words), Text1_Seg2.txt (50 words), Text1_Seg3.txt (10 words) = Total 110 words

    Can I do this with the SplitSegmenter or TextSegmenter operator (Text Mining plugin)?

    Thanks for your answers
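The segmentation described in the post above can be sketched outside RapidMiner as follows. This is a plain Python illustration under stated assumptions, not a RapidMiner process: the function names `split_into_segments`, `segment_filename` and `write_segments` are hypothetical, words are taken to be whitespace-separated tokens, and the `_SegN` naming follows the example in the post.

```python
from pathlib import Path


def split_into_segments(text, max_words=50):
    """Return a list of strings, each holding at most max_words words."""
    words = text.split()  # assumption: a "word" is any whitespace-separated token
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def segment_filename(original_name, index):
    """Build a name such as Text1_Seg1.txt from Text1.txt and index 1."""
    path = Path(original_name)
    return f"{path.stem}_Seg{index}{path.suffix}"


def write_segments(source_path, out_dir, max_words=50):
    """Write each segment of source_path to its own file inside out_dir."""
    source = Path(source_path)
    target_dir = Path(out_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    text = source.read_text(encoding="utf-8")
    written = []
    for index, segment in enumerate(split_into_segments(text, max_words), start=1):
        target = target_dir / segment_filename(source.name, index)
        target.write_text(segment, encoding="utf-8")
        written.append(target)
    return written
```

With a 110-word input and `max_words=50`, `split_into_segments` yields three segments of 50, 50 and 10 words, matching the Text1.txt example.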

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,527   Unicorn
    Did you try the regular expression above? I don't know why it shouldn't work...
