"Split Text"

fmueller Member Posts: 4 Contributor I
edited May 23 in Help
Hi guys

I have several text files with different numbers of words, and I want to split them into text files with at most 500 words each.

How can I segment these text files in RapidMiner? I tried the SplitSegmenter operator, but I have no idea how to set the regular expression.

Can anybody help me?

Regards
Florian

Answers

  • Ryujakk Member Posts: 17  Maven
    Hi,

    I'm not really sure what you want...

    BUT! You can always try this regex:
    ([^\s]+\s){500}
    It matches one or more non-whitespace characters followed by a single whitespace character, with that pattern repeated exactly 500 times. It works on the site http://www.regexplanet.com/simple/index.html at least (credits to Sebastian for the URL  ;) ) !

    - R
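To see what the regex above actually matches, here is a minimal sketch in Python rather than RapidMiner. It assumes Python's `re` module treats this simple pattern the same way as the Java regex engine RapidMiner uses, and it uses a block size of 3 instead of 500 to keep the sample short. The helper name `full_blocks` is made up for this illustration.

```python
import re

def full_blocks(text: str, n: int) -> list[str]:
    """Return every non-overlapping run of exactly n words,
    where each word is followed by one whitespace character."""
    # Same shape as the regex from the post, with n in place of 500.
    pattern = re.compile(r"([^\s]+\s){%d}" % n)
    return [m.group(0) for m in pattern.finditer(text)]

sample = "one two three four five six seven "
blocks = full_blocks(sample, 3)
print(blocks)  # ['one two three ', 'four five six ']
```

Note that in this sketch the trailing word "seven" is not returned, because fewer than 3 words remain after the second block. Whether a leftover block like that matters depends on how the operator uses the expression.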
  • fmueller Member Posts: 4 Contributor I
    Thanks for your answer...

    Let me clarify my problem a little. For example:
    Text1.txt (Total: 110 words)
    Text2.txt (Total: 410 words)
    Text3.txt (Total: 50 words)

    I need the text files in 50-word blocks. The result should be:
    Text1.txt -> 3 segment files: Text1_Seg1.txt (50 words), Text1_Seg2.txt (50 words), Text1_Seg3.txt (10 words) = Total 110 words
    ......

    Can I do this with the SplitSegmenter or TextSegmenter operator (Text Mining plugin)?

    Thanks for your answers
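The desired result described above can be sketched outside RapidMiner in plain Python. This is not how SplitSegmenter or TextSegmenter work internally; it only illustrates the target output under the assumption that words are separated by whitespace. The function name `split_into_segments` and the generated sample text are hypothetical.

```python
from pathlib import Path

def split_into_segments(filename: str, text: str, block_size: int = 50) -> dict[str, str]:
    """Map segment file names (e.g. Text1_Seg1.txt) to blocks of at most
    block_size whitespace-separated words."""
    words = text.split()
    stem, suffix = Path(filename).stem, Path(filename).suffix
    segments = {}
    for start in range(0, len(words), block_size):
        seg_name = f"{stem}_Seg{start // block_size + 1}{suffix}"
        segments[seg_name] = " ".join(words[start:start + block_size])
    return segments

# Made-up 110-word text standing in for Text1.txt.
demo = split_into_segments("Text1.txt", " ".join(f"w{k}" for k in range(110)))
sizes = {name: len(body.split()) for name, body in demo.items()}
print(sizes)  # {'Text1_Seg1.txt': 50, 'Text1_Seg2.txt': 50, 'Text1_Seg3.txt': 10}
```

The last segment simply holds whatever words remain, which matches the 50 + 50 + 10 split in the example. Writing each entry to disk would be one extra loop, for instance with `Path(name).write_text(body)`.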


  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,527   Unicorn
    Hi,
    Did you try the regular expression above? I don't see why it shouldn't work...

    Greetings,
      Sebastian