
How to split a text into several pieces?

fangkuoyu Member Posts: 11 Contributor II
I want to split a text into several pieces for retrieval-augmented generation with the Generative Models Extension.
I have checked the documentation at https://docs.rapidminer.com/latest/studio/generative-ai/#retrieval-augmented-generation
but I don't know how to reproduce the process. Can someone share an example process? I have also tried the Text Processing extension with "Create Document" and "Window Document", but "Window Document" returns "no elements in this collection". Any help? Thanks.

Regards
Frank

Answers

  • rjones13 Member Posts: 168 Unicorn
    Hi @fangkuoyu,

    I'd recommend looking into the Text Analysis course on the RapidMiner Academy, as it gives a nice overview of how you can load and manipulate text data.

    To split up text, I would generally start with the Tokenize operator inside a Process Documents operator. This splits each document on some form of regular pattern, which for me usually ends up being whitespace. Before that, make sure you set the column data type to Text and use a Data to Documents operator.
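    To illustrate the idea outside of RapidMiner, here is a minimal Python sketch of whitespace tokenization followed by grouping the tokens into overlapping windows, which is one common way to prepare pieces for retrieval-augmented generation. It is not the RapidMiner process itself; the function name and the `window_size` and `overlap` parameters are made up for this example and do not correspond to operator settings.

```python
def split_into_chunks(text, window_size=50, overlap=10):
    """Split text into overlapping chunks of whitespace-separated tokens.

    window_size and overlap are counted in tokens. Both names are
    hypothetical and chosen only for this illustration.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be >= 0 and smaller than window_size")

    tokens = text.split()          # tokenize on any run of whitespace
    step = window_size - overlap   # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + window_size]))
        # Stop once the current window already reaches the end of the text.
        if start + window_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    sample = "a b c d e f g h i j"
    for piece in split_into_chunks(sample, window_size=4, overlap=1):
        print(piece)
```

    The overlap keeps a little shared context between neighbouring pieces, so a sentence that straddles a boundary is not lost to retrieval. Setting `overlap=0` gives plain non-overlapping pieces.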

    Hope this makes sense.

    Best,
    Roland