I'm trying to read PDF files in RapidMiner with the "Read Document" operator and then use the "Replace Token" operator to delete all line breaks. I replace "\n" with " ", but when I copy the resulting text, all the line breaks are still in place. Weirdly, when I use the "Create Document" operator and manually paste the text into it, everything seems to work fine. Does anyone in the community know why this happens, or whether there is a way to automatically read the PDF files and pass their text into a "Create Document" operator? Thanks a lot in advance!

    jwpfau
    Hi tobow,

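    You could do this with Documents to Data → Replace → Extract Document, using \R as the regular expression in Replace and a single space as the replacement. \R matches any line break (\n, \r\n, \r and so on), not just \n.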

    Sadly, as far as I know, the Replace Token operator can only replace text inside the individual tokens, so it never touches the line breaks that sit between them.
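
    To illustrate the difference outside RapidMiner, here is a rough Python sketch. It is not RapidMiner code: the function names are made up for this example, and it assumes the text has been split into tokens on whitespace so that the line breaks end up between tokens. Python's re module does not support \R, so the pattern spells out an approximation of it.

```python
import re

# Approximation of Java's \R (any line break). Python's re has no \R,
# so the alternatives are written out; \r\n comes first so it counts once.
LINE_BREAK = re.compile(r"\r\n|[\n\r\u000B\u000C\u0085\u2028\u2029]")


def tokenize(text):
    # Assumption: whitespace tokenization. str.split() with no arguments
    # treats line breaks as separators, so no token contains one.
    return text.split()


def replace_in_tokens(tokens, pattern, replacement):
    # Hypothetical stand-in for a per-token replace: each token is
    # processed in isolation, so text between tokens is never seen.
    return [pattern.sub(replacement, token) for token in tokens]


def replace_in_text(text, pattern, replacement):
    # Hypothetical stand-in for replacing on the whole text at once.
    return pattern.sub(replacement, text)


raw = "first line\r\nsecond line\nthird line"

tokens = tokenize(raw)
per_token = replace_in_tokens(tokens, LINE_BREAK, " ")
flattened = replace_in_text(raw, LINE_BREAK, " ")

print(per_token == tokens)  # the per-token replace found nothing to change
print(repr(flattened))      # the whole-text replace removed every line break
```

    If the real tokens do contain line breaks, the per-token replace would work after all, so treat this only as a picture of the idea.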

