PDF encoding issue
I was trying to do the simplest thing one can do: read a PDF file into RM. I have done this several times before, but now I am stuck with what I suspect is an encoding issue.
After the "Read Document" operator (with "extract text only" and "use file extension as type" both checked) I inserted a breakpoint, before any preprocessing of the text. However, I don't get any readable text out of my PDF; what I get instead is something like:
Does anyone have an idea where the problem is? My guess is that it is an encoding issue.
If I open the PDF file and copy and paste the text into a Word file, there is no problem and the text is displayed correctly.
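To put a number on how broken the extracted text is, the output at the breakpoint could be copied out of RM and checked with a small script. This is only a minimal sketch in Python, using the standard library alone; the function name `garbled_ratio` and the heuristic itself are my own assumptions and are not part of RapidMiner. It counts the share of non-whitespace characters that are control characters, private-use or unassigned code points, or the Unicode replacement character, all of which tend to appear when text extraction goes wrong.

```python
import unicodedata

# Unicode general categories that rarely appear in normal readable text:
# Cc = control, Co = private use, Cn = unassigned, Cs = surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def garbled_ratio(text: str) -> float:
    """Return the share of non-whitespace characters that look like extraction debris.

    Hypothetical helper: a rough heuristic, not a definitive encoding test.
    """
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    suspicious = sum(
        1
        for c in chars
        if c == "\ufffd" or unicodedata.category(c) in SUSPICIOUS_CATEGORIES
    )
    return suspicious / len(chars)


if __name__ == "__main__":
    # Paste the text seen at the breakpoint here to compare it with clean text.
    sample = "This is ordinary readable text."
    print(f"suspicious share: {garbled_ratio(sample):.2f}")
```

A value close to 0 for the text pasted into Word and a clearly higher value for the RM output would suggest that the problem lies in how the text is extracted rather than in the PDF content itself.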