Why is UTF-8 not working?
heron_oliveira
Member Posts: 6 Newbie
in Help
Today I converted a PDF to txt, and I'm trying to analyse the frequency of some terms in the text. Although the txt file is in UTF-8 and I've already changed the program's encoding to the default (SYSTEM) or to 'UTF-8' before tokenizing and generating n-grams, it keeps showing incorrect words. For example, the word should have been 'abrangência' instead of 'abrangãºncia'.
Best Answer
MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,512 RM Data Scientist
Hi there,
what operator do you use to read the text file? It should have an encoding setting as well.
Cheers,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
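To illustrate why the encoding used at the read step matters, here is a minimal sketch in plain Python (not RapidMiner, and not part of the original answer). It assumes the file is UTF-8 on disk and that the reader falls back to Windows-1252, which is only a guess at what a SYSTEM default might be. Decoding the same bytes with the wrong encoding gives garbling similar to what is described in the question.

```python
import os
import tempfile

word = "abrangência"

# Write the word to a temporary file as UTF-8, like the converted txt file.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as f:
    f.write(word)
    path = f.name

# Read it back with the wrong encoding (cp1252 is an assumed system default).
with open(path, encoding="cp1252") as f:
    wrong = f.read()

# Read it back with the encoding the file was actually written in.
with open(path, encoding="utf-8") as f:
    right = f.read()

os.remove(path)

print(wrong)  # abrangÃªncia
print(right)  # abrangência
```

The two bytes that encode 'ê' in UTF-8 are each shown as a separate character when read as Windows-1252, so the fix belongs wherever the file is first read, not in later steps such as tokenizing.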
Answers