UTF-8 encoded text comes out garbled from the Get Page operator
s_nektarijevic RapidMiner Certified Analyst, Member Posts: 12 Contributor II
edited December 2018 in Help
I am having an issue with the Get Page operator and UTF-8 encoding.
I am scraping the content of this web page:
According to the HTML I get out of Get Page, this page declares UTF-8:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The problem is that non-ASCII characters come out garbled. For example, FDA’s turns out as FDAâs.
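To illustrate the symptom, here is a minimal Python sketch of how this kind of garbling can arise. It assumes the page bytes really are UTF-8 and are being decoded with a single-byte charset somewhere along the way; that is a guess about the cause, not something confirmed for Get Page.

```python
# Assumption: the bytes are valid UTF-8 but are decoded with a single-byte charset.
original = "FDA\u2019s"               # "FDA’s", with a right single quotation mark (U+2019)
raw_bytes = original.encode("utf-8")  # the apostrophe becomes the three bytes E2 80 99

# ISO-8859-1 maps E2 to "â" and 80/99 to invisible control characters.
as_latin1 = raw_bytes.decode("iso-8859-1")
# Windows-1252 maps the same three bytes to "â", "€" and "™".
as_cp1252 = raw_bytes.decode("cp1252")

print(repr(as_latin1))
print(repr(as_cp1252))
```

Under the ISO-8859-1 reading, the two control characters are usually not displayed, which would match the "FDAâs" I see.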
I tried enforcing the right encoding by checking the "override encoding" box in the Get Page operator, but if I do that, I get an error message:
"Encoding 'SYSTEM' is not supported"
Any idea how to solve this without having to manually search and replace the unwanted characters?
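For reference, the kind of automatic repair I have in mind looks roughly like the Python sketch below. It assumes the text was UTF-8 that got decoded as ISO-8859-1 (an assumption), and `repair_mojibake` is a hypothetical helper name, not a RapidMiner function.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-ISO-8859-1 mix-up; return the text unchanged if it does not fit."""
    try:
        # Recover the original bytes, then decode them with the intended charset.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of garbling (or already correct): leave the text alone.
        return text


garbled = "FDA\u00e2\u0080\u0099s"  # "FDAâ" + two control characters + "s"
print(repair_mojibake(garbled))
```

This only works if the invisible control characters are still present in the string. If they have already been stripped, the original bytes cannot be recovered this way and the function returns the text unchanged.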
Many thanks in advance for any kind of input!