Process Web Spanish
Hi everyone!
I'm trying the "Process Web" operator on Spanish-language pages and I'm having problems with the accents.
The web page declares "charset=iso-8859-1", so I set the encoding parameter to "iso-8859-1", but it doesn't work. (I have tried all the usual encodings.)
The curious thing is that "Crawl web" works, but only if I check "write pages into files"; if I don't, it doesn't work either.
Is this a bug?
Does anyone know how I can solve it?
Thanks : )
Answers
In this code you can see that the attribute "Introduccion" has different values depending on the method:
I think this is an issue with the encoding of the web page. It's rather difficult to always read the correct encoding if the web page doesn't specify it. We usually assume UTF-8 if nothing is specified in the HTML document.
You could manually request the web pages in an appropriate terminal program and check whether the encoding is correct. If it is not, you might add a bug to the tracker with a detailed example process. This would make my life much easier and would speed up the fix.
Greetings,
Sebastian
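For anyone who wants to do this check outside RapidMiner, here is a minimal Python sketch of the idea: download the raw bytes, look for a declared charset in the HTTP header or in a `<meta>` tag, and decode with it. The function names (`declared_charset`, `fetch`, `decode_page`) and the UTF-8 fallback are my own assumptions for illustration; this is not how the RapidMiner operators are implemented.

```python
import re
import urllib.request

# Hypothetical helper pattern: finds charset=... inside a <meta> tag, quoted or not.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)

def declared_charset(raw, content_type=""):
    """Return the charset named in the HTTP header or a <meta> tag, else None."""
    header = re.search(
        r"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", content_type or "", re.IGNORECASE
    )
    if header:
        # The HTTP header, when present, is checked before the document itself.
        return header.group(1).lower()
    meta = META_CHARSET.search(raw[:4096])  # declarations sit near the top
    return meta.group(1).decode("ascii").lower() if meta else None

def fetch(url):
    """Download a page and return (raw bytes, Content-Type header value)."""
    with urllib.request.urlopen(url) as response:
        return response.read(), response.headers.get("Content-Type", "")

def decode_page(raw, content_type="", fallback="utf-8"):
    """Decode with the declared charset; fall back to UTF-8 (an assumption)."""
    charset = declared_charset(raw, content_type) or fallback
    return raw.decode(charset, errors="replace")

# Usage (the URL is a placeholder, not the page from this thread):
# raw, content_type = fetch("http://example.com/")
# print(declared_charset(raw, content_type))
# print(decode_page(raw, content_type)[:200])
```

If the charset printed here differs from the one the page actually uses, or is missing, that would be worth including in the bug report.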
I can see these pages in my browser, and in the page's source code I found:
<META http-equiv=Content-Type content="text/html; charset=iso-8859-1"> (I'm not sure if this is what you are referring to)
You told me to request the web pages in an appropriate terminal program... (A browser? Sorry, I don't understand what you mean.)
In the example, you can see that the "Process web" operator replaces the accents with a symbol, but with the "Crawl web" operator the accents are written correctly (but only if "write pages into files" is checked).
I would like to help fix it, but I don't know how.
Thanks for everything
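A symbol in place of each accent is what typically happens when ISO-8859-1 bytes are decoded as UTF-8. The small Python sketch below reproduces that effect with the attribute name from the example. It only illustrates the mismatch; whether this is the actual cause inside the operator is an assumption, and `decode_both_ways` is a made-up helper name.

```python
def decode_both_ways(text, page_encoding="iso-8859-1", assumed_encoding="utf-8"):
    """Encode text as the page would, then decode it wrongly and correctly."""
    raw = text.encode(page_encoding)
    # Wrong: the reader assumes UTF-8, so the byte for the accented letter is invalid.
    wrong = raw.decode(assumed_encoding, errors="replace")
    # Right: the reader uses the encoding the page declares.
    right = raw.decode(page_encoding)
    return wrong, right

wrong, right = decode_both_ways("Introducción")
print(repr(wrong))  # 'Introducci\ufffdn' (the accent became the replacement symbol)
print(repr(right))  # 'Introducción'
```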
I have added a bug to the bug tracker. We will solve it as soon as possible.
Greetings,
Sebastian