RapidMiner

Get Pages - Error parsing HTTP headers

Regular Contributor

Hi,

When a web server sends a header string that violates the cookie specification, the getHeaderFields() method of the HttpURLConnection class throws an IllegalArgumentException. RapidMiner does not handle this exception, so the "Get Pages" operator in the Web Mining extension fails. I have added a try/catch around that code, and it seems to be working now.

This is the code I modified at line 164 of the GetWebPageOperator:


try { // getHeaderFields() fails if a cookie lacks the '=' symbol between its name and value
    for (Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
        getLogger().info("Response Header:" + header.getKey() + ": " + header.getValue());
    }
} catch (IllegalArgumentException ex) {
    getLogger().warning("Failed to get HTTP header fields. Error: " + ex.getMessage());
}
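
For anyone who wants the same guard outside the operator, here is a minimal standalone sketch of the idea. It is not RapidMiner code: the class name SafeHeaders and the method readHeaders are hypothetical names I made up for illustration, and it assumes that falling back to an empty header map is acceptable when parsing fails.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper (not part of RapidMiner or the Web Mining extension).
final class SafeHeaders {
    private static final Logger LOG = Logger.getLogger(SafeHeaders.class.getName());

    private SafeHeaders() {
    }

    // Calls the supplier and returns its header map. If the call throws an
    // IllegalArgumentException (the failure described above), a warning is
    // logged and an empty map is returned instead, so the caller can carry on.
    static Map<String, List<String>> readHeaders(Supplier<Map<String, List<String>>> source) {
        try {
            return source.get();
        } catch (IllegalArgumentException ex) {
            LOG.warning("Failed to get HTTP header fields. Error: " + ex.getMessage());
            return Collections.emptyMap();
        }
    }
}
```

With an open URLConnection named connection, the call would look like SafeHeaders.readHeaders(connection::getHeaderFields). Taking a Supplier rather than the connection itself is only a design choice that lets the guard be tried without a real server.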


I am posting this in case it helps anyone.

Thanks,
Miguel
3 REPLIES
Super Contributor

Re: Get Pages - Error parsing HTTP headers

Hi Miguel,

Thanks for reporting this. Can you give us a link to a page that reproduces the error?
Best regards,
Marius
Regular Contributor

Re: Get Pages - Error parsing HTTP headers

Hi Marius,

I am sorry, but unfortunately I forgot to keep the URL that was causing this problem. I process lots of different URLs every day, and I tried looking for the one that failed (I know the problem happened on Nov 6th), but I couldn't find it. The only thing I have is the screenshot of the error in RapidAnalytics, which I know isn't going to be of much help to you. :'(



Thanks,
Miguel
Super Contributor

Re: Get Pages - Error parsing HTTP headers

Ok, thanks for searching in any case :)
I will forward your error description and the piece of code above to the developers. Let's see if it is of use to them.

Best regards,
Marius