Get Pages - Error parsing HTTP headers

miguelal Member Posts: 23 Contributor II
edited July 2019 in Help
Hi,

When a web server sends a header string that violates the cookie specification, the getHeaderFields() method of the HttpURLConnection class throws an IllegalArgumentException. RapidMiner does not handle this exception, so the "Get Pages" operator in the Web Mining extension fails. I have added a try/catch around that code, and it seems to be working now.

This is the code I modified at line 164 of the GetWebPageOperator:

try {
    // getHeaderFields() fails if a cookie lacks the '=' sign between its name and its value
    for (Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
        getLogger().info("Response Header:" + header.getKey() + ": " + header.getValue());
    }
} catch (IllegalArgumentException ex) {
    // Log the problem and carry on instead of letting the operator fail
    getLogger().warning("Failed to get HTTP header fields. Error: " + ex.getMessage());
}
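
For anyone who wants to try the same idea outside the operator, here is a minimal standalone sketch. It is not the extension's actual code: the class name SafeHeaders and the helper headersOrEmpty are made up for illustration, and it uses java.util.logging instead of RapidMiner's getLogger(). It assumes, as described above, that getHeaderFields() can throw an IllegalArgumentException on a malformed cookie header.

```java
import java.net.HttpURLConnection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

public class SafeHeaders {
    private static final Logger LOGGER = Logger.getLogger(SafeHeaders.class.getName());

    // Hypothetical helper (not part of RapidMiner): returns the response
    // headers, or an empty map if reading them throws IllegalArgumentException,
    // e.g. because of a cookie without '=' between name and value.
    static Map<String, List<String>> headersOrEmpty(HttpURLConnection connection) {
        try {
            return connection.getHeaderFields();
        } catch (IllegalArgumentException ex) {
            LOGGER.warning("Failed to get HTTP header fields. Error: " + ex.getMessage());
            return Collections.emptyMap();
        }
    }
}
```

Returning an empty map lets the caller keep its logging loop unchanged. The trade-off is that all headers of that response are skipped, not only the malformed cookie.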
I am posting this in case it helps anyone.

Thanks,
Miguel

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Miguel,

    Thanks for reporting this. Can you give us a link to a page that reproduces the error?
    Best regards,
    Marius
  • miguelal Member Posts: 23 Contributor II
    Hi Marius,

    I am sorry, but unfortunately I did not keep the URL that was causing this problem. I process lots of different URLs every day, and I tried looking for it (I know the problem happened on Nov 6th), but I couldn't find it. The only thing I have is the screenshot of the error in RapidAnalytics, which I know isn't going to be of much help to you.  :'(

    [image: screenshot of the error in RapidAnalytics]

    Thanks,
    Miguel
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Ok, thanks for searching in any case :)
    I will forward your error description and the piece of code above to the developers. Let's see if it is of use to them.

    Best regards,
    Marius