Get Pages operator - possible enhancements

miguelal Member Posts: 23 Contributor II
edited November 2018 in Help

I am not sure if this is the right place to post this, but we have encountered two minor issues with the Get Page and Get Pages operators that are part of the Web extension.

1) When the remote web server returns an invalid encoding (uft-16) or an empty one, the operators throw an exception.
Some sample URLs are: www.ochoa.es, www.mrw.es, www.giraud.es, www.alartec.com
It would be great if the user could select a default encoding, and in case the web server returns an invalid one, the default gets used.
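
To make the request concrete, here is a minimal sketch of the fallback behaviour in Python. It is not the operators' code; the function names `resolve_encoding` and `decode_body` and the `default` parameter are hypothetical and only illustrate the idea of validating the declared charset before decoding.

```python
import codecs


def resolve_encoding(declared, default="utf-8"):
    """Return a usable codec name, falling back to `default`.

    `declared` is the charset the server sent (may be None, empty, or
    an unknown name such as "uft-16").
    """
    if declared:
        try:
            # codecs.lookup raises LookupError for unknown codec names.
            return codecs.lookup(declared.strip()).name
        except LookupError:
            pass
    return codecs.lookup(default).name


def decode_body(body, declared, default="utf-8"):
    """Decode raw bytes with the declared charset, or the default if unusable."""
    encoding = resolve_encoding(declared, default)
    # errors="replace" keeps decoding from failing on stray bytes.
    return body.decode(encoding, errors="replace")
```

With something like this, a page declaring "uft-16" or no charset at all would simply be decoded with the user-selected default instead of raising an exception.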

2) Some web servers don't properly return the EOF when serving a page, and even though I believe the operators are able to read the page's content, an exception is thrown when trying to read the EOF.
A sample URL: http://www.jamonescarretero.com
It would be great if the operators could identify this situation, and not throw an exception.
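
As a rough illustration of what I mean, the sketch below keeps whatever was received when the stream ends early. It is written in Python rather than against the extension, and it assumes the premature end shows up as `http.client.IncompleteRead`; the actual exception inside the operators may well be different. The name `read_tolerant` is hypothetical.

```python
import http.client


def read_tolerant(response):
    """Read a response body, keeping partial data if the stream ends early.

    Returns (body_bytes, complete), where `complete` is False when the
    server closed the connection before the full body arrived.
    """
    try:
        return response.read(), True
    except http.client.IncompleteRead as err:
        # err.partial holds the bytes that were received before the failure.
        return err.partial, False
```

The `complete` flag lets the caller decide whether a truncated page is acceptable or should only be logged as a warning.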



    MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Miguel,

    For the second issue, please file a bug on our public bug tracker at http://bugs.rapid-i.com

    Concerning your first issue, I can say that at least the Get Page operator has an "override encoding" parameter. I am not sure, though, whether it has already been released.
