Get Pages - Connection Reset Error

bhickie Member Posts: 2 Contributor I
edited November 2018 in Help
Hi,

I am using the Get Pages operator to process the home page URLs of 4,000 or so websites listed in an Excel file. I have received the following message several times:
Process Failed
Could Not Read Document
Reason: Connection Reset

I cannot figure out what is causing this. I ran the process on a subset of the URLs (~200) without any problems, so I know the process works. I also tried it with 1,500 URLs and ran into the same problem.

What is causing this? The only thing I have been able to find was a Stack Overflow post about a similar SQL error.

The post basically said that this is a Java-based error indicating that the connection between your Java process and the database service has been lost. It says this can happen for many different reasons, including a lost network connection or a problem with available resources, among others.

This seems like a vague error code. How do I troubleshoot what is causing it? It happens repeatedly, so it seems like an environment, resource, or data problem. The only causes I can come up with are:
- Internet connection problems - But I ran it both in the office and at home to test for this. The error appeared multiple times in both locations.
- Data issue - Something odd about the data is causing the system to lose the connection. Maybe a read time limit is being exceeded.
- Community License restriction - The license has some sort of resource restriction that causes a connection reset. This seems unlikely, since the licensing errors I have come across in the past looked different.

Any information that you could provide would be much appreciated.

Thanks,
Brandon

Answers

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    The license has nothing to do with it. The error indicates that the remote side of the connection forcefully closed it by sending a TCP RST (reset) flag. The reasons for this can be very hard (if at all possible) to find out. An RST can be sent to block traffic, because the server hit some kind of error, because the server received something it was not expecting and could not handle, or by an intermediate router, among other causes.
    You may want to try reducing the load a bit by sending the requests in batches, i.e. taking the first 200 URLs inside a "Loop" operator and querying them, then adding a "Delay" operator afterwards before starting the next 200 URLs in the loop. If it's a load-related problem, this might fix it.
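    For readers who want to see the batch-and-delay idea outside RapidMiner, here is a minimal Python sketch. It is not how the "Loop" and "Delay" operators are implemented. The function names (fetch_page, chunked, fetch_in_batches), the 30-second delay, and the choice to record a failed URL and continue instead of stopping the whole run are all assumptions made for illustration; only the batch size of 200 comes from the suggestion above.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_page(url: str, timeout: float = 30.0) -> bytes:
    """Download one page with the standard library (illustrative default)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def chunked(items: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(
    urls: List[str],
    fetch: Callable[[str], bytes] = fetch_page,
    batch_size: int = 200,         # batch size from the suggestion above
    delay_seconds: float = 30.0,   # assumed pause; tune for your situation
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Fetch URLs batch by batch, pausing between batches.

    Returns (pages, failures). A failed URL is recorded and skipped,
    so one connection reset does not abort the remaining URLs.
    """
    pages: Dict[str, bytes] = {}
    failures: Dict[str, str] = {}
    batches = list(chunked(urls, batch_size))
    for index, batch in enumerate(batches):
        for url in batch:
            try:
                pages[url] = fetch(url)
            except OSError as error:
                # ConnectionResetError, socket timeouts and urllib's
                # URLError are all subclasses of OSError.
                failures[url] = repr(error)
        # Pause between batches, but not after the last one.
        if index < len(batches) - 1:
            sleep(delay_seconds)
    return pages, failures
```

    Calling fetch_in_batches(urls) with the list read from the spreadsheet returns the downloaded pages plus a dictionary of the URLs that failed and why. If the same URLs fail on every run, that points toward specific servers rather than overall load.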

    Regards,
    Marco