Script using LWP::UserAgent is sometimes failing with 500 error, although server reports 200

David Karr · Aug 23, 2011

I have a Perl script that uses LWP::UserAgent to run a series of tests against a REST service running in an intranet. I configure it with a host and port to run its tests against. I've been using it for a while with no significant problems. Yesterday I started running tests against another server where my service was just deployed. This is a "final stage" environment, prior to production. My service is already deployed to production; this is just an additional release.

For some reason, while running its tests against this new server, after getting numerous successful results back, the script is failing at no particular request with "500 read failed: Software caused connection abort". Even more curious, when I check the "access.log" on the server for that particular request, it reports a 200.
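One way to reconcile a client-side 500 with a 200 in the server log: as far as I know, LWP::UserAgent builds a 500 response itself when the request dies inside the library (for example on a failed socket read), and marks such responses with a `Client-Warning: Internal response` header. The sketch below checks for that header. The helper names `is_internal_response` and `describe_failure` are made up for illustration, and the code assumes only that the response object offers `header` and `status_line`, as HTTP::Response does.

```perl
use strict;
use warnings;

# Hypothetical helper: true if LWP generated this response itself
# instead of receiving it from the server.
sub is_internal_response {
    my ($response) = @_;
    my $warning = $response->header('Client-Warning');
    return defined $warning && $warning eq 'Internal response';
}

# Hypothetical helper: label a failed response with where the error came from.
sub describe_failure {
    my ($response) = @_;
    my $origin = is_internal_response($response) ? 'client-side (LWP)' : 'server';
    return sprintf '%s error: %s', $origin, $response->status_line;
}

# Possible use inside the else branch of sendGet:
#   print describe_failure($response), "\n";
```

If the failure is labelled client-side, the status line describes what went wrong on the local machine or on the connection, not what the server answered.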

When I try to run that specific request manually, it always works fine. I also just checked with a QA tester that's using SoapUI to do similar testing, and he hasn't seen any problems.

So, it seems like there's something in my Perl script that is causing some sort of a race condition that makes it think a request fails, when the server sees no problem.

I'll show the head of my Perl script, along with the excerpt of the code that makes the request and checks for success.

-------------------
#!/usr/bin/perl -w
# -*- mode: Perl; -*-
use threads;
use threads::shared;
use Thread::Pool;
use Getopt::Long;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET);
use XML::XPath;
use XML::XPath::XMLParser;
use Time::HiRes qw/gettimeofday/;
use List::MoreUtils qw(uniq);

....

sub sendGet($) {
    my ($url) = @_;
    if ($url =~ /\?/) {
        foreach my $param (@opt_params) {
            $url = $url . "&" . $param;
        }
    }
    else {
        $url = $url . "?";
        foreach my $param (@opt_params) {
            $url = $url . $param . "&";
        }
    }
    print localtime() . ": url[$url]\n";
    my $request = GET $url;
    $request->header("X-Client-Code", "abc");
    eval {
        my $response = $ua->request($request);
        if ($response->is_success) {
            return $response->decoded_content;
        }
        else {
            print "Call to url \"" . $url . "\" failed: " .
                $response->status_line . "\n";
        }
        1;
    } or do {
        print "Call to url \"" . $url . "\" failed.\n";
    }
}

David Karr · Aug 23, 2011

Two details I should have provided:

I'm on Cygwin 1.5.25, and Perl 5.10.0.

Rainer Weikusat · Aug 23, 2011

David Karr said:
I have a Perl script that uses LWP::UserAgent to run a series of
tests against a REST service running in an intranet. I configure it
with a host and port to run its tests against. I've been using it
for a while with no significant problems. Yesterday I started
running tests against another server where my service was just
deployed. This is a "final stage" environment, prior to production.
My service is already deployed to production, this is just an
additional release.

For some reason, while running its tests against this new server,
after getting numerous successful results back, the script is
failing at no particular request with "500 read failed: Software
caused connection abort".

http://www.google.com/#hl=en&cp=32&...gc.r_pw.&fp=c22dbf1ad5dc4c1f&biw=1598&bih=847

David Karr · Aug 23, 2011

So far we've discovered that the URL I'm using is going through an F5 to the web server, and if I change the test to go directly to the web server, it doesn't fail. Going through the F5 I get one random failure after it's been running for a while (the script exits on the first failure). So far the network engineer hasn't found any obvious configuration details on the F5 that might cause this. Note that my requests are not using SSL.

It's pretty clear this isn't really a Perl problem, but something about how it works in my script is making the race condition surface.

Jim · Aug 24, 2011

David Karr said:
So far we've discovered that the URL I'm using is going through an F5 to the web server, and if I change the test to go directly to the web server, it doesn't fail. Going through the F5 I get one random failure after it's been running for a while (the script exits on the first failure). So far the network engineer hasn't found any obvious configuration details on the F5 that might cause this. Note that my requests are not using SSL.

It's pretty clear this isn't really a Perl problem, but something about how it works in my script is making the race condition surface.

Another headbanger gotcha is a bad network card. Or a loose connector
- I've had both of those in the same situation.

David Karr · Aug 24, 2011

If it matters, we've now ported the entire script to a Solaris VM, and it does not display this symptom, so there's something about Perl in Cygwin and Windows, combined with the F5, that is causing this symptom.

David Karr · Aug 24, 2011

I understand, but this test is running on my laptop, which I use heavily every single day. I suppose it's entirely possible there's a problem in the network card, but I've never seen any particular issue with network connections on this box.

Rainer Weikusat · Aug 24, 2011

David Karr said:
If it matters, we've now ported the entire script to a Solaris VM,
and it does not display this symptom, so there's something about
Perl in Cygwin and Windows, combined with the F5, that is causing
this symptom.

As the abundance of links returned by the Google search request I
posted should have told you, 'software caused connection abort' on
Windows means 'Winsock decided to kill the connection for some
reason'. If you're interested in what happened there, try a traffic
capture 'between' the Windows machine and its counterpart. But the
results of that are (with a very high probability) 'for educational
purposes only'. Practically, this just means when the software needs
to work reliably despite Windows being used, it needs to be able to
deal with essentially gratuitous connection aborts. A sensible
strategy would be to retry the request a couple of times with some
delay between the retries (using exponential backoff in order to cope
with problems a la 'someone disconnected a cable' might make sense).
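The retry strategy described above could be sketched roughly as follows. This is a minimal illustration, not the code anyone in the thread actually used: the helper name `retry_with_backoff` and its `max_tries` and `base_delay` options are made up, and the only assumption about the attempt is that it returns an object with an `is_success` method, as `$ua->request($request)` does. Blind retries are only safe for idempotent requests such as the GETs in the script.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # allows fractional-second delays

# Hypothetical helper: call $attempt->() up to max_tries times.
# Between failed tries it sleeps base_delay, 2*base_delay, 4*base_delay, ...
# seconds. Returns the first successful response, or the last failed one.
sub retry_with_backoff {
    my ($attempt, %opt) = @_;
    my $max_tries  = $opt{max_tries}  // 4;     # assumed default
    my $base_delay = $opt{base_delay} // 0.5;   # seconds, assumed default

    my $response;
    for my $try (1 .. $max_tries) {
        $response = $attempt->();
        return $response if $response->is_success;
        last if $try == $max_tries;             # no sleep after the final try
        sleep($base_delay * 2 ** ($try - 1));
    }
    return $response;    # caller can still report status_line
}

# Possible use with LWP, assuming $ua and $request as in sendGet:
#   my $response = retry_with_backoff(sub { $ua->request($request) });
#   print "failed: ", $response->status_line, "\n" unless $response->is_success;
```

Returning the last failed response rather than dying keeps the existing `is_success` / `status_line` reporting in the script usable after the retries are exhausted.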

Jim · Aug 24, 2011

David Karr said:
I understand, but this test is running on my laptop, which I use heavily every single day. I suppose it's entirely possible there's a problem in the network card, but I've never seen any particular issue with network connections on this box.

Having watched the thread, it appeared you had been connected to one
server for development, then another for testing (that became flakey).

It really does look like something lower in the communications stack
than the running Perl code -- something closer to the SMTP or IP
layer. I used to see this when tuning mail systems, and it was
always either hardware, or handshake timing, external to the
application code (once it was debugged of course), somewhere in the
communications or protocol layers.

David Karr · Aug 25, 2011

That makes perfect sense. I realized this is what the Java frameworks I usually use for HTTP connections do; LWP::UserAgent just doesn't do it automatically. I implemented a simple retry loop with configurable retries, and now it's perfectly happy.

Thanks.
