Script using LWP::UserAgent is sometimes failing with 500 error, although server reports 200


David Karr

I have a Perl script that uses LWP::UserAgent to run a series of tests against a REST service running in an intranet. I configure it with a host and port to run its tests against. I've been using it for a while with no significant problems. Yesterday I started running tests against another server where my service was just deployed. This is a "final stage" environment, prior to production. My service is already deployed to production; this is just an additional release.

For some reason, while running its tests against this new server, after getting numerous successful results back, the script is failing at no particular request with "500 read failed: Software caused connection abort". Even more curious, when I check the "access.log" on the server for that particular request, it reports a 200.

When I try to run that specific request manually, it always works fine. I also just checked with a QA tester who's using SoapUI to do similar testing, and he hasn't seen any problems.

So, it seems like there's something in my Perl script that is causing some sort of a race condition that makes it think a request fails, when the server sees no problem.

I'll show the head of my Perl script, along with an excerpt of the code that makes the request and checks for success.

-------------------
#!/usr/bin/perl -w
# -*- mode: Perl; -*-
use threads;
use threads::shared;
use Thread::Pool;
use Getopt::Long;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET);
use XML::XPath;
use XML::XPath::XMLParser;
use Time::HiRes qw/gettimeofday/;
use List::MoreUtils qw(uniq);

....

# Send a GET request with the optional parameters appended.
# Returns the decoded body on success, or undef on failure.
sub sendGet($) {
    my ($url) = @_;
    # Append @opt_params, starting the query string with "?" if the URL has none yet
    if (@opt_params) {
        $url .= ($url =~ /\?/ ? "&" : "?") . join("&", @opt_params);
    }
    print localtime() . ": url[$url]\n";
    my $request = GET $url;
    $request->header("X-Client-Code", "abc");

    # A "return" inside eval { } only leaves the eval block, not the sub,
    # so store the body here and return it after the eval.
    my $content;
    eval {
        my $response = $ua->request($request);
        if ($response->is_success) {
            $content = $response->decoded_content;
        }
        else {
            print "Call to url \"$url\" failed: " . $response->status_line . "\n";
        }
        1;
    } or do {
        chomp(my $err = $@);
        print "Call to url \"$url\" failed: $err\n";
    };
    return $content;
}
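A detail that may explain the 200/500 mismatch: as far as I know, when LWP::UserAgent hits a transport-level error (a failed read, a timeout), it never receives a status line from the server at all. It builds a 500 response itself, uses the error text ("read failed: ...") as the message, and marks it with the header `Client-Warning: Internal response`. So a "500 read failed" on the client can coexist with a 200 in the server's access.log. A minimal sketch for telling the two cases apart; `is_lwp_internal_error` is a hypothetical helper name, not part of LWP.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $response was synthesized by
# LWP::UserAgent itself (transport-level failure such as a read error or
# timeout) rather than received from the server, 0 otherwise.
# LWP marks such responses with the header "Client-Warning: Internal response".
sub is_lwp_internal_error {
    my ($response) = @_;
    my $warning = $response->header('Client-Warning');
    return (defined $warning && $warning eq 'Internal response') ? 1 : 0;
}

# Example use in the failure branch of sendGet:
#   my $origin = is_lwp_internal_error($response) ? 'client side (LWP)' : 'server';
#   print "Call to url \"$url\" failed ($origin): ", $response->status_line, "\n";
```

If the failing 500s all carry this header, the server never sent them, which points at the connection rather than the service.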
 

Rainer Weikusat

David Karr said:
I have a Perl script that uses LWP::UserAgent to run a series of
tests against a REST service running in an intranet. I configure it
with a host and port to run its tests against. I've been using it
for a while with no significant problems. Yesterday I started
running tests against another server where my service was just
deployed. This is a "final stage" environment, prior to production.
My service is already deployed to production, this is just an
additional release.

For some reason, while running its tests against this new server,
after getting numerous successful results back, the script is
failing at no particular request with "500 read failed: Software
caused connection abort".

http://www.google.com/#hl=en&cp=32&...gc.r_pw.&fp=c22dbf1ad5dc4c1f&biw=1598&bih=847
 

David Karr

So far we've discovered that the URL I'm using is going through an F5 to the web server, and if I change the test to go directly to the web server, it doesn't fail. Going through the F5, I get one random failure after it's been running for a while (the script exits on the first failure). So far the network engineer hasn't found any obvious configuration details on the F5 that might cause this. Note that my requests are not using SSL.

It's pretty clear this isn't really a Perl problem, but something about how it works in my script is making the race condition surface.
 

Jim

David Karr said:
So far we've discovered that the URL I'm using is going through an F5 to the web server, and if I change the test to go directly to the web server, it doesn't fail. Going through the F5 I get one random failure after it's been running for a while (the script exits on the first failure). So far the network engineer hasn't found any obvious configuration details on the F5 that might cause this. Note that my requests are not using SSL.

It's pretty clear this isn't really a Perl problem, but something about how it works in my script is making the race condition surface.

Another head-banger gotcha is a bad network card, or a loose connector;
I've had both of those in the same situation.
 

David Karr

If it matters, we've now ported the entire script to a Solaris VM, and it does not display this symptom, so there's something about Perl in Cygwin and Windows, combined with the F5, that is causing this symptom.
 

David Karr

I understand, but this test is running on my laptop, which I use heavily every single day. I suppose it's entirely possible there's a problem in the network card, but I've never seen any particular issue with network connections on this box.
 

Rainer Weikusat

David Karr said:
If it matters, we've now ported the entire script to a Solaris VM,
and it does not display this symptom, so there's something about
Perl in Cygwin and Windows, combined with the F5, that is causing
this symptom.

As the abundance of links returned by the Google search request I
posted should have told you, 'software caused connection abort' on
Windows means 'Winsock decided to kill the connection for some
reason'. If you're interested in what happened there, try a traffic
capture 'between' the Windows machine and its counterpart. But the
results of that are (with a very high probability) 'for educational
purposes only'. Practically, this just means when the software needs
to work reliably despite Windows being used, it needs to be able to
deal with essentially gratuitous connection aborts. A sensible
strategy would be to retry the request a couple of times with some
delay between the retries (using exponential backoff in order to cope
with problems a la 'someone disconnected a cable' might make sense).
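The retry-with-exponential-backoff strategy described above can be sketched in plain Perl roughly as follows. This is a minimal sketch, not the code that was eventually used in the script; `retry_with_backoff` and its parameters (`$max_tries`, `$base_delay`) are hypothetical names, and the attempt is assumed to signal failure by returning undef.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);    # allows fractional-second delays

# Hypothetical helper: call $attempt->($try) until it returns a defined
# value or $max_tries attempts have been made. After each failure the
# delay doubles (exponential backoff): $base_delay, 2*$base_delay, ...
sub retry_with_backoff {
    my ($attempt, $max_tries, $base_delay) = @_;
    my $delay = $base_delay;
    for my $try (1 .. $max_tries) {
        my $result = $attempt->($try);
        return $result if defined $result;    # success
        last if $try == $max_tries;           # don't sleep after the last try
        warn "Attempt $try failed, retrying in ${delay}s\n";
        sleep $delay;
        $delay *= 2;
    }
    return;    # undef: every attempt failed
}

# Example use with LWP::UserAgent (assumes $ua and $url already exist):
#   my $content = retry_with_backoff(sub {
#       my $response = $ua->request(GET $url);
#       return $response->is_success ? $response->decoded_content : undef;
#   }, 4, 0.5);
```

With 4 tries and a 0.5 second base delay, the waits between attempts are 0.5, 1 and 2 seconds. Blind retries like this are only safe for idempotent requests, such as the GETs in this test script.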
 

Jim

David Karr said:
I understand, but this test is running on my laptop, which I use heavily every single day. I suppose it's entirely possible there's a problem in the network card, but I've never seen any particular issue with network connections on this box.

Having watched the thread, it appears you had been connected to one
server for development, then another for testing (that became flaky).

It really does look like something lower in the communications stack
than the running Perl code: something closer to the TCP or IP
layer. I used to see this when tuning mail systems, and it was
always either hardware or handshake timing, external to the
application code (once it was debugged, of course), somewhere in the
communications or protocol layers.
 

David Karr

That makes perfect sense. I realized this is what the Java frameworks I usually use for HTTP connections do; LWP::UserAgent just doesn't do it automatically. I implemented a simple retry loop with a configurable number of retries, and now it's perfectly happy.

Thanks.
 
