Web programming: issues with large amounts of data


Ted Byers

Running on Windows XP and on Windows Server 2003, with ActiveState Perl
5.8.8 and 5.10.0.

In the script that is giving me troubles, I am using LWP::RobotUA,
LWP::UserAgent, HTTP::Request, HTTP::Request::Common, and
HTTP::Response. However, I have seen similar issues with
Finance::QuoteHist::Yahoo.

The problem manifests itself as the download terminating and the
script appearing to freeze. This only happens when the download is
large (many megabytes in size). I have a data feed that I can only
access using POST to a given URI (with query parameters specifying what
data I am trying to retrieve). This never results in an error.
The script just freezes and the download ends. So far, I have been
able to work around this by modifying my scripts to break the download
into smaller pieces to be handled in a child process. With the one
script, each download is only a few kbytes in size, but there are over
9000 downloads of about the same size, and the total amount of data
across all downloads appears to be the key. Doing each of those 9000
downloads in its own child process results in a happy, successfully
completed job. Doing them in a loop in a single process results in
unhappy failure.

Does anyone know what I can look at to make these download scripts
either finish successfully or give me a message saying there's too
much data for the script to handle? More importantly, what is the
most likely cause of this misbehaviour and how can it be fixed?

NB: The scripts I'm using work flawlessly when I use parameters that
are guaranteed to restrict the total amount of data to be handled by
the script to a few dozen kbytes, and they diligently check for the
problems I know about.

Thanks

Ted
 

Tim Greer

Ted said:
The problem manifests itself as the download terminating and the
script appearing to freeze.

There can be a number of reasons for this. Please post the relevant
portions of the (Perl) code. This might not be a perl code issue.
 

Ted Byers


Hi Tim,

Thanks.

Here is a simple program that shows this problem:

use strict;
use XML::Twig;
use DBI;
use IO::File;
use POSIX qw(strftime);
use Text::ParseWords;
use LWP::RobotUA;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Request::Common;
use HTTP::Response;
use Date::Manip;

Date_Init("TZ=EST5EDT");

my $affiliate;
my $merchant;
my $uid;
my $pwd;
my $opfile;

my $db='test';
my $hostname = 'localhost';
my $port = '3306';
my $user = 'xxxxxxx';
my $dbpwd = 'yyyyyyyyyyy';

my $dbh = DBI->connect("DBI:mysql:database=$db;host=$hostname",
$user, $dbpwd, {RaiseError => 1});

my $uri = "https://our.datafeed.site/query.php";
my $req_url;
#my $ua = LWP::UserAgent->new;

my $now_string = strftime "%a %b %d %Y Generic", localtime;
mkdir "$now_string";
chdir "$now_string";

my $start_date = $ARGV[0];
my $end_date = $ARGV[1];

my $query = "SELECT merchant_name, uid, pwd FROM merchants WHERE is_active = 1";

my $sth = $dbh->prepare($query) or die $dbh->errstr;
$sth->execute();
while (my $ref = $sth->fetchrow_hashref()) {
    $merchant = $ref->{'merchant_name'};
    $uid = $ref->{'uid'};
    $pwd = $ref->{'pwd'};
    $req_url = "$uri?username=$uid&password=$pwd&start_date=$start_date&end_date=$end_date";
    STDOUT->print($req_url);STDOUT->print("\n\n");
    $opfile = "$uid.xml";
    STDOUT->print($opfile);STDOUT->print("\n");
    system("..\\generic_download_child.pl $uid \"$req_url\" 1> $uid.stdout 2>$uid.stderr");
    # open(MYOUTPUT,"> $opfile");
    # my $response = $ua->request(POST "$uri", ['username' => "$uid", 'password' => "$pwd",'start_date' => "$start_date",'end_date' => "$end_date"]);
    # if ($response->is_success) {
    #     print MYOUTPUT $response->content;
    #     print $response->content;
    # } else {
    #     print STDERR $response->status_line, "\n";
    # }
}
chdir "..";


Note, the script works for short time periods and a handful of
merchants if the presently commented-out code is uncommented and the
call to generic_download_child.pl is commented out. But the only way
I could make this work for one week's worth of data was to move the
code that is commented out here into generic_download_child.pl. The
script shown here, when used as a driver or master for
generic_download_child.pl, works fine for one week's worth of data,
but some merchants have so much data that even
generic_download_child.pl freezes if I try to get two weeks' worth of
data at a time.

What would you recommend I do to prevent this script from freezing
while still allowing retrieval of larger chunks of data?

In general, what would you add to this sort of script to detect any
kind of failure and take appropriate remedial action should a problem
arise? Sometimes the company that is hosting our machine has problems
with their ISP that prevent our machine from connecting to anything,
and there are occasional power issues. Thus we are wrestling with
finding ways of reliably and automatically detecting problems and
redoing downloads that have been interrupted. What would you recommend
as best practice? This is a bit new to me, as my strengths in software
engineering are more in the realm of high performance number crunching
and mathematical/statistical analysis (I recently completed a robust
empirical model of operational risk applicable to the services we
provide to our clients and used to manage this risk so that no one
client's problems can seriously damage us).

Thanks

Ted
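
On the question above about retrieving larger chunks without freezing: nothing in this thread confirms the cause, but one thing that may be worth trying is to have LWP write each response body straight to a file rather than holding it in memory through $response->content. The sketch below assumes that approach. The names download_to_file and looks_complete are hypothetical helpers, the 60-second timeout in the usage comment is an arbitrary choice, and the form fields simply mirror the ones in the script above.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Hypothetical helper: POST the form and let LWP stream the body to $file.
# Passing a file name as the second argument to request() makes LWP write
# the body to that file instead of storing it in $response->content.
sub download_to_file {
    my ($ua, $url, $form, $file) = @_;
    my $response = $ua->request(POST($url, $form), $file);
    die "request failed: " . $response->status_line . "\n"
        unless $response->is_success;
    my $size = -s $file;
    die "download of $file looks truncated\n"
        unless looks_complete($response->content_length,
                              defined $size ? $size : 0);
    return $response;
}

# Hypothetical check: compare the Content-Length header, when the server
# sent one, with the number of bytes that actually reached the disk.
sub looks_complete {
    my ($expected, $actual) = @_;
    return 1 unless defined $expected;   # no header, nothing to compare
    return $expected == $actual;
}

# Usage (needs network access, so it is left commented out):
# my $ua = LWP::UserAgent->new(timeout => 60);
# download_to_file($ua, 'https://our.datafeed.site/query.php',
#     [ username => $uid, password => $pwd,
#       start_date => $start_date, end_date => $end_date ], "$uid.xml");
```

If the server does not send a Content-Length header, the size check passes trivially, so it is only a partial safeguard against interrupted transfers.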
 

Tad J McClellan

mkdir "$now_string";


You should test to see if you actually got what you asked for.

See also:

perldoc -q vars

What's wrong with always quoting "$vars"?

Then make the line above something like:

mkdir $now_string or die "could not create '$now_string' directory $!";

In general, what would you add to this sort of script to detect any
kind of failure and take appropriate remedial action should a problem
arise?


eval BLOCK

perldoc -f eval
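
The eval BLOCK suggestion can be sketched as a small retry wrapper, which also speaks to the question about redoing interrupted downloads. This is only an illustration under stated assumptions: with_retries is a hypothetical helper name, the retry count and delay are arbitrary, and the simulated failure stands in for whatever download code would really go inside the block.

```perl
use strict;
use warnings;

# Hypothetical helper: run $action up to $max_tries times. Any die inside
# the action is trapped by eval, reported with warn, and then retried.
sub with_retries {
    my ($max_tries, $delay, $action) = @_;
    my $last_error;
    for my $try (1 .. $max_tries) {
        my $result;
        # The trailing 1 makes eval return true only if nothing died.
        my $ok = eval { $result = $action->($try); 1 };
        return $result if $ok;
        $last_error = $@ || "unknown error\n";
        warn "attempt $try of $max_tries failed: $last_error";
        sleep $delay if $delay && $try < $max_tries;
    }
    die "giving up after $max_tries attempts: $last_error";
}

# Usage: the first two attempts fail on purpose, the third succeeds.
my $value = with_retries(3, 0, sub {
    my ($try) = @_;
    die "simulated network failure\n" if $try < 3;
    return "downloaded on attempt $try";
});
print "$value\n";
```

The download itself would go inside the anonymous sub and should die on any failure it can detect, so that the wrapper sees it.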
 

Ted Zlatanov

TB> The problem manifests itself as the download terminating and the
TB> script appearing to freeze. This only happens when the download is
TB> large (many megabytes in size). I have a data feed that I can only
TB> access using POST to a given URI (with query parameters specifying what
TB> data I am trying to retrieve). This never results in an error.
TB> The script just freezes and the download ends.

Try using `curl' from the command line to do the same large download.
Does it also freeze (or rather, hang waiting)? If so, the problem is
not in the Perl code.

If `curl' does not hang, you can trace the HTTP exchange that `curl'
does and compare it with the one that LWP does. That will tell you what
is different between the two, and perhaps point you to the solution.

Ted
 

Ted Byers

Hi Ted,

Thanks.

Where is 'curl' to be found?

Thanks

Ted
 

sln

system("..\\generic_download_child.pl $uid \"$req_url\" 1> $uid.stdout 2>$uid.stderr");

You should get out of the habit of passing unquoted "$variables" to
the command shell, even if you know they don't need quotes.

system("..\\generic_download_child.pl \"$uid\" \"$req_url\" 1>\"$uid.stdout\" 2>\"$uid.stderr\"");
(quotes added around $uid and around both redirect file names)

sln
 

sln

On Thu, 04 Dec 2008 20:14:14 GMT, (e-mail address removed) wrote:

[snip]
You should get out of the habit of passing unquoted "$variables" to
the command shell, even if you know they don't need quotes.

system("..\\generic_download_child.pl \"$uid\" \"$req_url\" 1>\"$uid.stdout\" 2>\"$uid.stderr\"");
(quotes added around $uid and around both redirect file names)

sln

I'm not saying this is your problem, but some other guy was trying to do
the same thing, not quoting $variables passed to the command shell.

Below is his problem and the fix. Just something to watch out for.

sln

________________________________________________________
Here is the simple loop with the problem:

print "\n\nProcessing merchant data: \n";
foreach $merchant (sort keys %merchants) {
    print "\tmerchant name: $merchant\n\tmid: $merchants{$merchant}\n\tAPI user name: $merchants_usernames{$merchant}\n\n";
    system("activity.report.1.pl \"$merchant\" $merchants_usernames{$merchant} $merchants{$merchant} $date_string 1>$dir\\activity.Report.$merchant.stdout 2>$dir\\activity.Report.$merchant.stderr") == 0
        or warn "Problem creating activity report for $merchant\n";
}

I am using "use strict" and $| = 1, from the top of the script.

The hash is fully populated and there are no null values in any
variable in the code shown.

The problem is that for 90% of the merchants, the script
activity.report.1.pl executes fine, producing the expected files
(those with the contents of stdout and stderr, as well as the desired
PDF file). For the rest, we get the merchant name, and the values
from the two hashes (that contain values to be provided as arguments
for the activity report script), followed by:

The process cannot access the file because it is being used by another
process.
Problem creating activity report for xxxxxxxxxxx xxxxxxx

The second line shown here is obviously from the warn clause, but where
is the message "The process cannot access the file because it is being
used by another process." coming from? None of the information that
is written to stdout by the activity report script appears (e.g. the
first statement is to write to standard out the values of the
arguments), and nothing appears to be written to stderr either. The
files that ought to contain what is supposed to be written to stdout
and stderr don't even get created, hence my guess that the problem is
with the call to system.

Having previously printed out the contents of the hashes, I know the
correct arguments are being passed to "activity.report.1.pl" (from
comparing what was printed to the contents of the database table from
which the values are obtained, and what the activity report script
prints to standard out for the arguments it has received), and
activity.report.1.pl always runs to completion successfully when I
invoke it manually with all the same arguments that this script uses,
so I am at a loss. Why is my call to system failing (or how can I
find out), and what can be done to fix it? And why would it fail for
only 10% of the merchants?

Thanks

Ted
==========================================================================



I posted an arglist with each one quoted. That is the correct fix with one
small change. I didn't see the redirect parameters.

The test below shows the real problem(s) in 'Consideration #1 and #2' and
the fix in 'Consideration #3 and #4'.

There is only one problem when the OP gets the 'used by another process'
message: the unquoted redirect file names contain a space, so 1> and 2>
both end up pointing at the same file (activity.Report.Home in the test
below).


sln
============================================================================

Command line:
____________________________
c:\temp>perl ttt.pl

Consideration #1:
---------------------------
The process cannot access the file because it is being used by another process.
problem with jjj.pl: 256

Consideration #2:
---------------------------

Consideration #3:
---------------------------
Consideration #4:
---------------------------

c:\temp>

____________________________________________________________

-----------------------------------------
ttt.pl:
-----------------------------------------
use strict;
use warnings;

my @args;
my $dir = "c:\\temp";
my $merchant = "Home Depot";

my %merchants = ($merchant => 'home depot');
my %merchants_usernames = ($merchant => 'H DEPOT');
my $date_string = "Jan 1, 2009";

print "Consideration #1:\n---------------------------\n";
system("perl jjj.pl \"$merchant\" $merchants_usernames{$merchant} $merchants{$merchant} $date_string 1>$dir\\activity.Report.$merchant.stdout1 2>$dir\\activity.Report.$merchant.stderr1") == 0
or warn "problem with jjj.pl: $?\n";

print "\nConsideration #2:\n---------------------------\n";
@args = (
'jjj.pl',
"\"$merchant\"",
$merchants_usernames{$merchant},
$merchants{$merchant},
$date_string,
"1>\"$dir\\activity.Report.$merchant.stdout2\"",
"2>\"$dir\\activity.Report.$merchant.stderr2\"",
);
system("perl @args") == 0 or warn "problem with jjj.pl: $?";

print "\nConsideration #3:\n---------------------------\n";
@args = (
'jjj.pl',
"\"$merchant\"",
"\"$merchants_usernames{$merchant}\"",
"\"$merchants{$merchant}\"",
"\"$date_string\"",
"1>\"$dir\\activity.Report.$merchant.stdout3\"",
"2>\"$dir\\activity.Report.$merchant.stderr3\"",
);
system("perl @args") == 0 or warn "problem with jjj.pl: $?";


print "Consideration #4:\n---------------------------\n";
system("perl jjj.pl \"$merchant\" \"$merchants_usernames{$merchant}\" \"$merchants{$merchant}\" \"$date_string\" 1>\"$dir\\activity.Report.$merchant.stdout4\" 2>\"$dir\\activity.Report.$merchant.stderr4\"") == 0
or warn "problem with jjj.pl: $?\n";



----------------------------------------
jjj.pl:
----------------------------------------
use strict;
use warnings;

for (@ARGV) {
print "$_\n";
}


activity.Report.Home:
-------------------------------------


activity.Report.Home Depot.stdout2:
--------------------------------------
Home Depot
H
DEPOT
home
depot
Jan
1,
2009


activity.Report.Home Depot.stdout3:
activity.Report.Home Depot.stdout4:
 

Ron Bergin

You should get out of the habit of passing unquoted "$variables" to
the command shell, even if you know they don't need quotes.

system("..\\generic_download_child.pl \"$uid\" \"$req_url\" 1>\"$uid.stdout\" 2>\"$uid.stderr\"");
(quotes added around $uid and around both redirect file names)

sln

Personally, I think this is cleaner.

system(qq/..\\generic_download_child.pl "$uid" "$req_url"
1>"$uid.stdout" 2>"$uid.stderr"/);
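
The quoting advice in this thread can also be wrapped in a small helper so that no argument or redirect target is left unquoted by accident. The sketch below is one possible way to do it, not something posted by anyone above: quote_arg and build_command are hypothetical names, the program name is assumed to contain no spaces, and arguments containing a double quote are rejected rather than escaped. It does not try to handle other cmd.exe special cases such as % expansion.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap one argument in double quotes for the
# Windows command shell. Embedded double quotes are refused outright.
sub quote_arg {
    my ($arg) = @_;
    die "cannot quote an argument containing a double quote: $arg\n"
        if $arg =~ /"/;
    return qq{"$arg"};
}

# Hypothetical helper: build the whole command string, quoting every
# argument and both redirect targets. $program is left unquoted and is
# assumed to contain no spaces.
sub build_command {
    my ($program, $args, $stdout_file, $stderr_file) = @_;
    my @parts = ($program, map { quote_arg($_) } @$args);
    push @parts, '1>' . quote_arg($stdout_file);
    push @parts, '2>' . quote_arg($stderr_file);
    return join ' ', @parts;
}

# Usage with the values from the ttt.pl test case above.
my $command = build_command(
    'perl',
    [ 'jjj.pl', 'Home Depot', 'Jan 1, 2009' ],
    'c:\\temp\\activity.Report.Home Depot.stdout',
    'c:\\temp\\activity.Report.Home Depot.stderr',
);
print "$command\n";
# The resulting string can then be passed to system($command).
```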
 
