Web programming: issues with large amounts of data

Discussion in 'Perl Misc' started by Ted Byers, Dec 3, 2008.

  1. Ted Byers

    Ted Byers Guest

    Running on Windows XP and on Windows Server 2003, Activestate Perl
    5.8.8 and 5.10.0.

    In the script that is giving me troubles, I am using LWP::RobotUA,
    LWP::UserAgent, HTTP::Request, HTTP::Request::Common, and
    HTTP::Response. However, I have seen similar issues with
    Finance::QuoteHist::Yahoo.

    The problem manifests itself as the download terminating and the
    script appearing to freeze. This only happens when the download is
    large (many megabytes in size). I have a data feed that I can only
    access using POST to a given URI (with query parameters specifying what
    data I am trying to retrieve). This never results in an error.
    The script just freezes and the download ends. So far, I have been
    able to work around this by modifying my scripts to break the download
    into smaller pieces to be handled in a child process. With the one
    script, each download is only a few kbytes in size, but there are over
    9000 downloads of about the same size, and the total amount of data
    across all downloads appears to be the key. Doing each of those 9000
    downloads in its own child process results in a happy, successfully
    completed job. Doing them in a loop in a single process results in
    unhappy failure.

    Does anyone know what I can look at to make these download scripts
    either finish successfully or give me a message saying there's too
    much data for the script to handle? More importantly, what is the
    most likely cause of this misbehaviour and how can it be fixed?

    NB: The scripts I'm using work flawlessly when I use parameters that
    are guaranteed to restrict the total amount of data to be handled by
    the script to a few dozen kbytes, and they diligently check for the
    problems I know about.

    Thanks

    Ted
    Ted Byers, Dec 3, 2008
    #1

  2. Ted Byers

    Tim Greer Guest

    Ted Byers wrote:

    > The problem manifests itself as the download terminating and the
    > script appearing to freeze.


    There can be a number of reasons for this. Please post the relevant
    portions of the (Perl) code. This might not be a perl code issue.
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
    Tim Greer, Dec 4, 2008
    #2

  3. Ted Byers

    Ted Byers Guest

    On Dec 4, 12:09 am, Tim Greer <> wrote:
    > Ted Byers wrote:
    > > The problem manifests itself as the download terminating and the
    > > script appearing to freeze.

    >
    > There can be a number of reasons for this.  Please post the relevant
    > portions of the (Perl) code.  This might not be a perl code issue.
    > --
    > Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    > Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    > and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
    > Industry's most experienced staff! -- Web Hosting With Muscle!


    Hi Tim,

    Thanks.

    Here is a simple program that shows this problem:

    use strict;
    use XML::Twig;
    use DBI;
    use IO::File;
    use POSIX qw(strftime);
    use Text::ParseWords;
    use LWP::RobotUA;
    use LWP::UserAgent;
    use HTTP::Request;
    use HTTP::Request::Common;
    use HTTP::Response;
    use Date::Manip;

    Date_Init("TZ=EST5EDT");

    my $affiliate;
    my $merchant;
    my $uid;
    my $pwd;
    my $opfile;

    my $db='test';
    my $hostname = 'localhost';
    my $port = '3306';
    my $user = 'xxxxxxx';
    my $dbpwd = 'yyyyyyyyyyy';

    my $dbh = DBI->connect("DBI:mysql:database=$db;host=$hostname",
    $user, $dbpwd, {RaiseError => 1});

    my $uri = "https://our.datafeed.site/query.php";
    my $req_url;
    #my $ua = LWP::UserAgent->new;

    my $now_string = strftime "%a %b %d %Y Generic", localtime;
    mkdir "$now_string";
    chdir "$now_string";

    my $start_date = $ARGV[0];
    my $end_date = $ARGV[1];

    my $query = "SELECT merchant_name, uid, pwd FROM merchants WHERE
    is_active = 1";

    my $sth = $dbh->prepare($query) or die $dbh->errstr;
    $sth->execute();
    while (my $ref = $sth->fetchrow_hashref()) {
        $merchant = $ref->{'merchant_name'};
        $uid = $ref->{'uid'};
        $pwd = $ref->{'pwd'};
        $req_url = "$uri?username=$uid&password=$pwd&start_date=$start_date&end_date=$end_date";
        STDOUT->print($req_url);STDOUT->print("\n\n");
        $opfile = "$uid.xml";
        STDOUT->print($opfile);STDOUT->print("\n");
        system("..\\generic_download_child.pl $uid \"$req_url\" 1>$uid.stdout 2>$uid.stderr");
        # open(MYOUTPUT,"> $opfile");
        # my $response = $ua->request(POST "$uri", ['username' => "$uid", 'password' => "$pwd",'start_date' => "$start_date",'end_date' => "$end_date"]);
        # if ($response->is_success) {
        #     print MYOUTPUT $response->content;
        #     print $response->content;
        # } else {
        #     print STDERR $response->status_line, "\n";
        # }
    }
    chdir "..";


    Note, the script works for short time periods and a handful of
    merchants if the presently commented-out code is uncommented and the
    call to generic_download_child.pl is commented out. But the only way
    I could make this work for one week's worth of data was to move the
    code that is commented out here into generic_download_child.pl. The
    script shown here, when used as a driver or master for
    generic_download_child.pl, works fine for one week's worth of data,
    but some merchants have so much data that even
    generic_download_child.pl freezes if I try to get two weeks' worth of
    data at a time.

    What would you recommend I do to prevent this script from freezing
    while still allowing retrieval of larger chunks of data?

    In general, what would you add to this sort of script to detect any
    kind of failure and take appropriate remedial action should a problem
    arise? Sometimes the company that is hosting our machine has problems
    with their ISP that prevent our machine from connecting to anything,
    and there are occasional power issues, so we are wrestling with
    finding ways of reliably and automatically detecting problems and
    redoing downloads that have been interrupted. What would you recommend
    as best practice? This is a bit new to me, as my strengths in software
    engineering are more in the realm of high performance number crunching
    and mathematical/statistical analysis (I recently completed a robust
    empirical model of operational risk applicable to the services we
    provide to our clients and used to manage this risk so that no one
    client's problems can seriously damage us).

    Thanks

    Ted
    Ted Byers, Dec 4, 2008
    #3
  4. Ted Byers <> wrote:


    > mkdir "$now_string";



    You should test to see if you actually got what you asked for.

    See also:

    perldoc -q vars

    What's wrong with always quoting "$vars"?

    Then make the line above something like:

    mkdir $now_string or die "could not create '$now_string' directory $!";


    > In general, what would you add to this sort of script to detect any
    > kind of failure and take appropriate remedial action should a problem
    > arise?



    eval BLOCK

    perldoc -f eval
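
A minimal sketch of how eval BLOCK could wrap one download so that a failure is trapped and retried instead of ending the whole run. The helper name with_retries, the retry count, and the pause are assumptions for illustration, not something from the thread; the anonymous sub stands in for a real download.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): run $code up to $max_tries
# times, trapping die() with eval BLOCK and pausing between attempts.
sub with_retries {
    my ($code, $max_tries, $pause_seconds) = @_;
    my $last_error = '';
    for my $attempt (1 .. $max_tries) {
        my $result;
        # The eval yields 1 only if $code ran to completion without dying.
        my $ok = eval { $result = $code->($attempt); 1 };
        return $result if $ok;
        $last_error = $@ || 'unknown error';
        warn "attempt $attempt of $max_tries failed: $last_error";
        sleep $pause_seconds if $attempt < $max_tries;
    }
    die "giving up after $max_tries attempts: $last_error";
}

# Usage: the block simulates a download that fails twice, then succeeds.
my $calls = 0;
my $data = with_retries(sub {
    $calls++;
    die "simulated network failure\n" if $calls < 3;
    return "payload";
}, 5, 0);
print "got '$data' after $calls calls\n";
```

Note that eval only traps die; it does nothing about a call that hangs, so a freeze would still need a timeout of some kind.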


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Dec 4, 2008
    #4
  5. Ted Byers

    Ted Zlatanov Guest

    On Wed, 3 Dec 2008 15:19:29 -0800 (PST) Ted Byers <> wrote:

    TB> The problem manifests itself as the download terminating and the
    TB> script appearing to freeze. This only happens when the download is
    TB> large (many megabytes in size). i have a data feed that I can only
    TB> access using POST to a given URI (with query parameters specifyin what
    TB> data I am trying to retrieve. Never does this result in an error.
    TB> The script just freezes and the download ends.

    Try using `curl' from the command line to do the same large download.
    Does it also freeze (or rather, hang waiting)? If so, the problem is
    not in the Perl code.

    If `curl' does not hang, you can trace the HTTP exchange that `curl'
    does and compare it with the one that LWP does. That will tell you what
    is different between the two, and perhaps point you to the solution.
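
If the comparison points back at the Perl side, one thing that may be worth ruling out is memory use: $response->content holds the whole body in memory, whereas LWP can write the body to a file as it arrives. The following is a sketch under that assumption, not a diagnosis. The helper fetch_to_file and the 60-second timeout are hypothetical choices; the usage comment reuses the placeholder URI and form fields from the script posted earlier in the thread.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(GET POST);

# Hypothetical helper (not from the thread): send $request and stream the
# response body straight to $path instead of holding it in memory.
# Returns (1, status line) on success or (0, reason) on failure.
sub fetch_to_file {
    my ($ua, $request, $path) = @_;

    # A file name as the second argument makes LWP write the body to disk.
    my $response = $ua->request($request, $path);

    return (0, $response->status_line) unless $response->is_success;

    # LWP records a mid-transfer abort in these headers.
    if (my $died = $response->header('X-Died')) {
        return (0, "transfer aborted: $died");
    }
    if ($response->header('Client-Aborted')) {
        return (0, 'transfer aborted by client');
    }

    # If the server announced a length, a different file size means the
    # body did not arrive intact.
    my $expected = $response->header('Content-Length');
    my $actual   = -s $path;
    $actual = 0 unless defined $actual;
    if (defined $expected && $expected =~ /^\d+$/ && $actual != $expected) {
        return (0, "truncated: got $actual of $expected bytes");
    }
    return (1, $response->status_line);
}

my $ua = LWP::UserAgent->new;
$ua->timeout(60);   # abort after 60 seconds without network activity

# Usage sketch (needs the real feed, so it is left commented out):
# my ($ok, $why) = fetch_to_file($ua,
#     POST($uri, [ username => $uid, password => $pwd,
#                  start_date => $start_date, end_date => $end_date ]),
#     "$uid.xml");
# warn "download for $uid failed: $why\n" unless $ok;
```

Returning a reason string rather than dying leaves it to the caller to decide whether to retry, skip the merchant, or stop.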

    Ted
    Ted Zlatanov, Dec 4, 2008
    #5
  6. Ted Byers

    Ted Byers Guest

    On Dec 4, 11:41 am, Ted Zlatanov <> wrote:
    > On Wed, 3 Dec 2008 15:19:29 -0800 (PST) Ted Byers <>wrote:
    >
    > TB> The problem manifests itself as the download terminating and the
    > TB> script appearing to freeze.  This only happens when the download is
    > TB> large (many megabytes in size).  i have a data feed that I can only
    > TB> access using POST to a given URI (with query parameters specifyin what
    > TB> data I am trying to retrieve.  Never does this result in an error.
    > TB> The script just freezes and the download ends.
    >
    > Try using `curl' from the command line to do the same large download.
    > Does it also freeze (or rather, hang waiting)?  If so, the problem is
    > not in the Perl code.
    >
    > If `curl' does not hang, you can trace the HTTP exchange that `curl'
    > does and compare it with the one that LWP does.  That will tell you what
    > is different between the two, and perhaps point you to the solution.
    >
    > Ted


    Hi Ted,

    Thanks.

    Where is 'curl' to be found?

    Thanks

    Ted
    Ted Byers, Dec 4, 2008
    #6
  7. Ted Byers <> wrote:


    > Where is 'curl' to be found?



    type

    curl

    into the little dialog box at http://www.google.com.


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
    Tad J McClellan, Dec 4, 2008
    #7
  8. Ted Byers

    Guest

    On Wed, 3 Dec 2008 23:33:43 -0800 (PST), Ted Byers <> wrote:

    >On Dec 4, 12:09 am, Tim Greer <> wrote:
    >> Ted Byers wrote:
    >> > The problem manifests itself as the download terminating and the
    >> > script appearing to freeze.

    >>
    >> There can be a number of reasons for this.  Please post the relevant
    >> portions of the (Perl) code.  This might not be a perl code issue.
    >> --
    >> Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    >> Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    >> and Custom Hosting.  24/7 support, 30 day guarantee, secure servers.
    >> Industry's most experienced staff! -- Web Hosting With Muscle!

    >
    >Hi Tim,
    >
    >Thanks.
    >
    >Here is a simple program that shows this problem:
    >
    >use strict;
    >use XML::Twig;
    >use DBI;
    >use IO::File;
    >use POSIX qw(strftime);
    >use Text::ParseWords;
    >use LWP::RobotUA;
    >use LWP::UserAgent;
    >use HTTP::Request;
    >use HTTP::Request::Common;
    >use HTTP::Response;
    >use Date::Manip;
    >
    >Date_Init("TZ=EST5EDT");
    >
    >my $affiliate;
    >my $merchant;
    >my $uid;
    >my $pwd;
    >my $opfile;
    >
    >my $db='test';
    >my $hostname = 'localhost';
    >my $port = '3306';
    >my $user = 'xxxxxxx';
    >my $dbpwd = 'yyyyyyyyyyy';
    >
    >my $dbh = DBI->connect("DBI:mysql:database=$db;host=$hostname",
    > $user, $dbpwd, {RaiseError => 1});
    >
    >my $uri = "https://our.datafeed.site/query.php";
    >my $req_url;
    >#my $ua = LWP::UserAgent->new;
    >
    >my $now_string = strftime "%a %b %d %Y Generic", localtime;
    >mkdir "$now_string";
    >chdir "$now_string";
    >
    >my $start_date = $ARGV[0];
    >my $end_date = $ARGV[1];
    >
    >my $query = "SELECT merchant_name, uid, pwd FROM merchants WHERE
    >is_active = 1";
    >
    >my $sth = $dbh->prepare($query) or die $dbh->errstr;
    >$sth->execute();
    >while (my $ref = $sth->fetchrow_hashref()) {
    > $merchant = $ref->{'merchant_name'};
    > $uid = $ref->{'uid'};
    > $pwd = $ref->{'pwd'};
    > $req_url = "$uri?username=$uid&password=$pwd&start_date=
    >$start_date&end_date=$end_date";
    > STDOUT->print($req_url);STDOUT->print("\n\n");
    > $opfile = "$uid.xml";
    > STDOUT->print($opfile);STDOUT->print("\n");


    > system("..\\generic_download_child.pl $uid \"$req_url\" 1>
    >$uid.stdout 2>$uid.stderr");


    You should get out of the habit of passing unquoted "$variables" to
    the command shell, even if you know they don't need quotes.

    system("..\\generic_download_child.pl \"$uid\" \"$req_url\" 1>\"$uid.stdout\" 2>\"$uid.stderr\"");
    ^^ ^^ ^^ ^^ ^^ ^^ ^^

    sln
    , Dec 4, 2008
    #8
  9. Ted Byers

    Guest

    On Thu, 04 Dec 2008 20:14:14 GMT, wrote:

    [snip]
    >> system("..\\generic_download_child.pl $uid \"$req_url\" 1>
    >>$uid.stdout 2>$uid.stderr");

    >
    >You should get out of the habit of passing unquoted "$variables" to
    >the command shell, even if you know they don't need quotes.
    >
    > system("..\\generic_download_child.pl \"$uid\" \"$req_url\" 1>\"$uid.stdout\" 2>\"$uid.stderr\"");
    > ^^ ^^ ^^ ^^ ^^ ^^ ^^
    >
    >sln


    I'm not saying this is your problem, but some other guy was trying to do
    the same thing, not quoting $variables passed to the command shell.

    Below is his problem and the fix. Just something to watch out for.

    sln

    ________________________________________________________
    Here is the simple loop with the problem:

    print "\n\nProcessing merchant data: \n";
    foreach $merchant(sort keys %merchants) {
        print "\tmerchant name: $merchant\n\tmid: $merchants{$merchant}\n\tAPI user name: $merchants_usernames{$merchant}\n\n";
        system("activity.report.1.pl \"$merchant\" $merchants_usernames{$merchant} $merchants{$merchant} $date_string 1>$dir\\activity.Report.$merchant.stdout 2>$dir\\activity.Report.$merchant.stderr") == 0
            or warn "Problem creating activity report for $merchant\n";
    }

    I am using "use strict" and $| = 1, from the top of the script.

    The hash is fully populated and there are no null values in any
    variable in the code shown.

    The problem is that for 90% of the merchants, the script
    activity.report.1.pl executes fine, producing the expected files
    (those with the contents of stdout and stderr, as well as the desired
    PDF file). For the rest, we get the merchant name, and the values
    from the two hashes (that contain values to be provided as arguments
    for the activity report script), followed by:

    The process cannot access the file because it is being used by another
    process.
    Problem creating activity report for xxxxxxxxxxx xxxxxxx

    The second line shown here is obviously from the warn clause, but where
    is the message "The process cannot access the file because it is being
    used by another process." coming from? None of the information that
    is written to stdout by the activity report script appears (e.g. the
    first statement is to write to standard out the values of the
    arguments), and nothing appears to be written to stderr either. The
    files that ought to contain what is supposed to be written to stdout
    and stderr don't even get created, hence my guess that the problem is
    with the call to system.

    Having previously printed out the contents of the hashes, I know the
    correct arguments are being passed to "activity.report.1.pl" (from
    comparing what was printed to the contents of the database table from
    which the values are obtained, and what the activity report script
    prints to standard out for the arguments it has received), and
    activity.report.1.pl always runs to completion successfully when I
    invoke it manually with all the same arguments that this script uses,
    so I am at a loss. Why is my call to system failing (or how can I
    find out), and what can be done to fix it? And why would it fail for
    only 10% of the merchants?

    Thanks

    Ted
    ==========================================================================



    I posted an arglist with each argument quoted. That is the correct fix,
    with one small change: I didn't see the redirect parameters.

    The output below shows the real problem(s) in Considerations #1 and #2
    and the fix in Considerations #3 and #4.

    There is only one problem behind the 'used by another process' message
    the OP gets.


    sln
    ============================================================================

    Command line:
    ____________________________
    c:\temp>perl ttt.pl

    Consideration #1:
    ---------------------------
    The process cannot access the file because it is being used by another process.
    problem with jjj.pl: 256

    Consideration #2:
    ---------------------------

    Consideration #3:
    ---------------------------
    Consideration #4:
    ---------------------------

    c:\temp>

    ____________________________________________________________

    -----------------------------------------
    ttt.pl:
    -----------------------------------------
    use strict;
    use warnings;

    my @args;
    my $dir = "c:\\temp";
    my $merchant = "Home Depot";

    my %merchants = ($merchant => 'home depot');
    my %merchants_usernames = ($merchant => 'H DEPOT');
    my $date_string = "Jan 1, 2009";

    print "Consideration #1:\n---------------------------\n";
    system("perl jjj.pl \"$merchant\" $merchants_usernames{$merchant} $merchants{$merchant} $date_string 1>$dir\\activity.Report.$merchant.stdout1 2>$dir\\activity.Report.$merchant.stderr1") == 0
        or warn "problem with jjj.pl: $?\n";

    print "\nConsideration #2:\n---------------------------\n";
    @args = (
        'jjj.pl',
        "\"$merchant\"",
        $merchants_usernames{$merchant},
        $merchants{$merchant},
        $date_string,
        "1>\"$dir\\activity.Report.$merchant.stdout2\"",
        "2>\"$dir\\activity.Report.$merchant.stderr2\"",
    );
    system("perl @args") == 0 or warn "problem with jjj.pl: $?";

    print "\nConsideration #3:\n---------------------------\n";
    @args = (
        'jjj.pl',
        "\"$merchant\"",
        "\"$merchants_usernames{$merchant}\"",
        "\"$merchants{$merchant}\"",
        "\"$date_string\"",
        "1>\"$dir\\activity.Report.$merchant.stdout3\"",
        "2>\"$dir\\activity.Report.$merchant.stderr3\"",
    );
    system("perl @args") == 0 or warn "problem with jjj.pl: $?";


    print "Consideration #4:\n---------------------------\n";
    system("perl jjj.pl \"$merchant\" \"$merchants_usernames{$merchant}\" \"$merchants{$merchant}\" \"$date_string\" 1>\"$dir\\activity.Report.$merchant.stdout4\" 2>\"$dir\\activity.Report.$merchant.stderr4\"") == 0
        or warn "problem with jjj.pl: $?\n";



    ----------------------------------------
    jjj.pl:
    ----------------------------------------
    use strict;
    use warnings;

    for (@ARGV) {
        print "$_\n";
    }


    activity.Report.Home:
    -------------------------------------


    activity.Report.Home Depot.stdout2:
    --------------------------------------
    Home Depot
    H
    DEPOT
    home
    depot
    Jan
    1,
    2009


    activity.Report.Home Depot.stdout3:
    activity.Report.Home Depot.stdout4:
    ---------------------------------------
    Home Depot
    H DEPOT
    home depot
    Jan 1, 2009
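
An alternative to quoting every argument by hand is the list form of system, which passes each list element to the child as one argument without going through the command shell. Redirection is a shell feature, so the parent has to point STDOUT and STDERR at the files itself before the call. The following is a sketch of that idea; run_redirected is a hypothetical helper, and the child here is a one-line Perl program standing in for jjj.pl. On Windows, Perl still has to assemble a command line for the child, so arguments containing double quotes may need extra care.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): run @command without a shell,
# sending the child's STDOUT and STDERR to the named files.
sub run_redirected {
    my ($stdout_file, $stderr_file, @command) = @_;

    # Save the parent's handles so they can be restored afterwards.
    open(my $saved_out, '>&', \*STDOUT) or die "cannot dup STDOUT: $!";
    open(my $saved_err, '>&', \*STDERR) or die "cannot dup STDERR: $!";

    open(STDOUT, '>', $stdout_file) or die "cannot write '$stdout_file': $!";
    open(STDERR, '>', $stderr_file) or die "cannot write '$stderr_file': $!";

    # List form: each element reaches the child as a single argument,
    # embedded spaces included.
    my $status = system(@command);

    open(STDOUT, '>&', $saved_out) or die "cannot restore STDOUT: $!";
    open(STDERR, '>&', $saved_err) or die "cannot restore STDERR: $!";
    return $status;
}

# Usage: the same kind of arguments as in the test above, spaces and all.
my $dir         = tempdir(CLEANUP => 1);
my $merchant    = 'Home Depot';
my $stdout_file = "$dir/activity.Report.$merchant.stdout";
my $stderr_file = "$dir/activity.Report.$merchant.stderr";

# $^X is the running perl; the child prints each argument on its own line.
my $status = run_redirected($stdout_file, $stderr_file,
    $^X, '-le', 'print for @ARGV', $merchant, 'H DEPOT', 'Jan 1, 2009');
warn "child exited with status $status\n" if $status != 0;
```

Because the file names are opened by Perl rather than parsed by the shell, a space in the merchant name does not split the redirect target.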
    , Dec 4, 2008
    #9
  10. Ted Byers

    Ron Bergin Guest

    On Dec 4, 12:14 pm, wrote:
    >
    > You should get out of the habit of passing unquoted "$variables" to
    > the command shell, even if you know they don't need quotes.
    >
    > system("..\\generic_download_child.pl \"$uid\" \"$req_url\" 1>\"$uid.stdout\" 2>\"$uid.stderr\"");
    > ^^ ^^ ^^ ^^ ^^ ^^ ^^
    >
    > sln


    Personally, I think this is cleaner.

    system(qq/..\\generic_download_child.pl "$uid" "$req_url"
    1>"$uid.stdout" 2>"$uid.stderr"/);
    Ron Bergin, Dec 5, 2008
    #10