Very Sluggish Code

GlenM · Jun 11, 2012

Hi;

I was hoping someone could help me shed some light on this.

I have a Perl script, which I will post the contents below. It seems to get 'stuck' and I have to do an actually kill of the process.

I ran it using the Perl debugger (perl -d <script.pl>) and it seems to slow right down when reading back the array.

However, it does work - it does populate the database with the XML files as it is supposed to - so, that much I know.

So, any assistance I can get. I would much appreciate the help.

++++++++++++++++++++++++++++++
#!/usr/bin/perl

#use strict;
use DBI;
use XML::XPath;
use XML::XPath::XMLParser;

# Set the input dir where all of the XML files live
my $input_dir = ".";

# Begin reading the directory
opendir(DIR, $input_dir) || die "sorry shit the bed $input_dir: $!";

# Read them into an array
my @files_in_dir = grep { /xml/ } readdir(DIR);
closedir DIR;

# connect to database and create parser object
my $dbh = DBI->connect ("DBI:mysql:can_us_ratecenters",
"xxxxx", "xxxxxxx",
{ RaiseError => 1, PrintError => 0});

# clear the database - new DIDs coming in
$dbh->do ('TRUNCATE TABLE rc_city_town');

#Now the fun begins - read each file and put it in the database
foreach my $f (@files_in_dir) {
open IN, "<$f";

my $xp = XML::XPath->new (filename => "./$f");
my $nodelist = $xp->find ("//row");
foreach my $row ($nodelist->get_nodelist ())
{
$dbh->do (
"INSERT IGNORE INTO rc_city_town (state_prov, city_town, did_number) VALUES (?,?,?)",
undef,
$row->find ("state")->string_value (),
$row->find ("ratecenter")->string_value (),
$row->find ("number")->string_value (),
);
}
close IN;
}
$dbh->disconnect ();

++++++++++++++++++++++++++++++++++++++++++++==

Dave Saville · Jun 12, 2012

Use a variable for the filehandle, as above. Also use 3-arg open, and
check the return value.

Hi Ben, that's the second time I have seen you advocate 3-arg open. I
think I now understand using a variable for the filehandle but I fail
to see the difference between "<foo" and "<", "foo".

<snip>

Rainer Weikusat · Jun 12, 2012

Dave Saville said:
[...]

Use a variable for the filehandle, as above. Also use 3-arg open, and
check the return value.

Click to expand...

Hi Ben, that's the second time I have seen you advocate 3-arg open. I
think I now understand using a variable for the filehandle but I fail
to see the difference between "<foo" and "<", "foo".

Logically, the open mode and the pathname to open are two different
things and putting them into the same string argument which either the
compiler or the runtime environment then need to take apart again by
parsing it wasn't a good idea: The 'open mode' characters Perl uses
are usually perfectly valid although practically somewhat 'rare' filename
characters as well.

Somewhat contrived example:

----------------------------
sub open_for_reading($)
{
my $fh;

open($fh, '<'.$_[0]) // die("open: $_[0]: $!");
return $fh;
}

my $fh0;

$fh0 = open_for_reading('&STDIN');
-----------------------------

This won't try to open the file named &STDIN for reading but will dup
the STDIN file handle instead. Given a situation like this one, it is
also somewhat stupid to first execute code to concatenate name and
mode and then execute code to take them apart again.

Peter J. Holzer · Jun 12, 2012

Is the process using a huge amount of memory at that point? Is the
system swapping?

Good questions.

And questions like these (also: Is the code CPU-bound or disk-bound) can
usually be answered without looking at the code, and can guide you in
the right direction.

But when optimizing some specific code, the most important question is:

Where is it spending most of its time?

You can use a profiler for this (Devel::NYTProf is quite good), but for
a first estimate a few well-placed print statements logging the elapsed
time since program start are often sufficient (and they distort
run-times much less than a profiler).

This will in principle go faster if you use ->prepare and ->execute
rather than ->do. I don't actually know whether DBD::mysql supports
server-side prepared statements, so it's possible this will make no
difference in practice.

I don't know about mysql (recent versions do support server-side
prepared statements, and DBD::mysql supports them, but I haven't run any
benchmarks), but IME (mostly with Oracle) using prepare/execute ist
noticably but not spectacularly faster. This is not surprising:
Preparing an insert is a simple and fast operation (unless you insert
into a view joining a dozen tables), so there is not much time to be
saved. Mostly you save the round-trip time to the database server.

Using array inserts OTOH can result in a massive speedup on databases
which support it. I can't find my old benchmark results (for Oracle) at
the moment, but I think I've seen a speedup of about 2 orders of
magnitude for some workloads. Unfortunately, using multi-row inserts on
MySQL ("insert into ... values(...) values(...) values(...) ...") didn't
gain much last time I tried it.

hp

Rainer Weikusat · Jun 13, 2012

Rainer Weikusat said:
[...]

Hi Ben, that's the second time I have seen you advocate 3-arg open. I
think I now understand using a variable for the filehandle but I fail
to see the difference between "<foo" and "<", "foo".

Click to expand...

Logically, the open mode and the pathname to open are two different
things and putting them into the same string argument which either the
compiler or the runtime environment then need to take apart again by
parsing it wasn't a good idea:

Additional remark: Very likely, this idea came from the UNIX(*)
(Bourne) shell which does the same thing (and mostly uses the same
'open mode' characters). But while the shell provides a very useful
programming language, being able to use it without Extreme Pain[tm]
relies on users of the system exercising 'wise restraint' when
creating named objects in the file system, ie, deliberatly avoiding
names the parser of the shell might trip over. If this cannot be
guaranteed, eg, because these users have either really no idea how this
parser works or might even be trying to trick scripts into performing
unintended operations with the help of suitable filename arguments,
every filesystem name needs to be quoted before using it and this
'quoting' is something people either don't bother to do or get wrong
(and even if they don't, that's nothing but code working around the
limitations of some parser). If this is easily possible, such as with
'3-arg open' in Perl, it is much better to avoid the issue altogether,
at least for software supposed to be generally useful/ usable.

How to make XML::XPath ignore namespaces?	0	May 21, 2013
Perl to SQLite bridge is not working, database connect fails ....	3	Apr 8, 2014
Php code issue	0	Aug 28, 2024
Need suggestions to improve code	2	Aug 24, 2013
yet another question about numbers and strings	3	Mar 15, 2012
DBI:Pg script returning 'can't call method' error	3	Aug 2, 2008
How to read big XML files using Perl - Memory Problem	4	Nov 21, 2005
problem with IO:Socket	2	Sep 24, 2009

Very Sluggish Code

GlenM

Dave Saville

Rainer Weikusat

Peter J. Holzer

Rainer Weikusat

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads