Speeding up writes to STDOUT

enjoylife_95135

So my little Perl program is reading a giant chunk of data from a MySQL
DB, processing it, and then writing it out to STDOUT.

I have the MySQL DB read down, thanks to earlier help from the kind
folks here. I don't believe I can speed up the in-memory processing; it
takes about 2.5 mins right now to process about 2M entries, and given
the nature of what I'm doing I can't ask for anything better.

But writing to STDOUT is slow, and worse, there is a lot of variation
depending on when I run it. For example, run 1 took 2m, while run 5 took 5m 8s.

I ran Devel::DProf, which told me that the offending line is a
printf() statement. I read elsewhere (via Google) that printfs tend to be
slow because of the overhead added by formatting.

Are there any tricks I could use to speed things up?

The line currently looks like:
printf "%10d %10d %20s %20s %5d %s\n",
$f,$pin,$str1,$str2,$filesize,$tstring;

Thanks everyone.
EL
 
Uri Guttman

e9> So my little Perl program is reading a giant chunk of data from a MySql
e9> DB, processing it, and then writing it out to STDOUT.

e9> I ran Devel::DProf, which informed me that the offending line is a
e9> printf() statement. I read elsewhere on google that printf's tend to be
e9> slow because of the overhead added by formatting.

i bet it isn't the formatting but the stdio overhead. but you should
look at both of them.
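One way to look at both, sketched with the core Benchmark module: time formatting alone, output alone, and the two together. The sample row is made up, and writing to the null device leaves out whatever the OP's real STDOUT (terminal, pipe, or file) adds, so read the numbers as a rough split, not a verdict.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);
use File::Spec;

# Hypothetical sample row; real values would come from the OP's database.
my @row = (42, 7, 'alpha', 'beta', 1024, '2007-01-01 12:00');
my $fmt = "%10d %10d %20s %20s %5d %s\n";

# Pre-format one line so the output-only case does no formatting at all.
my $line = sprintf $fmt, @row;

# Write to the null device so disk and terminal speed are out of the picture.
open my $null, '>', File::Spec->devnull or die "cannot open null device: $!";

# Run each case for at least 1 CPU second.
my $results = timethese(-1, {
    sprintf_only => sub { my $s = sprintf $fmt, @row },    # formatting, no I/O
    print_only   => sub { print {$null} $line },           # I/O, no formatting
    printf_both  => sub { printf {$null} $fmt, @row },     # what the OP does
});
cmpthese($results);

close $null or die "cannot close null device: $!";
```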

e9> The line currently looks like:
e9> printf "%10d %10d %20s %20s %5d %s\n",
e9> $f,$pin,$str1,$str2,$filesize,$tstring;

why do you need printf there? you don't do any serious formatting, just
some extra whitespace and maybe column alignment. is the alignment
needed?
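If the column alignment is not needed, `print` with `join` avoids printf's format handling altogether. A minimal sketch with made-up sample values in place of the OP's variables; the unpadded line is no longer column-aligned, and whether it is measurably faster on the real data is something to benchmark rather than assume.

```perl
use strict;
use warnings;

# Hypothetical sample values standing in for one row of the OP's data.
my ($f, $pin, $str1, $str2, $filesize, $tstring) =
    (42, 7, 'alpha', 'beta', 1024, '2007-01-01 12:00');

# Original style: each field padded to a fixed width.
my $padded = sprintf "%10d %10d %20s %20s %5d %s\n",
    $f, $pin, $str1, $str2, $filesize, $tstring;

# Unpadded alternative: no format string, fields separated by one space.
my $plain = join(' ', $f, $pin, $str1, $str2, $filesize, $tstring) . "\n";

print $padded;
print $plain;
```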

my main suggestion is to delay printing and do your own
buffering. build up a string using .= ops and sprintf:

my $out_text ;

then do these types of lines:

$out_text .= sprintf "%10d %10d %20s %20s %5d %s\n",
    $f, $pin, $str1, $str2, $filesize, $tstring ;

after you are done with all the sprintf calls just print that buffer.

print $out_text ;

i call this print rarely, print late.

uri
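A self-contained sketch of the buffer-then-print approach, assuming a hypothetical `@rows` array in place of the OP's processed data. It departs from the post in two ways: `$tstring` goes through a `%s` instead of being interpolated into the format string (interpolated, a value such as `100% done` would have its `% d` read as a conversion), and the buffer is handed to `print` whenever it passes an assumed 1 MB threshold, so 2M rows never sit in memory at once.

```perl
use strict;
use warnings;

# Hypothetical rows standing in for the OP's processed data;
# each row is [ $f, $pin, $str1, $str2, $filesize, $tstring ].
my @rows = map { [ $_, $_ * 2, "s$_", "t$_", $_ % 100, "100% done" ] } 1 .. 5;

my $out_text = '';
my $flush_at = 1 << 20;    # assumed threshold: write out roughly every 1 MB

for my $row (@rows) {
    # $tstring is passed through %s, so a '%' inside it is printed as-is.
    $out_text .= sprintf "%10d %10d %20s %20s %5d %s\n", @$row;

    # Hand the buffer to print in large chunks rather than once per line.
    if (length($out_text) >= $flush_at) {
        print $out_text;
        $out_text = '';
    }
}
print $out_text;    # whatever is left after the last chunk
```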
 
John Bokma

e9> So my little Perl program is reading a giant chunk of data from a MySQL
e9> DB, processing it, and then writing it out to STDOUT.
e9>
e9> Wondering if there were some tricks I could use to speed things up?

Maybe let the db engine do the formatting for you?

Also, no idea what processing is taking place, but I prefer to let the db
engine do as much as possible.
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to Uri Guttman], who wrote:

UG> my main suggestion is to delay printing and do your own
UG> buffering. build up a string using .= ops and sprintf:
UG>
UG> after you are done with all the sprintf calls just print that buffer.
UG>
UG> i call this print rarely, print late.

The only thought which comes to me is: what were you smoking? Do you
think that buffering done in Perl is going to be better than buffering
done in C?

(I do not think your suggestion is necessarily bullshit ;-); given
what kind of lunacy is the current implementation of PerlIO, one
should not automatically reject such an idea; but I would say the chance
of PerlIO being SO broken is not high.) Anyway, the OP did not
specify enough info; at least, one could try setting binmode() (if
appropriate).

Another idea is to dramatically expand the PerlIO buffer size of
STDOUT. Moreover, there may be some glitch related to the FH being
STDOUT; does the problem happen with other filehandles? Then the OP
may want to try opening their FH to a file in Perl...

Hope this helps,
Ilya
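The two cheap experiments suggested in that post, sketched: switch STDOUT to binary mode, and time the same lines written to a file opened directly in Perl, to see whether the slowdown follows STDOUT. The temp file and the 100_000-row loop are stand-ins; compare the reported time with the same loop writing to the OP's real STDOUT. On Unix-like systems the default layers do no newline translation, so `binmode` alone may change little there.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Experiment 1: drop any translation layers on STDOUT.
binmode STDOUT or die "binmode STDOUT failed: $!";

# Experiment 2: write to a file opened in Perl instead of STDOUT.
# A temporary file stands in for a real output path.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh, ':raw' or die "binmode failed: $!";

my @row = (42, 7, 'alpha', 'beta', 1024, '2007-01-01 12:00');    # hypothetical data
my $t0  = time;
printf {$fh} "%10d %10d %20s %20s %5d %s\n", @row for 1 .. 100_000;
close $fh or die "close failed: $!";    # close flushes the PerlIO buffer
my $elapsed = time - $t0;

printf STDERR "wrote %d bytes to a plain file in %.3f s\n", -s $path, $elapsed;
```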
 
Uri Guttman

IZ> The only thought which comes to me is: what were you smoking? Do you
IZ> think that buffering done in Perl is going to be better than buffering
IZ> done in C?

wanna buy some? :)

IZ> (I do not think your suggestion is necessarily bullshit ;-); given
IZ> what kind of lunacy is the current implementation of PerlIO, one
IZ> should not automatically reject such idea; but I would say the chance
IZ> of PerlIO being SO broken is not high.) Anyway, the OP did not
IZ> specify enough info; at least, one could try setting binmode() (if
IZ> appropriate).

you are missing my whole point (besides the fact that my answer will
usually be faster). print makes all sorts of stdio and possibly
system calls each time you call it. .= will only do simple buffer
management (yes, it can trigger reallocs) in general. it doesn't have to
do all the overhead that print does on each call and it too is written
in c so that point is moot.

and the other side of print rarely, print late is to delay printing
until you get back to code that can actually decide where to print
things. this allows simpler ways to print in multiple places, sockets,
logs or nowhere at all. doing prints inside lower logic means you have
to change all that code to handle print decisions. and printing to a
tied file handle is a kludge that doesn't solve all the problems
either. it can't handle the case of printing and then also passing the
results to another sub. best to simply collect all output text yourself
and then at the top level decide what to do.
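A small sketch of that structure, with a hypothetical `format_rows` helper: the lower-level code only returns text, and the top level decides where it goes (here STDOUT plus an in-memory filehandle standing in for a log).

```perl
use strict;
use warnings;

# Lower-level code only builds text; it never decides where output goes.
# format_rows() is a hypothetical helper name.
sub format_rows {
    my @rows = @_;
    my $text = '';
    $text .= sprintf "%10d %10d %20s %20s %5d %s\n", @$_ for @rows;
    return $text;
}

my @sample = ([1, 2, 'a', 'b', 3, 'x'], [4, 5, 'c', 'd', 6, 'y']);    # sample data
my $report = format_rows(@sample);

# The top level picks the destination(s): STDOUT and an in-memory "log".
my $log = '';
open my $log_fh, '>', \$log or die "cannot open in-memory handle: $!";
print STDOUT $report;
print {$log_fh} $report;
close $log_fh or die "cannot close in-memory handle: $!";
# The same $report could instead go to a socket, a file, or nowhere.
```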

and of course this is not a fixed rule, it is a way to speed up text
output and to give more control over how and where it gets printed.

IZ> Another idea is to dramatically expand the PerlIO buffer size of
IZ> STDOUT. Moreover, there may be some glitch related to the FH being
IZ> STDOUT; does the problem happen with other filehandles? Then the OP
IZ> may want to try opening their FH to a file in Perl...

why do that manually when scalar vars will grow on demand. .= is such an
easy thing to use.

uri
 
enjoylife_95135

Thank you.

One of the things I noticed while running an strace was that my program
would printf those lines (buffered of course) and after writing
everything, would pause for about 2 minutes before exiting. My program
does no processing between the printfs (which are in a loop) and the
end of the program.

Is that Perl's garbage collection? If so is there a way to
avoid/optimize this?

I will try the string approach, thanks.

EL
 
xhoster

e9> One of the things I noticed while running an strace was that my program
e9> would printf those lines (buffered of course) and after writing
e9> everything, would pause for about 2 minutes before exiting. My program
e9> does no processing between the printfs (which are in a loop) and the
e9> end of the program.
e9>
e9> Is that Perl's garbage collection? If so is there a way to
e9> avoid/optimize this?

It probably is garbage collection. You can avoid it by using POSIX::_exit,
but if you do then you need to close all your writing file-handles first,
or they won't get flushed.

Xho
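Roughly what that looks like, as a sketch. `finish_fast` is a hypothetical helper: it closes the handles passed to it, plus STDOUT, so their buffers are flushed, then calls `POSIX::_exit`, which ends the process at once without running END blocks, object destructors, or global destruction. Anything that depends on those (File::Temp cleanup, DBI's disconnect on destruction) is skipped too.

```perl
use strict;
use warnings;
use POSIX ();

# finish_fast() is a hypothetical helper: flush and close every output
# handle yourself, then leave via POSIX::_exit to skip global destruction.
sub finish_fast {
    my ($status, @handles) = @_;
    for my $fh (@handles, \*STDOUT) {
        close $fh or die "close failed: $!";    # close flushes pending output
    }
    POSIX::_exit($status);    # no END blocks, no DESTROY, no buffer flushing
}

# Typical use as the last statement of the script:
#   finish_fast(0, $out_fh);
```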
 
Ilya Zakharevich

[A complimentary Cc of this posting was sent to Uri Guttman], who wrote:
IZ> (I do not think your suggestion is necessarily bullshit ;-); given
IZ> what kind of lunacy is the current implementation of PerlIO, one
IZ> should not automatically reject such idea; but I would say the chance
IZ> of PerlIO being SO broken is not high.) Anyway, the OP did not
IZ> specify enough info; at least, one could try setting binmode() (if
IZ> appropriate).

UG> you are missing my whole point (besides the fact that my answer will
UG> usually be faster). print makes all sorts of stdio and possibly
UG> system calls each time you call it.

What year do you think it is now? On modern systems, print will make
no system calls (or stdio calls) until the FH buffers are full
(unless one sets autoflush).

UG> .= will only do simple buffer
UG> management (yes, it can trigger reallocs) in general. it doesn't have to
UG> do all the overhead that print does on each call

Same overhead.

UG> and it too is written in c so that point is moot.

No, it is not: you forget about opcode dispatch.

UG> and the other side of print rarely, print late is to delay printing
UG> until you get back to code that can actually decide where to print
UG> things. this allows simpler ways to print in multiple places, sockets,
UG> logs or nowhere at all. doing prints inside lower logic means you have
UG> to change all that code to handle print decisions. and printing to a
UG> tied file handle is a kludge that doesn't solve all the problems
UG> either.

Sorry. Can't parse this.

IZ> Another idea is to dramatically expand the PerlIO buffer size of
IZ> STDOUT. Moreover, there may be some glitch related to the FH being
IZ> STDOUT; does the problem happen with other filehandles? Then the OP
IZ> may want to try opening their FH to a file in Perl...

UG> why do that manually when scalar vars will grow on demand. .= is such an
UG> easy thing to use.

Compare with C: Why use stdio, and not invent your own buffering
library for each program of yours? Witness the PerlIO disaster, when
one (an extremely competent C programmer) tried to go the latter road...

Hope this helps,
Ilya
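Neither side of this exchange posts numbers, so here is a sketch that measures three cases on the same data: one printf per row relying on PerlIO's buffer, one `.=` buffer printed once, and one printf per row with autoflush on (the case where print really does issue a write each time). The row count and temp files are assumptions; results will depend on the perl version, the platform, and where the output goes.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Temp qw(tempfile);
use IO::Handle;    # provides the autoflush() method on older perls

# Hypothetical data: 10_000 copies of one row instead of the OP's 2M rows.
my @rows = ([42, 7, 'alpha', 'beta', 1024, '2007-01-01 12:00']) x 10_000;
my $fmt  = "%10d %10d %20s %20s %5d %s\n";

my ($fh_line, $path_line) = tempfile(UNLINK => 1);
my ($fh_buf,  $path_buf)  = tempfile(UNLINK => 1);
my ($fh_auto, $path_auto) = tempfile(UNLINK => 1);
$fh_auto->autoflush(1);    # forces a write per printf, defeating buffering

# Rewind and empty a handle so every benchmark pass writes a fresh file.
sub reset_fh {
    my ($fh) = @_;
    seek $fh, 0, 0 or die "seek: $!";    # seek also flushes the write buffer
    truncate $fh, 0 or die "truncate: $!";
}

# One printf per row; PerlIO collects the lines in its own buffer.
sub per_line {
    my ($fh) = @_;
    reset_fh($fh);
    printf {$fh} $fmt, @$_ for @rows;
}

# Build the whole output in a scalar, then print it once.
sub buffered {
    my ($fh) = @_;
    reset_fh($fh);
    my $out = '';
    $out .= sprintf $fmt, @$_ for @rows;
    print {$fh} $out;
}

# Run each case for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    per_line       => sub { per_line($fh_line) },
    buffered       => sub { buffered($fh_buf) },
    per_line_flush => sub { per_line($fh_auto) },
});

# Flush and close so the files can be compared afterwards.
close $_ or die "close: $!" for $fh_line, $fh_buf, $fh_auto;
```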
 
