Can I Force Perl to Bypass File Write Buffers?


Hal Vaughan

I'm using Perl 5.6.1 (and in some cases 5.8) on Linux. I've noticed that
when I'm processing files, Perl writes in blocks, so it'll process a
number of items, and instead of the file having one line at a time written
to it, it'll suddenly get a whole block written to disk at once.

Is there any way to avoid this and force Perl to write each line as I use a
"print" statement to output the line? I log (in MySQL) each item as I
finish it, so if power fails or the program is aborted, the system can pick
up right where it left off. Because of the buffers, the log is ahead of
what is written to the file, which would mean I'd lose the data between
what's written and what's logged.

Thanks!

Hal
 

Simon Taylor

Hal said:
I'm using Perl 5.6.1 (and in some cases 5.8) on Linux. I've noticed that
when I'm processing files, Perl writes in blocks, so it'll process a
number of items, and instead of the file having one line at a time written
to it, it'll suddenly get a whole block written to disk at once.

Is there any way to avoid this and force Perl to write each line as I use a
"print" statement to output the line?

You'll need to disable buffering by setting $| to non-zero.
See $| in

perldoc perlvar

and also check out

perldoc -f select

This sample should do what you want:

#!/usr/bin/perl
use strict;
use warnings;

open(OUTPUT, '>', 'sample') or die "Could not create file: $!";
my $fd = select(OUTPUT);    # make OUTPUT the currently selected handle
$| = 1;                     # turn on autoflush for it
select($fd);                # restore the previous selection
for (0..20) {
    print OUTPUT "some data...\n";
    sleep 2;
}
close OUTPUT;
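
For the same effect without the select() dance, the autoflush() method
from IO::Handle (which ships with Perl) does the selecting for you. A
minimal sketch, using a lexical filehandle:

#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;              # provides the autoflush() method

open(my $out, '>', 'sample') or die "Could not create file: $!";
$out->autoflush(1);          # flush this handle's buffer after every print

for (0..20) {
    print $out "some data...\n";
    sleep 2;
}
close $out;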


Regards,

Simon Taylor
 

Anno Siegel

Simon Taylor said:
You'll need to disable buffering by setting $| to non-zero.

[good advice snipped]

Just one note: "$| = 1" doesn't disable buffering, it enables auto-flushing.
The buffer(s) remain in place and active, but after each print statement
the buffer is automatically emptied (presumably into the next buffer down
the line). You still have character buffering (and you want it).
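
If you really do need to bypass the buffer rather than flush it,
syswrite() goes straight to the write(2) system call and skips Perl's
buffered I/O entirely. A minimal sketch of that alternative (note that
perldoc -f syswrite warns against mixing syswrite with buffered calls
like print on the same handle):

use strict;
use warnings;

open(my $out, '>', 'sample') or die "Could not create file: $!";
for my $n (0 .. 20) {
    my $line = "item $n done\n";
    # One write(2) call per line, bypassing Perl's buffer entirely.
    my $written = syswrite($out, $line);
    defined $written && $written == length($line)
        or die "Write failed: $!";
}
close $out;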

Anno
 

xhoster

Hal Vaughan said:
I'm using Perl 5.6.1 (and in some cases 5.8) on Linux. I've noticed that
when I'm processing files, Perl writes in blocks, so it'll process a
number of items, and instead of the file having one line at a time
written to it, it'll suddenly get a whole block written to disk at once.

To answer the question you asked, check out the variable $|.

To answer the question you didn't ask, your method isn't very good. If you
are truly concerned about data integrity, use a transactional database for
both the data and the log, and make sure the data write and the log write
are in the same transaction. Or make your program, upon restarting, tail
the existing data file and figure out where to pick up based solely on the
data file, and dispense with the logging altogether. Or do both--write the
data into a database, and have the entry in the database be its own log.
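
A minimal sketch of the same-transaction idea, assuming DBI with
DBD::mysql and transactional (InnoDB) tables; the table and column names
here are made up for illustration:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=jobs', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });

sub record_item {
    my ($item, $result) = @_;
    # The data write and the log write commit together or not at all.
    eval {
        $dbh->do('INSERT INTO results (item, result) VALUES (?, ?)',
                 undef, $item, $result);
        $dbh->do('INSERT INTO progress_log (item) VALUES (?)',
                 undef, $item);
        $dbh->commit;
    };
    if ($@) {
        $dbh->rollback;
        die "Transaction failed for item $item: $@";
    }
}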

Hal Vaughan also said:
Is there any way to avoid this and force Perl to write each line as I use
a "print" statement to output the line? I log (in MySQL) each item as I
finish it, so if power fails or the program is aborted, the system can
pick up right where it left off. Because of the buffers, the log is ahead
of what is written to the file, which would mean I'd lose the data between
what's written and what's logged.
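
And a sketch of xho's other suggestion, picking the resume point straight
out of the data file on restart. This assumes one finished item per line,
with each line starting with the item's ID (both assumptions are mine,
not Hal's):

use strict;
use warnings;

sub last_finished_item {
    my ($file) = @_;
    open(my $in, '<', $file) or return undef;   # no file yet: start fresh
    my $last;
    while (my $line = <$in>) {
        next unless $line =~ /\n\z/;            # ignore a torn final line
        if (my ($id) = $line =~ /^(\S+)/) {
            $last = $id;                        # remember the latest item ID
        }
    }
    close $in;
    return $last;                               # undef means start from scratch
}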


Xho
 

Hal Vaughan

Anno said:
Simon Taylor said:
You'll need to disable buffering by setting $| to non-zero.

[good advice snipped]

Just one note: "$| = 1" doesn't disable buffering, it enables
auto-flushing. The buffer(s) remain in place and active, but after each
print statement the buffer is automatically emptied (presumably into the
next buffer down the line). You still have character buffering (and you
want it).

Anno

I have $| = 1 set, since I had to redirect output to a file for debugging
and needed the errors to sync with the output, but it doesn't seem to make
a difference in the problem I'm talking about. You seem to be the only
person who has pointed out that this doesn't affect the buffers directly.

Hal
 

Hal Vaughan

xhoster said:
To answer the question you asked, check out the variable $|.

Thanks. I've used it and it helps with syncing output, so if I redirect
output to a file, the error messages and other output are synced, but it
doesn't seem to help here.

xhoster also said:
To answer the question you didn't ask, your method isn't very good. If
you are truly concerned about data integrity, use a transactional
database for both the data and the log, and make sure the data write and
the log write are in the same transaction. Or make your program, upon
restarting, tail the existing data file and figure out where to pick up
based solely on the data file, and dispense with the logging altogether.
Or do both--write the data into a database, and have the entry in the
database be its own log.

I seriously thought about putting the info into a database, but there were
a number of reasons I didn't. Part of it is that different programs on
different systems use this, and it works better to share the directory
through NFS than to share the database. I've also got a stream of data
coming in, and it has been working much better to save it to a capture
file. Trying to break it up into chunks so it could be put into a database
as it comes in would be a nightmare.

Thanks!

Hal
 

Ilya Zakharevich

[A complimentary Cc of this posting was sent to Hal Vaughan]

Hal Vaughan said:
I have $| = 1 set, since I had to redirect output to a file for debugging
and needed the errors to sync with the output, but it doesn't seem to make
a difference in the problem I'm talking about. You seem to be the only
person who has pointed out that this doesn't affect the buffers directly.

Remember that $| affects only the currently select()ed filehandle (the
one-argument select). Let me see... Yes, the ->autoflush() method will do
the select()ing for you...
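
In other words, these two are equivalent; a minimal sketch, with $fh
standing for whatever output handle you opened:

use strict;
use warnings;
use IO::Handle;

open(my $fh, '>', 'sample') or die "Could not open: $!";

# The manual select() dance ...
{ my $old = select($fh); $| = 1; select($old); }

# ... is what autoflush() does for you (via SelectSaver, as the source
# listing below shows):
$fh->autoflush(1);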

Hope this helps,
Ilya

P.S. It took me a while to find the source of autoflush(). This was my
best try (I don't know how to do it better in a way that would still work
if IO::Handle defined it in an XSUB; would some Emacs package help here?):

perl -MFileHandle -wdle "(my $fh = new FileHandle)->open(q[> xx]); $fh->autoflush(1)"
  n
  s
  v

(those are debugger commands: next, step, and view the source around the
current line)

IO::Handle::autoflush(i:/perllib/lib/5.8.2/os2/IO/Handle.pm:465):
464     sub autoflush {
465==>      my $old = new SelectSaver qualify($_[0], caller);
466:        my $prev = $|;
467:        $| = @_ > 1 ? $_[1] : 1;
468:        $prev;
469     }

Actually, doing

  n
  |m $fh

makes the debugger think that autoflush() *is* in the FileHandle module;
it is not. Is this some bug related to a recent change in the semantics
of exists &function vs. defined &function?
 

Tad McClellan

Hal Vaughan said:
Perl writes in blocks,
Is there any way to avoid this and force Perl to write each line as I use a
"print" statement to output the line?


Your Question is Asked Frequently:

perldoc -q buffer

How do I flush/unbuffer an output filehandle? Why must I do this?


You must have missed it when you checked the Perl FAQ before
posting to the Perl newsgroup.
 
