Can I Force Perl to Bypass File Write Buffers?


Hal Vaughan

I'm using Perl 5.6.1 (and in some cases 5.8) on Linux. I've noticed that
when I'm processing files, Perl writes in blocks, so it'll process a
number of items, and instead of the file having one line at a time written
to it, it'll suddenly get a whole block written to disk at once.

Is there any way to avoid this and force Perl to write each line as I use a
"print" statement to output the line? I log (in MySQL) each item as I
finish it, so if power fails or the program is aborted, the system can pick
up right where it left off. Because of the buffers, the log is ahead of
what is written to the file, which would mean I'd lose the data between
what's written and what's logged.

Thanks!

Hal
 

Simon Taylor

Hal said:
I'm using Perl 5.6.1 (and in some cases 5.8) on Linux. I've noticed that
when I'm processing files, Perl writes in blocks, so it'll process a
number of items, and instead of the file having one line at a time written
to it, it'll suddenly get a whole block written to disk at once.

Is there any way to avoid this and force Perl to write each line as I use a
"print" statement to output the line?

You'll need to disable buffering by setting $| to non-zero.
See $| in

perldoc perlvar

and also check out

perldoc -f select

This sample should do what you want:

#!/usr/bin/perl
use strict;
use warnings;

open(OUTPUT, '>', 'sample') or die "Could not create file: $!";
my $fd = select(OUTPUT);    # make OUTPUT the currently selected handle
$| = 1;                     # turn on autoflush for it
select($fd);                # restore the previous selection
for (0..20) {
    print OUTPUT "some data...\n";
    sleep 2;
}
close OUTPUT;
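
For the same effect without the select() dance, the autoflush() method
from IO::Handle (which ships with Perl) does the selecting for you. A
minimal sketch, using a lexical filehandle:

#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;              # provides the autoflush() method

open(my $out, '>', 'sample') or die "Could not create file: $!";
$out->autoflush(1);          # flush this handle's buffer after every print

for (0..20) {
    print $out "some data...\n";
    sleep 2;
}
close $out;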


Regards,

Simon Taylor
 

Anno Siegel

Simon Taylor said:
You'll need to disable buffering by setting $| to non-zero.

[good advice snipped]

Just one note: "$| = 1" doesn't disable buffering, it enables auto-flushing.
The buffer(s) remain in place and active, but after each print statement
the buffer is automatically emptied (presumably into the next buffer down
the line). You still have character buffering (and you want it).
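
If you really do need to bypass the buffer rather than flush it,
syswrite() goes straight to the write(2) system call and skips Perl's
buffered I/O entirely. A minimal sketch of that alternative (note that
perldoc -f syswrite warns against mixing syswrite with buffered calls
like print on the same handle):

use strict;
use warnings;

open(my $out, '>', 'sample') or die "Could not create file: $!";
for my $n (0 .. 20) {
    my $line = "item $n done\n";
    # One write(2) call per line, bypassing Perl's buffer entirely.
    my $written = syswrite($out, $line);
    defined $written && $written == length($line)
        or die "Write failed: $!";
}
close $out;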

Anno
 

xhoster

Hal Vaughan said:
I'm using Perl 5.6.1 (and in some cases 5.8) on Linux. I've noticed that
when I'm processing files, Perl writes in blocks, so it'll process a
number of items, and instead of the file having one line at a time
written to it, it'll suddenly get a whole block written to disk at once.

To answer the question you asked, check out the variable $|.

To answer the question you didn't ask, your method isn't very good. If you
are truly concerned about data integrity, use a transactional database for
both the data and the log, and make sure the data write and the log write
are in the same transaction. Or make your program, upon restarting, tail
the existing data file and figure out where to pick up based solely on the
data file, and dispense with the logging altogether. Or do both--write the
data into a database, and have the entry in the database be its own log.
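
A minimal sketch of the same-transaction idea, assuming DBI with
DBD::mysql and transactional (InnoDB) tables; the table and column names
here are made up for illustration:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:database=jobs', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });

sub record_item {
    my ($item, $result) = @_;
    # The data write and the log write commit together or not at all.
    eval {
        $dbh->do('INSERT INTO results (item, result) VALUES (?, ?)',
                 undef, $item, $result);
        $dbh->do('INSERT INTO progress_log (item) VALUES (?)',
                 undef, $item);
        $dbh->commit;
    };
    if ($@) {
        $dbh->rollback;
        die "Transaction failed for item $item: $@";
    }
}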

Hal Vaughan also said:
Is there any way to avoid this and force Perl to write each line as I use
a "print" statement to output the line? I log (in MySQL) each item as I
finish it, so if power fails or the program is aborted, the system can
pick up right where it left off. Because of the buffers, the log is ahead
of what is written to the file, which would mean I'd lose the data between
what's written and what's logged.
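
And a sketch of xho's other suggestion, picking the resume point straight
out of the data file on restart. This assumes one finished item per line,
with each line starting with the item's ID (both assumptions are mine,
not Hal's):

use strict;
use warnings;

sub last_finished_item {
    my ($file) = @_;
    open(my $in, '<', $file) or return undef;   # no file yet: start fresh
    my $last;
    while (my $line = <$in>) {
        next unless $line =~ /\n\z/;            # ignore a torn final line
        if (my ($id) = $line =~ /^(\S+)/) {
            $last = $id;                        # remember the latest item ID
        }
    }
    close $in;
    return $last;                               # undef means start from scratch
}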


Xho
 

Hal Vaughan

Anno said:
Simon Taylor said:
You'll need to disable buffering by setting $| to non-zero.

[good advice snipped]

Just one note: "$| = 1" doesn't disable buffering, it enables
auto-flushing. The buffer(s) remain in place and active, but after each
print statement the buffer is automatically emptied (presumably into the
next buffer down the line). You still have character buffering (and you
want it).

Anno

I have $| = 1 set, since I had to redirect output to a file for debugging
and needed the errors to sync with the output, but it doesn't seem to make
a difference in the problem I'm talking about. You seem to be the only
person who has pointed out that this doesn't affect the buffers directly.

Hal
 

Hal Vaughan

xhoster said:
To answer the question you asked, check out the variable $|.

Thanks. I've used it and it helps with syncing output, so if I redirect
output to a file, the error messages and other output are synced, but it
doesn't seem to help here.

xhoster also said:
To answer the question you didn't ask, your method isn't very good. If
you are truly concerned about data integrity, use a transactional
database for both the data and the log, and make sure the data write and
the log write are in the same transaction. Or make your program, upon
restarting, tail the existing data file and figure out where to pick up
based solely on the data file, and dispense with the logging altogether.
Or do both--write the data into a database, and have the entry in the
database be its own log.

I seriously thought about putting the info into a database, but there were
a number of reasons I didn't. Part of it is that different programs on
different systems use this, and it works better to share the directory
through NFS than to share the database. I've also got a stream of data
coming in, and it has been working much better to save it to a capture
file. Trying to break it up into chunks so it could be put into a database
as it comes in would be a nightmare.

Thanks!

Hal
 

Ilya Zakharevich

[A complimentary Cc of this posting was sent to Hal Vaughan]

Hal Vaughan said:
I have $| = 1 set, since I had to redirect output to a file for debugging
and needed the errors to sync with the output, but it doesn't seem to make
a difference in the problem I'm talking about. You seem to be the only
person who has pointed out that this doesn't affect the buffers directly.

Remember that $| affects only the currently select()ed filehandle (the
one-argument select). Let me see... Yes, the ->autoflush() method will do
the select()ing for you...
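
In other words, these two are equivalent; a minimal sketch, with $fh
standing for whatever output handle you opened:

use strict;
use warnings;
use IO::Handle;

open(my $fh, '>', 'sample') or die "Could not open: $!";

# The manual select() dance ...
{ my $old = select($fh); $| = 1; select($old); }

# ... is what autoflush() does for you (via SelectSaver, as the source
# listing below shows):
$fh->autoflush(1);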

Hope this helps,
Ilya

P.S. It took me a while to find the source of autoflush(). This was my
best try (I don't know how to do it better in a way that would still work
if IO::Handle defined it in an XSUB; would some Emacs package help here?):

perl -MFileHandle -wdle "(my $fh = new FileHandle)->open(q[> xx]); $fh->autoflush(1)"
  n
  s
  v

(those are debugger commands: next, step, and view the source around the
current line)

IO::Handle::autoflush(i:/perllib/lib/5.8.2/os2/IO/Handle.pm:465):
464     sub autoflush {
465==>      my $old = new SelectSaver qualify($_[0], caller);
466:        my $prev = $|;
467:        $| = @_ > 1 ? $_[1] : 1;
468:        $prev;
469     }

Actually, doing

  n
  |m $fh

makes the debugger think that autoflush() *is* in the FileHandle module;
it is not. Is this some bug related to a recent change in the semantics
of exists &function vs. defined &function?
 

Tad McClellan

Hal Vaughan said:
Perl writes in blocks,
Is there any way to avoid this and force Perl to write each line as I use a
"print" statement to output the line?


Your Question is Asked Frequently:

perldoc -q buffer

How do I flush/unbuffer an output filehandle? Why must I do this?


You must have missed it when you checked the Perl FAQ before
posting to the Perl newsgroup.
 
