Michele Dondi ([email protected]) wrote:
: On Thu, 16 Oct 2003 12:35:51 +0100, "Alan J. Flavell"
: >[unattributed quote - original poster was Malcolm Dew-Jones:]
: >> > My understanding is that a single `write()' of less than a certain (fairly
: >> > large) size is supposed to be atomic on unix (and presumably on linux) -
: >> > that means atomic relative to other write() calls.
: >
: >Indeed. A lot of unix-based software appears to rely on the semantics
: >of opening for append, and atomically appending a record, without
: >using locking. You can't rely on the *sequence* in which asynchronous
: >processes write their records, but they don't fragment records and
: >they don't damage file structure. There's a great deal of software
: >that would go horribly wrong if that stopped working.
: I will trust your word and your experience; however, for anyone
: interested, here are the results of a test I made (following the idea
: of another poster):
: corrupt.pl:
: #!/usr/bin/perl -l
: use strict;
: use warnings;
:
: $|++;
: unlink 'log.txt';
: open my $out1, '>>log.txt' or die $!;
: open my $out2, '>>log.txt' or die $!;
:
: my $stop=time+5;
: my $count;
: while (time<$stop) {
:     $count++;
:     print $out1 "out1: $count";
:     print $out2 "out2: $count";
: }
: __END__
: test.pl:
: #!/usr/bin/perl -ln
: use strict;
: use warnings;
:
: our $n;
: ++$n, print if !/^out[12]: \d+$/;
: END { print "$n/$." }
: __END__
: And here are the results of './test.pl log.txt' respectively on
: Linux[1] and Windows[2]:
: report.lnx.txt:
: 5189/4354160
: 4445/3845960
: 5326/4447462
: 4604/3955650
: 4186/3669472
: report.win.txt:
: 62/223608
: 102/247178
: 86/238094
: 21/195220
: 38/210018
: I don't think these data are statistically significant; however, for
: the average of the ratios of the given numbers and the corresponding
: standard deviation I find the following values:
: Linux: 1.17e-3, 2.41e-5
: Win: 2.64e-4, 1.26e-4
: Any comments?
Sure
#!/usr/bin/perl -l
# not-corrupt.pl:
use strict;
use warnings;
$|++;
my $stop=time+30;
my $count;
while (time<$stop) {
    $count++;
    print "out1: $count";
}
$ cat > log.txt
$ ./not-corrupt.pl >> log.txt & ./not-corrupt.pl >> log.txt &
./not-corrupt.pl >> log.txt & ./not-corrupt.pl >> log.txt &
same test routine as above
$ ./test.pl < log.txt
/1490113
Eyeball examination shows the outputs are intermingled (e.g. a few lines):
out1: 569
out1: 570
out1: 16
out1: 11
out1: 12
out1: 17
out1: 882
out1: 883
out1: 18
out1: 571
So: no corruption when the routines print to standard output.
Perldoc says "$| ... STDOUT will typically be line buffered if output is
to the terminal"
If stdio uses line mode buffering, then each line will be passed as a
single entity to write() and the lines will be written uncorrupted.
If you use C then you control the buffering and a C program can append to
a log with no problem.
If you use perl then you must be careful because of the buffering, over
which you have less control.
Our perl programs that log in this manner run once and write at most one
or two lines. Presumably each print is smaller than the stdio buffer, so
the entire output ends up being written with a single write(), and the
log files are uncorrupted.
However, if you have a perl program that logs many lines then the stdio
buffers will end up being flushed at (effectively) random times without
regard for line breaks, and so the logs will end up being corrupted.
But we can control that, so let's try a test where we force the data to
be flushed at the end of each line. (The best way would be to put the
stdio buffering into line mode, but I don't know how to do that.)
#!/usr/bin/perl
# flog
use warnings;
use strict;
# unlink 'log.txt';
open (my $out1, ">>log.txt") or die $!;
open (my $out2, ">>log.txt") or die $!;
my $now = time;
my $count;
until( time > $now+10 ) {
    $count++;
    print $out1 "out1: $count\n";
    print $out2 "out2: $count\n";
    select $out1; $|=1; $|=0;   # setting $| to 1 flushes the handle
    select $out2; $|=1; $|=0;   # (then turn autoflush back off)
}
close $out1;
close $out2;
$ cat > log.txt
$ ./flog & ./flog & ./flog & ./flog & ./flog &
# note minor typo in print, test.pl altered to check for two spaces
$ ./test.pl < log.txt
/582740
So, with five copies of the program running, each doing two writes per
loop and forcing a flush after both writes, the data is uncorrupted.
One more test: how many lines can we print between flushes and still be OK?
I added the following lines after the second print
if ($count % 50 == 0)
{
    select $out1; $|=1; $|=0;
    select $out2; $|=1; $|=0;
}
Examining log.txt confirms that each output stream is flushed after
every 50 lines.
$ ./test.pl < log.txt
/4110602
You'll notice that the buffering was more efficient (roughly seven times
as many lines written in the same time, 4110602 vs. 582740) and, as
before, we have still avoided corruption.
Final conclusion - you can safely log by appending as long as you make
sure the data is written in line mode. In C you can set the buffering to
do this. In perl you need to be more careful, but as long as line mode is
used then it works. If logging continually, flush the lines frequently
enough that stdio is not flushing for you at some less opportune time.
If logging only a single line, the program will flush just once at exit,
which is effectively the same as line mode buffering for small amounts
of data, so programs that write single lines and then exit should
normally be able to log safely without explicit locking.
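The conclusion above can be summed up as code. A sketch of the pattern (the log_line helper is my own illustrative naming, not from the thread):

```perl
#!/usr/bin/perl
# Sketch: the "append whole lines, flush each one, no locking" pattern.
use strict;
use warnings;
use IO::Handle;

sub log_line {
    my ($file, $msg) = @_;
    open my $fh, '>>', $file or die "open $file: $!";
    $fh->autoflush(1);       # each print becomes its own write()
    print $fh "$msg\n";      # always one complete line per print
    close $fh or die "close $file: $!";
}

log_line('pattern-demo.txt', 'out1: 1');
```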