Is flock Needed for Logs?

Public Interest

I want to run a banner exchange (a click-through exchange). So far, my script
works fine in testing. I am not using flock on the file now. I am worried
about what happens if 2 people hit the same CGI at the same time. I think 2 CGIs
will be started and both will try to write to the same log file. What will
happen? Will the second one hold and wait until the first is done with the log file?

open (my $out1, ">>log.txt");
open (my $out2, ">>log.txt");
print $out1 1;
print $out2 2;
close $out1;
close $out2;


Here is what perldoc -f flock says:

Two potentially non-obvious but traditional "flock" semantics
are that it waits indefinitely until the lock is granted, and
that its locks are merely advisory. Such discretionary locks are
more flexible, but offer fewer guarantees. This means that files
locked with "flock" may be modified by programs that do not also
use "flock". See perlport, your port's specific documentation,
or your system-specific local manpages for details. It's best to
assume traditional behavior if you're writing portable programs.


What if the first process is taking a long time to finish, or has crashed
completely? Will the file be unlocked automatically after a certain time? I am
worried that using flock could lose my log file totally.
 
Public Interest

One very interesting test:
open (my $out1, ">>log.txt");
open (my $out2, ">>log.txt");
print $out2 2;
print $out1 1;
close $out1;
close $out2;

Guess what the result is? Nothing crashed. Perl can take care of the
flock itself automatically, at least on a Win32 system. Unlike what many
think, it is not 21, but 12. Perl lets the first opener finish the
job before letting the second start. Isn't that GREAT? No flock is ever
needed now... haha
 
Jay Tilton

: One very interesting test:
: open (my $out1, ">>log.txt");
: open (my $out2, ">>log.txt");
: print $out2 2;
: print $out1 1;
: close $out1;
: close $out2;
:
: Guess what the result is? Nothing crashed. Perl can take care of the
: flock itself automatically, at least on a Win32 system. Unlike what many
: think, it is not 21, but 12.

: Perl lets the first opener finish the
: job before letting the second start.

That's a bold conclusion (reckless, even) to draw from a single test.

See what happens when you swap the order of the close() statements.
Do you still believe that the first filehandle opened gets precedence?

The circumstances where simultaneous file access will cause problems
occur rarely, and writing two measly characters is unlikely to coincide
with those circumstances. Try this test instead:

#!perl
use warnings;
use strict;

unlink 'log.txt';
open (my $out1, ">>log.txt") or die $!;
open (my $out2, ">>log.txt") or die $!;

my $now = time;
my $count;
until( time > $now+10 ) {
    $count++;
    print $out1 "out1: $count\n";
    print $out2 "out2: $count\n";
}
close $out1;
close $out2;

That's more likely to corrupt the log file, with output like:

out1: 18930
out1: 18931
ou8639
out2: 18640
out2: 18641

And that's just one process with two filehandles. There could be dozens
of those processes at once in a CGI environment.

: Isn't that GREAT? No flock is ever
: needed now... haha

Haha indeed.
 
Malcolm Dew-Jones

Public Interest ([email protected]) wrote:
: I want to run a banner exchange. Click-through exchange. So far, my script
: works fine in testing. I am not using flock in the file now. I am worried
: about if 2 people hit the same cgi the same time. I think 2 cgis will be
: started and try to write to the same log file. What will happen? Will the
: second one hold and wait for the first done with the log file?

: open (my $out1, ">>log.txt");
: open (my $out2, ">>log.txt");
: print $out1 1;
: print $out2 2;
: close $out1;
: close $out2;


We have various perl scripts and C programs running on linux that append
logging messages straight into shared files and which are often running
with multiple copies (i.e. 40 or 50 copies at a time), and I have never
seen these log files corrupted by intermingled lines. The files are
opened in append mode. The perl scripts typically use one print statement
per logical line. The C programs use stdio but don't do anything to
buffer the output into lines, other than whatever the stdio does to buffer
the output. One file in particular gets about 10 thousand lines per hour
written this way. As I said, I have never seen a problem with these log
files.

My understanding is that a single `write()' of less than a certain (fairly
large) size is supposed to be atomic on unix (and presumably on linux) -
that means atomic relative to other write() calls. I.e. if two programs
both make two separate calls to write, under the size limit, destined to a
single receiver, then the receiver will receive all of the data from one
write before any of the data from the other write.

Higher level io (i.e. fopen, fwrite etc) will typically buffer writes
together so that many individual writes end up being a single write(), and
therefore also atomic up to that same size, though this becomes somewhat of
a coincidence as you no longer explicitly control it.
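
(For illustration, a minimal sketch of that idea: build the whole record first
and hand it to the kernel with a single syswrite(), so nothing is split across
stdio buffer boundaries. The file and record names are my own, not from the
programs described above.)

#!/usr/bin/perl
use strict;
use warnings;

# Append one complete record with a single syswrite(), so the whole line
# reaches the kernel in one write() call (atomic for small sizes on typical
# unix systems, per the discussion above).
open my $log, '>>', 'log.txt' or die "open: $!";
my $record = "hit: " . time . "\n";
defined syswrite($log, $record) or die "syswrite: $!";
close $log or die "close: $!";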

I do not think the above is true for windows based machines.

Counter claims welcome.
 
Public Interest

: We have various perl scripts and C programs running on linux that append
: logging messages straight into shared files and which are often running
: with multiple copies (i.e. 40 or 50 copies at a time), and I have never
: seen these log files corrupted by intermingled lines. The files are
: opened in append mode. The perl scripts typically use one print statement
: per logical line. The C programs use stdio but don't do anything to
: buffer the output into lines, other than whatever the stdio does to buffer
: the output. One file in particular gets about 10 thousand lines per hour
: written this way. As I said, I have never seen a problem with these log
: files.
:
: My understanding is that a single `write()' of less than a certain (fairly
: large) size is supposed to be atomic on unix (and presumably on linux) -
: that means atomic relative to other write() calls. I.e. if two programs
: both make two separate calls to write, under the size limit, destined to a
: single receiver, then the receiver will receive all of the data from one
: write before any of the data from the other write.
:
: Higher level io (i.e. fopen, fwrite etc) will typically buffer writes
: together so that many individual writes end up being a single write(), and
: therefore also atomic up to that same size, though this becomes somewhat of
: a coincidence as you no longer explicitly control it.
:
: I do not think the above is true for windows based machines.

So, did you use flock in the C or Perl programs? 40-50 copies are a lot...
What I want to find out is when it is necessary to implement flock and when
it is taken care of automatically.
Again, for my log or ad exchange program, can anyone SHOW me a sample of
using flock and give me an idea of how many CGI processes can wait on a locked
file?

I also would like to know what LOCK_SH is. I understand LOCK_EX, but how can
a lock be shared?
perldoc -f flock also states: "To avoid the possibility of miscoordination,
Perl now flushes FILEHANDLE before locking or unlocking it." What is a flush
of a filehandle? Flushing the buffer into the handle?
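
(For reference, a minimal sketch of flock-protected appending, assuming the
usual Fcntl constants; the file name and message are illustrative, not from the
poster's script. LOCK_SH is a shared lock: many readers can hold it at once,
while LOCK_EX gives a single writer exclusive access. The lock is released when
the handle is closed, or automatically if the process exits or crashes.)

#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);    # LOCK_SH, LOCK_EX, LOCK_UN

# Append one record under an exclusive advisory lock.  Every writer must
# also call flock for the lock to mean anything, since flock is advisory.
open my $log, '>>', 'log.txt' or die "open: $!";
flock($log, LOCK_EX) or die "flock: $!";   # blocks until the lock is granted
print {$log} "click: " . time . "\n";
close $log or die "close: $!";             # closing releases the lock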
 
Public Interest

out1: 691
out1: 6: 351
out2: 352
..
..
..

out1: 1030
out1:92
out2: 693


Why is that? When filehandle 1 writes "out1: 6" and filehandle 2 writes "351",
how could there be another "out1:" before "92"?
 
Michele Dondi

On Thu, 16 Oct 2003 12:35:51 +0100, "Alan J. Flavell" wrote:

>[unattributed quote - original poster was Malcolm Dew-Jones:]
>> > My understanding is that a single `write()' of less than a certain (fairly
>> > large) size is supposed to be atomic on unix (and presumably on linux) -
>> > that means atomic relative to other write() calls.
>
>Indeed. A lot of unix-based software appears to rely on the semantics
>of opening for append, and atomically appending a record, without
>using locking. You can't rely on the *sequence* in which asynchronous
>processes write their records, but they don't fragment records and
>they don't damage file structure. There's a great deal of software
>that would go horribly wrong if that stopped working.

I will trust your word and your experience, however, for anyone
interested, here are the results of a test I made (following the idea
of another poster):

corrupt.pl:
#!/usr/bin/perl -l
use strict;
use warnings;

$|++;
unlink 'log.txt';
open my $out1, '>>log.txt' or die $!;
open my $out2, '>>log.txt' or die $!;

my $stop=time+5;
my $count;
while (time<$stop) {
    $count++;
    print $out1 "out1: $count";
    print $out2 "out2: $count";
}
__END__

test.pl:
#!/usr/bin/perl -ln
use strict;
use warnings;

our $n;
++$n, print if !/^out[12]: \d+$/;
END { print "$n/$." }
__END__

And here are the results of './test.pl log.txt' respectively on
Linux[1] and Windows[2]:

report.lnx.txt:
5189/4354160
4445/3845960
5326/4447462
4604/3955650
4186/3669472

report.win.txt:
62/223608
102/247178
86/238094
21/195220
38/210018

I don't think these data are statistically significant; however, for the
average of the ratios of the given numbers and for the corresponding
standard deviation I find the following values:

Linux: 1.17e-3, 2.41e-5
Win: 2.64e-4, 1.26e-4

Any comments?


[1] CRUX Linux 1.1, 2.4.21 kernel, perl 5.8.0.
[2] Windows 98, perl 5.8.0 (AS build 806).


Michele
 
Malcolm Dew-Jones

Michele Dondi ([email protected]) wrote:
: On Thu, 16 Oct 2003 12:35:51 +0100, "Alan J. Flavell"

: >[unattributed quote - original poster was Malcolm Dew-Jones:]
: >> > My understanding is that a single `write()' of less than a certain (fairly
: >> > large) size is supposed to be atomic on unix (and presumably on linux) -
: >> > that means atomic relative to other write() calls.
: >
: >Indeed. A lot of unix-based software appears to rely on the semantics
: >of opening for append, and atomically appending a record, without
: >using locking. You can't rely on the *sequence* in which asynchronous
: >processes write their records, but they don't fragment records and
: >they don't damage file structure. There's a great deal of software
: >that would go horribly wrong if that stopped working.

: I will trust your word and your experience, however, for anyone
: interested, here are the results of a test I made (following the idea
: of another poster):

: corrupt.pl:
: #!/usr/bin/perl -l
: use strict;
: use warnings;
:
: $|++;
: unlink 'log.txt';
: open my $out1, '>>log.txt' or die $!;
: open my $out2, '>>log.txt' or die $!;
:
: my $stop=time+5;
: my $count;
: while (time<$stop) {
: $count++;
: print $out1 "out1: $count";
: print $out2 "out2: $count";
: }
: __END__

: test.pl:
: #!/usr/bin/perl -ln
: use strict;
: use warnings;
:
: our $n;
: ++$n, print if !/^out[12]: \d+$/;
: END { print "$n/$." }
: __END__

: And here are the results of './test.pl log.txt' respectively on
: Linux[1] and Windows[2]:

: report.lnx.txt:
: 5189/4354160
: 4445/3845960
: 5326/4447462
: 4604/3955650
: 4186/3669472

: report.win.txt:
: 62/223608
: 102/247178
: 86/238094
: 21/195220
: 38/210018

: I don't think these data are statistically significant; however, for the
: average of the ratios of the given numbers and for the corresponding
: standard deviation I find the following values:

: Linux: 1.17e-3, 2.41e-5
: Win: 2.64e-4, 1.26e-4

: Any comments?


Sure

#!/usr/bin/perl -l
# not-corrupt.pl:

use strict;
use warnings;

$|++;

my $stop=time+30;
my $count;
while (time<$stop) {
    $count++;
    print "out1: $count";
}


$ cat > log.txt
$ ./not-corrupt.pl >> log.txt & ./not-corrupt.pl >> log.txt &
./not-corrupt.pl >> log.txt & ./not-corrupt.pl >> log.txt &


Same test routine as above:

$ ./test.pl < log.txt
/1490113

Eyeball examination shows the outputs are intermingled
(e.g. a few lines):
out1: 569
out1: 570
out1: 16
out1: 11
out1: 12
out1: 17
out1: 882
out1: 883
out1: 18
out1: 571


So, no corruption when the routines write to standard output.

Perldoc says "$| ... STDOUT will typically be line buffered if output is
to the terminal"

If stdio uses line mode buffering, then each line will be passed as a
single entity to write() and the lines will be written uncorrupted.

If you use C then you control the buffering and a C program can append to
a log with no problem.

If you use perl then you must be careful because of the buffering, over
which you have less control.
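
(As an aside, one way to get per-print flushing in Perl without the select/$|
dance used further down is the autoflush method from IO::Handle; a minimal
sketch, assuming each print emits one complete line:)

use strict;
use warnings;
use IO::Handle;    # gives filehandles an autoflush() method

open my $log, '>>', 'log.txt' or die $!;
$log->autoflush(1);    # flush after every print on this handle

# With autoflush on, each complete-line print should reach the kernel as
# its own write(), keeping appended records intact.
print {$log} "out1: 1\n";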

Our perl programs that log in this manner run once and write at most one
or two lines. Presumably the prints are smaller than the stdio buffer,
and so the entire printout ends up being written with a single write(),
and so the log files are uncorrupted.

However, if you have a perl program that logs many lines then the stdio
buffers will end up being flushed at (effectively) random times without
regard for line breaks, and so the logs will end up being corrupted.

But we can control that, so let's try a test where we force the data to be
flushed at the end of each line. (The best way would be to put the stdio
buffering into line mode, but I don't know how to do that.)

#!/usr/bin/perl
# flog
use warnings;
use strict;

# unlink 'log.txt';
open (my $out1, ">>log.txt") or die $!;
open (my $out2, ">>log.txt") or die $!;

my $now = time;
my $count;
until( time > $now+10 ) {
    $count++;
    print $out1 "out1: $count\n";
    print $out2 "out2: $count\n";
    # Selecting a handle and toggling $| forces an immediate flush of that
    # handle, then turns autoflush back off.
    select $out1; $|=1; $|=0;
    select $out2; $|=1; $|=0;
}
close $out1;
close $out2;

$ cat > log.txt
$ ./flog & ./flog & ./flog & ./flog & ./flog &

# note minor typo in print, test.pl altered to check for two spaces

$ ./test.pl < log.txt
/582740


So, with five copies of the program running, each making two writes per
loop iteration and forcing a flush after both writes, the data is
uncorrupted.

One more test: how many lines can we print before flushing and still be OK?
I added the following lines after the second print:

if ($count % 50 == 0)
{
    select $out1; $|=1; $|=0;
    select $out2; $|=1; $|=0;
}

Examining log.txt confirms that each output stream is being flushed after
50 lines.

$ ./test.pl < log.txt
/4110602

You'll notice that the buffering was more efficient (we have four times as
many lines written in the same time) and as before we have still avoided
corruption.

Final conclusion - you can safely log by appending as long as you make
sure the data is written in line mode. In C you can set the buffering to
do this. In Perl you need to be more careful, but as long as line mode is
used then it works. If logging continually, flush the lines
frequently enough to ensure that stdio is not flushing for you at some
less opportune time. If logging only a single line, the program will
flush just at the end, which is effectively the same as line-mode
buffering for small amounts of data, so programs that write single lines
and then exit should normally be able to safely log without explicit
locking.
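
(Putting that conclusion together, a minimal sketch of the pattern with
illustrative names: open for append, build each record as one complete line,
and flush it immediately.)

#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

# Sketch of the "append in line mode" pattern described above
# (the file name and message are illustrative).
sub log_line {
    my ($msg) = @_;
    open my $log, '>>', 'log.txt' or die "open: $!";
    $log->autoflush(1);                     # flush each print immediately
    print {$log} scalar(localtime), " $msg\n";
    close $log or die "close: $!";
}

log_line("banner click from process $$");   # $$ is the current process id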
 
