Is flock Needed for Logs?

Discussion in 'Perl Misc' started by Public Interest, Oct 15, 2003.

  1. I want to run a banner exchange. Click-through exchange. So far, my script
    works fine in testing. I am not using flock on the file now. I am worried
    about what happens if 2 people hit the same CGI at the same time. I think 2
    CGI processes will be started and will try to write to the same log file.
    What will happen? Will the second one hold and wait until the first is done
    with the log file?

    open (my $out1, ">>log.txt");
    open (my $out2, ">>log.txt");
    print $out1 1;
    print $out2 2;
    close $out1;
    close $out2;


    Here is the relevant passage from perldoc -f flock:

    Two potentially non-obvious but traditional "flock" semantics
    are that it waits indefinitely until the lock is granted, and
    that its locks are merely advisory. Such discretionary locks are
    more flexible, but offer fewer guarantees. This means that files
    locked with "flock" may be modified by programs that do not also
    use "flock". See perlport, your port's specific documentation,
    or your system-specific local manpages for details. It's best to
    assume traditional behavior if you're writing portable programs.


    What if the first process is taking a long time to finish, or it crashes
    outright? Will the file be unlocked automatically after a certain time? I am
    worried that using flock could cost me the log file entirely. (A sketch of
    the usual flock pattern follows this post.)
     
    Public Interest, Oct 15, 2003
    #1
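    A minimal sketch of the flock pattern being asked about, assuming the
    same log.txt path; this is an illustration, not code from the thread.
    One relevant point: the kernel releases a flock lock automatically when
    the process holding it exits or crashes, so a dead CGI process cannot
    leave the log locked forever.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl qw(:flock);    # imports LOCK_SH, LOCK_EX, LOCK_UN

    open(my $log, '>>', 'log.txt') or die "open: $!";
    flock($log, LOCK_EX) or die "flock: $!";  # blocks until the lock is granted
    print $log "one complete log record\n";
    close $log;    # flushes the buffer and releases the lock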

  2. One very interesting test:
    open (my $out1, ">>log.txt");
    open (my $out2, ">>log.txt");
    print $out2 2;
    print $out1 1;
    close $out1;
    close $out2;

    Guess what the result is? Nothing crashed. Perl can take care of the
    flock itself automatically, at least on a Win32 system. Unlike what many
    think, it is not 21, but 12. Perl seems to let the first opener finish the
    job before letting the second start. Isn't that GREAT? No flock is ever
    needed now... haha
     
    Public Interest, Oct 15, 2003
    #2

  3. "Public Interest" <> wrote:

    : One very interesting test:
    : open (my $out1, ">>log.txt");
    : open (my $out2, ">>log.txt");
    : print $out2 2;
    : print $out1 1;
    : close $out1;
    : close $out2;
    :
    : Guess what the result is? Nothing crashed. Perl can take care of the
    : flock itself automatically, at least on a Win32 system. Unlike what many
    : think, it is not 21, but 12.

    : Perl seems to let the first opener finish the
    : job before letting the second start.

    That's a bold conclusion (reckless, even) to draw from a single test.

    See what happens when you swap the order of the close() statements.
    Do you still believe that the first filehandle opened gets precedence?

    The circumstances where simultaneous file access will cause problems
    occur rarely, and writing two measly characters is unlikely to coincide
    with those circumstances. Try this test instead:

    #!perl
    use warnings;
    use strict;

    unlink 'log.txt';
    open (my $out1, ">>log.txt") or die $!;
    open (my $out2, ">>log.txt") or die $!;

    my $now = time;
    my $count;
    until( time > $now+10 ) {
        $count++;
        print $out1 "out1: $count\n";
        print $out2 "out2: $count\n";
    }
    close $out1;
    close $out2;

    That's more likely to corrupt the log file, producing something like:

    out1: 18930
    out1: 18931
    ou8639
    out2: 18640
    out2: 18641

    And that's just one process with two filehandles. There could be dozens
    of those processes at once in a CGI environment. (A locked variant of this
    test is sketched after this post.)

    : Isn't that GREAT? No flock is ever
    : needed now... haha

    Haha indeed.
     
    Jay Tilton, Oct 15, 2003
    #3
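    For contrast, a sketch of the test above with flock added around each
    print; the locking is an addition for illustration, not part of the
    original post. Because perl flushes a handle before locking or unlocking
    it, each line reaches the file while the lock is still held.

    #!perl
    use warnings;
    use strict;
    use Fcntl qw(:flock);

    unlink 'log.txt';
    open (my $out1, ">>log.txt") or die $!;
    open (my $out2, ">>log.txt") or die $!;

    my $now = time;
    my $count;
    until( time > $now+10 ) {
        $count++;
        for ([$out1, 'out1'], [$out2, 'out2']) {
            my ($fh, $name) = @$_;
            flock($fh, LOCK_EX) or die $!;   # wait for exclusive access
            print $fh "$name: $count\n";
            flock($fh, LOCK_UN) or die $!;   # flushes the line, then releases
        }
    }
    close $out1;
    close $out2;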
  4. Public Interest () wrote:
    : I want to run a banner exchange. Click-through exchange. So far, my script
    : works fine in testing. I am not using flock on the file now. I am worried
    : about what happens if 2 people hit the same CGI at the same time. I think 2
    : CGI processes will be started and will try to write to the same log file.
    : What will happen? Will the second one hold and wait until the first is done
    : with the log file?

    : open (my $out1, ">>log.txt");
    : open (my $out2, ">>log.txt");
    : print $out1 1;
    : print $out2 2;
    : close $out1;
    : close $out2;


    We have various perl scripts and C programs running on linux that append
    logging messages straight into shared files and which are often running
    with multiple copies (i.e. 40 or 50 copies at a time), and I have never
    seen these log files corrupted by intermingled lines. The files are
    opened in append mode. The perl scripts typically use one print statement
    per logical line. The C programs use stdio but don't do anything to
    buffer the output into lines, other than whatever the stdio does to buffer
    the output. One file in particular gets about 10 thousand lines per hour
    written this way. As I said, I have never seen a problem with these log
    files.

    My understanding is that a single `write()' of less than a certain (fairly
    large) size is supposed to be atomic on unix (and presumably on linux) -
    that means atomic relative to other write() calls. I.e. if two programs
    both make two separate calls to write, under the size limit, destined to a
    single receiver, then the receiver will receive all of the data from one
    write before any of the data from the other write.

    Higher level io (i.e. fopen, fwrite etc) will typically buffer writes
    together so that many individual writes end up being a single write(), and
    therefore also atomic up to that same size, though this becomes somewhat of
    a coincidence as you no longer explicitly control it.

    I do not think the above is true for windows based machines.

    Counter claims welcome. (A sketch of the one-print-per-line append pattern
    follows this post.)
     
    Malcolm Dew-Jones, Oct 16, 2003
    #4
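    A sketch of the pattern described in this post, under its stated
    assumption that a single small write() to a file opened in append mode
    is atomic. The autoflush call is the one addition: it makes each print
    reach the OS as exactly one write(), so no partial line is ever left
    sitting in a buffer.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::Handle;    # provides the autoflush() method

    # Append mode: the kernel repositions to end-of-file for every write().
    open(my $log, '>>', 'log.txt') or die "open: $!";
    $log->autoflush(1);    # one print == one write(), never a partial line

    # If the atomic-append assumption holds, this record cannot be
    # interleaved with another process's record.
    print $log "pid $$: one complete record\n";
    close $log;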
  5. Malcolm Dew-Jones () wrote:
    > We have various perl scripts and C programs running on linux that append
    > logging messages straight into shared files and which are often running
    > with multiple copies (i.e. 40 or 50 copies at a time), and I have never
    > seen these log files corrupted by intermingled lines. The files are
    > opened in append mode. The perl scripts typically use one print statement
    > per logical line. The C programs use stdio but don't do anything to
    > buffer the output into lines, other than whatever the stdio does to buffer
    > the output. One file in particular gets about 10 thousand lines per hour
    > written this way. As I said, I have never seen a problem with these log
    > files.
    >
    > My understanding is that a single `write()' of less than a certain (fairly
    > large) size is supposed to be atomic on unix (and presumably on linux) -
    > that means atomic relative to other write() calls. I.e. if two programs
    > both make two separate calls to write, under the size limit, destined to a
    > single receiver, then the receiver will receive all of the data from one
    > write before any of the data from the other write.
    >
    > Higher level io (i.e. fopen, fwrite etc) will typically buffer writes
    > together so that many individual writes end up being a single write(), and
    > therefore also atomic up to that same size, though this becomes somewhat of
    > a coincidence as you no longer explicitly control it.
    >
    > I do not think the above is true for windows based machines.
    >


    So, did you use flock in the C or Perl programs? 40-50 copies is a lot...
    What I want to find out is when it is necessary to implement flock and when
    it is taken care of automatically.
    Again, for my log or ad exchange program, can anyone SHOW me a sample of
    using flock and give me an idea of how many CGI processes can wait on a
    locked file?

    I also would like to know what LOCK_SH is. I understand LOCK_EX, but how
    can a lock be shared? perldoc -f flock also states: "To avoid the
    possibility of miscoordination, Perl now flushes FILEHANDLE before locking
    or unlocking it." What is a flush of a filehandle? Flushing the buffer into
    the handle? (A sketch of LOCK_SH versus LOCK_EX follows this post.)
     
    Public Interest, Oct 16, 2003
    #5
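    For what it's worth, a sketch of LOCK_SH versus LOCK_EX (the file name
    is assumed). Any number of readers may hold a shared lock at the same
    time; an exclusive lock is granted only when no other lock of either
    kind is held. The flushing mentioned in perldoc simply means perl pushes
    any buffered prints down to the OS before the lock changes hands.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Fcntl qw(:flock);

    # Writer: LOCK_EX -- exactly one writer at a time.
    open(my $w, '>>', 'log.txt') or die $!;
    flock($w, LOCK_EX) or die $!;
    print $w "a record\n";    # buffered here...
    close $w;                 # ...flushed, then the lock is released

    # Reader: LOCK_SH -- readers share the lock among themselves but
    # exclude (and are excluded by) any LOCK_EX holder.
    open(my $r, '<', 'log.txt') or die $!;
    flock($r, LOCK_SH) or die $!;
    my @lines = <$r>;
    close $r;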
  6. out1: 691
    out1: 6: 351
    out2: 352
    ..
    ..
    ..

    out1: 1030
    out1:92
    out2: 693


    Why is that? When 1 writes "out1: 6" and 2 writes "351", how could there
    be another "out1:" before the "92"? (A sketch of the buffering behind this
    follows this post.)
     
    Public Interest, Oct 16, 2003
    #6
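    What this poster is seeing is consistent with stdio buffering: each
    handle has its own buffer, and a buffer is flushed whenever it happens
    to fill, which can be in the middle of a line. A small sketch that makes
    the buffering visible; the file name and counts are assumptions chosen
    for illustration.

    #!/usr/bin/perl
    use strict;
    use warnings;

    open(my $fh, '>>', 'buf-demo.txt') or die $!;
    for my $n (1 .. 2000) {
        print $fh "line $n\n";    # goes into the stdio buffer, not the file
        if ($n % 500 == 0) {
            my $on_disk = (-s 'buf-demo.txt') || 0;  # bytes actually written
            print STDERR "after line $n: $on_disk bytes on disk\n";
        }
    }
    close $fh;    # the final flush writes whatever remained buffered

    The on-disk size grows in buffer-sized jumps rather than line by line, so
    when two handles share one file, each flush lands wherever that handle's
    buffer happened to fill -- including mid-line.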
  7. On Thu, 16 Oct 2003 12:35:51 +0100, "Alan J. Flavell"
    <> wrote:

    >[unattributed quote - original poster was Malcolm Dew-Jones:]
    >> > My understanding is that a single `write()' of less than a certain (fairly
    >> > large) size is supposed to be atomic on unix (and presumably on linux) -
    >> > that means atomic relative to other write() calls.

    >
    >Indeed. A lot of unix-based software appears to rely on the semantics
    >of opening for append, and atomically appending a record, without
    >using locking. You can't rely on the *sequence* in which asynchronous
    >processes write their records, but they don't fragment records and
    >they don't damage file structure. There's a great deal of software
    >that would go horribly wrong if that stopped working.


    I will trust your word and your experience; however, for anyone
    interested, here are the results of a test I made (following the idea
    of another poster):

    corrupt.pl:
    #!/usr/bin/perl -l
    use strict;
    use warnings;

    $|++;
    unlink 'log.txt';
    open my $out1, '>>log.txt' or die $!;
    open my $out2, '>>log.txt' or die $!;

    my $stop=time+5;
    my $count;
    while (time<$stop) {
        $count++;
        print $out1 "out1: $count";
        print $out2 "out2: $count";
    }
    __END__

    test.pl:
    #!/usr/bin/perl -ln
    use strict;
    use warnings;

    our $n;
    ++$n, print if !/^out[12]: \d+$/;
    END { print "$n/$." }
    __END__

    And here are the results of './test.pl log.txt' respectively on
    Linux[1] and Windows[2]:

    report.lnx.txt:
    5189/4354160
    4445/3845960
    5326/4447462
    4604/3955650
    4186/3669472

    report.win.txt:
    62/223608
    102/247178
    86/238094
    21/195220
    38/210018

    I don't think these data are statistically significant; however, taking
    the ratio of the two numbers on each line, I find the following values
    for the average of the ratios and the corresponding standard deviation:

    Linux: 1.17e-3, 2.41e-5
    Win: 2.64e-4, 1.26e-4

    Any comments?


    [1] CRUX Linux 1.1, 2.4.21 kernel, perl 5.8.0.
    [2] Windows 98, perl 5.8.0 (AS build 806).


    Michele
    --
    # This prints: Just another Perl hacker,
    seek DATA,15,0 and print q... <DATA>;
    __END__
     
    Michele Dondi, Oct 17, 2003
    #7
  8. Michele Dondi () wrote:
    : On Thu, 16 Oct 2003 12:35:51 +0100, "Alan J. Flavell"
    : <> wrote:

    : >[unattributed quote - original poster was Malcolm Dew-Jones:]
    : >> > My understanding is that a single `write()' of less than a certain (fairly
    : >> > large) size is supposed to be atomic on unix (and presumably on linux) -
    : >> > that means atomic relative to other write() calls.
    : >
    : >Indeed. A lot of unix-based software appears to rely on the semantics
    : >of opening for append, and atomically appending a record, without
    : >using locking. You can't rely on the *sequence* in which asynchronous
    : >processes write their records, but they don't fragment records and
    : >they don't damage file structure. There's a great deal of software
    : >that would go horribly wrong if that stopped working.

    : I will trust your word and your experience; however, for anyone
    : interested, here are the results of a test I made (following the idea
    : of another poster):

    : corrupt.pl:
    : #!/usr/bin/perl -l
    : use strict;
    : use warnings;
    :
    : $|++;
    : unlink 'log.txt';
    : open my $out1, '>>log.txt' or die $!;
    : open my $out2, '>>log.txt' or die $!;
    :
    : my $stop=time+5;
    : my $count;
    : while (time<$stop) {
    :     $count++;
    :     print $out1 "out1: $count";
    :     print $out2 "out2: $count";
    : }
    : __END__

    : test.pl:
    : #!/usr/bin/perl -ln
    : use strict;
    : use warnings;
    :
    : our $n;
    : ++$n, print if !/^out[12]: \d+$/;
    : END { print "$n/$." }
    : __END__

    : And here are the results of './test.pl log.txt' respectively on
    : Linux[1] and Windows[2]:

    : report.lnx.txt:
    : 5189/4354160
    : 4445/3845960
    : 5326/4447462
    : 4604/3955650
    : 4186/3669472

    : report.win.txt:
    : 62/223608
    : 102/247178
    : 86/238094
    : 21/195220
    : 38/210018

    : I don't think these data are statistically significant; however, taking
    : the ratio of the two numbers on each line, I find the following values
    : for the average of the ratios and the corresponding standard deviation:

    : Linux: 1.17e-3, 2.41e-5
    : Win: 2.64e-4, 1.26e-4

    : Any comments?


    Sure

    #!/usr/bin/perl -l
    # not-corrupt.pl:

    use strict;
    use warnings;

    $|++;

    my $stop=time+30;
    my $count;
    while (time<$stop) {
        $count++;
        print "out1: $count";
    }


    $ cat > log.txt
    $ ./not-corrupt.pl >> log.txt & ./not-corrupt.pl >> log.txt &
    ./not-corrupt.pl >> log.txt & ./not-corrupt.pl >> log.txt &


    same test routine as above

    $ ./test.pl < log.txt
    /1490113

    Eyeball examination shows the outputs are intermingled
    (e.g., a few lines):
    out1: 569
    out1: 570
    out1: 16
    out1: 11
    out1: 12
    out1: 17
    out1: 882
    out1: 883
    out1: 18
    out1: 571


    So: no corruption when the processes write to the log via standard output.

    Perldoc says "$| ... STDOUT will typically be line buffered if output is
    to the terminal"

    If stdio uses line mode buffering, then each line will be passed as a
    single entity to write() and the lines will be written uncorrupted.

    If you use C then you control the buffering and a C program can append to
    a log with no problem.

    If you use perl then you must be careful because of the buffering, over
    which you have less control.

    Our perl programs that log in this manner run once and write at most one
    or two lines. Presumably the size of the prints is smaller than the stdio
    buffer, so the entire printout ends up being written with a single
    write(), and so the log files are uncorrupted.

    However, if you have a perl program that logs many lines then the stdio
    buffers will end up being flushed at (effectively) random times without
    regard for line breaks, and so the logs will end up being corrupted.

    But we can control that, so let's try a test where we force the data to be
    flushed at the end of each line. (The best way would be to switch the
    stdio buffering to line mode, but I don't know how to do that.)

    #!/usr/bin/perl
    # flog
    use warnings;
    use strict;

    # unlink 'log.txt';
    open (my $out1, ">>log.txt") or die $!;
    open (my $out2, ">>log.txt") or die $!;

    my $now = time;
    my $count;
    until( time > $now+10 ) {
        $count++;
        print $out1 "out1: $count\n";
        print $out2 "out2: $count\n";
        select $out1; $|=1; $|=0;
        select $out2; $|=1; $|=0;
    }
    close $out1;
    close $out2;

    $ cat > log.txt
    $ ./flog & ./flog & ./flog & ./flog & ./flog &

    # note minor typo in print, test.pl altered to check for two spaces

    $ ./test.pl < log.txt
    /582740


    So, with five copies of the program running, each doing two writes per
    loop and forcing a flush after both writes, the data is uncorrupted.

    One more test: how many lines can we print between flushes and still be
    OK? I added the following lines after the second print:

    if ($count % 50 == 0)
    {
        select $out1; $|=1; $|=0;
        select $out2; $|=1; $|=0;
    }

    Examining log.txt confirms that each output stream is being flushed after
    every 50 lines.

    $ ./test.pl < log.txt
    /4110602

    You'll notice that the buffering was more efficient (roughly seven times
    as many lines were written in the same time) and, as before, we have
    still avoided corruption.

    Final conclusion - you can safely log by appending as long as you make
    sure the data is written in line mode. In C you can set the buffering to
    do this. In perl you need to be more careful, but as long as line mode is
    used then it works. If logging continually, then flush the lines
    frequently enough to ensure that stdio is not flushing for you at some
    less opportune time. If logging only a single line, then the program will
    flush just once at the end, which is effectively the same as line-mode
    buffering for small amounts of data, so programs that write single lines
    and then exit should normally be able to safely log without explicit
    locking. (A tidier spelling of the flush idiom is sketched after this
    post.)
     
    Malcolm Dew-Jones, Oct 22, 2003
    #8
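    A footnote to the flushing idiom used above: IO::Handle gives
    filehandles an explicit flush() method, a tidier spelling of the
    select/$|=1/$|=0 dance. A sketch of the flog test rewritten that way;
    the behavior is intended to be identical.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use IO::Handle;    # gives filehandles a flush() method

    open (my $out1, ">>log.txt") or die $!;
    open (my $out2, ">>log.txt") or die $!;

    my $now = time;
    my $count;
    until( time > $now+10 ) {
        $count++;
        print $out1 "out1: $count\n";
        print $out2 "out2: $count\n";
        if ($count % 50 == 0) {    # flush every 50 lines, as in the test
            $out1->flush;          # push the complete lines to write() now,
            $out2->flush;          # not at an arbitrary buffer boundary
        }
    }
    close $out1;
    close $out2;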
