Is syswrite faster or print

Discussion in 'Perl Misc' started by saurabh hirani, Jan 27, 2009.

  1. Hi guys,

    I am working on a logging module. I was just studying the existing
    perl log modules and read that - syswrite is useful in a condition
    like logging as during its write operation it prevents someone else
    from writing at the same time. To verify that I wrote a small perl
    program:

    #!/usr/bin/perl -w
    use strict;
    use Fcntl;
    use Data::Dumper;
    $| = 1;

    sub testme {
    my ($flag, $data) = @_;
    if ($flag == 1) {
    sysopen(FILE, 'bangbang', O_WRONLY | O_APPEND | O_CREAT);
    while (1) {
    syswrite FILE, "$data\n";
    }
    } else {
    open(FILE, ">>bangbang");
    while (1) {
    print FILE "$data\n";
    }
    }
    close(FILE);
    }
    testme($ARGV[0], $ARGV[1]);

    Testing both for syswrite and print - I found that $data printing in
    file is corrupted when I run 2 instances of the same program together.
    But using syswrite saves me from that. It has every line intact and no
    line is a mix of different data sets of the 2 programs.

    But what struck me more was the size of the files created when I ran
    the following programs together:

    1. ./program 1 first
    2. ./program 1 second

    this uses syswrite to write and check whether 'first' and 'second'
    clobber each other. During this run, I got the following stats:

    1st run - 30 seconds - file size 40 MB
    2nd run - 50 seconds - file size 61 MB

    Now for the print run,

    1. ./program 2 first
    2. ./program 2 second

    this uses print to write and check whether 'first' and 'second'
    clobber each other. During this run, I got the following stats:

    1st run - 30 seconds - file size 315 MB
    2nd run - 50 seconds - file size 586 MB

    I was running a VM so I know that my profiling wouldn't be so
    accurate. But still, I had done autoflush for both syswrite and print.
    And I read somewhere that for output greater than 3 KB syswrite is
    faster?

    Is print this faster than syswrite? Is my testing flawed? Am I missing
    something? What would you use if you had to write a logging module in
    perl - print or syswrite? and why?

    Thanks for going through it.
     
    saurabh hirani, Jan 27, 2009
    #1
    1. Advertising

  2. On Jan 27, 6:05 pm, saurabh hirani <> wrote:
    > Hi guys,
    >
    > I am working on a logging module. I was just studying the existing
    > perl log modules and read that - syswrite is useful in a condition
    > like logging as during its write operation it prevents someone else
    > from writing at the same time. To verify that I wrote a small perl
    > program:
    >
    > #!/usr/bin/perl -w
    > use strict;
    > use Fcntl;
    > use Data::Dumper;
    > $| = 1;
    >
    > sub testme {
    >     my ($flag, $data) = @_;
    >     if ($flag == 1) {
    >         sysopen(FILE, 'bangbang',  O_WRONLY | O_APPEND | O_CREAT);
    >         while (1) {
    >             syswrite FILE, "$data\n";
    >         }
    >     } else {
    >         open(FILE, ">>bangbang");
    >         while (1) {
    >             print FILE "$data\n";
    >         }
    >     }
    >     close(FILE);}
    >
    > testme($ARGV[0], $ARGV[1]);



    Small but important addition - And after every run I moved the file to
    bangbang.print or bangbang.syswrite.So I got new file at each run. I
    did not append to existing ones.
     
    saurabh hirani, Jan 27, 2009
    #2
    1. Advertising

  3. saurabh hirani

    smallpond Guest

    On Jan 27, 8:05 am, saurabh hirani <> wrote:
    > Hi guys,
    >
    > I am working on a logging module. I was just studying the existing
    > perl log modules and read that - syswrite is useful in a condition
    > like logging as during its write operation it prevents someone else
    > from writing at the same time. To verify that I wrote a small perl
    > program:
    >
    > #!/usr/bin/perl -w
    > use strict;
    > use Fcntl;
    > use Data::Dumper;
    > $| = 1;
    >
    > sub testme {
    >     my ($flag, $data) = @_;
    >     if ($flag == 1) {
    >         sysopen(FILE, 'bangbang',  O_WRONLY | O_APPEND | O_CREAT);
    >         while (1) {
    >             syswrite FILE, "$data\n";
    >         }
    >     } else {
    >         open(FILE, ">>bangbang");
    >         while (1) {
    >             print FILE "$data\n";
    >         }
    >     }
    >     close(FILE);}
    >
    > testme($ARGV[0], $ARGV[1]);
    >
    > Testing both for syswrite and print - I found that $data printing in
    > file is corrupted when I run 2 instances of the same program together.
    > But using syswrite saves me from that. It has every line intact and no
    > line is a mix of different data sets of the 2 programs.
    >
    > But what struck me more was the size of the files created when I ran
    > the following programs together:
    >
    > 1. ./program 1 first
    > 2. ./program 1 second
    >
    > this uses syswrite to write and check whether 'first' and 'second'
    > clobber each other. During this run, I got the following stats:
    >
    > 1st run - 30 seconds - file size 40 MB
    > 2nd run - 50 seconds - file size 61 MB
    >
    > Now for the print run,
    >
    > 1. ./program 2 first
    > 2. ./program 2 second
    >
    > this uses print to write and check whether 'first' and 'second'
    > clobber each other. During this run, I got the following stats:
    >
    > 1st run - 30 seconds - file size 315 MB
    > 2nd run - 50 seconds - file size 586 MB
    >
    > I was running a VM so I know that my profiling wouldn't be so
    > accurate. But still, I had done autoflush for both syswrite and print.
    > And I read somewhere that for output greater than 3 KB syswrite is
    > faster?
    >
    > Is print this faster than syswrite? Is my testing flawed? Am I missing
    > something? What would you use if you had to write a logging module in
    > perl - print or syswrite? and why?
    >
    > Thanks for going through it.



    print does buffered output while syswrite is direct. That
    is the reason for both the performance difference and the
    non-overlapping behavior.

    Why do you care about performance for log messages?
     
    smallpond, Jan 27, 2009
    #3
  4. saurabh hirani wrote:
    >
    >
    > I was running a VM so I know that my profiling wouldn't be so
    > accurate.


    If you are running on VM, then why would profiling on VM not be accurate?

    > But still, I had done autoflush for both syswrite and print.


    > Is print this faster than syswrite? Is my testing flawed? Am I missing
    > something?



    In the code you showed us, you were only autoflushing STDOUT, which
    is not the filehandle that either syswrite or print were writing to.


    > What would you use if you had to write a logging module in
    > perl - print or syswrite? and why?


    I'd probably just use of the already existing ones.


    Xho
     
    Xho Jingleheimerschmidt, Jan 28, 2009
    #4

  5. > print does buffered output while syswrite is direct.  That
    > is the reason for both the performance difference and the
    > non-overlapping behavior.


    Does print do buffered output even after autoflush is on? I have added
    $| = 1 in the beginning of my code.

    >
    > Why do you care about performance for log messages?


    Why shouldn't I? The faster my log method is, the earlier I return and
    get control back to my main programs.
     
    saurabh hirani, Jan 28, 2009
    #5
  6. saurabh hirani wrote:
    >> print does buffered output while syswrite is direct. That
    >> is the reason for both the performance difference and the
    >> non-overlapping behavior.

    >
    > Does print do buffered output even after autoflush is on? I have added
    > $| = 1 in the beginning of my code.


    $| only effects the *currently selected* file handle.

    Xho
     
    Xho Jingleheimerschmidt, Jan 28, 2009
    #6
  7. On Jan 28, 7:50 am, Xho Jingleheimerschmidt <> wrote:
    > saurabh hirani wrote:
    >
    > > I was running a VM so I know that my profiling wouldn't be so
    > > accurate.

    >
    > If you are running on VM, then why would profiling on VM not be accurate?


    I wrote it the wrong way. It doesn't sound the way it should. I meant
    that as I was running a VM, I may not have the right amount of fire
    power to verify my tests and due to load on my host the VM might run
    slow at times and degrade to the speed of the program.

    >
    > I'd probably just use of the already existing ones.
    >


    I didn't understand that.
     
    saurabh hirani, Jan 28, 2009
    #7

  8. > $| only effects the *currently selected* file handle.


    Thanks for pointing that out. I made a similar post on perlmonks and
    got this reply from the user ikegami:

    $| = 1; only affects STDOUT, not FILE. That accounts for the speed
    difference and the clobbering.
    Add FILE->autoflush(1); after the open statement. (Don't forget to use
    use IO::Handle;) to bring print inline with syswrite.
     
    saurabh hirani, Jan 28, 2009
    #8
  9. saurabh hirani <> wrote:
    >
    >> print does buffered output while syswrite is direct.  That
    >> is the reason for both the performance difference and the
    >> non-overlapping behavior.

    >
    > Does print do buffered output even after autoflush is on?



    Yes.

    This can be inferred by the name of the feature.

    Q: What is it that is being (auto)flushed?

    A: The buffer is being flushed.

    :)


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Jan 28, 2009
    #9
  10. saurabh hirani

    Uri Guttman Guest

    >>>>> "BaB" == Big and Blue <> writes:

    BaB> saurabh hirani wrote:
    >> Why shouldn't I? The faster my log method is, the earlier I return and
    >> get control back to my main programs.


    BaB> fork() an let the child do the logging. If you have multiple cores
    BaB> this will speed things up.

    logging is typically more i/o than cpu so forking is not a
    solution. would you fork for each log entry to be written? would you
    want the logger and main program to communicate? then you have another
    bottleneck. if you want the logging to be synchronous and in order you
    can't delegate it or fork it as you lose control then.

    to the OP:

    print vs syswrite is not a proper decision as they do different
    things. if you turn off buffering with autoflush then you reduce print
    to being syswrite but with more overhead. as for worrying about logging
    speed, that is a case of premature optimization. use a log module if you
    really care. worry about the important stuff, not nits like log speed

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Free Perl Training --- http://perlhunter.com/college.html ---------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Jan 29, 2009
    #10
  11. Uri Guttman wrote:
    >>>>>> "BaB" == Big and Blue <> writes:

    >
    > BaB> saurabh hirani wrote:
    > >> Why shouldn't I? The faster my log method is, the earlier I return and
    > >> get control back to my main programs.

    >
    > BaB> fork() an let the child do the logging. If you have multiple cores
    > BaB> this will speed things up.
    >
    > logging is typically more i/o than cpu so forking is not a
    > solution. would you fork for each log entry to be written? would you
    > want the logger and main program to communicate? then you have another
    > bottleneck. if you want the logging to be synchronous and in order you
    > can't delegate it or fork it as you lose control then.
    >
    > to the OP:
    >
    > print vs syswrite is not a proper decision as they do different
    > things. if you turn off buffering with autoflush then you reduce print
    > to being syswrite but with more overhead.


    You can't turn off buffering with autoflush.

    man 3 setvbuf
    man 3 fflush

    > as for worrying about logging
    > speed, that is a case of premature optimization. use a log module if you
    > really care. worry about the important stuff, not nits like log speed



    John
    --
    Those people who think they know everything are a great
    annoyance to those of us who do. -- Isaac Asimov
     
    John W. Krahn, Jan 29, 2009
    #11
  12. saurabh hirani

    Uri Guttman Guest

    >>>>> "JWK" == John W Krahn <> writes:

    JWK> Uri Guttman wrote:
    >>>>>>> "BaB" == Big and Blue <> writes:

    BaB> saurabh hirani wrote:
    >> >> Why shouldn't I? The faster my log method is, the earlier I return and
    >> >> get control back to my main programs.

    BaB> fork() an let the child do the logging. If you have
    >> multiple cores

    BaB> this will speed things up.
    >> logging is typically more i/o than cpu so forking is not a
    >> solution. would you fork for each log entry to be written? would you
    >> want the logger and main program to communicate? then you have another
    >> bottleneck. if you want the logging to be synchronous and in order you
    >> can't delegate it or fork it as you lose control then.
    >> to the OP:
    >> print vs syswrite is not a proper decision as they do different
    >> things. if you turn off buffering with autoflush then you reduce print
    >> to being syswrite but with more overhead.


    JWK> You can't turn off buffering with autoflush.

    JWK> man 3 setvbuf
    JWK> man 3 fflush

    i meant buffering many prints before flushing to the file. normal stdio
    does that but that is also the reason why log entries can cross each
    other when using print from different processes. (this doesn't matter if
    you only log from a single process). syswrite is atomic and can't screw
    up the log entries. it is slower since it does a full syscall each time
    whereas print will usually be fully buffered. turning on autoflush will
    make print act more like syswrite and flush its buffer with likely a
    single call to write (c's write which is perl's syswrite).

    the point is still that speed isn't the issue when you write logs,
    atomicity it. print can't guarantee it and syswrite can. it is a case of
    getting it to work correctly before worrying about speed.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Free Perl Training --- http://perlhunter.com/college.html ---------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Jan 29, 2009
    #12
  13. > print vs syswrite is not a proper decision as they do different
    > things. if you turn off buffering with autoflush then you reduce print
    > to being syswrite but with more overhead. as for worrying about logging
    > speed, that is a case of premature optimization. use a log module if you
    > really care. worry about the important stuff, not nits like log speed
    >


    Thanks a lot to all for replying. I appreciate your viewpoints.

    I did find that turning on autoflush on my file handle helps to
    prevent print buffering upto 4 K before writing to a log file. While
    I was writing this query, I was thinking about whether I am doing
    premature optimization by concentrating on these parts. But I am
    creating a customized log module and hence I wanted to know. I have
    ensured that even if 2 instances of the same program are running they
    will be logging to different log files. Atomicity was taken care of.
    After doing that I started thinking about the speed of logging. But
    looking back it does seem like a case of concentrating on laying
    bricks when I should be building a house. : ) Live and learn.



    > uri
    >
    > --
    > Uri Guttman  ------    --------  http://www.sysarch.com--
    > -----  Perl Code Review , Architecture, Development, Training, Support ------
    > --------- Free Perl Training ---http://perlhunter.com/college.html---------
    > ---------  Gourmet Hot Cocoa Mix  ----  http://bestfriendscocoa.com---------
     
    saurabh hirani, Jan 29, 2009
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Alan Davies
    Replies:
    7
    Views:
    211
    Alan Davies
    Nov 27, 2003
  2. Zach Dennis
    Replies:
    2
    Views:
    191
    Brian Schröder
    Aug 11, 2005
  3. Tom Gregus

    syswrite changes from 1.8 to 1.9

    Tom Gregus, Jul 22, 2010, in forum: Ruby
    Replies:
    2
    Views:
    122
    Tom Gregus
    Jul 22, 2010
  4. Ben Bullock
    Replies:
    11
    Views:
    506
    Peter J. Holzer
    May 20, 2006
  5. Robert Jacobson
    Replies:
    3
    Views:
    252
    Robert Jacobson
    Jan 11, 2008
Loading...

Share This Page