Is syswrite faster or print?


saurabh hirani

Hi guys,

I am working on a logging module. While studying the existing Perl log
modules, I read that syswrite is useful for something like logging because
during its write operation it prevents someone else from writing to the
file at the same time. To verify that, I wrote a small Perl program:

#!/usr/bin/perl -w
use strict;
use Fcntl;
use Data::Dumper;
$| = 1;

sub testme {
    my ($flag, $data) = @_;
    if ($flag == 1) {
        sysopen(FILE, 'bangbang', O_WRONLY | O_APPEND | O_CREAT);
        while (1) {
            syswrite FILE, "$data\n";
        }
    } else {
        open(FILE, ">>bangbang");
        while (1) {
            print FILE "$data\n";
        }
    }
    close(FILE);
}

testme($ARGV[0], $ARGV[1]);

Testing both syswrite and print, I found that the $data printed to the file
gets corrupted when I run 2 instances of the same program together. But
using syswrite saves me from that: every line is intact and no line is a
mix of the 2 programs' data sets.
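
(As an aside: a quick way to check for that kind of interleaving, assuming,
as in the runs below, that the two instances write the literal strings
'first' and 'second', is to count every line that is neither. A minimal
sketch:)

#!/usr/bin/perl -w
# Illustrative sketch only: count lines in the log that are neither
# "first" nor "second", i.e. lines where the two writers interleaved.
use strict;

open(my $fh, '<', 'bangbang') or die "Can't open bangbang: $!";
my $corrupt = 0;
while (my $line = <$fh>) {
    chomp $line;
    $corrupt++ unless $line eq 'first' || $line eq 'second';
}
close($fh);
print "$corrupt corrupted line(s)\n";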

But what struck me more was the size of the files created when I ran
the following programs together:

1. ./program 1 first
2. ./program 1 second

This uses syswrite to write and checks whether 'first' and 'second'
clobber each other. During this run, I got the following stats:

1st run - 30 seconds - file size 40 MB
2nd run - 50 seconds - file size 61 MB

Now for the print run,

1. ./program 2 first
2. ./program 2 second

This uses print to write and checks whether 'first' and 'second'
clobber each other. During this run, I got the following stats:

1st run - 30 seconds - file size 315 MB
2nd run - 50 seconds - file size 586 MB

I was running in a VM, so I know my profiling wouldn't be very accurate.
But still, I had turned on autoflush for both the syswrite and the print
runs. And I read somewhere that for output greater than 3 KB syswrite is
supposed to be faster?

Is print really this much faster than syswrite? Is my testing flawed? Am I
missing something? What would you use if you had to write a logging module
in Perl - print or syswrite? And why?

Thanks for going through it.
 

saurabh hirani

A small but important addition: after every run I moved the file to
bangbang.print or bangbang.syswrite, so I got a new file for each run. I
did not append to existing ones.
 

smallpond

saurabh hirani said:
Is print really this much faster than syswrite? Is my testing flawed? Am I
missing something? What would you use if you had to write a logging module
in Perl - print or syswrite? And why?

print does buffered output while syswrite is direct. That
is the reason for both the performance difference and the
non-overlapping behavior.
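
(A minimal sketch of what that difference looks like in practice, assuming
nothing beyond core Perl and two scratch file names: the print output only
hits the disk when the buffer is flushed, while syswrite lands immediately.)

#!/usr/bin/perl -w
# Illustrative sketch: buffered print vs. direct syswrite.
use strict;
use Fcntl;

open(my $buffered, '>', 'buffered.tmp') or die "open: $!";
print {$buffered} "hello\n";
print "after print:    ", -s 'buffered.tmp', " bytes on disk\n"; # usually 0, still in the buffer
close($buffered);                                                # buffer flushed here
print "after close:    ", -s 'buffered.tmp', " bytes on disk\n"; # 6

sysopen(my $direct, 'direct.tmp', O_WRONLY | O_CREAT | O_TRUNC) or die "sysopen: $!";
syswrite $direct, "hello\n";
print "after syswrite: ", -s 'direct.tmp', " bytes on disk\n";   # 6, written immediately
close($direct);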

Why do you care about performance for log messages?
 

Xho Jingleheimerschmidt

saurabh said:
I was running in a VM, so I know my profiling wouldn't be very accurate.

If you are running on a VM, then why would profiling on the VM not be
accurate?

But still, I had turned on autoflush for both the syswrite and the print
runs.
Is print really this much faster than syswrite? Is my testing flawed? Am I
missing something?


In the code you showed us, you were only autoflushing STDOUT, which
is not the filehandle that either syswrite or print were writing to.

What would you use if you had to write a logging module in
Perl - print or syswrite? And why?

I'd probably just use one of the already existing ones.
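
(For instance, something along these lines with Log::Log4perl's easy mode;
the module choice and the log file name here are just an example, not
something from this thread.)

#!/usr/bin/perl
# Illustrative sketch: let an existing module handle the filehandle,
# levels and flushing instead of hand-rolled print/syswrite.
use strict;
use warnings;
use Log::Log4perl qw(:easy);

# Append INFO-and-above messages to a file (example name only).
Log::Log4perl->easy_init({ level => $INFO, file => '>> bangbang.log' });

INFO("logger started");     # written with a timestamp and level
DEBUG("not written");       # below the configured level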


Xho
 

saurabh hirani

print does buffered output while syswrite is direct.  That
is the reason for both the performance difference and the
non-overlapping behavior.

Does print do buffered output even after autoflush is on? I have added
$| = 1 at the beginning of my code.

Why do you care about performance for log messages?

Why shouldn't I? The faster my log method is, the sooner it returns control
to my main program.
 

Xho Jingleheimerschmidt

saurabh said:
Does print do buffered output even after autoflush is on? I have added
$| = 1 at the beginning of my code.

$| only affects the *currently selected* file handle.
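
(Concretely, with the bareword FILE handle from the posted code, the
old-style idiom would be roughly this sketch:)

my $old = select(FILE);   # make FILE the currently selected handle
$| = 1;                   # now this turns on autoflush for FILE
select($old);             # restore the previous handle (normally STDOUT)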

Xho
 

saurabh hirani

If you are running on a VM, then why would profiling on the VM not be
accurate?

I phrased that badly. I meant that since I was running in a VM, I may not
have enough horsepower to make my tests reliable, and load on the host
might slow the VM down at times and skew the program's speed.

I'd probably just use one of the already existing ones.

I didn't understand that.
 

saurabh hirani

$| only affects the *currently selected* file handle.

Thanks for pointing that out. I made a similar post on perlmonks and
got this reply from the user ikegami:

$| = 1; only affects STDOUT, not FILE. That accounts for the speed
difference and the clobbering.
Add FILE->autoflush(1); after the open statement (don't forget to add
use IO::Handle;) to bring print in line with syswrite.
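
(Applied to the print branch of the posted program, that change would look
roughly like this sketch:)

use IO::Handle;                       # provides the autoflush method

open(FILE, ">>bangbang") or die "Can't open bangbang: $!";
FILE->autoflush(1);                   # flush after every print to FILE
while (1) {
    print FILE "$data\n";             # $data as in testme()
}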
 

Tad J McClellan

saurabh hirani said:
Does print do buffered output even after autoflush is on?


Yes.

This can be inferred from the name of the feature.

Q: What is it that is being (auto)flushed?

A: The buffer is being flushed.

:)
 

Uri Guttman

BaB> fork() and let the child do the logging. If you have multiple cores
BaB> this will speed things up.

logging is typically more i/o than cpu so forking is not a
solution. would you fork for each log entry to be written? would you
want the logger and main program to communicate? then you have another
bottleneck. if you want the logging to be synchronous and in order you
can't delegate it or fork it, as you then lose control.

to the OP:

print vs syswrite is not a proper decision as they do different
things. if you turn off buffering with autoflush then you reduce print
to being syswrite but with more overhead. as for worrying about logging
speed, that is a case of premature optimization. use a log module if you
really care. worry about the important stuff, not nits like log speed

uri
 

John W. Krahn

Uri said:
print vs syswrite is not a proper decision as they do different
things. if you turn off buffering with autoflush then you reduce print
to being syswrite but with more overhead.

You can't turn off buffering with autoflush.

man 3 setvbuf
man 3 fflush


John
 

Uri Guttman

JWK> You can't turn off buffering with autoflush.

JWK> man 3 setvbuf
JWK> man 3 fflush

i meant buffering many prints before flushing to the file. normal stdio
does that but that is also the reason why log entries can cross each
other when using print from different processes. (this doesn't matter if
you only log from a single process). syswrite is atomic and can't screw
up the log entries. it is slower since it does a full syscall each time
whereas print will usually be fully buffered. turning on autoflush will
make print act more like syswrite and flush its buffer with likely a
single call to write (c's write which is perl's syswrite).
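
(a minimal sketch of that style, with an example file name: open once with
O_APPEND and issue one syswrite per complete log line, so concurrent
writers cannot interleave within an entry.)

#!/usr/bin/perl -w
# Illustrative sketch: one syswrite (one write(2) call) per log entry.
use strict;
use Fcntl;

sysopen(my $log, 'app.log', O_WRONLY | O_APPEND | O_CREAT)
    or die "Can't open app.log: $!";

sub log_line {
    my ($msg) = @_;
    syswrite($log, "$msg\n") or die "syswrite failed: $!";
}

log_line("first entry");
log_line("second entry");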

the point is still that speed isn't the issue when you write logs,
atomicity is. print can't guarantee it and syswrite can. it is a case of
getting it to work correctly before worrying about speed.

uri
 

saurabh hirani

print vs syswrite is not a proper decision as they do different
things. if you turn off buffering with autoflush then you reduce print
to being syswrite but with more overhead. as for worrying about logging
speed, that is a case of premature optimization. use a log module if you
really care. worry about the important stuff, not nits like log speed

Thanks a lot to all for replying. I appreciate your viewpoints.

I did find that turning on autoflush on my file handle prevents print from
buffering up to 4 KB before writing to the log file. While I was writing
this query, I was wondering whether I was doing premature optimization by
concentrating on these parts. But I am creating a customized log module and
hence I wanted to know. I have ensured that even if 2 instances of the same
program are running, they will log to different files, so atomicity was
taken care of. After doing that I started thinking about the speed of
logging. But looking back it does seem like a case of concentrating on
laying bricks when I should be building a house. :) Live and learn.
 
