fwrite output is not efficient (fast)?

Abubakar

Hi,

recently some C programmer told me that using the fwrite/fopen
functions is not efficient, because the output they produce is
actually buffered and reaches the file late. Is that true?

regards,

...ab
 
Richard Bos

Abubakar said:
recently some C programmer told me that using the fwrite/fopen
functions is not efficient, because the output they produce is
actually buffered and reaches the file late.

That most probably wasn't a C programmer, but a GNU-almost-POSIX-
with-a-bit-of-C-thrown-in-for-appearance's-sake programmer.
Is that true?

As phrased, it is not true. It may be true for some systems, under some
circumstances; but you should never worry about optimisation until you
_know_ that you have to, rather than merely think you might.

Richard
 
Nate Eldredge

Abubakar said:
Hi,

recently some C programmer told me that using the fwrite/fopen
functions is not efficient, because the output they produce is
actually buffered and reaches the file late. Is that true?

It's true that the output is buffered (usually). Depending on your
situation, that may be good or bad. If you are calling fwrite with
large chunks of data (many kbytes), the buffering will not gain you
much, but a sensible implementation will bypass it and you'll only have
a small amount of overhead. If you are writing small chunks or single
characters at a time (for instance, with fprintf() or putc()), the
buffering can speed things up drastically, since it cuts down on calls
to the operating system, which can be expensive.

It is true in this case that some data may not be written as soon as you
call fwrite(), but this doesn't slow down the data transfer in the long
run. If it's important for some reason that a small bit of data go out
immediately, you can call fflush() or use setvbuf() to change the
buffering mode, but at a cost to overall efficiency.
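
For illustration, here is a minimal sketch of the fflush() option
described above; the file name is made up for the example and error
handling is kept to a minimum:

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("log.txt", "w");   /* placeholder file name */
    if (fp == NULL)
        return 1;

    /* Ordinary buffered output: nothing need reach the OS yet. */
    fprintf(fp, "run started\n");

    /* If this particular line must go out right now, flush explicitly;
       everything written after it stays fully buffered as usual. */
    fflush(fp);

    fprintf(fp, "plenty of ordinary buffered output follows\n");

    fclose(fp);     /* flushes whatever is still buffered */
    return 0;
}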
 
s0suk3

Abubakar said:
Hi,

recently some C programmer told me that using the fwrite/fopen
functions is not efficient, because the output they produce is
actually buffered and reaches the file late. Is that true?

The purpose of buffering is actually efficiency; transferring a large
block of data to disk at once is faster than transferring it byte by
byte (and by this I don't mean that single-byte functions such as
getc() and putc() are inefficient; those functions also read from and
write to a buffer in memory).

This is usually true not only for disk files, but also for I/O devices
in general such as network sockets. In some applications, however, it
can be convenient to develop a more specialized I/O layer model that
can provide more functionality and improve efficiency in specific
situations.

(If you're just worried that the data will remain in the buffer
indefinitely, just call fflush() or close the file.)

Sebastian
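
As a rough sketch of the point above, both of the code paths below go
through the stdio buffer, so neither one turns into a system call per
byte; the buffer size and file name are arbitrary choices for the
example:

#include <stdio.h>
#include <string.h>

#define CHUNK (64 * 1024)   /* arbitrary size for the sketch */

int main(void)
{
    static char block[CHUNK];
    size_t i;
    FILE *fp = fopen("out.dat", "wb");  /* placeholder file name */
    if (fp == NULL)
        return 1;

    memset(block, 'x', sizeof block);

    /* One call handing stdio a large block; a sensible implementation
       passes most of it to the OS with little extra copying. */
    if (fwrite(block, 1, sizeof block, fp) != sizeof block)
        return 1;

    /* Byte-at-a-time output still goes through the stdio buffer, so
       it is not one system call per character either. */
    for (i = 0; i < sizeof block; i++)
        if (putc('y', fp) == EOF)
            return 1;

    fclose(fp);
    return 0;
}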
 
Abubakar

Well, he is saying that using the *native* read and write calls on
file descriptors to do the same task is much faster than fwrite etc.
He says that the FILE * used with fopen/fwrite buffers data and is not
as fast or as predictable as read and write are.

I have given him the link to this discussion, and hopefully he will be
posting his replies soon.
 
Abubakar

Well, he read the link, and now he says that the replies by Nate
Eldredge and the guy whose email starts with "s0s" prove that he is
right, so there is no need for him to reply to anything. Hmm, I don't
know what's up. I am going to post more questions to clear up the
things he has told me, because he has much more experience in C than I
do. Thanks for the replies so far; if you guys have more comments,
please keep posting.
 
James Kuyper

Abubakar said:
Hi,

recently some C programmer told me that using the fwrite/fopen
functions is not efficient, because the output they produce is
actually buffered and reaches the file late. Is that true?

It's actually backwards, in general. It would be closer to being
accurate to say that they are more efficient because they are buffered
(but even that statement isn't quite true). The purpose of buffering
the data is to take advantage of the fact that, in most cases, it's
more efficient to transfer many bytes at a time. When transferring a
large number of bytes, using unbuffered writes gets the first byte out
earlier than using buffered writes, but gets the last byte out later.

Which approach is better depends upon your application, but in many
contexts the earlier bytes can't (or at least won't) even be used until
after the later bytes have been written, which makes buffered I/O the
clear winner.

Furthermore, C gives you the option of turning off buffering with
setvbuf(). If you do, then the behavior should be quite similar to that
you would get from using system-specific unbuffered I/O functions (such
as write() on Unix-like systems).
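
A minimal sketch of that option; with buffering switched off, each
fwrite() call is handed to the operating system on its own, much as a
write() call would be (the file name is a placeholder):

#include <stdio.h>

int main(void)
{
    FILE *fp = fopen("data.bin", "wb");     /* placeholder file name */
    if (fp == NULL)
        return 1;

    /* setvbuf() must precede any other operation on the stream;
       _IONBF turns stdio buffering off for this stream. */
    if (setvbuf(fp, NULL, _IONBF, 0) != 0)
        return 1;

    fwrite("header\n", 1, 7, fp);   /* goes out without sitting in a buffer */

    fclose(fp);
    return 0;
}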
 
James Kuyper

Abubakar said:
Well, he is saying that using the *native* read and write calls on
file descriptors to do the same task is much faster than fwrite etc.
He says that the FILE * used with fopen/fwrite buffers data and is not
as fast or as predictable as read and write are.

fwrite() buffers data only if you don't tell it not to, by calling
setvbuf(). When talking about buffered I/O, he's right about the
predictability, if he's talking about predicting when data gets written
to the file. That depends upon the details of the buffering scheme,
which are in general unknown to the user.

However, what he's saying about speed is true only if he's mainly
concerned with the speed with which the first byte reaches a file. If
he's concerned with the speed with which the last byte reaches the file,
buffering is generally faster, at least with sufficiently large files.
 
Richard Bos

James Kuyper said:
fwrite() buffers data only if you don't tell it not to, by calling
setvbuf(). When talking about buffered I/O, he's right about the
predictability, if he's talking about predicting when data gets written
to the file.

Not even that. The OS may have its own buffers, and so may the drive
firmware. Bottom line, if you want absolute file security, you have to
nail it down to the hardware level. If you don't need that, 99+% of the
time, ISO C <stdio.h> functions are good enough.

Richard
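
For completeness, here is a Unix-specific sketch of what pushing the
data one level further down can look like; fsync() asks the OS to move
its own buffers toward the disk, although drive firmware may still
cache below that, and the file name is a placeholder:

/* POSIX-specific sketch, not ISO C */
#include <stdio.h>
#include <unistd.h>     /* fsync() */

int main(void)
{
    FILE *fp = fopen("important.dat", "w");     /* placeholder file name */
    if (fp == NULL)
        return 1;

    fprintf(fp, "data that must survive a crash\n");

    if (fflush(fp) == EOF)          /* stdio buffer -> operating system */
        return 1;
    if (fsync(fileno(fp)) == -1)    /* OS buffers -> the drive */
        return 1;

    fclose(fp);
    return 0;
}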
 
CBFalconer

Abubakar said:
Well, he is saying that using the *native* read and write calls on
file descriptors to do the same task is much faster than fwrite etc.
He says that the FILE * used with fopen/fwrite buffers data and is not
as fast or as predictable as read and write are.

I have given him the link to this discussion, and hopefully he will be
posting his replies soon.

Please do not top-post. Your answer belongs after (or intermixed
with) the quoted material to which you reply, after snipping all
irrelevant material. Your top-posting has lost all continuity from
the thread. See the following links:

<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/> (taming google)
<http://members.fortunecity.com/nnqweb/> (newusers)
 
Bartc

Nowadays it is very seldom worth the extra effort and loss of portability
of avoiding fwrite(). However, that wasn't always true. General-purpose
PCs used to be slower than modern machines by a factor of 1000, and back
then you had to squeeze every last drop of performance out of the machine
to make your games run fast enough.

But sometimes modern machines are being asked to do 1000 times more work.
Performance can still be an issue.
 
Flash Gordon

Malcolm McLean wrote, On 25/10/08 16:46:
But disk IO is less likely to be the bottleneck.

Except when it is. You know absolutely nothing about the type of program
this might be used for, and some programs are definitely disk bound even
with the fastest disk sub-systems.

Malcolm McLean also wrote:
My programs take up to two weeks to run on as many processors as I can
lay my hands on, however they only read and write a few kilobytes of
data.

So yours are not IO bound; that says nothing about the situations the OP
is concerned with.
 
Phil Carmody

Malcolm McLean said:
But disk IO is less likely to be the bottleneck. My programs take up
to two weeks to run on as many processors as I can lay my hands on,
however they only read and write a few kilobytes of data.

If processors increase in speed at a faster rate than I/O bandwidth
does, which happens to be the case, then I/O can do nothing apart from
become more of a bottleneck. If you, like me, run /embarrassingly
parallel/ code, then more times nothing is nothing, but we're in a very
fortunate minority.

Phil
 
Flash Gordon

Malcolm McLean wrote, On 25/10/08 21:56:
However data has got to mean something. 1000 times more processing power
doesn't necessarily mean 1000 times more data. For instance a database
with the address of every taxpayer in the country would comfortably sit
on my PC hard drive. However the cost of collecting and checking that
data would be several million pounds. The limit is the data itself, not
the machine power needed to process it.

I can tell you with 100% certainty that there *are* applications that
are running against very high end storage devices on high end servers
configured by people who really do know what they are doing where the
applications *are* I/O bound. I know because I have been sitting there
monitoring server performance seeing the processors bone-idle, the
memory mostly being used to cache data, and the I/O subsystems running
flat out. During certain processes (not performed very often) the
servers can be like this for a couple of days during which all users
have to be locked out of the system. Other tasks are scheduled as
overnight jobs so as not to kill the server performance for a few hours
during the day. Oh, and the code is mostly written in C although in
certain areas extensions and/or other languages are used.
 
Phil Carmody

Malcolm McLean said:
However data has got to mean something. 1000 times more processing
power doesn't necessarily mean 1000 times more data. For instance a
database with the address of every taxpayer in the country would
comfortably sit on my PC hard drive. However the cost of collecting
and checking that data would be several million pounds. The limit is
the data itself, not the machine power needed to process it.

So when you said "disk IO is less likely to be the bottleneck"
you were really trying to say "form-filling and paperwork is
the bottleneck"? Has one of a.f.c's droolers escaped?

Phil
 
Abubakar

He's right.

No, he's not. *Sometimes* it is faster to use write() instead
of fwrite(). Other times it's not. Here's the result of a
quick test:

$ time ./a.out 500000 > /dev/null # Use putc
real 0m0.015s
user 0m0.012s
sys 0m0.002s

$ time ./a.out 500000 USE_WRITE > /dev/null
real 0m0.891s
user 0m0.329s
sys 0m0.558s

(The code used to generate this is below.)

The interpretation of this is simple: using putc
improves performance precisely because of the buffering.
There is no doubt that using write() will often improve
the performance, but NOT if you are making lots of
small writes. When in doubt (and if there is an
actual, observed performance problem) you must
profile the process to determine the bottleneck.
If the problem is IO performance due to fwrite(),
it might be worthwhile to use write() instead.
Might be. Repeated again for emphasis. *Might* be.
If you invest the time re-writing the code to
use write(), you must verify that the performance
gain (or loss) is what you want. You may find
that you have improved throughput by .00001%. Or
maybe you have reduced it by 20%. There are many
managers who will take the 20% performance hit and
call it an improvement and give you a bonus. Take
the money, but find a more competent manager.

Here's the code used to generate the above timings:

/* Unix specific code */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write one character with an unbuffered POSIX write(). */
int
use_write( int c )
{
    int status = c;
    char C = c;
    if( write( STDOUT_FILENO, &C, 1 ) != 1 )
        status = EOF;

    return status;
}

/* Write one character through stdio's buffered putchar(). */
int
use_putc( int c )
{
    return putchar( c );
}

int
main( int argc, char **argv )
{
    unsigned count;
    int (*f)(int);

    /* Any second argument selects write(); the default is putc. */
    f = ( argc > 2 ) ? use_write : use_putc;
    count = ( argc > 1 ) ? strtoul( argv[ 1 ], NULL, 10 ) : BUFSIZ;

    for( ; count; count-- )
        if( f( 'y' ) != 'y' )
            break;

    return count ? EXIT_FAILURE : EXIT_SUCCESS;
}

Hey, thanks for the code. And thanks to all you guys for the
discussion; it was a lot of good information.
 
