... I haven't used setvbuf before. what does it do?
The setvbuf() function is a Standard C function. Its action is a
bit overcomplicated due to its origins -- it came from a system
whose designers tended to write functions that served their immediate
needs, without ever thinking about generalization and abstraction.
If it had a real-world counterpart, it might be a device that would
both pick out a tie *and* choose an amount of money to tip the cab
driver, on the theory that the only reason anyone ever puts on a
tie is to go out, and everyone lives in New York City and always
takes a cab anywhere they go.
The first argument to setvbuf() is a stdio stream. This stream
must be one that was "freshly opened", i.e., has not had any input
or output performed on it yet. (The three standard streams are
valid candidates as long as you have done no I/O on them yourself,
i.e., the system must act as if there are no putchar() calls before
it initially calls main(), for instance.)
If the second argument is non-NULL, it must be the address of the
first element of an array of "char" whose size is given by the
fourth argument. Thus, for instance:
    char block[99];
    setvbuf(file, block, _IOFBF, sizeof block);
is a correct call (albeit odd, as 99 is probably not a very good
buffer size). (The array can actually be larger than the size you
specify, so:
    setvbuf(file, block, _IOFBF, 42);
is also valid in this case, but even weirder.)
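
Here is a minimal sketch of the caller-supplied-buffer case, assuming a hypothetical helper named write_with_own_buffer() (the name, the file handling, and the choice of BUFSIZ for the array are mine, not part of any standard interface). The one thing to watch is that the array must remain valid for as long as the stream is open, which is why it is static here:

```c
#include <stdio.h>

/*
 * Hypothetical helper: write `text` to a new file at `path`, using a
 * caller-supplied stdio buffer.  Returns 0 on success, -1 on failure.
 */
int write_with_own_buffer(const char *path, const char *text)
{
    /*
     * static, so the array outlives every use of the stream; the
     * buffer must stay valid until the stream is closed.  (This also
     * makes the helper non-reentrant; it is only an illustration.)
     */
    static char block[BUFSIZ];
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;

    /* Must come before any other operation on fp. */
    if (setvbuf(fp, block, _IOFBF, sizeof block) != 0) {
        fclose(fp);
        return -1;
    }
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    /* fclose() flushes the buffer; only after it may block be reused. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

Note that setvbuf() returns nonzero if it cannot honor the request, so the result is worth checking.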
The third argument must be one of the three macros:
    _IONBF
    _IOLBF
    _IOFBF
which stand for unbuffered, line-buffered, and fully-buffered
respectively. Normally you, the C programmer, must never use
identifiers beginning with an underscore followed by an uppercase
letter, but in this case, you *must* use them.
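
To see what the three modes do in practice, here is a rough sketch (the function names and the idea of peeking at the file size are my own illustration). It writes some text with a given mode and reports how many bytes have reached the file before the stream is flushed or closed. The answer is implementation-dependent: I would expect an unbuffered stream to show all the bytes, and a fully-buffered one to show fewer (often none) for short text.

```c
#include <stdio.h>

/* Return the current size in bytes of the file at `path`, or -1 on error. */
static long file_size(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long size;

    if (fp == NULL)
        return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) {
        fclose(fp);
        return -1;
    }
    size = ftell(fp);
    fclose(fp);
    return size;
}

/*
 * Hypothetical helper: write `text` to a new file at `path` using the
 * given mode (_IONBF, _IOLBF or _IOFBF) and return how many bytes are
 * in the file *before* the stream is flushed or closed, or -1 on error.
 */
long bytes_on_disk_before_flush(const char *path, int mode, const char *text)
{
    FILE *fp = fopen(path, "w");
    long visible;

    if (fp == NULL)
        return -1;

    /* Before any I/O on fp.  NULL and 0 leave the buffer to stdio. */
    if (setvbuf(fp, NULL, mode, 0) != 0) {
        fclose(fp);
        return -1;
    }
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    visible = file_size(path);      /* peek while fp is still open */
    if (fclose(fp) != 0)
        return -1;
    return visible;
}
```

Calling it once with each of the three macros on the same short string is a quick way to find out how your own stdio behaves.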
If the fourth argument is non-zero, it is a size you, the programmer,
are "suggesting" that the stdio routines use for the underlying
file. What non-zero number is good? Well, BUFSIZ is probably not
*bad*. (It is typically 512, 1024, 4096, 16384, or some other
power of two.) Unfortunately, since it is a #define for some
integer constant, it can only be optimal for some, not all, cases.
A good stdio should pick the best buffer size automatically.
Your best bet is (in my opinion) generally to pass NULL and 0 for
the second and fourth arguments; however, these are also OK:
    setvbuf(in, NULL, _IOFBF, BUFSIZ);
    setvbuf(out, NULL, _IOFBF, BUFSIZ);
as they will simply force the "in" and "out" streams to be
fully-buffered. Of course, if these two streams are connected
to anything other than an "interactive device", they should be
fully-buffered anyway. Hence, in a good stdio, on typical
files, these two calls should have no real effect, except
perhaps (if BUFSIZ is less than ideal) to make things run
more slowly.
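
For concreteness, here is a sketch of the "NULL and 0" form applied to a pair of streams like "in" and "out". The copy_file() helper and its 4096-byte chunk size are assumptions of mine for illustration; the only part that matters is where the setvbuf() calls sit, namely right after the fopen() calls and before any reading or writing.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy the file `src` to `dst`.
 * Returns the number of bytes copied, or -1 on any error.
 */
long copy_file(const char *src, const char *dst)
{
    char chunk[4096];               /* arbitrary transfer size */
    size_t n;
    long total = 0;
    FILE *in, *out;

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    /* Ask for full buffering; let stdio pick the buffer and its size. */
    if (setvbuf(in, NULL, _IOFBF, 0) != 0 ||
        setvbuf(out, NULL, _IOFBF, 0) != 0) {
        fclose(in);
        fclose(out);
        return -1;
    }

    while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
        if (fwrite(chunk, 1, n, out) != n) {
            fclose(in);
            fclose(out);
            return -1;
        }
        total += (long)n;
    }
    if (ferror(in)) {               /* loop ended on error, not EOF */
        fclose(in);
        fclose(out);
        return -1;
    }
    fclose(in);
    return fclose(out) == 0 ? total : -1;
}
```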
anyway, without setvbuf(), it resulted into 2.580000 seconds but
with setvbuf(), it resulted into 1.230000 seconds.
This suggests that there is something wrong (or at least "not so
good") in your stdio implementation. (But be wary of "testing
artifacts": if you run the same program, or several similar programs,
multiple times on the same files, they may produce very different
times on some runs. In particular, they may be much slower on the
first one, in which the system may have to cache the input file. Subsequent
runs can use the cached file, without ever bothering to read from
a disk file.)
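
One way to guard against such artifacts is to time the same read several times and disregard the first run. The following is only a sketch under my own assumptions: time_read() is a hypothetical helper, it reads with getc(), and it uses the C11 timespec_get() function for wall-clock time (so it needs a C11 library). Wall-clock time is used because time spent waiting for a disk may not show up as CPU time.

```c
#include <stdio.h>
#include <time.h>

/* Wall-clock time in seconds via C11 timespec_get(), or -1.0 on failure. */
static double now_seconds(void)
{
    struct timespec ts;

    if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical helper: read all of `path` one character at a time,
 * optionally calling setvbuf() first, and return the elapsed
 * wall-clock seconds, or -1.0 on error.
 */
double time_read(const char *path, int use_setvbuf)
{
    double start, end;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1.0;
    if (use_setvbuf && setvbuf(fp, NULL, _IOFBF, BUFSIZ) != 0) {
        fclose(fp);
        return -1.0;
    }

    start = now_seconds();
    while (getc(fp) != EOF)
        continue;
    end = now_seconds();

    if (ferror(fp) || start < 0.0 || end < 0.0) {
        fclose(fp);
        return -1.0;
    }
    fclose(fp);
    return end - start;
}
```

Call it once and throw the result away (that run may be paying for the disk read), then alternate time_read(path, 0) and time_read(path, 1) a few times and compare the later numbers.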
As I mentioned elsethread, one should always be suspicious of a
loop of this form. In this particular case, the code was OK only
if the input file has no errors. If you were to run it with input
directed to, e.g., a partly-erased floppy disk, it could loop
forever trying to read the bad part of the disk.
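
The loop under discussion is not reproduced here, so the following is not a corrected version of it; it is just a sketch of one read loop that stops on a read error as well as on end-of-file. The helper name count_bytes_in() is made up for the example. The point is that the loop tests the value returned by getc() itself, and then asks ferror() why the loop stopped.

```c
#include <stdio.h>

/*
 * Hypothetical helper: count the bytes in the file at `path`.
 * Returns the count, or -1 if the file cannot be opened or a read
 * error occurs partway through.
 */
long count_bytes_in(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long n = 0;
    int c;                  /* int, not char, so EOF is distinguishable */
    int failed;

    if (fp == NULL)
        return -1;

    /* getc() returns EOF on end-of-file *and* on error, so this ends. */
    while ((c = getc(fp)) != EOF)
        n++;

    failed = ferror(fp);    /* nonzero: stopped by an error, not EOF */
    fclose(fp);
    return failed ? -1 : n;
}
```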