fread 1 byte x N vs N bytes x 1

David Mathog

When reading a binary input stream with fread() one can
read N bytes in two ways :

count=fread(buffer,1,N,fin); /* N bytes at a time */

or

count=fread(buffer,N,1,fin); /* 1 buffer at a time */

I would assume the latter form would be faster, or at least
less of a load on the CPU. That's just an assumption though,
is it typically true?

Speed matters here, but the form which (I suspect) is faster can't
handle a partial input block. If the input file's size isn't a multiple
of N, the final read will report zero complete blocks, and fread()
provides no way of figuring out how many, if any, of the
data bytes in buffer are valid.

For the program I'm working on at the moment, when it's
reading from a tape drive I'm pretty much guaranteed that
the input is a multiple of the block size. If it isn't, then there
was some sort of read error, or maybe the wrong block size was
specified. However, when it's reading across the network the
input might be some odd size. If it is, it needs to be padded
out to the full block size before it is written to the tape drive.

As far as I can tell the only way to do that is with the first form.
Well, unless I put "dd" with padding turned on in a pipe upstream
from the program, but I'd rather handle this situation internally
within the one program.
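
A minimal sketch of handling this internally with the first form. The helper name read_block_padded and the pad_byte parameter are illustrative assumptions, not part of any library. It relies only on fread() returning a byte count when the element size is 1, so a short final block can be padded before it is written out:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read up to block_size bytes into buf.
 * If the stream ends partway through a block, the remainder of buf is
 * filled with pad_byte so the caller always has a full block to write.
 * Returns the number of data bytes actually read; 0 means end of file
 * or an error before any data arrived (check ferror(fin) to tell). */
size_t read_block_padded(unsigned char *buf, size_t block_size,
                         int pad_byte, FILE *fin)
{
    /* Element size 1, so the return value is a byte count (0..block_size). */
    size_t got = fread(buf, 1, block_size, fin);

    if (got > 0 && got < block_size)
        memset(buf + got, pad_byte, block_size - got);

    return got;
}
```

A caller would loop while the return value is nonzero, writing block_size bytes each time, and check ferror(fin) after the loop to separate a clean end of file from a read error.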

Thanks,

David Mathog
 
Ben Pfaff

David Mathog said:
When reading a binary input stream with fread() one can
read N bytes in two ways :

count=fread(buffer,1,N,fin); /* N bytes at a time */

or

count=fread(buffer,N,1,fin); /* 1 buffer at a time */

I would assume the latter form would be faster, or at least
less of a load on the CPU. That's just an assumption though,
is it typically true?

No. Typically fread will multiply the second and third arguments
together and read (at least) that many bytes. The two forms are
then equivalent.
 
Richard Heathfield

David Mathog said:
When reading a binary input stream with fread() one can
read N bytes in two ways :

count=fread(buffer,1,N,fin); /* N bytes at a time */

or

count=fread(buffer,N,1,fin); /* 1 buffer at a time */

I would assume the latter form would be faster, or at least
less of a load on the CPU. That's just an assumption though,
is it typically true?

I doubt it very much, because it's all buffered up anyway. Whichever way you
do it, the implementation is very likely to multiply the two middle
parameters, and do something like this:

bytestoread = size * nobj;
bytesread = _platform_specific_stream_reader(buffer, bytestoread, fin);
return bytesread / size;
 
Peter Nilsson

Ben said:
No. Typically fread will multiply the second and third arguments
together and read (at least) that many bytes. The two forms are
then equivalent.

There's still a significant difference in the return value,
particularly on end of file. There's also a subtle difference in the
contents of the buffer on a partial read.
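
To make the return-value difference concrete, here is a small sketch. The two helper names are made up for illustration. Each rewinds the stream and then issues one of the two calling forms, so both can be compared on the same short input:

```c
#include <stdio.h>

/* Hypothetical helper: read from the start of fin as n elements of 1 byte.
 * The result is the number of bytes read, anywhere from 0 to n. */
size_t count_as_bytes(FILE *fin, void *buf, size_t n)
{
    rewind(fin);
    return fread(buf, 1, n, fin);
}

/* Hypothetical helper: read from the start of fin as 1 element of n bytes.
 * The result is the number of complete elements read, so only 0 or 1. */
size_t count_as_block(FILE *fin, void *buf, size_t n)
{
    rewind(fin);
    return fread(buf, n, 1, fin);
}
```

On a stream holding 5 bytes with n set to 8, the first form returns 5 and the second returns 0, even though an implementation may well have copied the same 5 bytes into the buffer in both cases. In the second case the standard says the value of that partial element is indeterminate, so the program should not rely on it.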

But the reason that the parameters are typically multiplied is because
of the standard...

The fread function reads, into the array pointed to by ptr, up to
nmemb elements whose size is specified by size, from the stream
pointed to by stream. For each object, size calls are made to the
fgetc function and the results stored, in the order read, in an array
of unsigned char exactly overlaying the object. ... If a partial
element is read, its value is indeterminate.

So, fread must work even if the buffer is not suitably aligned for an
object of the given size. Because of that, plus the fact that I/O to
disc itself is typically _much_ slower than any post-I/O processing,
and standard I/O functions are synchronous, most implementations won't
bother looking for the very minor performance improvement of
transferring object-size (2nd parameter) chunks.

Aside: Note that fread is typically used on binary streams.
Unfortunately, the standard still allows for binary streams to have
trailing null bytes. That means that using fread to read an unknown
number of objects can fail, because fread can return a count that
isn't actually correct.

So, when using fread/fwrite, it can make life slightly easier if file
formats include a true count of the number of objects, stored before
the objects themselves. In which case, unless there is a particular
need to detect partially read objects, you may as well use the
'sizeof *p,N' form.
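
A rough sketch of such a count-prefixed layout, using double as the object type. The function names and the choice of unsigned long for the stored count are assumptions made for the example. The file holds native representations, so it is only readable on a machine with the same sizes and byte order:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical writer: store the object count, then the objects.
 * Returns 0 on success, -1 on a write error. */
int write_counted(FILE *out, const double *p, size_t n)
{
    unsigned long count = (unsigned long)n;

    if (fwrite(&count, sizeof count, 1, out) != 1)
        return -1;
    if (fwrite(p, sizeof *p, n, out) != n)
        return -1;
    return 0;
}

/* Hypothetical reader: read the count, then exactly that many objects
 * with the 'sizeof *p, N' form. Returns a malloc'd array and stores the
 * count in *n_out, or returns NULL on a short read or allocation failure. */
double *read_counted(FILE *in, size_t *n_out)
{
    unsigned long count;
    double *p;

    if (fread(&count, sizeof count, 1, in) != 1)
        return NULL;
    if (count > (size_t)-1 / sizeof *p)      /* guard the multiplication */
        return NULL;

    p = malloc(count ? count * sizeof *p : 1);
    if (p == NULL)
        return NULL;

    /* Trust the stored count rather than reading until fread comes up short. */
    if (fread(p, sizeof *p, count, in) != count) {
        free(p);
        return NULL;
    }
    *n_out = count;
    return p;
}
```

Because the reader knows how many objects to expect, any padding an implementation appends to the end of a binary stream is never mistaken for data.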
 
Richard Heathfield

Peter Nilsson said:
There's still a significant difference in the return value.

Yes, but I don't think Ben intended to claim otherwise. It seems to me that
his comments were directed only towards the performance aspects that the OP
was asking about.

<snip>
 
Ben Pfaff

Richard Heathfield said:
Peter Nilsson said:


Yes, but I don't think Ben intended to claim otherwise. It seems to me that
his comments were directed only towards the performance aspects that the OP
was asking about.

In context, I meant "equivalent in performance". But I should
have said so explicitly, to avoid confusion.
 
