ssize_t and size_t

kid joe · May 19, 2009

Hi all,

I was thinking about interfaces like this one for Unix read

ssize_t read(int fd, void *buf, size_t count);

(I know that read isn't an ISO C function, but the question has nothing to
do with read itself, just the function interface).

The return value is -1 in case of error, or an integer between 0 and count
giving the number of bytes read into buf.

This seems like a poor choice to me.... If I pass SIZE_MAX for count, then
there's no way of distinguishing between an error (-1) and a successful
read of ((size_t) -1) bytes.

Wouldn't it be better for the function to return an unsigned size_t and
notify the caller of errors in another way (eg an extra parameter)?

Cheers,
Joe

Antoninus Twink · May 20, 2009

kid joe said:
This seems like a poor choice to me.... If I pass SIZE_MAX for count,
then there's no way of distinguishing between an error (-1) and a
successful read of ((size_t) -1) bytes.

This is hardly likely to be a problem in practice - if you try to
allocate a buffer of size SIZE_MAX to pass to read(), it's rather likely
that malloc() will return a null pointer...

Nate Eldredge · May 20, 2009

kid joe said:
Hi all,

I was thinking about interfaces like this one for Unix read

ssize_t read(int fd, void *buf, size_t count);

(I know that read isn't an ISO C function, but the question has nothing to
do with read itself, just the function interface).

The return value is -1 in case of error, or an integer between 0 and count
giving the number of bytes read into buf.

This seems like a poor choice to me.... If I pass SIZE_MAX for count, then
there's no way of distinguishing between an error (-1) and a successful
read of ((size_t) -1) bytes.

There is, actually: set `errno' to 0 beforehand and check afterwards to
see if it is nonzero. Not the most convenient thing, but your only
option if you might pass SIZE_MAX as an argument.

However, it would be better not to do that at all, and that's usually the
official stance: don't do that. For instance, my system's man page for
`read' says that any value for `count' larger than INT_MAX is
erroneous, and read() returns -1 and sets errno to EINVAL. So if
`count' is SIZE_MAX (typically larger than INT_MAX), you know ahead of
time that the read() is going to fail.

You're right, though, it isn't an ideally designed interface. However,
it has the benefit of being simple.

Wouldn't it be better for the function to return an unsigned size_t and
notify the caller of errors in another way (eg an extra parameter)?

Probably. But the design of the interface is so historical that it
isn't likely to be changed at this point. When it was designed, I
presume all these arguments were expected to be `int', and sizes that
might not fit in a positive `int' were probably considered outlandishly
large (on 32-bit machines, at least).

Keith Thompson · May 20, 2009

kid joe said:
I was thinking about interfaces like this one for Unix read

ssize_t read(int fd, void *buf, size_t count);

(I know that read isn't an ISO C function, but the question has nothing to
do with read itself, just the function interface).

The return value is -1 in case of error, or an integer between 0 and count
giving the number of bytes read into buf.

This seems like a poor choice to me.... If I pass SIZE_MAX for count, then
there's no way of distinguishing between an error (-1) and a successful
read of ((size_t) -1) bytes.

Wouldn't it be better for the function to return an unsigned size_t and
notify the caller of errors in another way (eg an extra parameter)?

There's certainly a good case to be made for that.

On the other hand, both Unix and C have a long tradition of squeezing
error indications into results alongside useful data. Examples in
standard C include the time() function, which returns either the
current time or (time_t)-1, and getchar(), which returns either a
character value or EOF; both of these can cause problems.

The read function originally returned an int (source: K&R1 chapter 8,
The UNIX System Interface). It was probably just assumed that you'd
never try to read more than INT_MAX bytes at a time.

Today, on a 32-bit system, size_t and ssize_t are probably going to be
32 bits; you *might* want to read between 2 and 4 gigabytes in a
single operation, but it's not likely. On a 64-bit system, size_t and
ssize_t are probably going to be 64 bits; it's going to be a while
before single read operations between 8 and 16 exabytes become common,
or even possible. (Of course those aren't the only possibilities
consistent with the standard.)

Keith Thompson · May 20, 2009

Nate Eldredge said:
There is, actually: set `errno' to 0 beforehand and check afterwards to
see if it is nonzero. Not the most convenient thing, but your only
option if you might pass SIZE_MAX as an argument.

[...]

That doesn't work reliably. I'm not sure about read(), but generally
for functions that can set errno, you shouldn't check the value of
errno until the function actually tells you that there's been an error
(by whatever mechanism it uses).

For example, I've seen cases where an output routine checks whether
its output stream is a tty. The code that performs this check might
set errno as a side effect. The higher-level routine needn't set
errno to a meaningful value unless it fails.

So you need to set errno to 0, then call the routine, then check
whether it signals an error, and only then check the value of errno.
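
As a rough illustration of that sequence, here is a sketch using strtol() from standard C rather than read(), since strtol() has the same kind of ambiguous return value (LONG_MAX or LONG_MIN can be either a real result or an overflow marker). The helper name parse_long and its 0 / -1 return convention are made up for this example.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal long from text.
 * Returns 0 and stores the value in *out on success, -1 on failure. */
int parse_long(const char *text, long *out)
{
    char *end;
    long value;

    errno = 0;                       /* 1. clear errno before the call */
    value = strtol(text, &end, 10);  /* 2. call the routine */

    /* 3. look at errno only when the return value itself says
     *    "this might be an error"; 4. then let errno decide. */
    if ((value == LONG_MAX || value == LONG_MIN) && errno == ERANGE)
        return -1;                   /* out of range for long */

    if (end == text)
        return -1;                   /* no digits were converted */

    *out = value;
    return 0;
}
```

A nonzero errno after a call whose return value does not signal an error is ignored here, which is the point being made above.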

Nate Eldredge · May 20, 2009

pete said:
If ssize_t can represent SIZE_MAX,

then (-1 != (ssize_t)SIZE_MAX) is defined as true.

Typically it cannot. On all of the four different Unix systems I just
checked, size_t and ssize_t are unsigned and signed versions of the same
integer type (e.g. unsigned int vs signed int, unsigned long vs signed
long, uint64_t vs int64_t).
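
If you want to check this on your own system, something along these lines should do. It is only a sketch: it assumes a POSIX environment where <limits.h> provides SSIZE_MAX, and the function name ssize_can_hold_size_max is invented for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>     /* SSIZE_MAX (POSIX, not ISO C) */
#include <stdint.h>     /* SIZE_MAX, uintmax_t */
#include <sys/types.h>  /* ssize_t */

/* Hypothetical helper: returns 1 if every size_t value is representable
 * as an ssize_t, 0 otherwise. Both limits are converted to uintmax_t so
 * the comparison is done in a single unsigned type. */
int ssize_can_hold_size_max(void)
{
    return (uintmax_t)SSIZE_MAX >= (uintmax_t)SIZE_MAX;
}
```

On systems where the two types are the signed and unsigned versions of the same integer type, this returns 0, and converting SIZE_MAX to ssize_t is implementation-defined (commonly yielding -1).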

James Kuyper · May 20, 2009

kid joe said:
Hi all,

I was thinking about interfaces like this one for Unix read

ssize_t read(int fd, void *buf, size_t count);

(I know that read isn't an ISO C function, but the question has nothing to
do with read itself, just the function interface).

The return value is -1 in case of error, or an integer between 0 and count
giving the number of bytes read into buf.

This seems like a poor choice to me.... If I pass SIZE_MAX for count, then
there's no way of distinguishing between an error (-1) and a successful
read of ((size_t) -1) bytes.

The man page for read() on my desktop says "If count is greater than
SSIZE_MAX, the result is unspecified." ((size_t)-1) will generally be
greater than SSIZE_MAX, so you shouldn't even be attempting this.
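
One way to stay inside the specified range is to cap the request before calling read(). A minimal sketch, assuming a POSIX <limits.h> that defines SSIZE_MAX and that SSIZE_MAX fits in a size_t (true on the usual implementations); clamp_read_count is a made-up name:

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>   /* SSIZE_MAX (POSIX) */
#include <stddef.h>   /* size_t */

/* Hypothetical helper: limit a requested byte count to SSIZE_MAX so that
 * a later read(fd, buf, count) stays within the range whose result is
 * specified. Assumes SSIZE_MAX is representable as a size_t. */
size_t clamp_read_count(size_t count)
{
    const size_t limit = (size_t)SSIZE_MAX;
    return count > limit ? limit : count;
}
```

Since read() may return fewer bytes than requested anyway, a caller that loops until it has everything is not affected by the cap.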

Stephen Sprunk · May 21, 2009

Nate Eldredge said:
There is, actually: set `errno' to 0 beforehand and check afterwards to
see if it is nonzero. Not the most convenient thing, but your only
option if you might pass SIZE_MAX as an argument.

However, it would be better not to do that at all, and that's usually the
official stance: don't do that. For instance, my system's man page for
`read' says that any value for `count' larger than INT_MAX is
erroneous, and read() returns -1 and sets errno to EINVAL. So if
`count' is SIZE_MAX (typically larger than INT_MAX), you know ahead of
time that the read() is going to fail.

Aside: This is due to the fact that read() dates from C's "everything
is an int" days long ago. The "count" parameter was originally an int,
therefore it was not possible to pass a count larger than INT_MAX, and
therefore a negative (int) return value unquestionably meant an error.
However, when ANSI created size_t, the POSIX folks went back and changed
many of their types to size_t, which enabled passing larger values in
many cases because size_t was unsigned; however, the return value for
many other functions needed to accommodate negative values so it was
changed to ssize_t (which they invented for the purpose and is not part
of ANSI/ISO C).

The result is what Nate explains: you can't meaningfully pass a value
larger than SIZE_MAX/2 to read(), even though the parameter has type
size_t -- and most implementations will check for that case and
immediately return an error if you try it. Implementations with a
64-bit size_t and a 32-bit int may also disallow passing a count greater
than INT_MAX, even though that would be meaningful.

Any time you see a return value of ssize_t, expect this limitation to
rear its ugly head.

S

chunpulee · May 25, 2009

ssize_t read(int fd, void *buf, size_t count, int *err_code);
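
For what it's worth, an interface of that shape can be layered on top of the existing read() without changing it. The following is only a sketch under some assumptions: the name read_with_err is hypothetical, it returns an unsigned size_t as the original post suggested, it rejects counts above SSIZE_MAX itself with EINVAL, and it makes no attempt to retry on EINTR.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <limits.h>     /* SSIZE_MAX (POSIX) */
#include <stddef.h>
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* read */

/* Hypothetical wrapper, not a real POSIX function.
 * Returns the number of bytes read; errors are reported only through
 * *err_code (0 on success, an errno value on failure). */
size_t read_with_err(int fd, void *buf, size_t count, int *err_code)
{
    ssize_t n;

    if (count > (size_t)SSIZE_MAX) {
        *err_code = EINVAL;   /* this wrapper's own choice of error */
        return 0;
    }

    n = read(fd, buf, count);
    if (n < 0) {
        *err_code = errno;    /* read() signalled an error: errno is valid */
        return 0;
    }

    *err_code = 0;
    return (size_t)n;
}
```

The caller then tests err_code rather than the return value, so a return of 0 with err_code 0 is unambiguously end of file.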
