usage of size_t


Keith Thompson

James said:
Is this 100% guaranteed to return NULL:

assert(! calloc(2, SIZE_MAX));

I don't think so. Note that I referred to "most implementations".
I think an implementation *may* allow the creation of objects
bigger than SIZE_MAX bytes. Note that size_t is defined as the type
of the result yielded by the sizeof operator; if the above calloc()
call succeeds, you can't apply sizeof to the resulting object --
unless you cast the result to char (*)[2][SIZE_MAX].

For that matter, unless you assume that sizeof *cannot* fail, this:

char huge[2][SIZE_MAX];

isn't guaranteed to be rejected.

My current opinion is that (a) an implementation is permitted to
allow the creation of objects bigger than SIZE_MAX bytes, (b) an
implementation is definitely not *required* to allow the creation
of such objects, and (c) most or all implementations do and should
disallow the creation of such objects, simply by making size_t big
enough to represent the size of any possible object.
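To make the question concrete, here's a tiny probe program (purely
illustrative; either outcome is permitted under the reading above):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Requests an object of 2 * SIZE_MAX bytes.  An implementation may
       satisfy this or return NULL; nothing forces either outcome. */
    void *p = calloc(2, SIZE_MAX);
    printf("calloc(2, SIZE_MAX) %s\n", p ? "succeeded" : "returned NULL");
    free(p);
    return 0;
}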
 

Phil Carmody

Certainly, but he wasn't a writer of English _for skilled readers_, at
least not in his better-known work. He wrote for beginners - and did
so well. But one mustn't confuse his skill with that of his audience.

It was also constrained writing; as such, harder to write than its
unconstrained equivalent.

Phil
 

lawrence.jones

Keith Thompson said:
My current opinion is that (a) an implementation is permitted to
allow the creation of objects bigger than SIZE_MAX bytes, (b) an
implementation is definitely not *required* to allow the creation
of such objects, and (c) most or all implementations do and should
disallow the creation of such objects, simply by making size_t big
enough to represent the size of any possible object.

From what I remember of the discussions, the sense of the committee is
that implementations should not allow the declaration of objects bigger
than SIZE_MAX bytes, but there's really no way to forbid it in the
standard.
 

Keith Thompson

From what I remember of the discussions, the sense of the committee is
that implementations should not allow the declaration of objects bigger
than SIZE_MAX bytes, but there's really no way to forbid it in the
standard.

Why couldn't it be a constraint violation (at least for non-VLAs)?
 

Tim Rentsch

Keith Thompson said:
I'm skeptical that that would even be possible in most
implementations. On the implementations I've seen, size_t is big
enough to span the machine's entire addressing space; an object
bigger than SIZE_MAX bytes isn't even possible.

I agree this case is more common. But the case where size_t is
smaller than the entire virtual address space also seems
reasonably plausible, especially as virtual address spaces get
bigger.
The only plausible scenario I can think of is an implementation
that deliberately restricts the size of an object for some reason.
If some external magic provides the address of a huge object, it's
not clear that a program could even index it (an attempt to do so
presumably would invoke undefined behavior). I suspect we're in
DS9K territory.

Not at all. Choosing a 32-bit size_t for 64-bit virtual address
space is a perfectly reasonable choice on most home-PC-class
machines. Very very few objects will have a size larger than 4 GB;
why make every single datum of type size_t pay an extra 4 bytes
that will basically never be used?
 

Tim Rentsch

Keith Thompson said:
Why couldn't it be a constraint violation (at least for non-VLAs)?

Of course it could be a constraint violation (since any condition
checkable at compile time could in principle be specified as a CV),
but it shouldn't be. The reason it shouldn't be is that constraint
violations all (I haven't checked this but I'm pretty sure) have
the property that they are not implementation-specific; a given
program source either has or does not have a constraint violation
no matter which implementation [*] it's compiled on. Making a
constraint violation depend on SIZE_MAX would invalidate that
invariant.

[*] Actually there is one exception to this, namely the particular
allowance for implementation-specific forms of constant expressions.
But I don't think that exception weakens the argument.
 

Ben Bacarisse

Tim Rentsch said:
Keith Thompson said:
Why couldn't it be a constraint violation (at least for non-VLAs)?

Of course it could be a constraint violation (since any condition
checkable at compile time could in principle be specified as a CV),
but it shouldn't be. The reason it shouldn't be is that constraint
violations all (I haven't checked this but I'm pretty sure) have
the property that they are not implementation-specific; a given
program source either has or does not have a constraint violation
no matter which implementation [*] it's compiled on. Making a
constraint violation depend on SIZE_MAX would invalidate that
invariant.

There are so many counter examples to this that I suspect I have
misunderstood your point. A few examples:

clock_t s = 2; int x = 1; x = x % s; /* 6.5.5 p2 */

struct s { int bf : 24; }; /* 7.2.1 p3 */

void f(fpos_t restrict p1, int *restrict p2); /* 6.7.3 p2 */

Directly analogous to the case being discussed is the constraint in
6.4.4 p2 that constants shall have values representable in the type of
the constant.
[*] Actually there is one exception to this, namely the particular
allowance for implementation-specific forms of constant expressions.
But I don't think that exception weakens the argument.

I don't think that is a fair exception since so many constraints
require constants. For example, is this:

(int [n]){0}

a CV or not? It depends on whether n is or is not a constant
expression.
 

Tim Rentsch

Ben Bacarisse said:
Tim Rentsch said:
Keith Thompson said:
(e-mail address removed) writes:
My current opinion is that (a) an implementation is permitted to
allow the creation of objects bigger than SIZE_MAX bytes, (b) an
implementation is definitely not *required* to allow the creation
of such objects, and (c) most or all implementations do and should
disallow the creation of such objects, simply by making size_t big
enough to represent the size of any possible object.

From what I remember of the discussions, the sense of the committee is
that implementations should not allow the declaration of objects bigger
than SIZE_MAX bytes, but there's really no way to forbid it in the
standard.

Why couldn't it be a constraint violation (at least for non-VLAs)?

Of course it could be a constraint violation (since any condition
checkable at compile time could in principle be specified as a CV),
but it shouldn't be. The reason it shouldn't be is that constraint
violations all (I haven't checked this but I'm pretty sure) have
the property that they are not implementation-specific; a given
program source either has or does not have a constraint violation
no matter which implementation [*] it's compiled on. Making a
constraint violation depend on SIZE_MAX would invalidate that
invariant.

There are so many counter examples to this that I suspect I have
misunderstood your point.

I don't think you did, but that's very kind of you to say...
A few examples:

clock_t s = 2; int x = 1; x = x % s; /* 6.5.5 p2 */

struct s { int bf : 24; }; /* 7.2.1 p3 */

void f(fpos_t restrict p1, int *restrict p2); /* 6.7.3 p2 */

Directly analogous to the case being discussed is the constraint in
6.4.4 p2 that constants shall have values representable in the type of
the constant.

A huge shotgun blast right through what I was suggesting!
That'll teach me to make a blanket statement without checking
first (as if I needed to be taught how to make such statements).
Thank you for blowing a huge hole in my thesis. (I might
discount the cases involving types from the Standard library,
but that still leaves the other cases. By the way, I think you
mean 6.7.2.1 p3 for the second case above.)

Trying to regroup... The constraints in the above cases are in
some sense necessary because there is no reasonable way to
make sense of what's being expressed otherwise. I contend
that having objects larger than SIZE_MAX doesn't fall in
that category, because there is a way to make sense of
such objects; it's only if we insist on using values
of type size_t to hold their sizes or their lengths that
we run into trouble.
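As a sketch of what I mean (sum_bytes is a made-up name, and it assumes
the implementation extension that created such an object also lets a
pointer walk all of it), keep the length in a type wider than size_t
and work in chunks that each fit in a size_t:

#include <stddef.h>
#include <stdint.h>

static unsigned long long sum_bytes(const unsigned char *p,
                                    unsigned long long len)
{
    unsigned long long total = 0;
    while (len > 0) {
        /* Each chunk is at most SIZE_MAX bytes, so size_t indexing is
           safe within it; the total length never has to fit in size_t. */
        size_t chunk = (len > SIZE_MAX) ? (size_t)SIZE_MAX : (size_t)len;
        for (size_t i = 0; i < chunk; i++)
            total += p[i];
        p += chunk;
        len -= chunk;
    }
    return total;
}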

[*] Actually there is one exception to this, namely the particular
allowance for implementation-specific forms of constant expressions.
But I don't think that exception weakens the argument.

I don't think that is a fair exception since so many constraints
require constants. For example, is this:

(int [n]){0}

a CV or not? It depends on whether n is or is not a constant
expression.

That's my point; here we have a possible CV that depends very
directly on an implementation-specific freedom. It's different from
the other examples you mention in that it takes something that
normally _would_ be a CV and makes it _not_ a CV, whereas for the
others there isn't any "normal" choice exactly (although in some
cases a CV can always be avoided by staying inside minimum limits).
It's because of this difference that this case seems to fall in a
different category than the objects-larger-than-SIZE_MAX question.
 

Keith Thompson

Tim Rentsch said:
Not at all. Choosing a 32-bit size_t for 64-bit virtual address
space is a perfectly reasonable choice on most home-PC-class
machines. Very very few objects will have a size larger than 4 GB;
why make every single datum of type size_t pay an extra 4 bytes
that will basically never be used?

Ok, not quite DS9K; maybe DS8K?

I've never seen a system with a 64-bit virtual address space (which
usually implies 64-bit pointers) and a 32-bit size_t. Of course
such systems could exist, but I don't think that's very likely.
A bit of googling indicates that Win64 has 32-bit longs, 64-bit
pointers, and 64-bit size_t. (Cue counterexample?)
 

Keith Thompson

Tim Rentsch said:
Trying to regroup... The constraints in the above cases are in
some sense necessary because there is no reasonable way to
make sense of what's being expressed otherwise. I contend
that having objects larger than SIZE_MAX doesn't fall in
that category, because there is a way to make sense of
such objects; it's only if we insist on using values
of type size_t to hold their sizes or their lengths that
we run into trouble.
[...]

The fundamental question is this: can the size of any object *always*
be represented as a size_t?

The standard doesn't currently answer this question, at least not
clearly. It says that size_t is the type of the result of sizeof,
but there are objects whose size cannot be (directly) determined
using sizeof. We can't use malloc() to create an object bigger than
SIZE_MAX bytes, because malloc()'s size parameter is of type size_t.
We *might* be able to create such an object with calloc(SIZE_MAX,
2); on the other hand, an implementation needn't support such calls
(calloc() can fail and return NULL), and I suspect the inventor(s)
of calloc() didn't intend it for that purpose.

A new version of the standard *could* establish a new rule that no
object may exceed SIZE_MAX bytes. (It's been argued, unpersuasively
IMHO, that this rule is already implicit in the current standard.)
I think the following would be sufficient to establish this:

-- A non-VLA type whose size exceeds SIZE_MAX bytes is a constraint
violation.
-- A VLA type whose size exceeds SIZE_MAX bytes causes the program's
behavior to be undefined.
-- A call to calloc() where the mathematical product of the two
arguments exceeds SIZE_MAX must return a null pointer.

Now the question is whether this would be a good idea, and that's
a matter of opinion. In my opinion, it would be. Implementations
could still support objects as large as they're able to; they'd just
have to define size_t appropriately. (All implementations I'm aware
of already do this.) If you create an object by calling calloc(),
you wouldn't have to worry about how to represent its size. If a
function takes a pointer to the first element of an array, you can
reliably use size_t to index it.

The alternative would be to permit objects bigger than SIZE_MAX
bytes, but such objects couldn't be created by malloc(), which
strikes me as an unduly arbitrary restriction. Usually, if I
want to create a really big object, malloc() is the way to do it.
Switching to calloc() because it can create a bigger object seems
silly, since it imposes the overhead of zeroing the object (which
could be significant for something that big).

I just think that a clear statement that size_t can represent the
size of any object (because that's what size_t is for) makes for
a cleaner language.

And if you want a collection of data larger than SIZE_MAX bytes,
you can always use a file.
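As an aside, code that wants the calloc() rule today can enforce it for
itself; a minimal sketch (checked_calloc is just an illustrative name):

#include <stdint.h>
#include <stdlib.h>

/* Refuse any request whose mathematical product exceeds SIZE_MAX,
   whatever the underlying calloc() would have done with it. */
static void *checked_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;
    return calloc(nmemb, size);
}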
 

Malcolm McLean

Why not just this:

    for(i=N-1; i != -1;  i--)

which works for any integer type, either signed or
unsigned, whose conversion rank is at least that of int.
i is an unsigned type, and we're terminating the loop when it doesn't
equal -1.

Anyone who doesn't know C like the back of his hand will curse you for
writing code like that.
 

Ersek, Laszlo

i is an unsigned type, and we're terminating the loop when it doesn't
equal -1.

Anyone who doesn't know C like the back of his hand will curse you for
writing code like that.

Integer promotions and integer conversions are very important. I think
with the stream of published security vulnerabilities due to integer
overflows they should start to become common knowledge (not their
details, but at least their existence). I agree that a bit more
verbosity would be helpful, as in "(type)-1".

--o--

Yet another question on integer overflow. This was bugging me for some
time.

When *converting* an already existing value (ie. the result of an
evaluated expression) to a signed integer type, so that the value to
convert is not representable by the target type, the result is
implementation-defined (in C89), or the result is implementation-defined
or an implementation-defined signal is raised (in C99).

However, when *computing* such a value by arithmetic operators, if the
result cannot be produced at all, as represented in the signed integer
"target type", selected by the individual operators, then the behavior
is undefined.

Suppose "int" has one sign bit, 31 value bits, no padding bits; and that
"long" has one sign bit, 63 value bits, no padding bits. Then

int i = (long)INT_MAX + INT_MAX;

initializes "i" to an implementation-defined value (or raises an
implementation-defined signal, under C99), while

int i = INT_MAX + INT_MAX;

is undefined behavior (in the evaluation of the additive operator).

For unsigned integer types, both categories (ie. conversion and the
arithmetic operators) define reduction modulo (maxval+1) for such cases.

/* conversion, and suppose no overflow in "long unsigned" */
unsigned u1 = (long unsigned)UINT_MAX + UINT_MAX;

/* overflow */
unsigned u2 = UINT_MAX + UINT_MAX;

Both variables shall be initialized to "UINT_MAX - 1u".


Is this correct? I was (am) confused about the distinction between
implementation-defined and undefined behavior in case of the signed
integer types.

Thanks,
lacos
 

Malcolm McLean

Is this correct? I was (am) confused about the distinction between
implementation-defined and undefined behavior in case of the signed
integer types.
The result of overflowing a signed integer is undefined.
In practice this means that on some machines you will get an overflow
to numbers of the opposite sign, whilst on others a signal will be
raised and the program will terminate with an error message. However
the behaviour is "undefined" rather than "implementation defined" so
that compilers have more freedom to optimise code without worrying
about the consequences of overflow - basically the program may do
anything if you overflow a signed integer.
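The usual defence is to test the operands before the operation rather
than inspecting the result afterwards; a minimal sketch for int addition:

#include <limits.h>

/* Nonzero if a + b would fall outside [INT_MIN, INT_MAX]; the test
   itself never overflows. */
static int add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}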
 

Tim Rentsch

Malcolm McLean said:
i is an unsigned type, and we're terminating the loop when it doesn't
equal -1.

Anyone who doesn't know C like the back of his hand will curse you for
writing code like that.

They won't curse me, because I don't write such code. Among
other things, it transgresses the warning criteria for comparing
signed and unsigned operands. My point was that it is more
robust, in terms of possible changes in type, to omit the cast
and compare against just -1.

A better way to write the condition is 'i + 1 != 0', which avoids
both the signed/unsigned comparison warning and the above criticism
about loop termination. Furthermore, on writing this for loop as

for(i=N-1; i + 1 != 0; i--)

it is immediately obvious that it could be rewritten as

i = N;
while(--i + 1 != 0)

or more simply just as

i = N;
while(i-- != 0)

which is how it should have been written in the first
place.
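Spelled out as a complete function, purely as an illustration
(zero_backwards is a made-up name):

#include <stddef.h>

/* Visits a[n-1] down to a[0]; i takes the values n-1, n-2, ..., 0 and
   the loop stops without ever comparing the index against -1. */
static void zero_backwards(double *a, size_t n)
{
    size_t i = n;
    while (i-- != 0)
        a[i] = 0.0;
}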
 

Tim Rentsch

Integer promotions and integer conversions are very important. I think
with the stream of published security vulnerabilities due to integer
overflows they should start to become common knowledge (not their
details, but at least their existence). I agree that a bit more
verbosity would be helpful, as in "(type)-1".

--o--

Yet another question on integer overflow. This was bugging me for some
time.

When *converting* an already existing value (ie. the result of an
evaluated expression) to a signed integer type, so that the value to
convert is not representable by the target type, the result is
implementation-defined (in C89), or the result is implementation-defined
or an implementation-defined signal is raised (in C99).

However, when *computing* such a value by arithmetic operators, if the
result cannot be produced at all, as represented in the signed integer
"target type", selected by the individual operators, then the behavior
is undefined.

Suppose "int" has one sign bit, 31 value bits, no padding bits; and that
"long" has one sign bit, 63 value bits, no padding bits. Then

int i = (long)INT_MAX + INT_MAX;

initializes "i" to an implementation-defined value (or raises an
implementation-defined signal, under C99), while

int i = INT_MAX + INT_MAX;

is undefined behavior (in the evaluation of the additive operator).

For unsigned integer types, both categories (ie. conversion and the
arithmetic operators) define reduction modulo (maxval+1) for such cases.

/* conversion, and suppose no overflow in "long unsigned" */
unsigned u1 = (long unsigned)UINT_MAX + UINT_MAX;

/* overflow */
unsigned u2 = UINT_MAX + UINT_MAX;

Both variables shall be initialized to "UINT_MAX - 1u".


Is this correct? I was (am) confused about the distinction between
implementation-defined and undefined behavior in case of the signed
integer types.

Yes, your analysis is exactly correct as far as I can see,
except that the '/* overflow */' comment is right only
informally, i.e., the meaning is a little different from
how the Standard uses the term 'overflow'.

In particular, _computing_ an out-of-range value in a signed
type is undefined behavior, but _converting_ an out-of-range
value to a signed type is implementation-defined behavior
(or an ID signal as you mention).

I think there also are cases involving converting floating
point values to both signed and unsigned types that are
undefined behavior, but I haven't taken the time to look
these up.
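One common way to sidestep the implementation-defined conversion
entirely is to range-check first; a small illustrative sketch
(to_int_or_neg1 is a made-up name):

#include <limits.h>
#include <stddef.h>

/* Converts only when the value is representable as an int, so the
   implementation-defined out-of-range conversion never takes place. */
static int to_int_or_neg1(size_t v)
{
    return (v <= (size_t)INT_MAX) ? (int)v : -1;
}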
 

Ersek, Laszlo

Tim Rentsch said:
In particular, _computing_ an out-of-range value in a signed
type is undefined behavior, but _converting_ an out-of-range
value to a signed type is implementation-defined behavior
(or an ID signal as you mention).

Thanks!
lacos
 

Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
Trying to regroup... The constraints in the above cases are in
some sense necessary because there is no reasonable way to
make sense of what's being expressed otherwise. I contend
that having objects larger than SIZE_MAX doesn't fall in
that category, because there is a way to make sense of
such objects; it's only if we insist on using values
of type size_t to hold their sizes or their lengths that
we run into trouble.
[...]

The fundamental question is this: can the size of any object *always*
be represented as a size_t?

The standard doesn't currently answer this question, at least not
clearly. It says that size_t is the type of the result of sizeof,
but there are objects whose size cannot be (directly) determined
using sizeof. We can't use malloc() to create an object bigger than
SIZE_MAX bytes, because malloc()'s size parameter is of type size_t.
We *might* be able to create such an object with calloc(SIZE_MAX,
2); on the other hand, an implementation needn't support such calls
(calloc() can fail and return NULL), and I suspect the inventor(s)
of calloc() didn't intend it for that purpose.

It's also possible that large objects could be created using
non-portable system-specific calls.
A new version of the standard *could* establish a new rule that no
object may exceed SIZE_MAX bytes. (It's been argued, unpersuasively
IMHO, that this rule is already implicit in the current standard.)
I think the following would be sufficient to establish this:

-- A non-VLA type whose size exceeds SIZE_MAX bytes is a constraint
violation.
-- A VLA type whose size exceeds SIZE_MAX bytes causes the program's
behavior to be undefined.

Beep! Once you allow undefined behavior, there can't also be a
guarantee that the "no large objects" condition is always observed.
-- A call to calloc() where the mathematical product of the two
arguments exceeds SIZE_MAX must return a null pointer.

Now the question is whether this would be a good idea, and that's
a matter of opinion. In my opinion, it would be. Implementations
could still support objects as large as they're able to; they'd just
have to define size_t appropriately. (All implementations I'm aware
of already do this.) If you create an object by calling calloc(),
you wouldn't have to worry about how to represent its size. If a
function takes a pointer to the first element of an array, you can
reliably use size_t to index it.

Here is the real problem. No matter what guarantees are made for
completely portable code (and no code with an object size larger
than 65535 bytes is strictly conforming), these guarantees cannot be
made for extensions or extra-linguistic system calls, because
they operate (by definition) outside the bounds of what the
Standard prescribes. So even if the Standard seems to prohibit
objects larger than SIZE_MAX bytes, it can't actually
prevent them from coming into existence even in implementations
that are fully conforming.
The alternative would be to permit objects bigger than SIZE_MAX
bytes, but such objects couldn't be created by malloc(), which
strikes me as an unduly arbitrary restriction. Usually, if I
want to create a really big object, malloc() is the way to do it.
Switching to calloc() because it can create a bigger object seems
silly, since it imposes the overhead of zeroing the object (which
could be significant for something that big).

The calloc() question seems like a non-issue to me, mostly
because I presume most people interpret the Standard as
meaning calloc() does its calculations using a (size_t)
type.
I just think that a clear statement that size_t can represent the
size of any object (because that's what size_t is for) makes for
a cleaner language.

The problem is such a guarantee cannot be made without radically
changing the rules about what constitutes conforming implementations
(not counting impractical solutions like making size_t be 100
bytes or something). Given that, it seems better to accept
the possibility of overly large objects, and either ignore it or
take it into account, depending on one's own predilections.
IMO it's almost always a mistake to use 'size_t' commonly
in regular code; better to use appropriate typedef's.
And if you want a collection of data larger than SIZE_MAX bytes,
you can always use a file.

If I want a collection of data larger than 65,535 bytes, that can be
done quite portably using a file (assuming the local file system
supports files that big, which I don't think is actually required,
even in a hosted implementation). But I certainly can imagine
scenarios when I want a regular object with more than SIZE_MAX
bytes or more than SIZE_MAX elements; I don't see any reason
to forbid a "small" implementation just because someone here
or there wants to use that implementation outside of its normal
envelope.
 
