usage of size_t

Keith Thompson

Tim Rentsch said:
Beep! Once you allow undefined behavior, there can't also be a
guarantee that the "no large objects" condition is always observed.

Once you allow undefined behavior, there can be no guarantees of
anything whatsoever.

We can't make creation of an object bigger than SIZE_MAX bytes a
constraint violation in all cases, because there are ways to create
objects whose size isn't known at compile time. What I suggest above
is the best we can do if we want the standard to state explicitly
that objects bigger than SIZE_MAX bytes are not supported.
Here is the real problem. No matter what guarantees are made for
completely portable code (and no code with an object size larger
than 65535 is strictly conforming), these guarantees cannot be
made for extensions or extra-linguistic system calls, because
they operate (by definition) outside the bounds of what the
Standard prescribes. So even if the Standard seems to prohibit
objects larger than SIZE_MAX bytes, it can't actually
prevent them from coming into existence even in implementations
that are fully conforming.

Nor can it prevent demons from flying out of your nose.
The calloc() question seems like a non-issue to me, mostly
because I presume most people interpret the Standard as
meaning calloc() does its calculations using a (size_t)
type.

Do you mean that the multiplication is done using size_t?
If that were the case, then on a system with SIZE_MAX == 2**31-1,
calloc(641, 6700417) would allocate just 1 byte. But that
doesn't allocate "space for an array of _nmemb_ objects, each of
whose size is _size_", which is what calloc() is required to do.
(Some implementations do have this bug.)
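
A correct calloc() therefore has to detect the wrap-around and fail the
request rather than allocate a too-small object. A minimal sketch of the
idea, written as a plain malloc/memset wrapper (the name checked_calloc
is mine, purely for illustration):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    void *checked_calloc(size_t nmemb, size_t size)
    {
        /* Refuse the request if nmemb * size would exceed SIZE_MAX,
           instead of letting the multiplication wrap in size_t. */
        if (size != 0 && nmemb > SIZE_MAX / size) {
            return NULL;
        }
        void *p = malloc(nmemb * size);
        if (p != NULL) {
            memset(p, 0, nmemb * size);
        }
        return p;
    }
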
The problem is such a guarantee cannot be made without radically
changing the rules about what constitutes conforming implementations
(not counting impractical solutions like making size_t be 100
bytes or something).

I'm not concerned about the behavior of programs whose behavior
is undefined. I'm merely suggesting that the standard state more
explicitly that objects bigger than SIZE_MAX bytes are not supported.

There could also be a statement that pointer arithmetic invokes
undefined behavior if the result is more than SIZE_MAX bytes away
from the address of the object (that needs some wordsmithing).

[...]
If I want a collection of data larger than 65,535 bytes, that can be
done quite portably using a file (assuming the local file system
supports files that big, which I don't think is actually required,
even in a hosted implementation). But I certainly can imagine
scenarios when I want a regular object with more than SIZE_MAX
bytes or more than SIZE_MAX elements; I don't see any reason
to forbid a "small" implementation just because someone here
or there wants to use that implementation outside of its normal
envelope.

If I want a regular object bigger than SIZE_MAX bytes, that just means
that SIZE_MAX is too small (perhaps because I need a bigger computer).
 
Keith Thompson

Keith Thompson said:
Once you allow undefined behavior, there can be no guarantees of
anything whatsoever.

We can't make creation of an object bigger than SIZE_MAX bytes a
constraint violation in all cases, because there are ways to create
objects whose size isn't known at compile time. What I suggest above
is the best we can do if we want the standard to state explicitly
that objects bigger than SIZE_MAX bytes are not supported.
[...]

Something I thought of after I posted the above: It's about the
same as the current situation where int values can't exceed INT_MAX.
An implementation *can* support such values if it does so only for
programs whose behavior is undefined:

    int i = INT_MAX;
    i++;              /* undefined behavior */
    if (i > INT_MAX) {
        puts("The implementation seems to support "
             "int values greater than INT_MAX");
    }

Of course there's no guarantee that the implementation will do so
consistently. And if it does, it probably just means that INT_MAX
should have been defined with a bigger value.
 
Tim Rentsch

Keith Thompson said:
Once you allow undefined behavior, there can be no guarantees of
anything whatsoever.

We can't make creation of an object bigger than SIZE_MAX bytes a
constraint violation in all cases, because there are ways to create
objects whose size isn't known at compile time. What I suggest above
is the best we can do if we want the standard to state explicitly
that objects bigger than SIZE_MAX bytes are not supported.

Do you mean that the Standard state a requirement that implementations
that support objects bigger than SIZE_MAX are not conforming? If you
do mean that, do you see the problem with the conflict between that
and what happens on undefined behavior? If you don't mean that, what
do you mean?


[snip]
I'm not concerned about the behavior of programs whose behavior
is undefined. I'm merely suggesting that the standard state more
explicitly that objects bigger than SIZE_MAX bytes are not supported.

There could also be a statement that pointer arithmetic invokes
undefined behavior if the result is more than SIZE_MAX bytes away
from the address of the object (that needs some wordsmithing).

Do you see that there's an inherent conflict here? Adding
more cases of undefined behavior makes the language less
constrained, not more constrained. We can't impose additional
requirements by taking away some requirements.
If I want a regular object bigger than SIZE_MAX bytes, that just means
that SIZE_MAX is too small (perhaps because I need a bigger computer).

Or it means that you're running on an implementation that's
made a choice that you want to exceed occasionally. Like
wanting to do 128-bit arithmetic in an implementation
where uintmax_t is 64 bits.
 
Tim Rentsch

Keith Thompson said:
Keith Thompson said:
Once you allow undefined behavior, there can be no guarantees of
anything whatsoever.

We can't make creation of an object bigger than SIZE_MAX bytes a
constraint violation in all cases, because there are ways to create
objects whose size isn't known at compile time. What I suggest above
is the best we can do if we want the standard to state explicitly
that objects bigger than SIZE_MAX bytes are not supported.
[...]

Something I thought of after I posted the above: It's about the
same as the current situation where int values can't exceed INT_MAX.
An implementation *can* support such values if it does so only for
programs whose behavior is undefined:

    int i = INT_MAX;
    i++;              /* undefined behavior */
    if (i > INT_MAX) {
        puts("The implementation seems to support "
             "int values greater than INT_MAX");
    }

Yes, this analogy is quite similar to the SIZE_MAX case.
Of course there's no guarantee that the implementation will do so
consistently.

No guarantee in the Standard, but of course the implementation
could make such a guarantee.

And if it does, it probably just means that INT_MAX
should have been defined with a bigger value.

Does this mean you can think of no cases where it would be helpful
to have an implementation that held 'int' values that could cover
a range larger than INT_MIN to INT_MAX (and actually did
arithmetic over the larger range)? For example, a checking
implementation that is used to validate code against overflow? Or
as part of some secure coding development effort? Of course
usually we want integer types to use all the CHAR_BIT*sizeof(type)
bits available if possible, but it seems useful to have this
extra flexibility around in some cases.
 
Keith Thompson

Tim Rentsch said:
Do you mean that the Standard state a requirement that implementations
that support objects bigger than SIZE_MAX are not conforming? If you
do mean that, do you see the problem with the conflict between that
and what happens on undefined behavior? If you don't mean that, what
do you mean?

I mean that I'd like the standard to state that SIZE_MAX is an
upper bound on the maximum size of any object about as clearly as
it currently states that INT_MAX is the (least) upper bound on the
value of an object or expression of type int.

It's not possible to make this an absolute prohibition. What I'm
proposing is that creating objects bigger than SIZE_MAX bytes be made
a constraint violation when the size is known at compilation time,
and impossible for calloc() (which was never intended to be a way
to create such huge objects). For other cases where this can't be
determined at compile time, undefined behavior is the best we can do.

And by making the behavior undefined, we allow clever compilers to
reject code that exceeds the limit whenever they can figure it out
(i.e., in more cases than the standard can reasonably require).

The real point is not to impose an arbitrary restriction on the
sizes of objects; it's to encourage implementations to make SIZE_MAX
big enough.
[snip]
I'm not concerned about the behavior of programs whose behavior
is undefined. I'm merely suggesting that the standard state more
explicitly that objects bigger than SIZE_MAX bytes are not supported.

There could also be a statement that pointer arithmetic invokes
undefined behavior if the result is more than SIZE_MAX bytes away
from the address of the object (that needs some wordsmithing).

Do you see that there's an inherent conflict here? Adding
more cases of undefined behavior makes the language less
constrained, not more constrained. We can't impose additional
requirements by taking away some requirements.

A fair point, but given the goal of making SIZE_MAX big enough to
represent the size of any possible object, I can't think of a better
way to do it. If you don't share that goal, that's another matter.
Or it means that you're running on an implementation that's
made a choice that you want to exceed occasionally. Like
wanting to do 128-bit arithmetic in an implementation
where uintmax_t is 64 bits.

Then either you need to use something other than an integer type,
such as an extended-width library of some sort, or uintmax_t should
be 128 bits.

On most existing systems, SIZE_MAX is big enough to span the entire
address space anyway, and objects bigger than SIZE_MAX bytes just
aren't possible. (By itself, that's not a strong argument for
banning such objects.)

Either SIZE_MAX is an upper bound on the maximum size of any object,
or it's an upper bound except in a few unusual cases.

And we already have files as a mechanism for handling data sets
larger than any object.
 
Keith Thompson

Tim Rentsch said:
Keith Thompson said:
Keith Thompson said:
[...]
A new version of the standard *could* establish a new rule that no
object may exceed SIZE_MAX bytes. (It's been argued, unpersuasively
IMHO, that this rule is already implicit in the current standard.)
I think the following would be sufficient to establish this:

-- A non-VLA type whose size exceeds SIZE_MAX bytes is a constraint
violation.
-- A VLA type whose size exceeds SIZE_MAX bytes causes the program's
behavior to be undefined.

Beep! Once you allow undefined behavior, there can't also be a
guarantee that the "no large objects" condition is always observed.

Once you allow undefined behavior, there can be no guarantees of
anything whatsoever.

We can't make creation of an object bigger than SIZE_MAX bytes a
constraint violation in all cases, because there are ways to create
objects whose size isn't known at compile time. What I suggest above
is the best we can do if we want the standard to state explicitly
that objects bigger than SIZE_MAX bytes are not supported.
[...]

Something I thought of after I posted the above: It's about the
same as the current situation where int values can't exceed INT_MAX.
An implementation *can* support such values if it does so only for
programs whose behavior is undefined:

    int i = INT_MAX;
    i++;              /* undefined behavior */
    if (i > INT_MAX) {
        puts("The implementation seems to support "
             "int values greater than INT_MAX");
    }

Yes, this analogy is quite similar to the SIZE_MAX case.
Of course there's no guarantee that the implementation will do so
consistently.

No guarantee in the Standard, but of course the implementation
could make such a guarantee.

And if it does, it probably just means that INT_MAX
should have been defined with a bigger value.

Does this mean you can think of no cases where it would be helpful
to have an implementation that held 'int' values that could cover
a range larger than INT_MIN to INT_MAX (and actually did
arithmetic over the larger range)? For example, a checking
implementation that is used to validate code against overflow? Or
as part of some secure coding development effort? Of course
usually we want integer types to use all the CHAR_BIT*sizeof(type)
bits available if possible, but it seems useful to have this
extra flexibility around in some cases.

There's such a thing as too much flexibility. For any int object i, I
can safely assume that I <= INT_MAX (unless I've already invoked
undefined behavior). I wouldn't be comfortable with that guarantee
being taken away.
 
Kenny McCormack

Keith Thompson said:
There's such a thing as too much flexibility. For any int object i, I
can safely assume that I <= INT_MAX (unless I've already invoked

That's a syntax error (unless, of course, there is another object I
declared somewhere else in your program).

C is a case sensitive language.

--
(This discussion group is about C, ...)

Wrong. It is only OCCASIONALLY a discussion group
about C; mostly, like most "discussion" groups, it is
off-topic Rorschach revelations of the childhood
traumas of the participants...
 
Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
Keith Thompson said:
[...]
A new version of the standard *could* establish a new rule that no
object may exceed SIZE_MAX bytes. (It's been argued, unpersuasively
IMHO, that this rule is already implicit in the current standard.)
I think the following would be sufficient to establish this:

-- A non-VLA type whose size exceeds SIZE_MAX bytes is a constraint
violation.
-- A VLA type whose size exceeds SIZE_MAX bytes causes the program's
behavior to be undefined.

Beep! Once you allow undefined behavior, there can't also be a
guarantee that the "no large objects" condition is always observed.

Once you allow undefined behavior, there can be no guarantees of
anything whatsoever.

We can't make creation of an object bigger than SIZE_MAX bytes a
constraint violation in all cases, because there are ways to create
objects whose size isn't known at compile time. What I suggest above
is the best we can do if we want the standard to state explicitly
that objects bigger than SIZE_MAX bytes are not supported.
[...]

Something I thought of after I posted the above: It's about the
same as the current situation where int values can't exceed INT_MAX.
An implementation *can* support such values if it does so only for
programs whose behavior is undefined:

    int i = INT_MAX;
    i++;              /* undefined behavior */
    if (i > INT_MAX) {
        puts("The implementation seems to support "
             "int values greater than INT_MAX");
    }

Yes, this analogy is quite similar to the SIZE_MAX case.
Of course there's no guarantee that the implementation will do so
consistently.

No guarantee in the Standard, but of course the implementation
could make such a guarantee.

And if it does, it probably just means that INT_MAX
should have been defined with a bigger value.

Does this mean you can think of no cases where it would be helpful
to have an implementation that held 'int' values that could cover
a range larger than INT_MIN to INT_MAX (and actually did
arithmetic over the larger range)? For example, a checking
implementation that is used to validate code against overflow? Or
as part of some secure coding development effort? Of course
usually we want integer types to use all the CHAR_BIT*sizeof(type)
bits available if possible, but it seems useful to have this
extra flexibility around in some cases.

There's such a thing as too much flexibility. For any int object i, I
can safely assume that I <= INT_MAX (unless I've already invoked
undefined behavior). I wouldn't be comfortable with that guarantee
being taken away.

Now I'm not sure what concern is causing your objection. If
we have a type 'T' and integer constant expressions 'M' and 'N'
(these have been #define'd), and a declaration

T x[M][N] = {0};

then we already have 'sizeof x <= SIZE_MAX', again unless there
has been undefined behavior. I don't know why the case with
SIZE_MAX seems to bother you more than the case with INT_MAX
does.
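
(For what it's worth, a program can check this sort of thing for itself
at translation time. A hypothetical C11 sketch, with M, N, and T picked
only for illustration, and assuming the product itself fits in
uintmax_t:)

    #include <stdint.h>

    #define M 10000
    #define N 10000
    typedef int T;

    /* Since M, N, and sizeof(T) are all integer constant expressions,
       the total size can be compared against SIZE_MAX before the
       array is ever declared. */
    _Static_assert((uintmax_t)M * N * sizeof(T) <= SIZE_MAX,
                   "array would exceed SIZE_MAX bytes");

    T x[M][N] = {0};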
 
Tim Rentsch

Keith Thompson said:
Tim Rentsch said:
Keith Thompson said:
[...]
A new version of the standard *could* establish a new rule that no
object may exceed SIZE_MAX bytes. (It's been argued, unpersuasively
IMHO, that this rule is already implicit in the current standard.)
I think the following would be sufficient to establish this:

-- A non-VLA type whose size exceeds SIZE_MAX bytes is a constraint
violation.
-- A VLA type whose size exceeds SIZE_MAX bytes causes the program's
behavior to be undefined.

Beep! Once you allow undefined behavior, there can't also be a
guarantee that the "no large objects" condition is always observed.

Once you allow undefined behavior, there can be no guarantees of
anything whatsoever.

We can't make creation of an object bigger than SIZE_MAX bytes a
constraint violation in all cases, because there are ways to create
objects whose size isn't known at compile time. What I suggest above
is the best we can do if we want the standard to state explicitly
that objects bigger than SIZE_MAX bytes are not supported.

Do you mean that the Standard state a requirement that implementations
that support objects bigger than SIZE_MAX are not conforming? If you
do mean that, do you see the problem with the conflict between that
and what happens on undefined behavior? If you don't mean that, what
do you mean?

I mean that I'd like the standard to state that SIZE_MAX is an
upper bound on the maximum size of any object about as clearly as
it currently states that INT_MAX is the (least) upper bound on the
value of an object or expression of type int.

As far as I know both of these limits hold in the same
way -- it's possible to exceed them only after there
has been undefined behavior.
It's not possible to make this an absolute prohibition. What I'm
proposing is that creating objects bigger than SIZE_MAX bytes be made
a constraint violation when the size is known at compilation time,
and impossible for calloc() (which was never intended to be a way
to create such huge objects).

I think that's a mistake. Probably 999 times out of 1000 I'd
want the diagnostic but I think it's wrong to always require it
in any conforming compilation. What's wrong with leaving the
choice up to the implementation? Most implementations are
going to give a diagnostic by default (or maybe always) anyway.
For other cases where this can't be
determined at compile time, undefined behavior is the best we can do.

Not the best that a program can do -- program code can always check
that the total size being allocated doesn't exceed SIZE_MAX. Such
checks are pretty much exactly analogous to checks against overflow
in integer arithmetic.
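A minimal sketch of the two checks side by side (the function names
here are purely illustrative):

    #include <limits.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Would nmemb objects of the given size fit in a size_t? */
    bool size_fits(size_t nmemb, size_t size)
    {
        return size == 0 || nmemb <= SIZE_MAX / size;
    }

    /* The analogous pre-check for signed addition overflow. */
    bool add_fits(int a, int b)
    {
        return (b > 0) ? (a <= INT_MAX - b) : (a >= INT_MIN - b);
    }
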
And by making the behavior undefined, we allow clever compilers to
reject code that exceeds the limit whenever they can figure it out
(i.e., in more cases than the standard can reasonably require).

Or, other compilers to choose different behaviors that might appeal
to other developers.
The real point is not to impose an arbitrary restriction on the
sizes of objects; it's to encourage implementations to make SIZE_MAX
big enough.

It sounds like you're assuming that limiting objects to SIZE_MAX
bytes or arrays to SIZE_MAX extents (or encouraging implementations
to do so) is always desirable. I don't agree with that assumption.
[snip]
I just think that a clear statement that size_t can represent the
size of any object (because that's what size_t is for) makes for
a cleaner language.

The problem is such a guarantee cannot be made without radically
changing the rules about what constitutes conforming implementations
(not counting impractical solutions like making size_t be 100
bytes or something).

I'm not concerned about the behavior of programs whose behavior
is undefined. I'm merely suggesting that the standard state more
explicitly that objects bigger than SIZE_MAX bytes are not supported.

There could also be a statement that pointer arithmetic invokes
undefined behavior if the result is more than SIZE_MAX bytes away
from the address of the object (that needs some wordsmithing).

Do you see that there's an inherent conflict here? Adding
more cases of undefined behavior makes the language less
constrained, not more constrained. We can't impose additional
requirements by taking away some requirements.

A fair point, but given the goal of making SIZE_MAX big enough to
represent the size of any possible object, I can't think of a better
way to do it. If you don't share that goal, that's another matter.

I don't share the goal, but that's unrelated to the point I was
trying to make. The thing you want to do (and please excuse me
if I misrepresent your position, I don't mean to and am doing the
best I can not to) means putting stronger limits on what an
implementation can do (with regard to object sizes). Making _more_
things be undefined behavior reduces restrictions -- ie, it makes
the limits placed on implementations weaker, not stronger. Indeed,
if everything were undefined behavior then there would be no
limits at all -- anything could be a C compiler. So saying pointer
arithmetic of more than SIZE_MAX bytes is undefined behavior
gives implementations more freedom, not less; and, consequently,
makes programs less defined rather than more defined. It seems
like that direction is contrary to the direction you want to go.
Then either you need to use something other than an integer type,
such as an extended-width library of some sort, or uintmax_t should
be 128 bits.

Sorry, my comment was a little bit unclear here. What I mean
is uintmax_t has 64 _value_ bits (and I want it to have 64 value
bits for other reasons), but intmax_t/uintmax_t have 128 bits
of representation. Such an implementation is now conforming
(because of the undefined behavior loophole for signed integer
arithmetic). Do you think the Standard should be changed so
that such implementations _not_ be conforming? I can't think of
a good reason why the Standard should be changed to exclude
them.
On most existing systems, SIZE_MAX is big enough to span the entire
address space anyway, and objects bigger than SIZE_MAX bytes just
aren't possible. (By itself, that's not a strong argument for
banning such objects.)

Notice that implementations are free to ban such objects right now.
They are also free to allow such objects. What I think you're
saying is we should take away that second freedom and insist that
all (conforming) implementations ban such objects. I don't see any
reason to do that. I admit that most implementations won't choose
to exercise that right, but I think they should be free to do so
if that's what they want.
Either SIZE_MAX is an upper bound on the maximum size of any object,
or it's an upper bound except in a few unusual cases.

Certainly that's true in most (or maybe even all) implementations
right now. Even so, it's better to leave implementations the
freedom they have to make a different choice there.
And we already have files as a mechanism for handling data sets
larger than any object.

Unless you want to use indexing to access them. Also the statement
contains a hidden presumption, in the "larger than any object"
phrase. I'm not talking about dealing with objects larger than
objects, or even objects that are all that large necessarily. I'm
talking about dealing with objects larger than SIZE_MAX, which
doesn't have to be that big. The hidden presumption is that three
things -- SIZE_MAX, maximum object size, and usable address space
size -- are always going to be the same, or at least about the same.
I think that's a bad presumption because there are cases where it's
reasonable for any one of them to be different from the other two;
it seems like a mistake to insist that _all_ implementations avoid
some combinations of those design choices.
 
Keith Thompson

Tim Rentsch said:
As far as I know both of these limits hold in the same
way -- it's possible to exceed them only after there
has been undefined behavior.


I think that's a mistake. Probably 999 times out of 1000 I'd
want the diagnostic but I think it's wrong to always require it
in any conforming compilation. What's wrong with leaving the
choice up to the implementation? Most implementations are
going to give a diagnostic by default (or maybe always) anyway.

My personal opinion is that getting the diagnostic 1000 times out of
1000 is better than getting it 999 times out of a thousand.

Currently, if I'm writing a function that deals with an array of
unsigned char of arbitrary size, I can be only 99.9% sure that I can
safely use size_t to index it. I don't see that that 0.1% does
anybody any real good.

If I wanted to write code that creates objects bigger than SIZE_MAX
bytes, I could only use that code with an implementation that supports
such objects. I don't believe there currently are any such
implementations, so I can't write such code anyway. I propose
making that official.

And again, if it makes any sense to create objects bigger than
SIZE_MAX bytes, the problem is that the implementation chose too
small a value for SIZE_MAX.
Not the best that a program can do -- program code can always check
that the total size being allocated doesn't exceed SIZE_MAX. Such
checks are pretty much exactly analogous to checks against overflow
in integer arithmetic.


Or, other compilers to choose different behaviors that might appeal
to other developers.


It sounds like you're assuming that limiting objects to SIZE_MAX
bytes or arrays to SIZE_MAX extents (or encouraging implementations
to do so) is always desirable. I don't agree with that assumption.

Does it appeal to you? Are there any concrete non-hypothetical cases
where you really want to create objects bigger than SIZE_MAX bytes,
where having a bigger SIZE_MAX isn't the best solution?

Realistically, I might want to create an object bigger than 2**32
bytes on a system with 32-bit size_t -- but I can't, even if the C
implementation changed SIZE_MAX, because the underlying system just
doesn't support them. If I want to create such objects, I can either
use files rather than objects, or I can switch to a 64-bit system.

The above is limited to common current systems. Perhaps there are
(or could be) other systems with "exotic" memory models where
it makes more sense? Still, for all the cases I can think of,
the answer is to make SIZE_MAX bigger.

[...]
I don't share the goal, but that's unrelated to the point I was
trying to make. The thing you want to do (and please excuse me
if I misrepresent your position, I don't mean to and am doing the
best I can not to) means putting stronger limits on what an
implementation can do (with regard to object sizes). Making _more_
things be undefined behavior reduces restrictions -- ie, it makes
the limits placed on implementations weaker, not stronger. Indeed,
if everything were undefined behavior then there would be no
limits at all -- anything could be a C compiler. So saying pointer
arithmetic of more than SIZE_MAX bytes is undefined behavior
gives implementations more freedom, not less; and, consequently,
makes programs less defined rather than more defined. It seems
like that direction is contrary to the direction you want to go.

Yes, in principle adding more cases of undefined behavior gives more
freedom to the implementation.

I started with the assumption that objects should not be bigger than
SIZE_MAX bytes. (Why? Because that's what SIZE_MAX *means*, though
that's not currently its literal meaning.) Given this assumption,
I assert that implementations shouldn't support the creation or
manipulation of such objects, user code shouldn't attempt to create
such objects, and user code doesn't need to worry that other code
might have created such objects. (The last is perhaps the most
important goal.)

Given these assumptions, I conclude that the best we can do is to
make compile-time violations constraint violations, certain calloc()
calls fail by returning NULL, and other attempts undefined behavior.

There are different kinds of undefined behavior. Some cases are
things that are clearly errors that an implementation isn't required
to detect. In other cases, an implementation can reasonably define
and document a meaning, providing an extension. (The standard
doesn't make this distinction.) My intent is that the new cases
of UB are of the former kind, errors that needn't be diagnosed.
They're UB rather than diagnosable errors simply because it's not
practical to diagnose them. (If C had exceptions ...)

[...]
Sorry, my comment was a little bit unclear here. What I mean
is uintmax_t has 64 _value_ bits (and I want it to have 64 value
bits for other reasons), but intmax_t/uintmax_t have 128 bits
of representation. Such an implementation is now conforming
(because of the undefined behavior loophole for signed integer
arithmetic). Do you think the Standard should be changed so
that such implementations _not_ be conforming? I can't think of
a good reason why the Standard should be changed to exclude
them.

No, I don't think the Standard should be changed to exclude them.
In practice, though, it's almost certain either that 64-bit integers
only require 64 bits of storage, or 128-bit integers can use all
128 bits of storage. In the rare counterexamples presumably there
are good reasons for having 64 padding bits, and you can't use them
as value bits anyway.
Notice that implementations are free to ban such objects right now.
They are also free to allow such objects. What I think you're
saying is we should take away that second freedom and insist that
all (conforming) implementations ban such objects.

Yes, to the extent that that's possible. Implementations are
perfectly free to support arbitrarily large objects; I'm merely
suggesting that they should define SIZE_MAX appropriately to
reflect this.
I don't see any
reason to do that. I admit that most implementations won't choose
to exercise that right, but I think they should be free to do so
if that's what they want.

Then that's where we disagree.
Certainly that's true in most (or maybe even all) implementations
right now. Even so, it's better to leave implementations the
freedom they have to make a different choice there.
Unless you want to use indexing to access them. Also the statement
contains a hidden presumption, in the "larger than any object"
phrase. I'm not talking about dealing with objects larger than
objects, or even objects that are all that large necessarily. I'm
talking about dealing with objects larger than SIZE_MAX, which
doesn't have to be that big. The hidden presumption is that three
things -- SIZE_MAX, maximum object size, and usable address space
size -- are always going to be the same, or at least about the same.

Not quite. The maximum object size can be much smaller than the
usable address space size. For example it's entirely plausible that
an implementation with a 64-bit address space might only support
objects up to 2**32 bytes. I mentioned that on most current systems
SIZE_MAX is big enough to span the entire address space, but that
wasn't meant to be prescriptive.
I think that's a bad presumption because there are cases where it's
reasonable for any one of them to be different from the other two;
it seems like a mistake to insist that _all_ implementations avoid
some combinations of those design choices.

And what I'm suggesting is that SIZE_MAX should reflect (or be an
upper bound on) the maximum possible size of any object, because
that's what SIZE_MAX should mean.

Incidentally, there are several of your articles that I haven't
responded to, because it's going to require considerable time and
thought to do so. I've left them pending in my newsreader, and I
may or may not get around to responding to them eventually.
 
Tim Rentsch

Keith Thompson said:
My personal opinion is that getting the diagnostic 1000 times out of
1000 is better than getting it 999 times out of a thousand.

Yes, what you want is different from what I want. I get that.

If the Standard is left as is, so declarations resulting in sizes
larger than SIZE_MAX are not constraint violations, the chances are
_extremely_ high that you can and will still get all the diagnostics
you want; if, on the other hand, such cases are made constraint
violations, it's _guaranteed_ that conforming implementations will
give a (not always wanted) diagnostic. I know that you always want
it, but that's not true of everyone in all cases.

Currently, if I'm writing a function that deals with an array of
unsigned char of arbitrary size, I can be only 99.9% sure that I can
safely use size_t to index it. I don't see that that 0.1% does
anybody any real good.

This comment implicitly equates the 999/1000 pseudo-statistic (in effect
agreed to for the sake of discussion) with the percent chance that
size_t will work. That equality doesn't hold -- even if we grant the
pseudo-statistic, the actual chance that size_t will work is much
higher. You know that, right?

More importantly, unless size_t is required to be larger than the
address space of the machine in question (which seems like a bad
idea), there isn't any way to guarantee that using size_t will work in
all cases, _because code can obtain objects through extra-linguistic
means that don't have to obey implementation-enforced limits_.

If it's important to you to write code using indexing that will work
with any run-time object, including objects obtained through means not
under the implementation's control, there's nothing that says you have
to use size_t. There are other integer types with stronger guarantees
about how large a range they will cover -- use one of those instead.
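
For example (a sketch only; the function name is made up), unsigned
long long is guaranteed to hold values up to at least 2**64 - 1,
regardless of what the implementation chose for size_t:

    unsigned char xor_all(const unsigned char *buf, unsigned long long len)
    {
        unsigned char sum = 0;
        for (unsigned long long i = 0; i < len; i++) {
            sum ^= buf[i];   /* subscripting works with any integer type */
        }
        return sum;
    }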

If I wanted to write code that creates objects bigger than SIZE_MAX
bytes, I could only use that code with an implementation that supports
such objects.

This is sort of a silly statement. No one sets out to create
objects specifically larger than SIZE_MAX bytes; it happens
incidentally through other considerations. If someone wrote
code that had an array declaration like 'int a[10000][10000];',
that code could work _either_ with implementations of suitably
large SIZE_MAX, _or_ with implementations that allow objects
with size larger than SIZE_MAX.
I don't believe there currently are any such
implementations, so I can't write such code anyway. I propose
making that official.

The reasoning here seems bogus, based on the previous statement. Of
course you can write code that declares large arrays like those in the
above paragraph. What you want is to disallow implementations with
SIZE_MAX that is, say, 24 bits, from accepting them (with the obvious
meaning of "accepting"). If the implementation can run the program
and still be conforming in every other way, it makes more sense to
allow it than to disallow it.

And again, if it makes any sense to create objects bigger than
SIZE_MAX bytes, the problem is that the implementation chose too
small a value for SIZE_MAX.

I understand that that's your view. Other people have different
views.

Does it appeal to you? Are there any concrete non-hypothetical cases
where you really want to create objects bigger than SIZE_MAX bytes,
where having a bigger SIZE_MAX isn't the best solution?

I don't do much programming on embedded processors these days, but I
can easily imagine an embedded processor with a 16-bit or 24-bit word
size where in some cases it would make sense to have an object
larger than 16 or 24 bits worth of size.

Realistically, I might want to create an object bigger than 2**32
bytes on a system with 32-bit size_t -- but I can't, even if the C
implementation changed SIZE_MAX, because the underlying system just
doesn't support them. If I want to create such objects, I can either
use files rather than objects, or I can switch to a 64-bit system.

It sounds like you're assuming that an implementation with a 32-bit
size_t won't ever be running on a machine with a 64-bit address space.
Even if that's true today, I don't see any reason to assume it
must be true tomorrow.
The above is limited to common current systems. Perhaps there are
(or could be) other systems with "exotic" memory models where
it makes more sense? Still, for all the cases I can think of,
the answer is to make SIZE_MAX bigger.

Again, I understand that that's your view.

[...]
I don't share the goal, but that's unrelated to the point I was
trying to make. The thing you want to do (and please excuse me
if I misrepresent your position, I don't mean to and am doing the
best I can not to) means putting stronger limits on what an
implementation can do (with regard to object sizes). Making _more_
things be undefined behavior reduces restrictions -- ie, it makes
the limits placed on implementations weaker, not stronger. Indeed,
if everything were undefined behavior then there would be no
limits at all -- anything could be a C compiler. So saying pointer
arithmetic of more than SIZE_MAX bytes is undefined behavior
gives implementations more freedom, not less; and, consequently,
makes programs less defined rather than more defined. It seems
like that direction is contrary to the direction you want to go.

Yes, in principle adding more cases of undefined behavior gives more
freedom to the implementation.

I started with the assumption that objects should not be bigger than
SIZE_MAX bytes. (Why? Because that's what SIZE_MAX *means*, though
that's not currently its literal meaning.) Given this assumption,
I assert that implementations shouldn't support the creation or
manipulation of such objects, user code shouldn't attempt to create
such objects, and user code doesn't need to worry that other code
might have created such objects. (The last is perhaps the most
important goal.)

My assumptions are different, but more importantly my conclusions are
different. I believe implementations should have the freedom to
choose whether or not they accommodate objects larger than SIZE_MAX
bytes. If you want this choice to be implementation-defined, and thus
documented, I'm okay with that. If you want there to be additional
preprocessor symbols specified, so code can be adjusted for such
implementations using conditional compilation, I'm also okay with
that. And if you want to add a requirement that implementations must
provide an option that produces diagnostics for declarations larger
than SIZE_MAX bytes, I'm okay with that too. What I am not okay with
is saying that _no_ conforming implementation may _ever_ be allowed to
tolerate objects larger than SIZE_MAX under any circumstances. As far
as I can tell, that last thing is what you're proposing.

Given these assumptions, I conclude that the best we can do is to
make compile-time violations constraint violations, certain calloc()
calls fail by returning NULL, and other attempts undefined behavior.

Other people have different assumptions. What thought have
you given to exploring or discovering alternatives that
might let different viewpoints co-exist?

There are different kinds of undefined behavior. Some cases are
things that are clearly errors that an implementation isn't required
to detect. In other cases, an implementation can reasonably define
and document a meaning, providing an extension. (The standard
doesn't make this distinction.) My intent is that the new cases
of UB are of the former kind, errors that needn't be diagnosed.
They're UB rather than diagnosable errors simply because it's not
practical to diagnose them. (If C had exceptions ...)

I think I understand the distinction you're trying to make.
Assuming for the moment that I agree with your categorization
(and I'm not saying I don't only that I'm not sure), I don't
see how it makes any difference to (what I think is) the
basic issue.
[...]
Sorry, my comment was a little bit unclear here. What I mean
is uintmax_t has 64 _value_ bits (and I want it to have 64 value
bits for other reasons), but intmax_t/uintmax_t have 128 bits
of representation. Such an implementation is now conforming
(because of the undefined behavior loophole for signed integer
arithmetic). Do you think the Standard should be changed so
that such implementations _not_ be conforming? I can't think of
a good reason why the Standard should be changed to exclude
them.

No, I don't think the Standard should be changed to exclude them.
In practice, though, it's almost certain either that 64-bit integers
only require 64 bits of storage, or 128-bit integers can use all
128 bits of storage. In the rare counterexamples presumably there
are good reasons for having 64 padding bits, and you can't use them
as value bits anyway.

The point is that now implementations have the freedom to
exercise that choice. It's better to leave that freedom
in place, even if 999 times out of 1000 an implementation
will make a different choice for that.

Yes, to the extent that that's possible. Implementations are
perfectly free to support arbitrarily large objects; I'm merely
suggesting that they should define SIZE_MAX appropriately to
reflect this.

And I don't have any objection to making that statement as
a suggestion (or a Recommended Practice). I just don't think
it should be an irrevocable requirement.

Then that's where we disagree.



Not quite. The maximum object size can be much smaller than the
usable address space size. For example it's entirely plausible that
an implementation with a 64-bit address space might only support
objects up to 2**32 bytes. I mentioned that on most current systems
SIZE_MAX is big enough to span the entire address space, but that
wasn't meant to be prescriptive.

Once you allow implementations to have address spaces larger than
SIZE_MAX, you better be prepared to deal with objects larger than
SIZE_MAX, because there's no way the implementation can guarantee such
objects won't occur dynamically.

And what I'm suggesting is that SIZE_MAX should reflect (or be an
upper bound on) the maximum possible size of any object, because
that's what SIZE_MAX should mean.

I don't object to making that statement as a suggestion.
I do object to having it be an unconditional requirement.

Incidentally, there are several of your articles that I haven't
responded to, because it's going to require considerable time and
thought to do so. I've left them pending in my newsreader, and I
may or may not get around to responding to them eventually.

Well at least now I know they're thought provoking. :)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads

size_t, ssize_t and ptrdiff_t 56
size_t in inttypes.h 4
The problem with size_t 45
return -1 using size_t??? 44
Plauger, size_t and ptrdiff_t 26
size_t and ptr_diff_t 9
size_t 18
finding max value of size_t 22

Members online

Forum statistics

Threads
473,774
Messages
2,569,599
Members
45,177
Latest member
OrderGlucea
Top