Null terminated strings: bad or good?

CBFalconer · Jan 8, 2009

Keith said:
CBFalconer said:

Keith Thompson wrote:
... snip ...

In fact, the implementations I've seen do reject it (i.e., issue
a diagnostic and fail to process the translation unit), and I
believe that's the only reasonable behavior. But I don't see how
to justify it based on the normative wording of the standard.


See my earlier answer, which was:

How about this. No exceptions are mentioned, thus it covers all.

6.5.3.4 The sizeof operator

... snip ...

Semantics

[#2] The sizeof operator yields the size (in bytes) of its
operand, which may be an expression or the parenthesized
name of a type. The size is determined from the type of the
operand. The result is an integer. If the type of the
operand is a variable length array type, the operand is
evaluated; otherwise, the operand is not evaluated and the
result is an integer constant.


How does this address my point?

Given the expression:

sizeof (char[SIZE_MAX][2])

the quoted definition for sizeof doesn't imply a constraint
violation or other error. It implies a contradiction *in the
standard*.

sizeof measures the size of objects, or the type definition that
will be used in constructing an object. Notice the first sentence
of para. 4 below.

6.2.4 Storage durations of objects

.... snip ...

[#4] For such an object that does not have a variable length
array type, storage is guaranteed to be reserved for a new
instance of the object on each entry into the block with
which it is associated; the initial value of the object is
indeterminate. If an initialization is specified for the
object, it is performed each time the declaration is reached
in the execution of the block; otherwise, the value becomes
indeterminate each time the declaration is reached. Storage
for the object is no longer guaranteed to be reserved when
execution of the block ends in any way. (Entering an
enclosed block or calling a function suspends, but does not
end, execution of the current block.)

CBFalconer · Jan 8, 2009

Keith said:
.... snip ...

Nobody has missed that; you've been claiming it repeatedly. What
we're missing is your demonstration that it's a correct statement.

I assert that, in a conforming implementation, calloc(SIZE_MAX, 2)
may return a non-null pointer which points to the beginning of an
anonymous object whose size is SIZE_MAX*2 bytes. In my opinion, the
standard should be modified so that this is *not* possible, i.e., so
that an object larger than SIZE_MAX bytes is disallowed.

No. Remember that these things are all measured and specified by a
size_t variable. size_t is an unsigned integer, and follows the
wraparound rules for unsigned. Multiplying a size_t by a size_t
cannot ever result in a value larger than SIZE_MAX. It can result
in a much smaller value.

CBFalconer · Jan 8, 2009

Flash said:
CBFalconer wrote:
.... snip ...

I had not missed you claiming this. However, as was pointed out
last time you claimed this there is nothing in the C standard
saying it is an error *if* the implementation succeeds in
allocating a block of the correct size and returns a pointer to
it. Your quote of the definition of sizeof does not support your
claim for this since you cannot apply sizeof to the block.

See my reply to Keith Thompson. You can't legally get calloc to
create anything larger than SIZE_MAX, because you can't specify
such a value. The size_t operands are unsigned integers and follow
the rules for unsigned overflows etc.

CBFalconer · Jan 8, 2009

Richard said:
No, it accesses the single int at the start of it. And sizeof(*p)
is sizeof(int), not SIZE_MAX*2.

You have a point there. But calloc still has not created anything
with the allegedly specified size, because of unsigned arithmetic,
for one thing.

Richard · Jan 8, 2009

*p certainly does. calloc, if successful, returned a pointer to a
memory block. That was put in p. *p dereferences it, and accesses
the object.

No, it accesses the single int at the start of it. And sizeof(*p)
is sizeof(int), not SIZE_MAX*2.

-- Richard

This is getting silly. Is this some kind of "in" joke?

CBFalconer · Jan 8, 2009

Keith said:
.... snip ...

To put it another way:

sizeof <BLANK> == count*sizeof(int)

Fill in the blank in a manner that makes this expression true and
is directly relevant to the object allocated by calloc() in the
code above.

You haven't shown the reduction to range 0..SIZE_MAX, performed by
the unsigned arithmetic. All those operands are type size_t.

(sizeof <BLANK>) % (SIZE_MAX+1) ==
(count % (SIZE_MAX+1)) * (sizeof(int) % (SIZE_MAX+1))

and it may need another % (SIZE_MAX+1).

Keith Thompson · Jan 8, 2009

CBFalconer said:
Yes.

Remember that these things are all measured and specified by a
size_t variable.

What "variable"? Do you mean an object of type size_t? Or do you
mean a value of type size_t?

And where does the standard say that the size of an object must be
able to be represented as a value of type size_t? (Please don't quote
the standard's definition of sizeof again.)

size_t is an unsigned integer, and follows the
wraparound rules for unsigned. Multiplying a size_t by a size_t
cannot ever result in a value larger than SIZE_MAX. It can result
in a much smaller value.

When I wrote SIZE_MAX*2, I didn't intend the multiplication to be done
using C's unsigned wraparound semantics. I meant that the object has
a size in bytes twice as large as SIZE_MAX. For example, if
SIZE_MAX==65535, the object has a size of 131070 bytes.

Keith Thompson · Jan 8, 2009

CBFalconer said:
You haven't shown the reduction to range 0..SIZE_MAX, performed by
the unsigned arithmetic. All those operands are type size_t.

(sizeof <BLANK>) % (SIZE_MAX+1) ==
(count % (SIZE_MAX+1)) * (sizeof(int) % (SIZE_MAX+1))

and it may need another % (SIZE_MAX+1).

I didn't show it because it isn't relevant to the example. Assume
that all values are reasonably small, so that the multiplication
doesn't wrap around. Assume that sizeof(int) is no larger than 8, and
count is no larger than 100.

Repeating what I posted upthread:

Ok, let's consider another case, setting aside the huge allocation
issue for the moment:

int count = <some number>;
int *p = calloc(count, sizeof(int));

If calloc succeeds, it allocates an object whose size is
count*sizeof(int). What expression refers to that object? Note that
*p refers to an object whose size is sizeof(int); that's not the
object I'm talking about.

To put it another way:

sizeof <BLANK> == count*sizeof(int)

Fill in the blank in a manner that makes this expression true and is
directly relevant to the object allocated by calloc() in the code
above.

Tony · Jan 8, 2009

Rui Maciel said:
I believe the action performed by the strlen() routine is always present
in other string-type objects in some way or form,

The "form" (implementation) is what I was referring to: iteration from the
start of the string to the terminating nul.

Tony

Tony · Jan 8, 2009

Stephen Sprunk said:
If running strlen() is going to cause performance problems, keep track of
the length in a separate variable.

Getting the length of a string is such a common operation that its
implementation should have been considered when designing a string library and
any associated structures.

Tony

Tony · Jan 8, 2009

Bartc said:
Some people naturally think of strings as short strings (words, names,
messages, textlines, filenames and so on). For this purpose, a
256-character limit is a bit tight but with perhaps 1K strings you can do
a lot of useful work.

But others naturally want to use strings to store anything, of any size,
including gigabyte-sized files.

So if you're writing libraries (or implementing languages), you have to
allow for these two extremes.

Probably with separate abstractions.

Tony

Tony · Jan 8, 2009

CBFalconer said:
Not at all. For example, do some minor modifications to ggets.c
and you have a routine to create a single string of a text file
(assuming adequate memory).

That's bizarre. file != string, IMO. YMMV.

Tony

Tony · Jan 8, 2009

Bartc said:
It sounds pretty slow too if the memory used by the string isn't /quite/
enough to put the zero in. And if you usually have spare bytes at the end,
you might as well always put a zero in there.

Always making sure there is one spare byte at the end doesn't sound bad to
do. How much it will clean up the code, well I'll have to get my nose back
in there and start simplifying and then see what the end result is. With the
extra byte always being available, there is no performance overhead.

Tony

James Kuyper · Jan 8, 2009

CBFalconer wrote:
....

You have a point there. But calloc still has not created anything
with the allegedly specified size, because of unsigned arithmetic,
for one thing.

I think you're mistakenly referring to the fact that SIZE_MAX*2 is
required to have a value of SIZE_MAX-1 (assuming that SIZE_MAX >
INT_MAX, which is not required, but usually true).
However, calloc(nmemb,size) is not defined as allocating an amount of
memory of size equal to nmemb*size. It's defined as allocating enough
memory to hold an array (a single object!) containing nmemb objects of
the specified size.

calloc(SIZE_MAX,2) is permitted to return NULL, but no conforming
implementation can have calloc() return a pointer to memory with
insufficient space to store SIZE_MAX objects, each of 2 bytes in size,
and that requires a lot more than SIZE_MAX*2 bytes of memory, if you
interpret SIZE_MAX*2 as being evaluated according to the rules of C,
rather than the rules of mathematics.

James Kuyper · Jan 8, 2009

CBFalconer said:
Keith Thompson wrote:
... snip ...

You haven't shown the reduction to range 0..SIZE_MAX, performed by
the unsigned arithmetic. All those operands are type size_t.

Such reduction is irrelevant to calculation of the size needed by a call
to calloc(). The memory needed is defined in terms of a count of
objects, and the size of the objects, not in terms of the result of
multiplying those two numbers.

James Kuyper · Jan 8, 2009

CBFalconer said:
You name it by prefixing the name of the pointer with a *.

That's not the name of the object, it's an lvalue expression that refers
to an object. However, that would still be sufficient for my purposes
if it was in fact the entire object allocated by the call to calloc().
It is not. It is only an object of type char[2]. The entire object has
the type char[SIZE_MAX][2].

Most importantly, sizeof(*ptr) presents absolutely no problems; its
value is 2, far smaller than the minimum value for SIZE_MAX, which is
65535 (7.18.3p2).

In order to use an argument based upon sizeof's requirements, to
constrain the behavior of calloc(), you have to apply sizeof to
something for which the value would be greater than SIZE_MAX, unless
calloc() returns NULL. So far, you have yet to identify that something.

For that matter, you've also failed to identify a problem that having
calloc() return NULL would solve. sizeof(*ptr) has the exact same value,
whether or not calloc() returned NULL.

James Kuyper · Jan 8, 2009

CBFalconer said:
Wojtek Lerch wrote: ....

This argument only arises because of the prototype of calloc.
Nowhere else is there any possibility of creating oversized objects
(considering the definition of size_t and SIZE_MAX).

No, the problem is quite independent of calloc(), as has been repeatedly
pointed out. It is equally impossible for sizeof to return the specified
value when applied to types that are too big.

Nor is it possible to apply sizeof to the object allocated by a call to
calloc() in such a way as to display this problem, without first
declaring a type for which sizeof(type) would be equally problematic.

James Kuyper · Jan 8, 2009

CBFalconer said:
Wojtek said:

CBFalconer said:

long array[SIZE_MAX];
I maintain that, whenever (sizeof (long) > 1), that is a compile
error.


We know that you do, but we don't believe that you have
demonstrated that to be true.


I have quoted the appropriate portion of the standard. Any other
interpretation involves a contradiction.

My interpretation of the relevant words implies that a conforming
implementation of C can

1. reject any declaration that refers to a type bigger than SIZE_MAX, as
exceeding an implementation limit.

2. have calloc(nmemb, size) return a non-null pointer to enough memory
to store an array of nmemb objects of the specified size, even if
nmemb*size has a mathematical value that is greater than SIZE_MAX. It
will return sufficient memory for the specified number of objects of the
specified size, even though the amount of memory required is greater
than the value of nmemb*size, interpreted as a C expression rather than
a mathematical one.

Please demonstrate the contradiction that you see in that interpretation.

Wojtek Lerch · Jan 8, 2009

CBFalconer said:
Wojtek said:

CBFalconer said:

long array[SIZE_MAX];


I maintain that, whenever (sizeof (long) > 1), that is a compile
error.


We know that you do, but we don't believe that you have
demonstrated that to be true.


I have quoted the appropriate portion of the standard. Any other
interpretation involves a contradiction.

No, you quoted an irrelevant portion of the standard. And you didn't even
explain whether by "compile error" you meant a constraint violation or
something else.

Wojtek Lerch · Jan 8, 2009

CBFalconer said:
sizeof measures the size of objects, or the type definition that
will be used in constructing an object.

No, sizeof measures the size of types. If the operand is a type, it doesn't
matter whether that type is used in constructing an object anywhere in the
program. If the operand is an expression, only its type matters; whether
the expression designates an object or not is irrelevant.
