Null terminated strings: bad or good?

David R Tribble · Jan 8, 2009

Keith said:
Even assuming the environmental limit on object size applies to a type
for which no object is declared, that still doesn't imply that an
implementation *must reject* ``sizeof(char[SIZE_MAX][2])''.

In fact, the implementations I've seen do reject it (i.e., issue a
diagnostic and fail to process the translation unit), and I believe
that's the only reasonable behavior. But I don't see how to justify
it based on the normative wording of the standard.

It seems clear that SIZE_MAX was meant to be the largest
size of any declarable type, since any declarable type can
be an argument of sizeof(). However, it is reasonable to
say that SIZE_MAX was not meant to be taken as a limit
on actual object size. Indeed, it's been pointed out that
calloc() could be used to create objects much larger than
any declarable type (depending on the implementation).

You could probably make a case that the standard does
not place any upper bounds on object size, in fact, and
rightly shouldn't.

Would it be out of line to propose a new standard macro in
<limits.h> (e.g., VAR_SIZE_MAX) that specifies the largest
allocatable object size (independent of sizeof()) supported
by the implementation?

-drt

jameskuyper · Jan 8, 2009

Richard said:
James Kuyper said:

Well, yes, but it is *decreasingly* the case, human mortality being
what it is.

His wording doesn't imply that any one person has been using them for
that length of time, only that, for a long time, there have been
people who were using them.

Keith Thompson · Jan 8, 2009

Golden California Girls said:
Except that those gigabyte-sized files can't contain zero.

Text files typically don't contain null characters.

Amandil · Jan 8, 2009

I can think of a few good reasons to have "string" mean a contiguous series
of bytes and a length. I have a hard time finding any value in having
"string" mean a contiguous series of bytes terminated by a null. Help me
with this please.

Tony

Seeing how long this thread has gone on, I thought I'd add my own
comment on Tony's question. C, as a language, was meant to be
extremely simple, fairly easy to learn or write a compiler for. The
types of objects supported by C are closely bound to the object types
of the underlying CPU. True, computers are built very differently, so
on some systems it makes sense to have 8-bit chars, 16-bit shorts and
ints, and, with some finagling, 32-bit longs. (For example, the 8086.)
On others, the int would be 32 bits. On a computer system with 18-bit
words, that would be the size of the int data type. And so on.

C, as a high-level language (ahem), also allows for 3 "compound" data
types. These are the array, struct, and union.

On many computers, a string is treated - at the assembly level - as an
array of bytes. In C, therefore, a character string degenerates to
simply an array of chars. To do otherwise would be to add a new and
fairly specific compound *built-in* type, which would create exactly
the type of complexity the creators of C wanted to avoid.

As an example of this complexity, what is sizeof "Hello"? Do you
include the length of the string as part of the size of the object, or
not?
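To make the sizeof question concrete for C as it stands: the literal "Hello" is an array of 6 chars, so sizeof counts the terminating NUL while strlen() does not. A minimal sketch; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of the array object behind the literal: 5 characters plus the NUL. */
size_t hello_object_size(void)
{
    return sizeof "Hello";
}

/* Length as the string functions see it: bytes before the first NUL. */
size_t hello_string_length(void)
{
    return strlen("Hello");
}

/* For a literal with no embedded NUL, the two differ by exactly the terminator. */
void check_terminator_accounting(void)
{
    assert(hello_object_size() == hello_string_length() + 1);
}
```

A built-in counted string type would have to decide whether its length field belongs in that sizeof figure; the array-of-char model avoids the question.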

Obviously, if we don't keep track of the length of the string, there
must be some way of finding the end, just as exec() must have a way of
knowing when to stop looking for arguments. You could use '$', as DOS
(and probably CP/M) did, and deserve all the criticism that has been
mentioned above, such as why should a string not be able to contain a
perfectly valid character. Or you could use a NUL character, set aside
by ASCII not to have any visible glyph for precisely the reason that C
uses it for: to be a non-character, as an end-of-data marker. I've
heard that in variable length arrays of integers, +/-9999 is used for
the same purpose. The NUL character has no use in a file containing
plain text, as it was designed NOT to be plain text.

In cases where you do need to use the NUL character in a string (such
as the MAC scenario mentioned above, or when handling binary data),
you are perfectly free to: either (a) write your own string data type
and library, wherein the struct keeps track of the string's length, or
(b) keep track of the length on your own, and use the standard
library's byte-manipulation functions (memcpy(), memcmp(), memchr()
and friends), designed precisely for the
situations in which the string functions are inappropriate, such as
the case where the last byte in this array is not a NUL character.
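Option (a) might look something like the sketch below, with option (b)'s mem* functions doing the work underneath. The type and function names (struct cstr, cstr_init, cstr_equal, cstr_free) are hypothetical, not from any existing library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted-string type: the length is stored, not inferred. */
struct cstr {
    size_t len;
    unsigned char *data;
};

/* Copy len bytes (NULs included) into a new counted string.
   Returns 0 on success, -1 if allocation fails. */
int cstr_init(struct cstr *s, const void *bytes, size_t len)
{
    assert(s != NULL && bytes != NULL);
    s->data = malloc(len ? len : 1);   /* avoid a zero-size request */
    if (s->data == NULL)
        return -1;
    memcpy(s->data, bytes, len);       /* memcpy does not stop at NUL */
    s->len = len;
    return 0;
}

/* Equality by length and content; memcmp does not stop at NUL either. */
int cstr_equal(const struct cstr *a, const struct cstr *b)
{
    assert(a != NULL && b != NULL);
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}

/* Release the storage and leave the struct in an empty state. */
void cstr_free(struct cstr *s)
{
    assert(s != NULL);
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

With two 5-byte inputs such as "ab\0cd" and "ab\0cX", strcmp() would call them equal because it stops at the NUL, while cstr_equal() compares all five bytes and reports a difference.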

I believe this answers Tony's question, and deals with most issues
raised in this thread hitherto, except for the issue of the maximum
size of an object. On this, I will say only that I imagine the
standard allows implementations to set limits on the size of an
object, and if it doesn't, then perhaps the next version of the
standard will.

On the subject of trolls, etc., I have heard say that a man was nailed
to a tree for saying how good it would be to be nice to people for a
change, so I'm not going to risk saying it. But when people (?) turn
to using foul language as a way of insulting one another, it implies a
last-resort "pounding the table" argument, and they subsequently lose
the respect of others that they would have received had they been able
to contain their anger.

Have an enjoyable day.

-- Marty Amandil
"Who is strong? One who overpowers his inclinations. As is stated
(Proverbs 16:32), 'Better one who is slow to anger than one with
might, one who rules his spirit than the captor of a city.'"
Shimon Ben Zoma, Ethics of the Fathers 4:1

Wojtek Lerch · Jan 8, 2009

David R Tribble said:
It seems clear that SIZE_MAX was meant to be the largest
size of any declarable type, since any declarable type can
be an argument of sizeof().

To me it's not clear at all. The only thing that seems clear to me is that
the standard forgot to specify what exactly is forbidden and whether it's a
constraint violation or undefined behaviour (or something else, such as
implementation-defined). Perhaps the plan was to make it consistent with
how ptrdiff_t is specified -- allow types and objects bigger than SIZE_MAX
(just like arrays with more than PTRDIFF_MAX elements are not forbidden),
but make the behaviour of sizeof undefined for types whose size cannot be
represented as a size_t?

However, it is reasonable to
say that SIZE_MAX was not meant to be taken as a limit
on actual object size. Indeed, it's been pointed out that
calloc() could be used to create objects much larger than
any declarable type (depending on the implementation).

Where does the standard say that char[SIZE_MAX][SIZE_MAX] is not a
declarable type?

CBFalconer · Jan 8, 2009

Tony said:
.... snip ...

Surely he was just showing the main gist of the null-terminated
string writing and didn't intend to suggest that the given
function was completed production code.

Don't forget that Richard the nameless is a troll.

CBFalconer · Jan 8, 2009

Keith said:
.... snip ...

The standard *does not say* that the sizeof operator can be used
to determine the size of any object. It can determine the size
of a type or of an expression. In particular, it cannot be used
(directly) to determine the size of an anonymous object, such as
one created by a call to calloc(). So the description of the
sizeof operator is irrelevant to the question.

Firstly, an object can be an expression. Simply name it.
Secondly, the object created by calloc is referencable. Simply
store the returned value in a pointer variable. Then *ptr
references it.

int i, *p;
...
i ... is an expression, consisting of the name i alone.
It has the value of whatever i has been set to.

p = calloc(SIZE_MAX, 2);
if (p) puts("calloc is bad");

CBFalconer · Jan 8, 2009

Wojtek said:
.... snip ...

As far as I can tell, the standard doesn't explicitly forbid
naming types larger than SIZE_MAX bytes, or even applying sizeof
to such types -- it just describes the semantics of sizeof in a
way that is logically impossible for such types.

Thank you. Precisely. Append the words 'to exist'.

Wojtek Lerch · Jan 8, 2009

CBFalconer said:
Firstly, an object can be an expression. Simply name it.

No, an object is an area of memory in the abstract machine. An expression
is a sequence of tokens in the source code of a program.

Secondly, the object created by calloc is referencable. Simply
store the returned value in a pointer variable. Then *ptr
references it.

Maybe; or maybe it only references a portion of it. That depends on the
size of the object that calloc created and the size of the type that ptr is
declared to point to. Those two sizes are not necessarily the same.
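A small sketch of that distinction, with made-up function names and an arbitrary element count: sizeof applied to *ptr reports the size of the pointed-to type, which matches the allocation only if the pointer type happens to describe all of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { COUNT = 100 };   /* arbitrary element count for the illustration */

/* p points to int, so sizeof *p is sizeof(int), whatever calloc handed back. */
size_t size_via_element_pointer(void)
{
    int *p = calloc(COUNT, sizeof *p);
    size_t seen = sizeof *p;      /* operand is not evaluated */
    free(p);
    return seen;
}

/* q points to an array of COUNT ints, so sizeof *q covers the whole request. */
size_t size_via_array_pointer(void)
{
    int (*q)[COUNT] = calloc(1, sizeof *q);
    size_t seen = sizeof *q;      /* operand is not evaluated */
    free(q);
    return seen;
}

/* The same number of bytes was requested both times; only the types differ. */
void check_sizes(void)
{
    assert(size_via_element_pointer() < size_via_array_pointer());
}
```

In neither case does sizeof learn anything from calloc(); the result comes entirely from the declared type.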

Antoninus Twink · Jan 8, 2009

Firstly, an object can be an expression.

Wrong. An expression is a sequence of operators and operands. An object
is a region of data storage. Read the goddamn standard you spend so much
time pushing down people's throats.

int i, *p;
...
i ... is an expression, consisting of the name i alone.
It has the value of whatever i has been set to.

Let me be the first of many to tell you that
i ...
is a syntax error.

Even if you replace it by
i;
then since i is uninitialized your program invokes undefined behavior
and might format your hard disk. (Yeah, right...)

CBFalconer · Jan 8, 2009

Keith said:
.... snip ...

In fact, the implementations I've seen do reject it (i.e., issue
a diagnostic and fail to process the translation unit), and I
believe that's the only reasonable behavior. But I don't see how
to justify it based on the normative wording of the standard.

See my earlier answer, which was:

How about this. No exceptions are mentioned, thus it covers all.

6.5.3.4 The sizeof operator

... snip ...

Semantics

[#2] The sizeof operator yields the size (in bytes) of its
operand, which may be an expression or the parenthesized
name of a type. The size is determined from the type of the
operand. The result is an integer. If the type of the
operand is a variable length array type, the operand is
evaluated; otherwise, the operand is not evaluated and the
result is an integer constant.

Keith Thompson · Jan 8, 2009

CBFalconer said:
Keith Thompson wrote:
... snip ...

In fact, the implementations I've seen do reject it (i.e., issue
a diagnostic and fail to process the translation unit), and I
believe that's the only reasonable behavior. But I don't see how
to justify it based on the normative wording of the standard.

See my earlier answer, which was:

How about this. No exceptions are mentioned, thus it covers all.

6.5.3.4 The sizeof operator

... snip ...

Semantics

[#2] The sizeof operator yields the size (in bytes) of its
operand, which may be an expression or the parenthesized
name of a type. The size is determined from the type of the
operand. The result is an integer. If the type of the
operand is a variable length array type, the operand is
evaluated; otherwise, the operand is not evaluated and the
result is an integer constant.

How does this address my point?

Given the expression:

sizeof (char[SIZE_MAX][2])

the quoted definition for sizeof doesn't imply a constraint violation
or other error. It implies a contradiction *in the standard*.
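The arithmetic behind the contradiction can be sketched without declaring the oversized type at all. The function name is hypothetical; the point is only that 2 * SIZE_MAX has no representation in size_t, so any value sizeof could yield would be wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The mathematical size of char[SIZE_MAX][2] is 2 * SIZE_MAX bytes.
   Computed in size_t, the product is reduced modulo SIZE_MAX + 1. */
size_t wrapped_double_size_max(void)
{
    size_t n = SIZE_MAX;
    size_t doubled = n * 2;   /* wraps around to SIZE_MAX - 1 */
    assert(doubled < n);      /* the "size" came out smaller than the row count */
    return doubled;
}
```

This shows what unsigned arithmetic does with the product, not what any particular compiler does with the type itself.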

Keith Thompson · Jan 8, 2009

CBFalconer said:
Firstly, an object can be an expression. Simply name it.

The object in question was created by a call to calloc(). It has no
name. And no, an object cannot be an expression; an expression can
designate an object.

Secondly, the object created by calloc is referencable. Simply
store the returned value in a pointer variable. Then *ptr
references it.

int i, *p;
...
i ... is an expression, consisting of the name i alone.
It has the value of whatever i has been set to.

p = calloc(SIZE_MAX, 2);
if (p) puts("calloc is bad");

What expression refers to the object -- the entire object -- created
by calloc (assuming it returns a non-null pointer)? *p certainly
doesn't.

CBFalconer · Jan 8, 2009

Wojtek said:
.... snip ...

However, it is reasonable to say that SIZE_MAX was not meant to
be taken as a limit on actual object size. Indeed, it's been
pointed out that calloc() could be used to create objects much
larger than any declarable type (depending on the implementation).

Where does the standard say that char[SIZE_MAX][SIZE_MAX] is not a
declarable type?

It says sizeof can return the size of a type. But it returns a
size_t, which has a maximum value of SIZE_MAX. This requires that
the declaration be an error, or at least unusable. The latter
means that declaring such an object, or attempting to create it
with calloc, is an error.

CBFalconer · Jan 8, 2009

Flash said:
CBFalconer wrote:
.... snip ...

<snip>

Is irrelevant since there is no way to make the object created by
calloc the operand of sizeof. I'm sure this point has been made
already but you seem to have missed it.

No, you have missed that if calloc returned a non-NULL, calloc is
in error.

Wojtek Lerch · Jan 8, 2009

CBFalconer said:
Thank you. Precisely. Append the words 'to exist'.

I was going to say "to implement for such types".

The words of the standard specify semantics of sizeof in a way that is a
logical impossibility when the size of the operand is greater than SIZE_MAX.
Conceivably, there are many possible ways to remove the illogical
requirement from the standard:

#1 Say that the behaviour is undefined when the operand of sizeof is larger
than SIZE_MAX bytes
#2 Say that the result is unspecified (or maybe implementation-defined) when
the operand of sizeof is larger than SIZE_MAX bytes
#3 Forbid expressions where the operand of sizeof is larger than SIZE_MAX
bytes
#4 Forbid expressions where the operand of sizeof is larger than SIZE_MAX
bytes, along with any other uses of such types
#5 Forbid expressions where the operand of sizeof is larger than SIZE_MAX
bytes, along with any other uses of sizeof
#6 Forbid expressions where the operand of sizeof is larger than SIZE_MAX
bytes, along with any other uses of expressions

I don't see any reason to pick #4 over #1, #2, or #3. Personally, I find #4
unnecessarily restrictive and almost as arbitrary as #5 or #6.

Richard Tobin · Jan 8, 2009

Keith Thompson said:
Text files typically don't contain null characters.

I wish. I use a mail reader written, er, some years ago, and many of
its assumptions have become less valid over time. One particular
problem is that I now often receive supposedly textual mail messages
(invariably spam) containing null characters, which the code assumes
won't happen.

Of course, the "typically" in the quoted text makes it true, but you
should think twice about writing code that relies on it.
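One way to avoid relying on it, assuming the code already knows how many bytes it read: check for a NUL with memchr() before handing the buffer to the str* functions. The helper name is made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the first len bytes of buf contain a NUL, else 0.
   len is the number of bytes actually read, not strlen(buf). */
int has_embedded_nul(const char *buf, size_t len)
{
    assert(buf != NULL);
    return memchr(buf, '\0', len) != NULL;
}
```

For a 9-byte message such as "spam\0more", strlen() sees only the first 4 bytes, while has_embedded_nul() looks at all 9 and flags the input so the caller can treat it as binary data.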

-- Richard

Wojtek Lerch · Jan 8, 2009

CBFalconer said:
long array[SIZE_MAX];

I maintain that, whenever (sizeof (long) > 1), that is a compile
error.

We know that you do, but we don't believe that you have demonstrated that to
be true.

CBFalconer · Jan 8, 2009

perhaps this is a bit of playing with definitions, but I
sometimes have to manipulate byte streams (or octet streams if
I'm feeling really pedantic) and I tend not to think of them
as "strings".

Don't forget that there is no 'string' type. Strings are a
particular use of char arrays, which are a type.

Keith Thompson · Jan 8, 2009

CBFalconer said:
No, you have missed that if calloc returned a non-NULL, calloc is
in error.

Nobody has missed that; you've been claiming it repeatedly. What
we're missing is your demonstration that it's a correct statement.

I assert that, in a conforming implementation, calloc(SIZE_MAX, 2) may
return a non-null pointer which points to the beginning of an
anonymous object whose size is SIZE_MAX*2 bytes. In my opinion, the
standard should be modified so that this is *not* possible, i.e., so
that an object larger than SIZE_MAX bytes is disallowed.

In practice, if an implementation exists for which calloc(SIZE_MAX, 2)
can succeed, the problem is not in the implementation's calloc() but
in its definition of SIZE_MAX; the implementation *should* IMHO make
size_t large enough to hold the largest possible size of any object
that can be created. But I see nothing in the standard that directly
supports this. (The strlen() argument is almost convincing, but it's
uncomfortably indirect.)

And, in practice, I doubt that an implementation exists on which
calloc(SIZE_MAX, 2) returns a non-null pointer. If there is no such
implementation, that would make it easier to revise the standard to
disallow it.
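Until the standard settles the question, a program that does not want to depend on how calloc() treats such a request can check the product itself. A minimal sketch, with hypothetical helper names; it says nothing about what a given implementation's calloc() actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if nmemb * size cannot be represented in size_t, else 0. */
int size_product_overflows(size_t nmemb, size_t size)
{
    return size != 0 && nmemb > SIZE_MAX / size;
}

/* Hypothetical wrapper: refuse requests larger than SIZE_MAX bytes
   ourselves instead of relying on calloc to do so. */
void *checked_calloc(size_t nmemb, size_t size)
{
    if (size_product_overflows(nmemb, size))
        return NULL;
    return calloc(nmemb, size);
}
```

With this wrapper, checked_calloc(SIZE_MAX, 2) returns a null pointer on every implementation, whatever the underlying calloc() would have done.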
