I can think of a few good reasons to have "string" mean a contiguous series
of bytes and a length. I have a hard time finding any value in having
"string" mean a contiguous series of bytes terminated by a null. Help me
with this please.
Tony
Seeing how long this thread has gone on, I thought I'd add my own
comment on Tony's question. C, as a language, was meant to be
extremely simple, and fairly easy both to learn and to write a compiler
for. The types of objects supported by C are closely bound to the types
of the underlying CPU. True, computers are built very differently, so
on some systems it makes sense to have 8-bit chars, 16-bit shorts and
ints, and, with some finagling, 32-bit longs. (For example, the 8086.)
On others, the int would be 32 bits. On a computer system with 18-bit
words, that would be the size of the int data type. And so on.
C, as a high-level language (ahem), also allows for 3 "compound" data
types. These are the array, struct, and union.
On many computers, a string is treated - at the assembly level - as an
array of bytes. In C, therefore, a character string degenerates to
simply an array of chars. To do otherwise would be to add a new and
fairly specific compound *built-in* type, which would create exactly
the type of complexity the creators of C wanted to avoid.
As an example of this complexity, what is sizeof "Hello"? Do you
include the length of the string as part of the size of the object, or
not?
Obviously, if we don't keep track of the length of the string, there
must be some way of finding the end, just as exec() must have a way of
knowing when to stop looking for arguments. You could use '$', as DOS
(and probably CP/M) did, and deserve all the criticism that has been
mentioned above, such as why should a string not be able to contain a
perfectly valid character. Or you could use a NUL character, which
ASCII defines as a control character with no visible glyph and no
meaning as text, making it a natural non-character to serve as an
end-of-data marker. I've
heard that in variable length arrays of integers, +/-9999 is used for
the same purpose. The NUL character has no use in a file containing
plain text, as it was designed NOT to be plain text.
In cases where you do need to use the NUL character in a string (such
as the MAC scenario mentioned above, or when handling binary data),
you are perfectly free to: either (a) write your own string data type
and library, wherein the struct keeps track of the string's length, or
(b) keep track of the length on your own, and use the standard
library's byte manipulation functions (memcpy, memcmp, memchr and so
on), designed precisely for the
situations in which the string functions are inappropriate, such as
the case where the last byte in this array is not a NUL character.
I believe this answers Tony's question, and deals with most issues
raised in this thread hitherto, except for the issue of the maximum
size of an object. On this, I will say only that I imagine the
standard allows implementations to set limits on the size of an
object, and if it doesn't, then perhaps the next version of the
standard will.
On the subject of trolls, etc., I have heard say that a man was nailed
to a tree for saying how good it would be to be nice to people for a
change, so I'm not going to risk saying it. But when people (?) turn
to using foul language as a way of insulting one another, it implies a
last-resort "pounding the table" argument, and they subsequently lose
the respect of others that they would have received were they able to
contain their anger.
Have an enjoyable day.
-- Marty Amandil
"Who is strong? One who overpowers his inclinations. As is stated
(Proverbs 16:32), 'Better one who is slow to anger than one with
might, one who rules his spirit than the captor of a city.'"
Shimon Ben Zoma, Ethics of the Fathers 4:1