I can think of a few good reasons to have "string" mean a contiguous series
of bytes and a length.
"These are sometimes known as "Pascal-style" strings. The main issue is
the length of the string is limited by the maximum value that can be
stored in the length field; in Pascal, it was a single byte, limiting
strings to 255 characters. There are other variants in use that you may
run into in C. For example, the Windows API defines a "BSTR" type, which
consists of a 4-byte length field followed by string data; the pointers
you deal with point to the start of the data (4 bytes after the start of
the allocated block)."
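
A minimal sketch of the length-prefixed layout the quote describes, in C. The names pstr_new, pstr_len and pstr_free are made up for illustration. This is not the Windows BSTR API, only the same idea of handing out a pointer that sits just past a 4-byte length field.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical BSTR-like layout: [uint32_t length][data...][NUL].
   The caller holds a pointer to the data, not to the block start. */
char *pstr_new(const char *src, uint32_t len)
{
    assert(src != NULL);
    unsigned char *block = malloc(sizeof(uint32_t) + (size_t)len + 1);
    if (block == NULL)
        return NULL;
    memcpy(block, &len, sizeof len);        /* length prefix */
    memcpy(block + sizeof len, src, len);   /* string data */
    block[sizeof len + len] = '\0';         /* convenience terminator */
    return (char *)(block + sizeof len);
}

/* Read the length back from the 4 bytes in front of the data: O(1). */
uint32_t pstr_len(const char *s)
{
    uint32_t len;
    memcpy(&len, s - sizeof len, sizeof len);
    return len;
}

/* Free the whole block, which starts 4 bytes before the data pointer. */
void pstr_free(char *s)
{
    if (s != NULL)
        free(s - sizeof(uint32_t));
}
```

Because the sketch also writes a terminating 0, the returned pointer can still be passed to functions that expect a C-style string, as long as the data itself contains no 0 bytes.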
Perhaps strings should be akin to width-specified integers:
string16 (a string with up to 65535 chars)
string32 ... etc.
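
To make the idea concrete, here is a rough sketch in C of what such width-specified types might look like. Everything here (string16, string32, string16_wrap) is hypothetical naming for illustration, not an existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical width-specified strings: the type of the length field
   sets the maximum length. */
typedef struct { uint16_t len; const char *data; } string16; /* up to 65535 bytes */
typedef struct { uint32_t len; const char *data; } string32; /* up to 4294967295 bytes */

/* Wrap n bytes at src as a string16.
   Returns 0 on success, -1 if n does not fit in 16 bits. */
int string16_wrap(string16 *out, const char *src, size_t n)
{
    assert(out != NULL);
    if (n > UINT16_MAX)
        return -1;
    out->len = (uint16_t)n;
    out->data = src;
    return 0;
}
```

The limit is then a property of the type you pick, the same way it is for uint16_t versus uint32_t.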
I have a hard time finding any value in having
"string" mean a contiguous series of bytes terminated by a null.
"These are normally known as "C-style" strings. The main advantage is
the length of the string is limited only by available memory, and the
length field is not stored with the string, thus conserving storage
space."
The "main advantage" above is actually a disadvantage. It leads
programmers to write code that is susceptible to buffer overrun attacks.
Storage space conservation? That matters only in the exceptional case nowadays.
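
As an illustration of the overrun point, here is a sketch of a copy that carries its length and capacity with it, so the bounds check is one comparison. The lbuf type and lbuf_assign are invented names for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer: current length, capacity, and storage. */
typedef struct {
    size_t len;
    size_t cap;
    char  *data;
} lbuf;

/* Copy n bytes from src into dst.
   Returns 0 on success, -1 (leaving dst untouched) if n exceeds capacity. */
int lbuf_assign(lbuf *dst, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n > dst->cap)
        return -1;
    memcpy(dst->data, src, n);
    dst->len = n;
    return 0;
}
```

Compare this with strcpy(), which keeps writing until it finds a 0 in the source and has no way to know how big the destination is.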
"Another major advantage to storing null-terminated strings is the
strings can be modified in place with minimal effort; truncating a
string is a matter of simply setting the new end byte to 0,"
As if changing the length field was harder to do?
""removing" the prefix of a string can be done simply by referring to a
location past the beginning,"
That operation is the same for a "Pascal-style" string, except that the
length also has to be updated. No big deal.
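
Both operations discussed above (truncating and dropping a prefix) can be sketched side by side in C. The lstr type and the function names are hypothetical, chosen only for this comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-counted string: a pointer plus an explicit length. */
typedef struct {
    char  *data;
    size_t len;
} lstr;

/* Truncate in place: one store, plus a bounds check that costs O(1). */
void lstr_truncate(lstr *s, size_t new_len)
{
    assert(new_len <= s->len);
    s->len = new_len;
}

/* Drop the first n bytes: advance the pointer and shrink the length. */
void lstr_drop_prefix(lstr *s, size_t n)
{
    assert(n <= s->len);
    s->data += n;
    s->len  -= n;
}

/* C-style equivalents for comparison. The caller must already know that
   n lies within the string; nothing here can check that cheaply. */
void cstr_truncate(char *s, size_t n) { s[n] = '\0'; }
char *cstr_drop_prefix(char *s, size_t n) { return s + n; }
```

The counted versions are one extra assignment each, and they can validate their argument without walking the string.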
"dividing strings into substrings can be done by
placing 0's where appropriate. As an exercise, try implementing strtok()
with Pascal-style strings. You may be surprised at the difficulty."
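
For what it's worth, one way to approach that exercise is to return tokens as (pointer, length) views instead of writing 0's into the source. The sketch below assumes that approach; the view type and view_next_token are invented names, and unlike strtok() it keeps no hidden state and leaves the input unmodified.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical non-owning view into counted string data. */
typedef struct {
    const char *data;
    size_t      len;
} view;

/* Returns 1 if c appears in the NUL-terminated delimiter set. */
static int is_delim(char c, const char *delims)
{
    for (; *delims != '\0'; delims++)
        if (*delims == c)
            return 1;
    return 0;
}

/* Split the next token off the front of *rest.
   On success stores it in *tok, advances *rest past it, and returns 1.
   Returns 0 when only delimiters (or nothing) remain. */
int view_next_token(view *rest, const char *delims, view *tok)
{
    assert(rest != NULL && delims != NULL && tok != NULL);

    size_t start = 0;
    while (start < rest->len && is_delim(rest->data[start], delims))
        start++;                       /* skip leading delimiters */
    if (start == rest->len) {
        rest->data += rest->len;       /* nothing left to tokenize */
        rest->len = 0;
        return 0;
    }

    size_t end = start;
    while (end < rest->len && !is_delim(rest->data[end], delims))
        end++;                         /* scan to the end of the token */

    tok->data = rest->data + start;
    tok->len  = end - start;
    rest->data += end;                 /* remainder starts after the token */
    rest->len  -= end;
    return 1;
}
```

Since the caller owns the remaining view, two tokenizations can be in progress at once, which plain strtok() does not allow.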
Well, one function is an exceptional case. The rule is to program for the
common case and handle special cases as required, rather than complicate
the general case.
"The main disadvantage of C-style strings is computing the length is O(n)"
I'd say there are a FEW issues and that is just one of them.
", but applications that need to reduce this to constant time can
easily do so by storing the length elsewhere, if they need it."
Tony