Perhaps strings should be akin to width-specified integers:
string16 (a string with up to 65535 chars)
string32 ... etc.
This is almost not a bad idea, but the major problem with it is that
string16 and string32 are now different types, when they arguably
should not be. Are there special rules for converting from string16 ->
string32? What if you want to pass a string8 to a function that
expects a string32? What if the function modifies the string through
its parameter? Then the string8 -> string32 conversion must make
changes to the string32 visible in the string8. What if you want to
convert a string32 to a string16? Should it do a run-time check and
fail if the string32's length is larger than 65535?
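To make those questions concrete, here is a rough C sketch of one
possible design. The string16/string32 structs and the widen16to32 and
narrow32to16 functions are hypothetical, invented only for
illustration. Widening copies the length into a separate field, so a
callee that changes it is not seen through the original string16, and
narrowing needs a run-time check that can fail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical width-specified string types (not from any library).
 * The width of the length field caps the length: a string16 holds at
 * most 65535 chars. The characters need not be NUL-terminated. */
typedef struct { uint16_t len; char *data; } string16;
typedef struct { uint32_t len; char *data; } string32;

/* string16 -> string32 always fits, but it yields a separate length
 * field: if a callee changes the string32's len, the original string16
 * does not see the change. */
string32 widen16to32(string16 s) {
    string32 r = { s.len, s.data };
    return r;
}

/* string32 -> string16 needs a run-time check and a way to fail. */
bool narrow32to16(string32 s, string16 *out) {
    if (s.len > UINT16_MAX)
        return false;            /* too long for a 16-bit length field */
    out->len = (uint16_t)s.len;
    out->data = s.data;
    return true;
}
```

Every one of these choices (copy the length or share it, truncate or
fail) is a rule the language would have to spell out.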
"These are normally known as "C-style" strings. The main advantage is
that the length of the string is limited only by available memory, and
the length field is not stored with the string, thus conserving storage
space."
The "main advantage" above is actually a disadvantage. It causes
programmers to write code that is susceptible to buffer overrun attacks.
Storing the length with the string does not protect against buffer
overrun attacks.
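A minimal sketch of why, assuming a hypothetical counted struct and
copy_counted function (not from any library). The source's length is
known exactly, yet the overrun happens in the destination, so the copy
is safe only because of the explicit dst_size check. A plain
memcpy(dst, src.data, src.len) would overrun just as strcpy does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted string: the length travels with the characters. */
typedef struct { size_t len; const char *data; } counted;

/* The source length is known, but the overrun would happen in dst, so
 * only the dst_size check keeps this safe. The unchecked
 * memcpy(dst, src.data, src.len) overruns dst exactly like strcpy. */
size_t copy_counted(char *dst, size_t dst_size, counted src) {
    size_t n = src.len < dst_size ? src.len : dst_size;
    memcpy(dst, src.data, n);    /* truncates rather than overrunning */
    return n;                    /* number of bytes actually copied */
}
```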
Storage space conservation? Only in the exceptional case nowadays.
Not on embedded systems. The ATmega168 I have sitting on my desk right
now has 16 kB of flash for program code and constant data, 1 kB of
SRAM, and 512 bytes of EEPROM. Every byte counts. These are not
exceptional cases; it's fairly common hardware, albeit somewhat
specialized.
""removing"
the prefix of a string can be done simply by referring to a location
past the beginning,"
That operation is the same with "Pascal-type" strings also, but then the
length has to be updated. No big deal.
No. This is not correct. Consider this function, which takes as input
a full path name (e.g. "c:\\windows\\notepad.exe") and prints the last
path component ("notepad.exe"), without modifying the input.
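#include <stdio.h>
#include <string.h>
#define PATH_SEPARATOR '\\' /* assumed here: the Windows separator */
void print_file_name (const char *fullpath) {
    const char *filename = strrchr(fullpath, PATH_SEPARATOR);
    /* strrchr points at the separator itself, so skip past it */
    printf("file name only: %s\n", filename ? filename + 1 : fullpath);
}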
I challenge you to implement that with counted strings without leaking
memory or allocating new memory. As a handicap, you may assume that a
function "last_index_of(counted_string, char)" exists that returns the
index of the last occurrence of the character in the specified
counted_string (say it returns -1 if not found).
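For comparison, here is roughly what the challenge forces, assuming a
hypothetical length-prefixed counted_string (the length stored in the
same block as the characters, Pascal-style) and a straightforward
last_index_of as described above. Because the suffix inside the path
has no length field in front of it, this sketch has to allocate a new
string, and the caller has to remember to free it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed ("Pascal-style") string: the length is
 * stored immediately before the characters, in the same allocation. */
typedef struct {
    size_t len;
    char data[];                     /* C99 flexible array member */
} counted_string;

/* The helper the challenge grants: index of the last c, or -1. */
long last_index_of(const counted_string *s, char c) {
    for (size_t i = s->len; i > 0; i--)
        if (s->data[i - 1] == c)
            return (long)(i - 1);
    return -1;
}

/* There is no length field in front of the suffix inside path, so
 * without modifying the input the only option is to build a new
 * string: a malloc, plus a free() the caller must not forget.
 * Returns NULL on allocation failure. */
counted_string *file_name_of(const counted_string *path, char sep) {
    long i = last_index_of(path, sep);
    size_t start = (size_t)(i + 1);  /* i == -1 gives start == 0 */
    size_t n = path->len - start;
    counted_string *r = malloc(sizeof *r + n);
    if (r == NULL)
        return NULL;
    r->len = n;
    memcpy(r->data, path->data + start, n);
    return r;
}
```

The C-string version above needs neither the allocation nor the free.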
"dividing strings into substrings can be done by
placing 0's where appropriate. As an exercise, try implementing strtok()
with Pascal-style strings. You may be surprised at the difficulty."
Well, one function is an exceptional case. The rule is to program for the
common case and handle special cases as required, rather than complicate
the general case.
It's one function because I didn't list all the others. How about
strrchr, strchr, strstr? Not to mention that strtok is not an uncommon
function.
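A sketch of the idioms in question. split_next is a hypothetical,
minimal reentrant take on strtok (similar in spirit to POSIX strtok_r)
that splits by overwriting delimiters with '\0'; text_after is a
hypothetical helper built on strstr. In both cases the results are
ordinary C strings pointing into the original buffer, with no copying
and no allocation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal reentrant strtok: skips leading delimiters, then
 * ends the token by overwriting the next delimiter with '\0'. *save
 * records where to resume; pass NULL for s on later calls. */
char *split_next(char *s, const char *delims, char **save) {
    if (s == NULL)
        s = *save;
    s += strspn(s, delims);          /* skip leading delimiters */
    if (*s == '\0') {
        *save = s;
        return NULL;                 /* no more tokens */
    }
    char *end = s + strcspn(s, delims);
    if (*end != '\0')
        *end++ = '\0';               /* the "placing 0's" step */
    *save = end;
    return s;                        /* points into the caller's buffer */
}

/* Hypothetical helper: the text after the first marker in s, or NULL.
 * strstr (like strchr and strrchr) returns a pointer into s, and the
 * tail of a C string is itself a C string, so nothing is copied. */
const char *text_after(const char *s, const char *marker) {
    const char *hit = strstr(s, marker);
    return hit ? hit + strlen(marker) : NULL;
}
```

With a length stored in front of each string, every one of these
results would need its own new length field somewhere.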
Additionally, if the rule is to program for the common case, it seems
that null-terminated strings satisfy that rule quite well, as you can
already see. Counted strings would complicate the general case. Not
that counted strings aren't useful in certain cases, but based on the
success I've had with null-terminated strings so far, I would assume
they are useful in only a minority of cases.
"The main disadvantage of C-style strings is that computing the length
is O(n)"
I'd say there are a FEW issues and that is just one of them.
Please elaborate.
And this, of course, mitigates that issue:
", but applications that need to reduce this to constant time can
easily do so by storing the length elsewhere, if they need it."
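A minimal sketch of that mitigation, using a hypothetical sized_str
pair: pay for strlen() once, keep the result beside the pointer, and
the characters remain an ordinary NUL-terminated string that can still
be handed to printf, strchr and friends. The cached length is only
valid as long as nobody changes the string behind its back.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pairing of a C string with its cached length. The
 * characters stay NUL-terminated, so s still works with printf,
 * strchr, etc.; len must be updated if the string is changed. */
typedef struct {
    const char *s;
    size_t len;
} sized_str;

/* One O(n) strlen() pass when the pair is created... */
sized_str sized_from(const char *s) {
    sized_str r = { s, strlen(s) };
    return r;
}

/* ...and O(1) length queries afterwards. */
size_t sized_len(sized_str x) {
    return x.len;
}
```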
Jason