Paul said:
jacob navia wrote:
Lew Pitcher wrote:
[snip]
What the OP complains about (his direct complaint) is the result of a
failure to validate, and that can happen in any language.
Yes, bugs can happen in any language.
I believe misinterpreted NUL termination is unique to the C language.
Prevalent, but not unique. I seem to recall an .ASCIZ directive
or equivalent in more than one assembler. And remember the CP/M I/O
functions that used '$'-terminated strings? (Ghastly, they were.)
Are you doing a Colbert/Borat here? That doesn't even need a response
as it clearly makes *MY* case, not yours. (Assembly is not a
"language" and I have never seen an assembler that came with a
standard library that told you how to encode strings.)
But in any case, what's special about strings?
When they are easy to work with, there's nothing special about them.
What's special about them in this case is the incompetent support for
them in the standard C library. fgets() cannot cleanse or even notice
a '\0' in the strings it reads, while the rest of the library treats
'\0' as the terminator. I
have not delved into this bug in detail, but obviously that's a prime
candidate for what went wrong here.
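To make the fgets() point concrete, here is a minimal sketch. The helper name fgets_length_mismatch is made up for this illustration, and it assumes tmpfile() is usable on the host. It writes raw bytes to a scratch file, reads one line back with fgets(), and reports both what strlen() sees and how far the stream position moved.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of any library): writes `len` raw bytes to
   a temporary file, reads one line back with fgets(), and reports both the
   length strlen() sees and the number of bytes fgets() consumed.
   Returns 0 on success, -1 on any I/O error. */
static int fgets_length_mismatch(const char *data, size_t len,
                                 size_t *seen_by_strlen, long *consumed)
{
    char buf[64];
    FILE *fp;

    assert(data != NULL && len < sizeof buf);

    fp = tmpfile();                     /* opened in binary update mode */
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    rewind(fp);                         /* switch from writing to reading */
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    *seen_by_strlen = strlen(buf);      /* stops at the first '\0' */
    *consumed = ftell(fp);              /* bytes actually read from the file */
    fclose(fp);
    return (*consumed < 0) ? -1 : 0;
}
```

With the six bytes "ab\0cd\n" as input, strlen() reports 2 while the stream position has advanced by 6. fgets() stored all six bytes, but nothing it returns tells the caller so.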
[...] If using a sentinel to terminate a string is a Bad Thing,
Nice straw man. You can force any design to work; it's a question of
the kind of support you give to make it work *WELL*. My library, for
example, uses both length limits and \0 termination in order to allow
for a smooth transition between each kind of string. It relies on the
fact that the intersection of the two is rich enough to satisfy
anyone, and when you need to allow for contained '\0's you can just
ignore the terminator semantics (since all the functions always ignore
this redundant terminator as well).
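A rough sketch of that general idea follows. It is not the actual API of my library; the names lstr, lstr_init, lstr_equal and lstr_free are invented for this post. The length is authoritative, and the trailing '\0' is only there so the data can be handed to char * functions when it happens to contain no embedded '\0'.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration: `len` is authoritative; the '\0'
   stored at data[len] is redundant and never consulted by these functions. */
struct lstr {
    size_t len;
    char  *data;    /* len bytes of content followed by one '\0' */
};

/* Copies len bytes (which may include '\0') and appends a terminator.
   Returns 0 on success, -1 if allocation fails. */
static int lstr_init(struct lstr *s, const void *bytes, size_t len)
{
    assert(s != NULL && (bytes != NULL || len == 0));
    s->data = malloc(len + 1);
    if (s->data == NULL)
        return -1;
    if (len > 0)
        memcpy(s->data, bytes, len);
    s->data[len] = '\0';
    s->len = len;
    return 0;
}

/* Compares by length and content; embedded '\0' bytes are ordinary data. */
static int lstr_equal(const struct lstr *a, const struct lstr *b)
{
    return a->len == b->len &&
           (a->len == 0 || memcmp(a->data, b->data, a->len) == 0);
}

static void lstr_free(struct lstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

Two values built from the five bytes "ab\0cd" and "ab\0xy" compare unequal under lstr_equal(), while strcmp() on their data pointers calls them equal because it stops at the first '\0'.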
[...] why isn't it also a Bad Thing to
use a sentinel to terminate other sequences? Linked lists, for example:
Do you stop traversing when you find a NULL link (or an "I'm the end"
bit), or do you stop when a node counter tells you to?
So just to make it clear that I am not associated with this straw man --
the issues with strings are: 1) storage 2) aliasing content with
meta-data (i.e., \0 occupies a character position, but it's not a
character.) 3) ability to deal with sub-strings.
If you decided to make strings a linked list of characters as you
suggest, you would not have a pointer to a dynamically allocated node
which contained a \0 (unless you are an idiot) but instead just a pointer
to NULL (or some other sentinel value.) You would also be laughed at
for doing so; the overhead cost is way too high.
Strings in C are not just a sequence with a terminator -- they are
*ARRAYS* of characters. (Other string libraries like Vstr have
dropped this semantic.) In C the length of an array has to exist for
it to be sound. The bug in question happened because the amount of
data read and strlen() somehow didn't match up. With explicitly
length-delimited strings, obviously that sort of thing cannot happen.
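For contrast, here is a sketch of a line reader that returns an explicit count. The function read_line_counted is hypothetical and written only for this post; it is not taken from any library. The count it returns is, by construction, the number of bytes it stored, so there is no second notion of length to disagree with.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads bytes up to and including '\n', or until cap
   bytes are stored, or until EOF or a read error. A '\0' in the input is
   stored and counted like any other byte. No terminator is appended; the
   return value is the length. */
static size_t read_line_counted(FILE *fp, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    assert(fp != NULL && buf != NULL);
    while (n < cap && (c = getc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    return n;
}
```

Fed the same six bytes "ab\0cd\n", this returns 6 and the '\0' sits at buf[2] as plain data. A caller still has to decide what an embedded '\0' means, but it can no longer go unnoticed.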
Do you use '\n' to mark the division between one text line and
the next, or do you attach a length to every line, or make every line
the same length? (Both latter strategies, by the way, are in actual
use in actual file systems.)
How is using a delimiter in any way comparable to fixed line limits?
Would C be a better language if it eliminated ; as a statement
terminator and instead used a count of characters or tokens?
Exactly who do you think you are fooling with this straw man? Do you
seriously think I am advocating the elimination of the ; in C?
There's nothing inherently wrong with sentinels, and there are
only two things you need to keep in mind:
- Don't see a sentinel where none exists, and
- Don't keep moving when the sentinel hollers "Stop!"
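In C terms, those two rules amount to never searching for the terminator beyond the array that is supposed to hold it. A minimal sketch, where bounded_length is a name made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: looks for '\0' within the first cap bytes of buf.
   On success stores the string length in *len and returns 0.
   Returns -1 if no terminator exists inside the array. */
static int bounded_length(const char *buf, size_t cap, size_t *len)
{
    const char *end;

    assert(buf != NULL && len != NULL);
    end = memchr(buf, '\0', cap);
    if (end == NULL)
        return -1;              /* no sentinel here: do not invent one */
    *len = (size_t)(end - buf);
    return 0;
}
```

Unlike strlen(), this cannot run past the end of the array, but note that it only works because the caller supplies the array size, which is exactly the length information a pure sentinel design leaves out.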
Tech07's original response to you was dead on target, even though he
seemed to back off for some reason.
A sentinel can be and is best used as meta-data, not data. ASCII and
Unicode both list \0 as the well-defined control character NUL.
Neither standard demands how such a character is to be used. In fact
Unicode sees absolutely no distinction between the characters 0
through 8 inclusive. C's imposition of \0 as data with meta-data
meanings is just that -- an imposition.
It can be made to work, but you have to be diligent in unifying the
array-like and terminator-like semantics of strings in the library.
That clearly is not the case in the C standard library (specifically
the fgets() function as the most obvious candidate for this failure.)