Paul said:
Paul Hsieh wrote:
jacob navia wrote:
Lew Pitcher wrote:
[snip]
What the OP complains about (his direct complaint) is the result of a
failure to validate, and that can happen in any language.
Yes bugs can happen in any language.
I believe misinterpreted NUL termination is unique to the C language.
Prevalent, but not unique. I seem to recall an .ASCIZ directive
or equivalent in more than one assembler. And remember the CP/M I/O
functions that used '$'-terminated strings? (Ghastly, they were.)
Are you doing a Colbert/Borat here? That doesn't even need a response
as it clearly makes *MY* case, not yours. (Assembly is not a
"language" and I have never seen an assembler that came with a
standard library that told you how to encode strings.)
"Assembly is not a `language'" suggests that you have a
notion of "language" that is foreign to me. People write
assembly code, read assembly code, and debug assembly code
exactly as they write, read, and debug C or Lisp.
Exactly as? I think not. In assembly, you don't have a "model" for
your machine, you have *your machine*. You think of your machine as a
massive bit-state, in which you simply drive the bit-state instruction
by instruction. Because that's what it actually is. Your level of
abstraction is to think of collections of these bits to hold values
that you will eventually use to interact with I/O devices.
In HLLs and to some degree in C, you think in terms of data
structures. And in fact, there are many parts of the machine you just
have no clue about because you are not supposed to know about them.
For example, at what point do you run out of memory in C or Lisp?
There is no sensible way to even address the question, whereas in
assembler it's pretty much always clear how much memory is available
to you, how much you are using, and thus when you will run out.
As for the existence of a library, no: Assembly languages
themselves seldom come with such libraries. Fine. But why
did the assembler creators expend the effort to implement and
document an .ASCIZ directive?
Maybe because the C language reared its head and some people decided
that assemblers should be functionally inter-operable with it? You
might also ask why Intel made the 286's version of the loadall
instruction (now deprecated) or the laughable instruction: "POP
SS" (which will retrieve a new stack segment but will not reset the
stack pointer offset; D'oh!) They are engineers, not gods.
[...] They did it because they thought
zero-terminated strings would be common enough that it was
worthwhile to offer a convenient shortcut for creating them,
that's why.
You are arguing the fact that an assembler *can* support zero-
terminated strings (when they typically are never used by assembly
programs at all), versus C which practically *requires* that you do.
An assembler doesn't have to justify all its support extensions
because it doesn't dictate what the programmer is really going to do.
The C language library is the standard interface to I/O and it encodes
strings in only one way (and even then inconsistently: fgets()).
[...] In other words, the libraries that used zero-
terminated strings were not part of the assembler and its
language per se, but were anticipated to be a significant part
of the environment.
Assembly language designers do not get to "anticipate" anything. The
CPU vendor makes an instruction set based on the market, academic
feedback, customer requests, or some crazy designer's ideas (read:
Itanium), and the assembler just gets to encode it, with a few extra
bells and whistles to help the programmer.
The existence of an artifact proves that the artificer
thought -- rightly or wrongly -- that the artifact would
serve a purpose.
No, it proves only that it *MIGHT* serve, or is *AVAILABLE* to serve, a purpose.
They don't get to tell developers when or if they should use it. As an
example: Intel learned this quite dramatically with the 80386 ISA
which had massive instruction set support for multi-tasking. The
instructions went largely unused beyond the very minimum support
required to build multitasking models totally in software. (AMD64
dropped support for these extraneous instructions in 64 bit mode
altogether.)
[...] The existence of an .ASCIZ directive proves
that somebody thought -- rightly or wrongly -- that programmers
would use it, and this proves that somebody thought -- rightly
or wrongly -- that his assembler would be used in conjuction
with code that manipulated zero-terminated strings. And not in
C, which was the point of the digression: To refute the assertion
that zero-terminated strings (and mistakes made therewith) are
somehow C-specific.
You argue in a space of pure delusion. In your delusion you insist
that assembler is a language and that the potential uses of zero-
terminated strings in assembler are *ACTUAL* uses of zero-terminated
strings in real programs written in that assembler. Without, of
course, any need to produce an example of any such program (which
would vacate any pedantic point about your lack of discernment; but
you've goose-egged there too.)
[...]
If you decided to make strings a linked list of characters as you
suggest, you would not have a pointer to a dynamically allocated node
which contained a \0 (unless you are idiot) but instead just a pointer
to NULL (or some other sentinel value.) You would also be laughed at
for doing so; the overhead cost is way too high.
It wasn't a suggestion or a f'rinstance, it was a report
of an actual SNOBOL3 implementation I used around 1970. It
struck me as wasteful, too -- and this was in the days when
memory was scarce and waste could not be ignored as we so
casually do today. But it ran, it worked -- and it was real,
not some kind of thought experiment.
And did they allocate special nodes at the end of each string with the
contents of a 0 or some other terminator in it?
[...]
A sentinel can be, and is best, used as meta-data, not data. ASCII and
Unicode both list \0 as the well-defined control character NUL.
Neither standard demands how such a character is to be used. In fact
Unicode sees absolutely no distinction between the characters 0
through 8 inclusive. C's imposition of \0 as data with meta-data
meanings is just that -- an imposition.
C's insistence that pointer value 0 (not to be confused
with all-bits-zero) be given special treatment is equally an
imposition, and equally artificial.
Equally???? What should the default or "no-value" contents of a
pointer be? Alternate systems don't make any sense. With C strings,
length delimiting is an extremely obvious alternative. Furthermore
having and using a value of '\0' makes just as much sense for a
character as any other. With pointers, you cannot *USE* the value on
the other side of NULL unless only *ONE* pointer uses it. Which is
insane for anything but the very simplest programs.
[...] Much hardware that runs
C is perfectly capable of using pointer-0 to access memory;
it's a perfectly valid memory location and the hardware can
form perfectly valid pointers to it. The hardware is often
set up in such a way that the attempted access will trap (not
always, as witness the recent thread about gcc optimizations
and the Linux kernel), but this is just to help debug buggy
C code. (Non-buggy C code never attempts the access, and so
doesn't care what would happen if it did.)
Ah. You have Heathfield disease. You never have to care about buggy
code. Given your level of analysis shown here, that must be wishful
thinking on your part.
Pointer-0 is no less an "imposition" than character-0:
both are examples of C attaching special meaning to a value
that is not inherently special. A pointer-0 at the end of
a list and a character-0 at the end of a string are morally
equivalent, both matters of convention and not of necessity.
Tech07 has characterized your understanding of computer programming
quite accurately I see. Is this just senility or have you always been
this shallow?
Write up the code for a string as a linked list and see if you can't
tell the difference between a 0-character-terminator, and a NULL-link
terminator.