Null terminated strings: bad or good?


Tony

Keith Thompson said:
[...]

The original question, given in the subject header, was:

Null terminated strings: bad or good?

The answer, I think, is simpler than this lengthy thread might indicate:

Both.

"Both" is mostly true because the underlying implementation (at the compiler
level) is based on null-terminated strings. That may make good sense for a
close-to-the-hardware language, but I'm not sure that it doesn't complicate
things unnecessarily for the library level designer. So, we have low level
strings and some libraries that wrap and hide the ugly low level
implementation to some degree.

At the high level, I am keeping my 3 pointer implementation class that wraps
the null-terminated strings. The terminate-on-the-fly thing though, I'll let
go because it's just begging to make for buggy code.
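
To make the idea concrete, here is a minimal sketch in C of what a three-pointer wrapper could look like. The thread does not show the actual class, so the layout (begin, end, limit) and every name below (`tstr`, `tstr_init`, `tstr_append` and so on) are assumptions for illustration only. The point is that the buffer always stays null-terminated, so it can be handed to ordinary C functions, while the length is available without scanning.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical three-pointer string. The layout and the names are
   assumptions for illustration, not the class described in the post. */
typedef struct {
    char *begin;  /* first character of the buffer                 */
    char *end;    /* points at the terminating '\0'                */
    char *limit;  /* one past the last allocated byte              */
} tstr;

/* Initialise from a C string. Returns 0 on success, -1 if allocation fails. */
int tstr_init(tstr *s, const char *src)
{
    size_t len = strlen(src);
    s->begin = malloc(len + 1);
    if (s->begin == NULL) {
        s->end = s->limit = NULL;
        return -1;
    }
    memcpy(s->begin, src, len + 1);      /* copies the terminator too */
    s->end = s->begin + len;
    s->limit = s->begin + len + 1;
    return 0;
}

/* Length in constant time: no scan for the terminator is needed.
   The string must have been initialised successfully. */
size_t tstr_len(const tstr *s)
{
    return (size_t)(s->end - s->begin);
}

/* Append a C string, growing the buffer if needed. The terminator is
   always kept in place. src must not point into s's own buffer.
   Returns 0 on success, -1 if allocation fails (s is left unchanged). */
int tstr_append(tstr *s, const char *src)
{
    size_t len = tstr_len(s);
    size_t add = strlen(src);
    size_t cap = (size_t)(s->limit - s->begin);

    if (len + add + 1 > cap) {
        size_t new_cap = cap * 2;
        if (new_cap < len + add + 1)
            new_cap = len + add + 1;
        char *p = realloc(s->begin, new_cap);
        if (p == NULL)
            return -1;
        s->begin = p;
        s->end = p + len;
        s->limit = p + new_cap;
    }
    memcpy(s->end, src, add + 1);        /* includes the new terminator */
    s->end += add;
    return 0;
}

/* The underlying buffer is an ordinary null-terminated string. */
const char *tstr_cstr(const tstr *s)
{
    return s->begin;
}

void tstr_free(tstr *s)
{
    free(s->begin);
    s->begin = s->end = s->limit = NULL;
}
```

Initialising with "null" and appending "-terminated" gives a length of 15 from `tstr_len` without any scan, and `tstr_cstr` can still be passed straight to `strcmp` or `printf`.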

Tony
 

David Thompson

<snip> In C, therefore, a character string degenerated to
simply, an array of chars. <snip>
Obviously, if we don't keep track of the length of the string, there
must be some way of finding the end, just as exec() must have a way of
knowing when to stop looking for arguments. You could use '$', as DOS
(and probably CP/M) did, and deserve all the criticism that has been
mentioned above, such as why should a string not be able to contain a
perfectly valid character. Or you could use a NUL character, set aside
by ASCII not to have any visible glyph for precisely the reason that C
uses it for: to be a non-character, as an end-of-data marker. I've

(All) ASCII controls are not (standard) defined to have a particular
glyph; that is not the same thing as being defined to have no glyph.
Only the Format Effectors are defined to have particular display
meanings, and even that doesn't preclude having a glyph as well. There
were contexts, e.g. datascopes, which (always) display all controls,
and quite a few terminals had 'display all' for debugging.

NUL was decidedly not defined as end-of-data. It was specifically --
well, as specific as ASCII got -- defined to be added or removed with
no effect, and was commonly used as padding, e.g. after sending CR LF
to a Teletype or other mechanical terminal, or even after LF to some
early video terminals with time-consuming mem-to-mem scroll, unless it
was easy to do a timed delay instead. There were several _other_
defined terminators or delimiters: 03 ETX, 17 ETB and 04 EOT for
(some) transmission protocols; 19 EM for (some?) storage; 1C FS and
possibly some of its 'children' (1D GS, 1E RS, 1F US).

heard that in variable length arrays of integers, +/-9999 is used for
the same purpose. The NUL character has no use in a file containing
plain text, as it was designed NOT to be plain text.

Some arrays of numbers (float as well as integer) can use sentinels,
depending on the data to be stored there; +/-9999 doesn't seem
particularly likely in binary. All-9s of different lengths, unsigned
or signed, was fairly common as a dummy (including sentinel) value in
COBOL which standardly/defaultly uses decimal numbers.
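
For illustration, a small C sketch of the sentinel idea applied to an int array, next to the counted alternative. The value -9999 is simply the one mentioned above, and the function names are made up for this example. A sentinel only works if that value can never occur as real data, which is the same trade-off as reserving NUL in a string.

```c
#include <stddef.h>

/* Illustrative sentinel; it must never appear as a real data value. */
#define SENTINEL (-9999)

/* Count the elements before the sentinel, much as strlen() counts
   the chars before the terminating NUL. */
size_t count_until_sentinel(const int *a)
{
    size_t n = 0;
    while (a[n] != SENTINEL)
        n++;
    return n;
}

/* Sum the elements before the sentinel. */
long sum_until_sentinel(const int *a)
{
    long total = 0;
    for (; *a != SENTINEL; a++)
        total += *a;
    return total;
}

/* Counted alternative: the length travels with the pointer, so every
   int value, including -9999, is ordinary data. */
long sum_counted(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

With the array `{3, -1, 40, SENTINEL}`, the sentinel versions see three elements summing to 42, while `sum_counted` over all four elements treats -9999 as data and returns -9957.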

NUL isn't often useful in a text _file_, although it could be filler
for thing(s) to be added later, or to replace deleted thing(s),
without moving the rest of the file, especially in plain unstructured
(Unix, DOS/Win, Mac) files where that becomes costly. But not all text
is in files; for various comms protocols and interfaces, and
non-structured media like papertape, NUL has often been useful.
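
A rough C sketch of the filler use, with made-up function names: a deleted span is overwritten with NULs in place, and a reader that treats NUL as "no character" drops them again. Note that both functions need an explicit length, because once NUL is used as filler it can no longer also serve as the terminator.

```c
#include <stddef.h>
#include <string.h>

/* "Delete" len bytes at offset off by overwriting them with NUL,
   without moving the rest of the buffer. The caller must ensure
   off + len does not exceed the buffer size. */
void blank_region(char *buf, size_t off, size_t len)
{
    memset(buf + off, '\0', len);
}

/* Copy n bytes from src to dst, dropping every NUL on the way.
   Returns the number of bytes written; dst is NOT null-terminated.
   dst needs room for up to n bytes and may be the same buffer as src. */
size_t strip_nuls(char *dst, const char *src, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] != '\0')
            dst[out++] = src[i];
    }
    return out;
}
```

Blanking six bytes at offset 6 of "hello cruel world" and then stripping NULs over the original 17 bytes yields the 11 bytes of "hello world", whereas `strlen` on the blanked buffer stops at 6.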

And of course, as you go on to say (snipped) not all strings are text.
 
