Zero terminated strings

spinoza1111 · Aug 2, 2009

spinoza1111 said:

Gareth Owen wrote:
struct string
{
    unsigned len; /* possibly size_t rather than unsigned */
    char str[];
};
How do you pass these on the stack as arguments to a function?
How are you going to pass a 10MB string on the stack anyway? If
your

Duh... isn't char str[] == char *str,

Only when it is a formal parameter in a function declaration.

In the above, however, str is a "flexible array member" of the
structure type. This syntax was introduced in C99.
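For readers who have not met a flexible array member, here is a minimal sketch of how such a counted string is usually allocated. The helper name string_new is made up for illustration and is not from anyone's post; the point is only that the header and the bytes live in one heap block, and that you hand around a pointer to it rather than the struct itself.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Counted string with a C99 flexible array member. */
struct string {
    size_t len;   /* number of bytes stored in str */
    char str[];   /* storage follows the header in the same allocation */
};

/* Hypothetical helper: copy n bytes (which may include '\0') into a
   newly allocated struct string. Returns NULL if malloc fails. */
struct string *string_new(const char *bytes, size_t n)
{
    struct string *s;

    assert(bytes != NULL);
    s = malloc(sizeof *s + n);   /* header plus n bytes for str[] */
    if (s == NULL)
        return NULL;
    s->len = n;
    memcpy(s->str, bytes, n);
    return s;
}
```

A caller would write something like string_new("abc\0def", 7), pass the returned pointer to other functions, and free() it when done. Because the length is stored explicitly, the embedded '\0' is just another byte.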

Thanks for the update, Dickie boy. But does that imply that when a
struct containing str[] is pushed on the stack, the array and not a
pointer to it is stacked? Did you guys REALLY mung C in addition to
trying to destroy Schildt?

Call by value implies that the array IS copied. But in the old days
when men were men, and the sheep were nervous, and char str[] =='ed
char *str, call by value implied that the ADDRESS represented by str
should be stacked in by value mode. I rilly hope this is still the
case. Don't tell me your Holy Standard requires the runtime to copy
monster arrays onto the stack.

Word to the wise, also, Dickie-boy. Before you pontificate again about
programming, sit down and write the compiler and runtime for a subset
of C, like Herbie boy. Memorizing the standard might be a good way to
win pub bets and bully people here, but you never seem to explain
things properly, because you can't envision how things are handled.

You can't envision a stack filled with junk passed by value, or if you
can, this doesn't bother you because youse don't got no taste or
culchah.

Richard Bos · Aug 2, 2009

jacob navia said:
You are misquoting. I was trying to FIGHT against that attitude
and I quoted it in an ironic tone.

You should realise by now, jacob, that you are _not_ good at sarcasm,
irony, or anything a l'anglaise like that.

You are just misquoting me in bad faith (as always)

....ok, I take it back. You _are_ good at irony, just not when you intend
to be ironic.

Richard

Richard Bos · Aug 2, 2009

Tech07 said:
Not hardly. That you can't see the difference of concerns between terminated
character strings and a linked list node using a null or sentinel to
indicate end-of-list or beg-of-list behooves you to do some homework and
write an essay about it, IMO.

Don't you think that, if you make a statement like "null-terminated
strings are bad", it is _your_ responsibility to explain why you think
so, rather than Eric's to find out why you think two kinds of null
terminators are so essentially different?

THAT is your homework assignment! (Part of it anyway).

No. _You_ want to convince; _you_ do the homework.

Richard

spinoza1111 · Aug 3, 2009

spinoza1111 said:

spinoza1111 said:
Gareth Owen wrote:
struct string
{
    unsigned len; /* possibly size_t rather than unsigned */
    char str[];
};
How do you pass these on the stack as arguments to a function?
How are you going to pass a 10MB string on the stack anyway? If
your
Duh... isn't char str[] == char *str,
Only when it is a formal parameter in a function declaration.
In the above, however, str is a "flexible array member" of the
structure type. This syntax was introduced in C99.

Thanks for the update, Dickie boy.

It seems you still have trouble spelling my name. If you can't get
that right, you're really going to struggle with programming.

"Lighten up, Frances"

But does that imply that when a
struct containing str[] is pushed on the stack, the array and not a
pointer to it is stacked?

No, it doesn't imply that. Nor does it imply that a pointer is
"stacked". Nor does it imply that the implementation uses a stack.

You suck as a teacher, Frances. How would you like it if your kids
went to a school where the teacher said what was NOT true, because
she'd memorised one book only and knew nothing? Thus you appear to be.

If a structure is an argument expression in a function and the
structure has a flexible array member, the flexible array member is
not copied to the parameter.

Wow, that's a load off my mind. So is it a pointer, Frances?

Did you guys REALLY mung C in addition to
trying to destroy Schildt?

Mu squared.

Call by value implies that the array IS copied. But in the old days
when men were men, and the sheep were nervous, and char str[] =='ed
char *str,

That would be never, then, except in a formal parameter list.

call by value implied that the ADDRESS represented by str
should be stacked in by value mode.

In the "old" days, C didn't have flexible array members, so the
question is meaningless.

Oh, a new incompatible data type! Are we talking about the same
language, Frances?

How many people actually use these features? Shouldn't this language
have a new name, such as M for Mung? And why on EARTH, Frances, do you
think you belong here if you insist on babbling about a Standard that
was on the face of it public relations, and not about C as understood
(K&R and K&R2)?

It's not holy. If you don't want anyone to tell you what's in it,
that's up to you.

Word to the unwise: learn to get people's names right.

Lighten up, Frances.

Phil Carmody · Aug 3, 2009

Mark McIntyre said:
No, badly coded browsers are a continuing security nightmare.

It looked like the signing agency were just as guilty of using
badly-coded software.

Nobody I know of particularly trusts their browser. Almost everyone
I know of has to trust signing agencies.

What have bad browser implementations got to do with C, you ludicrous troll?

It's more likely the (program written in the) language which accepted
the NUL in the middle of a string which is the root of the problem, so
that would be a language with counted strings...

Phil

Phil Carmody · Aug 3, 2009

Richard Heathfield said:
Flash Gordon said:

Richard said:

(e-mail address removed) said:

<snip>

Consider the simple C statement:

char s[10]="abc\0def";

You can't write a C compiler using normal C style strings to
compile that.

Why on earth not? (Incidentally, I compiled it just fine using gcc
just now.)

He said *write* a compiler, not compile with a compiler.

Well, he actually said "write a compiler to compile that". GNU has
written such a compiler. It's not clear to me what his point is.

If the compiler used terminated C strings alone, and stored the
literal strings that it wants to emit into the object or binary
files using such terminated C strings, then it would not be able
to correctly emit all 8 bytes, as it would terminate after the
4th.

(_Unless_ it stored the string in escaped format until the moment
before character-by-character output.)

However, that's not surprising. As a compiler, it needs control
as well as data, and it can't just use in-band control, as the
language leaves no additional room for in-band control (unless it
keeps the strings escaped internally). Therefore it must use
out-of-band control. But that's purely because it is a compiler
for the language.
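To make the in-band versus out-of-band point concrete, here is a small sketch using the same literal. The function names are invented for illustration. strlen() trusts the in-band terminator and stops at the first '\0', while sizeof on the literal is out-of-band knowledge that the compiler has about the data.

```c
#include <assert.h>
#include <string.h>

/* The array from the example: 10 bytes, the last two zero-filled. */
static const char s[10] = "abc\0def";

/* Length as seen through the in-band terminator. */
size_t length_by_terminator(void)
{
    return strlen(s);            /* stops at the first '\0' */
}

/* Size of the literal itself, known to the compiler out of band:
   'a' 'b' 'c' '\0' 'd' 'e' 'f' plus the implicit final '\0'. */
size_t length_of_literal(void)
{
    size_t n = sizeof "abc\0def";
    assert(n <= sizeof s);       /* the literal fits in the array */
    return n;
}
```

length_by_terminator() gives 3 and length_of_literal() gives 8, which are the "4th" and "all 8 bytes" positions mentioned above. A compiler that kept the literal only in a terminated buffer would have lost the last four bytes before it ever reached the emit stage.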

Phil

Phil Carmody · Aug 3, 2009

James Kuyper said:
Gareth said:

That's very interesting. Do you know if the standard mandates that
behaviour?
Must std::string be able to cope with embedded NULLs?

NULL is a macro. I think you're talking about nulls, not NULLs.

<OT stuff about C++>

The C++ standard does not explicitly say that it must cope with
embedded null characters, it simply describes a counted-string
implementation while failing to attach any special significance to
null characters, leading to the conclusion that they must be treated
as any other character.

There are some exceptions, but it's hard to identify them, because
std::string<> is a specialization for char of std::basic_string<charT,
traits> where charT can be any literal type(21p1), as defined in
3.9p10, and all the documentation is in terms of
std::basic_string<>. For charT==char, charT() is '\0'. Therefore, you
have to look for statements referring to charT(). traits::length(p)
yields the smallest i such that traits::eq(p[i], charT()) is true; for
traits==std::char_traits<char>, traits::eq(a,b) is simply a==b, and,
the result is therefore the same as you would get from strlen(p).

Oh my god. So the number of characters output by << may be different
than the object's length? That sounds like a security nightmare just
waiting to happen, if someone were to embed a '\0' into a std::string,
for example.

std::basic_string<charT,traits> uses traits::length() only in those
member functions that take charT* arguments, to determine the length
of the strings pointed at by those arguments. Null-termination plays no
role in the internals of the class.

So the implementation won't do anything stupid, apart from permitting
sloppy programmers to do stupid things. That's an improvement over C
how?

Phil

Phil Carmody · Aug 3, 2009

Richard Heathfield said:
spinoza1111 said:

Gareth Owen wrote:

struct string
{
    unsigned len; /* possibly size_t rather than unsigned */
    char str[];
};

How do you pass these on the stack as arguments to a function?

How are you going to pass a 10MB string on the stack anyway? If
your

Duh... isn't char str[] == char *str,

Only when it is a formal parameter in a function declaration.

In the above, however, str is a "flexible array member" of the
structure type. This syntax was introduced in C99.

<snip>

You forgot your "diddly squat" line, or whatever it was. Strange,
as I think this post was probably the one that most deserved it!

Phil

James Kuyper · Aug 3, 2009

Something in my newsreader caused two lines to be merged into one:

That should have displayed as:

~/code$ ./NulTermStr
Paypal.com.badguy.com

That fact is very relevant to the discussion below.

Paypal.com
That's very interesting. Do you know if the standard mandates that
behaviour?
Must std::string be able to cope with embedded NULLs?

NULL is a macro. I think you're talking about nulls, not NULLs.

<OT stuff about C++>

The C++ standard does not explicitly say that it must cope with
embedded null characters, it simply describes a counted-string
implementation while failing to attach any special significance to
null characters, leading to the conclusion that they must be treated
as any other character.

There are some exceptions, but it's hard to identify them, because
std::string<> is a specialization for char of std::basic_string<charT,
traits> where charT can be any literal type(21p1), as defined in
3.9p10, and all the documentation is in terms of
std::basic_string<>. For charT==char, charT() is '\0'. Therefore, you
have to look for statements referring to charT(). traits::length(p)
yields the smallest i such that traits::eq(p[i], charT()) is true; for
traits==std::char_traits<char>, traits::eq(a,b) is simply a==b, and,
the result is therefore the same as you would get from strlen(p).

Oh my god. So the number of characters output by << may be different
than the object's length? ...

Only when using it to print with char* pointers, such as the one
returned by std::string::c_str(). With such pointers, it falls back to
the C-like behavior of treating the pointer as pointing to the first
character of a null-terminated string - it really has no other choice,
since it has no other way to determine the end of the string.

However, notice in the above output that when he used << directly on the
std::string object, it output the entire contents of the string,
presumably including the non-printing '\0' character.

Nobody · Aug 3, 2009

It looked like the signing agency were just as guilty of using
badly-coded software.

AFAICT, it's a case of badly-specified software.

The lowest level of the DER certificate format is ASN.1, which allows
embedded NULs, so it's not the fault of whoever wrote the low level I/O
code.

Embedded NULs are also valid within strings in the DNS protocol,
although they aren't valid within domain labels.

My guess is that higher-level code parses a domain name into components
using a rule along the lines of "any sequence of characters except a dot",
when it should have tested for membership in a specific set.

This is a common flaw in regexp-based parsers; occurrences of "." or
"[^...]" in a regexp almost invariably match things that they shouldn't be
matching. This may not be a problem if you're doing search-and-replace
on your own files, but it's often disastrous for validating untrusted data.
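As a sketch of what "membership in a specific set" could look like, here is a label check that accepts only letters, digits and hyphen, and takes an explicit length so that an embedded NUL is rejected rather than silently ending the string. The function names are invented, and the rule used (1 to 63 bytes, no leading or trailing hyphen) is the usual hostname convention, assumed here for illustration rather than taken from whatever the affected software was meant to enforce.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if c is a letter, digit or hyphen. The range comparisons assume
   an ASCII-compatible execution character set. */
static bool is_ldh(unsigned char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '-';
}

/* Whitelist check of one domain label given as (pointer, length).
   Every byte is tested, so '\0', '.', spaces and so on all fail. */
bool label_is_valid(const char *label, size_t len)
{
    size_t i;

    if (len == 0 || len > 63)
        return false;
    if (label[0] == '-' || label[len - 1] == '-')
        return false;
    for (i = 0; i < len; i++)
        if (!is_ldh((unsigned char)label[i]))
            return false;
    return true;
}
```

The difference from an "anything except a dot" rule is that unexpected bytes are refused by default instead of being accepted by default.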

Flash Gordon · Aug 3, 2009

Nobody said:
AFAICT, it's a case of badly-specified software.

No, I think it is badly designed or written software. I'm sure the
requirement specification (or equivalent) for the software referred to
the RFC for DNS (either implicitly or explicitly) and said that invalid
requests should be rejected.

The lowest level of the DER certificate format is ASN.1, which allows
embedded NULs, so it's not the fault of whoever wrote the low level I/O
code.
Agreed.

Embedded NULs are also valid within strings in the DNS protocol,
although they aren't valid within domain labels.

Hmm. I'm mildly surprised by that, but can accept it.

My guess is that higher-level code parses a domain name into components
using a rule along the lines of "any sequence of characters except a dot",
when it should have tested for membership in a specific set.

This is a common flaw in regexp-based parsers; occurrences of "." or
"[^...]" in a regexp almost invariably match things that they shouldn't be
matching. This may not be a problem if you're doing search-and-replace
on your own files, but it's often disastrous for validating untrusted data.

Personally I'm not sure I would use something as complex as a regexp
parser for parsing domain names. The requirements are so simple I think
it is easier to express it in simple code (with maybe a few useful
little library functions similar to ones I already have).

Paul Hsieh · Aug 3, 2009

Paul said:
Paul said:

jacob navia wrote:
Lew Pitcher wrote:
[snip]
What the OP complains about (his direct complaint) is the result of a
failure to validate, and that can happen in any language.
Yes bugs can happen in any language.

I believe misinterpreted NUL termination is unique to the C language.

Prevalent, but not unique. I seem to recall an .ASCIZ directive
or equivalent in more than one assembler. And remember the CP/M I/O
functions that used '$'-terminated strings? (Ghastly, they were.)

Are you doing a Colbert/Borat here? That doesn't even need a response
as it clearly makes *MY* case, not yours. (Assembly is not a
"language" and I have never seen an assembler that came with a
standard library that told you how to encode strings.)

But in any case, what's special about strings?

When they are easy to work with, there's nothing special about them.
What's special about them in this case is the incompetent support for
them in the standard C library. fgets() cannot cleanse or even notice
a '\0' in the strings it reads, while the rest of the library does. I
have not delved into this bug in detail, but obviously that's a prime
candidate for what went wrong here.
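The fgets() behaviour is easy to reproduce. The sketch below is only an illustration of the mismatch, not a reconstruction of the actual bug: it writes one line containing a '\0' to a temporary file, reads it back with fgets(), and compares what strlen() reports with how many bytes were really consumed. The function name is made up.

```c
#include <stdio.h>
#include <string.h>

/* Writes the 8 bytes "abc\0def\n" to a temporary file, reads them back
   with fgets(), and reports what strlen() sees versus what was consumed.
   Returns 0 on success, -1 if the temporary file could not be used. */
int fgets_nul_demo(size_t *seen_by_strlen, long *bytes_consumed)
{
    static const char line[] = "abc\0def\n";  /* 8 bytes + final '\0' */
    char buf[32];
    FILE *fp = tmpfile();
    int rc = -1;

    if (fp == NULL)
        return -1;
    if (fwrite(line, 1, sizeof line - 1, fp) == sizeof line - 1) {
        rewind(fp);                           /* switch from writing to reading */
        if (fgets(buf, sizeof buf, fp) != NULL) {
            *seen_by_strlen = strlen(buf);    /* stops at the embedded '\0' */
            *bytes_consumed = ftell(fp);      /* fgets read through the '\n' */
            rc = 0;
        }
    }
    fclose(fp);
    return rc;
}
```

On success strlen() reports 3 while 8 bytes have been consumed from the stream. fgets() returns only the buffer pointer, so the caller has no direct way to learn that the other five bytes are sitting in the buffer after the '\0'.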

[...] If using a sentinel to terminate a string is a Bad Thing,

Nice straw man. You can force any design to work, it's a question of
the kind of support you give to make it work *WELL*. My library, for
example, uses both length limits and \0 termination in order to allow
for a smooth transition between each kind of string. It relies on the
fact that the intersection of the two is rich enough to satisfy
anyone, and when you need to allow for contained '\0's you can just
ignore the terminator semantics (since all the functions always ignore
this redundant terminator as well).

[...] why isn't it also a Bad Thing to
use a sentinel to terminate other sequences? Linked lists, for example:
Do you stop traversing when you find a NULL link (or an "I'm the end"
bit), or do you stop when a node counter tells you to?

So just to make it clear that I am not associated to this straw man --
the issues with strings are: 1) storage 2) aliasing content with
meta-data (i.e., \0 occupies a character position, but it's not a
character.) 3) ability to deal with sub-strings.

If you decided to make strings a linked list of characters as you
suggest, you would not have a pointer to a dynamically allocated node
which contained a \0 (unless you are an idiot) but instead just a pointer
to NULL (or some other sentinel value.) You would also be laughed at
for doing so; the overhead cost is way too high.

Strings in C are not just a sequence with a terminator -- they are
*ARRAYS* of characters. (Other string libraries like Vstr have
dropped this semantic.) In C the length of an array has to exist for
it to be sound. The bug in question happened because the amount of
data read and strlen() somehow didn't match up. With explicitly
length delimited strings, obviously that sort of thing cannot happen.

Do you use '\n' to mark the division between one text line and
the next, or do you attach a length to every line, or make every line
the same length? (Both latter strategies, by the way, are in actual
use in actual file systems.)

How is using a delimiter in any way comparable to fixed line limits?

Would C be a better language if it eliminated ; as a statement
terminator and instead used a count of characters or tokens?

Exactly who do you think you are fooling with this straw man? Do you
seriously think I am advocating the elimination of the ; in C?

There's nothing inherently wrong with sentinels, and there are
only two things you need to keep in mind:

- Don't see a sentinel where none exists, and

- Don't keep moving when the sentinel hollers "Stop!"

Tech07's original response to you was dead on target, even though he
seemed to back off for some reason.

A sentinel can be and is best used as meta-data, not data. ASCII and
UNICODE both list \0 as the well defined control character NUL.
Neither standard demands how such a character is to be used. In fact
Unicode sees absolutely no distinction between the characters 0
through 8 inclusive. C's imposition of \0 as data with meta-data
meanings is just that -- an imposition.

It can be made to work, but you have to be diligent in unifying the
array-like and terminator-like semantics of strings in the library.
That clearly is not the case in the C standard library (specifically
the fgets() function as the most obvious candidate for this failure.)

Paul Hsieh · Aug 4, 2009

Don't you think that, if you make a statement like "null-terminated
strings are bad", it is _your_ responsibility to explain why you think
so, rather than Eric's to find out why you think two kinds of null
terminators are so essentially different?

Did you miss the first post on this thread? Some DNS-server/Web
browser combination just ate itself because of bad \0 termination. It
means someone was compelled to write code that somewhere along the way
ignored this \0 terminator. fgets(), which is part of the standard C
library already does this, but there are many obvious memory transfer
primitives from the C library (fread, memcpy) which might have also
caused this error. Either way, the case has been made; a real product
failed because of the way C treats NUL. C is also unique in this
respect so the need for this focused discussion is obvious.

Tech07 got Eric Sosman totally dead to rights on this. Eric Sosman is
playing total head in the sand and straw man. The fact that he sees no
serious distinction between data and meta-data is just a sad testament
about him. Tech07 is fully justified in declaring checkmate and moving
on.

Tech07 · Aug 4, 2009

Joe Wright said:
Tech07 said:

While that is factual, it is not correct (whoever made ASCII, not you).
While there are 127 ASCII codepoints (outside of "extended 8-bit ASCII"),
only a subset are characters or printables. 'NUL' is the name given to
the codepoint 0, and yes, apparently/unfortunately they call it "the null
character" even though it is not a character at all.

There are 128 code points, 0..127 and I'll take all of them as characters.
How many do you think there are?

| 0 NUL| 1 SOH| 2 STX| 3 ETX| 4 EOT| 5 ENQ| 6 ACK| 7 BEL|
| 8 BS | 9 HT | 10 LF | 11 VT | 12 FF | 13 CR | 14 SO | 15 SI |
| 16 DLE| 17 DC1| 18 DC2| 19 DC3| 20 DC4| 21 NAK| 22 SYN| 23 ETB|
| 24 CAN| 25 EM | 26 SUB| 27 ESC| 28 FS | 29 GS | 30 RS | 31 US |
| 32 | 33 ! | 34 " | 35 # | 36 $ | 37 % | 38 & | 39 ' |
| 40 ( | 41 ) | 42 * | 43 + | 44 , | 45 - | 46 . | 47 / |
| 48 0 | 49 1 | 50 2 | 51 3 | 52 4 | 53 5 | 54 6 | 55 7 |
| 56 8 | 57 9 | 58 : | 59 ; | 60 < | 61 = | 62 > | 63 ? |
| 64 @ | 65 A | 66 B | 67 C | 68 D | 69 E | 70 F | 71 G |
| 72 H | 73 I | 74 J | 75 K | 76 L | 77 M | 78 N | 79 O |
| 80 P | 81 Q | 82 R | 83 S | 84 T | 85 U | 86 V | 87 W |
| 88 X | 89 Y | 90 Z | 91 [ | 92 \ | 93 ] | 94 ^ | 95 _ |
| 96 ` | 97 a | 98 b | 99 c |100 d |101 e |102 f |103 g |
|104 h |105 i |106 j |107 k |108 l |109 m |110 n |111 o |
|112 p |113 q |114 r |115 s |116 t |117 u |118 v |119 w |
|120 x |121 y |122 z |123 { |124 | |125 } |126 ~ |127 DEL|

128 codepoints, fewer characters. (I can't divide by 2 on Fridays).

Tech07 · Aug 4, 2009

Malcolm McLean said:
Ad hominem is the assertion that an argument is wrong because of the
person who is making it.

eg
The Duke of Edinburgh thinks that single mothers should not get
benefits, but he lives off the State himself..
The Pope thinks that condom use is immoral, but what does he know
because he is celibate?
Kids think it is necessary to have the latest trainers.
Islam is a black man's religion.

It is not the same as an insult. Most insults are not ad hominem
arguments.

From Merriam-Webster Online:

"ad hominem

1 : appealing to feelings or prejudices rather than intellect
2 : marked by or being an attack on an opponent's character rather than by
an answer to the contentions made"

I've always used 'ad hominem' as synonymous with "personal attack"
(especially on USENET which implies one has jettisoned debating etiquette)
which seems to be pretty much (2) above. Your examples do not appear to be
examples of either definition (1) or (2).

Tech07 · Aug 4, 2009

Flash Gordon said:
Moi said:

Gareth Owen wrote:

strcat is a MUCH faster operation if you do NOT seek the terminating
zero.
That depends. It's probably true for long strings. If your string
implementation looks roughly like

struct string
{
    unsigned len;
    char * str;
};

then for short strings, looking up the length could easily cause a
cache miss -- or even a page fault -- depending on memory access
pattern and string accesses require an extra level of indirection.
NULL terminated strings have guaranteed memory locality.
A more likely structure would be...

struct string
{
    unsigned len; /* possibly size_t rather than unsigned */
    char str[];
};

So the length is guaranteed to be a few bytes before the start of the
character data. So memory locality should not be a problem unless your
pages are very short indeed!

If you have lots of short strings (say you are trying to load a massive
dictionary into memory, one string per word) the memory overhead of
the length could become more significant.
Then you use a short length of one byte, and you don't need the
terminator so the overheads are the same.

Thus limiting the length of the string quite severely, or requiring
multiple different string types.

Adding complexity, if you mean what I think you mean.

The complexity is still there, and unless you do all the changes to the
language required to make strings a first class type (which I think would
be large changes) the complexity will show itself in various places.

There are other ways to do it. Allocate one block in which you store
multiple strings separated by the null termination. Then you have only one
allocation overhead for however many strings you have.
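A minimal sketch of that single-block layout, under one assumption that is mine and not from the post above: the block is closed by an empty string, so it looks like "s1\0s2\0...\0\0". The function names are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Count the strings in a block laid out as "s1\0s2\0...\0\0".
   The empty string at the end marks the end of the block. */
size_t packed_count(const char *block)
{
    size_t n = 0;

    while (*block != '\0') {
        block += strlen(block) + 1;   /* skip the string and its terminator */
        n++;
    }
    return n;
}

/* Return a pointer to the index-th string, or NULL if out of range. */
const char *packed_get(const char *block, size_t index)
{
    while (*block != '\0') {
        if (index-- == 0)
            return block;
        block += strlen(block) + 1;
    }
    return NULL;
}
```

The trade-offs are visible in the code: there is no per-string allocation or length field, but lookup by index is linear, and with this particular end marker an empty string cannot be stored as an element.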

An alternative would be to misuse utf-8 encoding to store the length.
This would keep the alignment to 'char' and still allow a string length
of up to 16K, (IIRC).
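One way to read the "up to 16K" figure is a one-or-two-byte prefix that carries 7 bits of length per byte, which tops out at 16383. The sketch below implements that reading. It is a generic variable-length prefix and deliberately not real UTF-8 (a two-byte UTF-8 sequence only reaches 2047), and the function names are invented, so treat it as an assumption about what was meant.

```c
#include <assert.h>
#include <stddef.h>

/* Write len (0..16383) as a 1- or 2-byte prefix. The high bit of the
   first byte says whether a second byte follows. Returns bytes written. */
size_t put_len(unsigned char *out, unsigned len)
{
    assert(len <= 16383u);
    if (len < 0x80u) {
        out[0] = (unsigned char)len;
        return 1;
    }
    out[0] = (unsigned char)(0x80u | (len >> 7));  /* flag + high 7 bits */
    out[1] = (unsigned char)(len & 0x7Fu);         /* low 7 bits */
    return 2;
}

/* Read a prefix written by put_len(). Returns bytes consumed. */
size_t get_len(const unsigned char *in, unsigned *len)
{
    if (in[0] < 0x80u) {
        *len = in[0];
        return 1;
    }
    *len = ((unsigned)(in[0] & 0x7Fu) << 7) | in[1];
    return 2;
}
```

Strings shorter than 128 bytes pay one byte of overhead, the same as a terminator, and the prefix needs no alignment beyond char. The cost is the extra branch on every length lookup.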

That kind of scheme is, I think, what bartc meant by "or length could be
variable". However, it does add complexity. It may well be valuable and
worth while complexity, but it is still there.

On "complexity": You can't really fairly compare the complexity/simplicity
of an incorrect solution with that of a correct one.

On "correctness": It's very subjective (as evidenced again by this thread)
whether the "one style fits all" compromise of the null-terminated C string
design is "palatable" or even correct. One could inductively reason about
the list of requirements that led to the null-terminated design, but there
probably wasn't any such formal or thorough analysis. The analysis is easier
done in retrospect. I'm not sure what portion of the C/C++ programming
languages crowd has either opinion, but I have an inkling that most would
opt for a different solution if it was C language creation time right now.

The old saying goes, "There ain't no such thing as a free lunch".

When I'm doing major string handling I prefer to use languages other than
C which have better string handling and in which strings are a first class
type. Making C such a language would, in my opinion, be a big change.

OK, like me, you think the null-terminated design is not adequate. I'm not
ready to say whether a library-only approach can adequately remedy the
issues, but of course, an ISO Standard remedy is much more difficult to do
than an in-house one.

Beej Jorgensen · Aug 4, 2009

Tech07 said:
Not hardly. That you can't see the difference of concerns between
terminated character strings and a linked list node using a null or
sentinel to indicate end-of-list or beg-of-list behooves you to do some
homework and write an essay about it, IMO.

You and Eric are looking at this from entirely different levels. From
his correct theoretical level, there is no difference between a linked
list with a sentinel value terminator and a string with a NUL
terminator.

But from your correct practical level, it makes all the difference in
the world.

(For clarification of the distinction, use this example: someone might
claim, with theoretical correctness, that you might as well play the
California Lotto numbers 1,2,3,4,5,6, since they have the same chances
of occurring as do the others. But another person might claim, with
practical correctness, that only a fool would play those numbers because
he would have to split the winnings with Steve Jobs, who also plays
them.)

See, it wasn't merely the NUL-terminated strings that allowed this
exploit; it was the cooperation of a variety of systems PLUS
NUL-terminated strings. If you can come up with any string
representation, I can come up with a cooperative set of systems that
will create a security hole with it, guaranteed.

Merely separating the metadata from the data isn't enough; there can
still be exploits. But, again, you can argue that from a practical
standpoint, having NUL-terminated strings makes it "easier" to
accidentally write certain exploitable code. And, if I had to debate,
I'd probably want to argue that length-based strings are more resistant
to exploits based on unvalidated input.

But all of us must admit that a properly-designed system written in
correct C is secure.

I wouldn't call the null terminator a sentinel, but I know what you
mean.

I would--it matches every definition of the word I can find. But don't
listen to me or Wikipedia when instead you can listen to Don Knuth, who
uses "sentinel" in the buffer-termination sense in The Art of Computer
Programming, Volume 1, page 217. (And in the linked-list termination
sense on page 276.)

-Beej

Tech07 · Aug 4, 2009

Tech07 said:
I don't think a library is sufficient. Primitive strings are inextricably
tied to the implementation (compiler level). To be comprehensive about it,
you need compiler/language support.

Maybe.

Tech07 · Aug 4, 2009

Beej Jorgensen said:
I would--it matches every definition of the word I can find. But don't
listen to me or Wikipedia when instead you can listen to Don Knuth, who
uses "sentinel" in the buffer-termination sense in The Art of Computer
Programming, Volume 1, page 217. (And in the linked-list termination
sense on page 276.)

I wasn't debating the definition, I was clarifying my own usage of the
word 'sentinel'. When talking about ASCII zero at the end of a character
string, I call it a 'terminator'. When talking about a NIL node in a
red-black tree, I call it a sentinel. Here's the example from the RB tree I
use (the name 'sentinel' is just coincidence):

// rbtree sentinel
//
rbtnode sentinel(NIL, NIL, 0, RB_BLACK, Null);
rbtnode* NIL = &sentinel;

So, I consider the concept of 'sentinel' more specifically than the general
meaning. Just my own colloquialism I guess, but with purpose.

user923005 · Aug 4, 2009

I wasn't debating the definition, I was clarifying my own usage of the
word 'sentinel'. When talking about ASCII zero at the end of a character
string, I call it a 'terminator'. When talking about a NIL node in a
red-black tree, I call it a sentinel. Here's the example from the RB tree I
use (the name 'sentinel' is just coincidence):

// rbtree sentinel
//
rbtnode sentinel(NIL, NIL, 0, RB_BLACK, Null);
rbtnode* NIL = &sentinel;

So, I consider the concept of 'sentinel' more specifically than the general
meaning. Just my own colloquialism I guess, but with purpose.

Sentinel has both meanings.

The usual sense in which I use it is (from my colloquial standpoint of
usage):
"A special value used to show termination of a list or data structure
whose sole purpose is to speed calculation or to simplify the
algorithm."

So that definition is clearly appropriate to either case.

On the other hand, I would be much more likely to use a term like
'zero terminated' string or 'nul terminated' string or 'C style'
string to describe a C string than to use the term 'sentinel' in
connection with C strings. Not that it is incorrect. Just that it
seems like a special case that has a more certain meaning (sort of
like 'shard' to describe broken pottery -- it communicates a tiny bit
more and is highly specific).
