Zero terminated strings

Tech07

jacob navia said:
Eric said:
jacob said:
Zero terminated strings are a continuing security nightmare.
[...]

Even if we accept the thesis as proven (which I don't, but
let it pass), what remedy or other action would you suggest?
Have you anything to offer other than a complaint? Anything
constructive, for instance?
I have proposed (and implemented) a full replacement of zero
terminated strings in my C compilation system lcc-win.

What about the following function call scenario:

processstring("this is the string to process");

The literal passed to the function is a null-terminated C-string. And what
about this null-terminated C-string:

static const char error_string[] = "Null arg!";
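
For what it's worth, literals and static arrays like the two above can still feed a counted representation, because the compiler knows their size. The following is only a sketch under assumptions: `struct strview`, `STRVIEW_LIT` and the body of `processstring` are hypothetical names made up for illustration and are not lcc-win's actual string type or API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view; not lcc-win's actual type. */
struct strview {
    size_t len;        /* number of characters, excluding any terminator */
    const char *data;  /* not required to be zero-terminated */
};

/* Valid only for string literals and true arrays, where sizeof gives the
   array size including the terminating zero byte. Not valid for pointers. */
#define STRVIEW_LIT(s) ((struct strview){ sizeof(s) - 1, (s) })

static const char error_string[] = "Null arg!";

/* Hypothetical stand-in for the processstring() call in the post. */
size_t processstring(struct strview s)
{
    assert(s.data != NULL);
    return s.len;   /* length is known without scanning for a zero byte */
}
```

A call would then look like `processstring(STRVIEW_LIT("this is the string to process"))`. The literal is still stored with its terminating zero; the view simply does not depend on it.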
 
Tech07

But in any case, what's special about strings? If using a sentinel
to terminate a string is a Bad Thing, why isn't it also a Bad Thing to
use a sentinel to terminate other sequences? Linked lists, for example:
Do you stop traversing when you find a NULL link (or an "I'm the end"
bit), or do you stop when a node counter tells you to?

Nulls in linked-list links are not data. They are part of the machinery of a
program and under developer control. You tried to generalize to make a
point. Unfortunately, generalizations are almost never true and your
approach failed miserably. I'll let others say specifically why your other
examples are different from character strings.
 
Tech07

bartc said:
This doesn't seem to bother the Python people. I think Python 3.x is not
backwards compatible with 2.x.

2.x is still available for those who don't want to change. 3.x is
available to those who want to write new code using the latest language
without any obsolete baggage. And I believe conversion tools are available
for those who want to upgrade existing code.

Contrast with C...

The C99 branch should be put in maintenance-only mode and the C1X fork
should be the break from the past.
 
spinoza1111

I have been hanging around comp.lang.c for a long time and, based on my
experience here, the above statement is completely false. Your
analysis of Richard Heathfield's programming ideas is far from
reality. Perhaps you need to be better grounded in C's technical
details in order to understand what *exactly* Heathfield wants to say
when he says something about someone's code.


I think I learned exactly the opposite from Richard (and from CLC).
Take responsibility for your code. Why should someone else have to pay
for your mistakes? I even wrote entire Coding Guidelines for my company
based on what I have learned.


History has taught me that average biology students and
academicians always had trouble understanding Charles Darwin's ideas.
His ideas were accepted 100 years after his death. Perhaps you need
time to understand ANSI C.


I have been hanging around comp.lang.c for a long time and, based on my
experience here, the above statement is completely false. Your
analysis of Richard Heathfield's programming ideas is far from
reality. Perhaps you need to be better grounded in C's technical
details in order to understand what *exactly* Heathfield wants to say
when he says something about someone's code.

That's the secret of self-serving expertise. You cannot be an "expert"
if you "bite the hand that feeds you", here, the computer companies
that foolishly overinvested in C. It becomes impossible to criticise.
This type of "expertise" in the United States and Britain produced
thousands of casualties in the Iraq war.

You see, I did as you recommend years and years ago. I taught C at
Princeton and assisted John "A Beautiful Mind" Nash with C. But I
realized as platforms grew more and more complex that I was trying to
do "objects" with a non-OO language and as a competent programmer I
thought ontologically, in terms of objects, even when programming in
Cobol in the 1970s: in Cobol I learned to treat a report as an object
and not at all as a bunch of "lines".

But let me be quite honest with you. Although I taught C at Princeton
and effectively assisted John Nash, I rarely did much effective coding
in the language.

[Go ahead, Richard, insert your flame. You done? Thanks.]

In True Basic, I developed a hydrostatic stability program for
oceangoing research vessels, used for years on far more than the vessel
it was designed for. In IBM 1401 SPS, a primitive assembly language, I
developed an early data base system with provisions for user
specification of logic and report format. In Visual Basic I developed
a compiler for Quick Basic. In Cobol I developed a digital switch
simulator to bill for calls by reconstructing them from phone events.
In Rexx I developed a parser generator and parser for Rexx. Etc.

But when I adopted C for a real estate appraiser, I was dismayed,
since I had to spend, as Richard Heathfield has admitted he's wasted,
hours of paid time merely to be able to use strings, which have a
place in all applications because we need error messages.

As a programming professional I abandoned C to discover that even old
Visual Basic was safer and superior. I then discovered C Sharp and
today will not use C except for recreationally freezing and crashing
my system.

I think I learned exactly the opposite from Richard (and from CLC).
Take responsibility for your code. Why should someone else have to pay
for your mistakes? I even wrote entire Coding Guidelines for my company
based on what I have learned.

When the basically ignorant reach the limits of their minds,
They then turn to groaning Religion, this one often finds.
Having been bested in the Realms of Science and of Light,
They call upon airy spirits of Divinity to be their manly might.
They doff the Scholar's robes and don the robe of Priest,
And groan shibboleths and curses to the sun as it rises in the East.
They burn the books of yore and thrust the Honest from their door,
And pronounce sentence on all but the Pedant, Boor, and Bore.
"Responsibility" is their newest watch-word, although of this they
partake None:
It is post facto victim blaming which the Witch is perforce,
condemn'd.
You pass their void pointers to Abyssinia, and Timbuktoo
It gives you meaningless answers and then it prints, Screw You.
Old and ancient Programmers like Ritchie and Brian Kernighan
Are elevate to Gods and lose the name and form of Man.
They alter'd parameters passed by value and without an asterisk
This their followers do: it's the mark of the truly "Blest".
Tics and saws and old wives' tales take on a new power and new might:
Their violation becomes a hanging matter, and they're used to just
Afright.
Thus man devolves as years go by from Neil Armstrong on the Moon
To Ape, to Horse, to Dog, to Swine and then, to mad Buffoon.

(Edward G. Nilges, Hong Kong, 1 Aug 2009. Moral rights have been
asserted by the author, so screw you, ok? All verse responses on
Google Groups are writ original extempore and for the occasion and the
hour, unless otherwise indicated. "Probably didn't know I was a poet,
probably thought I was a jag" - Sasha, Lincoln Park Ale House, 1981.)
 
Tech07

jacob navia said:
This argument has been brought up countless times and even if the
zero terminated string people try to offer a lot of smoke
around it, it is OBVIOUS to all.

strcat is a MUCH faster operation if you do NOT seek the
terminating zero.

strlen is instantaneous.

strcpy can use block moves like memcpy.

etc etc.

And counted operation is the way the x86 string assembly instructions work.
"C close to the hardware"? Certainly C-strings have an "impedance mismatch"
with the Intel hardware as far as "close to the hardware" goes.
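
To make the claims quoted above concrete, here is a minimal sketch of counted equivalents of strlen and strcat. It assumes a hypothetical `struct cstr` with caller-owned storage; the type and the function names are invented for illustration and are not taken from lcc-win or any other library mentioned in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer; field names are illustrative only. */
struct cstr {
    size_t len;   /* bytes currently in use */
    size_t cap;   /* bytes available in buf */
    char *buf;    /* caller-owned storage, not zero-terminated */
};

/* Counted equivalent of strlen: a field read, no scan. */
size_t cstr_len(const struct cstr *s)
{
    assert(s != NULL);
    return s->len;
}

/* Counted equivalent of strcat: the end of dst is already known, so the
   copy is a single memcpy of n bytes. Returns 0 on success, or -1 if the
   result would not fit, in which case dst is left unchanged. */
int cstr_append(struct cstr *dst, const char *src, size_t n)
{
    assert(dst != NULL && dst->len <= dst->cap);
    if (n > dst->cap - dst->len)
        return -1;
    if (n > 0)
        memcpy(dst->buf + dst->len, src, n);
    dst->len += n;
    return 0;
}
```

Whether this is actually faster in a given program depends on string lengths and memory layout, which is the point taken up further down the thread.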
 
Tech07

Gareth Owen said:
That depends. It's probably true for long strings.

If your string implementation looks roughly like

struct string
{
    unsigned len;
    char *str;
};

then for short strings, looking up the length could easily cause a
cache miss -- or even a page fault -- depending on memory access
pattern, and string accesses require an extra level of indirection.
NULL terminated strings have guaranteed memory locality.

Note that length-prefixed strings do not have the problems of the
descriptor-based string you show above. JNavia's string is descriptor-based.
One size (one representation) does not fit all, as is always the case.
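
A length-prefixed layout, as opposed to the descriptor above, might be sketched like this. It assumes C99 flexible array members; `struct pstring` and `pstring_new` are hypothetical names used only to show the count and the characters sharing one allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Length-prefixed: the count and the characters share one allocation,
   so reading len and then data[0] touches adjacent memory. */
struct pstring {
    size_t len;
    char data[];   /* C99 flexible array member */
};

/* Copies n bytes from src into a new pstring. Returns NULL on allocation
   failure or size overflow; the caller releases it with free(). */
struct pstring *pstring_new(const char *src, size_t n)
{
    struct pstring *p;

    assert(src != NULL || n == 0);
    if (n > SIZE_MAX - sizeof *p)
        return NULL;
    p = malloc(sizeof *p + n);
    if (p == NULL)
        return NULL;
    p->len = n;
    if (n > 0)
        memcpy(p->data, src, n);
    return p;
}
```

The trade-off is that such a string cannot point into someone else's buffer, which is something a descriptor can do.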
 
bartc

Gareth Owen said:
That depends. It's probably true for long strings.

If your string implementation looks roughly like

struct string
{
    unsigned len;
    char *str;
};

then for short strings, looking up the length could easily cause a
cache miss -- or even a page fault -- depending on memory access
pattern, and string accesses require an extra level of indirection.
NULL terminated strings have guaranteed memory locality.

If you're comparing one string with another (equal/not equal), then most of
the time you probably just need to compare the lengths; you don't need to
look at *str.

Anyway even with an ordinary string:

char *str;

you still have to look at two memory levels: the memory containing str, and
the memory str points to. So the counted string could well reduce
memory accesses even for short strings.
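
The length-first comparison described above could look roughly like this. It reuses the struct string layout quoted from Gareth Owen; `string_equal` is a hypothetical name. Note that the shortcut only settles the unequal-length case, since strings of equal length still need their bytes compared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct string {
    unsigned len;
    char *str;
};

/* Returns 1 if a and b hold the same bytes, 0 otherwise. */
int string_equal(const struct string *a, const struct string *b)
{
    assert(a != NULL && b != NULL);
    if (a->len != b->len)
        return 0;   /* decided without touching a->str or b->str */
    if (a->len == 0)
        return 1;   /* two empty strings; nothing to compare */
    return memcmp(a->str, b->str, a->len) == 0;
}
```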
 
Tech07

Kaz Kylheku said:
Obviously, this bug was caused by idiots who thought that they could solve
some imaginary problem by using a ``better'' string library that can
represent a null byte in the middle of a string.

A null byte has absolutely no place in character (i.e. text) strings. If an
array of bytes contains nulls, it's not a character string, but a binary
string, or blob if you will. Null is not really a character, basically. It
has no glyph, and no signaling action for printing control.

There is no legitimate need, ever, in a data representation for text, to
support an embedded null byte. It's not text; it's a special code which says
``I am not text''. So, implicitly, if a null byte follows text, it means
either that the text has ended, or the text is corrupt with the repugnant
inclusion of non-text data.

The moral of this story is that if your language or string library allows
nulls in the middle of a string, it's wrong, and you should fix it such that
the null is treated as a terminator, or such that an exception is triggered
if it occurs.

There are good reasons for working with strings in a representation other
than the null-terminated array, but being able to represent a null in the
middle of a string is not one of those good reasons. Strings that know their
own length should still banish the null byte from being a constituent.

"string" is actually the more general computer science term. "character
string" is a subcategory. "string" is used as a synonym for "byte string",
where each byte may be 0 through 255. In this thread, though, "string" was
apparently being used to mean "character string"?

Jacob, when you say "string", do you mean "character string" or "byte
string"?
 
bartc

A null byte has absolutely no place in character (i.e. text) strings. If an
array of bytes contains nulls, it's not a character string, but a binary
string, or blob if you will. Null is not really a character, basically. It
has no glyph, and no signaling action for printing control.

There is no legitimate need, ever, in a data representation for text, to
support an embedded null byte. It's not text; it's a special code which says
``I am not text''. So, implicitly, if a null byte follows text, it means
either that the text has ended, or the text is corrupt with the repugnant
inclusion of non-text data.

OK, so some people want to use 'strings' to mean collections of byte values,
rather than collections of byte values that can store any value except zero.
(And distinct from char arrays which do not contain their own length.)

(For example, a complete binary file could be treated as a single String, or
a set of concatenated Asciiz text strings could be stored as a single
String.)

Counted strings (or any mechanism that doesn't rely on a terminator
character) can deal with such Strings.
Standard C-strings can't.

And since C doesn't specify a character set, why can't I invent a character
set that has a special meaning for byte value 0? Again, counted strings
could cope with such a set.

The moral of this story is that if your language or string library allows
nulls in the middle of a string, it's wrong, and you should fix it such that
the null is treated as a terminator, or such that an exception is triggered
if it occurs.

Only if the library is specified only for Asciiz and similarly terminated
strings.
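
The difference for the "set of concatenated Asciiz text strings stored as a single String" case can be shown with a short sketch. The array `packed` and the function `count_zero_bytes` are hypothetical, made up for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two Asciiz strings stored back to back in one array: "ab" and "cd",
   each followed by a zero byte. sizeof packed is therefore 6. */
static const char packed[] = "ab\0cd";

/* Counts zero bytes in a counted byte sequence. The length is supplied
   by the caller, so embedded zeros are ordinary data here. */
size_t count_zero_bytes(const char *data, size_t len)
{
    size_t i, zeros = 0;

    assert(data != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (data[i] == '\0')
            zeros++;
    return zeros;
}
```

Here `strlen(packed)` stops at the first zero byte and only sees "ab", while `count_zero_bytes(packed, sizeof packed)` walks all six bytes because the length travels separately from the data.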
 
James Kuyper

Gareth said:
That's very interesting.
Do you know if the standard mandates that behaviour?
Must std::string be able to cope with embedded NULLs?

NULL is a macro. I think you're talking about nulls, not NULLs.

<OT stuff about C++>

The C++ standard does not explicitly say that std::string must cope with
embedded null characters; it simply describes a counted-string
implementation while failing to attach any special significance to null
characters, leading to the conclusion that they must be treated as any
other character.

There are some exceptions, but it's hard to identify them, because
std::string is a specialization for char of std::basic_string<charT,
traits>, where charT can be any literal type (21p1), as defined in 3.9p10,
and all the documentation is in terms of std::basic_string<>. For
charT==char, charT() is '\0'. Therefore, you have to look for statements
referring to charT(). traits::length(p) yields the smallest i such that
traits::eq(p[i], charT()) is true; for traits==std::char_traits<char>,
traits::eq(a,b) is simply a==b, and the result is therefore the same as
you would get from strlen(p).

std::basic_string<charT,traits> uses traits::length() only in those
member functions that take charT* arguments, to determine the length of
the strings pointed at by those arguments. Null-termination plays no role
in the internals of the class.

</OT>
 
Eric Sosman

Tech07 said:
Nulls in linked-list links are not data. They are part of the machinery of a
program and under developer control.

This distinction is too subtle for me. You're saying that
charArray[42] is data but that ptr->next is not? And that
ptr->next is "under developer control" but charArray[42] is
not? What sort of "control" can the developer exert on a
piece of memory that is "not data" like ptr->next?
 
spinoza1111

spinoza1111 said:



If that is true, it is a tragedy. But it is unlikely to be true.

It's true.
Yes, you mentioned that many, many times, as if you'd done something
amazing. But apparently he was using int in a program he'd written,
but the int wasn't big enough, so you suggested long int, or possibly
unsigned long int - or something along those lines, anyway. End of
help. Gosh gee whizz.

No, based on my previous compiler development experience, I discovered
that the Microsoft compiler was erroneously using the wrong precision
to evaluate a constant expression, and the Borland compiler was not.
Gosh gee whizz.

More important, I was selected because unlike most programmers I have
social skills and don't arrogantly push people around for their
putative "incompetence". It was very important that Nash retain his
stability and not react to threatening people. I shudder to think how
you would have handled this situation.
That looks like an exaggeration to me.


Like a padded cell, right?

Screw you, asshole.
 
Kenny McCormack

Lew Pitcher said:
If you have a problem with an existing C program, please bring it up here.
If you have a problem with the C standard, bring it up in comp.std.c.
If you have a problem with the C language, write your own.

Defensive, much?

(Your whole screed sounds so much like a little girl whose dolly got
thrown in the mud.)
 
Antoninus Twink

Yes, but the trouble is Jacob, you have a bad reputation, mainly due
to a vicious and unrelenting campaign of character assassination
that's been waged in this group by me and my buddy Heathfield over
several years.

There, fixed that for you Mark.
 
Nobody

Yes you are. I can sell software I've compiled using Cygwin as long as I
don't link it to the cygwin dll (by using their mingw stuff, for
example) OR if I'm willing to let customers have the code under GPL
(which some companies do).

Or if you only use your code internally and don't distribute it.

The GPL doesn't require that you distribute anything; it only sets
conditions in the event that you do.
 
Keith Thompson

Richard Heathfield said:
spinoza1111 said: [snip]
Oh, was that a little close to the bone? I apologise. I didn't realise
you were sensitive to insanity references. I will try to remember not
to make such references in future - although I don't promise always
to succeed in that attempt. After all, almost everything you post is
redolent of insanity. Nevertheless, I'll do my best to refrain.

Richard, I suggest that it's time to stop feeding the troll. If you
refrained from responding to anything spinoza1111 posts (by use of
a killfile if necessary), I think it would substantially improve
this newsgroup's signal-to-noise ratio. Almost everyone here
already knows that spinoza1111 is wrong about nearly everything;
those few who don't won't be convinced otherwise.
 
Kenny McCormack

<snip>

Eric really is a complete and total loon - and should be completely
ignored (especially by you, Jacob), except as an object of ridicule (the
purpose of this post).

The level of "loon" that Eric boy displays, in every post, is simply,
to put it bluntly (and there is no other way to put it), breathtaking.

To so completely miss the point of everything (as he consistently does)
is, truly, a skill.
 
Kenny McCormack

That is, I was trying to AVOID that.

You are just misquoting me in bad faith (as always)

Clearly so. But they will never admit it.

Of course, you make it too easy for them, because your command of
English is not that good. Kiki claims that you are no good at
irony/sarcasm, but he just pretends not to get it, to push his agenda
(i.e., to be a jerk).

I.e., business as usual in that carnival known as comp.lang.c.
 
Kenny McCormack

Because you are advocating removing null-terminated strings from the
Standard. Existing code would therefore no longer compile with a Cxx
compiler.

Don't be a jerk (*). Nobody is advocating *removing* anything.

(*) Yes, I know this is hard for you, since it is your whole act.
 
