Draft Secure C

Walter Roberson · Jan 14, 2007

Walter Roberson wrote:

Ada is nothing like a "portable assembler"!!!! You are dreaming.

As best I recall, Ada's premise was that all operations would be
precisely defined and implemented on all platforms. It was
very important in the history of Ada that there would be only *one*
legal interpretation of any Ada program. The strong intent was that
an Ada program would have exactly the same semantics on -every-
platform that Ada was available on, and that every Ada program could
be taken and executed (with the same results) without ANY changes
(other than recompilation) on every Ada implementation. And it
was important to Ada that it be usable for robust multiprocessing
and time-sensitive work. Ada was, in short, intended to provide
portable high-level interfaces to hardware. It seems to me that the
lackluster demand for Ada should tell us something about the
market conditions for 'a really first rate "portable assembler"'.

Walter Roberson · Jan 15, 2007

Walter Roberson wrote:

I noticed how you didn't say never. So when you *have* needed some of
that functionality, what did you do about it?

Implemented in terms of the platforms I cared about, and documented
the platform restriction. Beyond those, I never received requests
to port to additional platforms.

Which is not to say that I paid lip service to platform dependencies:
instead, it was the case that I paid close attention to what was or
was not promised by C, and in so doing, wrote code that avoided
the issues when possible, and isolated the affected areas when
dependencies were unavoidable.

Now here's the point: in NO case that I can remember, did I choose
another language because it offered portability guarantees that C did
not. Each time, I chose a language suitable for the nature of the
project.

The portability issues that you describe were never more than a
minuscule consideration in the work I did. Much more difficult was
portability at the OS level -- matters such as dealing with network
programming interfaces or serial port interfaces. The layers you
describe would be, for the work I did, essentially akin to
microoptimizations.

[...] Memory leak analysis sometimes, but only
one of the several extensions to heap functionality you propose would
make any difference to me (and that only on the odd occasion.)

So your argument then, is that you don't think there should be memory
leak assistance, because the other proposals I made are not something
you would be interested in?

Your proposals would, in my opinion, do almost nothing to save C
amongst the general populace of programmers: most of your proposals
are irrelevant for most programs, I believe. They might do an
admirable job of fixing one corner of the language, but I don't
believe for a moment that C is "crying out for" that set of
changes. You accuse the C99 committee of not addressing the "real"
problems of C, but in my assessment, what you propose would be
largely greeted with a Hearty High-Ho "So What?" by the great majority
of C programmers.

Okay ... So here is another one that you would use, but only if they
were tied to Unix and named "ntohl" and "htonl"?

No, if they were provided by the C library, my first question would
be how to override them to get at the implementation's routines
of the same name: unless the C standards committee -defined-
them as operating the same way as in POSIX, C's versions would
be of no utility to me. I have no use for the operations outside
of network programming, and I'm sure the C standards committee knows
to butt out of the network programming standards area.

Are you saying these
are only possibly useful to Unix and therefore must not be available to
other platforms in a portable way?

I'm not in the business of writing Linux drivers or OS kernels that
would, optimally, be writable without changes for every platform
that the code might -possibly- be ported to. The effort that the
implementors of my network stack have to go through to provide
ntohl() and htonl() is of little interest to me: that's host
implementation, and I don't care what compiler extension or whatever
that they hide away in a system library or system header file intended
for system use. Whether such extensions are built into C or not
wouldn't have made C more useful for much of anything I did in
the last decade; such extensions might have marginally increased the
-theoretical- portability of some of my programs, but not one iota
would they have increased the -practical- portability of what I did.
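
For readers who have not used ntohl() and htonl(): the conversion they perform can be written without knowing the host's byte order, using only shifts. The following is a minimal sketch, not the POSIX implementation; the names pack_be32 and unpack_be32 are hypothetical and exist in no standard library.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of POSIX or the C library. */

/* Store a 32-bit value in network (big-endian) byte order. */
void pack_be32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}

/* Read back a 32-bit value stored in network (big-endian) byte order. */
uint32_t unpack_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

Because the shifts operate on values rather than on memory layout, the same source gives the same byte sequence on little-endian and big-endian hosts.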

Alloca(), and better heap management is actually a reaction to garbage
collection. Garbage collection makes memory management in other
languages a complete non-issue.

I've been using the symbolic computation language, "maple" a fair
bit over the last year. I profiled one of my programs to figure
out where to expend the most effort in speed improvement... and
found that 68% of the execution time was being spent in garbage
collection. I would have had to have developed hefty mathematical
theorems able to operate on the terms in-place (with no pointers,
and with the order of the terms open to change without notice)
to mathematically bypass the need for the garbage collection in
order to have a chance of significantly improving the speed of
my program -- greatly increasing my program complexity (if such
theorems could be found at all) just to work around the slow
garbage collection. Sometime later, one of the developers mentioned
in passing that the speed of the garbage collector is proportional
to the -amount- of memory allocated, not to the number of memory
allocations. I am at a loss for words to describe how glad I am
to have your assurance that "Garbage collection makes memory
management in other languages a complete non-issue."

You are saying that adding enhancements to C is not a good idea,
because adding them to C++ would be better?!?!

I have a copy of the official printed ISO C++ standard. It is
a bear to find anything useful in it, precisely because C++
added so many (mandatory) features of narrow utility that the noise
drowns out the signal. If I -had- mentioned C++ at all (and I
did not), then Yes, it might have been with the notion that it
would be better to add the features to C++ than to C -- better to
let C++ degrade even faster on its own obesity than to inflate C
for little gain.

Sorry, but I find your entire response
completely vacuous.

And I found your response to my response to be full of bad logic
and strawman arguments.

Keith Thompson · Jan 15, 2007

As best I recall, Ada's premise was that all operations would be
precisely defined and implemented on all platforms. It was
very important in the history of Ada that there would be only *one*
legal interpretation of any Ada program. The strong intent was that
an Ada program would have exactly the same semantics on -every-
platform that Ada was available on, and that every Ada program could
be taken and executed (with the same results) without ANY changes
(other than recompilation) on every Ada implementation. And it
was important to Ada that it be usable for robust multiprocessing
and time-sensitive work. Ada was, in short, intended to provide
portable high-level interfaces to hardware. It seems to me that the
lackluster demand for Ada should tell us something about the
market conditions for 'a really first rate "portable assembler"'.

<OT>
Not quite. Ada has the equivalent of C's "undefined behavior" (Ada
calls it "erroneous execution"), though there are fewer instances than
in C. And there are a number of things that are system-specific. For
example, the sizes and ranges of Ada's predefined integer types
(Integer, Long_Integer, etc.) are implementation-defined, much as they
are in C.

Ada aims to make portable code easier to write (for example, you can
easily declare an integer type with a specified range), but it doesn't
make non-portable code impossible, or even particularly difficult.

It also has a number of features designed to interface to low-level
hardware (embedded systems are a major target), but an attempt is made
to keep such features cleanly separated from the higher-level
features.

Ada is no more a "portable assembler" than C is. I'm not sure what
its lackluster demand tells us.
</OT>

Walter Roberson · Jan 15, 2007

My ideas come from looking at other programming languages, and from
looking at real world applications:

Coroutines come from the fact that Lua has them, Python has something
similar but less general (generators) and they are very useful for
web-browsers (yielding on socket blocks to allow a single tasking
application to efficiently download a web page) and chess engines (just
the way the jumble of loops for move generation intertwines with the
alpha-beta algorithm can be drastically simplified with coroutines).

In the web-browser case, what you are essentially doing is asking
to import thread capabilities into C -- possibly only
"cooperative threading" on uniprocessors, but still thread capabilities.
It seems to me that you would need to import noticeably more than
just co-routines: you would need to import socket blocking
control, extend fread() and kin to return states such as
EAGAIN (i.e., no data is waiting), and probably a signal or two
would have to get involved so as to provide notification that
the co-routine is ready to proceed.

There is perhaps room for very lightweight threads in C: the
POSIX threading model seems to require a big library and
understanding a lot of routines. I would have to think more
about how such a thing would require extending C itself, versus
how much of it could essentially be pushed off to a set of library
routines; if it can all be reasonably handled as library routines,
then I'm not certain that it would be a good thing to nail the
functionality into the C standard.

websnarf · Jan 15, 2007

Walter said:
In the web-browser case, what you are essentially doing is asking
to import thread capabilities into C -- possibly only
"cooperative threading" on uniprocessors, but still thread capabilities.
It seems to me that you would need to import noticeably more than
just co-routines: you would need to import socket blocking
control, extend fread() and kin to return states such as
EAGAIN (i.e., no data is waiting), and probably a signal or two
would have to get involved so as to provide notification that
the co-routine is ready to proceed.

That is incorrect. All you need is a probing/peeking function for any
potential blocking read. Everything else is just a matter of program
design. There is a very specific reason why you don't want to add in
full multi-threading. Multithreading is very hard to make totally
portable, and introduces advanced concepts like semaphores, mutexes,
and other critical section solutions. Coroutines are a very special
subcase that doesn't require any of those complications, is extremely
low-overhead, and does not, by itself, lend itself to deadlocking. So
it would introduce even more undefined behavior into C (which would
probably please the standards committee people to no end.)
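
As a rough illustration of the "probing/peeking function" idea: standard C has no such call, so this sketch borrows POSIX poll() as a stand-in. The helper name readable_now is hypothetical, and the whole thing assumes a POSIX system.

```c
#include <poll.h>

/* Hypothetical helper (assumes POSIX poll()):
   returns 1 if a read on fd would not block right now,
   0 if nothing is waiting, -1 on error or invalid descriptor. */
int readable_now(int fd)
{
    struct pollfd p;
    int rc;

    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;

    rc = poll(&p, 1, 0);            /* timeout 0: return immediately */
    if (rc < 0 || (p.revents & POLLNVAL))
        return -1;
    /* POLLHUP and POLLERR also mean a read() would return at once. */
    return (rc > 0 && (p.revents & (POLLIN | POLLHUP | POLLERR))) ? 1 : 0;
}
```

A cooperative task would call something like this before each read and yield to another task when it returns 0.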

It turns out that there are many server applications that are most
appropriately solved by just coroutines. But the added value is that
coroutines are also more useful than for the simplest of multitasking
problems. They allow you to synch up two complicated loops while
keeping each loop as simple as possible.
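
C has no coroutines, but the effect of suspending in the middle of a nested loop can be approximated today with a switch-based resumable function. This is only a sketch of that well-known trick, not the language feature being proposed; the names pair_gen, pair_gen_init and pair_gen_next are hypothetical.

```c
/* Hypothetical resumable "pair generator". All loop state lives in the
   struct so the nested loops can be suspended and resumed. */
struct pair_gen {
    int state;   /* 0 = not started, 1 = suspended inside the loops, 2 = done */
    int n;
    int i, j;
};

void pair_gen_init(struct pair_gen *g, int n)
{
    g->state = 0;
    g->n = n;
    g->i = 0;
    g->j = 0;
}

/* Produce the next pair (i, j) with 0 <= i < j < n.
   Returns 1 and fills *a and *b if a pair was produced, 0 when exhausted. */
int pair_gen_next(struct pair_gen *g, int *a, int *b)
{
    switch (g->state) {
    case 0:
        for (g->i = 0; g->i < g->n; g->i++) {
            for (g->j = g->i + 1; g->j < g->n; g->j++) {
                *a = g->i;
                *b = g->j;
                g->state = 1;
                return 1;           /* "yield" to the caller */
    case 1:;                        /* the next call resumes here */
            }
        }
        g->state = 2;
        /* fall through */
    default:
        return 0;
    }
}
```

The consumer keeps its own loop simple: it calls pair_gen_next() until it returns 0, and the producer's nested loops read almost as if they had never been interrupted. A real coroutine facility would remove the need to hoist the loop variables into a struct by hand.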

There is perhaps room for very lightweight threads in C: the
POSIX threading model seems to require a big library and
understanding a lot of routines. I would have to think more
about how such a thing would require extending C itself, versus
how much of it could essentially be pushed off to a set of library
routines; if it can all be reasonably handled as library routines,
then I'm not certain that it would be a good thing to nail the
functionality into the C standard.

Well, Microsoft has their own threading system with, as far as I
understand it, far more complicated synchronization objects. Another
useful standard is MPI. Of course, none of these standards are
universally implemented.

websnarf · Jan 16, 2007

Keith said:
You can, of course, come up with a single example of such an
"advantage" (that applies to the 10-20 year time frame Jacob was
talking about)?

I don't know about a 10-20 year time frame, but consider this. If a
program is going to scan a string anyway, there's not much benefit in
storing its length separately. In a recent discussion here, somebody
posted an example of such a program (a fairly small one). jacob
claimed that a solution using memcpy() (which requires knowing the
length in advance) was faster than an equivalent solution using
strcpy() (which doesn't) -- but he only provided actual numbers for an
x86 platform. I demonstrated that the strcpy() solution is actually
faster on some other platforms. [...]

Well, those platforms would definitely be looking *backwards* in time.
So indeed the 10-20 year time frame qualification *does matter*. But
more to the point *EVERY* architecture created from this point forward
will prefer length prefixed string copying (that is because a parallel
dependency is always better than a serial one -- it's easier to add ALUs
than increase the clock rate). If the C standard has no interest in
the future and is only concerned with architectures from antiquity,
then fine. But don't complain when C gets branded with the COBOL label.

Now if you're doing a lot of processing that *does* require knowing
the length in advance, then yes, counted strings are advantageous.

Whether it requires it or not, having the length will *speed it up* or
be neutral for all scenarios, on all modern platforms, and make your
code safer, and make it easier to write and maintain.
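
A minimal sketch of what "having the length" buys in a copy: the size check happens before any byte is written, and the copy itself is a single memcpy(). The type counted_str and the function counted_copy are hypothetical illustrations, not Bstrlib's actual API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view; not Bstrlib's actual type. */
struct counted_str {
    const char *data;
    size_t len;       /* number of characters, excluding any terminator */
};

/* Copy src into dst (capacity cap bytes) and NUL-terminate the result.
   Returns 0 on success, -1 if it does not fit (dst is left untouched). */
int counted_copy(char *dst, size_t cap, struct counted_str src)
{
    if (src.len >= cap)
        return -1;                      /* checked before writing anything */
    memcpy(dst, src.data, src.len);     /* length known: no scan for '\0' */
    dst[src.len] = '\0';
    return 0;
}
```

With a terminated source the same check needs a strlen() pass first, or a copy that tests for '\0' and for the capacity limit on every character.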

But if you don't happen to need it, then computing and storing it is
useless overhead.

If you dive into Bstrlib, you will find that often the additional
overhead can exist primarily in your auto-space, not necessarily in
your heap space (depending on your algorithm, or what exactly you are
doing.) In general, where it matters, this overhead can usually be
amortized using various packing methods (Bstrlib comes with good CSV
parsing code and netstrings if you really want to pack and serialize
many strings at once) or by treating the string data as if it were a
file (Bstrlib comes with something called bstreams which does exactly
this) which again only costs auto-space.

[...] I'm not arguing that C-style zero-terminated
strings are superior to counted strings, merely that there is a
tradeoff.

Still waiting for the example.

[...] I don't know which is better in general. jacob thinks he
does know, and that zero-terminated strings are inherently a bug in
the language.

Well, I only know it from direct comparison and fairly extensive
analysis of the situation. '\0' terminated strings *are* more error
prone; there is just no comparison. It's a white-hot flash point of
MAXIMIZED manifestations of buffer overflows (which TR 24731 doesn't
usefully address, BTW). Compare that with Bstrlib where it's nearly
impossible to cause any kind of UB due to a buffer overflow scenario
unless you are directly and unnecessarily hacking on it, or have
corrupted the data externally. (Other solutions such as Vstr are
basically about as good on this point.)

And in terms of performance comparison, you can put the two side by
side on any general task -- bstrings never give up the possibility of
falling back onto the Clib, so it cannot lose. However, it never needs
to do this as all the portable hand coded algorithms are equal or
faster than pretty much all the Clibs out there on a wide variety of
string kernels.

In particular, look at sub-string searching. That's an algorithm which,
intuitively, should really be equal for both styles, since you have to
do character by character stuff no matter what. But it turns out that
good algorithms try to *unroll* the inner loop so that you can examine
two characters back to back without an intervening loop check. In C
you have to put in an extra test for an intermediate '\0' check (see
GCC's Clib source for strstr() for an example of this). With Bstrlib
you only do one test to see if you have at least two characters more
that you can scan. It's little things like this that just show up all
over the place.
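
The unrolling point can be shown on a simpler kernel than sub-string search. The sketch below scans for a single character two positions at a time; it is neither GCC's strstr() nor Bstrlib's code, and the names find_counted and find_terminated are hypothetical.

```c
#include <stddef.h>

/* Counted buffer: one bounds test covers two characters. */
const char *find_counted(const char *s, size_t len, char c)
{
    size_t i = 0;

    while (len - i >= 2) {              /* a single test per pair */
        if (s[i] == c)
            return s + i;
        if (s[i + 1] == c)
            return s + i + 1;
        i += 2;
    }
    if (i < len && s[i] == c)           /* odd trailing character */
        return s + i;
    return NULL;
}

/* Terminated string: even unrolled, every character needs its own
   '\0' test, because the end can fall between the two positions. */
const char *find_terminated(const char *s, char c)
{
    for (;;) {
        if (s[0] == c)
            return s;
        if (s[0] == '\0')
            return NULL;
        if (s[1] == c)
            return s + 1;
        if (s[1] == '\0')
            return NULL;
        s += 2;
    }
}
```

Per pair of characters, the counted version does three comparisons and the terminated version does four. Whether that difference is measurable on a given platform is exactly what the thread is arguing about.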

And that doesn't even bring up the fiasco that is strcat(), where C
actually manages to lose to pathetically slow languages like TCL and
Python.
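
The strcat() complaint is about rescanning: each call walks the destination from the start to find its end, so building a string from many pieces costs time quadratic in the final length. A minimal sketch of the alternative, remembering the length between appends, follows. The sbuf type and its functions are hypothetical, and the fixed 256-byte capacity is an arbitrary choice for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer that remembers its own length. */
struct sbuf {
    char data[256];
    size_t len;         /* characters stored, excluding the terminator */
};

void sbuf_init(struct sbuf *b)
{
    b->len = 0;
    b->data[0] = '\0';
}

/* Append s. The cost depends only on strlen(s), not on what is already
   stored. Returns 0 on success, -1 if s does not fit (buffer unchanged). */
int sbuf_append(struct sbuf *b, const char *s)
{
    size_t n = strlen(s);

    if (n >= sizeof b->data - b->len)
        return -1;
    memcpy(b->data + b->len, s, n + 1);   /* n + 1 copies the terminator too */
    b->len += n;
    return 0;
}
```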

jacob navia · Jan 17, 2007

(e-mail address removed) wrote:

In particular, look at sub-string searching. That's an algorithm which,
intuitively, should really be equal for both styles, since you have to
do character by character stuff no matter what. But it turns out that
good algorithms try to *unroll* the inner loop so that you can examine
two characters back to back without an intervening loop check. In C
you have to put in an extra test for an intermediate '\0' check (see
GCC's Clib source for strstr() for an example of this). With Bstrlib
you only do one test to see if you have at least two characters more
that you can scan. It's little things like this that just show up all
over the place.

And that doesn't even bring up the fiasco that is strcat(), where C
actually manages to lose to pathetically slow languages like TCL and
Python.

I have repeated this over and over. For instance strrchr is vastly more
efficient when it can start at the end of the string and find the first
occurrence of the searched for character *backwards* instead of
searching the whole string to find the last one!!!
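
A minimal sketch of the backwards search described here, assuming the caller already knows the length. The name rfind_counted is hypothetical; it is not a standard function.

```c
#include <stddef.h>

/* Hypothetical counted-string analogue of strrchr(): scans from the end
   and stops at the first match, which is the last occurrence. */
const char *rfind_counted(const char *s, size_t len, char c)
{
    while (len > 0) {
        len--;
        if (s[len] == c)
            return s + len;
    }
    return NULL;
}
```

When the character sits near the end this touches only a few bytes, while strrchr() on a terminated string has to walk to the '\0' regardless. When the character is absent, both approaches read the whole string.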

Eric Sosman · Jan 17, 2007

jacob said:
I have repeated this over and over. [...]

Play it again, Sam. This time with more cowbell.

jaysome · Jan 18, 2007

(e-mail address removed) wrote:

I have repeated this over and over. For instance strrchr is vastly more
efficient when it can start at the end of the string and find the first
occurrence of the searched for character *backwards* instead of
searching the whole string to find the last one!!!

Given the mean and variance of the length of a string, I find it hard
to believe that it would be "vastly more efficient". If you're talking
about a Mega-byte-length string, then yes. But most strings are no more
than 4 or 16 or 64 or even hundreds of bytes in length. And in these
cases, strrchr(), as defined by the standard, should suffice.

Certainly it might take longer for strrchr() to operate on longer
strings compared to shorter strings, on the average, but if that's the
least of your worries, then you have bigger fish to fry.

#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
    clock_t t1;
    clock_t t2;
    volatile char *p;

    t1 = clock();
    p = strrchr("Hello", 'H');
    t2 = clock();
    printf("p is %p,\n", (void*)p);
    printf("and that took %.12f seconds\n",
           (double)(t2 - t1) / CLOCKS_PER_SEC);

    t1 = clock();
    p = strrchr("Hello World!", 'H');
    t2 = clock();
    printf("p is %p,\n", (void*)p);
    printf("and that took %.12f seconds\n",
           (double)(t2 - t1) / CLOCKS_PER_SEC);

    t1 = clock();
    p = strrchr
    (
        "Hello the very, very, "
        "quite contrary, benevolent "
        "and sometimes forgiving, but also, "
        "at the same time, very "
        "unforgiving, World!",
        'H'
    );
    t2 = clock();
    printf("p is %p,\n", (void*)p);
    printf("and that took %.12f seconds\n",
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}

Output:

p is 0042603C,
and that took 0.000000000000 seconds
p is 0042602C,
and that took 0.000000000000 seconds
p is 00426FA4,
and that took 0.000000000000 seconds
Press any key to continue
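
All three timings print as zero, which most likely reflects the granularity of clock() rather than a measured cost: a single strrchr() call on a short string finishes well inside one clock tick. A common workaround is to time many repetitions and divide. The sketch below does that; the helper name time_strrchr is hypothetical, and the figures it returns depend entirely on the platform and compiler.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: average seconds per strrchr() call over reps calls.
   Returns a negative value if reps is not positive or clock() fails. */
double time_strrchr(const char *s, int c, long reps)
{
    /* Volatile accesses discourage the compiler from hoisting the call
       out of the loop or dropping it altogether. */
    const char *volatile src = s;
    char *volatile sink = NULL;
    clock_t t1, t2;
    long i;

    if (reps <= 0)
        return -1.0;

    t1 = clock();
    for (i = 0; i < reps; i++)
        sink = strrchr(src, c);
    t2 = clock();
    (void)sink;

    if (t1 == (clock_t)-1 || t2 == (clock_t)-1)
        return -1.0;
    return (double)(t2 - t1) / CLOCKS_PER_SEC / (double)reps;
}
```

Calling it with a few million repetitions for each of the three strings gives numbers that can actually be compared.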

Regards

jacob navia · Jan 18, 2007

jaysome wrote:

Given the mean and variance of the length of a string, I find it hard
to believe that it would be "vastly more efficient". If you're talking
about a Mega-byte-length string, then yes. But most strings are no more
than 4 or 16 or 64 or even hundreds of bytes in length. And in these
cases, strrchr(), as defined by the standard, should suffice.

Certainly it might take longer for strrchr() to operate on longer
strings compared to shorter strings, on the average, but if that's the
least of your worries, then you have bigger fish to fry.

OK. Your arguments are very convincing, being voiced by all people that
support the c strings:

WE DO NOT CARE ABOUT OPTIMIZATION OR GOOD ALGORITHMS.
Machines are fast these days. Yes. Bad constructs can go on
forever without anyone noticing it.

Richard Heathfield · Jan 18, 2007

jacob navia said:

OK. Your arguments are very convincing, being voiced by all people that
support the c strings:

WE DO NOT CARE ABOUT OPTIMIZATION OR GOOD ALGORITHMS.

Yes, we do.

Would you mind dropping the shouting and the sarcasm and the knee-jerk
responses and the thoughtlessness and the "anyone who disagrees with me
must be an idiot" thing?

We'd get on a lot better with you if you just *tried* a little, you know.

Richard Bos · Jan 18, 2007

jacob navia said:
jaysome wrote:

OK. Your arguments are very convincing, being voiced by all people that
support the c strings:

WE DO NOT CARE ABOUT OPTIMIZATION OR GOOD ALGORITHMS.

Wrong. Not only do we not shout, because we are not petulant children
whose favourite toy is being criticised; but also, we _do_ care about
good programming constructs. That is why, being well aware of the use of
strings in the average program, we know that counted strings are _less_
efficient under most circumstances than terminated strings, bumf and
blather notwithstanding.

Richard

Kenny McCormack · Jan 28, 2007

Richard Bos said:
good programming constructs. That is why, being well aware of the use of
strings in the average program, we know that counted strings are _less_
efficient under most circumstances than terminated strings, bumf and

Simply not true. As Jacob notes, simple dishonesty on your (and your
brethren's) part. Everybody knows that the only reason we stick with
terminated strings is because of history.

Note: I fully understand why you are lying and I'll even say that it
(doing so) is a necessary evil. But, it is a lie nonetheless.

Sorta like those WMDs... (another necessary lie)

Kenny McCormack · Jan 28, 2007

Richard Heathfield said:

Yes, we do.

Would you mind dropping the shouting and the sarcasm and the knee-jerk
responses and the thoughtlessness and the "anyone who disagrees with me
must be an idiot" thing?

Um, Pot, Kettle, Black.

I.e., he (Jacob) learned from the best. He was not at all abusive until
quite a while after you and your ilk had been pouring it on him.

Richard Heathfield · Jan 28, 2007

Kenny McCormack said:

Um, Pot, Kettle, Black.

Not so.

I.e., he (Jacob) learned from the best. He was not at all abusive until
quite a while after you and your ilk had been pouring it on him.

Again, not so.

santosh · Jan 28, 2007

Kenny said:
Simply not true. As Jacob notes, simple dishonesty on your (and your
brethren's) part. Everybody knows that the only reason we stick with
terminated strings is because of history.

Note: I fully understand why you are lying and I'll even say that it
(doing so) is a necessary evil. But, it is a lie nonetheless.

If so, then why're you doing the disservice of exposing it?

Kenny McCormack · Jan 28, 2007

If so, then why're you doing the disservice of exposing it?

Because that is what I do.

I would, in fact, wager that most "muckrakers" - people who tell the
truth about government and other corrupt entities - actually know why
the lies are being told, but they choose to go ahead and tell the truth
(often at great personal peril) anyway, just because that is what they do.

Keith Thompson · Jan 28, 2007

santosh said:
Kenny McCormack wrote:

[more of the same]

If so, then why're you doing the disservice of exposing it?

KM is a troll. I strongly recommend ignoring him, and killfiling him
if you're so inclined.

Kenny McCormack · Jan 28, 2007

Keith Thompson said:

santosh said:

Kenny McCormack wrote:

[more of the same]

If so, then why're you doing the disservice of exposing it?

KM is a troll. I strongly recommend ignoring him, and killfiling him
if you're so inclined.

KT is a moron. I strongly recommend ignoring him, and killfiling him
if you're so inclined.

Richard Heathfield · Jan 29, 2007

santosh said:

If so, then why're you doing the disservice of exposing it?

If it is a lie, then it's a big one, and it should certainly be exposed. But
of course it's not a lie. Richard Bos may be many things, but he is no
liar. The reason we stick with terminated strings is... is... well, there
isn't one, because we *don't* all stick with them! At least, not all of us
do so all the time.

The C language provides support for a rudimentary string model, which makes
no great claims to be anything special, but which basically works. If that
is good enough for you, fine, use it - and it /is/ good enough for many
people, so they use it. But for many /other/ people, it isn't good enough,
because their needs (or desires, or perceptions) are different. So C makes
it fairly easy to develop your own string model in C.

I've done this myself, but nevertheless I often find myself writing C
programs using the in-built string model. Why? Well, because it's simple
and quick to cut code that way. There are reasons to use more powerful
models, of course, but those reasons don't *always* apply. When they don't
apply, the good old-fashioned C string is perfectly adequate to the task
and is generally a bit quicker from the developer's (typist's!) point of
view. Someone who spends a lot of time writing programs that don't have to
deal with (the possibility of) insanely long inputs may well find that C
strings are more efficient than so-called "counted" (or "stretchy")
strings.
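
As a rough idea of how little code a "stretchy" string needs, here is a minimal sketch of a growable, length-tracking string. It is not the author's library; the names stretchy, stretchy_append and stretchy_free are hypothetical, and overflow of the capacity arithmetic for enormous inputs is deliberately not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; start from { NULL, 0, 0 }. */
struct stretchy {
    char *data;      /* NUL-terminated whenever it is non-NULL */
    size_t len;      /* characters stored, excluding the terminator */
    size_t cap;      /* bytes currently allocated */
};

/* Append s, growing the allocation as needed.
   Returns 0 on success, -1 on allocation failure (string unchanged). */
int stretchy_append(struct stretchy *st, const char *s)
{
    size_t n = strlen(s);
    size_t need = st->len + n + 1;

    if (need > st->cap) {
        size_t newcap = st->cap ? st->cap : 16;
        char *p;

        while (newcap < need)
            newcap *= 2;                /* doubling keeps appends amortized */
        p = realloc(st->data, newcap);  /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;
        st->data = p;
        st->cap = newcap;
    }
    memcpy(st->data + st->len, s, n + 1);
    st->len += n;
    return 0;
}

void stretchy_free(struct stretchy *st)
{
    free(st->data);
    st->data = NULL;
    st->len = 0;
    st->cap = 0;
}
```

For a program that only ever handles short, bounded inputs, a plain char array does the same job with less typing, which is the trade-off described above.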

Just bear in mind when replying to Kenny McCormack that his articles give
every indication that he's not interested in C, not interested in helping
people, not interested in correctness, not interested in truth - he's only
interested in trying to poke fun at those who /are/ interested in C,
helping people, correctness, and truth. Don't expect reasoning, and don't
expect a shared objective. He's just trying to wreck the group. But we
don't have to let him.
