Buffer Overruns and Other C Gotchas -- "Coders at Work"

  • Thread starter Casey Hawthorne

Casey Hawthorne

I thought of this question, of buffer overruns, after one of the
people interviewed for the book "Coders at Work" said that C was great
for systems programming by well trained programmers, but that C had
leaked out into the applications area.

For systems programming you do need the access to the machine that C
provides, but for applications programming, you don't need/shouldn't
have such access.
 

Tom St Denis

I thought of this question, of buffer overruns, after one of the
people interviewed for the book "Coders at Work" said that C was great
for systems programming by well trained programmers, but that C had
leaked out into the applications area.

For systems programming you do need the access to the machine that C
provides, but for applications programming, you don't need/shouldn't
have such access.

Is there a question here?

Tom
 

Kenny McCormack

Is there a question here?

Tom

Does there have to be? Look at Seebs's postings (specifically, the ones
directed at me): Are there any questions there?

P.S. Now there are questions in this thread (count the ? marks).
Are you happy?

P.P.S. Look at any of the postings by RH - no questions there. Ever.
Kiki? Rarely, but sometimes.
 

jacob navia

Casey Hawthorne wrote:
I thought of this question, of buffer overruns, after one of the
people interviewed for the book "Coders at Work" said that C was great
for systems programming by well trained programmers, but that C had
leaked out into the applications area.

For systems programming you do need the access to the machine that C
provides, but for applications programming, you don't need/shouldn't
have such access.

The deeper problem is that the C users community doesn't even want to acknowledge this problem.

A buffer overrun is *specified* in the code of the C standard itself. The many discussions in this group, or in the similar group comp.lang.c, have led to nothing. Endless discussions about trivia, but an enormous BUG specified in the C standard (the asctime() function) will be preserved as if it were the best thing to do.

The code of the asctime() function is written in the C standard as follows:

char *asctime(const struct tm *timeptr)
{
    static const char wday_name[7][3] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][3] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    static char result[26];

    sprintf(result, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
            wday_name[timeptr->tm_wday],
            mon_name[timeptr->tm_mon],
            timeptr->tm_mday, timeptr->tm_hour,
            timeptr->tm_min, timeptr->tm_sec,
            1900 + timeptr->tm_year);
    return result;
}

This code will provoke a buffer overflow if the year is, for instance, bigger than 8099.
Nowhere in the standard are the ranges for the year specified.

In a “Defect Report” filed in 2001, Clive Feather proposed to fix this bug.

The answer of the committee was:

"...asctime() may exhibit undefined behavior... [ snip] .

As always, the range of undefined behavior permitted includes:
Corrupting memory, ... [snip]"

Apparently this attitude towards the C language is shared by everyone on the committee, since after dozens of discussions like this one the function (and its code) is still there.

Is it because most people have decided that C should be killed and C++ should be the language of
choice?

Probably, I can't tell.

The same goes for any evolution of the language. The proposed new C standard, to be released sometime in 2019 or later, is a textual copy of the C99 one, including (of course) functions like gets() and
asctime(). The only "concession" of the committee has been to add a footnote saying that gets() is deprecated.

A footnote.

Buffer overflows are worth no more than a footnote.

jacob
 

Ian Collins

Casey said:
I thought of this question, of buffer overruns, after one of the
people interviewed for the book "Coders at Work" said that C was great
for systems programming by well trained programmers, but that C had
leaked out into the applications area.

For systems programming you do need the access to the machine that C
provides, but for applications programming, you don't need/shouldn't
have such access.

Which is why we have operating systems...
 

Kenny McCormack

Casey Hawthorne wrote:

The deeper problem is that the C users community doesn't even want to acknowledge this problem.

A buffer overrun is *specified* in the code of the C standard itself. The many discussions in this group, or in the similar group comp.lang.c, have led to nothing. Endless discussions about trivia, but an enormous BUG specified in the C standard (the asctime() function) will be preserved as if it were the best thing to do.

C as a language *is* dead. (for Seebs) That's not to say that there
aren't still systems out there that use it and programmers who earn
their keep programming it, nor that there won't be for decades to come.

But the future is obviously in safer languages, given:
a) The vast improvements in technology (i.e., we can now *afford*
safe languages)
b) The vast reduction in the quality of the American educational
system.
The code of the asctime() function is written in the C standard as follows:

I like that you always bring up the asctime() function. Very good
example of the state of things.
 

Tom St Denis




Not really, even though he uses the word "question". But there's a
discussion here, certainly. Usenet is not *just* about answering
questions. And it's an interesting and thought-provoking point,
wouldn't you say?

What? That C spilling into userspace applications was a mistake?

You youngins....

Back in the day we wrote applications on bare metal. Heck I'm 27 and
I grew up on writing DOS applications that had direct control over the
VGA, Sound card [or PC speaker whichever] and other devices. If I had
to write everything in say QBASIC or something detached from
successful bit twiddling I'd probably shoot people.

Tom
 

Tom St Denis

Does there have to be?  Look at Seebs's postings (specifically, the ones
directed at me): Are there any questions there?

Well he writes "I thought of this question" and nowhere in his post is
a question.

"It's like I was thinking of a C specification issue, then my dog came
over to my lap and I gave it a treat."

Does that make as much sense?

Also the question of whether C is useful for userland applications is
a fairly stupid one. Of course it is. Anyone who has written an
application or two will find the same things that make C successful in
Kernel space applications are useful in userspace.

Tom
 

Tom St Denis

jacob navia said:
I thought of this question, of buffer overruns, after one of the
people interviewed for the book "Coders at Work" said that C was great
for systems programming by well trained programmers, but that C had
leaked out into the applications area.
For systems programming you do need the access to the machine that C
provides, but for applications programming, you don't need/shouldn't
have such access.
The deeper problem is that the C users community doesn't even want to acknowledge this problem.
A buffer overrun is *specified* in the code of the C standard itself. The many discussions in this group, or in the similar group comp.lang.c, have led to nothing. Endless discussions about trivia, but an enormous BUG specified in the C standard (the asctime() function) will be preserved as if it were the best thing to do.

C as a language *is* dead.  (for Seebs) That's not to say that there
aren't still systems out there that use it and programmers who earn
their keep programming it, nor that there won't be for decades to come.

But the future is obviously in safer languages, given:
    a) The vast improvements in technology (i.e., we can now *afford*
        safe languages)
    b) The vast reduction in the quality of the American educational
        system.
The code of the asctime() function is written in the C standard as follows:

I like that you always bring up the asctime() function.  Very good
example of the state of things.

Really?

tstdenis@photon:~$ cat test.c
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct tm t;
    char *p;
    memset(&t, 0, sizeof t);
    t.tm_year = 10000;
    p = asctime(&t);
    printf("%p %s\n", p, p);
    return 0;
}

produces:

tstdenis@photon:~$ ./t
0x7fcb834ec540 Sun Jan 0 00:00:00 11900

So not only does it print out a result (by adding 1900 to it) but it
didn't crash.

Maybe the C spec gives an EXAMPLE routine that is meant to be used in
a certain scenario, but it definitely seems like GNU libc is sane.

In short, your opinions are ill founded and totally without merit.

Tom
 

jacob navia

Tom St Denis wrote:
Maybe the C spec gives an EXAMPLE routine that is meant to be used in
a certain scenario, but it definitely seems like GNU libc is sane.

Lcc-win doesn't crash either.

I did NOT say that "all implementations of asctime will crash".

What I said is that the code in the text of the C standard
will crash. And I quoted that code. FOR A REASON.

But you are unable to understand what the other person says.

Your conclusion is fixed even before you READ what the other guy
is saying:
In short, your opinions are ill founded and totally without merit.

It is not *my opinion* since if you use the code of the C standard
the buffer that contains 26 positions will NOT support writing
a number with 5 digits!

In short:

You do not read the posts you answer to.
 

Tom St Denis

Tom St Denis wrote:


Lcc-win doesn't crash either.

I did NOT say that "all implementations of asctime will crash"

What I said is that the code in the text of the C standard
will crash. And I quoted that code. FOR A REASON.

I don't get what the reason is though. If they claim the function
only behaves in a certain range of values how is it ANY DIFFERENT FROM
SAY

char buf[4];
memcpy(buf, "sdklsdhfjkshdfjksdhfjksd", 10);

???

Are you now saying that memcpy is unsafe?

What about

snprintf(buf, 200, "waytooomuchtext");

???

So you're calling asctime with invalid parameters. Maybe your app
should sanity check its inputs?

Tom
 

jacob navia

Tom St Denis wrote:
I don't get what the reason is though. If they claim the function
only behaves in a certain range of values how is it ANY DIFFERENT FROM
SAY

char buf[4];
memcpy(buf, "sdklsdhfjkshdfjksdhfjksd", 10);

???

Are you now saying that memcpy is unsafe?

What about

snprintf(buf, 200, "waytooomuchtext");

???

So you're calling asctime with invalid parameters. Maybe your app
should sanity check its inputs?

Tom

You still do not read what I said. I said that nowhere in the C standard are the ranges
for the year specified! And that Mr Cleaver in 2001 presented a defect report
precisely because of this. It is all in my post that you refuse to READ!

Your attitude is a further proof of what I said in the first sentence of my post:

"The deeper problem is that the C users community doesn't even want to acknowledge this problem."

CAN YOU READ?

Then READ my post before answering.

Thanks
 

Tom St Denis

Tom St Denis wrote:


I don't get what the reason is though.  If they claim the function
only behaves in a certain range of values how is it ANY DIFFERENT FROM
SAY
char buf[4];
memcpy(buf, "sdklsdhfjkshdfjksdhfjksd", 10);

Are you now saying that memcpy is unsafe?
What about
snprintf(buf, 200, "waytooomuchtext");

So you're calling asctime with invalid parameters.  Maybe your app
should sanity check its inputs?

You still do not read what I said. I said that nowhere in the C standard are the ranges
for the year specified! And that Mr Cleaver in 2001 presented a defect report
precisely because of this. It is all in my post that you refuse to READ!

Your attitude is a further proof of what I said in the first sentence of my post:

"The deeper problem is that the C users community doesn't even want to acknowledge this problem."

CAN YOU READ?

Then READ my post before answering.

Thanks

So this is a flaw in C because? Nowhere have I read that asctime must
be written that way, and in fact glibc doesn't have such a defect.

Maybe I'm missing the part where you had a point.

Tom
 

Tom St Denis

Tom St Denis wrote:
I don't get what the reason is though.  If they claim the function
only behaves in a certain range of values how is it ANY DIFFERENT FROM
SAY
char buf[4];
memcpy(buf, "sdklsdhfjkshdfjksdhfjksd", 10);
???
Are you now saying that memcpy is unsafe?
What about
snprintf(buf, 200, "waytooomuchtext");
???
So you're calling asctime with invalid parameters.  Maybe your app
should sanity check its inputs?
Tom
You still do not read what I said. I said that nowhere in the C standard are the ranges
for the year specified! And that Mr Cleaver in 2001 presented a defect report
precisely because of this. It is all in my post that you refuse to READ!
Your attitude is a further proof of what I said in the first sentence of my post:
"The deeper problem is that the C users community doesn't even want to acknowledge this problem."
CAN YOU READ?
Then READ my post before answering.

So this is a flaw in C because?  Nowhere have I read that asctime must
be written that way, and in fact glibc doesn't have such a defect.

Maybe I'm missing the part where you had a point.

Tom

And in fact the ISO C99 spec says "... using the equivalent of this
function."

Nowhere does it state that you should use that function nor is that
the official version of the function to use. Only that for valid
inputs that's what the behaviour should be.

And really it should be fixed, but it's a really petty thing to base a
"C is useless" argument on.

Tom
 

jacob navia

Tom St Denis wrote:
So this is a flaw in C because? Nowhere have I read that asctime must
be written that way, and in fact glibc doesn't have such a defect.

Maybe I'm missing the part where you had a point.

Tom

So you think it is a good thing to have code that overflows its buffer in the
C standard itself???

As an "EXAMPLE" ???

I am discussing the flaw in the C standard as an example of C code that provokes
buffer overflows. And as an example of the refusal of many people in the C
community to acknowledge that buffer overflows are a serious thing.

You are proving that this attitude towards buffer overflows is widespread.

Thanks for your help.

jacob
 

Tom St Denis

So you think it is a good thing to have code that overflows its buffer in the
C standard itself???

I didn't say that's a good thing. I didn't say I'd write code like
that. I didn't say it shouldn't be fixed.

What I ***DID*** say is that I don't see what this has to do with the
price of tea in China, let alone whether C is a good language for
userspace application development.

I can improperly use any programming language. I don't get what the
point is though. Sure they should fix it. But it doesn't mean I'll
throw away the years of training and experience writing C applications
because some twat writing snippets for a C spec goofed on something.

I don't get you people, form another group if all you want to do is
talk about what's NOT in the C language spec.

I mean, are you a happier, better person for having trolled clc? Does
it fulfill your life? Are you more complete now? I don't get your
motivations. Maybe that's what bugs me the most about trolls. I've
personally come to the realization that my life is finite and I best
use it the way I think I'll enjoy most. Presumably you have similar
goals. So do you enjoy trolling usenet? Is that what makes your life
all that you want it to be?

....

Tom
 

Keith Thompson

Tom St Denis said:
Really?

tstdenis@photon:~$ cat test.c
#include <time.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    struct tm t;
    char *p;
    memset(&t, 0, sizeof t);
    t.tm_year = 10000;
    p = asctime(&t);
    printf("%p %s\n", p, p);
    return 0;
}

produces:

tstdenis@photon:~$ ./t
0x7fcb834ec540 Sun Jan 0 00:00:00 11900

So not only does it print out a result (by adding 1900 to it) but it
didn't crash.

Maybe the C spec gives an EXAMPLE routine that is meant to be used in
a certain scenario, but it definitely seems like GNU libc is sane.

Your program's behavior is undefined. On an implementation that
uses the code provided in the standard, asctime will write past
the end of the static "result" string, with unpredictable results.
It happens to behave sanely in your case, just as "i = i++;" might
happen to behave sanely.

And that code is not just an example routine; it's the definition of
the algorithm:

The asctime function converts the broken-down time in the
structure pointed to by timeptr into a string in the form

Sun Sep 16 01:03:52 1973\n\0

using the equivalent of the following algorithm.

[code snipped]

There is an escape clause for implementers here. An algorithm can
be considered *equivalent* to the one provided if it produces the
same result and behavior in all cases where the behavior of the
provided implementation is defined. In cases where the provided
implementation's behavior is not defined, the implementation can do
anything it likes, including using a bigger buffer. And the glibc
implementation does this; it uses a 114-character buffer rather than
a 26-character buffer, avoiding overflow for any possible arguments.
But it's not *required* to do so.

On another system (Solaris 9), your program dies with a segmentation
fault. This behavior is admittedly unfriendly, but it does not
make the implementation non-conforming. (Note that it does so
whether I compile with Sun's compiler or with gcc; in either case,
it uses Sun's C library, not glibc.)
In short, your opinions are ill founded and totally without merit.

I agree that jacob's opinions on this matter are ill founded, though
they're not totally without merit.

The behavior of most calls to asctime() is perfectly well defined
(though it's not a behavior I personally find useful; I dislike
the date format it imposes, the addition of a '\n' character, and
the use of a static buffer).

It's possible to invoke asctime() with an argument that makes its
behavior undefined, as you've unintentionally demonstrated. The same
is true of most functions in the C standard library:

char s[5];
strcpy(s, "hello, world");

And yet I don't see jacob claiming that strcpy() itself is "an
enormous BUG specified in the C standard".

The difference, I suppose, is that the circumstances in which
asctime()'s behavior is undefined are a bit more difficult to define
(for example, you can get away with more than 2 digits for tm_sec if
you use fewer than 4 digits for tm_year). On the other hand, those
circumstances are defined, clearly and unambiguously, by the code in
the standard. The definition of asctime() is probably the closest
thing the standard has to a formal specification. The behavior
is rigorously defined in certain cases (all the normal ones), and
clearly undefined in others. Don't call it with unusual values,
and you can depend on it to work as specified.
 

Tom St Denis

Your program's behavior  is  undefined.  On an implementation that
uses the code provided in the standard, asctime will write past
the end of the static "result" string, with unpredictable results.
It happens to behave sanely in your case, just as "i = i++;" might
happen to behave sanely.

My point more so was that as an instantiation of an environment in
which C may be used it's not always so fubared or dramatic. Ideally,
if the spec says the output can only be 26 chars the Sun platform
should return an appropriate error condition instead of crashing. And
that's THEIR fault for so blindly copying it.
There is an escape clause for implementers here.  An algorithm can
be considered *equivalent* to the one provided if it produces the
same result and behavior in all cases where the behavior of the
provided implementation is defined.  In cases where the provided
implementation's behavior is not defined, the implementation can do
anything it likes, including using a bigger buffer.  And the glibc
implementation does this; it uses a 114-character buffer rather than
a 26-character buffer, avoiding overflow for any possible arguments.
But it's not *required* to do so.

I just read the C99 spec for asctime. Nowhere does it say the buffer
can only be 26 bytes. It describes what the output format must look
like, but never mentions the length. The C code happens to mention
the length in passing but I really consider the C code an example [a
poor one] that produces the desired output format.
It's possible to invoke asctime() with an argument that makes its
behavior undefined, as you've unintentionally demonstrated.  The same
is true of most functions in the C standard library:

I haven't read anywhere that says you can't have a year of 11900. I
also don't consider the C code in the spec to define how the algorithm
that produces the output must be written. To me the definition of
asctime() is

---
The asctime function converts the broken-down time in the structure
pointed to by timeptr into a string in the form
Sun Sep 16 01:03:52 1973\n\0
---

The broken C code serves to explain the different textual elements of
the output.
    char s[5];
    strcpy(s, "hello, world");

And yet I don't see jacob claiming that strcpy() itself is "an
enormous BUG specified in the C standard".
Exactly.

The difference, I suppose, is that the circumstances in which
asctime()'s behavior is undefined are a bit more difficult to define
(for example, you can get away with more than 2 digits for tm_sec if
you use fewer than 4 digits for tm_year).  On the other hand, those
circumstances are defined, clearly and unambiguously, by the code in
the standard.  The definition of asctime() is probably the closest
thing the standard has to a formal specification.  The behavior
is rigorously defined in certain cases (all the normal ones), and
clearly undefined in others.  Don't call it with unusual values,
and you can depend on it to work as specified.

Well I think that's the bigger point here: these C functions have
explicit/implicit assumptions about their inputs.

I haven't read anywhere that explicitly states you can't pass memcpy()
[section 7.21.2.1] NULL as one of the pointers. We just "know" that
because dereferencing a NULL pointer leads to undefined behaviour.
Similarly, by reading the code, assuming that is the way your function
is implemented, it's obvious that you can't have a 5+ digit year.
That's an implicit assumption based on the description of the
function's behaviour.

And really that's the point. We have people who are not strong
software developers bitching about the fact that they have to sanitize
and properly test their inputs. They'd rather hack together whatever
they can as fast and as incoherently as possible and are pissed that
software development is ACTUALLY REALLY HARD WORK.

Tom
 

Keith Thompson

jacob navia said:
The deeper problem is that the C users community doesn't even want
to acknowledge this problem.

A buffer overrun is *specified* in the code of the C standard
itself. The many discussions in this group or in the similar group
comp.lang.c have led to nothing. Endless discussions about trivia but
an enormous BUG specified in the C standard (the asctime() function)
will be conserved as it was the best thing to do.

The code of the asctime() function is written in the C standard as follows:

char *asctime(const struct tm *timeptr)
{ [snip]
}

This code will provoke a buffer overflow if the year is, for
instance, bigger than 8099. Nowhere in the standard are the ranges
for the year specified.

Then how do you know that the limit is 8099?

The range for tm_year is not specified explicitly, and I agree that it
would be helpful if it were. But the range is rigorously specified by
the code provided in the standard. You and I were able to figure it
out.

Consider:

char s[5];
strcpy(s, "hello, world");

Is this a bug in strcpy, or a bug in the code that uses it?

Stealing from Tom St Denis's followup:

struct tm t;
char *p;
memset(&t, 0, sizeof t);
t.tm_year = 10000;
p = asctime(&t);

Is this a bug in asctime, or a bug in the code that uses it?

If your answers are different, can you explain the difference?

Let me be clear. I don't like asctime(). I never use it, other than
in small programs intended to test the functionality of asctime()
itself. I personally think it should be deprecated. But the fact
that its behavior is undefined *given certain arguments* is something
it shares with most other functions in the standard library. And
(the following is just my opinion), given that it's already rigorously
specified, I don't think it would be worth the committee's time to
improve the specification or change the behavior. Why sharpen the
corners on a square wheel?

If you're an implementer, you can provide an asctime() implementation
that doesn't blow up unless you give it a bad pointer (glibc has
done this; I presume you have as well, though I wonder whether yours
is 100% conforming in the corner cases). If you're a programmer,
you can avoid calling asctime() with exotic argument values, or
you can avoid calling asctime() altogether and use the much more
flexible strftime() instead.

Has there been an epidemic of crashing software caused by bad calls
to asctime()? The worst I've seen is a few extraneous newlines in
log files, which had nothing to do with undefined behavior.

It's just not as big a problem as you repeatedly make it out to be.

Ok, if there were a bug in the standard, a clause that implicitly
requires a buffer overflow to occur, that would be a problem worth
fixing, even if it didn't have much effect on the real world.
But that's just not the case here. (It is for gets(), which is
being deprecated, and I'd also be happier if that had been done
sooner or at a quicker pace.)

[...]
Is it because most people have decided that C should be killed and C++
should be the language of choice?

Of course not, don't be silly. Do you seriously think that someone
who has decided C should be killed would spend time and money
serving on the C standard committee? You're really not very good
at inferring other people's motivations.
Probably, I can't tell.

The same for any evolution of the language. The proposed new C
standard to be released somewhen in 2019 or later is a textual copy
of the C99 one, including (of course) functions like gets() and
asctime(). The only "concession" of the committee has been to add a
footnote where it says that gets() is deprecated.

The proposed new standard is a work in progress. We don't know
how closely it will match the early drafts that have been released
so far. In particular, the changes that have been made so far
are probably not representative of the changes that will be made;
they're just what the committee got to first, and I don't think
they're doing the work in decreasing order of importance.

[A personal note: jacob, I recently sent you a private e-mail message
regarding something you wrote in another thread. I would appreciate
a response, either by e-mail or as a followup in the relevant thread.
Thank you.]
 

Keith Thompson

jacob navia said:
You still do not read what I said. I said that nowhere in the C
standard are the ranges for the year specified! And that Mr Cleaver
in 2001 presented a defect report precisely because of this. It is
all in my post that you refuse to READ!
[...]

That was Clive Feather, not Mr Cleaver.
 
