A very interesting book

Keith Thompson

Dann Corbit said:
OK.
Where is the newsgroup where discussions about repairing the most
significant defect in the C language are topical?

comp.std.c.

Is that what we're discussing? Are there proposals for changes to the
next version of the standard?
 
jameskuyper

jacob said:
Dann Corbit wrote: .... ....
The same answer always:

"There are no problems with C. Only with lazy programmers
that do not know how to do their job".

comp.lang.c has nothing to do with the real world.

Interesting prediction. Does it bother you that there's essentially no
similarity between your predicted answer and the two answers that he's
actually received to that question?
 
lawrence.jones

In comp.std.c jacob navia said:
Especially because those limits ARE NOT EVEN MENTIONED in the
standards document. They can be inferred by reading the code
and seeing where it would overflow!

The normal ranges of the struct tm members are mentioned quite
prominently in 7.23.1p4 where they're specified. I would have thought
it fairly obvious that passing out of range values to any function that
doesn't explicitly document that it allows them is a bad idea and likely
to result in undefined behavior. The only one that needs to be inferred
is the range for tm_year.
 
Keith Thompson

Richard Heathfield said:
[followups set to comp.lang.c]

[followups overridden for now, since what I have to say is relevant to
comp.std.c]
Dann Corbit said: [...]
I am surprised that you don't see any connection between buffer overruns
and the C language.

I am surprised that you don't see any connection between buffer overruns
and programming languages in general.

If you *can't* overrun a buffer in a given language, then that language
isn't as powerful as it could be and, some would say, as it ought to be.
Safety restrictions, no matter how praiseworthy they may be on their own
merits, nevertheless represent a diminution of freedom and power.
It's a fundamental defect inherent in its design.

No, it's a fundamental risk inherent in the provision of power - i.e. that
the power might be misused, either through accident or design.

If you want Ada, you know where to find it.

Ada is as powerful as C. It doesn't forbid unsafe actions, it merely
requires you to specify them explicitly in most cases. For example,
to interpret an integer as a pointer (something C allows with a simple
cast), you have to instantiate Unchecked_Conversion and then call the
instance. (Strictly speaking, the C cast performs a type conversion,
not a reinterpretation, but it's implemented as a reinterpretation in
every implementation I've seen.)

It would be nice if C could achieve similar safety in normal use,
while still allowing unsafe constructs that are sometimes necessary in
practice, without changing the language so radically that most
existing code would be broken. I'm skeptical that this is possible,
but I'd be interested in seeing any concrete proposals.

If you want C with Ada-like safety (say, because you really like curly
braces), I think your best bet is to invent a new language.
 
Keith Thompson

The normal ranges of the struct tm members are mentioned quite
prominently in 7.23.1p4 where they're specified. I would have thought
it fairly obvious that passing out of range values to any function that
doesn't explicitly document that it allows them is a bad idea and likely
to result in undefined behavior. The only one that needs to be inferred
is the range for tm_year.

Yes, but calling asctime() with tm_hour==99 and all other members in
their normal ranges *doesn't* invoke undefined behavior, because the
implementation is required to use an algorithm equivalent to the
sample code presented in the standard.

I understand why the definition of asctime() was presented using
sample code, unlike every other function in the standard; it's much
easier to describe it that way than it would be in English. But a
consequence of that decision is that the behavior in the edge cases
can be a bit odd.

(Yes, sample implementations are provided for rand() and srand(), but
an actual implementation isn't required to be equivalent to the
samples.)

In my opinion:

(1) The standard committee should re-open DR 217
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_217.htm> and
fix the definition of asctime() so that (a) it remains compatible
with the current definition, and (b) undefined behavior does not
occur for any argument that's a valid pointer to a struct tm;

(2) Implementations should implement asctime() in a way that doesn't
cause an internal buffer overflow, even though the standard
doesn't currently require this (I think this is most safely done
*without* making the buffer bigger); and

(3) Users should either use asctime() very carefully, or should avoid
it in favor of the more versatile strftime().
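
A minimal sketch of point (3), assuming a C99 library and the default "C" locale: strftime() with a caller-supplied buffer can produce the same layout as asctime(). The helper name format_like_asctime is hypothetical, not something from the standard or from this thread. Unlike the sample asctime() code, strftime() is told the buffer size and returns 0 when the result does not fit; for members outside their normal ranges, the standard leaves the stored characters unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not a standard function): formats *tp in the
   asctime() layout, writing at most size bytes into buf.
   Returns the number of characters stored (excluding the terminator),
   or 0 if the result did not fit, in which case buf is indeterminate. */
size_t format_like_asctime(char *buf, size_t size, const struct tm *tp)
{
    assert(buf != NULL && tp != NULL);
    /* %a, %b: abbreviated weekday and month names.
       %e: day of month, a single digit preceded by a space (C99),
           which matches the "%3d" after the month in the sample code.
       %H:%M:%S: two-digit hour, minute, second.  %Y: full year. */
    return strftime(buf, size, "%a %b %e %H:%M:%S %Y\n", tp);
}
```

With a 26-byte buffer and the date used in the standard's example, this should store the 25 characters of "Sun Sep 16 01:03:52 1973\n". In other locales the weekday and month names may differ from asctime()'s fixed English names.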
 
Keith Thompson

Mark McIntyre said:
Oh, god - has Navia hijacked this thread for one of his crusades?

No, he *started* this thread (cross-posted to comp.std.c and
comp.lang.c).
 
jameskuyper

Keith Thompson wrote:
....
(1) The standard committee should re-open DR 217
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_217.htm> and
fix the definition of asctime() so that (a) it remains compatible
with the current definition, and (b) undefined behavior does not
occur for any argument that's a valid pointer to a struct tm;

I think that this should be modified to "a valid pointer to a struct
tm, none of whose members contain a trap representation". For most
implementations, that won't make any difference. However, for
implementations where 'int' has trap representations, ensuring defined
behavior when that condition is not met would be very burdensome.
 
Keith Thompson

Followups directed to comp.std.c, since we're now focusing on standard
issues.

Keith Thompson wrote:
...

I think that this should be modified to "a valid pointer to a struct
tm, none of whose members contain a trap representation". For most
implementations, that won't make any difference. However, for
implementations where 'int' has trap representations, ensuring defined
behavior when that condition is not met would be very burdensome.

Agreed, good catch.

And saying that "undefined behavior does not occur", as I suggested
above, is hardly sufficient.

Given that the argument is a valid pointer to [see above], I suggest
either:

(a) For any member values for which the behavior of the currently
specified algorithm is undefined, the buffer after asctime()
returns must contain a valid (null-terminated) string; or

(b) If all the member values are within their normal ranges, the
behavior is as currently specified. Otherwise, the buffer after
asctime() returns must contain a valid (null-terminated) string.

The difference is that (b) could change the behavior of code that
depends on the current definition *and* that calls asctime() with
member values outside their normal ranges. The question is whether
this is a problem. I think approach (b) is cleaner, but it could
break some code (that arguably deserves to be broken anyway).
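
One way approach (a) could look in practice, as a sketch only: keep the 26-byte static buffer and the format string of the sample algorithm, but use snprintf() so the buffer always holds a null-terminated string. The name checked_asctime and the "???" placeholder for out-of-range tm_wday or tm_mon are assumptions made for illustration, not part of any proposal in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical variant of the sample algorithm (sketch of approach (a)).
   For arguments where the sample code is well defined, the output is the
   same; otherwise the buffer still holds a valid string, possibly
   truncated and without the trailing '\n'. */
char *checked_asctime(const struct tm *timeptr)
{
    /* [4] rather than [3] so each name carries its own terminator. */
    static const char wday_name[7][4] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    static char result[26];

    assert(timeptr != NULL);

    /* Out-of-range indices would read outside the tables; the "???"
       placeholder is an assumption of this sketch. */
    const char *wday = (timeptr->tm_wday >= 0 && timeptr->tm_wday < 7)
                           ? wday_name[timeptr->tm_wday] : "???";
    const char *mon = (timeptr->tm_mon >= 0 && timeptr->tm_mon < 12)
                          ? mon_name[timeptr->tm_mon] : "???";

    /* snprintf stores at most sizeof result bytes, terminator included.
       The year is widened to long long so that adding 1900 cannot
       overflow where int is narrower than long long. */
    snprintf(result, sizeof result, "%.3s %.3s%3d %.2d:%.2d:%.2d %lld\n",
             wday, mon, timeptr->tm_mday, timeptr->tm_hour,
             timeptr->tm_min, timeptr->tm_sec,
             1900LL + timeptr->tm_year);
    return result;
}
```

The buffer is not made bigger; overlong results are simply cut off at 25 characters.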

Rather than just "valid", we might want to insist that strlen()
applied to the result returns 25, and that all characters in the
string are printable (see isprint()) except for a mandatory trailing
'\n'. Or it might not be worth being that specific.
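
The stricter requirement suggested above can be written down as a predicate. This is only a sketch of that wording; the function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical checker: true if s is exactly 25 characters long, the last
   one is '\n', and every other character is printable per isprint(). */
bool is_wellformed_asctime_result(const char *s)
{
    assert(s != NULL);
    if (strlen(s) != 25 || s[24] != '\n')
        return false;
    for (size_t i = 0; i < 24; i++) {
        /* Cast to unsigned char: isprint() requires a value representable
           as unsigned char (or EOF). */
        if (!isprint((unsigned char)s[i]))
            return false;
    }
    return true;
}
```

Under this rule, "Sun Sep 16 01:03:52 1973\n" passes, while a 24-character result such as "Sat Dec999 999:999:99 9\n" does not.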

And we need to define the "normal range" for tm_year, and specify that
we don't care about the value of tm_isdst.

I think it would also be worth mentioning in a footnote that asctime()
is a legacy function (and perhaps even deprecated), and strftime() is
recommended as a more flexible alternative.
 
CBFalconer

Dann said:
.... snip ...


Where is the newsgroup where discussions about repairing the most
significant defect in the C language are topical?

That would be comp.std.c. Not c.l.c.
 
CBFalconer

jacob said:
Keith Thompson wrote:
.... snip ...


The C standard shows a piece of code that will overflow its static
buffer if used with a year value greater than 8900 (if I remember
correctly)

Similarly, if the month value is greater than 12 it will
show UB.

Obviously, showing such a piece of code is a reminder to the rest
of the world how much the standard cares about buffer overflows.

The discussion in this group confirms this. Look at Mr Thompson:


Absolutely not. I derived the formula for the EXACT size of the
buffer in this discussion group. It is relatively simple. The
only thing that needs to be changed is the "26" in the size of
the buffer.

Nothing needs derivation. The standard adequately specifies
everything. First, see the definition of "struct tm", which
follows:

[#3] The types declared are size_t (described in 7.17);

    clock_t

and

    time_t

which are arithmetic types capable of representing times;
and

    struct tm

which holds the components of a calendar time, called the
broken-down time.

[#4] The tm structure shall contain at least the following
members, in any order. The semantics of the members and
their normal ranges are expressed in the comments.251)

    int tm_sec;   // seconds after the minute -- [0, 60]
    int tm_min;   // minutes after the hour -- [0, 59]
    int tm_hour;  // hours since midnight -- [0, 23]
    int tm_mday;  // day of the month -- [1, 31]
    int tm_mon;   // months since January -- [0, 11]
    int tm_year;  // years since 1900
    int tm_wday;  // days since Sunday -- [0, 6]
    int tm_yday;  // days since January 1 -- [0, 365]
    int tm_isdst; // Daylight Saving Time flag

The value of tm_isdst is positive if Daylight Saving Time is
in effect, zero if Daylight Saving Time is not in effect,
and negative if the information is not available.

____________________

251) The range [0, 60] for tm_sec allows for a positive leap
second.

Note the allowable range for values. Then see the definition for
asctime:

7.23.3.1 The asctime function

Synopsis
[#1]
#include <time.h>
char *asctime(const struct tm *timeptr);

Description

[#2] The asctime function converts the broken-down time in
the structure pointed to by timeptr into a string in the
form
Sun Sep 16 01:03:52 1973\n\0

using the equivalent of the following algorithm.

    char *asctime(const struct tm *timeptr) {
        static const char wday_name[7][3] = {
            "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
        };
        static const char mon_name[12][3] = {
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
        };
        static char result[26];

        sprintf(result, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
            wday_name[timeptr->tm_wday],
            mon_name[timeptr->tm_mon],
            timeptr->tm_mday, timeptr->tm_hour,
            timeptr->tm_min, timeptr->tm_sec,
            1900 + timeptr->tm_year);
        return result;
    }

Returns

[#3] The asctime function returns a pointer to the string.

All of which guarantees no overflows for any legitimate value,
provided only that the year does not exceed 9999, or become
negative. Note that there is no buffer for the user to create.
This is intimately connected with the use of the word 'static'.

There is no point to raving about non-existent failings in the C
standard.
 
CBFalconer

jacob said:
Especially because those limits ARE NOT EVEN MENTIONED in the
standards document. They can be inferred by reading the code
and seeing where it would overflow!

Pure unmitigated nonsense. The limits are clearly set out in the
description of "struct tm". You would do well to take some time
off and spend it reading the existing standard. As a poor
substitute you could follow my recent postings in c.l.c.
 
CBFalconer

Dann said:
Richard Heathfield said:
Dann Corbit wrote:
[snip]
The cost is not less than if Grand Coulee dam had broken wide
open and destroyed all property downstream in the state of
Washington.

Given that Redmond is downstream of that dam, this issue might
not be as simple as at first appears.

Protected by the Cascade mountains, I am afraid.

Surely the combined thought processes of all c.l.c readers can
overcome that minor difficulty? WARNING: It might require taxes.
 
CBFalconer

Keith said:
.... snip ...

If you want C with Ada-like safety (say, because you really like
curly braces), I think your best bet is to invent a new language.

Or an old one. Try ISO10206.
 
Harald van Dijk

Pure unmitigated nonsense. The limits are clearly set out in the
description of "struct tm".

Please explain how

"int tm_year; // years since 1900"

specifies that tm_year must be between -2899 and 8099. If you can't, but
you do agree that such a limit exists when calling asctime, then obviously
the limits are not clearly set out in the description of "struct tm".
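
The -2899 and 8099 figures can be checked mechanically. The sketch below assumes the format string of the sample algorithm and the 26-byte buffer; the helper names are hypothetical. With the other members in their normal ranges, everything except the year takes 20 characters plus the trailing '\n', so the year (sign included) may use at most 4 characters: 1900 + tm_year must lie in [-999, 9999].

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: number of characters (excluding the terminator)
   the sample format would produce for *tp, computed without writing
   anything. Weekday and month names always contribute three characters,
   so fixed placeholders stand in for the table lookups. */
int asctime_length(const struct tm *tp)
{
    assert(tp != NULL);
    assert(tp->tm_year <= INT_MAX - 1900);   /* keep 1900 + tm_year defined */
    /* snprintf with a null buffer and size 0 only counts (C99). */
    return snprintf(NULL, 0, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                    "Sun", "Jan", tp->tm_mday, tp->tm_hour,
                    tp->tm_min, tp->tm_sec, 1900 + tp->tm_year);
}

/* True if the text plus its terminator fits in the sample's 26 bytes. */
bool fits_in_sample_buffer(const struct tm *tp)
{
    return asctime_length(tp) < 26;
}
```

For tm_year values of 8099 and -2899 the length is 25, which fits; for 8100 and -2900 it is 26, which leaves no room for the terminator.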
 
Keith Thompson

CBFalconer said:
Nothing needs derivation. The standard adequately specifies
everything.

Not directly, I'm afraid.
First, see the definition of "struct tm", which
follows:

[#3] The types declared are [...]
and
struct tm

which holds the components of a calendar time, called the
broken-down time.

[#4] The tm structure shall contain at least the following
members, in any order. The semantics of the members and
their normal ranges are expressed in the comments.251)

int tm_sec; // seconds after the minute -- [0, 60]
int tm_min; // minutes after the hour -- [0, 59]
int tm_hour; // hours since midnight -- [0, 23]
int tm_mday; // day of the month -- [1, 31]
int tm_mon; // months since January -- [0, 11]
int tm_year; // years since 1900
int tm_wday; // days since Sunday -- [0, 6]
int tm_yday; // days since January 1 -- [0, 365]
int tm_isdst; // Daylight Saving Time flag [...]
Note the allowable range for values.

What "allowable range"? It describes the "normal ranges", not the
same thing at all.
Then see the definition for
asctime:

7.23.3.1 The asctime function

Synopsis
[#1]
#include <time.h>
char *asctime(const struct tm *timeptr);

Description

[#2] The asctime function converts the broken-down time in
the structure pointed to by timeptr into a string in the
form
Sun Sep 16 01:03:52 1973\n\0

using the equivalent of the following algorithm.

char *asctime(const struct tm *timeptr) { [snip]
}

Returns

[#3] The asctime function returns a pointer to the string.

All of which guarantees no overflows for any legitimate value,
provided only that the year does not exceed 9999, or become
negative. Note that there is no buffer for the user to create.
This is intimately connected with the use of the word 'static'.
[...]

That's correct as far as it goes (though as Harald pointed out, the
value of 9999 can only be derived by studying the code of the sample
implementation). But the "normal ranges" for the members of struct tm
are a *subset* of the values that yield well-defined behavior for
asctime. For example, this rather bizarre program:

    #include <time.h>
    #include <stdio.h>
    int main(void)
    {
        struct tm foo;
        foo.tm_year = -1891;
        foo.tm_mon = 11;
        foo.tm_mday = 999;
        foo.tm_hour = 999;
        foo.tm_min = 999;
        foo.tm_sec = 99;
        foo.tm_wday = 6;

        fputs(asctime(&foo), stdout);
        return 0;
    }

*must* print

Sat Dec999 999:999:99 9

on any conforming implementation, because of the "equivalent of the
following algorithm" clause. Implementations are not allowed the
freedom to exhibit undefined behavior whenever any of the members are
outside their normal ranges, because the presented algorithm doesn't
do so.

Note also that the struct tm object pointed to by the argument to
mktime is explicitly allowed to have members whose values are outside
their normal ranges. The same permission applies to asctime() simply
because no restriction is given, other than the implicit restriction
to values that don't cause undefined behavior.
 
Keith Thompson

CBFalconer said:
Or an old one. Try ISO10206.

ISO 10206 is Extended Pascal. First, it would have been polite to
mention that fact; second, it hardly qualifies as "C with Ada-like
safety".
 
Nick Keighley

Nick Keighley wrote:

Such libraries permit new types of attack.

I'm assuming the string libraries are based on something like

    struct __string
    {
        size_t __size;
        char *__buffer;
    };

I don't agree there are new forms of attack. There are old forms of
attack such as corrupting the internal data structures with pointers.
These are probably forms of Undefined Behaviour anyway. They can't
be fixed without severe limits on pointer arithmetic.

I submit that a secure string library *could* be written.
It might not be easy but it could be done. This is Design by
Contract.
Current strings force reading the memory from the
beginning, so strings should terminate earlier
(at a "random" 0 character, or with a segmentation fault).

I don't understand the above.

With new libraries, the library could take advantage
of the new fields, and thus could access the end of
the string, and so reach kernel or other library space.

Again, I don't understand.

OTOH new strings will reduce programmer error, and
thus the starting points for attacks.

Yes.

Anyway, alternate string libraries already exist,
but it seems that they are not widely used, so
I don't think they are ready to be standardized.

Maybe.

BTW, an alternate string library should really
have a good design, allowing simple plug-in
of i18n, so reducing transition costs.

Maybe.

FYI I found n1173, with some rationale on these goals:
1.1.6 Preserve the null terminated string datatype
1.1.7 Do not require size arguments for unmodified strings
1.1.9 Library based solution

interesting
 
Giacomo Catenazzi

Nick said:
I'm assuming the string libraries are based on something like

    struct __string
    {
        size_t __size;
        char *__buffer;
    };

I don't agree there are new forms of attack. There are old forms of
attack such as corrupting the internal data structures with pointers.
These are probably forms of Undefined Behaviour anyway. They can't
be fixed without severe limits on pointer arithmetic.

I submit that a secure string library *could* be written.
It might not be easy but it could be done. This is Design by
Contract.

I agree, a secure string library could be written.
I was pointing out possible problems with size-based strings,
which was the part you did not understand (sorry for my
bad English). So I'll try again.

With C strings, a library has to read memory from the start of the
string to its end.

On Unix, the stack, code, heap, libraries and kernel are in different
memory regions.

So a bad C string will segfault before crossing into another region
(or, more likely, the read stops at a zero byte found before the
region changes), which limits some attacks.

With size-based strings, an implementation COULD skip
checking memory ranges, making it easy to access
other memory regions.

Note: with C strings the same attack could be done by changing
the string pointer, so the problem is mainly in starting
an attack using only the string data
(removing the terminating zero or changing the size).

Combining the two methods (reading all the memory, or
checking regions/allocation ranges, and also checking the
size) would improve security, but the few libraries
I've seen use the size for efficient string manipulation as well,
without further checking.

ciao
cate
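
A rough sketch of the combined check described above, assuming a layout like the struct quoted earlier in the thread. The names sized_string and sized_string_is_consistent are hypothetical (the double-underscore names are avoided because such identifiers are reserved). The idea is to require that the first zero byte sits exactly where the recorded size says it should, while scanning no more than size + 1 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size-based string, modelled on the struct in the thread. */
struct sized_string {
    size_t size;    /* number of characters, excluding the terminator */
    char *buffer;   /* expected to hold size + 1 bytes, ending in '\0' */
};

/* True if the recorded size and the terminator agree with each other. */
bool sized_string_is_consistent(const struct sized_string *s)
{
    assert(s != NULL);
    if (s->buffer == NULL || s->size == SIZE_MAX)
        return false;               /* size + 1 would wrap around */
    /* Look for the first zero byte within the first size + 1 bytes. */
    const char *nul = memchr(s->buffer, '\0', s->size + 1);
    /* Consistent only if that zero byte is at index size. */
    return nul == s->buffer + s->size;
}
```

This detects a removed terminator or a size that is too small. It cannot prove that the buffer really has size + 1 bytes allocated, so it does not by itself stop an attacker who controls the size field; that would need the allocation-range check mentioned above.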

ciao
cate
 
CBFalconer

Keith said:
.... snip ...
[#3] The asctime function returns a pointer to the string.

All of which guarantees no overflows for any legitimate value,
provided only that the year does not exceed 9999, or become
negative. Note that there is no buffer for the user to create.
This is intimately connected with the use of the word 'static'.
[...]

That's correct as far as it goes (though as Harald pointed out,
the value of 9999 can only be derived by studying the code of
the sample implementation). But the "normal ranges" for the
members of struct tm are a *subset* of the values that yield
well-defined behavior for asctime. For example, this rather
bizarre program:

Well, to me that is just normal C behaviour, because the language
doesn't have the ability to define sub-ranges (a la Pascal) for
acceptance. Of course the behaviour you quoted, of compensating
for out-of-range values, is rather hard.
 
