lcc-win32

Keith Thompson

jacob navia said:
Should return zero and set errno to EDOM. Zero would make NULL
identical to the empty string, which is consistent with the
usage many applications make of NULL.

Do you think that the standard should be changed to require
strlen(NULL) to return zero for all implementations?

A null pointer and a pointer to an empty string are two entirely
different things. You propose muddying the distinction. Given the
following:

char *x = "";
char *y = NULL;

x points to a string, and y does not. The fact that strlen(NULL)
invokes undefined behavior is entirely deliberate; the solution, for
any programmer, is Don't Do That.
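
A minimal sketch of the distinction, assuming the check is done by the caller. `describe_string` is a hypothetical helper used only for illustration; it is not part of the standard library. It never passes a null pointer to strlen():

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: classifies a char pointer without ever
   passing a null pointer to strlen(). */
static const char *describe_string(const char *s)
{
    if (s == NULL)
        return "no string";        /* null pointer: points to no string at all */
    if (strlen(s) == 0)
        return "empty string";     /* valid pointer to a lone '\0' */
    return "non-empty string";
}
```

With the declarations above, describe_string(x) reports an empty string and describe_string(y) reports that there is no string, so the two cases stay distinct.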
The domain of strlen is the set of non-empty strings. Hence, EDOM
is a correct indicator.

Does "non-empty strings" exclude ""? Do you want strlen("") to set
errno to EDOM? I doubt that you intended that, but it is implied by
what you wrote.
The behavior now (just undefined) allows for the above implementation,
Specifying this would make the library more consistent.

No, it wouldn't. It would make its behavior more well-defined in some
cases, but it would make it less consistent, in my opinion.

And the problem of undefined behavior for some calls to library
functions is not solvable without a major redesign. You want
strlen(NULL) to be well-behaved. What about strlen((char*)0xdeadbeef)?
What about strlen(buf), given either
char buf[5] = "hello"; /* no '\0' character */
or
char buf[5]; /* uninitialized */
? And if you're going to require consistent behavior for these cases,
how do you expect it to be implemented? Keep in mind that it would
have to be implemented on *all* systems that support C.

Defining the behavior of strlen(NULL) plugs one minor hole in an
edifice composed of thinly sliced Swiss cheese.

[...]
Why do you quote my questions when you don't intend to answer them?
The problem is made worse when the standard publishes code
like
char buffer[26];

Why did they specify asctime with all this detail?

Probably because a C implementation is the easiest way to specify the
algorithm for asctime(). Its behavior is specified in 19 lines of C
code (including the empty line after the declaration of "result").
Try specifying the same algorithm in English. I suspect the resulting
description would be much longer and more ambiguous, and we'd now be
arguing about whether certain calls invoke undefined behavior or not
rather than about the fact that certain calls unambiguously do invoke
undefined behavior.
There are many other functions where code would have been useful.

Not really. Most other functions in the standard library either have
simple enough semantics that they can be described unambiguously in
English (like strlen()), or cannot be portably implemented in standard
C (like fopen()). (The standard provides sample implementations of
srand() and rand(); unfortunately, they're not very good, and too many
implementers have just copied them. The result is that you need to
use non-standard functions if you want decent random numbers.)
No, they had to publish code that overflows and crashes at
the slightest error.

I don't consider calling asctime() (which I consider to be a legacy
function) with an argument representing a year after 9999 to be "the
slightest error". Don't do that.
I strive in my programs to write code as bulletproof as possible.
This can be done in C.

If C is to be understood as a language where "anything goes", we
have to reach the conclusion that C is to be avoided at all
costs in any serious application.

Defensive programming and error analysis are part of a system
design, and it has to be done. C is no exception: programs
should be robust and handle gracefully incorrect inputs
without blowing up.

Sure, which means that it's each programmer's responsibility to avoid
calling library functions with invalid arguments. We all know that C
gives you enough rope to shoot yourself in the foot. It's a highly
successful language in spite of that.

[...]
The discussion in comp.std.c didn't lead to anything.

We told you that comp.std.c, not comp.lang.c, is the place to discuss
proposed changes to the language, and that's still the case. Nobody
promised you that the folks in comp.std.c would be receptive to your
ideas. The fact that they didn't agree with you does *not* suddenly
make the discussion topical here.
The gets function is still there, trigraphs are still there,
the general attitude is that C should avoid any change
and go into obsolescence.

Most of the people in comp.std.c and here think that
C++ is the way to go, hence, C should stay where it is
and disappear.

Nonsense. Nobody here wants C to disappear.
 
goose

jacob navia said:
Should return zero and set errno to EDOM. Zero would make NULL
identical to the empty string,

?

Making strlen(NULL) return the exact same thing as strlen("")
isn't a very good idea; NULL (the macro) and "" (the empty
string) are as different as chalk and cheese.
which is consistent with the
usage many applications make of NULL.

I beg to differ. Many programmers check for an empty string
differently from checking for a nonexistent string.

char *user_input = malloc (MAX_INPUT);
....

if (user_input) { /* check whether string now exists */
...
}
....
/* get input from user */
....
if (strlen (user_input)==0) { /* check for empty string */
...
}
The domain of strlen is the set of non-empty strings. Hence, EDOM
is a correct indicator.

so what will length be below? and errno?

length = strlen ("");


I (sorta) agree that UB is difficult to live with. Maybe UB
should be done away with and replaced with "Will Crash The Program",
or even have a global bool UB indicator, like errno, which will get
set whenever UB occurs, so we can read it and shut down?

Those don't sound good enough, though; I'd rather just learn
how to avoid UB (not too difficult at all).

goose,
 
chris

Dan said:
Old said:
On Sat, 09 Oct 2004 01:35:49 +0200, in comp.lang.c, jacob navia wrote:


This is reflected in the standards committee too: I discovered a buffer
overflow in the code of the asctime function, printed in the standard.

Where's the error? I checked N869, and it looks fine to me; it supplies
a buffer 'char result[26]' for holding a 25-character string plus null
terminator. Unless Jacob is thinking that it'll stop working correctly in
the year 10000; but that's not a reasonable complaint in my book.
So, what's the deal here?


A local attacker could set the clock to the year 10000,
causing undefined behaviour (which could be a root exploit).
So there is a small amount of seriousness.

There is no need to set the clock.
Just call asctime with a year of 10000
You fill a structure tm, with say year 10000, month 665, etc, and call
asctime.

No need to set the clock.


You missed the point. The idea is not to compromise *your* program, but
another program, running with root privileges, whose code you cannot
control. If that program uses the asctime function with the current date
as an argument, you can compromise it by setting the system date to
year 10k. If you have enough information about a lot of things,
you may be able to control the actual behaviour of the program to
your advantage. In theory, at least.

I have pointed out elsethread the flaw in the reasoning.
While I do think this bug is stupid, I don't think it's that stupid.

Imagine code which accepts input from a user which it will pass to
asctime. Now I would expect that I'd have to check for some things
myself (like months<12, check the number of days is appropriate) else bad
things might happen. However I might not expect that I have to also
check year<10000 (and there are applications where year>10000 is not
entirely stupid).

Chris
 
Dan Pop

In said:
I (sorta) agree that UB is difficult to live with. Maybe UB
should be done away with and replaced with "Will Crash The Program",
or even have a global bool UB indicator, like errno, which will get
set whenever UB occurs, so we can read it and shut down?

It can't be done. Far too many instances of UB are both harmless and
undetectable. Others are too expensive to detect and trying to detect
them would slow down correct programs for no redeeming benefits.

Purify provides a good example of the overheads incurred by the attempt
to detect as many memory access related instances of UB as possible.

I'm happy enough with the Unix systems that unconditionally crash the
program at any attempt of dereferencing a null pointer. And I hate the
ones that make null pointers behave like empty strings (page zero
mapped in the process address space and filled with zeroes).

Dan
 
Dan Pop

In said:
Dan said:
Old Wolf wrote:



On Sat, 09 Oct 2004 01:35:49 +0200, in comp.lang.c, jacob navia wrote:


This is reflected in the standards committee too: I discovered a buffer
overflow in the code of the asctime function, printed in the standard.

Where's the error? I checked N869, and it looks fine to me; it supplies
a buffer 'char result[26]' for holding a 25-character string plus null
terminator. Unless Jacob is thinking that it'll stop working correctly in
the year 10000; but that's not a reasonable complaint in my book.
So, what's the deal here?


A local attacker could set the clock to the year 10000,
causing undefined behaviour (which could be a root exploit).
So there is a small amount of seriousness.

There is no need to set the clock.
Just call asctime with a year of 10000
You fill a structure tm, with say year 10000, month 665, etc, and call
asctime.

No need to set the clock.


You missed the point. The idea is not to compromise *your* program, but
another program, running with root privileges, whose code you cannot
control. If that program uses the asctime function with the current date
as an argument, you can compromise it by setting the system date to
year 10k. If you have enough information about a lot of things,
you may be able to control the actual behaviour of the program to
your advantage. In theory, at least.

I have pointed out elsethread the flaw in the reasoning.
While I do think this bug is stupid, I don't think it's that stupid.

Imagine code which accepts input from a user which it will pass to
asctime. Now I would expect that I'd have to check for some things
myself (like months<12, check the number of days is appropriate) else bad
things might happen. However I might not expect that I have to also
check year<10000 (and there are applications where year>10000 is not
entirely stupid).

If you're writing such an application to be executed with root privileges
you must be more than paranoid when checking the user input. Furthermore,
I can't remember ever using ctime() in the first place, strftime() has
always proved to be the best tool for the job.
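
A minimal sketch of the strftime() approach, assuming a layout similar to the one asctime() produces. `format_time` and the format string are illustrative choices, not taken from any post in this thread; note that %d zero-pads the day of the month where asctime() pads it with a space:

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: formats *tp in an asctime()-like layout
   ("Www Mmm dd hh:mm:ss yyyy\n") into buf, never writing more than
   bufsize bytes. Returns the number of characters stored (excluding
   the terminating null), or 0 if the result did not fit. */
static size_t format_time(char *buf, size_t bufsize, const struct tm *tp)
{
    return strftime(buf, bufsize, "%a %b %d %H:%M:%S %Y\n", tp);
}
```

Because the caller passes the buffer size, a five-digit year makes the call return 0 instead of writing past the end of a 26-byte buffer.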

Dan
 
jacob navia

goose said:
Like I said, I'm perfectly happy with learning not to
cause UB. But, to the new C programmer (which I'm seeing
less these days),

Me too. There are fewer and fewer new programmers that learn C.
After all the negative publicity from the C++ people (forget
C, C is primitive, etc) and after the buffer overflow fiascos,
many companies and people are just getting away from it.

The atmosphere in this forum, the whole attitude towards data
processing concepts like using higher level data-structures,
etc. All this contributes to make C obsolete slowly but surely.
It will die a natural death when the programmers that now
know it retire.

We *could* fight against this, but I see it as very difficult here.

Concepts such as security, defensive programming, avoiding
buffer overflows, etc, are shunned from the discussion.

And in the standard itself, we have the reference code for
asctime that will crash with very bad consequences at the
slightest malformed input.
UB is a hard concept to *get through*

Yes. And avoiding it always is impossible if the
language doesn't cooperate, if the standards committee
doesn't cooperate, etc.

The answers I get are telling:

"There is so much UB in the library that it is hopeless to even start
trying to fix it"

Great. Let's go on then.
 
Keith Thompson

Calling a <time.h> function with a date that cannot be represented by
time_t invokes undefined behaviour.

That's not true for asctime(). The algorithm, which is presented in
the standard (and is admittedly flawed) makes no reference to time_t.
On a system where time_t overflows in, say, 2038, asctime() must still
work properly for a date in the year 2100. On a system where time_t
can represent years after 9999, asctime still invokes undefined
behavior given an argument representing a time in the year 11000.

There are other examples.
 
CBFalconer

Dan said:
It can't be done. Far too many instances of UB are both harmless
and undetectable. Others are too expensive to detect and trying
to detect them would slow down correct programs for no redeeming
benefits.

In addition, UB in a conforming program can be well defined in a
non-conforming program running in a known environment. Some areas
that almost certainly take advantage of this in your system are the
malloc implementation, any operations on FILEs, signals, longjumps,
and more.
 
CBFalconer

chris said:
.... snip ...

While I do think this bug is stupid, I don't think it's that stupid.

Imagine code which accepts input from a user which it will pass to
asctime. Now I would expect that I'd have to check for some things
myself (like months<12, check the number of days is appropriate) else
bad things might happen. However I might not expect that I have to
also check year<10000 (and there are applications where year>10000
is not entirely stupid).

And any time you increment a signed integer you have to check that
its original value did not exceed 32767 (for complete portability)
or INT_MAX for portability to the destination system. In the same
vein, I can conceive of applications where integers larger than
32767 are not entirely stupid. So we should fix the ++ operator.
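
For comparison, the caller-side check described above might look like this. It is only a sketch, and `checked_increment` is a hypothetical name:

```c
#include <limits.h>

/* Hypothetical helper: increments *value only when that cannot
   overflow. Returns 1 on success, or 0 (leaving *value unchanged)
   if *value is already INT_MAX. */
static int checked_increment(int *value)
{
    if (*value == INT_MAX)
        return 0;
    ++*value;
    return 1;
}
```

The test is made before the increment, because signed overflow is itself undefined behavior and cannot be reliably detected afterwards.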
 
goose

Dan said:
It can't be done. Far too many instances of UB are both harmless and
undetectable. Others are too expensive to detect and trying to detect
them would slow down correct programs for no redeeming benefits.

Purify provides a good example of the overheads incurred by the attempt
to detect as many memory access related instances of UB as possible.

I'm happy enough with the Unix systems that unconditionally crash the
program at any attempt of dereferencing a null pointer. And I hate the
ones that make null pointers behave like empty strings (page zero
mapped in the process address space and filled with zeroes).

Like I said, I'm perfectly happy with learning not to
cause UB. But, to the new C programmer (which I'm seeing
less these days), UB is a hard concept to *get through*

(sorta like explaining Chartered accountants to amoebas:
they're missing links all the way to the top[1]).

goose,
[1]Something Terry Pratchett once said, I think
 
Mark McIntyre

Keith Thompson wrote:

The discussion in comp.std.c didn't lead to anything.

I see. So, having failed to convince anyone when discussing it in the right
place (which implies that your argument was weak, or your powers of
persuasion small), you decided to come and annoy us instead?

Sensible.
Most of the people in comp.std.c and here think that
C++ is the way to go, hence, C should stay where it is
and disappear.

Your attitude seems to be "they didn't agree with me, they must all have an
evil ulterior motive".

Perhaps you need to consider another possible explanation.
 
Mark McIntyre

The gets function is still there, trigraphs are still there,

I'm sorry, did I just notice you arguing that trigraphs should be removed
from the standard? You really are strangely obsessive.
 
Old Wolf

jacob navia said:
The atmosphere in this forum, the whole attitude towards data
processing concepts like using higher level data-structures,
etc. All this contributes to make C obsolete slowly but surely.
It will die a natural death when the programmers that now
know it retire.

I think you cannot read. Can you reference one post where
somebody has expressed a negative attitude towards the use
of high-level data structures?

Most of the posts responding to yours have been "Why don't
you discuss this in the appropriate forum?" and you have
shown no sign of heeding that advice.
Concepts such as security, defensive programming, avoiding
buffer overflows, etc, are shunned from the discussion.

Rubbish. What's shunned from the discussion is informal
proposals for updates to the C standard, in a forum whose
purpose is to discuss code which conforms to the existing
standards only.

For example, the advice "Don't use gets()" is regularly seen
here, and that is surely a discussion of avoiding buffer overflows.
The answers I get are telling:

"There is so much UB in the library that it is hopeless to even start
trying to fix it"

Actually the answers are "There is so much UB in the library
so that compilers can generate the most optimal code". For
example I don't go around calling strlen(NULL) so why should
_I_ pay the performance penalty of a library checking for it?
(Answer the question, don't waffle on about covering up
for other programmers' errors).
 
Dave Thompson

No. You do not want it?
Do not include
#include <stdlist.h>
<snip> If you do not use the floating
point library (math.h) you just do not write:
#include <math.h>
and all names are still available to you and it is a legal
program.

Not really. All standard library function names are reserved with
external linkage whether or not you #include their declarations and/or
ever reference them. You can't legally write your own sin() or fopen()
or whatever. (At least on a hosted implementation.) Some (many?)
implementations do allow you to provide your own routines, which are
selected in preference to the standard one(s), as long as you don't
interfere with library internals. As a practical matter most of the
math routines stand pretty much alone, except for setting errno, and
overriding them has a fairly good chance of working, but it is not standard.
You can then write:
myfloat mysin(myfloat arg);
and you can code the sine function using only integer operations
and table lookups.
That's true, since it uses a different name than the standard one. You
could do that even if you DID #include <math.h>.
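
A rough sketch of such a routine, under stated assumptions: `mysin_fixed`, the 15-degree table step, and the scale factor of 10000 are all illustrative choices, not anything from the quoted post. <math.h> is included on purpose to show that a differently named function does not clash with sin():

```c
#include <math.h>   /* deliberately included: mysin_fixed does not collide with sin() */

/* sin(x) * 10000, rounded down, for x = 0, 15, 30, ..., 90 degrees. */
static const int quarter_wave[7] = { 0, 2588, 5000, 7071, 8660, 9659, 10000 };

/* Hypothetical fixed-point sine using only integer operations and a
   table lookup. Takes an angle in degrees and returns sin(angle) * 10000.
   Angles that are not a multiple of 15 are truncated to the nearest
   table entry after folding, so the result is exact only at table points. */
static int mysin_fixed(int degrees)
{
    int sign = 1;

    degrees %= 360;                 /* reduce to (-360, 360) */
    if (degrees < 0)
        degrees += 360;             /* now in [0, 360) */
    if (degrees >= 180) {           /* second half of the wave is negated */
        degrees -= 180;
        sign = -1;
    }
    if (degrees > 90)               /* mirror the second quadrant onto the first */
        degrees = 180 - degrees;

    return sign * quarter_wave[degrees / 15];
}
```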


- David.Thompson1 at worldnet.att.net
 
Dan Pop

In said:
(e-mail address removed) (Dan Pop) writes:
[...]
Calling a <time.h> function with a date that cannot be represented by
time_t invokes undefined behaviour.

That's not true for asctime(). The algorithm, which is presented in
the standard (and is admittedly flawed) makes no reference to time_t.

It doesn't have to. If you're passing asctime() a value out of range
for the <time.h> implementation, you're invoking undefined behaviour.
Admittedly, the standard fails to say so, leaving the algorithm to
define when an asctime call invokes undefined behaviour.

The real flaw in the standard is not the algorithm, but specifying
asctime() using an implementation. The right thing would have been to
require asctime to take as input the result of converting a time_t
value or the output of the mktime() function and to describe the format
of its output.
On a system where time_t overflows in, say, 2038, asctime() must still
work properly for a date in the year 2100.

It is this requirement that I consider the real bug in the standard.
asctime() must be seen as part of a package, rather than as an
independently defined entity.
On a system where time_t
can represent years after 9999, asctime still invokes undefined
behavior given an argument representing a time in the year 11000.

It's the caller that invokes undefined behaviour, by passing asctime()
a value not allowed by the definition of asctime(). If you have a
voltmeter rated up to 9999V, whose fault is it if you connect it to 10000V
and it breaks? Was the voltmeter design flawed?

From a realistic point of view, it makes no sense to fill a tm struct
with values by hand and then pass it to asctime. In real applications,
if asctime() is used at all, its argument is the result of converting
a time_t value or has been properly adjusted by an mktime call.

Note that a year above 9999 is only one way of abusing the algorithm
from the standard. ALL the other fields of a struct tm involved in the
algorithm can be filled with values that would either cause out-of-bounds
array accesses or make the total length of the output exceed the
allocated space.
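
A caller who does fill a struct tm by hand could guard the call along these lines. This is a sketch only: `tm_safe_for_asctime` is a hypothetical helper, and the accepted ranges (the normal field ranges plus a four-digit year from 1000 to 9999) are a deliberately conservative assumption rather than a statement of what any particular implementation tolerates:

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: returns 1 if every field read by the published
   asctime() algorithm is within its normal range and the year prints
   as exactly four digits; returns 0 otherwise. */
static int tm_safe_for_asctime(const struct tm *tp)
{
    if (tp == NULL)
        return 0;
    if (tp->tm_wday < 0 || tp->tm_wday > 6)     /* index into the day-name table */
        return 0;
    if (tp->tm_mon < 0 || tp->tm_mon > 11)      /* index into the month-name table */
        return 0;
    if (tp->tm_mday < 1 || tp->tm_mday > 31)
        return 0;
    if (tp->tm_hour < 0 || tp->tm_hour > 23)
        return 0;
    if (tp->tm_min < 0 || tp->tm_min > 59)
        return 0;
    if (tp->tm_sec < 0 || tp->tm_sec > 60)      /* 60 allows for a leap second */
        return 0;
    if (tp->tm_year < 1000 - 1900 || tp->tm_year > 9999 - 1900)
        return 0;
    return 1;
}
```

With a check like this, the "year 10000, month 665" structure mentioned earlier in the thread is rejected before it reaches asctime().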

Dan
 
Keith Thompson

In said:
(e-mail address removed) (Dan Pop) writes:
[...]
Calling a <time.h> function with a date that cannot be represented by
time_t invokes undefined behaviour.

That's not true for asctime(). The algorithm, which is presented in
the standard (and is admittedly flawed) makes no reference to time_t.

It doesn't have to. If you're passing asctime() a value out of range
for the <time.h> implementation, you're invoking undefined behaviour.
Admittedly, the standard fails to say so, leaving the algorithm to
define when an asctime call invokes undefined behaviour.

So you're arguing that time_t is the fundamental type on which
everything in <time.h> is based, and that a struct tm value that
represents a time outside the range of time_t is invalid. As you
admit, the standard doesn't actually say this, so I don't understand
how you came to this conclusion.

The actual standard defines several types: clock_t, time_t, and
struct tm. There is no indication that any of them is any more
fundamental than the others. There is no indication that manually
constructing a struct tm value is invalid. There's even an example in
7.23.2.3 that does exactly that (to determine the day of the week for
July 4, 2001); that example does depend on time_t, but it's clear that
you *can* construct a valid struct tm value that's not derived from a
time_t value.
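
A sketch in the spirit of that example, written from scratch rather than copied from the standard. `weekday_of` is a hypothetical name, and setting the hour to midday is an added precaution, not something the standard's example is claimed to do:

```c
#include <time.h>

/* Hypothetical helper: fills a struct tm by hand, lets mktime()
   normalize it, and returns tm_wday (0 = Sunday ... 6 = Saturday),
   or -1 if mktime() cannot represent the requested date. */
static int weekday_of(int year, int month, int day)
{
    struct tm t = {0};

    t.tm_year  = year - 1900;   /* tm_year counts years since 1900 */
    t.tm_mon   = month - 1;     /* tm_mon counts months since January */
    t.tm_mday  = day;
    t.tm_hour  = 12;            /* midday, away from any DST transition */
    t.tm_isdst = -1;            /* let the implementation decide about DST */

    if (mktime(&t) == (time_t)-1)
        return -1;
    return t.tm_wday;           /* filled in by mktime() */
}
```

Calling weekday_of(2001, 7, 4) yields 3, a Wednesday. The struct tm here is built entirely by hand; time_t is involved only because mktime() is used to compute tm_wday.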

[...]
It is this requirement that I consider the real bug in the standard.
asctime() must be seen as part of a package, rather than as an
independently defined entity.

Why? It's not a "bug", it's just something that you don't like.

[...]
From a realistic point of view, it makes no sense to fill a tm struct
with values by hand and then pass it to asctime.

Sure it does. The standard says that this is allowed (unless you can
point to explicit wording that says otherwise).

I agree that making time_t the fundamental time type, and defining
everything else in <time.h> in terms of it, would have made for a
cleaner design. Perhaps whatever interface eventually replaces
<time.h> will do something like that. But that's a topic for
comp.std.c; here in comp.lang.c, we discuss the language as it's
actually defined.
 
Dan Pop

In said:
[email protected] (Dan Pop) said:
In said:
(e-mail address removed) (Dan Pop) writes:
[...]
Calling a <time.h> function with a date that cannot be represented by
time_t invokes undefined behaviour.

That's not true for asctime(). The algorithm, which is presented in
the standard (and is admittedly flawed) makes no reference to time_t.

It doesn't have to. If you're passing asctime() a value out of range
for the <time.h> implementation, you're invoking undefined behaviour.
Admittedly, the standard fails to say so, leaving the algorithm to
define when an asctime call invokes undefined behaviour.

So you're arguing that time_t is the fundamental type on which
everything in <time.h> is based, and that a struct tm value that
represents a time outside the range of time_t is invalid. As you
admit, the standard doesn't actually say this, so I don't understand
how you came to this conclusion.

I have discussed the issue at great length in comp.std.c, several months
ago. I see no point in rehashing it.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Why?

Reading impaired? The answer was just above your question, readily
available for your perusal.

I hope you didn't ask why asctime() must be seen as part of a package,
because the answer should be obvious.
It's not a "bug", it's just something that you don't like.

Reading impaired? I didn't say it's a bug, I said "I consider the real
bug" which strongly implies a personal judgment, i.e. it's just something
that I don't like.
[...]
From a realistic point of view, it makes no sense to fill a tm struct
with values by hand and then pass it to asctime.

Sure it does. The standard says that this is allowed (unless you can
point to explicit wording that says otherwise).

Many things the standard allows make no sense from a realistic point of
view. I have plenty of examples available, if needed.

Dan
 
Keith Thompson

In said:
(e-mail address removed) (Dan Pop) writes: [...]
It is this requirement that I consider the real bug in the standard.
asctime() must be seen as part of a package, rather than as an ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
independently defined entity.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Why?

Reading impaired? The answer was just above your question, readily
available for your perusal.

I see only a blatant assertion which you've chosen not to justify.
I hope you didn't ask why asctime() must be seen as part of a package,
because the answer should be obvious.

No, I'm not reading impaired, but thank you for your kind concern.

I'm reading the actual standard. I don't know what you're reading.
Reading impaired? I didn't say it's a bug, I said "I consider the real
bug" which strongly implies a personal judgment, i.e. it's just something
that I don't like.

Ok, so you don't like what the standard actually says. That's fine;
you might consider discussing it in comp.std.c.

What the standard actually *says* is internally consistent (if a bit
odd), and your interpretation of it is no more than wishful thinking.

Try this, Dan. Pretend that I've asserted that "Calling a <time.h>
function with a date that cannot be represented by time_t invokes
undefined behaviour." I'm sure you wouldn't have any trouble arguing
that I'm wrong.
 
Dan Pop

In said:
[email protected] (Dan Pop) said:
In said:
(e-mail address removed) (Dan Pop) writes: [...]
It is this requirement that I consider the real bug in the standard.
asctime() must be seen as part of a package, rather than as an ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
independently defined entity.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Why?

Reading impaired? The answer was just above your question, readily
available for your perusal.

I see only a blatant assertion which you've chosen not to justify.

As I wrote below, I thought that the justification is obvious, even to
you.
No, I'm not reading impaired, but thank you for your kind concern.

I'm reading the actual standard. I don't know what you're reading.

The same, but with an open mind, rather than anally clutching to the
strict wording which, quite often, fails to reflect the actual intent of
the authors.
Ok, so you don't like what the standard actually says. That's fine;
you might consider discussing it in comp.std.c.

Memory impaired? I have already done it, several months ago, as I have
already pointed out in my previous message.
What the standard actually *says* is internally consistent (if a bit
odd), and your interpretation of it is no more than wishful thinking.

If expecting <time.h> to provide the specification of a package, rather
than the descriptions of a bunch of *unrelated* functions is wishful
thinking, then you're right, of course.
Try this, Dan. Pretend that I've asserted that "Calling a <time.h>
function with a date that cannot be represented by time_t invokes
undefined behaviour." I'm sure you wouldn't have any trouble arguing
that I'm wrong.

I'd argue both sides and finally agree with you, because I do consider
<time.h> as a package (with the obvious exception of the clock() function
and its associated stuff, which was better placed in <stdlib.h>, the
catch-all for anything that didn't belong elsewhere). It may not be a
perfect package, but it's still a package and the current specification
of asctime() looks like a brain fart of the committee.

The only *sensible* requirement about the size of the asctime output is
that it accommodates the largest year supported by the <time.h>
implementation, usually the largest year representable by time_t.
If int is a 64-bit type, there is no point in making the buffer large
enough to accommodate year LLONG_MAX, an entity devoid of any meaning
for any reasonable C program.

Dan
 
