A very interesting book

lawrence.jones

In comp.std.c Keith Thompson said:
Note also that the struct tm object pointed to by the argument to
mktime is explicitly allowed to have members whose values are outside
their normal ranges. The same permission applies to asctime() simply
because no restriction is given, other than the implicit restriction
to values that don't cause undefined behavior.

I would argue exactly the opposite -- that the explicit permission in mktime
implies that such permission is *not* granted otherwise.
 
Keith Thompson

I would argue exactly the opposite -- that the explicit permission in mktime
implies that such permission is *not* granted otherwise.

I disagree, but I might be willing to be convinced.

An asctime() algorithm equivalent to the one in the standard (which
is what the standard requires) exhibits well-defined behavior with,
for example, timeptr->tm_mday == 32. I see no permission for the
implementation to do anything other than what that code specifies.

The only argument I can see that the behavior might be undefined
is 7.1.4p1:

If an argument to a function has an invalid value (such as a
value outside the domain of the function, or [snip]) or [snip],
the behavior is undefined.

(The snipped text doesn't apply in this case.)

But 7.23.1p4 defines the "normal ranges" for the members of struct
tm. You'd have to assume that anything outside the "normal range"
is an "invalid value" -- but normality and validity are not the
same thing.

Note that the standard does explicitly state that the characters
stored by strftime() are undefined if any members have values outside
the normal range, and this is listed under "Unspecified behavior"
in J.1 (which is non-normative). This doesn't tell us anything
definitive, but it is suggestive.

I think the term "normal ranges" refers, not to the ranges of
values the members are *permitted* to have, but to the ranges of
values that will be set by localtime() and gmtime() -- though it
would have been nice if that had been stated explicitly.

None of the other uses of the word "normal" in the standard are
useful for resolving this.

Now I think that it would have made more sense for the standard
to say that the behavior of calling asctime() with values outside
the "normal ranges" is undefined (as it does for strftime()).
But since the authors of the standard chose to define the behavior of
asctime() by presenting an explicit implementation in C source code,
I think we're stuck with the behavior of that code (except that an
implementation can do what it likes in cases where the behavior of
the sample code is undefined).
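To make the two cases concrete, here is a minimal sketch that only counts the characters the sample format would produce, without writing into a fixed buffer. The format string follows the sample asctime() code in the standard; the helper names `asctime_length` and `fits_in_asctime_buffer` are made up for illustration, and the name tables are declared with room for a terminating null, which is my own choice.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: number of characters (excluding the terminating
   null) that the sample asctime() format would produce.
   The caller must keep tm_wday in 0..6 and tm_mon in 0..11, because
   the name tables are indexed with those members. */
static int asctime_length(const struct tm *timeptr)
{
    static const char wday_name[7][4] = {
        "Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
    };
    static const char mon_name[12][4] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    /* snprintf with a null buffer and size 0 only counts characters. */
    return snprintf(NULL, 0, "%.3s %.3s%3d %.2d:%.2d:%.2d %d\n",
                    wday_name[timeptr->tm_wday],
                    mon_name[timeptr->tm_mon],
                    timeptr->tm_mday, timeptr->tm_hour,
                    timeptr->tm_min, timeptr->tm_sec,
                    1900 + timeptr->tm_year);
}

/* Hypothetical helper: does the text plus its terminating null fit in
   the 26-byte result buffer used by the sample code? */
static int fits_in_asctime_buffer(const struct tm *timeptr)
{
    return asctime_length(timeptr) + 1 <= 26;
}
```

With tm_mday == 32 and a four-digit year the text is 25 characters long and fits, so the sample code's behavior is defined. Once 1900 + tm_year needs five digits (tm_year of 8100 or more) the text no longer fits in 26 bytes, which is one of the cases where the sample code's behavior is undefined.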

I think the real problem is that the standard fails to define
what it means by "normal range".

But feel free to convince me that anything outside the "normal range"
is an "invalid value".
 
Wojtek Lerch

I would argue exactly the opposite -- that the explicit permission in
mktime
implies that such permission is *not* granted otherwise.

No, the explicit permission is necessary because without it, the behaviour
of mktime() would be undefined by omission when the values are outside of
their "normal" ranges. To avoid undefined behaviour when the values do not
represent a valid date and time, the text needs to explain how such values
are interpreted, and mentioning that they're allowed is a reasonable
introduction to such an explanation. Given that that's a sufficient
justification for the explicit permission being there, there's no reason to
assume that its purpose might *also* be to imply that values outside of
their "normal" ranges are forbidden everywhere else.

(Unfortunately, even though the explicit permission makes it clear that the
standard wants to avoid undefined behaviour, I don't feel that it's doing a
very good job of defining the behaviour. What exactly does it mean that the
components of the structure "are set to represent the specified calendar
time, but with their values forced to the ranges indicated above"? What
date is "specified" by setting tm_mon to 12 and tm_mday to 50 -- is it
supposed to be obvious that it's referring to 19 February of the following
year? Or are the values "forced" into their "normal" ranges *before* being
interpreted as specifying a calendar time, and therefore my example really
specifies the 31st of December?)
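One way to see what a particular implementation does with such values is simply to ask it. The sketch below is an experiment, not an answer to what the standard requires; `normalize_date` is a hypothetical helper name, and fixing the hour at midday is my own choice to keep a DST adjustment from moving the date.

```c
#include <time.h>

/* Hypothetical helper: fill *out with the broken-down time that the
   local mktime() produces for the given year, month index and day.
   Returns 0 on success, -1 if mktime() cannot represent the time. */
static int normalize_date(int year, int mon, int mday, struct tm *out)
{
    struct tm t = {0};
    t.tm_year = year - 1900;
    t.tm_mon = mon;      /* 0-based; 12 is one past December */
    t.tm_mday = mday;
    t.tm_hour = 12;      /* midday, so a DST shift cannot change the date */
    t.tm_isdst = -1;     /* let the implementation decide about DST */
    if (mktime(&t) == (time_t)-1)
        return -1;
    *out = t;            /* mktime() has rewritten the members in place */
    return 0;
}
```

Calling `normalize_date(2008, 12, 50, &r)` on an implementation that carries the excess into the following months, as glibc does, leaves `r` describing 19 February 2009 rather than 31 December.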

Anyway, no such issue exists for asctime(). Its behaviour is defined
without depending on whether the values in the structure represent a valid
calendar time or not.
 
Friedrich

Mark McIntyre said:
Perhaps that's because more programmes have been written in C than any
other language?
And C is easier to learn (and therefore has more novice
programmers)?

Well, C is not especially hard to learn, but applying it is. But I would
dare to say that most "scripting" languages are easier to learn. And I
cannot convince myself that e.g. Smalltalk is harder than C.

By the way, we are potentially now into the "safety by obscurity"
model here which as any fule no is spurious. Plan-9 is safe because
only experts use it...

Well, some arguments here are more than strange. Some wrote that every
language is equally unsafe because there might be a buffer
overrun. However, AFAICT I just remember one recent problem with
buffer overruns in a certain interpreter (written in C). I cannot
remember having read about it in, let's say, OCaml, Haskell or even
FreePascal. I cannot see how one can deny that doing bounds
checking is safer than not doing it. I also cannot see why
checking shouldn't be the standard behaviour, and just on occasion you
have something like
#pragma unsafe or the like to play your "nasty" tricks.

If all that weren't a problem, then this would hardly be worth a
discussion. But those bugs do exist, and they mainly exist in C and
C++. So I see it more as a specific language problem than a general
problem.

And strangely enough, there is a flourishing industry of all kinds
of tools for handling this problem. To name just a few: dmalloc,
libfence, valgrind, splint and tons of others. So that is a very
strong indicator of this kind of weakness for me.


Regards
Friedrich
 
CBFalconer

Friedrich said:
.... snip ...

Well, some arguments here are more than strange. Some wrote that every
language is equally unsafe because there might be a buffer
overrun. However, AFAICT I just remember one recent problem with
buffer overruns in a certain interpreter (written in C). I
cannot remember having read about it in, let's say, OCaml, Haskell or
even FreePascal. I cannot see how one can deny that doing
bounds checking is safer than not doing it. I also cannot
see why checking shouldn't be the standard behaviour, and just on
occasion you have something like
#pragma unsafe or the like to play your "nasty" tricks.

C can't have run-time checking without horrendous loss of
efficiency (basically it would require interpretation) because of
the unrestricted use of pointers. There is no distinction between
a pointer to a char, and a pointer to an array of 10,000 chars.
One may be created from the other at any time. In addition
malloced storage is not distinguished in any way (except some usage
aspects) from any other storage. All this requires that every
pointer carry with it information of max and min indices allowable
for use.

There are programming advantages to this freedom, but they have to
be used with care. Most often the care is missing.

To have safe code, all you have to do is switch languages. Pascal
and Ada are much safer. No language is completely safe. Good C
requires first class programmers. Most often it doesn't get them.
 
jacob navia

CBFalconer said:
Friedrich wrote:
... snip ...

C can't have run-time checking without horrendous loss of
efficiency (basically it would require interpretation) because of
the unrestricted use of pointers.


This is plainly not true. Each access to an array would
have two reads from memory + two integer comparisons
to do. Progress in hardware makes such tests completely
transparent. And nobody is saying they should be anything
more than optional.

There is no distinction between
a pointer to a char, and a pointer to an array of 10,000 chars.
One may be created from the other at any time.

???? What is this???

In addition
malloced storage is not distinguished in any way (except some usage
aspects) from any other storage. All this requires that every
pointer carry with it information of max and min indices allowable
for use.

This is called "fat" pointers. Yes, they carry size information
and what is wrong with that?

There are programming advantages to this freedom, but they have to
be used with care. Most often the care is missing.

To have safe code, all you have to do is switch languages. Pascal
and Ada are much safer. No language is completely safe. Good C
requires first class programmers. Most often it doesn't get them.

This is the old elitist argument much in vogue here in comp.lang.c:

"We are geniuses, the others are just bad/lazy programmers".

The concept of an "error prone" tool doesn't get into their minds.
 
Ron Ford

Willem said:
Richard Heathfield wrote:
) [followups set to comp.lang.c]
)
) jacob navia said:
)
)> Ron Ford wrote:
)<snip>
)>>
)>> I'm somewhat of a non-believer here. There is no calculus to decide.
)>>
)>
)> Well, mathematics doesn't need beliefs.
)
) It needs a few. They are called axioms.

A mathematician doesn't need to believe in the correctness of his axioms.

Fair point - they're more sort of meta-beliefs, aren't they? IF we believe
THESE things, then THOSE are the consequences...

That is how the crab argues with Achilles. A lot of folks think that the
strength of C is its blinding speed; they aren't wrong, but logicians have
their say.

The most unsatisfying axioms I ever read were in Analysis I or Calculus IV.
The ultimate one assumed that the last century in mathematics didn't
happen.
 
santosh

jacob said:
Because functions like gets(), asctime() and other standard
functions (still in the C99 standard even if gets() got deprecated)
make buffer overflows almost MANDATORY.

I don't think anyone these days ever uses gets().

Zero terminated strings, where there is no bounds checking, make it
almost impossible to avoid errors since they require the
programmer never to forget the lengths of buffers!

IME it's a lot more bookkeeping, but I wouldn't say that it is "almost
impossible". The size of each and every object must be known at some
point in a program, and that information need only be carefully and
logically preserved and used each time the object is accessed.

I have proposed a string library for C to make those errors more
difficult. The reception was as expected... :-(

When did you propose this? If you had made operator overloading an
integral part of this proposal, then not many would've been very
enthusiastic since operator overloading is not supported by nearly all
the C compilers out there.

OTOH a counted strings interface based on the existing infrastructure of
C like BStrlib would not be an unwelcome addition to C1x, IMHO.
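As a rough illustration of the bookkeeping described in this post, the sketch below keeps a buffer together with its capacity and refuses any copy that would not fit. The type and function names (`sized_buf`, `sized_buf_set`) are invented for this example; this is not BStrlib's interface or the string library proposed in the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a buffer that travels together with its capacity. */
struct sized_buf {
    char  *data;
    size_t cap;   /* total bytes available in data, including the null */
};

/* Copy src into dst only if it fits, terminator included.
   Returns 0 on success, -1 (leaving dst untouched) if it would overflow. */
static int sized_buf_set(struct sized_buf *dst, const char *src)
{
    size_t need = strlen(src) + 1;
    if (need > dst->cap)
        return -1;
    memcpy(dst->data, src, need);
    return 0;
}
```

The size is recorded once, where the object is created (for example `struct sized_buf b = { store, sizeof store };`), and every later write consults it instead of relying on the programmer's memory.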
 
santosh

jacob said:
The C standard shows a piece of code that will overflow its static
buffer if used with a year value greater than 8900 (if I remember
correctly)

Similarly, if the month value is greater than 12 it will
show UB.

Obviously, showing such a piece of code is a reminder to the rest
of the world how much the standard cares about buffer overflows.

The discussion in this group confirms this. Look at Mr Thompson:

By your logic we could say that the Standard allows potential buffer
overruns with memcpy, memmove, strcpy, strcat, fread, fgets and so on.

I think you may be happier with a language other than C.

Then why don't you support it now and act to get rid of the buffer
overflowing code written in the C standard document?

Let me turn the tables and ask you, jacob: as an outspoken
implementor (and critic) of ISO C, why not arrange for your lcc-win to
emit a diagnostic for any use of asctime (and perhaps of other
functions that you deem unsafe)? As far as I can see, my recent copy
of your compiler emits no helpful warnings upon the usage of "dangerous"
functions like gets and asctime and others. Gcc, for example, emits a
diagnostic for gets and tmpnam among others. It is not as good as a
remedy at the source, but it's nonetheless helpful.

It would be good to see you judging yourself with the same high
standards with which you judge others.

<snip>
 
Richard

CBFalconer said:
C can't have run-time checking without horrendous loss of
efficiency (basically it would require interpretation)

Basically I think that is not true whatsoever. It needs run time
checks. Many of which can be very, very efficient depending on the HW
and compiler involved.
 
Richard Bos

CBFalconer said:
Friedrich wrote:

C can't have run-time checking without horrendous loss of
efficiency (basically it would require interpretation) because of
the unrestricted use of pointers. There is no distinction between
a pointer to a char, and a pointer to an array of 10,000 chars.

Oh, bollocks. C certainly allows fat pointers. The only thing you need
to keep is:
- base pointer to block;
- size of block, in bytes;
- offset of pointer in block, in bytes;
- type of pointer.

_Nothing_ in that requires interpretation. It does add overhead, that's
true, and that is a very valid reason for most C implementations not to
use fat pointers. However, bounds-checking compilers are not only
allowed but do, IIRC, in fact exist. Such an implementation is, of
course, slower; but not "horrendously" so.
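A minimal sketch of the idea, written as ordinary C rather than as something a compiler would generate: the struct stores three of the four items listed above and leaves the "type of pointer" item to the C type system. All names here (`fat_char_ptr`, `fat_read`, `fat_add`) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical fat pointer to char: the "type of pointer" item is fixed
   by the C type here, the other three items are stored explicitly. */
struct fat_char_ptr {
    char  *base;    /* base pointer to block */
    size_t size;    /* size of block, in bytes */
    size_t offset;  /* offset of pointer in block, in bytes */
};

/* Checked read: stores the byte in *out and returns 0, or returns -1
   without touching memory when the offset lies outside the block. */
static int fat_read(struct fat_char_ptr p, char *out)
{
    if (p.offset >= p.size)
        return -1;
    *out = p.base[p.offset];
    return 0;
}

/* Pointer arithmetic only moves the offset; the check happens on access.
   (size_t wraparound is ignored here for brevity.) */
static struct fat_char_ptr fat_add(struct fat_char_ptr p, size_t n)
{
    p.offset += n;
    return p;
}
```

The cost per access is one comparison plus the extra words carried around with each pointer, which is overhead but nothing that requires an interpreter.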

To have safe code, all you have to do is switch languages.

Or brains.

Richard
 
Richard Bos

Keith Thompson said:
Ada is as powerful as C. It doesn't forbid unsafe actions, it merely
requires you to specify them explicitly in most cases. For example,
to interpret an integer as a pointer (something C allows with a simple
cast), you have to instantiate Unchecked_Conversion and then call the
instance. (Strictly speaking, the C cast performs a type conversion,
not a reinterpretation, but it's implemented as a reinterpretation in
every implementation I've seen.)

In other words, Ada requires a special construct to be applied to
interpret an integer as a pointer; C, by contrast, requires a special
construct to be applied to interpret an integer as a pointer. Apart from
the size of the construct, I see no difference.

Richard
 
Richard Bos

CBFalconer said:
Or an old one. Try ISO10206.

No; Keith was looking for Ada with a C-like syntax, while what you
suggest is Algol with a BASIC-like syntax.

Richard
 
Keith Thompson

In other words, Ada requires a special construct to be applied to
interpret an integer as a pointer; C, by contrast, requires a special
construct to be applied to interpret an integer as a pointer. Apart from
the size of the construct, I see no difference.

The difference is that the "special construct" in C, a cast, doesn't
stand out. It's the same construct used for perfectly safe type
conversions, such as a conversion from int to long.
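The point about the syntax can be shown side by side. In this small sketch (function names are made up, and `uintptr_t` is assumed to be available, since it is an optional type), both conversions use the same parenthesized-type notation even though only the first is harmless for every input.

```c
#include <stdint.h>

/* A value-preserving conversion: every int fits in a long. */
static long widen(int i)
{
    return (long)i;
}

/* Same syntax, very different risk: the result of converting an
   arbitrary integer to a pointer is implementation-defined and need
   not point at any object. */
static void *to_pointer(uintptr_t bits)
{
    return (void *)bits;
}
```

Nothing in the source text distinguishes the two casts; a reader has to look at the operand and target types to know which one deserves scrutiny.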

There *should* be few casts in well-written C code, but unfortunately
that's not the case in real-world code. So to some small extent,
perhaps the difference is more in how C is used than in how it's
designed -- though there are still plenty of ways to do dangerous
things in C without trying very hard.

I can see this turning into a language war, so perhaps we should drop
this sub-thread.
 
Walter Roberson

jacob navia wrote:
This is plainly not true. Each access to an array would
have two reads from memory + two integer comparisons
to do. Progress in hardware makes such tests completely
transparent.

"Progress in hardware" ?? Sounds to me like the old VAX scheme,
where a "pointer" was really a reference to a memory descriptor.

If I recall my history correctly, the people who implemented
the official C for VAX/VMS really tried to adhere to that
architecture, but eventually had to get an operating system
modification to allow the compiled programs to use plain addresses
rather than descriptors.

As best I recall, their plans to use descriptors fell down because of
type punning. For one thing, every malloc'd byte must be accessible
under the definition of malloc, but "objects" get carved up out of
malloc'd space: do you follow the semantics that the bytes are all
accessible because they are part of the malloc'd object, or do you
follow the semantics that the bytes before/after a sub-object
are inaccessible because C says accessing outside of objects is
undefined behaviour? If you have a plain int (for example) and you
pass its address to a subroutine, then *(ptr+1) should trigger
bounds exception processing, but if the space for the int was
part of a malloc'd area then you are really talking about an array
of ints and *(ptr+1) must work [if you stay within the area.]

Even without dynamic memory, it is common (especially in older
C routines that were written to take on mathematical processing
supplanting FORTRAN), to pass in a 1D array but then to use it
internally as a 2D array. Bounds checking based upon the
1D descriptor from when the memory area was created needs to be
flexible enough to handle that kind of type punning, without C
having the semantics to convey to lower level routines how big
something should be. If you do a 1D to 2D punning and then
you pass to a lower routine the address of the first row, intending
only to denote the row, the lower level routine doesn't have
a way to know that an access outside the row but still within
the block should be caught -- as far as the lower level routine
is concerned, you might have wanted to type-pun back to the
entire 1D array. And indeed, I have encountered C code that did
flip back and forth between 1D and 2D at different call depths
(e.g., you might have done a 2D matrix operation and then want
to take the absolute value of all of the elements; taking the
absolute value row-by-row would be less efficient than taking
the absolute value of the entire block as a 1D vector.)
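The pattern described above looks roughly like the following sketch. The function names are hypothetical, and the 2D view is done with index arithmetic on a flat array so that the example itself stays within defined behavior.

```c
#include <stddef.h>

/* Treats a flat block of rows*cols doubles as a 2D matrix and returns
   the element at (r, c); only the caller's arithmetic knows the shape. */
static double at2d(const double *flat, size_t cols, size_t r, size_t c)
{
    return flat[r * cols + c];
}

/* Works on the same block as a 1D vector of n elements. */
static void abs_all(double *flat, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (flat[i] < 0)
            flat[i] = -flat[i];
}

/* Receives "the first row" but can still walk the whole block: the
   pointer carries no row length, so nothing marks where the row ends. */
static double sum_from(const double *row, size_t n)
{
    double s = 0;
    for (size_t i = 0; i < n; i++)
        s += row[i];
    return s;
}
```

A bounds checker that only knows the block's original extent cannot tell whether `sum_from(m, 6)` on a 2x3 matrix is an overrun of the first row or a deliberate switch back to the 1D view.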

There are languages in which arrays or array slices are "first
class objects", for which hardware bounds checking makes perfect sense.
Unfortunately, C isn't one of those languages, and people *do*
take advantage of the type laxity in real programs; changing the rules
now would have serious issues with backwards compatibility.
 
Ian Collins

jacob said:
This is plainly not true. Each access to an array would
have two reads from memory + two integer comparisons
to do. Progress in hardware makes such tests completely
transparent. And nobody is saying they should be anything
more than optional.

I'd be interested in seeing an efficient access checker implemented at
compile time.

On the platform I use where the debugger does the checking (Solaris) the
impact can be truly painful. Every memory access is replaced at runtime
with a function call. I guess the function called has to scan
allocation tables for the address. If the compiler could do this, one
could select types of accesses to be checked.
 
Ian Collins

Keith said:
The difference is that the "special construct" in C, a cast, doesn't
stand out. It's the same construct used for perfectly safe type
conversions, such as a conversion from int to long.

Which were two of the reasons C++ added "new style" casts. Maybe C
should follow?
 
Anand Hariharan

There is no distinction between
a pointer to a char, and a pointer to an array of 10,000 chars.

Are they not different types?

In addition malloced storage is not distinguished in any way
(except some usage aspects) from any other storage.

Does not 'free' require a pointer returned by malloc/calloc/realloc? Or
do you consider this a usage aspect?
 
Keith Thompson

Anand Hariharan said:
Are they not different types?

Yes, they are. The point, I think, is that given

char *ptr;

you can't tell (unless you keep track of it yourself *very* carefully)
whether it points to a single object of type char, or to the first
element of an array of 10,000 chars.

Does not 'free' require a pointer returned by malloc/calloc/realloc? Or
do you consider this a usage aspect?

Yes, it requires such a pointer, but the requirement isn't enforced;
you just get undefined behavior if you pass it something else (unless
it's a null pointer). Given a pointer value, there's no way to
determine at runtime whether it points to an object allocated via
malloc(), to a local variable, or to something else.
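A small sketch of the situation described in this post, with invented names: the function below receives the same type of argument in both cases, and nothing inside it can distinguish them.

```c
#include <stddef.h>

/* Hypothetical objects: one lone char and one large array. */
static char single = 'x';
static char many[10000] = "y";

/* Both &single and many arrive here as a plain pointer to char. */
static char first_char(const char *ptr)
{
    /* ptr[0] is valid for either argument. ptr[1] would be valid only
       for the array, and nothing in ptr says which case this is. */
    return ptr[0];
}
```

`first_char(&single)` and `first_char(many)` are both fine, but a version that read `ptr[1]` would be correct for one call and undefined for the other, with no way for the callee to check.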
 
