Horrible Visual C Bug!

G

goose

Richard Heathfield said:
[Followups set to comp.lang.c]

Falcon said:
Theoretically, any undefined behavior should return an error (as in, the
standard should change). This might help solve the troubles of awful
programming.

Feel free to write a C compiler which diagnoses any and all instances of
undefined behaviour. Consider whether it should generate a diagnostic for
this code. If so, why? If not, why not?

int foo(int *a, int *b)
{
return *a = *b++;
}

<snip>

could it be that the reason for the addition of restrict in c99
was so that programmers can hint to the compiler that in the
above case there will definately be no UB ?

in the foo() example a helpfull c99 compiler might generate a warning
along the lines of
warning: in function foo: possible UB on line xx.

and adding in the "restrict" to the arguments then makes this
warning go away ?

goose,
just speculating, flame away ;-)
 
K

Kelsey Bjarnason

[snips]

I believe you're right about the csc consensus. I'm afraid I am just as much
in the dark over the difference between undefined behaviour and behaviour
which is not defined. I think I'm correct in saying that the committee sees
the two words "undefined behaviour" as being a key term with a particular
meaning, the meaning being, of course, the "this Standard imposes no
requirements" thing. But since the Standard imposes no requirements on
extensions, either, I continue to fail to see the distinction.

Apart from the obvious: one involves presumably correct operation outside
the scope of the standard, the other denotes very likely incorrect
behaviour.

int x = SomeWinAPICall(); is quite possibly correct, but beyond C's
ability to say much; i = ++i + i++; on the other hand...
 
K

Kevin Easton

Kelsey Bjarnason said:
[snips]

I believe you're right about the csc consensus. I'm afraid I am just as much
in the dark over the difference between undefined behaviour and behaviour
which is not defined. I think I'm correct in saying that the committee sees
the two words "undefined behaviour" as being a key term with a particular
meaning, the meaning being, of course, the "this Standard imposes no
requirements" thing. But since the Standard imposes no requirements on
extensions, either, I continue to fail to see the distinction.

Apart from the obvious: one involves presumably correct operation outside
the scope of the standard, the other denotes very likely incorrect
behaviour.

int x = SomeWinAPICall(); is quite possibly correct, but beyond C's
ability to say much; i = ++i + i++; on the other hand...

An implementation could describe some "correct" behaviour of i = ++i + i++,
if it wanted to, in the same way it can describe the "correct" behaviour
of SomeWinAPICall(), couldn't it?

- Kevin.
 
R

Richard Heathfield

Kelsey said:
[snips]

I believe you're right about the csc consensus. I'm afraid I am just as
much in the dark over the difference between undefined behaviour and
behaviour which is not defined. I think I'm correct in saying that the
committee sees the two words "undefined behaviour" as being a key term
with a particular meaning, the meaning being, of course, the "this
Standard imposes no requirements" thing. But since the Standard imposes
no requirements on extensions, either, I continue to fail to see the
distinction.

Apart from the obvious: one involves presumably correct operation outside
the scope of the standard, the other denotes very likely incorrect
behaviour.

It is not at all obvious to me that the distinction exists.
int x = SomeWinAPICall(); is quite possibly correct, but beyond C's
ability to say much; i = ++i + i++; on the other hand...

i = ++i + i++ is exactly as "quite possibly correct" as SomeWinAPICall(), in
terms of what the Standard says about them both. Any implementation is free
to make either of them do a certain thing, and document that thing. And of
course any implementation is free to do the nasal demon thing with either
of them.
 
J

Joona I Palaste

i = ++i + i++ is exactly as "quite possibly correct" as SomeWinAPICall(), in
terms of what the Standard says about them both. Any implementation is free
to make either of them do a certain thing, and document that thing. And of
course any implementation is free to do the nasal demon thing with either
of them.

Does the C standard anywhere say that i = ++i + i++; may do something
that SomeWinAPICall() may not, or the other way around? If not, then I
agree with Richard Heathfield, both are equally undefined behaviour.

--
/-- Joona Palaste ([email protected]) ---------------------------\
| Kingpriest of "The Flying Lemon Tree" G++ FR FW+ M- #108 D+ ADA N+++|
| http://www.helsinki.fi/~palaste W++ B OP+ |
\----------------------------------------- Finland rules! ------------/
"The trouble with the French is they don't have a word for entrepreneur."
- George Bush
 
D

Dave Thompson

Pointers

The set of all pointers in the program is initialized at startup. They are either NULL or they point
to valid addresses, established in the raw data of the program. For instance:

int a,*pint = &a;

The set of valid addresses is established by the compiler at startup. When control arrives at main()
all pointers are valid.
Not startup. Pointer variables (objects) can be static (including all
file scope) in which case the objects exist at startup; or local,
created when the block (usually function) is entered, even
recursively, and "vanish" when it exits; or dynamic, created and
released as directed by program code. Static pointer variables are
initialized, to null by default or to the stated initializer, but it
is not necessarily obvious if the value is valid, consider:
extern int a [ /* bound not given, defined in other t.u. */ ];
int * p = &a[37]; /* have to look there to validate this */
Automatic pointer variables are indeterminate if not initialized, and
dynamic pointer variables always indeterminate until assigned unless
calloc is used and all-bits-zero is valid, as it is on many machines.

The set of valid addresses = pointer values, or equivalently of
objects that can validly be pointed to, similarly varies: it
includes static variables, local auto but not register variables, and
dynamically allocated aka heap space; the latter two have lifetimes
shorter (often much shorter) than the whole program execution.
Undefined behavior (what pointers concerns) is when any pointer is used that
1) Has not been initialized to point to an existing valid object.
or
2) Is NULL or points somewhere else than

Object start address <= p < (start address)+sizeof(Object)
Assuming by 'used' you mean dereferenced, almost; in the case of
pointing within a composite (array or struct) object, we further need
the pointer to be to a valid subobject (element or field), or any
character=byte. For a pointer (value) that is only computed, stored,
passed, returned, etc., we also allow == startaddr + size.

Note that 'existing' requires the object existed when the address was
taken, and the *same* object *still* exists when the pointer is used;
this is trivial for a static target, but not for the other durations.
We can distinguish two types of pointers:

A) Unbounded pointers, i.e. pointers where the calculation of sizeof(Object) is impossible
B) Bounded pointers where sizeof(Object) is known and can be checked at run time.

Detecting this class of UB is called bounds checking and is done in many languages.
C is notoriously lacking this facility. Worse, the machine is not used to automatically test the
programmer's assumptions and all pointers are considered unbound.
'Bounds checking' usually means checking of subscripts for arrays,
which are in (most) compiled languages the only actual objects that
can have differing sizes at runtime. C99 'flexible' structs are
stored in, and can access, differing space, but the type itself has
the fixed size, and layout, of the fixed part, only the array part is
variable. Various polymorphic pointers and references can point to
different types of object with different sizes, but each actual type
has a fixed size. Pascal variant records can have different sizes,
but each variant still has a fixed layout. C does not actually
prohibit runtime checking, but it does largely prevent the
optimizations that make it (mostly) practical for other languages.
Lisp, APL, and many other languages check array accesses and avoid memoy corruption. C doesn't, and
we are plagued by memory corruption and obscure bugs.

An improvement would be to encourage the automatic checking of object accesses and discouraging the
usage of unbounded pointers. Instead of writing:

void matmult(int n,int m, double *pmat)

we would write:

void matmult(int n, int m, double mat[n][m]);

Such a proto would allow to check in the calling program that the buffer passed has enough space as
declared, and in the called function it would be possible to check that no index is being misused.
Yes. And I believe you realize, but other readers may not, that this
syntax is standard in C99, but bounds checking (still) isn't.

(I would expect a function named matmult to multiply matrices, and
that requires three matrix arguments, or two and one return "value",
which of course in C must be done by returning a pointer to allocated
or reused memory or by wrapping a fixed-size array in a struct.)

In APL it is conventional to do whole-array operations whenever
reasonable, so fewer subscripts need actually be checked. This is
also supported and encouraged in HPF/Fortran>=90 and PL/I.
Most of this attitude comes because the Pascal language has this facility, and many C people see
Pascal as something quite horrible.
Pascal does not have quite the semantics you show; it (optionally) has
a form which binds(!) each bound (and in Pascal both lower and upper
bounds are specified, not just upper as in C) to an array formal:
procedure mat_something (mat: array [a:b, c:d] of real);
which can then be called with any 2-D array of real, and a,b,c,d used
(if necessary) to obtain the actual bound values. I'm not sure if you
can have multiple formals with the same variable-size (or as they call
it 'conformant') array type, and I'm pretty sure you can't have
mutiple formals with interdependent types, as e.g. actual matmult
needs: if A is [Xrange, Yrange], B must be [Yrange, Zrange] for some
Zrange, and result is [Xrange, Zrange].

Fortran has (since F77 IIRC) the form you describe, except slightly
different syntax, and since F90 also the equivalent of the Pascal form
(assumed-shape) except that the bounds are unnamed and instead
obtained using special builtin functions. I will call that 'attached'
since the bounds are carried with the array argument not written
separately, and your form 'separate'. I'm pretty sure PL/I has both.
Ada has attached (unconstrained array) and AFAICT prevents separate
except by instantiating a generic (with a different syntax).
I think that was a good feature of Pascal. I miss this in C and I see each day the consequences in
array overruns, obscure bugs, and many other problems. Encouraging the use of bounded pointers would
introduce some hygienical concepts isn't it?

Nobody is proposing banning unbounded pointers. They should remain for special uses or in old
software. Encouraging the use of bounded pointers will make them slowly les and less frequent,
that's all.

The sizeof calculation is very problematic in C because of the refusal of passing this information
in array prototypes by the standards comitee. Of course this has historical reasons, but I just do
not understand why in 2003 we still want to save us the few machine cycles that that would cost, and
spare the users and the programmers the stack overruns, memory corruption and other problems!

An array decays in C, to an unbounded pointer when passed to a subroutine. All sizeof information is
not passed along. This is (maybe) efficient but it is a problem for checking the bounds of array
indexes!
You can't pass bounds for arrays only and not pointers, because array
parameters *are* pointers; this is fundamental to the language, and
cannot be changed while still calling the result C. An implementation
can compute, store and pass (and check) bounds for *all* pointers, at
extra cost which most people apparently consider too high. You might
also want to store and check bounds for C99 'flexible' structs, either
hidden in the struct or in (all!) pointer-to-struct.

Or, you could design a callling sequence which passes bounds for
'array' arguments, similar to the way most Fortrans pass a 'hidden'
length of (there fixed-size) character-string arguments, but not keep
bounds in explicit pointers, so where a pointer has passed through a
variable or out-of-line function, and probably certain kinds of
conversions/casts, the bound information would become 'unknown --
can't check this one'. And I suspect so much C code does (and would
still do) this as to make the feature worth little, though legal.

In C++ you can create a bounded_ptr (probably template) which
keeps/passes (and checks) a bound in exactly the cases you specify a
bounded_ptr, and no others, and I think, without actually working out
the details, that you can have the (visible) code optimize away in
cases determined at compile time. Or you can write a template
function that deduces the bound of an array parameter where known, or
uses 'unknown' for a pointer; to avoid code bloat you could (and I
would) have it inline into code that passes the bound explicitly to a
shared implementation, but then AFAICT you can't prevent a caller from
cheating and directly specifying a wrong bound.

- David.Thompson1 at worldnet.att.net
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,770
Messages
2,569,584
Members
45,075
Latest member
MakersCBDBloodSupport

Latest Threads

Top