undefined behavior or not undefined behavior? That is the question

  • Thread starter Mantorok Redgormor

Michael Wojcik

Which constraint does it violate?

I think Arthur's wrong. Both 9899-1990 and N869 have a constraint for
the * unary operator, but it says only that the operand shall have
pointer type, not that the pointed-to type be complete. Contrast that
with, say, the constraint for the sizeof operator, which specifically
forbids incomplete types (though not, obviously, pointer types where
the pointed-to type is incomplete). Of course I may have missed some
other constraint elsewhere in the standard.

Also contrast with the constraint for array subscripting, which requires
that the first operand have "pointer to object type" - so dereferencing
a pointer to an incomplete type using array subscripting syntax would
be a constraint violation. (By 6.1.2.5 (C90) / 6.2.5 (N869), incomplete
types are not object types, so pointers to incomplete types are not
pointers to object types.)

In short:

extern struct foo *P;
*P; /* no constraint violation */
P[0]; /* constraint violation */
 

Dan Pop

Chris Torek said:
[on "PDP-endian"-ness]

The PDP-11's successor, the VAX, fixed that hardware design blunder
of its ancestor and was little-endian across the whole range of
supported types.

Well, except D-float -- but this is not an integral type and it has
"bit" as well as "byte" order issues, when one takes it apart
(because the bit-fields do not fall on "natural" byte boundaries).

And D-float was supported exclusively for PDP-11 compatibility reasons.
The "native" double precision format of the VAX was G-float (having a
far wider range than the D-float).

I am willing to bet that at least one CPU architecture will resurrect
mixed-endian integers in its 32-to-64-bit transition. (I suspect
it will be an existing little-endian architecture that is heavily
stack-oriented, too.)

Which one would that be?

Dan
 

pete

Dan said:
In <[email protected]>
Chris Torek said:
[on "PDP-endian"-ness]
There are exactly two byte orders that have survived and, by no
coincidence, they are the two entirely consistent ones.

I am willing to bet that at least one CPU architecture will resurrect
mixed-endian integers in its 32-to-64-bit transition. (I suspect
it will be an existing little-endian architecture that is heavily
stack-oriented, too.)

Which one would that be?

In the context of this newsgroup, I don't think it's worthwhile
to memorize the subset of allowable configurations
that are extant today. I don't even think it's valid to claim
that you know all the extant ones.

When is the Death Station 9000 paradigm
(any allowable configuration of the abstract machine
that invalidates your code) appropriate on this newsgroup?
 

Arthur J. O'Dwyer

I think Arthur's wrong.

I guess I am. Having looked very carefully at N869, I don't see
any evidence that '*P', where P is a pointer to an incomplete type,
violates any constraints.
Both 9899-1990 and N869 have a constraint for
the * unary operator, but it says only that the operand shall have
pointer type, not that the pointed-to type be complete. Contrast that
with, say, the constraint for the sizeof operator, which specifically
forbids incomplete types (though not, obviously, pointer types where
the pointed-to type is incomplete). Of course I may have missed some
other constraint elsewhere in the standard.

Also contrast with the constraint for array subscripting, which requires
that the first operand have "pointer to object type" - so dereferencing
a pointer to an incomplete type using array subscripting syntax would
be a constraint violation. (By 6.1.2.5 (C90) / 6.2.5 (N869), incomplete
types are not object types, so pointers to incomplete types are not
pointers to object types.)

In short:

extern struct foo *P;
*P; /* no constraint violation */
P[0]; /* constraint violation */

I think you're right; but I don't understand why the asymmetry
here. Why does the Standard allow '*P' but not 'P[0]'? (It's
obvious enough why it doesn't allow 'P[foo]', but I would have
extended that to make '*P' invalid, rather than making a seemingly
arbitrary distinction between the two indirection operators.)

-Arthur
 

Kevin Bracey

Show us one concrete example.

C implementors are usually not the most stupid people around and they
understand quite well the need for a consistent byte order among all the
types.

All the realistic examples of mixed endianness are historical, from an
age where the byte order issues were not as well understood as they have
been for the last quarter of a century.

Here are two examples illustrating Christian's exact point:

The ARM SDT toolset (1996-2001?) had long long numbers stored with the least
significant word first, even when the ARM was in big-endian mode, although
this was changed in the ARM ADS toolset.

The ARM FP instruction set provides IEEE single, double and 80-bit extended
double formats, but these are stored most significant word first, even on an
ARM in little-endian mode.

Another "historical", but still current example: many PC graphics cards
provide 4bpp modes where the pixel nibbles are packed big-endian within
bytes, but the bytes are packed little-endian within words.
 

Jeremy Yallop

Arthur said:
I think you're right; but I don't understand why the asymmetry
here. Why does the Standard allow '*P' but not 'P[0]'? (It's
obvious enough why it doesn't allow 'P[foo]', but I would have
extended that to make '*P' invalid, rather than making a seemingly
arbitrary distinction between the two indirection operators.)

The distinction is not entirely arbitrary: 'P[0]' is disallowed
because of the pointer arithmetic, not the indirection. Pointer
arithmetic /must/ be disallowed on incomplete types in general,
because the size of the referenced type is not known; making a special
exception to allow P[0] /would/ be arbitrary, especially since the
value of the subscript is not generally known at compile time.

Still, I consider it a flaw in the standard that no constraint
disallows indirection on pointers to incomplete types.

Jeremy.
 

Dan Pop

pete said:
Dan said:
In <[email protected]>
Chris Torek said:
[on "PDP-endian"-ness]
There are exactly two byte orders that have survived and, by no
coincidence, they are the two entirely consistent ones.

I am willing to bet that at least one CPU architecture will resurrect
mixed-endian integers in its 32-to-64-bit transition. (I suspect
it will be an existing little-endian architecture that is heavily
stack-oriented, too.)

Which one would that be?

In the context of this newsgroup, I don't think it's worthwhile
to memorize the subset of allowable configurations
that are extant today. I don't even think it's valid to claim
that you know all the extant ones.

Even in the context of this newsgroup, it makes sense to distinguish
between realistically portable code and absolutely portable code.

If absolute portability comes at a cost, it's worth evaluating this cost
before choosing between a realistically portable solution and an
absolutely portable one. And this is usually the case when dealing with
byte order issues: if accommodating *any* byte order is too expensive in
terms of code complexity/performance, by restricting yourself to the
two popular byte orders you're portable to practically any hosted
implementation in current use.

Portability *exclusively* for portability's sake is a form of religion
few competent professionals are willing to embrace. YMMV.

Dan
 

Keith Thompson

If absolute portability comes at a cost, it's worth evaluating this cost
before choosing between a realistically portable solution and an
absolutely portable one. And this is usually the case when dealing with
byte order issues: if accommodating *any* byte order is too expensive in
terms of code complexity/performance, by restricting yourself to the
two popular byte orders you're portable to practically any hosted
implementation in current use.
[...]

Agreed. It's best to write code that doesn't care about byte order,
but of course that's not always possible.

Once you've made a decision to limit the portability of your program,
it's often a good idea to have an explicit check that your assumptions
aren't violated. It's always possible that someone might try to
compile and run your code on some exotic system with middle-endian
byte order or 13-bit bytes.

If you can check your assumptions at compile time, you can prevent the
program from compiling in the first place; for example:

#include <limits.h>

#if CHAR_BIT != 8
#error "This program is supported only on systems with 8-bit bytes."
#endif

... Code that assumes 8-bit bytes ...

If that's not possible, you can include a function that aborts the
program if your conditions are not met, and call it once at program
startup.

(Of course, you also have to decide whether this is worth the effort.)
 

Dan Pop

Keith Thompson said:
(e-mail address removed) (Dan Pop) writes:
[...]
If absolute portability comes at a cost, it's worth evaluating this cost
before choosing between a realistically portable solution and an
absolutely portable one. And this is usually the case when dealing with
byte order issues: if accommodating *any* byte order is too expensive in
terms of code complexity/performance, by restricting yourself to the
two popular byte orders you're portable to practically any hosted
implementation in current use.
[...]

Agreed. It's best to write code that doesn't care about byte order,
but of course that's not always possible.

Once you've made a decision to limit the portability of your program,
it's often a good idea to have an explicit check that your assumptions
aren't violated. It's always possible that someone might try to
compile and run your code on some exotic system with middle-endian
byte order or 13-bit bytes.

If you can check your assumptions at compile time, you can prevent the
program from compiling in the first place; for example:

#include <limits.h>

#if CHAR_BIT != 8
#error "This program is supported only on systems with 8-bit bytes."
#endif

... Code that assumes 8-bit bytes ...

If that's not possible, you can include a function that aborts the
program if your conditions are not met, and call it once at program
startup.

(Of course, you also have to decide whether this is worth the effort.)

And regardless of what you decide, you also have to document the fact
that the code relies on a particular set of assumptions. That way, a
user can be spared the trouble of building and executing the program
only to discover (one way or another) that it doesn't work on their
implementation.

There are still people who read the quick start documentation *before*
trying anything else ;-)

Dan
 

Dan Pop

In said:
At least two possible options, including:

#include <stddef.h>

or

#define NULL ((void *)0)

But, since he didn't use either of these, including <stdio.h> *was needed*.

Dan
 
