Making Fatal Hidden Assumptions

Keith Thompson · Mar 14, 2006

Rod Pemberton said:
BTW, I've heard one of the Pascal standards added pointers...

<OT>Pascal has always had pointers.</OT>

Keith Thompson · Mar 14, 2006

Ed Prochak said:
not if you can live with a WARNING message.

Assigning an integer value to a pointer object, or vice versa, without
an explicit conversion (cast operator) is a constraint violation. The
standard requires a diagnostic; it doesn't distinguish between
warnings and error messages. Once the diagnostic has been issued, the
compiler is free to reject the program. If the compiler chooses to
generate an executable anyway, the behavior is undefined (unless the
implementation specifically documents it).

An assignment without a cast, assuming the compiler accepts it (after
the required diagnostic) isn't even required to behave the same way as
the corresponding assignment with a cast -- though it's likely to do
so in real life.

C compilers commonly don't reject programs that violate this
particular constraint, because it's a common construct in pre-standard
C, but that's an attribute of the compiler not of the language as it's
now defined.
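
A minimal sketch of the distinction, with made-up function names. The commented-out lines are the constraint violations that require a diagnostic; the versions with casts are accepted, though the result of the conversion is implementation-defined. The sketch assumes the implementation provides the optional type uintptr_t, for which a round trip through void * is guaranteed to compare equal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pointer to integer, with an explicit cast. */
uintptr_t pointer_to_integer(void *p)
{
    /* uintptr_t n = p;    no cast: constraint violation, diagnostic required */
    return (uintptr_t)p;   /* cast: accepted, value is implementation-defined */
}

/* Hypothetical helper: integer to pointer, with an explicit cast. */
void *integer_to_pointer(uintptr_t n)
{
    /* void *p = n;        no cast: constraint violation, diagnostic required */
    return (void *)n;
}

/* A valid void * converted to uintptr_t and back compares equal to the
   original pointer. */
void check_round_trip(void)
{
    int object = 0;
    void *original = &object;
    assert(integer_to_pointer(pointer_to_integer(original)) == original);
}
```

Whether a given compiler reports the uncommented forms as a "warning" or an "error" is up to the compiler; the standard only requires that some diagnostic appear.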

[...]

Steve O'Hara-Smith · Mar 14, 2006

I thought those died out. Were any of those CPUs actually used in a computer
advanced enough to compile C?

Texas Instruments had a range of machines based on a bitslice
design that became the basis of the TMS9900 processor design. I don't
recall if there was a C compiler for it but it certainly could have
supported one easily.

Andrew Reilly · Mar 14, 2006

But other HLL's don't even have register storage.

I know that it is just a suggestion. The point is: why was it included
in the language at all? Initially it gave the programmer more control.

Anecdote:

About ten years ago I did a project involving an AT&T/WE DSP32C processor
that had a very original-feeling AT&T K&R C compiler. This compiler did
essentially no "optimization", that I could see. It didn't even do
automatic register spill or fill (other than saves and restores at
subroutine entry and exit, of course): normal "auto" local variables
existed entirely in the stack frame, and had to be accessed from there on
every use, and "register" local variables existed entirely in registers:
specify too many in any context and the code wouldn't compile.

A very different (and somewhat more laborious) experience than
programming with a modern compiler of, say, gcc vintage, but it was
actually pretty easy to get quite efficient code this way. That compiler
really was very much like a macro assembler with expression parsing.

[The C code that resulted was very much DSP32C-specific C code.
That's why a "universal assembler" would want a more abstract notion of
register variables that corresponds quite closely to that of modern C.]

Cheers,

Chris Torek · Mar 15, 2006

Prochak said:

[C] doesn't impose strict data type checking, especially between
integers and pointers.

not if you can live with a WARNING message.

Funny, I get an "error" message:

% cat t.c
void *f(void) {
return 42;
}
% strictcc -O -c t.c
t.c: In function `f':
t.c:2: error: return makes pointer from integer without a cast
%

Is my compiler not a "C compiler"?

(I really do have a "strictcc" command, too. Some might regard it
as cheating, but it works.)

(Real C compilers really do differ as to which diagnostics are
"warnings" and which are "errors". In comp.lang.c in the last week
or two, we have seen some that "error out" on:

int *p;
float x;
...
p = x;

and some that accept it with a "warning".)

CBFalconer · Mar 15, 2006

Keith said:
<OT>Pascal has always had pointers.

But not uncontrolled pointers, except in abortive quasi-Pascals
such as Borland/Delphi.
</OT>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

Keith Thompson · Mar 15, 2006

CBFalconer said:
I think there is something specific, but even if not, it is
certainly implied. For example, if p points to the last item in an
array, p++ is valid. p-- will then produce a valid dereferenceable
pointer, while p = NULL; p-- will not.

No, that doesn't follow.

This:
p = NULL;
p --;
normally invokes undefined behavior, but only because the behavior
isn't defined. As far as I can tell, there's actually no explicit
statement in the standard that arithmetic on a null pointer invokes
undefined behavior.

But assume that there exists an array
char foo[15];
such that foo+15==NULL:
char *p = foo + 15; /* p == NULL */
p --;

In this case, the behavior of "p --;" is defined by C99 6.5.6p8:

... and if the expression Q points one past the last element of an
array object, the expression (Q)-1 points to the last element of
the array object.

Allowing a pointer just past the end of an array to be equal to NULL
would be inconvenient (which is why I think the standard should be
corrected), but as far as I can tell it wouldn't actually violate the
standard.
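
The part of this that is defined can be sketched as follows (the function names are made up for illustration). Forming the one-past-the-end pointer and stepping back from it is covered by the quoted 6.5.6p8 wording; the sketch says nothing about whether that pointer may compare equal to NULL, which is the open question.

```c
#include <assert.h>

/* Hypothetical helper: reaches the last element of a 15-char array by
   stepping back from the one-past-the-end pointer. */
char *last_via_one_past_end(char (*array)[15])
{
    char *p = *array + 15;  /* one past the end: may be formed, not dereferenced */
    p--;                    /* defined: p now points to (*array)[14] */
    return p;
}

void check_last_element(void)
{
    char foo[15] = {0};
    foo[14] = 'z';
    assert(last_via_one_past_end(&foo) == &foo[14]);
    assert(*last_via_one_past_end(&foo) == 'z');
}
```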

CBFalconer · Mar 15, 2006

Chris said:
.... snip ...

(I really do have a "strictcc" command, too. Some might regard
it as cheating, but it works.)

Not me (regard it as cheating). I also have the equivalent,
through an alias. I call it cc. The direct access is called gcc.
The alias causes "cc <nothing>" to be translated into gcc --help.

Dik T. Winter · Mar 15, 2006

> I guess I'm getting forgetful in my old age (haven't touched PASCAL in
> over 10 years). I thought PASCAL defined fixed ranges for the datatypes
> like integers. I guess I didn't port enough PASCAL applications to see
> the difference. (I could have sworn you'd get an error on X=32767+2;)
In the original version of Pascal that was certainly *not* an error.
The range of integers was from -576460752303423487 to +576460752303423487,
although values in absolute value greater than 281474976710655 were
unreliable in some calculations. Do you know where the original limit
of set sizes to 60 elements comes from?

>
> Yes PASCAL and P-code, you have a point there, but I'm not sure it is
> in your favor.

*Not* P-code. The Pascal version I refer to predates P-code quite a bit.
P-code entered the picture when the language was ported to other
processors. The original compiler generated direct machine
instructions (and that could still be done on other architectures).

> C->native assembler->program on native hardware
> Pascal->program in p-code-> runs in p-code interpreter

In the original version of Pascal it was:
Pascal->program on native hardware
without an intermediate assembler or an interpreter.

>
> The point is why even include this feature?

It was included at a point in time when optimisation by some compilers
was not as good as you would wish. In a similar way the original version
of Fortran had a statement with which you could tell the compiler what
the probability was that a branch would be taken or not.

>
> So the PASCAL compiler was more advanced than the C compiler of the
> time. Do you think PASCAL being a more abstract HLL than C might have
> had an effect here? (More likely, though, it was that PASCAL predated
> C, at least in widespread use.)

No, it was because the machine that compiler was running on was quite a
bit larger and faster than the machines C compilers tended to run on.

Dik T. Winter · Mar 15, 2006

> Dik T. Winter wrote: ....
>
> Well you cannot, but those processors did not even exist when C was
> created. So those features didn't make it. To some degree, C is more of
> a PDP assembler.

How do you get access to the condition bits?

Mark F. Haigh · Mar 15, 2006

Keith said:
Ed Prochak said:

Keith Thompson wrote:

[...]

There's a continuum from raw machine language to very high-level
languages. Macro assembler is only a very small step up from
non-macro assembler. C is a *much* bigger step up from that. Some C
constructs may happen to map to single instructions for *some*
compiler/CPU combinations; they might map to multiple instructions, or
even none, for others. An assignment statement might copy a single
scalar value (integer, floating-point, or pointer) -- or it might copy
an entire structure; the C code looks the same, but the machine code
is radically different.
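
A small sketch of that last point, using a made-up struct and function names: both functions below contain the same-looking assignment, but one copies a single int while the other copies a few hundred bytes. What machine code each one becomes depends on the compiler and CPU.

```c
#include <assert.h>

/* Hypothetical aggregate, chosen only to be much larger than a scalar. */
struct sample {
    int values[64];
    double scale;
};

/* Copies one scalar: typically a load and a store. */
void copy_scalar(int *dst, const int *src)
{
    *dst = *src;
}

/* Same syntax, but copies the whole structure, array member included. */
void copy_struct(struct sample *dst, const struct sample *src)
{
    *dst = *src;
}

void check_copies(void)
{
    int a = 42, b = 0;
    struct sample s = { {0}, 2.5 };
    struct sample t = { {0}, 0.0 };

    copy_scalar(&b, &a);
    assert(b == 42);

    s.values[63] = 7;
    copy_struct(&t, &s);
    assert(t.values[63] == 7 && t.scale == 2.5);
}
```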

Using entirely arbitrary units of high-level-ness, I'd call machine
language close to 0, assembly language 10, macro assembler 15, and C
about 50. It might be useful to have something around 35 or so.
(This is, of course, mostly meaningless.)

That's really interesting, because I have pondered the question and
answered it in nearly the same way. However, the conclusion I came to
is that C occupies a relatively large range in the continuum, not a
single point.

C dialects that accept inline assembly push the lower bound much lower
than it would otherwise be. Likewise, the wealth of freely-available C
libraries push the upper bound much further up. You can run a
real-world system written in C all the way from the top to the bottom--
from the GUI (GNOME, for example), to the kernel; from the compiler to
the shells and interpreters.

As the continuum of C expands, the C standard (ideally) acts to reclaim
and make (semi-) portable chunks of this continuum. Look at
'restrict', for example. It enables portable annotation of aliasing
assumptions so that encroachment of the lower bounds is not necessary
for better efficiency. Is it perfect? No. Does it need to be? No.
Is it workable? Yes.
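
For readers who have not met 'restrict', here is a minimal C99 sketch with a made-up function name. The qualifier is a promise from the programmer that dst and src do not overlap, which leaves the compiler free to reorder or vectorize the loop; calling the function with overlapping arrays would break that promise and the behavior would be undefined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: writes src[i] * factor into dst[i] for i < n.
   The caller guarantees that the two arrays do not overlap. */
void scale_into(float *restrict dst, const float *restrict src,
                size_t n, float factor)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * factor;
}

void check_scale_into(void)
{
    const float src[3] = { 1.0f, 2.0f, 3.0f };
    float dst[3] = { 0.0f, 0.0f, 0.0f };

    scale_into(dst, src, 3, 2.0f);
    assert(dst[0] == 2.0f && dst[1] == 4.0f && dst[2] == 6.0f);
}
```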

Assembly language is usually untyped; types are specified by which
instruction you use, not by the types of the operands. C, by
contrast, associates types with variables. It often figures out how
to implement an operation based on the types of its operands, and many
operations are disallowed (assigning a floating-point value to a
pointer, for example).

I know the old joke that C combines the power of assembly language
with the flexibility of assembly language. I even think it's funny.
But it's not realistic, at least for C programmers who care about
writing good portable code.

I've also pondered that joke on many occasions. Each time I see it, I
think it's more and more of a compliment. But if it added that C is
usually more efficient than non-deity-level assembly, and that
well-written C is nearly pathologically portable, it wouldn't really be
a joke, would it?

In some respects, C is like English: overly susceptible to insane
degrees of corruption, but all the same, nearly universally understood,
for better or for worse.

Mark F. Haigh
(e-mail address removed)

CBFalconer · Mar 15, 2006

Dik T. Winter said:
How do you get access to the condition bits?

With the usual gay abandon about extensions, you might define a
variable in system space, say _ccd, to hold those bits. You
specify the conditions under which it is valid, such as immediately
after an expression with precisely two operands, preserved by use
of the comma operator. Then:

a = b + c, ccd = _ccd;

allows you to detect overflow and other evil things. A similar
thing such as _high could allow capturing all bits from a
multiplication. i.e.:

a = b * c, ccd = _ccd, ov = _high;

tells you all about the operation without data loss.

Just blue skying here.

Keith Thompson · Mar 15, 2006

Mark F. Haigh said:
Keith said:

Ed Prochak said:

Keith Thompson wrote:

[...]

There's a continuum from raw machine language to very high-level
languages. Macro assembler is only a very small step up from
non-macro assembler. C is a *much* bigger step up from that. Some C
constructs may happen to map to single instructions for *some*
compiler/CPU combinations; they might map to multiple instructions, or
even none, for others. An assignment statement might copy a single
scalar value (integer, floating-point, or pointer) -- or it might copy
an entire structure; the C code looks the same, but the machine code
is radically different.

Using entirely arbitrary units of high-level-ness, I'd call machine
language close to 0, assembly language 10, macro assembler 15, and C
about 50. It might be useful to have something around 35 or so.
(This is, of course, mostly meaningless.)

That's really interesting, because I have pondered the question and
answered it in nearly the same way. However, the conclusion I came to
is that C occupies a relatively large range in the continuum, not a
single point.

C dialects that accept inline assembly push the lower bound much lower
than it would otherwise be. Likewise, the wealth of freely-available C
libraries push the upper bound much further up. You can run a
real-world system written in C all the way from the top to the bottom--
from the GUI (GNOME, for example), to the kernel; from the compiler to
the shells and interpreters.

Standard C doesn't accept inline assembly. If some particular
compiler does so, it's an extension -- and you might as well think of
it as an assembler that also accepts C syntax.

As far as libraries are concerned, most of them aren't part of
standard C either -- and they needn't be either implemented in, or
called from, C.

Mark F. Haigh · Mar 15, 2006

Keith said:
Mark F. Haigh said:

Keith said:

Keith Thompson wrote:
[...]

Standard C doesn't accept inline assembly. If some particular
compiler does so, it's an extension -- and you might as well think of
it as an assembler that also accepts C syntax.

Thank you, Captain Pedantic. Is that response to me or the masses of
corruptible youths eagerly waiting to sprinkle their code with inline
assembly?

If the former, then my point was that in the realm of C dialects,
Standard C (ideally) occupies a slowly expanding middle ground. If the
latter, then kids, listen to Keith, he's right.

As far as libraries are concerned, most of them aren't part of
standard C either -- and they needn't be either implemented in, or
called from, C.

Again, true, but oblique to the point that it is possible to run a
fully functional _modern_ system using only code written in C dialects.
I don't believe any other language "family" can boast this (for
currently available hardware).

Mark F. Haigh
(e-mail address removed)

Keith Thompson · Mar 15, 2006

Mark F. Haigh said:
Thank you, Captain Pedantic. Is that response to me or the masses of
corruptible youths eagerly waiting to sprinkle their code with inline
assembly?

Was that intended as an insult?

This discussion is cross-posted to three different newsgroups (and
it's been interesting in spite of that). I'm reading and writing this
in comp.lang.c, where we discuss the C programming language as defined
by the standards. The standard specifically allows extensions, and we
sometimes discuss the permitted nature of those extensions, but
details about particular extensions (such as inline assembly) are
considered off-topic.

My comments were intended for anyone who cares to read them.

If the former, then my point was that in the realm of C dialects,
Standard C (ideally) occupies a slowly expanding middle ground. If the
latter, then kids, listen to Keith, he's right.

Again, true, but oblique to the point that it is possible to run a
fully functional _modern_ system using only code written in C dialects.
I don't believe any other language "family" can boast this (for
currently available hardware).

I have no argument with that point. I was merely making a different
point.

S.Tobias · Mar 16, 2006

Seems my second "of course" is not quite so. I wrote that because
I had always imagined that after an array there *is* something (albeit
possibly inaccessible) that the past-the-end pointer points to.
Nice pictures are sometimes dangerous.

Interesting. I think you may have found a small flaw in the standard.

I intended my first sentence as a joke, actually. I could only take the
credit for finding anything if saying something stupid could count as
a discovery, too.

Since you've started a new discussion in c.s.c, I left some of my
thoughts there, too.

Mark F. Haigh · Mar 16, 2006

Keith said:
Was that intended as an insult?

Heh, it's just tongue-in-cheek; a little jab. All in good fun.

This discussion is cross-posted to three different newsgroups (and
it's been interesting in spite of that). I'm reading and writing this
in comp.lang.c, where we discuss the C programming language as defined
by the standards. The standard specifically allows extensions, and we
sometimes discuss the permitted nature of those extensions, but
details about particular extensions (such as inline assembly) are
considered off-topic.

True, but the topic was really a meta-discussion about where C dialects
fit into the continuum of languages, and where Standard C fits into
the continuum of C dialects. It's topical enough for me, and I was
hoping to spur some genuine discussion.

<snip>

Mark F. Haigh
(e-mail address removed)

Chris Torek · Mar 16, 2006

Dik T. Winter said:
How do you get access [from C] to the [hardware's] condition bits?

With the usual gay abandon about extensions, you might define a
variable in system space, say _ccd, to hold those bits. You
specify the conditions under which it is valid, such as immediately
after an expression with precisely two operands, preserved by use
of the comma operator. Then:

a = b + c, ccd = _ccd;

allows you to detect overflow and other evil things.

This turns out not to work very well in Real Compilers. The reasons
are outlined rather nicely in the GCC documentation:

It is a natural idea to look for a way to give access to the condition
code left by the assembler instruction. However, when we attempted to
implement this, we found no way to make it work reliably. The problem
is that output operands might need reloading, which would result in
additional following "store" instructions. On most machines, these
instructions would alter the condition code before there was time to
test it. This problem doesn't arise for ordinary "test" and "compare"
instructions because they don't have any output operands.

For reasons similar to those described above, it is not possible to
give an assembler instruction access to the condition code left by
previous instructions.

(from <http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Extended-Asm.html>).

(You *can* actually capture the condition codes, by doing the
operation itself *and* the condition-code-capture all in one
single inline __asm__, so that no reloading occurs between the
two parts. Getting it right is still fairly tricky. In many
cases you are better off just writing the desired routines in
assembly, and calling them as ordinary functions from the C code.)
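
When all that is wanted is the overflow flag or the high half of a product, there is a different route that never touches the condition codes at all: ask the question in standard C before or around the operation. This is a swapped-in technique, not the inline-asm approach described above, and the function names are made up for illustration. The second function assumes the implementation provides the exact-width types uint32_t and uint64_t.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: stores b + c in *sum and returns 0, or returns 1
   (leaving *sum untouched) if the sum would not fit in an int. The check
   happens before the addition, so no signed overflow ever occurs. */
int checked_add(int b, int c, int *sum)
{
    if ((c > 0 && b > INT_MAX - c) || (c < 0 && b < INT_MIN - c))
        return 1;
    *sum = b + c;
    return 0;
}

/* Hypothetical helper: full product of two 32-bit values. Returns the
   high 32 bits (the role played by _high earlier in the thread) and
   stores the low 32 bits in *low. */
uint32_t multiply_high(uint32_t b, uint32_t c, uint32_t *low)
{
    uint64_t product = (uint64_t)b * c;
    *low = (uint32_t)product;
    return (uint32_t)(product >> 32);
}

void check_arithmetic(void)
{
    int sum = 0;
    uint32_t low = 0;

    assert(checked_add(2, 3, &sum) == 0 && sum == 5);
    assert(checked_add(INT_MAX, 1, &sum) == 1);
    assert(checked_add(INT_MIN, -1, &sum) == 1);
    assert(multiply_high(0xFFFFFFFFu, 2u, &low) == 1u && low == 0xFFFFFFFEu);
}
```

This costs a few extra comparisons compared with reading a hardware flag, but it behaves the same on every conforming implementation.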

Paul Keinanen · Mar 16, 2006

Dik T. Winter said:
How do you get access [from C] to the [hardware's] condition bits?

With the usual gay abandon about extensions, you might define a
variable in system space, say _ccd, to hold those bits. You
specify the conditions under which it is valid, such as immediately
after an expression with precisely two operands, preserved by use
of the comma operator. Then:

a = b + c, ccd = _ccd;

allows you to detect overflow and other evil things.

This turns out not to work very well in Real Compilers. The reasons
are outlined rather nicely in the GCC documentation:

It is a natural idea to look for a way to give access to the condition
code left by the assembler instruction. However, when we attempted to
implement this, we found no way to make it work reliably. The problem
is that output operands might need reloading, which would result in
additional following "store" instructions. On most machines, these
instructions would alter the condition code before there was time to
test it. This problem doesn't arise for ordinary "test" and "compare"
instructions because they don't have any output operands.

In many architectures, there is at least one way around this problem.
When an interrupt occurs, at least the processor status word
containing the condition codes and the return address are
automatically pushed on the stack (and possibly some other registers).
This usually applies also to software generated traps. The ISR can
then copy the condition codes from the stack frame to a safer place.

There are of course some problems: this would consume one trap
instruction, perhaps shareable with some debugger traps. With separate
user and kernel spaces, a trap will usually switch to kernel mode,
which would require a trap handler to be installed into kernel space
and would generate quite a lot of instructions due to mode switching
and safe copying between modes.

However, in single address space systems, such as most embedded
systems, this should be quite usable.

Paul

CBFalconer · Mar 16, 2006

Chris said:
CBFalconer said:

Dik T. Winter said:

How do you get access [from C] to the [hardware's] condition bits?

With the usual gay abandon about extensions, you might define a
variable in system space, say _ccd, to hold those bits. You
specify the conditions under which it is valid, such as immediately
after an expression with precisely two operands, preserved by use
of the comma operator. Then:

a = b + c, ccd = _ccd;

allows you to detect overflow and other evil things.

This turns out not to work very well in Real Compilers. The reasons
are outlined rather nicely in the GCC documentation:

It is a natural idea to look for a way to give access to the condition
code left by the assembler instruction. However, when we attempted to
implement this, we found no way to make it work reliably. The problem
is that output operands might need reloading, which would result in
additional following "store" instructions. On most machines, these
instructions would alter the condition code before there was time to
test it. This problem doesn't arise for ordinary "test" and "compare"
instructions because they don't have any output operands.

For reasons similar to those described above, it is not possible to
give an assembler instruction access to the condition code left by
previous instructions.

(from <http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/Extended-Asm.html>).

That's why I specified a limitation to a comma operator separated
statement, or some such. This would give an opportunity to use the
equivalent of "push psw" in the generated code, after which the
storage can be assigned and filled. By limiting to some specific
format we can avoid excessive overhead elsewhere. My mechanism is
hardly well thought out, it is only intended to trigger thoughts
from others. As I said, blue skying.
