Making Fatal Hidden Assumptions

Andrew Reilly · Mar 25, 2006

It simply doesn't make sense to do things that way since the only purpose is
to allow violations of the processor's memory protection model. Work with
the model, not against it.

Why force the AS/400's restrictive memory model on coders for all other
architectures? The "use the subset that's known to work anywhere"
argument is no kind of absolute: it's a moveable trade-off. Some systems
are just so restricted or so different, and represent such a small portion
of the market that it doesn't make sense to accommodate them: they need to
accommodate the wider community, if they want to join in that particular
game. Now, obviously, the AS/400 memory model managed to sneak its nose
into the C standards tent one day when the standards body was feeling
particularly inclusive, and now we're all saddled with it (to mix a few
metaphors).

Andrew Reilly · Mar 25, 2006

I asked in comp.std.c whether the AS/400 actually influenced the C
standard. Here's a reply from P.J. Plauger:

] AS/400 might have been mentioned. Several of us had direct experience
] with the Intel 286/386 however, and its penchant for checking anything
] you loaded into a pointer register. IIRC, that was the major example
] put forth for disallowing the generation, or even the copying, of an
] invalid pointer.

I don't understand this argument. The 286/386 doesn't even *have* pointer
registers, as such. It has segment descriptors, which can be used to make
things complicated, if you want to, but when you use a 286 as the 16-bit
machine that it is, then there is no issue here at all. Similarly, the
386 can be used as a perfectly reasonable "C machine", and generally is,
these days. It only gets curly when you try to synthesize an extended
address range out of it. Unfortunately, the dominant compiler and platform
made a hash of that, rather than putting in the effort to make it work in
a (more) reasonable way.

Since that particular platform is (thankfully) falling into obsolescence,
can't we start to consider tidying up the standard, to allow more
traditional, idiomatic, symmetrical coding styles? Restore
pointer-oriented algorithm expressions to their place of idempotic
symmetry with index-oriented expressions? Please?
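A minimal sketch of the symmetry in question, using hypothetical function names: both functions sum an array from the last element to the first, one index-oriented and one pointer-oriented. Under the current standard the pointer version stays well-defined only because it starts at one past the end and decrements before each dereference; the more natural-looking for (p = a + n - 1; p >= a; p--) finishes by computing a - 1, which is undefined behavior.

```c
#include <stddef.h>

/* Hypothetical example: sum a[0..n-1] from last to first, index-oriented.
   "i-- > 0" tests first and then decrements, so i never drops below 0
   inside the body, and the loop also handles n == 0. */
long sum_reverse_index(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = n; i-- > 0; )
        total += a[i];
    return total;
}

/* The same walk, pointer-oriented.  p starts at one past the end
   (a + n, which the standard allows to be formed) and is decremented
   before each dereference, so a - 1 is never computed. */
long sum_reverse_pointer(const int *a, size_t n)
{
    long total = 0;
    const int *p = a + n;
    while (p > a)
        total += *--p;
    return total;
}
```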

Chris Torek · Mar 25, 2006

I don't understand this argument. The 286/386 doesn't even *have* pointer
registers, as such. It has segment descriptors, which can be used to make
things complicated, if you want to, but when you use a 286 as the 16-bit
machine that it is, then there is no issue here at all.

It has a 20-bit architecture, and people did (and still do) use it
that way.

Since that particular platform is (thankfully) falling into obsolescence,
can't we start to consider tidying up the standard, to allow more
traditional, idiomatic, symmetrical coding styles?

And now the x86-64 is coming, and everything old will be new again.

Keith Thompson · Mar 25, 2006

Andrew Reilly said:
I asked in comp.std.c whether the AS/400 actually influenced the C
standard. Here's a reply from P.J. Plauger:

] AS/400 might have been mentioned. Several of us had direct experience
] with the Intel 286/386 however, and its penchant for checking anything
] you loaded into a pointer register. IIRC, that was the major example
] put forth for disallowing the generation, or even the copying, of an
] invalid pointer.

I don't understand this argument. The 286/386 doesn't even *have* pointer
registers, as such. It has segment descriptors, which can be used to make
things complicated, if you want to, but when you use a 286 as the 16-bit
machine that it is, then there is no issue here at all. Similarly, the
386 can be used as a perfectly reasonable "C machine", and generally is,
these days. It only gets curly when you try to synthesize an extended
address range out of it. Unfortunately, the dominant compiler and platform
made a hash of that, rather than putting in the effort to make it work in
a (more) reasonable way.

I don't know enough about the 286/386 architecture(s) to offer any
meaningful commentary on this. Possibly some committee members
thought that future architectures might take some ideas from the
286/386 and extend them.

Since that particular platform is (thankfully) falling into obsolescence,
can't we start to consider tidying up the standard, to allow more
traditional, idiomatic, symmetrical coding styles? Restore
pointer-oriented algorithm expressions to their place of idempotic
symmetry with index-oriented expressions? Please?

The only way that's going to happen is if somebody (1) comes up with a
specification and (2) pushes it through the committee. Advocating it
in comp.lang.c won't get it done.

Step 1 means, for each pointer operation, either specifying its
semantics, or stating that the behavior is either
implementation-defined, unspecified, or undefined. Once you get into
the details, you can expect a lot of arguments, such as people
pointing out that the suggested required semantics won't necessarily
work on some real-world system(s).

Step 2 is left as an exercise.

Or you can create your own language, or you can limit your development
to implementations that you *know* meet your requirements (which go
beyond the requirements of the current standard).

Keith Thompson · Mar 26, 2006

Chris Torek said:
It has a 20-bit architecture, and people did (and still do) use it
that way.

And now the x86-64 is coming, and everything old will be new again.

As far as I can tell, the x86-64 uses (or at least is capable of
using) a flat 64-bit address space.

Andrew Reilly · Mar 26, 2006

It has a 20-bit architecture, and people did (and still do) use it
that way.

It's vaguely plausible to call the VM86 (real-mode) x86 arch 20-bit, but
it's a stretch, as no processor-visible registers and no ALU ops are
20 bits wide. It's 16-bit in the same sense that the later PDP-11s with
various memory extension schemes were 16-bit. It still gets used, to some
extent, because it's the boot environment of PCs.

The 286 could plausibly be called a 24-bit segmented machine, and shares
much of its memory model with its IBM FS, System/38 (which grew up to be
AS/400) and Intel 432 antecedents. A nice protected architecture for
Pascal, PL/1, COBOL, and other HLLs of the age. You certainly couldn't
call it a "C machine" other than when used within its 16-bit, flat memory
model (small) modes. Everything else required language extensions ("near"
and "far" pointers), and any pointer misbehaviour sanctioned by the
standard and by the implementations could reasonably be said to be limited
to those extensions, anyway. The fact that as much mileage was had out of
C in that environment is a testament to the industry's determination and
enthusiasm. When compilation was done so that non-standard pointer
extensions weren't required in the source, then it should have been the
system run-time that gave ground, rather than the standard. I doubt very
much that any new development work is being done in 286 protected mode,
anywhere.

And now the x86-64 is coming, and everything old will be new again.

The x86-64 is a lovely architecture for a C machine. Specifically, it has
jettisoned much of the segmentation machinery. All 64 bits' worth of address
space can be loaded into any "pointer" register, and manipulated with the
full complement of integer and logical operations (because the pointer
registers are also the integer ALU registers), and the only time you can
ever get a peep out of a trap handler is if you try to actually
access memory at an address not mapped into the process's address space.

CBFalconer · Mar 26, 2006

Keith said:
.... snip ...

The only way that's going to happen is if somebody (1) comes up
with a specification and (2) pushes it through the committee.
Advocating it in comp.lang.c won't get it done.

Step 1 means, for each pointer operation, either specifying its
semantics, or stating that the behavior is either
implementation-defined, unspecified, or undefined. Once you get
into the details, you can expect a lot of arguments, such as
people pointing out that the suggested required semantics won't
necessarily work on some real-world system(s).

Step 2 is left as an exercise.

Or you can create your own language, or you can limit your
development to implementations that you *know* meet your
requirements (which go beyond the requirements of the current
standard).

We already have an ugly example of this process, in C# and the
entire .NET hoax, from people with more influence (and money) than
Mr Reilly.

--
Some informative links:
http://www.geocities.com/nnqweb/
http://www.catb.org/~esr/faqs/smart-questions.html
http://www.caliburn.nl/topposting.html
http://www.netmeister.org/news/learn2quote.html

Paul Keinanen · Mar 26, 2006

Considering that s is probably already in an address register, doing the
manipulation your way would require transferring it to an integer register,
doing the decrement, then doing the increment, then transferring it back to
an address register when it's needed for dereferencing. Why do that when
you can adjust the address register directly?

We have just been discussing, in dozens of messages, that this would
trap on the AS/400 and that the trap could not be ignored.

By doing the calculations in integer registers this problem can be
avoided. Going this route would only be necessary when such
problematic expressions exist in the source code, not always.

Paul
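Keinanen's idea can be approximated at the source level. This sketch assumes the optional type uintptr_t is available, and the function name is hypothetical. The standard guarantees only that a valid pointer converted to uintptr_t (via void *) and back compares equal to the original, so here the intermediate value "s - 1" exists only as an integer and no invalid pointer is ever formed. How integer arithmetic on such values maps to addresses is implementation-defined, so this only demonstrates the round trip, not general address arithmetic.

```c
#include <stdint.h>

/* Hypothetical helper: decrement and then re-increment s without ever
   forming the pointer s - 1.  Requires the optional type uintptr_t. */
char *dec_then_inc(char *s)
{
    uintptr_t u = (uintptr_t)(void *)s;  /* pointer -> integer */
    u -= 1;                              /* "s - 1", but only as an integer */
    u += 1;                              /* back to the original value */
    return (char *)(void *)u;            /* same integer, so compares equal to s */
}
```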

Richard G. Riley · Mar 26, 2006

As far as I can tell, the x86-64 uses (or at least is capable of
using) a flat 64-bit address space.

Your caveat covers you. It can have a flat address space, but also has
its legacy "hw mode" allowing 16 & 32 bit stuff to see the relevant
addressing space.

Stephen Sprunk · Mar 26, 2006

Jordan Abel said:
Because it's a stupid memory protection model.

Why can't the trap be caught and ignored?

It can't be ignored because (apparently) the AS/400 and similar machines
only do permission checks on pointer formation. Once the pointer is formed,
accesses do not need permission checks. If you were able to ignore the trap
on formation, that would mean all pointer accesses would be exempt from the
security model.

Personally, I'd rather have my processor trap when an invalid pointer is
formed, since in my code such an occurrence is _always_ a bug. Waiting
until the pointer is dereferenced makes it significantly harder to debug.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Aaron Sorkin
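To make the "trap on formation" model concrete, here is a software sketch built around a hypothetical checked_ptr type that carries its object's bounds. Forming any offset from 0 to len (len being one past the end) succeeds; anything else is rejected immediately, roughly what the thread describes the AS/400 doing in hardware. It illustrates the policy only, not how any real AS/400 compiler represents pointers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pointer that is validated when it is formed. */
struct checked_ptr {
    char  *base;   /* start of the object */
    size_t len;    /* number of elements in the object */
    size_t off;    /* current position, always within 0..len */
};

/* Try to move p by delta.  On success update p and return true; if the
   result would fall outside 0..len, leave p untouched and return false,
   the software analogue of trapping at formation time. */
bool checked_add(struct checked_ptr *p, ptrdiff_t delta)
{
    if (delta < 0) {
        size_t back = (size_t)-(delta + 1) + 1;  /* |delta| without overflow */
        if (back > p->off)
            return false;
        p->off -= back;
    } else {
        if ((size_t)delta > p->len - p->off)
            return false;
        p->off += (size_t)delta;
    }
    return true;
}

/* Dereferencing is allowed only strictly inside the object; the
   one-past-the-end position may be formed but not accessed. */
char *checked_deref(const struct checked_ptr *p)
{
    return p->off < p->len ? p->base + p->off : NULL;
}
```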

Keith Thompson · Mar 26, 2006

Keith Thompson said:
As far as I can tell, the x86-64 uses (or at least is capable of
using) a flat 64-bit address space.

The piece I missed is that an x86-64 system can run 32-bit code. If I
compile and run a program on an x86-64 system, it uses 64-bit
pointers. If I compile a program on an x86-32 system and copy the
executable to an x86-64 system, it runs properly and uses 32-bit
pointers. (At least on the systems I have access to.)
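A quick way to see which pointer width a particular build uses is a sketch like the following (the function name is hypothetical). On x86-64 Linux with GCC, compiling the same file with -m64 and with -m32 (when the 32-bit libraries are installed) typically reports 64 and 32 respectively.

```c
#include <limits.h>
#include <stddef.h>

/* Width of an object pointer, in bits, for the current compilation. */
size_t pointer_bits(void)
{
    return sizeof(void *) * CHAR_BIT;
}
```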

Chris Torek · Mar 26, 2006

The piece I missed is that an x86-64 system can run 32-bit code. If I
compile and run a program on an x86-64 system, it uses 64-bit
pointers. If I compile a program on an x86-32 system and copy the
executable to an x86-64 system, it runs properly and uses 32-bit
pointers. (At least on the systems I have access to.)

Yes. I am not saying that x86-64 has re-created the old 80x86
segmentation model. No, this is merely the thin end of the wedge.
Segmentation will come back, sooner or later.

Keith Thompson · Mar 26, 2006

Chris Torek said:
Yes. I am not saying that x86-64 has re-created the old 80x86
segmentation model. No, this is merely the thin end of the wedge.
Segmentation will come back, sooner or later.

Why does it need to?

If we restrict the discussion to hosted environments, the trend seems
to be toward 64-bit systems. That provides an address space that
should be big enough for at least several decades, even assuming
exponential growth in memory sizes. A flat 64-bit virtual address
space should be the simplest way to manage this, and the need to run
32-bit code should diminish over time.

Segmentation done right could be useful for bounds checking; assigning
a segment to each malloc()ed chunk of memory, and for each declared
object, could nearly eliminate buffer overruns. But it hasn't really
been done yet, and I'm not convinced it will be in the future.

Why do you think segmentation will come back?

Chris Torek · Mar 26, 2006

Segmentation done right could be useful for bounds checking; assigning
a segment to each malloc()ed chunk of memory, and for each declared
object, could nearly eliminate buffer overruns. But it hasn't really
been done yet, and I'm not convinced it will be in the future.

Something like this *is* done on the AS/400.

Why do you think segmentation will come back?

When done right, it works quite well (see the AS/400) and allows
single-level-store (with "capability" protections). This is a very
functional and fast model (and it is "multiprocessor-friendly" and
has other good properties).

(Right now, one big penalty for context switches in general is that
you lose cached data: TLBs, and RAM-cache in virtual cache systems.
This is partly patched-up, in some architectures at least, by
tagging TLB entries with "address space identifiers" and doing
flushes only when running out of ASIDs, but this is a kludge.)

Richard G. Riley · Mar 26, 2006

This rule essentially means that *p-- is an invalid access mechanism,
unless peculiar care is taken to exit loops early, while *p++ is valid,
*only* because they made a particular exception for that particular case,
because they figured that C compilers on AS/400 systems could afford to
over-allocate all arrays by one byte, so that that last p++ would not
leave the pointer pointing to an "invalid" location. That's a hack, plain
and simple.

Having written a lot of low-level stuff in years gone by in assembler,
C and C++, I have to agree with you. For *p-- to be invalid when we are
looking at possible home-brew memory allocations and cleverly aligned
objects, while allowing an out-of-range *p++, is a tad
inconsistent. Having said that, I don't think I ever had any such
trap/breakdown, so maybe I was lucky or too careful.
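The asymmetry being described shows up in the classic forward-copy idiom, sketched below with a hypothetical function name. When the loop ends, src and dst both point one past the end of their arrays. The standard explicitly permits forming and comparing such a pointer (though not dereferencing it), and has no matching allowance for a pointer one before the start.

```c
#include <stddef.h>

/* Copy n chars with the *dst++ = *src++ idiom.  The final increments
   leave both pointers one past the end, which is allowed to be formed. */
void copy_forward(char *dst, const char *src, size_t n)
{
    const char *end = src + n;   /* one past the end: may be formed */
    while (src < end)
        *dst++ = *src++;
}
```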

Dik T. Winter · Mar 26, 2006

>
> Because it's a stupid memory protection model.
>
> Why can't the trap be caught and ignored?

It can be ignored. But the result is that the operation is a no-op. Again
consider:
char a[10];
char *p;
p = a - 1;
p = p + 1;
what is the value of p after the fourth statement if the trap in the third
statement is ignored?

Andrew Reilly · Mar 26, 2006

Because it's a stupid memory protection model.

Why can't the trap be caught and ignored?

It can be ignored. But the result is that the operation is a no-op. Again
consider:
char a[10];
char *p;
p = a - 1;
p = p + 1;
what is the value of p after the fourth statement if the trap in the third
statement is ignored?

The trap isn't ignored. There is no trap: the platform's "sane C memory
model" compiler and run-time system updated p.array_index to -1 and
p.array_base to a.array_base at the third line, as expected. The trap
would be left enabled, so that it would actually fire if and when a real
pointer was formed from &p.array_base[p.array_index], that is, if and when
*p was ever dereferenced in the subsequent code.

Consequently, the above code leaves p == a, as expected, and no trap is
encountered. Neat, huh?
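For contrast with a check-at-formation design, the representation Reilly describes can be sketched as a struct whose field names follow his post; the struct itself, the array_len field and the helper names are hypothetical. Arithmetic only changes an integer index, so a - 1 is representable, and the range check is deferred until a real pointer is formed for an access.

```c
#include <stddef.h>

/* Hypothetical base-plus-index pointer in the spirit of a "sane C
   memory model". */
struct sane_ptr {
    char     *array_base;   /* the object the pointer was derived from */
    size_t    array_len;    /* its length, used only at access time */
    ptrdiff_t array_index;  /* may hold any value, including -1 */
};

/* Pointer arithmetic: no check here, so no "trap" on formation.
   (Overflow of array_index itself is not handled in this sketch.) */
struct sane_ptr sane_add(struct sane_ptr p, ptrdiff_t delta)
{
    p.array_index += delta;
    return p;
}

/* Form a real pointer only for an access; an out-of-range index yields
   NULL, standing in for the trap the hardware would raise. */
char *sane_deref(struct sane_ptr p)
{
    if (p.array_index < 0 || (size_t)p.array_index >= p.array_len)
        return NULL;
    return p.array_base + p.array_index;
}
```

Replaying Winter's example: starting from index 0, sane_add(p, -1) gives index -1 (sane_deref returns NULL), and sane_add(p, 1) brings it back to 0, where sane_deref yields a.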

Dave Thompson · Mar 26, 2006

How do you get access to the condition bits?

On some PDP-11 models (only), the PSW is also addressable as memory,
somewhere in the vicinity of 0177660; I don't recall exactly.

Admittedly, even among the CPU design(er)s that do use condition codes
I know of no others that provided this option for accessing them.

(I'm not counting cases where an interrupt or trap, and sometimes at
least some calls, saves state including the CC on the stack or in
memory. That's much more common.)

- David.Thompson1 at worldnet.att.net

Richard Bos · Mar 27, 2006

Andrew Reilly said:
And this is the behaviour that is at odds with idiomatic C.

_Whose_ idiom? No programmer I'd respect writes such code intentionally.

Richard

Paul Keinanen · Mar 27, 2006

If you are just interested in zero, negative, signed and unsigned
overflows, you do not need to read these bits directly. Using
conditional branches, you can determine which bits are set. In any
sensible architecture the conditional branch instructions do not alter
these bits, so by combining conditional branches, multiple bits (such
as C and V) can be obtained. The state of the negative and zero bits can
easily be determined in C; however, getting carry, signed
overflow, half-carry etc. is very problematic.
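As an illustration of why carry and signed overflow are awkward in C, here is a portable sketch with hypothetical function names that uses ordinary comparisons instead of condition bits. Unsigned addition wraps modulo 2^N, so a carry occurred exactly when the result is smaller than an operand; signed overflow is undefined behavior, so it has to be tested before the addition. (GCC and Clang also provide __builtin_add_overflow as a compiler extension.)

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned add: store the wrapped sum and report whether the top bit
   carried out, i.e. whether the sum is smaller than an operand. */
bool add_carries(unsigned a, unsigned b, unsigned *sum)
{
    *sum = a + b;
    return *sum < a;
}

/* Signed add: report whether a + b would overflow, without performing
   the (undefined) overflowing addition. */
bool add_overflows(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}
```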

On some PDP-11 models (only), the PSW is also addressable as memory,
somewhere in the vicinity of 0177660; I don't recall exactly.

The PSW and all the general-purpose registers (hence you could get
the address of a register) are available in the 8 KiB I/O page,
which is at the top of physical memory, starting at different
physical addresses in systems with 16, 18 or 22 physical address bits.

For 18 and 22 address bit systems, the I/O page had to be mapped into
the 64 KiB program address space, which on most operating systems
required special privileges and consumed 8 KiB of your precious 64 KiB
program address space. If you only needed the PSW, direct mapping of
the I/O page would usually be avoided by using the trapping mechanism. I used
it mostly to alter the trace bit, but of course, you could also get
the N, Z, C, V bits at once.

Paul
