Sizes of pointers

Thread starter: James Harris (es)

James Kuyper

I'm not sure how helpful that really is. Linearity within each
object is important, but as long as I can get a unique address for
each object (including each byte within each object), why should
I care how addresses of distinct objects relate to each other
(apart from "==" and "!=" working properly)?

The ability to meaningfully compare pointers for order, even if they're
not pointers into or one past the end of the same array, would allow you
to test whether a given pointer points within a given block of memory by
just performing two comparisons - one with each end of that block. If
all you can count on is "==" and "!=", your only choice is to iterate
over the entire block, checking each possible location within the block
for equality.

Comparisons for order that provide a consistent total ordering of all
pointers also make it possible to sort a table of arbitrary pointers,
and, for instance, perform a binary search on that table to locate a
pointer in that table.
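For concreteness, here is a sketch of both tests in C. It assumes an
implementation where the relational operators give a consistent total
order over all pointers (e.g., a flat linear address space); strictly
portable C only defines < within a single object, and the function
names here are mine:

#include <stddef.h>

/* two comparisons, relying on a total order over all pointers */
static int ptr_in_block(const char *p, const char *block, size_t size)
{
    return p >= block && p < block + size;
}

/* the strictly portable fallback: test every location for equality */
static int ptr_in_block_portable(const char *p, const char *block,
                                 size_t size)
{
    for (size_t i = 0; i < size; i++)
        if (p == block + i)
            return 1;
    return 0;
}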

In C++, std::less<T*>(p,q) is required to return p<q, except that it is
supposed to provide a total ordering, even if < does not. I gather that
there exist platforms for which std::less(p,q) is substantially slower
than p<q, because of the hoops the C++ standard library has to jump
through in order to meet that requirement.
 

James Kuyper

On 08/02/2013 02:33 PM, Stephen Sprunk wrote:
....
The latter would work for portable code, but there's a lot of code out
there that assumes relative comparison or subtraction of pointers to
different objects is meaningful, i.e. a flat linear address space. I
see no need to gratuitously break such code.

I see a positive advantage to breaking such code, so it can be fixed.
 

Kenny McCormack

On 08/02/2013 02:33 PM, Stephen Sprunk wrote:
...

I see a positive advantage to breaking such code, so it can be fixed.

A point-of-view that only a CLCer could appreciate.
 

Keith Thompson

James Kuyper said:
The ability to meaningfully compare pointers for order, even if they're
not pointers into or one past the end of the same array, would allow you
to test whether a given pointer points within a given block of memory by
just performing two comparisons - one with each end of that block. If
all you can count on is "==" and "!=", your only choice is to iterate
over the entire block, checking each possible location within the block
for equality.

Comparisons for order that provide a consistent total ordering of all
pointers also make it possible to sort a table of arbitrary pointers,
and, for instance, perform a binary search on that table to locate a
pointer in that table.

In C++, std::less<T*>(p,q) is required to return p<q, except that it is
supposed to provide a total ordering, even if < does not. I gather that
there exist platforms for which std::less(p,q) is substantially slower
than p<q, because of the hoops the C++ standard library has to jump
through in order to meet that requirement.

Perhaps a hypothetical JVM C implementation could jump through similar
hoops -- though determining when that's necessary could be nasty.

You *could* model the C abstract machine's memory as a single
contiguous byte array, but I like the idea of a distinct array for each
top-level object, which could give you some nice error checking.
Compiling and running C code under such an implementation could be a
good way to track down certain kinds of bugs.
 

Stephen Sprunk

VAX put user programs at the bottom of the 32 bit address space, with
the system space at the top. VAX was around a long time before people
could afford 4GB. I don't know that it was described as signed, but
the effect was the same.

Currently no-one can fill a 64 bit address space, so there are tricks
to avoid the overhead of working with it.

To avoid huge page tables, VAX and S/370 use a two-level virtual
address system. (VAX has pageable page tables; S/370 has segments and
pages.) Continuing that, z/Architecture has five levels of tables. It
should take five table references to resolve a virtual address (one
that isn't in the TLB), but there is a way to reduce that for
current-sized systems.

x86 is similar, with 2* levels of page tables in 32-bit mode, 3* levels
in 36-bit mode, and 4* levels in 48-bit mode--the largest available on
current implementations. (56-bit and 64-bit modes, with 5 and 6 levels
of page tables, would be straightforward extensions.)

Most importantly, though, all pointers must be sign-extended, rather
than zero-extended, when stored in a 64-bit register. You could view
the result as unsigned, but that is counter-intuitive and results in an
address space with an enormous hole in the middle. OTOH, if you view
them as signed, the result is a single block of memory centered on zero,
with user space as positive and kernel space as negative. Sign
extension also has important implications for code that must work in
both x86 and x86-64 modes, e.g. an OS kernel--not coincidentally the
only code that should be working with negative pointers anyway.

(* Assuming 4kB pages; there are also 2MB, 4MB, and 1GB pages in various
modes, which remove the need for the last level or two of page tables,
but they're rarely used in practice.)
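As a concrete illustration of the sign-extension rule above, here is a
sketch of a canonical-address check for the 48-bit case. The helper
name is mine, and the signed cast and right shift are
implementation-defined in C but behave as arithmetic shifts on
mainstream compilers:

#include <stdint.h>

/* bits 63..47 must all equal bit 47, i.e. the address must be the
   sign-extension of its low 48 bits */
static int is_canonical48(uint64_t va)
{
    int64_t sext = (int64_t)(va << 16) >> 16;
    return (uint64_t)sext == va;
}

/* is_canonical48(0x00007fffffffffff) == 1   top of user space
   is_canonical48(0xffff800000000000) == 1   bottom of kernel space
   is_canonical48(0x0000800000000000) == 0   inside the hole        */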
For x86 processors, the bottom of the (physical) address space is
interrupt vectors, and execution starts at the top. (I believe that
is still true.) That conveniently allows RAM at the bottom, and ROM
at the top.

x86's physical address space isn't terribly interesting unless you're in
real mode, which has its own unique challenges.

S
 

Stephen Sprunk

So, you're proposing hard-coding the size of various types, which
means the code would gratuitously break when ported to a system
that doesn't have pointers and/or integers of exactly the specified
size? How does that improve portability in any meaningful way?

If I have a generic operation *: A x A -> A with A ⊂ N [A can be
unsigned 32-bit integers, for example A = 0..0xFFFFFFFF, or 64-bit
integers, etc.],

and if the operation * is not the same across machines [* has to map
the same numbers to the same results], or the number of elements A
contains is not the same across machines, then the operation * cannot
be portable across the machines that want to use it.

The only operations that C defines for pointers are:

pointer ← pointer + integer

and

integer ← pointer - pointer

so I'm not sure that your logic above applies.
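A minimal example of those two operations, both defined only within a
single array (or one past its end):

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    int a[5] = {10, 20, 30, 40, 50};
    int *p = a + 1;               /* pointer = pointer + integer */
    ptrdiff_t d = &a[4] - p;      /* integer = pointer - pointer */
    printf("*p = %d, d = %td\n", *p, d);   /* *p = 20, d = 3 */
    return 0;
}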

Nor does C require the representation of pointers to be unsigned or even
integral, and there are examples of systems that are not so.

Addition and subtraction work identically with signed or unsigned
integers in a twos-complement system, but there's also no requirement
that systems use twos complement representation, and there are examples
of systems that do not.
And this holds for all operations; that is the meaning of portability
as I see it. Anything else is an error and a source of UB, in my view.

We attach different meanings to the word "portability".

I use "portability" to refer to code working on different systems
without change. What do _you_ think it means?

S
 
S

Stephen Sprunk

I'm not sure that's quite accurate.

At the ISA level, pointers on IPF, both code and data, are all 64
bits.

What the (mostly) standard ABI does is put entry points into a
structure that contains some other values the called routine might
need, like the pointer to its local data area. ...
Calls via (most) function pointers tend to have to use the full
mechanism, since you don't really know where they're going.

If you need a 64-bit code pointer _and_ a 64-bit data pointer to call a
function, then one can reasonably (for the sake of brevity) call that a
128-bit function pointer.
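In C terms, a sketch of such a descriptor might look like this; the
struct and field names are mine, not the ABI's:

struct func_descriptor {
    void *code_addr;   /* 64-bit entry point */
    void *gp;          /* 64-bit pointer to the callee's data area */
};

A call through a function pointer then loads both fields: gp into the
global-pointer register and code_addr as the branch target.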

S
 

Malcolm McLean

You *could* model the C abstract machine's memory as a single
contiguous byte array, but I like the idea of a distinct array for each
top-level object, which could give you some nice error checking.
Compiling and running C code under such an implementation could be a
good way to track down certain kinds of bugs.
That turns malloc() into a magic function.
In standard C it's just something you can implement by allocating out of
an arena. In fact I have implemented such schemes, for real programs.

But malloc() is a bit special, because it's your Turing machine's infinite
tape. A bit-shuffling function can never fail, except in two situations:
a programming error, or malloc() returning zero.
 

Keith Thompson

Malcolm McLean said:
That turns malloc() into a magic function.
In standard C it's just something you can implement by allocating out of
an arena. In fact I have implemented such schemes, for real programs.

But malloc() is a bit special, because it's your Turing machine's infinite
tape. A bit-shuffling function can never fail, except in two situations:
a programming error, or malloc() returning zero.

There's nothing magic about it. For a system with a monolithic address
space, treating part of that space as an arena and allocating chunks of
it is the obvious way to implement malloc. For a system without such a
monolithic address space that has some mechanism for allocating
genuinely disjoint objects, that's the obvious way to implement malloc.

And the C standard says nothing implying that one method or the other is
preferred.
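For the monolithic case, a minimal sketch of malloc-from-an-arena
might look like this; real implementations track free lists, metadata,
and alignment, and all the names here are mine:

#include <stddef.h>

#define ARENA_SIZE (1u << 20)

static unsigned char arena[ARENA_SIZE];
static size_t arena_used;

static void *arena_malloc(size_t n)
{
    n = (n + 15) & ~(size_t)15;   /* assume 16-byte worst-case alignment;
                                     no overflow guard in this sketch */
    if (n > ARENA_SIZE - arena_used)
        return NULL;              /* the "malloc() returning zero" case */
    void *p = arena + arena_used;
    arena_used += n;
    return p;
}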
 

Rosario1903

On 31-Jul-13 12:30, Rosario1903 wrote:
if people need 64-bit pointers to u32 something:
p64u32 *p;
....
If I have a generic operation *: A x A -> A with A ⊂ N [A can be
unsigned 32-bit integers, for example A = 0..0xFFFFFFFF, or 64-bit
integers, etc.],

and if the operation * is not the same across machines [* has to map
the same numbers to the same results], or the number of elements A
contains is not the same across machines, then the operation * cannot
be portable across the machines that want to use it.
....
The only operations that C defines for pointers are:

pointer ← pointer + integer

and

integer ← pointer - pointer

so I'm not sure that your logic above applies.

Nor does C require the representation of pointers to be unsigned or even
integral, and there are examples of systems that are not so.

Integer... I have seen somewhere pointer-address / unsigned too, and
pointer-address * unsigned too.
Addition and subtraction work identically with signed or unsigned
integers in a twos-complement system, but there's also no requirement
that systems use twos complement representation, and there are examples
of systems that do not.


I use "portability" to refer to code working on different systems
without change. What do _you_ think it means?

So portability is having a set of operators and functions that are
the same for all machines, where "the same" means that the domain of
each function [or operator] is the same, and each function [or
operator] maps the same elements.

There is no other definition of portability...
 

glen herrmannsfeldt

(snip, someone wrote)
(snip)
There's nothing magic about it. For a system with a monolithic address
space, treating part of that space as an arena and allocating chunks of
it is the obvious way to implement malloc. For a system without such a
monolithic address space that has some mechanism for allocating
genuinely disjoint objects, that's the obvious way to implement malloc.
And the C standard says nothing implying that one method or the other is
preferred.

But what about tagged memory systems, where you have to know the
type when allocating? (Like the systems Burroughs used to make, I believe.)

Also, JVM requires that you know the type you are allocating.

-- glen
 

Rosario1903

On 08/02/2013 02:33 PM, Stephen Sprunk wrote:
...

The malloc() found in the K&R II book would be as above, right?
It returns memory for each object, right?
I see a positive advantage to breaking such code, so it can be fixed.

If malloc() is one like the above, for whatever that counts, I disagree.
 

Rosario1903

I use "portability" to refer to code working on different systems
without change. What do _you_ think it means?

So portability is having a set of operators and functions that are
the same for all machines, where "the same" means that the domain of
each function [or operator] is the same, and each function [or
operator] maps the same elements.

There is no other definition of portability...


But that *is* what C provides, so long as you do things within the
bounds of defined behavior (somewhat simplified). So you *can*
subtract two pointers that point to the same object, but *not* two
pointers that point to different objects. The latter is explicitly
undefined by the C standard. But yes, C lets you do many dangerous
and undefined (from the point of view of the standard) things, and by
definition those are *not* portable.
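A two-line illustration of that distinction (ptrdiff_t is from
<stddef.h>):

#include <stddef.h>

int a[10], b[10];

void demo(void)
{
    ptrdiff_t ok  = &a[7] - &a[2];  /* defined: same object, equals 5 */
    ptrdiff_t bad = &b[0] - &a[0];  /* undefined: distinct objects */
    (void)ok; (void)bad;
}

Both lines compile on typical implementations; only the first has a
meaning the standard guarantees.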

Arguably you'd want a language where there is no (or very little)
undefined behavior for most programming, and we have those, from Ada
to Java. C is deliberately low level and deliberately exposes some
things, because that suits the main purpose of C.

If you are happy with a few UBs and a complex memory-model definition...
if that is not so important, or if the easy mathematical model gives many
wrong results in concrete running programs, etc.... ok.

I always prefer to look for the [easiest, if I can see it] mathematical
way of solving a problem: the one that uses functions, the sets N and R,
and their subsets, etc.
 

James Kuyper

(snip, someone wrote)



But what about tagged memory systems, where you have to know the
type when allocating? (Like the systems Burroughs used to make, I believe.)

I've never used such systems, but I would expect that it has to be
possible to bypass the type tag. At the very least, it should be
possible to allocate some big block of some unsigned data type, and then
use software emulation to store signed, floating-point, or pointer data
in that memory. It might be extremely slow depending upon how difficult
it is to perform the emulation, but it should be feasible on any machine
which can implement the equivalent of C's integer +, -, *, /, and %
operators on at least one such data type.
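As a sketch of what such emulation could look like for signed data,
assuming UINT_MAX == 2 * INT_MAX + 1, as on typical two's-complement
systems (the function names are mine):

#include <limits.h>

#define BIAS ((unsigned)INT_MAX + 1u)

/* excess encoding: only unsigned values ever live in the allocated
   block; unsigned arithmetic wraps modulo 2^N, which is well defined */
static unsigned encode_int(int v)
{
    return (unsigned)v + BIAS;
}

static int decode_int(unsigned u)
{
    return u >= BIAS ? (int)(u - BIAS)
                     : -(int)(BIAS - u - 1u) - 1;
}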
 

James Kuyper

On 02-Aug-13 03:34, Rosario1903 wrote: ....
If I have a generic operation *: A x A -> A with A ⊂ N [A can be
unsigned 32-bit integers, for example A = 0..0xFFFFFFFF, or 64-bit
integers, etc.],

and if the operation * is not the same across machines [* has to map
the same numbers to the same results], or the number of elements A
contains is not the same across machines, then the operation * cannot
be portable across the machines that want to use it.

The only operations that C defines for pointers are:

pointer ← pointer + integer

and

integer ← pointer - pointer

I think you meant to restrict your statement more closely than you
actually did. It's perfectly true that there are a lot of operators
that you can apply to integer types but cannot apply to pointers,
including all of the ones relevant to his claim: ^, ~, binary *,
pointer+pointer, /, %, >>, <<, binary &, and binary |.

However, all of the following operations are defined for pointers:
(), unary *, unary &, [], function call, ->, ++, --, !, sizeof,
_Alignof(), cast, <, >, <=, >=, ==, !=, &&, ||, =, ?:, +=, -=, and
the comma operator.
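A quick demonstration of several of those in one function (the names
are mine):

#include <stddef.h>

struct s { int m; };

void demo(struct s *p, struct s *q)
{
    if (p != q && !(p == NULL)) {   /* !=, &&, !, ==  */
        p->m = (*p).m + 1;          /* ->, unary *, = */
        p++;                        /* ++             */
        p -= 1;                     /* -=             */
        q = p ? p : q;              /* ?:, =          */
        (void)sizeof p;             /* sizeof         */
    }
}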
 

Tim Rentsch

Bart van Ingen Schenau said:
Can you give an example how the DS9000 could make a conversion between
pointers to structs fail, given that the pointers involved are required
to have the same representation and alignment (and that the intent of
that requirement is to allow for interchangeability)?

The pointers have the same alignment, but what they point to need
not have the same alignment. A simple example:

struct smaller { int x; };     // assume size == alignment == 4
struct larger  { double z; };  // assume size == alignment == 8

union {
    struct smaller foo[2];
    struct larger  bas;
} it;

(struct larger *) &it.foo[1];  // incorrect alignment

The last line exhibits undefined behavior, explicitly called out
as such under 6.3.2.3 p7. I expect most implementations won't
misbehave in such cases, but the Standard clearly allows them to.

A DS9000 could simply choose to check the resulting alignment
directly, perhaps using a HCFBA instruction (that's the "halt and
catch fire on bad alignment" opcode).
 

Tim Rentsch

James Kuyper said:
Well, the designers of the DS9000 are notorious for ignoring the intent
of the standard; some have even claimed that they go out of their way
to violate the intent.

The way in which the conversions described above fail is considered
evidence in favor of that claim. They fail for no better reason than
the fact that the result of each such conversion is incremented by 1
from the value you might normally have expected to see. There's no
plausible reason why the DS9000 should do anything of the sort, but
there's nothing in the standard to prohibit it.

An incomprehensible explanation. Pointer conversions like those
described above have defined behavior if and only if the converted
pointer value is suitably aligned for the target type, as detailed
very clearly in 6.3.2.3 p7.
 

Tim Rentsch

glen herrmannsfeldt said:
James Kuyper said:
On 08/01/2013 11:45 AM, Bart van Ingen Schenau wrote:
(snip)
Well, the designers of the DS9000 are notorious for ignoring the intent
of the standard; some have even claimed that they go out of their way to
violate the intent.

Some years ago I was wondering about the possibility of generating
JVM code as output of a C compiler. [snip elaboration]

Doesn't anyone bother to look this stuff up? There are a handful
of existing C-to-JVM compilers or translators, the oldest more
than 10 years old. Some support a full ANSI C runtime.

https://en.wikipedia.org/wiki/C_to_Java_Virtual_Machine_compilers
 
