Sizes of pointers

Keith Thompson · Aug 9, 2013

Lew Pitcher said:
Thanks, James

I learn something new every day. I'll have to get me a copy of the 2011
standard and read it ASAP.

The latest draft is
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

One of the few differences between the draft and the published standard
is actually relevant here. The draft adds another exception to the rule
that an array expression is converted to a pointer, namely when it's the
operand of the new _Alignof operator. The published standard removes
this, because _Alignof (for unclear reasons) can only be applied to a
parenthesized type name, not to an expression.

glen herrmannsfeldt · Aug 9, 2013

(snip on modulo and pointers)

No. YOU claim that I claimed the above. I said no such thing.
Reread what I wrote, and learn.

I don't understand your question. Perhaps you need to rephrase it.
(snip)

If you are referring to how the C code of an application program can
establish proper alignment of pointers, then there's a simple
answer: A C application should not attempt to *explicitly*
determine alignment.
Instead, it should depend on the C language to do that for it.
malloc()ed memory is guaranteed (by the C standard) to be
allocated such that it suits any system/compiler/runtime-imposed
alignment requirements.

As far as I understand it, it should satisfy the alignment
requirements, but isn't necessarily optimal. A processor might
slow down for some alignments, and there is no requirement that C
avoid such slow alignments.

I suppose that falls under quality of implementation, but some
might be disappointed with the results.

-- glen

James Kuyper · Aug 9, 2013

As far as I understand it, it should satisfy the alignment
requirements, but isn't necessarily optimal. A processor might
slow down for some alignments, and there is no requirement that C
avoid such slow alignments.

The C standard imposes no such requirement; but market forces favor
vendors who create C implementations that impose alignment requirements
that reasonably reflect the impact of mis-aligned access. Impose a
stricter alignment than necessary, and the generated code wastes memory;
impose too lenient an alignment, and the generated code wastes time
on mis-aligned access. A vendor who doesn't pay attention to its
customers' preferences with regard to time vs. memory space trade-offs
will lose business to one who does. That includes giving them
optimization options to adjust those preferences.

glen herrmannsfeldt · Aug 9, 2013

(snip)

Other sources describe x86-64 pointers in unsigned terms and do
so very simply and understandably.
(snip)

This may say something about (some) kernel code, but not about x86-64.
The hardware itself is intrinsically neither signed nor unsigned.
If anything the hardware views addresses as unsigned, because of
how address translation works - page table lookup treats all bits
that it uses as value bits, and none as sign bits.

I was thinking about this some more. If a machine with sign
magnitude representation had signed addressing, you would not be
able to pretend that they were unsigned. The magnitude decreases
when you increment a negative address.

I don't remember any such machine, but one would have been possible in
the early years of computing, maybe in the decimal addressing days.

-- glen

glen herrmannsfeldt · Aug 9, 2013

(snip on alignment requirements)

The C standard imposes no such requirement; but market forces favor
vendors who create C implementations that impose alignment requirements
that reasonably reflect the impact of mis-aligned access. Impose a
stricter alignment than necessary, and the generated code wastes memory;
impose too lenient an alignment, and the generated code wastes time
on mis-aligned access. A vendor who doesn't pay attention to its
customers' preferences with regard to time vs. memory space trade-offs
will lose business to one who does. That includes giving them
optimization options to adjust those preferences.

And hardware can progress faster than software can keep up with it.

For the 8087, only two-byte alignment mattered for floating point;
for the 80486, four-byte alignment was faster; and on the Pentium
and later, eight-byte alignment for double was significantly faster.

Yet C implementations were doing four-byte alignment well into
the Pentium and later days. (Hopefully fixed by now.)

-- glen

James Kuyper · Aug 9, 2013

And hardware can progress faster than software can keep up with it.

For the 8087, only two-byte alignment mattered for floating point;
for the 80486, four-byte alignment was faster; and on the Pentium
and later, eight-byte alignment for double was significantly faster.

Yet C implementations were doing four-byte alignment well into
the Pentium and later days. (Hopefully fixed by now.)

If imposing stricter alignment requirements to achieve better processing
speeds would have won an implementor enough new customers to cover the
development costs, some implementor would have done so. A situation like
that can exist only if users are sufficiently apathetic about the issue
that they don't impose effective pressure on the implementors. And if
the users are that apathetic about the issue, does it really matter?

Eric Sosman · Aug 9, 2013

[...]
I was thinking about this some more. If a machine with sign
magnitude representation had signed addressing, you would not be
able to pretend that they were unsigned. The magnitude decreases
when you increment a negative address.

I don't remember any such machine, but one would have been possible in
the early years of computing, maybe in the decimal addressing days.

<off-topic>

Addresses on the Honeywell 8200[*] weren't signed, but all
the registers were -- including the program counter. So if you
set the PC's sign bit, the "PC = PC + 1" that advanced from one
instruction to the next actually *decremented* the instruction
address. The game, of course, was to concoct an instruction
sequence that would run forward for a while, set the sign bit,
run backwards through the same set of instructions, and somehow
regain sanity before crashing.

[*] Circa 1967. Dim memory: an H-800 48-bit word-oriented
compute unit (maybe more than one?) lashed to an H-200 character-
oriented machine to do the I/O. Three-address arithmetic, with
a lot of built-in masking to pack multiple fields in those wide
words and to deal with varying byte widths. More registers (it
seemed) than the checkout lines of the world's biggest WalMart.

</off-topic>

Keith Thompson · Aug 9, 2013

Rosario1903 said:
Unsigned is the easiest mathematical model for a PC.
Possibly I don't remember well, but the operations +, -, *, / on unsigned
values give the same results on x86 if one reads them as signed...

[...]

You've been saying, if I understand you correctly, that pointers have
or should have the same mathematical properties as unsigned integers.

There's a lot more to mathematics than integers, or even numbers.

C pointers can be described by a mathematical model that doesn't
define the results of certain operations (just as a model for real
numbers doesn't define division by zero or sqrt(-1)).

C pointers can be thought of as mathematical entities that share
some, but not all, characteristics with integers. Mathematics
abounds with such entities. Look into group theory if you're not
already familiar with it.

The strong distinction C makes between integers and pointers is
not the result of mathematical ignorance, as you've suggested.
It's the result of an understanding of which characteristics are
appropriate for an abstract model of pointers that can be implemented
on a wide variety of physical hardware.

Ike Naar · Aug 10, 2013

I would like to know what happens here. For example, would the result
of address%8 be
1 if mem is char aligned

Any of {0,1,2,3,4,5,6,7} if mem is char aligned

2 if mem is short and char aligned

Any of {0,2,4,6} if mem is short and char aligned

4 if mem is int, short, char aligned

Any of {0,4} if mem is int,short,char aligned

James Kuyper · Aug 10, 2013

Any of {0,1,2,3,4,5,6,7} if mem is char aligned

Any of {0,2,4,6} if mem is short and char aligned

Any of {0,4} if mem is int,short,char aligned

I've got Rosario1903's nonsense killfiled, so I'm not sure what he's
said about what "address" is.

If "address" is a pointer, as seems consistent with what I've seen
in responses to his messages, the only thing guaranteed for this code is
a diagnostic message about the constraint violation (6.5.5p2).

If "address" is the result of converting a pointer to uintptr_t, the
only thing guaranteed about the value of address%8 is that it is in the
range 0-7, and that's true regardless of how the pointer value is aligned.

Malcolm McLean · Aug 10, 2013

On 08/10/2013 01:52 AM, Ike Naar wrote:

I've got Rosario1903's nonsense killfiled, so I'm not sure what he's
said about what "address" is.

A pointer is a specific C variable which can be dereferenced, and on any
sane C implementation holds a memory address.
An address is any representation which can be converted ultimately to
impulses along the bus to read the memory. So it's meaningful to take
modulus or do other operations on an address that are forbidden on
pointers. It's also meaningful to talk about a "negative" address, though
ultimately, like all computer values, it's just set bits and unset bits
and any construction we put upon them is human.

Keith Thompson · Aug 10, 2013

Malcolm McLean said:
A pointer is a specific C variable which can be dereferenced, and on
any sane C implementation holds a memory address. An address is any
representation which can be converted ultimately to impulses along the
bus to read the memory. So it's meaningful to take modulus or do other
operations on an address that are forbidden on pointers. It's also
meaningful to talk about a "negative" address, though ultimately, like
all computer values, it's just set bits and unset bits and any
construction we put upon them is human.

If you're going to make this kind of distinction (which is a
perfectly valid one), I suggest not using terms that are synonyms
in the C standard. For example, the unary "&" operator yields
the *address* of its operand, which is a *pointer* value. Perhaps
"machine address" would be a better term for what you're calling
a memory address.

On the Cray T90 and related vector systems, a void* or char* value
consists of a 64-bit machine-level pointer to a 64-bit word, with
a 3-bit offset stored in the otherwise unused high-order 3 bits.
This is done entirely in software; there's no hardware support for
addressing anything smaller than a 64-bit word. Converting from
a pointer type to an integer type simply copies the bits. Given:

char arr[2];

both (unsigned)&arr[0] % 8 and (unsigned)&arr[1] % 8 would almost
certainly have the same value, which could be any value from 0 to 7.
(I'd use uintptr_t, but the C compiler didn't support C99.)

Keith Thompson · Aug 10, 2013

Rosario1903 said:
[...]

C pointers can be thought of as mathematical entities that share
some, but not all, characteristics with integers. Mathematics
abounds with such entities. Look into group theory if you're not
already familiar with it.


I prefer unsigned: the set of positive integers, N.

And how is what you prefer relevant if it's not supported either
by the C standard or by all real-world implementations?

Feel free to go off and invent your own language that works the
way you want it to.

Pointers. Are. Not. Integers.

glen herrmannsfeldt · Aug 10, 2013

Keith Thompson said:
(snip)

If you're going to make this kind of distinction (which is a
perfectly valid one), I suggest not using terms that are synonyms
in the C standard. For example, the unary "&" operator yields
the *address* of its operand, which is a *pointer* value. Perhaps
"machine address" would be a better term for what you're calling
a memory address.

I suppose, but maybe the standard shouldn't use some terms which
might not be right. Since a "pointer" might not be just an
"address", maybe there should be different wording for &.

-- glen

Keith Thompson · Aug 10, 2013

glen herrmannsfeldt said:
I suppose, but maybe the standard shouldn't use some terms which
might not be right. Since a "pointer" might not be just an
"address", maybe there should be different wording for &.

How can a "pointer" be anything other than an "address" (with
both terms used the way the C standard uses them)? I suppose you
could say that a null pointer isn't an address; is that what you're
referring to?

A C pointer / address might be something other than a machine-level
address, but there are plenty of cases where the standard uses terms
in more specific ways than they might be used in other contexts
("string", "byte", "object", etc.).

Malcolm McLean · Aug 11, 2013

How can a "pointer" be anything other than an "address" (with
both terms used the way the C standard uses them)? I suppose you
could say that a null pointer isn't an address; is that what you're
referring to?

We could implement a weird and wonderful system where pointers were random
words in a dictionary, then you looked them up in a hash table to resolve
to disk writes for dynamic memory or variable names for automatic memory.

So a pointer wouldn't be an address, in the sense I used the term (something
which can map to electrical impulses on a bus attached to memory).

glen herrmannsfeldt · Aug 11, 2013

Malcolm McLean said:
On Sunday, August 11, 2013 4:29:45 AM UTC+1, Keith Thompson wrote:
(snip)

We could implement a weird and wonderful system where pointers
were random words in a dictionary, then you looked them up in
a hash table to resolve to disk writes for dynamic memory or
variable names for automatic memory.

So a pointer wouldn't be an address, in the sense I used the
term (something which can map to electrical impulses on a
bus attached to memory).

Not so different from JVM object references. You aren't allowed
to look at the bits. (There are no operations to do it.)
If the reference is an array, the offset is supplied when it
is dereferenced.

-- glen

Malcolm McLean · Aug 11, 2013

Not so different from JVM object references. You aren't allowed
to look at the bits. (There are no operations to do it.)
If the reference is an array, the offset is supplied when it
is dereferenced.

You need to understand that a Java object reference is a pointer or you
can't really understand how Java works. But it's hard for programmers
who learn Java as a first language, because they can't manipulate the
pointer.

Also in C we can easily do memory management. If we've got flash pointers
and RAM pointers, we can read and write to both, but writing to flash is
expensive, so we won't want the compiler producing code to erase the flash
on a simple pointer dereference. But we can provide a function flash_write
that tests that the destination pointer is in the flash area of memory
and correctly aligned. We can do pretty much everything in C,
except maybe a few assembly instructions to actually start the erase cycle.

In Java you can't. You've got to patch the JVM.

Keith Thompson · Aug 11, 2013

Malcolm McLean said:
We could implement a weird and wonderful system where pointers were random
words in a dictionary, then you looked them up in a hash table to resolve
to disk writes for dynamic memory or variable names for automatic memory.

So a pointer wouldn't be an address, in the sense I used the term (something
which can map to electrical impulses on a bus attached to memory).

But it would be an address in the sense used by the C standard. If the
unary "&" operator yields a random word in a dictionary, then that's
what an address is.

Joe Pfeiffer · Aug 11, 2013

Malcolm McLean said:
We could implement a weird and wonderful system where pointers were random
words in a dictionary, then you looked them up in a hash table to resolve
to disk writes for dynamic memory or variable names for automatic memory.

So a pointer wouldn't be an address, in the sense I used the term (something
which can map to electrical impulses on a bus attached to memory).

The mapping would be odd, but it would be a mapping (I assume you don't
really mean "variable names", since if you did there would need to be
another mapping).
