Sizes of pointers

  • Thread starter James Harris (es)
  • Start date

Stephen Sprunk

so C cannot be a portable language...
or only portable with your meaning of portable

Again, I do not understand how you are using the term "portable".

If you stick to only operations that the C standard defines, then your
code is portable (i.e. works correctly without change) to all compliant
C implementations. That's the entire point of the standard.

Yes, there are some things that C does _not_ define. If your code
depends on undefined behavior, it is unportable. Therefore, if you want
your code to be portable, you must learn not to do that.

You seem to think that the only way to achieve portability is for the
standard to define everything--and in a way that happens to be the way
your system works. However, that would necessarily make C difficult or
impossible to implement on systems that happen to work differently,
making C _less_ portable as a language in the end.
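
For example (a minimal sketch, not tied to any particular system): the
first function below uses only behavior the standard defines, so it is
portable; the second relies on signed integer overflow, which the
standard leaves undefined.

/* Portable: unsigned arithmetic is defined to wrap modulo UINT_MAX+1,
 * so every conforming implementation gives the same answer. */
unsigned add_wrap(unsigned a, unsigned b)
{
    return a + b;
}

/* Unportable: if the mathematical sum does not fit in an int, the
 * behavior is undefined and implementations are free to differ. */
int add_risky(int a, int b)
{
    return a + b;
}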

S
 

Stephen Sprunk

Finally, the valid values of an x86-64 memory address are:

kernel: 0xffff8000,00000000 to 0xffffffff,ffffffff [-128TB to 0)
user: 0x00000000,00000000 to 0x00007fff,ffffffff [0 to +128TB)

which leads to an obvious reimagining of x86 addresses:

kernel: 0x80000000 to 0xffffffff [-2GB to 0)
user: 0x00000000 to 0x7fffffff [0 to +2GB)

where is the problem?
user mem and code in: (0x00000000, 0x7fffffff]
kernel code in: [0x80000000, 0xffffffff]
reserved data in: {0}

0x00000000 is part of user space. C happens to assign a special meaning
to that address (NULL), but the CPU architecture does not.

Also, kernel data lives in kernel space, not just kernel code.
all of that is unsigned...

A bit pattern is neither signed nor unsigned; that is a matter of how
you choose to interpret the bits.
i could say the kernel code is pointed to by addresses that have the
last bit set to 1 [the 1 bit that one can find in 0x80000000]

If you meant the high bit, yes. However, 0x80000000 is -2GB in a signed
interpretation but +2GB in an unsigned interpretation.

So, if pointers are signed, kernel space is the "negative" half, but if
pointers are unsigned, kernel space is the "top" half.
if P=0x7 is 16 bit, why would you want to do a sign-extension operation
on it for 32 bit?

P is in user space, so why would you want its sign extension, 0xffffffff?

If the highest bit is zero, then sign extension fills with zeros:

0x7fffffff -> 0x00000000,7fffffff

The resulting value in this case is the same whether interpreted as
signed or unsigned and whether sign-extended or zero-extended.
so in user mode there cannot be a negative address or an address >=
0x80000000, which would be the same thing

User space is unaffected by this issue; it's [0,+2GB) either way.
which, when sign-extended, evinces an elegance that simply cannot be
explained away as mere perceptual bias:

kernel: 0xffffffff,80000000 to 0xffffffff,ffffffff [-2GB to 0)
user: 0x00000000,00000000 to 0x00000000,7fffffff [0 to +2GB)

The later invention of "canonical form", exactly equivalent to sign
extension but studiously avoiding use of the heretical term "sign",

in what way is sign extension more useful
than zero extension?

If 32-bit pointers were zero-extended, then 32-bit kernel space would
overlap the second 2GB of 64-bit user space. That's bad.

With sign extension, 32-bit kernel space overlaps the last 2GB of 64-bit
kernel space. That's good.
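
Here is a minimal sketch of the difference (purely illustrative; it
just treats the addresses as integers):

#include <inttypes.h>
#include <stdio.h>

/* Widen a 32-bit address to 64 bits by sign extension: if the high bit
 * is set, fill the upper 32 bits with ones, otherwise with zeros. */
static uint64_t sign_extend32(uint32_t a)
{
    return (a & UINT32_C(0x80000000))
        ? (uint64_t)a | UINT64_C(0xffffffff00000000)
        : (uint64_t)a;
}

int main(void)
{
    uint32_t kaddr = UINT32_C(0x80000000);  /* 32-bit kernel-space address */
    uint32_t uaddr = UINT32_C(0x7fffffff);  /* 32-bit user-space address   */

    printf("kernel, zero-extended: 0x%016" PRIx64 "\n", (uint64_t)kaddr);
    printf("kernel, sign-extended: 0x%016" PRIx64 "\n", sign_extend32(kaddr));
    printf("user, either way:      0x%016" PRIx64 "\n", sign_extend32(uaddr));
    return 0;
}

Note that the user-space address comes out the same either way; only
kernel addresses are affected.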

S
 

Keith Thompson

Rosario1903 said:
so C cannot be a portable language...
or only portable with your meaning of portable

What do you mean by a language (as opposed to a program) being
"portable"?

The C language is defined in such a way that it's possible to write
extremely portable code. Such code merely has to avoid making
any assumptions that are not guaranteed by the language standard
(which is not always easy). It's then up to each implementation
to conform to the standard.

It's also possible to write C code that's *not* portable, code that
makes assumptions that happen to be satisfied for one particular
system but not for all conforming systems. That's actually
one of C's greatest strengths; not all code has to be portable.
(Try writing a device driver in Scheme.)

Well-written C code is as portable as possible, and keeps the
non-portable subset isolated.
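
For example (just a sketch; GetCurrentProcessId/getpid stand in for
whatever system-specific facility you actually need), the non-portable
part can be confined to one small function behind a portable interface:

/* portable interface used by the rest of the program */
unsigned long current_pid(void);

/* the non-portable corner, isolated in one place per platform */
#if defined(_WIN32)
#include <windows.h>
unsigned long current_pid(void) { return (unsigned long)GetCurrentProcessId(); }
#else
#include <unistd.h>
unsigned long current_pid(void) { return (unsigned long)getpid(); }
#endif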
 

Keith Thompson

Stephen Sprunk said:
Finally, the valid values of an x86-64 memory address are:

kernel: 0xffff8000,00000000 to 0xffffffff,ffffffff [-128TB to 0)
user: 0x00000000,00000000 to 0x00007fff,ffffffff [0 to +128TB)

which leads to an obvious reimagining of x86 addresses:

kernel: 0x80000000 to 0xffffffff [-2GB to 0)
user: 0x00000000 to 0x7fffffff [0 to +2GB)

where is the problem?
user mem and code in: (0x00000000, 0x7fffffff]
kernel code in: [0x80000000, 0xffffffff]
reserved data in: {0}

0x00000000 is part of user space. C happens to assign a special meaning
to that address (NULL), but the CPU architecture does not.

No, C assigns no particular meaning to the address 0x00000000. It does
assign a special meaning to the result of converting a constant 0 to a
pointer type (a null pointer), but the result of that conversion does
not necessarily have an all-bits-zero representation.

Most C implementations do use an all-bits-zero representation for null
pointers.
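
A small illustration of the distinction (on an implementation where
all-bits-zero is not a valid pointer representation, even examining q
below is not guaranteed to behave):

#include <stdio.h>
#include <string.h>

int main(void)
{
    int *p = 0;   /* converting the constant 0 yields a null pointer,
                     whatever bit pattern the implementation uses */

    int *q;
    memset(&q, 0, sizeof q);   /* all-bits-zero representation; the
                                  standard does not promise this is a
                                  null pointer, though it usually is */

    printf("p == NULL: %d\n", p == NULL);  /* always 1 */
    printf("q == NULL: %d\n", q == NULL);  /* 1 on typical systems,
                                              but not guaranteed */
    return 0;
}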
 

glen herrmannsfeldt

Tim Rentsch said:
Stephen Sprunk <[email protected]> writes:

(snip, I wrote)
(snip)
x86 is similar, [snip stuff about page tables]
Most importantly, though, all pointers must be sign-extended,
rather than zero-extended, when stored in a 64-bit register. You
could view the result as unsigned, but that is counter-intuitive
and results in an address space with an enormous hole in the
middle. OTOH, if you view them as signed, the result is a single
block of memory centered on zero, with user space as positive and
kernel space as negative. Sign extension also has important
implications for code that must work in both x86 and x86-64
modes, e.g. an OS kernel--not coincidentally the only code that
should be working with negative pointers anyway. [snip unrelated]
IMO it is more natural to think of kernel memory and user memory
as occupying separate address spaces rather than being part of
one combined positive/negative blob;

I agree, but ... being able to store a pointer in 32 bits
when you don't need to address more is reasonable. Having parts
of the OS addressable, for example shared libraries usable by all,
is also convenient.

Sign extending a 32 bit integer allows it to specify the lower 2GB
and top 2GB of a 64 bit address space.

That makes more sense than saying that it is a 32 bit number,
except the high bit has an unusual positive place value.

-- glen
 

glen herrmannsfeldt

Malcolm McLean said:
On Sunday, August 4, 2013 3:17:42 PM UTC+1, Stephen Sprunk wrote: (snip)
And of course neither was right. There's no fixed co-ordinate system
through which the planets move. So Galileo's view that the planets orbit
the Sun, and Pope Urban's view that they describe a rather more complicated
path around the Earth, are both equally correct.

Well, one is a much better approximation to an inertial reference
frame than the other.

But there were two things. One, the planets (almost) orbiting
the sun instead of the earth, and the other that the orbits were
elliptical instead of circular. Kepler determined that the
orbits were elliptical, and Newton showed that his gravitation
equation would generate elliptical orbits.

-- glen
 

glen herrmannsfeldt

Keith Thompson said:

The x86, for x >= 80286, hardware does assign special meaning
to segment selector zero. You can't just load random values
into segment registers, so the hardware special-cases zero.

But yes, offset zero is not special.
No, C assigns no particular meaning to the address 0x00000000. It does
assign a special meaning to the result of converting a constant 0 to a
pointer type (a null pointer), but the result of that conversion does
not necessarily have an all-bits-zero representation.

It is convenient to have it zero, simplifying the conversion.
Most C implementations do use an all-bits-zero representation for null
pointers.

-- glen
 

Keith Thompson

glen herrmannsfeldt said:
It is convenient to have it zero, simplifying the conversion.

No argument there.

On the other hand, I've worked on a system (not C) that used address 1
for null pointers, which made null pointer checks much cheaper than if
null pointers were address 0.
 

Malcolm McLean

On the other hand, I've worked on a system (not C) that used address 1
for null pointers, which made null pointer checks much cheaper than if
null pointers were address 0.
Normally it's the other way round. Tests for zero are cheaper than tests for
equality to an arbitrary number.
 

James Kuyper

Well, one is a much better approximation to an inertial reference
frame than the other.

But there were two things. One, the planets (almost) orbitting
the sun instead of the earth, and the other that orbits were
elliptical instead of circular. Kepler determined that the
orbits were elliptical, and Newton showed that his graviation
equation would generate elliptical orbits.

Long before Copernicus' time, the orbits of the planets were known with
sufficient precision to rule out circles. You can approximate an ellipse
with arbitrarily high precision with a sufficiently large number of
epicycles, which is what they did. They had sufficiently precise
measurements to require several levels of epicycles.

It's an important insight that the shape of the orbit is an ellipse, and
even more, to derive how the speed of the planet varies as it moves
around its orbit. However, no real orbit is exactly a Keplerian
ellipse, any more than any real orbit is an exact fit to a finite number
of epicycles. For real orbits, you have to consider the fact that
there are more than two massive bodies involved, and that the bodies are,
in general, not exactly spherically symmetrical. There are no exact
analytic solutions for the more general cases, though perturbation
analysis can be used to analytically derive low-order corrections. The
high-accuracy orbital predictions are done numerically, rather than
analytically. Of course, the data available at that time was
insufficiently accurate to measure any of those corrections, but the
need for them was implicit in the theory of Gravity.

That's the key difference between the epicycle approach and the
gravitational one. The theory of gravity can be used to determine what
the corrections to the simple approximations must be. The epicycle
approach is simply a curve fitting exercise - the more epicycles you
use, the better the fit to the curve, but there's no underlying theory
to predict the size, location, or period of the next epicycle.
 

Keith Thompson

Malcolm McLean said:
Normally it's the other way round. Tests for zero are cheaper than tests for
equality to an arbitrary number.

An attempt to dereference a null pointer would die with a bus error,
because address 1 is misaligned. (That wouldn't have been the case
for byte pointers, which were uncommon on the system in question.)
 

Rosario1903

What do you mean by a language (as opposed to a program) being
"portable"?

to be really portable, a computer language has to have
all of its functions act the same...

for example, take the unsigned multiplication operation *: AxA->A,
where A ⊂ N [N is the set of natural numbers 0, 1, 2, 3, 4, etc.]:
A has to have the same elements on every PC where that language would
run, and * has to associate the same elements

for example, if A=[0, 0xFFFFFFFF], then 0xFFFF * 0xFFFFF has to be the
same element of A on all machines (even if it overflows),
or be undefined on all machines [stop the program on all machines]
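
(in C terms, a minimal sketch of the requirement above, assuming
<stdint.h> provides uint32_t:)

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* Arithmetic stored in uint32_t is reduced modulo 2^32, so this
     * product is the same on every implementation that has the type. */
    uint32_t a = UINT32_C(0xFFFF);
    uint32_t b = UINT32_C(0xFFFFF);
    uint32_t product = a * b;   /* 0xFFFEF0001 mod 2^32 = 0xFFEF0001 */

    printf("0x%08" PRIX32 "\n", product);   /* prints 0xFFEF0001 everywhere */
    return 0;
}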

this implies that all of the base types of the language have to have
the same size on all machines too

there is no need for "byte code" portability, or for the same machine...
only mathematics
The C language is defined in such a way that it's possible to write
extremely portable code. Such code merely has to avoid making
any assumptions that are not guaranteed by the language standard
(which is not always easy). It's then up to each implementation
to conform to the standard.

if C were portable by my definition above, it would not have
undefined behaviour
the UB cases are the regions of the language where functions are not
defined by the standard, i.e. do not have the same result for some arguments
this could also be the answer for log(0) [what to return on error]
 

Rosario1903

to be really portable, a computer language has to have
all of its functions act the same...

yes, malloc() cannot give the same result even on the same PC...
so the same would go for all functions like malloc(): functions that
return memory, free memory, or functions like these
 

Eric Sosman

[...]
What do you mean by a language (as opposed to a program) being
"portable"?

to be really portable, a computer language has to have
all of its functions act the same...

... with two consequences: (1) the language will not be
available on machines where achieving the mandated result is
more trouble than it's worth, and (2) the language will self-
obsolesce as soon as new machines offer capabilities outside
the limits of what the language's original machine offered.

Examples of (1): If C's semantics had been defined so as
to require the results given by its original machine, there
could never have been C implementations for three of the four
systems mentioned in the original K&R book (Honeywell 6000:
incompatible `char' range; IBM S/370 and InterData 8/32:
incompatible floating-point). That is, C would have been
available on the DEC PDP-11 and, er, and, um, nothing else
at the time, perhaps VAX later (if anyone had cared).

Examples of (2): If C's semantics blah blah blah, C would
have been abandoned as soon as IEEE floating-point became
available, because nobody would have been content to stick
with PDP-11 floating-point.
[...]
if C were portable by my definition above, it would not have
undefined behaviour
the UB cases are the regions of the language where functions are not
defined by the standard, i.e. do not have the same result for some arguments
this could also be the answer for log(0) [what to return on error]

A more recent language made a pretty serious attempt to
produce exactly the same result on all machines; "write once,
run anywhere" was its slogan. Its designers had to abandon
that goal almost immediately, or else "anywhere" would have
been limited to "any SPARC." Moral: What you seek is *much*
more difficult than you realize.
 

Malcolm McLean

On 08/05/2013 12:47 AM, glen herrmannsfeldt wrote:


That's the key difference between the epicycle approach and the
gravitational one. The theory of gravity can be used to determine what
the corrections to the simple approximations must be. The epicycle
approach is simply a curve fitting exercise - the more epicycles you
use, the better the fit to the curve, but there's no underlying theory
to predict the size, location, or period of the next epicycle.
Yes, but the epicycle theory can be satisfied by having angels pushing
round the planets. Since you can get 42 angels to a pinhead, there are
plenty of them to power as many epicycles as you could reasonably want.

Gravity posits a force which acts at a distance, through no medium, and
which presumably every atom shares with every other atom in a reciprocal
arrangement. So you'd even run out of angels. It's also supposedly invisible,
intangible, inaudible, eternal, somehow held constant from one moment to
the next, and counterbalanced by an eternal miracle to prevent the whole
universe coalescing to a single blob.
 

glen herrmannsfeldt

(snip)
(snip, I wrote)
There is an interesting book called "Feynman's lost lecture."

Feynman once gave a special lecture (not part of the regular
class series) on Newton's derivation of elliptical orbits.
While Newton may have done it himself using Calculus, his
explanation at the time used conic sections (a popular subject
at the time). Feynman rederived Newton's explanation in his
lecture. Unlike all the other lectures, there was no transcript,
only his (very abbreviated) notes. Goodstein, then, rederived
Feynman's rederivation of Newton and wrote a book about it.
Long before Copernicus' time, the orbits of the planets were known with
sufficient precision to rule out circles. You can approximate an ellipse
with arbitrarily high precision with a sufficiently large number of
epicycles, which is what they did. They had sufficiently precise
measurements to require several levels of epicycles.

I don't remember by now how well it was known, and when. My first
thought on learning about epicycles was that the first one should
obviously come from the planets (approximately) orbiting the sun
instead of the earth. If the orbits were otherwise circular,
it should have been obvious.

Except for (the now non-planet) Pluto, though, most are darn close
to circles.

Also, to get even farther off subject, note that with all the
measurements that Kepler and such made, they had no idea of the
actual distance involved. All the measurements were relative.
The recent transit of Venus revived discussion on the way the
first measurement of the astronomical unit was done.
It's an important insight that the shape of the orbit is an
ellipse, and even more, to derive how the speed of the planet
varies as it moves around it's orbit. However, no real orbit
is exactly a Keplerian ellipse, any more than any real orbit
is an exact fit to a finite number of epicycles.

And if the study of conic sections hadn't been popular at the time,
he might not have figured that one out. If the relation between
radius and orbit had been a power like 17/9, he might not have
figured that one out, either.
For real orbits, you have to consider the fact that there's
more than two massive bodies involved, and that the bodies
are, in general, not exactly spherically symmetrical.
There are no exact analytic solutions for the more general
cases, though perturbation analysis can be used to analytically
derive low-order corrections.

Fortunately, it is close enough most of the time.
The high-accuracy orbital predictions are done numerically,
rather than analytically. Of course, the data available at
that time was insufficiently accurate to measure any of
those corrections, but the need for them was implicit in
the theory of Gravity.

Yes, and the important part of Newton's theory was the universal
part. That the same law worked not just for falling apples, but
moons and planets.

But I always liked Galileo's argument for why acceleration should
be independent of mass, unlike what was commonly believed at the time.
If you take two large masses and tie them together with a thin
thread (so it is now one mass twice as big) should they then fall
twice as fast? If you still aren't convinced, use a thinner thread.
That's the key difference between the epicycle approach and the
gravitational one. The theory of gravity can be used to determine
what the corrections to the simple approximations must be.
The epicycle approach is simply a curve fitting exercise - the
more epicycles you use, the better the fit to the curve, but
there's no underlying theory to predict the size, location,
or period of the next epicycle.

Well, there is also that the circle was the perfect shape, and
planets should have perfect orbits. Why would god give them any
less than perfect shape? In addition, note that Kepler thought
that the orbital radius should be determined by the radii of
inscribed and circumscribed spheres on the five regular polyhedra.

-- glen
 

James Kuyper

Yes, but the epicycle theory can be satisfied by having angels pushing
round the planets. Since you can get 42 angels to a pinhead, there are
plenty of them to power as many epicycles as you could reasonably want.

Gravity posits a force which acts at a distance, through no medium, and
which presumably every atom shares with every other atom in a reciprocal
arrangement. So you'd even run out of angels. It's also supposedly invisible,
intangible, inaudible, eternal, somehow held constant from one moment to
the next, and counterbalanced by an eternal miracle to prevent the whole
universe coalescing to a single blob.

:)
A meaningful response would be too far off-topic, so I'm not going to
attempt it.
 

Rosario1903

A more recent language made a pretty serious attempt to
produce exactly the same result on all machines; "write once,
run anywhere" was its slogan.
good

Its designers had to abandon
that goal almost immediately, or else "anywhere" would have
been limited to "any SPARC." Moral: What you seek is *much*
more difficult than you realize.

i see only one difficulty: some computers have a char that is not 8 bits...

all the rest can be done in software, not in hardware
will it be slow?
[from operations on 8-bit unsigned values, one can define operations on
16-bit, 32-bit etc. unsigned values;
i don't know about signed, but i think it would be the same]

it can be difficult to emulate floating-point hardware too,
but fixed-point software could be easier...
 
