Sizes of pointers

  • Thread starter James Harris (es)

Herbert Rosenau

On 06.08.2013 15:06, Stephen Sprunk wrote:
I'm pretty sure he was referring to Java, which originally mandated
SPARC behavior but soon relaxed that to allow x86 behavior as well.
Java may indeed be "write once, run anywhere", but you may not always
get the same results. Oops. Rosario1903 must be so disappointed.

Java was never designed to be compatible between two OSes, even on exactly
the same hardware.

I have seen that failure myself: one big Java application written under
32-bit OS/2 segfaulted on Windows NT on the same machine, with both
running exactly the same Java version.

I've written applications in C that run on OS/2, SINIX, HP-UX, Windows and
others without changing a single line. Impossible with Java!
 

Keith Thompson

Herbert Rosenau said:
On 06.08.2013 15:06, Stephen Sprunk wrote:

Java was never designed to be compatible between two OSes, even on exactly
the same hardware.

This is off topic, but my understanding is that Java was *designed* to
be compatible across different operating systems. Whether it actually
met that design goal is another question.
I've written applications in C that run on OS/2, SINIX, HP-UX, Windows and
others without changing a single line. Impossible with Java!

I suppose that depends on the application.
 

Malcolm McLean

This is off topic, but my understanding is that Java was *designed* to
be compatible across different operating systems. Whether it actually
met that design goal is another question.
That's what I understood too.
The idea was that a program would have exactly the same output, regardless
of the OS. That wasn't possible in an absolute sense, because you could
tell whether you were on Windows or Unix by the path separator, so a
malicious programmer could write code deliberately designed to break on
one OS. But that was only a minor caveat.
Then floating-point conformance had to be abandoned, for efficiency reasons.
Again this didn't matter much in the real world, but it did mean that
sloppily written code might produce different rounding artifacts.
 

赟 张

On Tuesday, 30 July 2013 at 6:02:45 PM UTC+8, James Harris wrote:
Am I right that there is no guarantee in C that pointers to different types
can be stored in one another except that a pointer of any type can be stored
in a void * and later recovered without loss of info?

What is the rationale for distinguishing the convertibility of pointers
based on the type of the object they point to? Are there machines on which a
pointer to a char would have a different *size* from a pointer to a float,
say, or that a pointer representation might be changed when converted? I can
imagine that the two addresses may need different alignments (perhaps floats
needing 4-byte alignment, chars needing only 1-byte alignment). This would
mean that the rules for the *lower* bits of a pointer could be different.
This would then be nothing to do with pointer size. The rule about void
pointers would simply mean that void pointers had no alignment restrictions.
Is that the reason why void pointers can be used in a special way?

On some computers it makes sense to distinguish a pointer's size based not
on the type of the object pointed at but on *where* that object could be
stored. On old hardware such as 16-bit x86 there were near or far pointers.
On modern multi-machine clusters it might make sense to allow pointers to
local memory to be short while pointers to remote memory are longer. On old
and new hardware, then, it's not the type of the object pointed at but the
location of the object pointed at which would determine the requirement for
pointer size.

Some compilers allow the user to specify that all pointers will be short or
all pointers will be long - e.g. the awful old memory models. Wouldn't it be
better for a compiler to choose pointer sizes depending on where the object
to be referred to might be placed? Basically, all pointers would default to
long but where the compiler could prove that a given pointer can be used for
only local references that pointer could be made short.

James

Generally, it is 4 bytes.
 

James Kuyper

On 08/13/2013 03:16 AM, 赟 张 wrote:
[Re: pointers]
Generally, it is 4 bytes.

On many systems, that's true, though it would be stupid to build that
assumption into code intended to be portable. There's a reason why
sizeof() exists - use it.
But this thread is not about such systems - it's about systems where
pointers can have two or more different sizes, or more generally, two or
more different representations.
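
A small sketch of both points, added here for illustration (it is not from
the original post): ask the implementation with sizeof rather than assuming
4, and rely only on the guaranteed round trip through void *.

#include <stdio.h>

int main(void)
{
    float f = 1.0f;

    /* Ask the implementation instead of assuming 4 bytes. */
    printf("sizeof(char *)  = %zu\n", sizeof(char *));
    printf("sizeof(float *) = %zu\n", sizeof(float *));
    printf("sizeof(void *)  = %zu\n", sizeof(void *));

    /* Any object pointer can be stored in a void * and recovered
       without loss of information. */
    void *vp = &f;
    float *fp = vp;
    printf("round trip intact: %d\n", fp == &f);

    return 0;
}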
 

Stephen Sprunk

I'm happy to agree that it's reasonable to describe how x86-64
pointers work using sign extension, etc, if others are willing to
agree that it's equally reasonable to describe how those pointers
work without using sign extension, etc. In short, I think what
you're saying is pretty much the same as what I'm saying... but
I'm not sure that other people in the discussion agree on that.

I deny that it's "equally reasonable" to describe something universally
known as "sign extension" with an invented term that serves only to
remove the term "sign" from an explanation of pointers, especially when
the same documents call that same something "sign extension" when
applied to any other type of value.

Yes, it can be done, but that doesn't make it "equally reasonable".

S
 

Philip Lantz

Tim said:
That has no bearing on my comment. Having separate address
spaces doesn't preclude any variety of representations that
might address memory in either address space.


Yes, _if_ you think of the two address spaces as a combined
positive/negative blob. I prefer to think of those address
spaces as separate.


There's nothing wrong with having that opinion, as long
as you understand that it is nothing more than opinion.


In the following example, assume that the array 'arr' is at address
0xffffffff80001000, in what I would call the negative part of the
address space, which you say is a matter of opinion.

Note that the exact same instruction and encoding are used for the two
examples. In both cases, the 32-bit constant part of the address is
sign-extended as part of address calculation. Is it your contention that
calling -200 negative is also a matter of opinion?


#include <stddef.h>   /* for size_t */

extern char arr[];

void f(char *p)
{
    p[-200] = 0;   // movb -200[rdi], 0   c6 87 38 ff ff ff 00
}

void g(size_t i)
{
    arr[i] = 0;    // movb arr[rdi], 0    c6 87 00 10 00 80 00
}

(I did the instruction selection and encoding by hand; I hope I didn't
make an error. But even if I did, the point remains valid.)

Philip
 

Stephen Sprunk

I don't share this opinion. If the inventors said that some values
were red and other values were green, that doesn't mean the values
necessarily acquire the property of being colored.

Given a suitable definition of color, sure they do.
The reason appeal to authority is not valid, or more accurately not
relevant, for what I'm saying is that the property I'm concerned with
is not legislated by any authority.

We'll have to agree to disagree on that, then, because IMHO inventors
_do_ have the authority to define their inventions as they see fit.
Are you now saying the documentation does NOT describe the
transformation as sign extension?

Some documentation, but not all.
So we have a different authority saying something different?

IMHO, if it contradicts the inventors, it is not a valid authority.
For the property I've been talking about, it doesn't matter whether
the instruction is called Sign Extend, Arithmetic Shift, Replicate
High bit, Adjust Pointer Size, or anything else -- what matters is how
things behave, not what they are called.

You well know that, on a twos-complement system, if the top bit of a
value is replicated when copied to a larger register, that is called
"sign extension". To call it anything else is an blatant attempt to
obfuscate the behavior.

You also know that sign extension (or whatever you want to call it) is
only used on signed values. If a value is unsigned, you would use zero
extension (or whatever you want to call it) instead.
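
By way of illustration (this sketch is mine, not part of the original post),
the same 32-bit bit pattern widens differently depending on whether its type
is signed or unsigned:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int32_t  s = -200;          /* bit pattern 0xFFFFFF38, signed   */
    uint32_t u = 0xFFFFFF38u;   /* same bit pattern, unsigned       */

    int64_t  sw = s;   /* widened by sign extension */
    uint64_t uw = u;   /* widened by zero extension */

    printf("sign-extended: 0x%016" PRIX64 "\n", (uint64_t)sw);  /* FFFFFFFFFFFFFF38 */
    printf("zero-extended: 0x%016" PRIX64 "\n", uw);            /* 00000000FFFFFF38 */
    return 0;
}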
Is that so? Just what physical equipment do you have in mind to
measure the simplicity, elegance, and self-consistency of a point of
view, to back up your claim?

Compare these two explanations:

1. Pointers are signed.

2. Pointers are unsigned, but when copying them to a larger register you
use sign extension, though it's not called sign extension, rather than
zero extension, which is used for unsigned values of other types.

I dare you to come up with _any_ measure of simplicity, elegance or
self-consistency that does not favor the first option.

S
 

Tim Rentsch

Robert Wessel said:
On Sat, 03 Aug 2013 09:35:46 -0700, Tim Rentsch


On 08/01/2013 11:45 AM, Bart van Ingen Schenau wrote:

(snip)
Can you give an example how the DS9000 could make a conversion
between pointers to structs fail, given that the pointers
involved are required to have the same representation and
alignment (and that the intent of that requirement is to allow
for interchangeability)?

Well, the designers of the DS9000 are notorious for ignoring the
intent of the standard; some have even claimed that they go out
of their way to violate the intent.

Some years ago I was wondering about the possibility of generating
JVM code as output of a C compiler. [snip elaboration]

Doesn't anyone bother to look this stuff up? There are a handful
of existing C-to-JVM compilers or translators, the oldest more than
10 years old. Some support a full ANSI C runtime.

https://en.wikipedia.org/wiki/C_to_Java_Virtual_Machine_compilers

AFAIK, all of those take the approach of faking the C program's
address space, which makes the C programs odd little isolated
islands in a Java system.

I don't see the point of your comment. If the implementation is
conforming that's all that should matter. As far as that goes
the above characterization applies in many cases of emulators or
abstract machines, including real hardware - an underlying paging
and cache memory system "fakes" a C program's address space, and
the generated "instructions" are interpreted by a microengine.
There really isn't that much difference between that and running
in a JVM.

Presumably the point of porting some C code to the JVM is so that
it can be used. The approach being discussed very much isolates
the C code from the rest of the environment. You could not
easily have the C code call other objects, or have other objects
call the C code.

Ahh, I see what you're driving at. You want the C code and Java
code to be able to interoperate.

My assumption is different from yours. I assume the point of
compiling C code to a JVM is so the C code can execute in the JVM
environment, which might not otherwise have a readily usable C
compiler. If one wanted C and Java to be able to interoperate, and
do so tolerably well, that really needs some degree of changes to
the language(s), not just the compilers.
 

Tim Rentsch

glen herrmannsfeldt said:
OK, how about something else that people like to disagree on, and
that is bit numbering. While endianness of many processors is
fixed by design, on most the numbering of bits within bytes is
not physically part of the hardware. It is, however, often part
of the documentation for the hardware.

Some people might like a different numbering than that in the
documentation, but doing so will result in confusion. That said,
using different numbering is still confusing when interfacing
between different systems. (I once knew someone using an Intel
UART in an Apple II who, after wiring it up, noticed the
different bit numbering conventions.)

Now, the authority could require a specific notation as part of
the license or other agreement. (SPARC, for example, requires
their trademark to be in bold font.)


If the authority isn't reasonable, then people won't follow it.
(Colored bits might not be reasonable.) If it is reasonable and
consistent, though, it will often be followed.

None of these comments has any bearing on what I was
saying.
 

Tim Rentsch

glen herrmannsfeldt said:
I might even say more reasonable, but less convenient.

Reminds me of the ways PL/I and (newer) Fortran pass arrays.
(being an addressing question, it might be applicable).

Both languages allow one to declare arrays giving lower and
upper bounds for dimensions. For PL/I, when passed to a called
procedure, the bounds information is also passed, both lower and
upper bound.

With Fortran assumed shape, where dimension information is
passed to a called routine, only the extent is passed. (As seen
in the called routine, the origin is 1, by default, or can be
specified other than 1.)

The former seems more natural, but the latter is sometimes more
convenient, especially when rewriting old routines.

They are both reasonable, but I don't know that you can say
equally reasonable. (As with addressing, it is difficult to
measure reasonableness.)

The two situations have nothing to do with each other. In the
first case the scheme being described is fixed but different
descriptions are offered. In the second case two different
schemes are considered (compared, contrasted, etc). Trying
to draw parallels between them is nonsensical.
 

Tim Rentsch

Stephen Sprunk said:
I deny that it's "equally reasonable" to describe something
universally known as "sign extension" with an invented term that
serves only to remove the term "sign" from an explanation of
pointers, especially when the same documents call that same
something "sign extension" when applied to any other type of
value.

Obviously how pointers work on x86-64 is not universally known
as sign extension, as there are numerous examples of documents
that describe it using different terms.

More importantly, the point of my comment was not about the
term "sign extension" but the concept of sign extension. By
analogy, inverting all the bits of a computer word can be seen
either as logical complementation or as negation under a ones
complement representation. Whatever name the operation code
happens to have, we conceptualize the operation in different
ways, depending how we view the spaces of values represented.
Similarly, just because how pointer expansion works happens to
match a mechanism named "sign extension", that doesn't mean we
must necessarily impute the concepts of signed values to the
bits being operated on, and different conceptualizations will
naturally lead to different descriptions.
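
To make the analogy concrete (an added sketch, not from the original post):
flipping every bit of a byte can be read either as logical complementation
or, if the pattern is interpreted as a ones'-complement number, as negation.

#include <stdint.h>
#include <stdio.h>

/* Read an 8-bit pattern as a ones'-complement number. */
static int ones_complement_value(uint8_t bits)
{
    return (bits & 0x80) ? -(int)(uint8_t)~bits : (int)bits;
}

int main(void)
{
    uint8_t x   = 5;
    uint8_t inv = (uint8_t)~x;   /* every bit flipped: 0xFA */

    printf("~0x%02X = 0x%02X (logical complement)\n", (unsigned)x, (unsigned)inv);
    printf("0x%02X read as ones' complement = %d\n",
           (unsigned)inv, ones_complement_value(inv));   /* prints -5 */
    return 0;
}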
Yes, it can be done, but that doesn't make it "equally
reasonable".

I don't see any support for this position other than your
repeated assertion that it must be so.
 

Tim Rentsch

Philip Lantz said:
In the following example, assume that the array 'arr' is at
address 0xffffffff80001000, in what I would call the negative
part of the address space, which you say is a matter of opinion.

If you read the previous posting again, I believe you will see
that my comment is about which view makes more sense, not about
whether believing addresses are signed is an opinion. I don't
think there is any debate about whether addresses are "truly
signed", only about whether it's more convenient to think of them
that way.
#include <stddef.h>   /* for size_t */

extern char arr[];

void f(char *p)
{
    p[-200] = 0;   // movb -200[rdi], 0   c6 87 38 ff ff ff 00
}

void g(size_t i)
{
    arr[i] = 0;    // movb arr[rdi], 0    c6 87 00 10 00 80 00
}

[.. code and text paragraph swapped to help reading flow ..]

Note that the exact same instruction and encoding are used for
the two examples. In both cases, the 32-bit constant part of the
address is sign-extended as part of address calculation. Is it
your contention that calling -200 negative is also a matter of
opinion?


The abstract value -200 is certainly negative; that is true by
definition.

The operand part of the instruction (ie, the bits corresponding
to the '38 ff ff ff' byte values) is certainly not negative.
Neither is it positive. Only numbers are positive or negative;
the operand portion of the instruction is just a collection of
bits, not a number.

The question then becomes what do we consider those bits as
representing? We might consider them as representing a signed
value (ie, -200); or, we might consider them as representing a
large unsigned value, which has the same effect under the rules
of unsigned arithmetic as subtracting 200. The second view is
like what happens in C when we add a negative signed value to an
unsigned value, eg

unsigned u = 1000;
u = -200 + u;

After the assignment u will have the value 800. However, what is
being added to u is not (under the semantic rules of the C
abstract machine) a negative value, but rather a large unsigned
value.
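
(A runnable version of that fragment, added for illustration; it also prints
the large unsigned value that actually gets added under C's usual arithmetic
conversions.)

#include <stdio.h>

int main(void)
{
    unsigned u = 1000;

    /* Under the usual arithmetic conversions, -200 is converted to
       unsigned (UINT_MAX - 199) before the addition takes place. */
    printf("value actually added: %u\n", (unsigned)-200);

    u = -200 + u;
    printf("u = %u\n", u);   /* 800 */
    return 0;
}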

Moreover, note that the instructions and formats being used here
are regular x86 instructions (ie, not specific to x86-64).
Before the x86-64, addresses in x86 were not thought of as
being signed. So there is some precedent for the idea that
what is being represented in the operand field is a large
unsigned value rather than a signed value.
(I did the instruction selection and encoding by hand; I hope I
didn't make an error. But even if I did, the point remains
valid.)

Everyone agrees that the mechanism of address expansion in x86-64
is isomorphic to the mechanism of sign extension. Does this mean
these addresses _must_ be seen as being signed? Of course it
doesn't. Lots of people prefer that view, but certainly there
are equally consistent views that do not treat addresses as being
signed. Which view is more convenient? Obviously different
people have different opinions on that question. All I'm saying
is the last question has no absolute answer.
 

Tim Rentsch

Stephen Sprunk said:
On 08-Aug-13 06:13, Tim Rentsch wrote:
[snip]
Sounds to me like you're the one being dogmatic.

I acknowledged that there are two views. That one of them is
simpler, more elegant and more self-consistent is objectively
true.

Is that so? Just what physical equipment do you have in mind to
measure the simplicity, elegance, and self-consistency of a point of
view, to back up your claim?

Compare these two explanations:

1. Pointers are signed.

2. Pointers are unsigned, but when copying them to a larger register you
use sign extension, though it's not called sign extension, rather than
zero extension, which is used for unsigned values of other types.

I dare you to come up with _any_ measure of simplicity, elegance or
self-consistency that does not favor the first option.

In the first place, it was you who made the claim, not me. If
there is a burden to be borne here, such as coming up with a
measure, it is you who bear it -- if you can't support your own
claim, it is simply an unsupported claim, and no disproof
required. I have made no claims about which view is simpler,
etc, either objectively or otherwise.

In the second place, it is trivial to come up with a measure that
favors the second option over the first, eg, "longer explanations
are more elegant than shorter explanations".

In the third place, what you apparently fail to understand is
that the issue here is not to define a _measure_, but to describe
a means to perform an objective _measurement_, where the result
of the measurement mechanism is both objective and corresponds
to what most people would agree is simple, elegant, and
self-consistent. The claim about which view is simpler, etc,
has meaning only if those words have a previously agreed upon
meaning -- if someone is going to define their meaning after
the fact, any claim about which view is simpler, etc, is a
completely empty statement.
 
