sizeof pointers

James Brown

All,

I have a quick question regarding the size of pointer-types:

I believe that sizeof(char *) may not necessarily be the same as
sizeof(int *)? But what about multiple levels of pointers to the same type?
Would sizeof(char **) be the same as sizeof(char *)? And if it is, would the
internal representation be the same in both cases?

Background on this: I'm writing a simple IDL compiler that produces 'C'
code, and am trying to get array/pointer marshalling to be 'safe' across
architectures. Any good literature/references on the subject (from a C
perspective) would be appreciated.

thanks,
James
 
Guest

James said:
All,

I have a quick question regarding the size of pointer-types:

I believe that the sizeof(char *) may not necessarily be the same as
sizeof(int *) ? But how about multiple levels of pointers to the same type?
Would sizeof(char **) be the same as sizeof(char *)? And if it is, would the
internal representation be the same in both cases?

They could be the same, but this is not guaranteed. Even for systems
where the size and representation are the same, compilers'
optimisations may cause your code to not function the way you want.

Background on this: I'm writing a simple IDL compiler that produces 'C'
code, and am trying to get array/pointer marshalling to be 'safe' across
architectures. Any good literature/references on the subject (from a C
perspective) would be appreciated.

If you could be more specific about what you're trying to do,
preferably using a short code snippet, someone may be able to suggest
a way to avoid the issue.
 
Keith Thompson

James Brown said:
I have a quick question regarding the size of pointer-types:

I believe that the sizeof(char *) may not necessarily be the same as
sizeof(int *) ? But how about multiple levels of pointers to the same type?
Would sizeof(char **) be the same as sizeof(char *)? And if it is, would the
internal representation be the same in both cases?

The standard guarantees, for historical reasons, that void*, char*,
signed char*, and unsigned char* have the same representation and
alignment. It also guarantees, if I recall correctly, that all
pointers to structs and unions have the same representation. Beyond
that, all bets are off.

Background on this: I'm writing a simple IDL compiler that produces 'C'
code, and am trying to get array/pointer marshalling to be 'safe' across
architectures. Any good literature/references on the subject (from a C
perspective) would be appreciated.

Any value of any pointer-to-object type can be converted to void* and
back again, yielding the same value. You can probably make use of
that.
 
Malcolm McLean

James Brown said:
I have a quick question regarding the size of pointer-types:

I believe that the sizeof(char *) may not necessarily be the same as
sizeof(int *) ? But how about multiple levels of pointers to the same
type? Would sizeof(char **) be the same as sizeof(char *)? And if it is,
would the internal representation be the same in both cases?

Background on this: I'm writing a simple IDL compiler that produces 'C'
code, and am trying to get array/pointer marshalling to be 'safe' across
architectures. Any good literature/references on the subject (from a C
perspective) would be appreciated.

A char ** is not very similar to a char *.

A char * points to a list of characters, a char ** to a list of pointers. If
the hardware has 64-bit bytes (that is, its smallest addressable unit is 64
bits), and 8-bit chars are supported by bit twiddling, then char * values
will need extra bits to represent the offset within the word. ints will,
naturally, be 64 bits on such a system, so an int * is a raw address.

However, a char ** is probably just a pointer to an internal structure: the
character pointers. It would be surprising, though not forbidden, for it to
be different from an int **.
 
Eric Sosman

Keith said:
The standard guarantees, for historical reasons, that void*, char*,
signed char*, and unsigned char* have the same representation and
alignment. It also guarantees, if I recall correctly, that all
pointers to structs and unions have the same representation. Beyond
that, all bets are off.

Small clarification: All pointers to all kinds of structs
have the same representation; we'll call it representation S.
Also, all pointers to all kinds of unions have the same
representation; we'll call it representation U. It is possible,
in theory at least, that S and U could be different.
 
Keith Thompson

Eric Sosman said:
Small clarification: All pointers to all kinds of structs
have the same representation; we'll call it representation S.
Also, all pointers to all kinds of unions have the same
representation; we'll call it representation U. It is possible,
in theory at least, that S and U could be different.

Quite correct, thank you.
 
CBFalconer

James said:
I have a quick question regarding the size of pointer-types:

I believe that the sizeof(char *) may not necessarily be the same
as sizeof(int *) ? But how about multiple levels of pointers to
the same type? Would sizeof(char **) be the same as sizeof(char *)?
And if it is, would the internal representation be the same in both
cases?

No. What you are guaranteed is that any object pointer can be converted
to a void* and back again TO THE ORIGINAL TYPE. You are also
guaranteed that char* and void* have the same representation.
char** is none of these. Neither is int*. Your shortcuts may work
on many machines, but are not guaranteed, and not portable.
 
christian.bau

I believe that the sizeof(char *) may not necessarily be the same as
sizeof(int *) ? But how about multiple levels of pointers to the same type?
Would sizeof(char **) be the same as sizeof(char *)? And if it is, would the
internal representation be the same in both cases?

The usual reason why char* would have different size and
representation than say int* is that usually there are many more char
objects addressable than int objects. Say you had an architecture that
supports accessing 2^32 thirty-two bit quantities, and the compiler
decided that int = 32 bit, then int* would have to support 2^32
different values and fit into 32 bit, but char* would have to support
4 * 2^32 = 2^34 different values and could not fit into 32 bit; it
would likely be made 64 bit.

In such an implementation, a pointer to int* would have to support
2^32 different values. A pointer to char* would have to support 2^31
different values, same as a pointer to long long, so it would easily
fit into 32 bit. So if sizeof (char *) != sizeof (int *), then it is
much more likely that sizeof (char **) == sizeof (int *), and not
sizeof (char **) == sizeof (char *).
 
J

James Brown

Malcolm McLean said:
A char ** is not very similar to a char *.

A char * points to a list of characters, a char ** to a list of pointers.
If the hardware has 64-bit bytes (that is, its smallest addressable unit is
64 bits), and 8-bit chars are supported by bit twiddling, then char * values
will need extra bits to represent the offset within the word. ints will,
naturally, be 64 bits on such a system, so an int * is a raw address.

However, a char ** is probably just a pointer to an internal structure: the
character pointers. It would be surprising, though not forbidden, for it to
be different from an int **.

Understood, thank you for the clarification.

James
 
Cesar Rabak

christian.bau wrote:
The usual reason why char* would have different size and
representation than say int* is that usually there are many more char
objects addressable than int objects. Say you had an architecture that
supports accessing 2^32 thirty-two bit quantities, and the compiler
decided that int = 32 bit, then int* would have to support 2^32
different values and fit into 32 bit, but char* would have to support
4 * 2^32 = 2^34 different values and could not fit into 32 bit; it
would likely be made 64 bit.

I think this is an /unusual/ reason. Pointers are intended to store
memory addresses. If your reasoning were applied in an actual
implementation, how would it compute the pointer difference (for the
sake of argument, let's make the two positions contiguous, and stick to a
32-bit architecture):

a) of two pointers to char
b) of two pointers to int

In such an implementation, a pointer to int* would have to support
2^32 different values. A pointer to char* would have to support 2^31
different values, same as a pointer to long long, so it would easily
fit into 32 bit. So if sizeof (char *) != sizeof (int *), then it is
much more likely that sizeof (char **) == sizeof (int *), and not
sizeof (char **) == sizeof (char *).

Do you know of an _existing_ architecture plus C compiler implementation?
 
James Brown

Harald van Dijk said:
They could be the same, but this is not guaranteed. Even for systems
where the size and representation are the same, compilers'
optimisations may cause your code to not function the way you want.


If you could be more specific about what you're trying to do,
preferably using a short code snippet, someone may be able to suggest
a way to avoid the issue.

Thank you for your interest. Your comment about 'optimization' is appreciated
too. I will try to explain what I am attempting, but I have no actual
'code' yet; I am still in the 'is this possible' stage, hence my original
question.

I guess what I am trying to do is 'flatten' arbitrary types and maintain
their type information 'out of band'. For example:

1. A (char *) pointer (to an array of characters) would be represented
as-is.
2. An array of pointers-to-char ( char *argv[] for example) would have each
string in the array 'flattened' in turn.
3. A three-level pointer (char ***) would be treated similarly.

My current intention is to write a function that takes a generic pointer
type (void*, I guess), along with an array of type information that describes
each level of indirection in terms of its size and length. This generic
function would then flatten the specified array/pointer/type/whatever
according to the type information. There might be one function per 'base
type', i.e. one that handled char, char*, char**, one that handled int, int*,
etc.

For example (note that this is not a complete, compilable fragment):

enum TYPE { NONE, ARRAY, POINTER };

struct TYPEINFO
{
    enum TYPE type;
    int elements;
};

void marshall(struct TYPEINFO *ti, void *ptr);

int main(int argc, char *argv[])
{
    /* describe the argv[] array for marshalling purposes */
    struct TYPEINFO ti[] = { { ARRAY, argc }, { POINTER, -1 }, { NONE } };

    marshall(ti, argv);
    return 0;
}

So I 'just' need to implement the marshall function. I would also have a
corresponding 'unmarshall' function that would take the same
type-information and reconstruct the original information (using potentially
different pointer values of course):

char **argv = unmarshall(ti);

I am coming to believe that it will not be possible, at least in a
language-safe manner.

James
 
James Brown

christian.bau said:
The usual reason why char* would have different size and
representation than say int* is that usually there are many more char
objects addressable than int objects. Say you had an architecture that
supports accessing 2^32 thirty-two bit quantities, and the compiler
decided that int = 32 bit, then int* would have to support 2^32
different values and fit into 32 bit, but char* would have to support
4 * 2^32 = 2^34 different values and could not fit into 32 bit; it
would likely be made 64 bit.

In such an implementation, a pointer to int* would have to support
2^32 different values. A pointer to char* would have to support 2^31
different values, same as a pointer to long long, so it would easily
fit into 32 bit. So if sizeof (char *) != sizeof (int *), then it is
much more likely that sizeof (char **) == sizeof (int *), and not
sizeof (char **) == sizeof (char *).

Thank you, that is a nice example that has helped me to understand. I think
my biggest problem is that most of my experience is on 32-bit platforms.
Anyway, thanks for your time.

James
 
James Brown

CBFalconer said:
No. What you are guaranteed is that any object pointer can be converted
to a void* and back again TO THE ORIGINAL TYPE. You are also
guaranteed that char* and void* have the same representation.
char** is none of these. Neither is int*. Your shortcuts may work
on many machines, but are not guaranteed, and not portable.

This alone (the void * conversion) is valuable, thank you.
 
CBFalconer

Cesar said:
... snip ...

I think this is an /unusual/ reason. Pointers are intended to store
memory addresses. If your reasoning were applied in an actual
implementation, how would it compute the pointer difference (for
the sake of argument, let's make the two positions contiguous, and
stick to a 32-bit architecture):

This is elementary. There is no such thing as a pointer difference
unless both pointers point into (or just past the end of) the same
object. There is no limit on the size of a pointer. In particular, it
need not fit into any integer type. The only comparison operators
available for arbitrary pointers are == and !=.

 
christian.bau

Do you know of an _existing_ architecture plus C compiler implementation?

I don't, but then I wouldn't want to predict what architectures will
look like twenty years from now.
 
Guest

James said:
Harald van Dijk said:
They could be the same, but this is not guaranteed. Even for systems
where the size and representation are the same, compilers'
optimisations may cause your code to not function the way you want.


If you could be more specific about what you're trying to do,
preferably using a short code snippet, someone may be able to suggest
a way to avoid the issue.

Thank you for your interest. Your comment about 'optimization' is appreciated
too. I will try to explain what I am attempting, but I have no actual
'code' yet; I am still in the 'is this possible' stage, hence my original
question.

I guess what I am trying to do is 'flatten' arbitrary types and maintain
their type information 'out of band'. For example:

1. A (char *) pointer (to an array of characters) would be represented
as-is.
2. An array of pointers-to-char ( char *argv[] for example) would have each
string in the array 'flattened' in turn.
3. A three-level pointer (char ***) would be treated similarly.

My current intention is to write a function that takes a generic pointer
type (void*, I guess), along with an array of type information that describes
each level of indirection in terms of its size and length. This generic
function would then flatten the specified array/pointer/type/whatever
according to the type information. There might be one function per 'base
type', i.e. one that handled char, char*, char**, one that handled int, int*,
etc.

Okay, in that case, sorry: it may be possible if you can assume some
system specifics, but in standard C, as you suspect, it's not
possible, at least not for arbitrary depth. Since compilers are
allowed to give char *, char **, char ***, char ****, char *****, etc.
all different representations, portable C code cannot be written that
handles all of them. What you can do is write one version for char *,
one for char **, one for char ***, etc., but that would become a
maintenance nightmare, and you might always discover the need for one
more level of indirection too late.

<OT> If you are not limited to C, other languages (including C++)
support type-generic user functions, which may make what you're
looking for possible. </OT>
 
Keith Thompson

Cesar Rabak said:
christian.bau wrote:

I think this is an /unusual/ reason. Pointers are intended to store
memory addresses.

Pointers are intended to point to objects (or to functions, but I
think we're talking about object pointers), and to follow the
semantics defined in the C standard. On many systems, this is most
easily done by using simple memory addresses. On others, it isn't.

If your reasoning were applied in an actual implementation, how would
it compute the pointer difference (for the sake of argument, let's make
the two positions contiguous, and stick to a 32-bit architecture):

a) of two pointers to char
b) of two pointers to int

Pointer difference (which is defined only for pointers into, or just
past the end of, the same object) is implemented however it needs to
be implemented to yield the correct results.

Do you know of an _existing_ architecture plus C compiler implementation?

The IBM AS/400 is often mentioned as an example of an architecture
that has a conforming C implementation while violating a lot of the
usual assumptions about how pointers work. I don't know the details
off the top of my head.

Another example is the various models of Cray vector machines. A
memory address can only address 64-bit words, but the C compiler has
CHAR_BIT==8 (the OS is Unix). An int* pointer is just a 64-bit memory
address. A char* or void* pointer is a memory address with an
additional 3-bit offset stored in the otherwise unused high-order 3
bits of the word. The compiler has to emit additional code to
implement arithmetic on char* pointers. All pointers are the same
size, but only because there happen to be unused bits available; it's
easy to imagine a similar system that would need to make char* bigger
than int*.
 
Keith Thompson

James Brown said:
I guess what I am trying to do is 'flatten' arbitrary types and
maintain their type information 'out of band'. For example:

1. A (char *) pointer (to an array of characters) would be represented
as-is.
2. An array of pointers-to-char ( char *argv[] for example) would have
each string in the array 'flattened' in turn.
3. A three-level pointer (char ***) would be treated similarly.

A char** pointer is a pointer to a pointer to char. That's not enough
information to determine what kind of data it points to. In the case
of the argv parameter to main(), it happens to point to the first
element of an array of char*, each of which either is a null pointer
or points to the first element of a null-terminated string. But
that's only one possibility. If you want to flatten the data
structure, you need to know what the data structure is, and that
information may not be available from the C source code (at least not
without a lot of extra analysis of what the code does, which I suspect
would be beyond the scope of your project).

One approach might be to manually annotate the C declarations with
information about how they're used.
 
Cesar Rabak

Keith Thompson wrote:
The IBM AS/400 is often mentioned as an example of an architecture
that has a conforming C implementation while violating a lot of the
usual assumptions about how pointers work. I don't know the details
off the top of my head.

OK

Another example is the various models of Cray vector machines. A
memory address can only address 64-bit words, but the C compiler has
CHAR_BIT==8 (the OS is Unix). An int* pointer is just a 64-bit memory
address. A char* or void* pointer is a memory address with an
additional 3-bit offset stored in the otherwise unused high-order 3
bits of the word. The compiler has to emit additional code to
implement arithmetic on char* pointers. All pointers are the same
size, but only because there happen to be unused bits available; it's
easy to imagine a similar system that would need to make char* bigger
than int*.

I cannot grok this: memory addresses are 64 bit words. What are the
"unused bits available"?
 
Keith Thompson

Cesar Rabak said:
Keith Thompson wrote: [...]
Another example is the various models of Cray vector machines. A
memory address can only address 64-bit words, but the C compiler has
CHAR_BIT==8 (the OS is Unix). An int* pointer is just a 64-bit memory
address. A char* or void* pointer is a memory address with an
additional 3-bit offset stored in the otherwise unused high-order 3
bits of the word. The compiler has to emit additional code to
implement arithmetic on char* pointers. All pointers are the same
size, but only because there happen to be unused bits available; it's
easy to imagine a similar system that would need to make char* bigger
than int*.

I cannot grok this: memory addresses are 64 bit words. What are the
"unused bits available"?

A memory address is 64 bits, but no system actually has 16 exawords
(2**64 words) of memory, and the hardware probably wouldn't be capable
of addressing it even if it existed. The high-order bits of an
address are always going to be zero, and thus are available for other
purposes.

I don't know what would happen if you set the high-order bits to
non-zero and then attempted to use it as a word pointer, but any
attempt to do so would invoke undefined behavior anyway.
 