sizeof pointers

Ben Bacarisse

James Brown said:
Harald van Dijk said:
They could be the same, but this is not guaranteed. Even for systems
where the size and representation are the same, compilers'
optimisations may cause your code to not function the way you want.


If you could be more specific about what you're trying to do,
preferably using a short code snippet, someone may be able to suggest
a way to avoid the issue.

Thank you for your interest. Your comment about 'optimization' is
appreciated also. I will try to explain what I am attempting, but I
have no actual 'code' yet - I am still in the 'is this possible'
stage, hence my original question.

I guess what I am trying to do is 'flatten' arbitrary types and
maintain their type information 'out of band'. For example:

1. A (char *) pointer (to an array of characters) would be represented
as-is.
2. An array of pointers-to-char ( char *argv[] for example) would have
each string in the array 'flattened' in turn.
3. A three-level pointer (char ***) would be treated similarly.

My current intention is to write a function that takes a generic
pointer type (void* I guess), along with an array of type-information
that describes each level of indirection in terms of its size and
length. This generic function would then flatten the specified
array/pointer/type/whatever according to the type information. There
might be one function per 'base type' - i.e. one that handled
chars,char*,char**, one that handled int,int* etc.

For example (note this is not a complete/compilable fragment).

enum TYPE { NONE, ARRAY, POINTER };
struct TYPEINFO
{
    enum TYPE type;
    int elements;
};

void marshall(struct TYPEINFO *ti, void *ptr);

int main(int argc, char *argv[])
{
    /* describe the argv[] array for marshalling purposes */
    struct TYPEINFO ti[] = { { ARRAY, argc }, { POINTER, -1 }, { NONE } };

    marshall(ti, argv);
    return 0;
}

If this needs to be portable you will have trouble as has been discussed
elsethread -- but you have a way out. You have an IDL (is the design
under your control?) and you generate C, so you can generate short
stubs that do the work for specific pointer types.

To marshal[1] a char *** that your IDL tells you is an array you can
generate:

void marshal_char_ppp(char ***p, size_t n)
{
    while (n--) marshal_char_pp(*p++);
}

and so on.

I did something much like this for a portable RPC mechanism in the
days before ANSI C (not that the '89 standard would have helped all
that much in this case).
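
As a rough illustration of the stub idea, here is a minimal sketch of what generated code might look like for char *, char ** and char ***. The output buffer type struct wire, the helper wire_put, and the conventions that strings are NUL-terminated and that a char ** is a NULL-terminated array (as with argv) are all assumptions made for this example, not something stated in the thread. Framing information such as counts and lengths is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size output buffer; a real marshaller would grow it. */
struct wire {
    unsigned char buf[256];
    size_t len;
};

/* Append n bytes; returns 0 on success, -1 if the buffer is full. */
static int wire_put(struct wire *w, const void *src, size_t n)
{
    assert(w != NULL && src != NULL);
    if (n > sizeof w->buf - w->len)
        return -1;
    memcpy(w->buf + w->len, src, n);
    w->len += n;
    return 0;
}

/* char *: a NUL-terminated string, written with its terminator. */
static int marshal_char_p(struct wire *w, const char *s)
{
    return wire_put(w, s, strlen(s) + 1);
}

/* char **: assumed to be a NULL-terminated array of strings. */
static int marshal_char_pp(struct wire *w, char **pp)
{
    for (; *pp != NULL; pp++)
        if (marshal_char_p(w, *pp) != 0)
            return -1;
    return 0;
}

/* char ***: n arrays of strings, where n comes from the IDL annotation. */
static int marshal_char_ppp(struct wire *w, char ***ppp, size_t n)
{
    while (n--)
        if (marshal_char_pp(w, *ppp++) != 0)
            return -1;
    return 0;
}
```

Each stub only ever dereferences a pointer of the exact type it was generated for, so nothing depends on char *, char ** and char *** sharing a size or representation.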
 
James Brown

Ben Bacarisse said:
James Brown said:
Harald van Dijk said:
James Brown wrote:
All,

I have a quick question regarding the size of pointer-types:

I believe that the sizeof(char *) may not necessarily be the same as
sizeof(int *)? But how about multiple levels of pointers to the
same type? Would sizeof(char **) be the same as sizeof(char *)? And
if it is, would the internal representation be the same in both cases?

They could be the same, but this is not guaranteed. Even for systems
where the size and representation are the same, compilers'
optimisations may cause your code to not function the way you want.

background on this: I'm writing a simple IDL compiler that produces 'C'
code, and am trying to get array/pointer marshalling to be 'safe' across
architectures. Any good literature/references on the subject (from a C
perspective) would be appreciated.

If you could be more specific about what you're trying to do,
preferably using a short code snippet, someone may be able to suggest
a way to avoid the issue.

[...]

Yes the IDL is under my control - so I like your suggestion of the 'stubs'

thanks,
James
 
James Brown

Keith Thompson said:
James Brown said:
I guess what I am trying to do is 'flatten' arbitrary types and
maintain their type information 'out of band'. For example:

1. A (char *) pointer (to an array of characters) would be represented
as-is.
2. An array of pointers-to-char ( char *argv[] for example) would have
each string in the array 'flattened' in turn.
3. A three-level pointer (char ***) would be treated similarly.

A char** pointer is a pointer to a pointer to char. That's not enough
information to determine what kind of data it points to. In the case
of the argv parameter to main(), it happens to point to the first
element of an array of char*, each of which either is a null pointer
or points to the first element of a null-terminated string. But
that's only one possibility. If you want to flatten the data
structure, you need to know what the data structure is, and that
information may not be available from the C source code (at least not
without a lot of extra analysis of what the code does, which I suspect
would be beyond the scope of your project).

One approach might be to manually annotate the C declarations with
information about how they're used.

I guess I should have mentioned - but yes, the 'C' declarations are all
annotated such that I will always know if things are NULL/single
element/variable length etc.
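
To make the point about char ** concrete, here is a small sketch in which the same pointer type leads to differently shaped data, and only the annotation says which. The enum shape and the function count_strings are made-up names standing in for whatever the annotations actually record.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical annotation: what a given char ** actually points at. */
enum shape {
    SINGLE,       /* points at exactly one char * */
    NULL_TERMED,  /* points at an array ending in a null pointer */
    COUNTED       /* points at an array of n elements */
};

/* How many strings does pp lead to?  The type alone cannot say;
   the answer depends entirely on the annotation. */
static size_t count_strings(char **pp, enum shape sh, size_t n)
{
    size_t k = 0;

    assert(pp != NULL);
    switch (sh) {
    case SINGLE:
        return 1;
    case NULL_TERMED:
        while (pp[k] != NULL)
            k++;
        return k;
    case COUNTED:
        return n;
    }
    return 0;
}
```

For example, count_strings(argv, NULL_TERMED, 0) walks argv up to its terminating null pointer, while count_strings(argv, COUNTED, 2) trusts the caller's count and never looks past it.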
 
Dik T. Winter

>
> A memory address is 64 bits, but no system actually has 16 exawords
> (2**64 words) of memory, and the hardware probably wouldn't be capable
> of addressing it even if it existed. The high-order bits of an
> address are always going to be zero, and thus are available for other
> purposes.
>
> I don't know what would happen if you set the high-order bits to
> non-zero and then attempted to use it as a word pointer, but any
> attempt to do so would invoke undefined behavior anyway.

Within the hardware there is no problem. It uses only 24 or 32
low-order bits (depending on the model).
 
Richard Tobin

Within the hardware there is no problem. It uses only 24 or 32
low-order bits (depending on the model).

This has been true on other architectures, and come back to bite those
who relied on it. Unlikely in this case, admittedly.

-- Richard
 
Keith Thompson

This has been true on other architectures, and come back to bite those
who relied on it. Unlikely in this case, admittedly.

In this case, it's the compiler that depends on it, and the compiler
and hardware are provided by the same vendor.
 
Christopher Layne

CBFalconer said:
No. What you are guaranteed is that any pointer can be converted
to a void* and back again TO THE ORIGINAL TYPE. You are also
guaranteed that char* and void* have the same representation.
char** is none of these. Neither is int*. Your shortcuts may work
on many machines, but are not guaranteed, and not portable.

This actually brings up a question of mine. I have simple stack code that
consists of the following:

/* code */

typedef struct list_s__ list_s;
struct list_s__ {
    list_s *n;
    const void *d;
};

extern list_s *ls_push(list_s *lp, const void *d)
{
    list_s *lpn;

    lpn = ec_malloc(sizeof *lpn);
    lpn->n = lp;
    lpn->d = d;

    return lpn;
}

extern list_s *ls_pop(list_s *lp, void **d)
{
    list_s *lpn;

    if (lp == NULL) {
        if (d) *d = NULL;
        return NULL;
    }

    if (d) *d = (void *)lp->d;
    lpn = lp->n;
    free(lp);

    return lpn;
}

Now the following will always issue a strict aliasing violation w/ gcc if I
cast to void **, rather than void *.

void a(void)
{
    list_s *l = NULL;
    int *ip, i = 47;

    l = ls_push(l, &i);
#ifdef VVP
    /* strict aliasing violation, but still "works" */
    l = ls_pop(l, (void **)&ip);
#else
    /* works, but doesn't feel correct to me */
    l = ls_pop(l, (void *)&ip);
#endif
    return;
}

$ cc -DVVP -Os -W -Wall -pedantic -g3 -c li.c
li.c: In function 'a':
li.c:11: warning: dereferencing type-punned pointer will break strict-aliasing rules
$ cc -Os -W -Wall -pedantic -g3 -c li.c
$

Alternatively, I could just change the code to return the data, on pop, rather
than return the head - but it's an idiom I typically use often (checking head
for NULL in a loop construct, basic list stuff, etc). From past google
searches, people have mentioned that casting to void * "fixes" this
diagnostic. My question is "why?" as it expects an argument of void **, and
casting to void * does seem like monkeying around. I do realize that void **
is not the same as void *, as well.

I realize this may be a gcc-ish question, but perhaps someone has some input.
 
Guest

Christopher said:
[...]

Alternatively, I could just change the code to return the data, on pop, rather
than return the head - but it's an idiom I typically use often (checking head
for NULL in a loop construct, basic list stuff, etc). From past google
searches, people have mentioned that casting to void * "fixes" this
diagnostic. My question is "why?" as it expects an argument of void **, and
casting to void * does seem like monkeying around. I do realize that void **
is not the same as void *, as well.

You are correct. As far as GCC is concerned, casting to void * means
"trust me, I know what I'm doing". It will not disable any
optimisations, it will break exactly as much as a direct cast to void
**. It will simply tell GCC not to let you know about it. To fix the
code without changes to the interface, declare ip as void *, and
convert it to int * when you want to use it.
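
A minimal sketch of that fix follows, assuming the same list interface as in the question. Plain malloc with an abort on failure stands in for ec_malloc, whose definition was not posted, and round_trip is a made-up name for the demonstration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct list_s__ list_s;
struct list_s__ {
    list_s *n;
    const void *d;
};

static list_s *ls_push(list_s *lp, const void *d)
{
    list_s *lpn = malloc(sizeof *lpn);  /* stand-in for ec_malloc */

    if (lpn == NULL)
        abort();
    lpn->n = lp;
    lpn->d = d;
    return lpn;
}

static list_s *ls_pop(list_s *lp, void **d)
{
    list_s *lpn;

    if (lp == NULL) {
        if (d) *d = NULL;
        return NULL;
    }
    if (d) *d = (void *)lp->d;
    lpn = lp->n;
    free(lp);
    return lpn;
}

/* Push &i, pop it back through a genuine void * object, then convert. */
static int round_trip(int i)
{
    list_s *l = NULL;
    void *vp;     /* the object ls_pop writes to really is a void * */
    int *ip;

    l = ls_push(l, &i);
    l = ls_pop(l, &vp);   /* no cast needed: &vp has type void ** */
    assert(l == NULL);    /* the list held one node, now freed */
    ip = vp;              /* void * converts to int * implicitly */
    return *ip;
}
```

Because ls_pop stores through a void ** that really does point at a void * object, there is no type punning left for the compiler to warn about or to optimise around.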
 
Dik T. Winter

> >
> > This has been true on other architectures, and come back to bite those
> > who relied on it. Unlikely in this case, admittedly.
>
> In this case, it's the compiler that depends on it, and the compiler
> and hardware are provided by the same vendor.

You will find that also utilities tend to depend on such features,
yielding a less portable utility. One of the most difficult to port
culprits I have seen was the Unix Bourne shell and Korn shell. They
used the fact that in a word pointer the low order bit was 0, so that
bit was used for other purposes.
 
Christopher Layne

Harald said:
You are correct. As far as GCC is concerned, casting to void * means
"trust me, I know what I'm doing". It will not disable any
optimisations, it will break exactly as much as a direct cast to void
**. It will simply tell GCC not to let you know about it. To fix the
code without changes to the interface, declare ip as void *, and
convert it to int * when you want to use it.

This made me think of something else. Is the following valid and portable?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { ALLOC = 1 } func_fl;
const char tstr[] = "test string";

int func(void *p, size_t l, func_fl flags)
{
    char **q;

    if (flags == ALLOC) {
        p = *(q = p) = malloc((l = sizeof tstr));
        if (p == NULL) abort();
        memcpy(p, tstr, l);
    }

    fprintf(stderr, "f(): l == %ld, p == %.*s\n", (long)l, l, (char *)p);

    return l;
}

int main(int argc, char **argv)
{
    char *a;
    size_t l;
    func_fl f;

    if (argc <= 2) return EXIT_FAILURE;

    f = *argv[1] == '1' ? ALLOC : 0;
    a = argv[2];

    if (f)
        l = func(&a, 0, f);
    else
        l = func(a, strlen(a), f);

    fprintf(stderr, "m(): l == %ld, b == %.*s\n", (long)l, l, a);

    if (f) free(a);

    return EXIT_SUCCESS;
}

$ cc -O3 -g3 -W -Wall -pedantic -o tf tf.c
$
$ ./tf 0 "dude this seems a bit voodoo"
f(): l == 28, p == dude this seems a bit voodoo
m(): l == 28, b == dude this seems a bit voodoo
$
$ ./tf 1 "dude this seems a bit voodoo"
f(): l == 12, p == test string
m(): l == 12, b == test string
 
Guest

pete said:
No.
Assignment is not a sequence point.
Which of q or p or *q is written to first,
is not defined by the standard.

It was a long argument a while back. I don't think there ever was a
definitive answer.
- q is only written to. It is never read. The expression (q = p) is
not an lvalue; it has a side effect, but its value is the result of p,
converted to the type of q. There is no problem with q.
- *q is only written to. It is never read. See above. There is no
problem with *q.
- p is only written to once, and only read in the expression that
determines the value to be stored. p = p + 1 would be well-defined.
The big question for this part of the code is whether p itself is
considered read only to determine the value to be stored. Personally,
I don't think so. The value of the complete right-hand side of the
assignment to p depends on *(q = p)'s type (see above), not its value,
so it can be evaluated before *(q = p) is. If that logic is correct,
the code has undefined behaviour. However, q and *q are not relevant
to that.
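
Whatever the outcome of that argument, the question can be sidestepped by giving each assignment its own statement, so that a sequence point separates them. The sketch below rewrites only the ALLOC branch of func(); alloc_copy is a hypothetical name, and error handling follows the original by calling abort().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static const char tstr[] = "test string";

/* p is really a char ** passed through void *, as in func().
   Stores a pointer to a fresh copy of tstr in the char * that p
   points to, and returns the number of bytes copied. */
static size_t alloc_copy(void *p)
{
    char **q;
    char *s;

    assert(p != NULL);
    q = p;                      /* convert back to the original type */
    s = malloc(sizeof tstr);
    if (s == NULL)
        abort();
    memcpy(s, tstr, sizeof tstr);
    *q = s;                     /* one write per statement */
    return sizeof tstr;
}
```

Nothing is read and modified within the same expression here, so the order of evaluation no longer matters.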
 
Guest

Christopher said:
Harald said:
You are correct. As far as GCC is concerned, casting to void * means
"trust me, I know what I'm doing". It will not disable any
optimisations, it will break exactly as much as a direct cast to void
**. It will simply tell GCC not to let you know about it. To fix the
code without changes to the interface, declare ip as void *, and
convert it to int * when you want to use it.

This made me think of something else. Is the following valid and portable?
[...]
What exactly do you think might be wrong with it?
 
Christopher Layne

Harald said:
Christopher said:
Harald said:
You are correct. As far as GCC is concerned, casting to void * means
"trust me, I know what I'm doing". It will not disable any
optimisations, it will break exactly as much as a direct cast to void
**. It will simply tell GCC not to let you know about it. To fix the
code without changes to the interface, declare ip as void *, and
convert it to int * when you want to use it.

This made me think of something else. Is the following valid and portable?
[...]
What exactly do you think might be wrong with it?

Well.. I have doubts about the passing of a char ** through a void * and then
converting it back to a char **. Technically I should not have doubts as this
is supposed to be supported by the standard. But does the standard specify
that one can pass pointers to pointers of type X through (or as a) void *?

Obviously char * -> void * and back no issues.
But char *** -> void * -> char *** and back. Legal, no conversion and/or
alignment issues?
 
Guest

Christopher said:
Harald said:
Christopher said:
Harald van Dijk wrote:
You are correct. As far as GCC is concerned, casting to void * means
"trust me, I know what I'm doing". It will not disable any
optimisations, it will break exactly as much as a direct cast to void
**. It will simply tell GCC not to let you know about it. To fix the
code without changes to the interface, declare ip as void *, and
convert it to int * when you want to use it.

This made me think of something else. Is the following valid and portable?
[...]
What exactly do you think might be wrong with it?

Well.. I have doubts about the passing of a char ** through a void * and then
converting it back to a char **. Technically I should not have doubts as this
is supposed to be supported by the standard. But does the standard specify
that one can pass pointers to pointers of type X through (or as a) void *?

Obviously char * -> void * and back no issues.
But char *** -> void * -> char *** and back. Legal, no conversion and/or
alignment issues?

That's fine. Any pointer to any object type can be converted to void *
and back without loss of information.
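
A short sketch of that round trip, using a made-up helper first_char. The guarantee relied on is only the one stated above: the void * is converted back to the very type it came from before anything is dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Receives a char *** disguised as void * and converts it back before use. */
static char first_char(void *vp)
{
    char ***ppp;

    assert(vp != NULL);
    ppp = vp;           /* back to the original type: well defined */
    return ***ppp;
}
```

What is not covered is converting the void * to some other pointer type, for instance reading it as a char ** when it started life as a char ***.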
 
quarkLore

background on this: I'm writing a simple IDL compiler that produces 'C'
code, and am trying to get array/pointer marshalling to be 'safe' across
architectures. Any good literature/references on the subject (from a C
perspective) would be appreciated.
Just a note, since you are creating an IDL compiler that will take in
any valid C code: apart from data pointers and arrays, function
pointers can also be given. And the sizeof operator should not be used
with function pointers.
I found this in K&R section A.7.4.8.
 
Ben Bacarisse

quarkLore said:
Just a note, since you are creating an IDL compiler that will take in
any valid C code: apart from data pointers and arrays, function
pointers can also be given. And the sizeof operator should not be used
with function pointers.
I found this in K&R section A.7.4.8.

I don't have a recent edition to check but it seems unlikely that they
are at odds with the C standard. Are you, perhaps, confusing
"function pointers" with an "expression that has function type"?

It is easy to confuse the two, because a function name (which is a
simple expression with function type) is converted to a pointer to
that function *except* when it is the operand of sizeof (or of &).
 
Barry Schwarz

I don't have a recent edition to check but it seems unlikely that they
are at odds with the C standard. Are you, perhaps, confusing
"function pointers" with an "expression that has function type"?

It is easy to confuse the two, because a function name (which is a
simple expression with function type) is converted to a pointer to
that function *except* when it is the operand of sizeof (or of &).

A function name cannot be the operand of sizeof.


 
Ben Bacarisse

Barry Schwarz said:
A function name cannot be the operand of sizeof.

No indeed. Nor can any expression that has function type (6.5.3.4.1).

Section 6.3.2.1.4 explains the two cases when a "function designator"
will not get converted to a pointer -- one of these being when it is
the operand of sizeof. Are there *any* cases when you can pass a
function designator to sizeof and not fall foul of 6.5.3.4.1?

I kept the two issues separate (as the standard seems to) just in case!
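
To show the distinction in code, here is a small sketch with a made-up function twice. Applying sizeof to an object or expression of pointer-to-function type is fine; it is only an operand of function type that the constraint rules out.

```c
#include <assert.h>
#include <stddef.h>

static int twice(int x) { return 2 * x; }

/* Size of an object of pointer-to-function type: valid. */
static size_t fp_size(void)
{
    int (*fp)(int) = twice;   /* designator converts to a pointer here */

    assert(fp(1) == 2);
    return sizeof fp;
}

/* &twice is a pointer expression, so sizeof may be applied to it too. */
static size_t addr_size(void)
{
    return sizeof &twice;
}

/* sizeof twice, by contrast, violates the constraint on sizeof:
   the operand would have function type.  Left commented out.
   static size_t bad(void) { return sizeof twice; }
*/
```

So an IDL compiler can take the size of a function pointer like any other object; what it cannot assume is that a function pointer survives conversion to void * and back, since that guarantee covers object pointers only.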
 
