Distance between struct members

lovecreatesbea...

1. The following code snippet uses minus operation on two pointers to
calculate the distance between struct members. This is illegal, right?

2. s1 and s2 are both of type struct S. Can the distance between s1.i4
and s1.i3 be used to deduce the distance between s2.i4 and s2.i3?

Thank you for your time.


#include <stdio.h>
#include <stddef.h>

struct S {
    /*...*/
    int i3;
    /*...*/
    int i4;
};

int main(void)
{
    struct S s1, s2;
    ptrdiff_t distance;

    distance = &s1.i4 - &s1.i3;
    s1.i3 = 11;
    s1.i4 = 12;
    s2.i3 = 13;
    s2.i4 = 14;
    printf("%d, %d\n", s2.i3, *(&s2.i3 + distance));
    return 0;
}
 
Eric Sosman

(e-mail address removed) wrote On 10/18/07 12:47,:
1. The following code snippet uses minus operation on two pointers to
calculate the distance between struct members. This is illegal, right?

Yes. To see why (or one reason why, anyhow), remember
that pointer arithmetic operates in units of the pointed-to
type. Now consider what might lie in the /*...*/ between
members i3 and i4. If the size of what's there is not an
exact multiple of the size of an int, i3 and i4 are separated
by something-and-a-fraction units. Pointer arithmetic can't
handle the -and-a-fraction part.
2. s1 and s2 are both of type struct S. Can the distance between s1.i4
and s1.i3 be used to deduce the distance between s2.i4 and s2.i3?

Yes, but let's tighten up what "distance" means. If
you express everything in units of bytes (rather than ints
or whatever), all will be well. C guarantees that

(char*)&s1.i4 - (char*)&s1.i3
== (char*)&s2.i4 - (char*)&s2.i3

However, there are no guarantees about

(char*)&s1.i3 - (char*)&s2.i3
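
As a rough illustration of the guarantee described above, here is a minimal sketch. The struct layout is an assumption: the double member merely stands in for whatever the elided /*...*/ members might be, and the helper name i3_to_i4_bytes is made up for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct S; `filler` plays the role of the
   elided members between i3 and i4. */
struct S {
    int i3;
    double filler;
    int i4;
};

/* Byte distance from member i3 to member i4 inside one object.
   Both pointers point into the same struct, so subtracting them as
   character pointers is well-defined. */
ptrdiff_t i3_to_i4_bytes(const struct S *s)
{
    return (const char *)&s->i4 - (const char *)&s->i3;
}
```

Calling i3_to_i4_bytes(&s1) and i3_to_i4_bytes(&s2) for any two objects of type struct S should give the same value. Subtracting a pointer into s1 from a pointer into s2 is deliberately not shown, since nothing is guaranteed about it.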
 
Keith Thompson

Eric Sosman said:
(e-mail address removed) wrote On 10/18/07 12:47,:

Yes. To see why (or one reason why, anyhow), remember
that pointer arithmetic operates in units of the pointed-to
type. Now consider what might lie in the /*...*/ between
members i3 and i4. If the size of what's there is not an
exact multiple of the size of an int, i3 and i4 are separated
by something-and-a-fraction units. Pointer arithmetic can't
handle the -and-a-fraction part.
[...]

Yes, but that's just one reason, and it depends on what you mean by
"illegal".

The real reason is that pointer subtraction invokes undefined behavior
if the two pointers point to distinct objects. See C99 6.5.6p9. This
applies even to subtraction of char* pointers, which are not affected
by alignment.

(In a typical implementation, the subtraction is likely to give you a
somewhat meaningful result. If the difference is not a multiple
of the size of the pointed-to object, the remainder is likely to be
quietly ignored. But there are, of course, absolutely no guarantees.)
 
Jack Klein

Eric Sosman said:
(e-mail address removed) wrote On 10/18/07 12:47,:

Yes. To see why (or one reason why, anyhow), remember
that pointer arithmetic operates in units of the pointed-to
type. Now consider what might lie in the /*...*/ between
members i3 and i4. If the size of what's there is not an
exact multiple of the size of an int, i3 and i4 are separated
by something-and-a-fraction units. Pointer arithmetic can't
handle the -and-a-fraction part.
[...]

Yes, but that's just one reason, and it depends on what you mean by
"illegal".

The real reason is that pointer subtraction invokes undefined behavior
if the two pointers point to distinct objects. See C99 6.5.6p9. This
applies even to subtraction of char* pointers, which are not affected
by alignment.

I disagree about using pointer to char, specifically pointer to
unsigned char.

Any object, including the structure in the OP's post, can be accessed
as a suitably sized array of unsigned char. It is legal, therefore,
to subtract the addresses of two members of the same structure,
provided of course they are cast to pointers to unsigned char.

The result will be a ptrdiff_t representing the number of bytes
between the first byte in the representation of the first member and
the first byte in the representation of the second member.

I do agree about using pointers to int, regardless of alignment
issues, because clearly two different int members of a structure are
not elements of the same array of ints.
(In a typical implementation, the subtraction is likely to give you a
somewhat meaningful result. If the difference is not a multiple
of the size of the pointed-to object, the remainder is likely to be
quietly ignored. But there are, of course, absolutely no guarantees.)

Now the question is, can anybody find wording in the standard
(probably scattered around in multiple places) that definitively makes
doing this with pointer to char or pointer to signed char well-defined
because it is well-defined for pointer to unsigned char?

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
 
Keith Thompson

Jack Klein said:
Eric Sosman said:
(e-mail address removed) wrote On 10/18/07 12:47,:
1. The following code snippet uses minus operation on two pointers to
calculate the distance between struct members. This is illegal, right?

Yes. To see why (or one reason why, anyhow), remember
that pointer arithmetic operates in units of the pointed-to
type. Now consider what might lie in the /*...*/ between
members i3 and i4. If the size of what's there is not an
exact multiple of the size of an int, i3 and i4 are separated
by something-and-a-fraction units. Pointer arithmetic can't
handle the -and-a-fraction part.
[...]

Yes, but that's just one reason, and it depends on what you mean by
"illegal".

The real reason is that pointer subtraction invokes undefined behavior
if the two pointers point to distinct objects. See C99 6.5.6p9. This
applies even to subtraction of char* pointers, which are not affected
by alignment.

I disagree about using pointer to char, specifically pointer to
unsigned char.

Any object, including the structure in the OP's post, can be accessed
as a suitably sized array of unsigned char. It is legal, therefore,
to subtract the addresses of two members of the same structure,
provided of course they are cast to pointers to unsigned char.

Certainly.

I may have misread your comments above. I thought you were talking
about subtracting pointers to members of distinct objects, rather than
pointers to members of the same object.

Subtracting two char* pointers, if they both point into the same
object (or just past its end) is valid. Subtracting two pointers of
any type that point to distinct objects invokes undefined behavior.
Subtracting two pointers to non-char types, both of which point into
the same structure, probably invokes undefined behavior because of the
alignment issues you mentioned above.
 
Eric Sosman

Keith said:
Eric Sosman said:
(e-mail address removed) wrote On 10/18/07 12:47,:
Yes. To see why (or one reason why, anyhow), remember
that pointer arithmetic operates in units of the pointed-to
type. Now consider what might lie in the /*...*/ between
members i3 and i4. If the size of what's there is not an
exact multiple of the size of an int, i3 and i4 are separated
by something-and-a-fraction units. Pointer arithmetic can't
handle the -and-a-fraction part.
[...]

Yes, but that's just one reason, and it depends on what you mean by
"illegal".

The real reason is that pointer subtraction invokes undefined behavior
if the two pointers point to distinct objects. See C99 6.5.6p9. This
applies even to subtraction of char* pointers, which are not affected
by alignment.

Well, that's no "reason" at all: It just states the Law
and offers no argument for why the Law should be as it is.
The most famous example of that particular argument is surely
"I am that I am," which few mortals can bring off believably.

In the example you snipped, the subtraction of int* pointers
was not well-defined but the subtraction of char* pointers was.
 
Keith Thompson

Eric Sosman said:
In the example you snipped, the subtraction of int* pointers
was not well-defined but the subtraction of char* pointers was.

Yes, because I misread it.
 
Kenneth Brody

Keith said:
Eric Sosman said:
(e-mail address removed) wrote On 10/18/07 12:47,:

Yes. To see why (or one reason why, anyhow), remember
that pointer arithmetic operates in units of the pointed-to
type. Now consider what might lie in the /*...*/ between
members i3 and i4. If the size of what's there is not an
exact multiple of the size of an int, i3 and i4 are separated
by something-and-a-fraction units. Pointer arithmetic can't
handle the -and-a-fraction part.
[...]

Yes, but that's just one reason, and it depends on what you mean by
"illegal".

The real reason is that pointer subtraction invokes undefined behavior
if the two pointers point to distinct objects. See C99 6.5.6p9. This
applies even to subtraction of char* pointers, which are not affected
by alignment.

But &s1.i4 and &s1.i3 are both pointers within s1, and therefore are
not "distinct objects". (I suppose the typical "IMO" disclaimer may
apply?)

Plus, as I understand it, it is perfectly legal to overlay an array
of unsigned chars on any object, and access any and all bytes within
that object through this array. How is casting &s1.i4 and &s1.i3 to
"unsigned char *" any different than overlaying an unsigned char
array?

On second thought, however, I can see that taking the addresses of
the two as their native "int *", you can say that the two ints are
not part of the same object, as they are not part of an array of
ints. (Which is why they may not be a multiple-of-sizeof-int bytes
apart.) It is the casting to "unsigned char *" which means that the
addresses can be treated "as-if" they were part of an array of
unsigned chars the size of the struct.

Perhaps we're both right?
(In a typical implementation, the subtraction is likely to give you a
somewhat meaningful result. If the difference is not a multiple
of the size of the pointed-to object, the remainder is likely to be
quietly ignored. But there are, of course, absolutely no guarantees.)

I think offsetof() is the way to go here. The offset of s1.i3 is
guaranteed to be the same as the offset of s2.i3, assuming that s1
and s2 are the same type, and any arithmetic which arrives at that
offset is guaranteed to be properly aligned.
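
A minimal sketch of the offsetof() approach, under some assumptions: the char and double members are invented stand-ins for the elided members, and read_int_member is a hypothetical helper name. memcpy is used for the read so that the example does not depend on any pointer-conversion argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; `tag` and `d` stand in for the elided members. */
struct S {
    char tag;
    int i3;
    double d;
    int i4;
};

/* Read the int member that lives `offset` bytes into *s.
   `offset` is expected to come from offsetof(struct S, <an int member>). */
int read_int_member(const struct S *s, size_t offset)
{
    int value;
    memcpy(&value, (const unsigned char *)s + offset, sizeof value);
    return value;
}
```

With s2 initialized so that i4 is 14, read_int_member(&s2, offsetof(struct S, i4)) would yield 14, and the same offset works for every object of type struct S.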

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>
 
lovecreatesbea...

(e-mail address removed) wrote On 10/18/07 12:47,:


Yes. To see why (or one reason why, anyhow), remember
that pointer arithmetic operates in units of the pointed-to
type. Now consider what might lie in the /*...*/ between
members i3 and i4. If the size of what's there is not an
exact multiple of the size of an int, i3 and i4 are separated
by something-and-a-fraction units. Pointer arithmetic can't
handle the -and-a-fraction part.


Yes, but let's tighten up what "distance" means. If
you express everything in units of bytes (rather than ints
or whatever), all will be well. C guarantees that

(char*)&s1.i4 - (char*)&s1.i3
== (char*)&s2.i4 - (char*)&s2.i3

However, there are no guarantees about

(char*)&s1.i3 - (char*)&s2.i3

Thank you.

So, the extra casts make the code in the original post legal and
portable, don't they?

#include <stdio.h>
#include <stddef.h>

struct S {
    /*...*/
    int i3;
    /*...*/
    int i7;
};

int main(void)
{
    struct S s1 = {11, 12}, s2 = {13, 14};
    ptrdiff_t distance;

    distance = (char *)&s1.i7 - (char *)&s1.i3;
    printf("%d, %d\n", s2.i3, (int)*((char *)&s2.i3 + distance));
    return 0;
}
 
lovecreatesbea...

I think offsetof() is the way to go here. The offset of s1.i3 is
guaranteed to be the same as the offset of s2.i3, assuming that s1
and s2 are the same type, and any arithmetic which arrives at that
offset is guaranteed to be properly aligned.

But offsetof() uses size_t rather than "char *" or "unsigned char
*" to designate the type of the addresses. Why?
 
Keith Thompson

Kenneth Brody said:
Keith Thompson wrote: [...]
The real reason is that pointer subtraction invokes undefined behavior
if the two pointers point to distinct objects. See C99 6.5.6p9. This
applies even to subtraction of char* pointers, which are not affected
by alignment.

But &s1.i4 and &s1.i3 are both pointers within s1, and therefore are
not "distinct objects". (I suppose the typical "IMO" disclaimer may
apply?)

Yes; as I've acknowledged, my statement above was the result of my
misreading the previous material.
Plus, as I understand it, it is perfectly legal to overlay an array
of unsigned chars on any object, and access any and all bytes within
that object through this array. How is casting &s1.i4 and &s1.i3 to
"unsigned char *" any different than overlaying an unsigned char
array?

On second thought, however, I can see that taking the addresses of
the two as their native "int *", you can say that the two ints are
not part of the same object, as they are not part of an array of
ints. (Which is why they may not be a multiple-of-sizeof-int bytes
apart.) It is the casting to "unsigned char *" which means that the
addresses can be treated "as-if" they were part of an array of
unsigned chars the size of the struct.

The standard's requirement isn't really that they point to "the same
object". The actual wording, in C99 6.5.6p9, is:

When two pointers are subtracted, both shall point to elements of
the same array object, or one past the last element of the array
object; the result is the difference of the subscripts of the two
array elements.

It's stated elsewhere that any object of type T can be treated as an
array of type T[1], and that any object can be treated as an array of
unsigned char. The latter lets you get away with converting the
pointers &s1.i4 and &s1.i3 to ``unsigned char*'' before subtracting
them. Other rules, which I'm too lazy to look up, allow you to do the
same thing with ``char*'' or ``signed char*''. But if i4 and i3 are
both of type int, there's no rule that lets you treat them as elements
of the same array. (If the required alignment for type int is the
same as its size, you're very likely to get away with it unless the
implementation goes out of its way to stop you, but it's still
undefined behavior.)
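
To make the "any object can be treated as an array of unsigned char" rule concrete, here is a small sketch. The struct layout and the function name copy_S_bytewise are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical layout; `c` stands in for an elided member. */
struct S {
    int i3;
    char c;
    int i4;
};

/* Copy the object representation of *src into *dst one byte at a time,
   treating each struct as an array of sizeof(struct S) unsigned chars. */
void copy_S_bytewise(struct S *dst, const struct S *src)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;
    const unsigned char *end = s + sizeof *src;  /* one past the last byte */

    while (s != end)
        *d++ = *s++;
}
```

All the pointers here move within a single object's bytes (or one past its end), which is the situation in which comparing and subtracting them is allowed.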

[...]
I think offsetof() is the way to go here. The offset of s1.i3 is
guaranteed to be the same as the offset of s2.i3, assuming that s1
and s2 are the same type, and any arithmetic which arrives at that
offset is guaranteed to be properly aligned.

Agreed. If you care about the distance in bytes between the members
i3 and i4 of some struct type, then
offsetof(struct foo, i4) - offsetof(struct foo, i3)
is a clearer way to express it.
 
Keith Thompson

But offsetof() uses size_t rather than "char *" or "unsigned char
*" to designate the type of the addresses. Why?

No, offsetof() uses size_t for the offset; it doesn't express any
address as a size_t.

Here's the standard's definition (C99 7.17p3):

offsetof(type, member-designator)

which expands to an integer constant expression that has type
size_t, the value of which is the offset in bytes, to the
structure member (designated by member-designator), from the
beginning of its structure (designated by type). The type and
member designator shall be such that given

static type t;

then the expression &(t.member-designator) evaluates to an address
constant. (If the specified member is a bit-field, the behavior is
undefined.)
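
One consequence of the wording quoted above is that offsetof() can be used wherever an integer constant expression is required. A short sketch, in which the struct layout and the names I4_OFFSET, bytes_before_i4 and bytes_before_i4_count are all invented for this example:

```c
#include <stddef.h>

/* Hypothetical layout; `pad` stands in for the elided members. */
struct S {
    int i3;
    char pad[3];
    int i4;
};

/* An enumeration constant must be an integer constant expression. */
enum { I4_OFFSET = offsetof(struct S, i4) };

/* So must the bound of an array declared at file scope. */
static unsigned char bytes_before_i4[offsetof(struct S, i4)];

/* Report the size of that array, i.e. the offset of i4 in bytes. */
size_t bytes_before_i4_count(void)
{
    return sizeof bytes_before_i4;
}
```

Note that the value is an offset of type size_t, not an address, which is why no pointer type appears in the result.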
 
Eric Sosman

(e-mail address removed) wrote On 10/19/07 14:43,:
So, the extra casts make the code in the original post legal and
portable, don't they?

The code in the original post was illegal and non-
portable. The revised code in this post is almost all
right: It will print 13 and something else (because the
third argument to printf should be `*(int*)((char*)...)'
instead of what you wrote).
#include <stdio.h>
#include <stddef.h>

struct S {
    /*...*/
    int i3;
    /*...*/
    int i7;
};

int main(void)
{
    struct S s1 = {11, 12}, s2 = {13, 14};
    ptrdiff_t distance;

    distance = (char *)&s1.i7 - (char *)&s1.i3;
    printf("%d, %d\n", s2.i3, (int)*((char *)&s2.i3 + distance));
    return 0;
}

... but I must ask: WHY do you want to do this?
If you want to print the value of s2.i7, just do it:
don't fool around with all this pointer-bashing. Even
if it is *possible* to perform an appendectomy with two
teaspoons and an eggbeater, that doesn't make it a
good idea.
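
For reference, the corrected dereference mentioned above could be sketched as follows. The struct omits the elided members, and the helper names i3_to_i7 and i7_via_distance are made up for this illustration; it is a sketch of the pattern discussed in this thread, not a recommendation to write code this way.

```c
#include <stddef.h>

/* The elided members are simply left out in this sketch. */
struct S {
    int i3;
    int i7;
};

/* Byte distance from i3 to i7 measured inside one object. */
ptrdiff_t i3_to_i7(const struct S *s)
{
    return (const char *)&s->i7 - (const char *)&s->i3;
}

/* Fetch i7 from another object of the same type using that distance.
   The pointer is converted back to int * BEFORE it is dereferenced,
   so the whole int is read rather than a single char. */
int i7_via_distance(struct S *s, ptrdiff_t distance)
{
    return *(int *)((char *)&s->i3 + distance);
}
```

With s1 = {11, 12} and s2 = {13, 14}, i7_via_distance(&s2, i3_to_i7(&s1)) gives the value of s2.i7.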
 
lovecreatesbea...

No, offsetof() uses size_t for the offset; it doesn't express any
address as a size_t.

Here's the standard's definition (C99 7.17p3):

offsetof(type, member-designator)

which expands to an integer constant expression that has type
size_t, the value of which is the offset in bytes, to the
structure member (designated by member-designator), from the
beginning of its structure (designated by type). The type and
member designator shall be such that given

static type t;

then the expression &(t.member-designator) evaluates to an address

Thank you.

Why isn't it in this form?

(char *) &(t.member-designator)

I read in other posts that some people said the standard
definition

#define offsetof(type, memb) ((size_t) &((type *) 0)-> memb)

implies

#define offsetof(type, memb) \
    ((size_t) &((type *) 0)-> memb - &((type *) 0))


Isn't the following one better?

#define offsetof(type, memb) \
    ((size_t) ((char *) &((type *) 0)-> memb - (char *) &((type *) 0)))
 
lovecreatesbea...

(e-mail address removed) wrote On 10/19/07 14:43,:




The code in the original post was illegal and non-
portable. The revised code in this post is almost all
right: It will print 13 and something else (because the
third argument to printf should be `*(int*)((char*)...)'
instead of what you wrote).

Thank you.

I wrote the third argument wrongly, thanks for the correction.
... but I must ask: WHY do you want to do this?
If you want to print the value of s2.i7, just do it:
don't fool around with all this pointer-bashing. Even
if it is *possible* to perform an appendectomy with two
teaspoons and an eggbeater, that doesn't make it a
good idea.

I didn't know these details about structs before, or about
locating struct members by offset.

Some people said that the offsetof macro written this way

#define offsetof(type, memb) ((size_t) &((type *) 0)-> memb)

dereferences a null pointer, which is undefined behavior. I'm
even more anxious about this. And why isn't it written in this form?

#define offsetof(type, memb) \
    ((size_t) ((char *) &((type *) 0)-> memb - (char *) &((type *) 0)))

Could you please talk about this more?
 
Keith Thompson

I restored the last two lines of the above (you quoted them at the
bottom of your followup).
Thank you.

Why it's not in this form

(char *) &(t.member-designator)

Why should it be? The point of mentioning ``&(t.member-designator)''
in the description is *only* that it must be an address constant; it's
used to specify which arguments to offsetof() are legal.
I read in other posts that some people said the standard
definition

#define offsetof(type, memb) ((size_t) &((type *) 0)-> memb)

implies

#define offsetof(type, memb) \
    ((size_t) &((type *) 0)-> memb - &((type *) 0))


Isn't the following one better?

#define offsetof(type, memb) \
    ((size_t) ((char *) &((type *) 0)-> memb - (char *) &((type *) 0)))

There is no "standard definition" of offsetof().

It's *commonly* defined as you write above. That definition would invoke
undefined behavior in portable code, but the implementation's own header
is allowed to take advantage of the vagaries of that implementation.
Subtracting
(char *) &((type*) 0)
from something that already works is neither useful nor necessary.
 
Keith Thompson

Some people said the offsetof macro in this way

#define offsetof(type, memb) ((size_t) &((type *) 0)-> memb)

dereferences NULL /* 0 */ pointer and it's undefined behavior. I'm
even more anxious on this. And it's not put in this form

#define offsetof(type, memb) \
    ((size_t) ((char *) &((type *) 0)-> memb - (char *) &((type *) 0)))

Could you please talk about this more?

Have you read question 2.14 in the comp.lang.c FAQ,
<http://c-faq.com/>?
 
Eric Sosman

I didn't know these details about structs before, or about
locating struct members by offset.

Fine, but don't forget that the elements have names for
a reason: to make it easy to refer to them. Refer to struct
elements by their names whenever you know them -- that is,
whenever you know the type of the struct. It is possible to
get at a struct's elements using offsets and pointers and
casts, but this is best done only when you *don't* know the
struct type and hence the element names. It works, but it
has various drawbacks:

- As you have seen, it is easy to make misteaks that the
compiler will not detect.

- You lose the convenience of having the compiler keep
track of the element types. Refer to `s1.i7' and the
compiler knows it's an int, but use casts and offsets
and the compiler just has to trust your casting.

- It makes maintenance and debugging harder. If you found
that `s1.i7' is being set to a garbage value, you might
search your source for references to `i7' and put assert()
macros at each site. But you won't find cast-and-offset
references this way.

- It may make your code slower. Derive an `int*' from a
bunch of other data and store through it, and the compiler
may need to assume that every `int' variable it knows of is
a potential target. It may move register-resident values
back to memory, do your store, and then reload everything --
whereas if you'd just said `s1.i7 = 42' it would have known
that `i' and `j' and `k' were unaffected and could remain
safely and conveniently in their CPU registers.

In short, it's a technique that's available to you when it's
needed, but it's a technique of last resort. On a desert island
you might have to perform that appendectomy with two teaspoons
and an eggbeater, but it's not the method of choice.
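
As an illustration of the one case described above where offsets earn their keep (code that does not know the struct type), here is a hedged sketch. The function name sum_int_field and its parameters are hypothetical; the caller is assumed to pass a correct record size and an offset obtained from offsetof() for an int member.

```c
#include <stddef.h>
#include <string.h>

/* Sum an int field across `count` records stored contiguously.
   The function knows nothing about the record type: only its size
   and the byte offset of the int field inside each record. */
long sum_int_field(const void *records, size_t count,
                   size_t record_size, size_t field_offset)
{
    const unsigned char *base = (const unsigned char *)records;
    long total = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        int value;
        /* memcpy reads the field without relying on its alignment. */
        memcpy(&value, base + i * record_size + field_offset, sizeof value);
        total += value;
    }
    return total;
}
```

A caller that does know the type, say a struct P with an int member named score, would pass sizeof(struct P) and offsetof(struct P, score). Inside code that knows the type, plain member access remains the better choice for the reasons listed above.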
Some people said the offsetof macro in this way

#define offsetof(type, memb) ((size_t) &((type *) 0)-> memb)

dereferences NULL /* 0 */ pointer and it's undefined behavior. I'm
even more anxious on this. And it's not put in this form

#define offsetof(type, memb) \
    ((size_t) ((char *) &((type *) 0)-> memb - (char *) &((type *) 0)))

Could you please talk about this more?

Keith Thompson has explained this elsethread. Briefly, it's
perfectly all right for the implementation's own code to rely on
things the Standard does not guarantee, because the implementation
can rely on its own behavior.
 
Martin Golding

On Fri, 19 Oct 2007 11:43:35 -0700, (e-mail address removed) wrote:
[much snippage]
So, the extra casts make the code in the original post legal and
portable, don't they?

Not quite. There appears to be a bug, which invokes undefined behavior,
which is never portable. I don't know if it's a mere typo or an actual
misunderstanding.
#include <stdio.h>
#include <stddef.h>

struct S {
/*...*/
int i3;
/*...*/
int i7;
};

int main(void)
{
struct S s1 = {11, 12}, s2 = {13, 14};
ptrdiff_t distance;

distance = (char *)&s1.i7 - (char *)&s1.i3;
printf("%d, %d\n", s2.i3,

This reads the first byte of s2.i7, likely not what you intended:
(int)*((char *)&s2.i3 + distance));
and, because it reads an int object using a char pointer, the
behavior is undefined. (It will, mostly, work.)
You probably wanted
*(int *)((char *)&s2.i3 + distance));
ie, cast the pointer to pointer-to-int before the dereference.
If you wanted to extract the lowest addressed byte of the int,
you would use unsigned char
*((unsigned char *)&s2.i3 + distance));
the behavior of which is defined by the standard.
return 0;
}


With the example numbers, the code is likely to appear to have worked.
struct S s1 = {11, 12}, s2 = {13, 0x01020304};
will demonstrate the problem.
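
The difference between the two reads can be sketched like this. The helper names first_byte and whole_int are invented for the example, and which byte of the value sits at the lowest address depends on the implementation's byte order, so no particular output is claimed.

```c
#include <string.h>

/* Lowest-addressed byte of an int's object representation, read
   through unsigned char as suggested above. */
unsigned char first_byte(const int *p)
{
    return *(const unsigned char *)p;
}

/* The complete int stored at `addr`, copied out with memcpy. */
int whole_int(const void *addr)
{
    int value;
    memcpy(&value, addr, sizeof value);
    return value;
}
```

For a small value such as 14 the two can happen to agree on some machines, which is why the original test data hid the bug; for 0x01020304 the single byte cannot equal the whole value.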


Martin
 
