Hi,
Can somebody please help me grok the offsetof() macro?
I've found an explanation on
http://www.embedded.com/shared/printableArticle.jhtml?articleID=18312031
but I'm afraid it still doesn't make sense to me.
The sticking point seems to be:
((s *)0) takes the integer zero and casts it as a pointer to s.
To my untrained eye that would basically result in a null pointer. Does
this expression result in some special behaviour or am I missing
something?
Thanks.
Simon
The offsetof() macro is "special", as are many other things in the
standard C library and implementation. Just for example, the fopen()
function cannot itself be written in standard, portable C. It must
either interface with a lower-level API function provided by the
platform's operating system, or it must contain non-standard, hardware
specific code of its own to access a device that stores files.
Let's start with an example:
struct xyz { int x; int y; int z; };
The point is that once the compiler has processed the definition of a
structure type, it knows the offset of each of the members of the
structure. Once a human being (describes most programmers) has read
the structure definition, they know the offset of the first member,
'x', because it is at offset 0. They can guess or assume the offsets
of 'y' and 'z', but they can't really know.
There are times when it is quite useful to know the offset of a member
of a structure from the beginning of the structure. And the compiler
already has this information. So what we need is a way to write
source code that allows the compiler to provide this information to a
program when the program needs it.
Now, given our example definition, above, let's see how we can get the
offset of 'z' in a C program without using the offsetof() macro:
#include <stdio.h>
#include <stddef.h>
struct xyz { int x; int y; int z; };
int main(void)
{
    struct xyz x_y_z = { 0 };
    struct xyz *xyz_ptr = &x_y_z;
    int *ip = &x_y_z.z;
    ptrdiff_t diff = (char *)ip - (char *)xyz_ptr;
    printf("%d\n", (int)diff);
    return 0;
}
This is all perfectly valid, legal C code. You can cast any valid
pointer to any type of object to a pointer to char, and it will point
to the same address in memory as the lowest addressed byte of the
object. You can subtract two pointers of the same type, as long as
they both point within the same object. The result of a valid
subtraction between two pointers of the same type has the signed
integer type ptrdiff_t, defined in <stddef.h>. Note that the
offsetof() macro is defined as yielding a value of type size_t, not
ptrdiff_t, but that's a simple cast.
But to do this, we used an actual struct xyz object. What if we want
to do this without having one handy?
So let's try another version of the program which works with a pointer
to a struct xyz:
#include <stdio.h>
#include <stddef.h>
struct xyz { int x; int y; int z; };
size_t my_offsetof(struct xyz *xyz_ptr)
{
    size_t result =
        (size_t)((char *)&xyz_ptr->z - (char *)xyz_ptr);
    return result;
}
int main(void)
{
    struct xyz x_y_z = { 0 };
    printf("%d\n", (int)my_offsetof(&x_y_z));
    return 0;
}
This produces the same result, which happens to be 8 on the particular
compiler that I am using. But note that even though the function
my_offsetof() does not define and create a struct xyz object, the
caller must have one handy to provide a pointer to the function.
If we want to do this without having a struct xyz object anywhere, we
could be tempted to change this line in main() from:
printf("%d\n", (int)my_offsetof(&x_y_z));
....to:
printf("%d\n", (int)my_offsetof(0));
....and on many compilers this will work just fine, and perhaps on some
few it will crash the program or produce unpredictable results. That
is because passing 0 or the macro NULL causes my_offsetof() to be
called with a null pointer. And the highlighted subexpression:
(size_t)((char *)&xyz_ptr->z - (char *)xyz_ptr);
                  ^^^^^^^^^^
....dereferences that null pointer, although in actuality the compiler
does not need to dereference the pointer, since the value of
xyz_ptr->z is not used.
So on implementations where the compiler writer knows that his
compiler will not generate code to dereference the pointer, he can
define offsetof(s,m) like this:
#define offsetof(s, m) ((size_t)&((s *)0)->m)
....or something similar.
This code is not legal for you to write, because applying -> to a
null pointer has undefined behaviour, but the implementer is not
constrained by the rules that apply to legal programs. The
implementer is allowed to bend or break the rules in standard library
functions and macros, so long as they deliver the proper results.
On the other hand, there are some compilers that do it differently. I
have seen definitions that look like this:
#define offsetof(s, m) __builtin_offset__(s, m)
....that cause the compiler to look up the results in its symbol table
directly without going through the clumsy and technically undefined
operation on a null pointer.
The point is that there is a well-defined and perfectly legal sequence
that works properly on all compilers to get this information if there
is an object of the structure type available, but there is no legal,
defined method to get it without such an object. If there were, there
would have been no need for the language standard to require that the
implementation provides the macro.
In short, use the macro instead of trying to do the
calculations yourself. Even if you have an object of the type around,
the expression:
offsetof(struct xyz, z)
....is much more readable in your source code than:
(char *)&x_y_z.z - (char *)&x_y_z
....and the name of the macro makes the reason for its use
self-documenting.