Different ways of finding the offset of a member in a structure

abhimanyu.v

Hi Guys,

I have one doubt. The test program is given below. It uses two ways of
finding the offset of a member in a structure. I executed the
program and got the same result from both.

My question is: what is the difference between

1) (unsigned long) &((struct foobar *)0)->foo
and
2) (unsigned long)((char*)&tmp.boo - (char*)&tmp)

And why is the second option not used for the offsetof macro?

What is the obvious advantage of the first syntax? Is anything wrong with the
second syntax?

Thanks
Abhimanyu

=================================

#include <stdio.h>
#include <stdlib.h>

struct foobar {
    unsigned int foo;
    char bar;
    char boo;
};

int main(void)
{
    struct foobar tmp;

    /* %p expects a void *, so each address is cast explicitly. */
    printf("address of &tmp is= %p\n\n", (void *)&tmp);

    /* Method 1: member address taken through a null struct pointer. */
    printf("address of tmp->foo= %p \t offset of tmp->foo= %lu\n",
           (void *)&tmp.foo, (unsigned long) &((struct foobar *)0)->foo);
    printf("address of tmp->bar= %p \t offset of tmp->bar= %lu\n",
           (void *)&tmp.bar, (unsigned long) &((struct foobar *)0)->bar);
    printf("address of tmp->boo= %p \t offset of tmp->boo= %lu\n\n",
           (void *)&tmp.boo, (unsigned long) &((struct foobar *)0)->boo);

    /* Method 2: byte distance between a member and its enclosing object. */
    printf("address of tmp->foo= %p \t offset of tmp->foo= %lu\n",
           (void *)&tmp.foo, (unsigned long)((char*)&tmp.foo - (char*)&tmp));
    printf("address of tmp->bar= %p \t offset of tmp->bar= %lu\n",
           (void *)&tmp.bar, (unsigned long)((char*)&tmp.bar - (char*)&tmp));
    printf("address of tmp->boo= %p \t offset of tmp->boo= %lu\n\n",
           (void *)&tmp.boo, (unsigned long)((char*)&tmp.boo - (char*)&tmp));

    printf("Hello world!\n");
    return 0;
}


Result
==================
address of &tmp is= 0022FF70

address of tmp->foo= 0022FF70 offset of tmp->foo= 0
address of tmp->bar= 0022FF74 offset of tmp->bar= 4
address of tmp->boo= 0022FF75 offset of tmp->boo= 5

address of tmp->foo= 0022FF70 offset of tmp->foo= 0
address of tmp->bar= 0022FF74 offset of tmp->bar= 4
address of tmp->boo= 0022FF75 offset of tmp->boo= 5

Hello world!

Press ENTER to continue.
 
karthikbalaguru

Good question.

But I think that (unsigned long) &((struct foobar *)0)->bar is
internally implemented as
(unsigned long)((char*)&tmp.boo - (char*)&tmp).

I think both mean the same (I am not sure)!

Karthik Balaguru
 
abhimanyu.v

No, (unsigned long) &((struct foobar *)0)->bar is not the same as
(unsigned long)((char*)&tmp.boo - (char*)&tmp).

The expression (unsigned long) &((struct foobar *)0)->bar basically does the
following:

1) It casts the zeroth memory location to a pointer to the structure type.
2) Assuming that the zeroth location is indeed 0, taking the address of the
member then gives the memory location of that member.

Now what if the zeroth location is not represented as 0 internally? Then this
construct will fail!

Regards,
Abhimanyu
 
Keith Thompson

I have one doubt. The test program is given below. It uses two way of
finding out the offset of a variable in structure. I executed the
program and found the same result.

My question is what is difference between

1) (unsigned long) &((struct foobar *)0)->foo
and
2) (unsigned long)((char*)&tmp.boo - (char*)&tmp)

And why the second option is not used for offsetof macro.

What is obvious advantage of the first syntax? Anything wrong with the
second syntax?
[...]

The first form invokes undefined behavior. Note that this doesn't
mean that it doesn't work, or that it blows up; the behavior just
isn't defined by the standard. Implementations can use something
similar to your first example to implement offsetof, taking advantage
of the behavior of the particular compiler. (You can't reliably do
that in portable code, which is why offsetof is part of the
implementation.)

The second form doesn't invoke undefined behavior as far as I can
tell, but it can't be used to implement offsetof; the first argument
to offsetof is a struct type, not a struct object.
 
MisterE

Hi Guys,

I have one doubt. The test program is given below. It uses two way of
finding out the offset of a variable in structure. I executed the
program and found the same result.

My question is what is difference between

1) (unsigned long) &((struct foobar *)0)->foo
and
2) (unsigned long)((char*)&tmp.boo - (char*)&tmp)

? I assume you know the difference. The 0 one is just assigning the pointer
value 0 (Address 0) and the compiler does the offset from the struct.

The second one requires a subtraction.
And why the second option is not used for offsetof macro.

What is obvious advantage of the first syntax? Anything wrong with the
second syntax?

The first one can load the value 0 to a register as a direct value. The 2nd
one cannot load its values directly because they are variable.
The second one also uses a subtraction operation.
The difference is that the first one is going to require fewer machine
instructions and will execute faster.
 
abhimanyu.v

Thanks a lot everyone!!

It indeed helped me to understand the difference.

Regards,
Abhimanyu
 
Mark Bluemel

Hi Guys,

I have one doubt. The test program is given below. It uses two way of
finding out the offset of a variable in structure. I executed the
program and found the same result.

Which proves that for your particular compiler/platform combination the
two are equivalent. This is not guaranteed.
My question is what is difference between

1) (unsigned long) &((struct foobar *)0)->foo

This assumes that an address can meaningfully be cast to an integer
value. This is not always true.

It does not require an instance of the structure to be created...
and
2) (unsigned long)((char*)&tmp.boo - (char*)&tmp)
And why the second option is not used for offsetof macro.

This requires an instance of the structure...

See q 2.14 of the FAQ at http://www.c-faq.com which combines the two
techniques...
 
Jack Klein

? I assume you know the difference. The 0 one is just assigning the pointer
value 0 (Address 0) and the compiler does the offset from the struct.

I have to assume that you don't know much about C. Assigning 0 to a
pointer creates a null pointer, which does not point to address 0, and
may not be all bits 0 in its representation.
The first one can load the value 0 to a register as a direct value. The 2nd
one cannot load its values directly because they are variable.
The second one also uses a subtraction operation.
The difference is that the first one is going to require fewer machine
instructions and will execute faster.

....the real difference is that the first one produces undefined
behavior and is completely non-portable.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
 
Jack Klein

Which proves that for your particular compiler/platform combination the
two are equivalent. This is not guaranteed.

Absolutely nothing about the first one is guaranteed, since the
behavior is undefined. Not because the pointer is dereferenced,
because it is not, but because evaluating the expression performs
addition to a null pointer, which is undefined.
This assumes that an address can meaningfully be cast to an integer
value. This is not always true.

It also assumes that you can add an offset to a null pointer, which is
not defined.
 
James Kuyper

Jack Klein wrote:
....
I have to assume that you don't know much about C. Assigning 0 to a
pointer creates a null pointer, which does not point to address 0, and

More accurately, it need not point to address 0; however, it's also
allowed to point to address 0, but only if no C object is also allocated at
that same address. I've used systems where null pointers did indeed
point to address 0; when code had undefined behavior due to
dereferencing null pointers, the actual behavior involved actually
reading or writing starting at address 0. Depending upon the system,
this could be catastrophic for your program, or (for instance, under
DOS) catastrophic for the entire operating system.
 
Richard

Jack Klein said:
I have to assume that you don't know much about C. Assigning 0 to a
pointer creates a null pointer, which does not point to address 0, and
may not be all bits 0 in its representation.

But more often than not appears to try just that and is indeed all 0
bits. It might not work. But it mostly tries.

e.g in gdb

,---- code sample ---
| char *p="hello";
`--------------------

set variable p=0
p *p

"Cannot access memory at address 0x0"

but in this case it's a restricted memory architecture.

Can one legally access a char at memory address 0 (assuming not protected) thus?

*(char*)0; ?

...the real difference is that the first one produces undefined
behavior and is completely non-portable.

Well, not portable to a tiny minority of systems and certainly not the
right way to do it.
 
santosh

But more often than not appears to try just that and is indeed all 0
bits. It might not work. But it mostly tries.

e.g in gdb

,---- code sample ---
| char *p="hello";
`--------------------

set variable p=0
p *p

"Cannot access memory at address 0x0"

but in this case its a restricted memory architecture.

Can one legally access a char at memory access 0 (assuming not
protected) thus?

*(char*)0; ?

No. I believe in Standard C you cannot dereference address zero.

<OT>

Outside Standard C, this depends on the architecture. For the Intel x86
architecture you can do so only from ring 0 protection level.

Under the same architecture under segmented addressing mode a pointer
pointing to address zero may not actually point to the start of
system's memory, but merely to the start of a segment anywhere in
memory.

</OT>
 
Mark Bluemel

santosh said:
No. I believe in Standard C you cannot dereference address zero.

Bzzt! Watch the terminology here. I suspect Richard has lured you into
the "addresses are integers" trap.

I'm not sure the standard forbids you dereferencing a null pointer. The
paragraph (6.3.2.3) I just reviewed doesn't have such an injunction and
Q 5.19 of the FAQ suggests that it can be a valid (in some sense) action.
 
Richard

santosh said:
No. I believe in Standard C you cannot dereference address zero.

<OT>

This is perfectly on topic, since it involves issues with "standard C"
in the real world.
Outside Standard C, this depends on the architecture. For the Intel x86
architecture you can do so only from ring 0 protection level.

Under the same architecture under segmented addressing mode a pointer
pointing to address zero may not actually point to the start of
system's memory, but merely to the start of a segment anywhere in
memory.

This is still address 0. No difference IMO. A 0 pointer (pointer=0) is a "null"
pointer whether segmented or not.
 
Richard

Mark Bluemel said:
Bzzt! Watch the terminology here. I suspect Richard has lured you into
the "addresses are integers" trap.

I'm not sure the standard forbids you dereferencing a null
pointer. The paragraph (6.3.2.3) I just reviewed doesn't have such an

I would be surprised if the standard didn't forbid just that. But a 0
pointer?
 
James Kuyper

santosh said:
No. I believe in Standard C you cannot dereference address zero.

A pointer which refers to address 0 is not necessarily a null pointer.

In standard C, dereferencing a null pointer has undefined behavior,
which makes it technically meaningless to talk about the location it
points at. However, if the undefined behavior for a particular platform
takes the form of accessing a particular piece of memory, that piece of
memory might or might not start at address 0. Just because you created
the pointer by using (char*)0 doesn't guarantee anything.
 
Keith Thompson

Mark Bluemel said:
I'm not sure the standard forbids you dereferencing a null
pointer. The paragraph (6.3.2.3) I just reviewed doesn't have such an
injunction and Q 5.19 of the FAQ suggests that it can be a valid (in
some sense) action.

It doesn't forbid it, but the behavior is undefined.

C99 6.3.2.3p3:

If a null pointer constant is converted to a pointer type, the
resulting pointer, called a _null pointer_, is guaranteed to
compare unequal to a pointer to any object or function.

C99 6.5.3.2p4:

The unary * operator denotes indirection. If the operand points to
a function, the result is a function designator; if it points to
an object, the result is an lvalue designating the object.

Since a null pointer doesn't point to an object, the standard doesn't
define the behavior of an attempt to dereference it.

Question 5.19 of the FAQ is:

How can I access an interrupt vector located at the machine's
location 0? If I set a pointer to 0, the compiler might translate
it to some nonzero internal null pointer value.

This is a very machine-specific thing. The standard does not define
the behavior of any of the proposed solutions.
 
Keith Thompson

James Kuyper said:
A pointer which refers to address 0 is not necessarily a null pointer.

In standard C, dereferencing a null pointer has undefined behavior,
which makes it technically meaningless to talk about the location it
points at. However, if the undefined behavior for a particular
platform takes the form of accessing a particular piece of memory,
that piece of memory might or might not start at address 0. Just
because you created the pointer by using (char*)0 doesn't guarantee
anything.

It guarantees that it's a null pointer.

The term "address 0" isn't necessarily meaningful. As far as C is
concerned, addresses are not numbers. An address must have a
numerical component, perhaps indirectly, in order for pointer
arithmetic to work, but the address as a whole is just an address.

On almost all modern implementations (that I know of):

Addresses can sensibly be represented as numbers.

All object pointers have the same size.

A null pointer is represented as all-bits-zero.

If you attempt to dereference a null pointer, you're attempting to
access memory at address 0. The results of this attempt are
machine-specific; most likely it will either fail horribly or
actually access whatever happens to be stored at address 0 (the
latter might cause further bad things to happen). A C
implementation must avoid storing any C-visible object at address
0.

Conversion between a pointer type and an integer type of the same
size, or between two pointer types, just reinterprets the bits;
there's no change of representation.

*None* of this is guaranteed, and there are (or have been, or perhaps
will be) real-world implementations that violate one or more of these
assumptions.

Conversion of an integer constant expression with the value 0 to a
pointer type is guaranteed to yield a null pointer value (which may or
may not be all-bits-zero). Conversion of a non-constant integer
expression with the value 0 yields some implementation-defined pointer
value, possibly a trap representation; this may or may not be
all-bits-zero (i.e., the conversion might be non-trivial), and it
might or might not be a null pointer value. In other words, the
following:

A null pointer value;

The result of converting a non-constant value 0 to a pointer type; and

A pointer whose representation is all-bits-zero

could possibly be three distinct pointer values.

(It's been argued that converting a non-constant value 0 to a pointer
type must yield the same result as converting a constant value 0 to a
pointer type, i.e., a null pointer. If that's the case, the three
cases above can only yield at most two distinct pointer values. I
disagree, but once you're writing code that cares one way or the
other, you're well beyond what's guaranteed by the standard anyway.)

If you really need to access memory at "address 0", assuming that's
meaningful, you need to do something very low-level and
system-specific. Question 5.19 in the FAQ provides several plausible
(but blatantly non-portable) suggestions for how to do this.
 
