redefining, and casting pointers to, structs

goldfita

I saw some code that appeared to do something similar to this

struct foo {
char offset[XXX];
int d;
};

struct foo {
int a;
int b;
char c;
int d; //same as other d
};

And the structs are defined differently in different files. I tried
doing this myself by just casting pointers to these structs from one
type to the other. I want the members after offset to align, so I
tried setting XXX = sizeof(int)*2+sizeof(char). But the compiler
padded that char with 3 more bytes; so, obviously it didn't work. Then
I tried embedding a struct defined as

struct inner_foo {
int a;
int b;
char c;
};

And I used that inside foo. Then I set XXX to sizeof(struct inner_foo), and
that worked. So my question is: is that safe and portable? If not,
please make a suggestion. Also, does the compiler always lay out the
struct members in the same order you declare them?

thank you

-Todd
 
Mark McIntyre

On 3 Jan 2006 15:25:46 -0800, in comp.lang.c ,
> I saw some code that appeared to do something similar to this
>
> struct foo {
> char offset[XXX];
> int d;
> };
>
> struct foo {
> int a;
> int b;
> char c;
> int d; //same as other d
> };
> ....
> So my question is, is that safe and portable?

No. The width of types varies from system to system and the compiler
is allowed to add padding between objects in a struct. It could pad
around your inner_foo differently on different implementations.

> If no, please make a suggestion.

I'm not sure why one would do this - if you have a definition of foo
why not use it?

> Also, does the compiler always layout the
> struct members in the same order you declare them?

Yes, but not necessarily all snuggled up together.
Mark McIntyre
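To see what "not necessarily all snuggled up together" means on a given compiler, a minimal sketch along these lines may help. It only reports what one implementation happens to do and proves nothing about others; the helper names (members_in_declared_order, padding_before_d, print_layout) are made up for illustration and do not come from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct foo {
    int a;
    int b;
    char c;
    int d;
};

/* Returns 1 if each member starts after the one declared before it. */
int members_in_declared_order(void)
{
    return offsetof(struct foo, a) < offsetof(struct foo, b)
        && offsetof(struct foo, b) < offsetof(struct foo, c)
        && offsetof(struct foo, c) < offsetof(struct foo, d);
}

/* Number of padding bytes this compiler placed between c and d. */
size_t padding_before_d(void)
{
    return offsetof(struct foo, d) - (offsetof(struct foo, c) + sizeof(char));
}

/* Prints the layout chosen by this particular implementation. */
void print_layout(void)
{
    /* The first member of a struct is always at offset 0. */
    assert(offsetof(struct foo, a) == 0);
    printf("a=%zu b=%zu c=%zu d=%zu size=%zu padding before d=%zu\n",
           offsetof(struct foo, a), offsetof(struct foo, b),
           offsetof(struct foo, c), offsetof(struct foo, d),
           sizeof(struct foo), padding_before_d());
}
```

Calling print_layout() from main on two different platforms (or with different alignment options) is a quick way to see that the order stays fixed while the padding may not.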
 
Flash Gordon

> I saw some code that appeared to do something similar to this
>
> struct foo {
> char offset[XXX];
> int d;
> };
>
> struct foo {
> int a;
> int b;
> char c;
> int d; //same as other d
> };
>
> And the structs are defined differently in different files.

Either you are misreading the code or the code is at *best* highly
platform specific and at worst completely broken.

> I tried
> doing this myself by just casting pointers to these structs from one
> type to the other. I want the members after offset to align, so I
> tried setting XXX = sizeof(int)*2+sizeof(char). But the compiler
> padded that char with 3 more bytes; so, obviously it didn't work.

As, indeed, it is allowed to. The compiler can insert padding after any
member of a struct for any reason.

> Then
> I tried embedding a struct defined as
>
> struct inner_foo {
> int a;
> int b;
> char c;
> };
>
> And I used that inside foo. Then I set XXX to sizeof(inner_foo). And
> that worked. So my question is, is that safe and portable?

No. The compiler could choose to place padding after struct inner_foo.
See comment above.
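For anyone who wants to check what their own compiler does with the two layouts, here is a rough sketch. The names foo_view and foo_full are invented, because a single translation unit cannot hold two different definitions of struct foo. A result of 1 only describes the implementation that compiled it and does not make the pointer cast portable.

```c
#include <assert.h>
#include <stddef.h>

struct inner_foo {
    int a;
    int b;
    char c;
};

/* The "opaque" view: a byte array standing in for the leading members. */
struct foo_view {
    char offset[sizeof(struct inner_foo)];
    int d;
};

/* The full definition, with the leading members wrapped in inner_foo. */
struct foo_full {
    struct inner_foo inner;
    int d;
};

/* Returns 1 if d sits at the same offset in both views on THIS
   implementation, 0 otherwise. */
int d_offsets_match(void)
{
    /* In either view, d cannot start before the leading bytes end. */
    assert(offsetof(struct foo_view, d) >= sizeof(struct inner_foo));
    assert(offsetof(struct foo_full, d) >= sizeof(struct inner_foo));
    return offsetof(struct foo_view, d) == offsetof(struct foo_full, d);
}
```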
> If no,
> please make a suggestion.

I suggest not doing it.

> Also, does the compiler always layout the
> struct members in the same order you declare them?

Yes.

If you tell us what problem you are trying to solve, together with the
code you have written (cut down to a sensible size but still complete
and compilable), we might then be able to suggest alternative approaches.
 
goldfita

> I'm not sure why one would do this - if you have a definition of foo
> why not use it?

I think the reason is because you may not have a definition. Suppose
you pass a pointer around to code that does not care what's in offset.
That code would only need to access member d. It may be receiving
pointers to structs of various definitions that it does not know about.
Nor could it know if the definition can change.

-Todd
 
goldfita

> If you tell us what problem you are trying to solve, together with the
> code you have written (cut down to a sensible size but still complete
> and compilable), we might then be able to suggest alternative approaches.


I'm just trying to understand some code I've seen in the past and why
you might want to do it that way. There isn't really a problem. But
thanks anyway.
 
Flash Gordon

(e-mail address removed) wrote:

Please leave the attribution in so we can see who said what.
Attributions being the lines like the above saying things like, "fred
wrote".
> I think the reason is because you may not have a definition. Suppose
> you pass a pointer around to code that does not care what's in offset.
> That code would only need to access member d. It may be receiving
> pointers to structs of various definitions that it does not know about.
> Nor could it know if the definition can change.

If you are doing that, then there is a special dispensation for the
*initial* members of a struct when they appear in a union. I.e.

struct foo {
int type;
char name[NAME_SIZE];
double val;
};

struct bar {
int type;
char name[NAME_SIZE];
int val;
};

union foobar {
struct foo foomess;
struct bar barmess;
};

In the above, you are allowed to use either foo or bar to access the
type and name fields, but from val on you have to use the correct struct
type. So for the type of thing you are discussing the correct method is
to put the common elements at the start of the struct and the elements
that vary at the end. It may simplify getting it right to put all the
initial members in to a struct and doing something like:

struct head {
int type;
char name[NAME_SIZE];
};

struct foo {
struct head messhead;
double val;
};

struct bar {
struct head messhead;
int val;
};

This also avoids the technical requirement to have the structs in a
union because you are allowed to convert a pointer to a struct to a
pointer to its first element, which is now itself a struct containing
all of the common elements. This is how I am generally inclined to do it.
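A minimal sketch of how generic code might use the shared struct head follows. The value of NAME_SIZE, the TYPE_FOO/TYPE_BAR tags and the two helper functions are assumptions added for illustration; the thread does not define them.

```c
#include <assert.h>

#define NAME_SIZE 16                 /* assumed; the post gives no value */
enum { TYPE_FOO = 1, TYPE_BAR = 2 }; /* hypothetical type tags */

struct head {
    int type;
    char name[NAME_SIZE];
};

struct foo {
    struct head messhead;   /* must be the first member */
    double val;
};

struct bar {
    struct head messhead;   /* must be the first member */
    int val;
};

/* Generic code: needs to know only struct head. */
int message_type(const struct head *h)
{
    return h->type;
}

/* Code that knows the concrete types can convert back, because a pointer
   to a struct and a pointer to its first member may be converted to each
   other. */
double message_value(const struct head *h)
{
    switch (h->type) {
    case TYPE_FOO:
        return ((const struct foo *)h)->val;
    case TYPE_BAR:
        return ((const struct bar *)h)->val;
    default:
        assert(!"unknown message type");
        return 0.0;
    }
}
```

A caller would pass &f.messhead (or, equivalently, a converted pointer to f itself) for a struct foo f, so the generic code never needs the full definition.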
 
goldfita

Flash said:
> If you are doing that, then there is a special dispensation for the
> *initial* members of a struct when they appear in a union. I.e.
>
> struct foo {
> int type;
> char name[NAME_SIZE];
> double val;
> };
>
> struct bar {
> int type;
> char name[NAME_SIZE];
> int val;
> };
>
> union foobar {
> struct foo foomess;
> struct bar barmess;
> };
>
> In the above, you are allowed to use either foo or bar to access the
> type and name fields, but from val on you have to use the correct struct
> type. So for the type of thing you are discussing the correct method is
> to put the common elements at the start of the struct and the elements
> that vary at the end. It may simplify getting it right to put all the
> initial members in to a struct and doing something like:

Yes, it seems to make more sense in this example to put the common
members first. I'm confused about this union example though. (I've
never used one.) Are you saying the compiler will ensure the top
members of foo and bar that are the same will be aligned? Does that
mean the compiler has to be able to see the definitions to make the
union? What if foo and bar were declared elsewhere (with different
compiler options or on different compilers)?

-Todd
 
Flash Gordon

goldfita wrote:
> Yes, it seems to make more sense in this example to put the common
> members first. I'm confused about this union example though. (I've
> never used one.) Are you saying the compiler will ensure the top
> members of foo and bar that are the same will be aligned?

Yes.
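As a rough illustration of that guarantee, the sketch below reads type through one union member regardless of which member was stored, and picks the matching member for val. The value of NAME_SIZE, the type tags and the helper functions are assumptions made for the example.

```c
#include <assert.h>

#define NAME_SIZE 16                 /* assumed; the post gives no value */
enum { TYPE_FOO = 1, TYPE_BAR = 2 }; /* hypothetical type tags */

struct foo {
    int type;
    char name[NAME_SIZE];
    double val;
};

struct bar {
    int type;
    char name[NAME_SIZE];
    int val;
};

union foobar {
    struct foo foomess;
    struct bar barmess;
};

/* type belongs to the common initial sequence, so it may be inspected
   through either member while the union type is visible. */
int foobar_type(const union foobar *u)
{
    return u->foomess.type;
}

/* val is past the common sequence: use the member that matches type. */
double foobar_value(const union foobar *u)
{
    if (foobar_type(u) == TYPE_FOO)
        return u->foomess.val;
    assert(foobar_type(u) == TYPE_BAR);
    return u->barmess.val;
}
```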

> Does that
> mean the compiler has to be able to see the definitions to make the
> union?

Of course. That's a bit like asking if a builder needs access to the
bricks to build a brick wall.

> What if foo and bar were declared elsewhere (with different
> compiler options or on different compilers)?

If you change compiler options (or compiler) for different translation
units then potentially all bets are off. After all, some compilers have
options specifically to change how things will be aligned.

Also, from the way you are talking I think you need to read up on header
files and how they are used. If you are using the struct in more than
one C file then it should be defined in a header file and all C files
that use it then include that header file to get the definition.

Note that objects (variables) should not be defined in header files, only
declared with extern.
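The split between header and source file might look roughly like the sketch below. It is shown as a single file so it compiles on its own; in a real project the two marked parts would live in message.h and message.c. Those file names, struct message, message_count and message_record are all hypothetical.

```c
/* ---- message.h: hypothetical header shared by every .c file ---- */
#ifndef MESSAGE_H
#define MESSAGE_H

struct message {          /* type definition: belongs in the header */
    int type;
    double val;
};

extern int message_count; /* declaration only: no storage is reserved */
void message_record(const struct message *m);

#endif /* MESSAGE_H */

/* ---- message.c: would start with #include "message.h" ---- */
#include <assert.h>
#include <stddef.h>

int message_count = 0;    /* the single definition of the object */

/* Counts every message whose type field is non-zero. */
void message_record(const struct message *m)
{
    assert(m != NULL);
    if (m->type != 0)
        message_count++;
}
```

Every other source file that includes the header sees the same struct definition and the same extern declaration, while the object itself exists exactly once.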
 
goldfita

I'm not sure we're talking about the same thing here. The issue was, I
don't want to define foo and bar because I may not know what's going to
be in them except for type and name. That's why I wasn't sure about
the union version. But I think your struct version can handle that
case.

-Todd
 
Flash Gordon

goldfita wrote:
> I'm not sure we're talking about the same thing here. The issue was, I
> don't want to define foo and bar because I may not know what's going to
> be in them except for type and name. That's why I wasn't sure about
> the union version. But I think your struct version can handle that
> case.

OK, then in your situation I would agree the struct version I mentioned
else-thread (a struct defining the common initial sequence then used as
the first element of all other structs) is easier. To use the union
mechanism above, the user of your code would have to declare things
correctly and declare the union. With a struct defined in your header
file for the common bit, all they need to do is use that struct as the
first member of their struct.
 
goldfita

This just occurred to me. Ignoring the example above, say you just had
a simple struct with some members. You compile it into a shared
object, and you access it in another program using dlopen (or link
against it - I don't think it matters). Both source files had access
to the same definition of the struct. But if you compiled them at
different times, how can you be sure both compilations mapped the
structs using the same memory scheme? Do you have to recompile using
the same compiler and flags? (Remember the struct definition was
available in both compilations.) If you don't, isn't it possible for
the struct memory to be different in separate executable segments? I
just did something like this recently -- seems like a very common thing
to do. Sorry if this is OT.

-Todd
 
Flash Gordon

(e-mail address removed) wrote:

> This just occurred to me. Ignoring the example above, say you just had
> a simple struct with some members. You compile it into a shared
> object, and you access it in another program using dlopen (or link
> against it - I don't think it matters).

The C language knows nothing about dlopen.
<OT>
The issues are probably the same as linking the separate translation
units together.
</OT>

> Both source files had access to the same definition of the struct.

Then, as far as the C language is concerned, if you compile and link
everything together with the same implementation all is well.

> But if you compiled them at
> different times, how can you be sure both compilations mapped the
> structs using the same memory scheme?

As far as the C language is concerned, if it is the same implementation
it has to use the same memory scheme.

> Do you have to recompile using
> the same compiler and flags? (Remember the struct definition was
> available in both compilations.)

Here we get on to where it becomes awkward, and we start heading out of
the C language and could easily stray beyond what is topical here.

If changing a compiler option changes the memory layout, thus breaking
the otherwise legal code (legal because the same struct definition is
used in both translation units) then, as far as the C language is
concerned, it is a *different* implementation.

So, in other words, you do have to ensure that you don't change any
options that will break it.
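Nothing in the Standard checks this for you. One defensive habit, sketched here under loud assumptions, is to have a library compare the caller's idea of the struct with its own before touching it. struct record and the lib_* and client_check functions are invented for illustration. In this single-file form both sides necessarily agree; the check only earns its keep when the two sides were built separately.

```c
#include <assert.h>
#include <stddef.h>

struct record {           /* hypothetical struct shared via a header */
    char tag;
    int d;
};

/* Library side: compare the caller's view of the layout with ours. */
int lib_layout_matches(size_t caller_size, size_t caller_offset_d)
{
    return caller_size == sizeof(struct record)
        && caller_offset_d == offsetof(struct record, d);
}

/* Library side: refuse to read the struct if the layouts disagree. */
int lib_get_d(const struct record *r, size_t caller_size,
              size_t caller_offset_d)
{
    assert(lib_layout_matches(caller_size, caller_offset_d));
    return r->d;
}

/* Client side: report the layout as this translation unit sees it. */
int client_check(void)
{
    return lib_layout_matches(sizeof(struct record),
                              offsetof(struct record, d));
}
```

It compares only the total size and one offset, so it is a sanity check rather than proof that the two builds are compatible.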

<OT>
Most systems have a defined ABI which specifies how things should be
laid out, although compilers sometimes have switches that break the ABI.
</OT>

> If you don't, isn't it possible for
> the struct memory to be different in separate executable segments? I
> just did something like this recently -- seems like a very common thing
> to do.

The C standard knows nothing about separate segments, so as far as the C
standard is concerned it has to make things work or it is a broken
implementation. Although see the comments earlier, changing options that
change where the struct is stored could make it a different implementation.

> Sorry if this OT.

You are, IMHO, on the edge. How specific compilers work is off topic,
but what the C standard guarantees to work is on topic.
 
goldfita

Does the above mean the standard implies one of the following?

1) A conforming compiler must compile all executable code in a single
compilation (or surely you could break the executable). You cannot
have external executable code unless it was simultaneously compiled.

2) A conforming compiler must always lay out its structs (memory in
general) in the same way regardless of how it's configured. (And
obviously you could not link with executable code from another
compiler, even if it were conforming as well.)

3) The header files are not enough. The compiler needs to have access
to the executable/object or some other meta information. In other
words, a conforming implementation needs both the header files and
information about the memory layout of prior compilations if it will
allow you to build in pieces.


If at least one of those isn't true, then I think the compiler cannot
be conforming to the standard. No? By the way, what group would you
suggest for more about implementation details?



-Todd
 
Richard Bos

Flash Gordon said:
> As far as the C language is concerned, if it is the same implementation
> it has to use the same memory scheme.

Wherever did you get that? If it's the same _program_ it has to use the
same memory scheme - obviously, or one function couldn't call another.
But if it's two separate programs, there is not a single way that either
program could possibly rely on the other's memory layout without
straying well beyond the borders of the Standard. There's just nothing
in the Standard that requires consistency beyond a single execution.
Even apparently simple values such as the ones in <limits.h> could
change between compiles - remember that there is nothing which requires
the Standard headers to be actual _files_, for example.

Richard
 
Flash Gordon

Richard said:
> Wherever did you get that? If it's the same _program_ it has to use the
> same memory scheme - obviously, or one function couldn't call another.
> But if it's two separate programs, there is not a single way that either
> program could possibly rely on the other's memory layout without
> straying well beyond the borders of the Standard.

The standard does not require that you compile all translation units at
the same time, so if the program contains more than one translation
unit, then the compiler cannot change how structures are laid out
because you might link different translation units together.

Also, all quotes being from N1126:
| 3.12
| implementation
| particular set of software, running in a particular translation
| environment under particular control options, that performs
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
| translation of programs for, and supports execution of functions in, a
| particular execution environment

> There's just nothing
> in the Standard that requires consistency beyond a single execution.
> Even apparently simple values such as the ones in <limits.h> could
> change between compiles

What stops you from using the offsetof macro to do conditional
compilation? How is it to detect at run time that it is meant to have
compiled some completely different code, possibly requiring a diagnostic?

Also, when it comes to binary streams, we have section 7.19.2 which says:
| 3 A binary stream is an ordered sequence of characters that can
| transparently record internal data. Data read in from a binary
| stream shall compare equal to the data that were earlier written out
| to that stream, under the same implementation. Such a stream may,
| however, have an implementation-defined number of null characters
| appended to the end of the stream.

So you are allowed to have one program write the contents of a struct to
a binary stream and a completely different program read it back and get
the *same* value. That is only possible if the padding is required to be
consistent for an implementation for all compilations and all runs of
all programs that use binary streams.
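The round trip that 7.19.2 describes can be sketched as below. For brevity one function does both the writing and the reading, whereas the argument above concerns two separate programs built with the same implementation. struct sample and round_trip are invented names, and tmpfile() is used only to avoid picking a file name.

```c
#include <stdio.h>

struct sample {           /* hypothetical struct with likely padding */
    int a;
    char c;
    double x;
};

/* Writes a struct to a binary stream and reads it back.
   Returns 1 if every member compares equal, 0 if not,
   and -1 if the stream could not be created or used. */
int round_trip(void)
{
    struct sample out = { 42, 'q', 3.5 };
    struct sample in = { 0, 0, 0.0 };
    FILE *fp = tmpfile();          /* opened in binary update mode */

    if (fp == NULL)
        return -1;
    if (fwrite(&out, sizeof out, 1, fp) != 1) {
        fclose(fp);
        return -1;
    }
    rewind(fp);
    if (fread(&in, sizeof in, 1, fp) != 1) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    /* Compare members, not raw bytes: padding may hold anything. */
    return in.a == out.a && in.c == out.c && in.x == out.x;
}
```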

As far as I can see the standard is littered with things that assume
that any given implementation uses consistent layout, although, of
course, the padding bits/bytes can contain completely different data.
> - remember that there is nothing which requires
> the Standard headers to be actual _files_, for example.

True but irrelevant.
 
Eric Sosman

> Does the above mean the standard implies one of the following?
>
> 1) A conforming compiler must compile all executable code in a single
> compilation (or surely you could break the executable). You cannot
> have external executable code unless it was simultaneously compiled.

No. Section 5.1.1.1 "Program structure" says that the
components of a single C program can be translated at
different times.

> 2) A conforming compiler must always lay out its structs (memory in
> general) in the same way regardless of how it's configured. (And
> obviously you could not link with executable code from another
> compiler, even if it were conforming as well.)

One single conforming compiler must arrange compatible
structs the same way, even in different translation units.
Two points:

- There are no guarantees about struct types that are not
compatible, except for the special dispensation regarding
structs that appear in the same union. The compiler is
allowed to be whimsical, applying different rules to
different structs.

- "How it's configured:" From the point of view of the
Standard, changing a compiler's "configuration" is the
same as changing to a different compiler. Compiling one
module with the "-align=8" flag and another with "-align=4"
means you've translated the two modules with different
compilers, and they're not guaranteed to work together.

> 3) The header files are not enough. The compiler needs to have access
> to the executable/object or some other meta information. In other
> words, a conforming implementation needs both the header files and
> information about the memory layout of prior compilations if it will
> allow you to build in pieces.

The header files (and the compiler's built-in knowledge about
things like the sizes and alignment requirements of various data
types) *are* enough. If module A and module B include identical
declarations of some type, the compiler will compute the same
representation for that type in each compilation. Note again that
different compiler options/configurations mean different compilers.
 
Netocrat

(e-mail address removed) wrote:
[on the implications of the Standard's requirement that each program must
have the same struct representation, especially in regard to linking]
To summarise: (2) is required, (1) may be legal but is unnecessary anyhow
due to (2), and (3) is totally unnecessary from the Standard's
perspective, but for an eccentric inter-compiler linking system, beyond
the scope of the C Standard, it's a possibility (probably too
impractical for anyone to want to implement it though).

> No. Section 5.1.1.1 "Program structure" says that the
> components of a single C program can be translated at
> different times.

Is it legal for a compiler to require a single compilation though? The
Standard uses the term "may" in this section in a way that to me is a
little ambiguous - is it intended as permission for the programmer (in
which case it's an implementation requirement) or for the implementation
(in which case it's optional)?

If legal, it seems that goldfita's suggestion is indeed one possible way
for a compiler to satisfy the Standard's requirement that each program has
the same struct representation (which as Flash Gordon has fairly
convincingly argued elsewhere, is almost certainly intended to be a
requirement on the entire implementation).

I think that here goldfita's alluding to operating system requirements: a
common ABI for linking, which goes beyond the C Standard since other
compilers must also conform to this ABI. So if you want to be able to
dynamically (or statically) link with files created by a different or
non-C compiler, then you need to rely on vendor guarantees about the
compatibility of generated library files. And if you invoke your compiler
as a "different implementation" using a set of options incompatible with
the ABI used for linking, then the Standard (and your operating system)
washes its hands of the matter.

> One single conforming compiler must arrange compatible
> structs the same way, even in different translation units. Two points:
>
> - There are no guarantees about struct types that are not
> compatible, except for the special dispensation regarding structs
> that appear in the same union. The compiler is allowed to be
> whimsical, applying different rules to different structs.
>
> - "How it's configured:" From the point of view of the
> Standard, changing a compiler's "configuration" is the same as
> changing to a different compiler. Compiling one module with the
> "-align=8" flag and another with "-align=4" means you've
> translated the two modules with different compilers, and they're
> not guaranteed to work together.

That's a good point - minor option changes are unlikely to change the
implementation into a different one from the Standard's point of view;
they would have to change the compiler's output in a significant and
incompatible way.
 
Dave Thompson

Richard Bos wrote:


> What stops you from using the offsetof macro to do conditional
> compilation? How is it to detect at run time that it is meant to have
> compiled some completely different code, possibly requiring a diagnostic?

Although the offsetof macro itself can be available at preprocessing
time, the (symbol-table from declaration) information it needs to be
useful is not; for the common implementation of offsetof() this will
be manifested as a preprocessor expression trying to access the
mythical object at null when it isn't allowed to access any object at
all. Similarly #if sizeof(int_var /* or int */ ) == 2 doesn't work.

Now, you can write code that can be recognized at compile time as
'dead' (unreachable) because of offsetof(), sizeof, and similar, and a
decent compiler will probably optimize it away.
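A small sketch of that second point: the comparison sits in an ordinary if statement, which the compiler proper evaluates as a constant expression, and not in a preprocessor #if. The struct definitions repeat the ones from the start of the thread; d_follows_inner_directly is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

struct inner_foo {
    int a;
    int b;
    char c;
};

struct foo {
    struct inner_foo inner;
    int d;
};

/* Returns 1 if this implementation put d immediately after inner,
   0 if it inserted padding between them. The condition is a constant
   expression, so a compiler is free to drop the branch it can prove
   unreachable. The same comparison in a preprocessor #if would not
   work, because the preprocessor knows nothing about struct foo. */
int d_follows_inner_directly(void)
{
    /* Whatever the padding, d cannot start inside inner. */
    assert(offsetof(struct foo, d) >= sizeof(struct inner_foo));

    if (offsetof(struct foo, d) == sizeof(struct inner_foo))
        return 1;
    return 0;
}
```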


- David.Thompson1 at worldnet.att.net
 
