struct and union alignment

Keith Thompson

S.Tobias said:
Right.
extern char *pc;
void free(void *p);
free(pc); //correct

(Note: we don't need the same-representation requirement in the above
example. This requirement is necessary e.g. for variadic arguments:
printf("%p", pc);
- no cast to void* required.)


No. I see no reason to require the same alignment for types void* and char*,
because in function arguments they're passed by *value* (think conversion).
Similarly, long and char do not have to have the same alignment, but we may
always use them in arguments "interchangeably":
extern long l;
extern char c;
int f_char(unsigned char c);
int f_long(unsigned long l);
f_char(l); //both calls correct
f_long(c);

No, we can't use long and char in arguments interchangeably. If
there's a prototype, the value is converted to the expected type;
you're not passing a char to f_long, you're passing a long. If
there's no visible prototype, the call f_long(c) invokes undefined
behavior.

Examples involving implicit conversions aren't relevant to the point.
In the call free(pc) above, you're not passing a char* to free(),
you're passing a void* value, the result of the implicit conversion of
the value of pc.

The interchangeability referred to in the footnote in C99 6.2.5
involves cases where something of one type is interpreted as another,
without a conversion (either explicit or implicit), such as a function
call with no prototype in scope, a call to a variadic function, or a
reference to a union member.

Suppose void* required 4-byte (word) alignment, but char* only
required 1-byte alignment (I'm referring to the required alignment for
a pointer object). Suppose the compiler has to generate different
code to access a byte-aligned char* than a word-aligned void*. Now
consider your call printf("%p", pc), where pc is a char*. The calling
code pushes the value of pc onto the stack at an odd address (assume
that there is a stack). The code inside printf that accesses the
argument expects a void*, so it uses an instruction that only works on
word-aligned values. Kaboom.

That's how differing alignments for void* and char* can cause problems
in parameter passing.

[snip]

(I'm not responding to the rest of your message, at least for now,
mostly because I don't have time to do it justice.)
 
Keith Thompson

Christian Kandeler said:
struct {
int a;
int b;
} *pointer_to_anonymous_struct;

I suspect (but I'm only guessing) that the reference to "anonymous
structs" was actually meant to refer to incomplete types, such as

#include <stdio.h>
int main(void)
{
struct incomplete *ptr;
printf("sizeof ptr = %d\n", (int)sizeof ptr);
return 0;
}

The compiler has to determine the representation of ptr without
knowing anything about "struct incomplete" except that it's a struct.
 
Keith Thompson

Chris Torek said:
The last manner is also found in: 6.3.2.3#7 (Pointers) and 7.20.3#1
(Memory management functions). See for yourself: in both cases "pointer
alignment" refers to the pointer *value*, not pointer type.

I am not going to address most of this (due to lack of time), but
I want to make several points here.

First, the value of a pointer to some type "T" -- i.e., a value of
type "T *" -- indeed possesses this "alignment" characteristic, so
that it is possible to ask whether such a pointer is correctly
aligned. (This question is inherently machine-dependent, however,
and on *some* machines it is meaningless.)

But note that if we store this pointer value in an object:

T *pointer_object;

we then have an *object*, not a variable. This object can have an
address:

T **p2 = &pointer_object;
[...]

Chris, did you mean to write that we have an object, not a *value*?
(T is both an object and a variable.)
 
CBFalconer

S.Tobias said:
.... snip ...

(Note: we don't need the same-representation requirement in the above
example. This requirement is necessary e.g. for variadic arguments:
printf("%p", pc);
- no cast to void* required.)

On the contrary, that statement specifically requires a cast to
void*, unless pc is already of type void*.
 
Keith Thompson

CBFalconer said:
On the contrary, that statement specifically requires a cast to
void*, unless pc is already of type void*.

That's questionable, IMHO.

As a matter of style, I would always cast the argument to void* rather
than depend on an ambiguous guarantee of compatibility. But C99
6.2.5p26 says:

A pointer to void shall have the same representation and alignment
requirements as a pointer to a character type.

and a footnote says:

The same representation and alignment requirements are meant to
imply interchangeability as arguments to functions, return values
from functions, and members of unions.

I'm not convinced that requiring the same representation and alignment
requirements is really enough to guarantee that printf("%p", pc); will
actually work (where pc is a char*), but that does seem to be the
intent -- and I'd be surprised if there were an actual C90 or C99
implementation on which it doesn't work. (Making it break would
require some perversity on the part of the implementer.)

On the other hand, there are no guarantees regarding
printf("%p", pi);
where pi is of type int*; void* and int* could have different
representations.

(I assume that pc and pi have valid values.)
 
Flash Gordon

I suspect (but I'm only guessing) that the reference to "anonymous
structs" was actually meant to refer to incomplete types, such as

Yes. I was having a bad day.
The compiler has to determine the representation of ptr without
knowing anything about "struct incomplete" except that it's a struct.

Indeed.
 
S.Tobias

Keith Thompson said:
Examples involving implicit conversions aren't relevant to the point.

True, I missed it. Now that you later mention it, I guess (I'm not
a historian) that long ago when there was no void type, the function
declarations didn't have prototypes either, so the declarations for
free and memcpy were:
free();
free(p) char *p; {...}
char *memcpy();
char *memcpy(d, s, n) char *d, *s; size_t n; {...}
And this is probably the interchangeability in arguments that the
Rationale refers to, i.e. that we may pass the newer void* arguments
to old library functions which expect char*, right?
Therefore, internally, it is required that the newer void* type have at
least the alignment requirement of older char* type in function calls
and returns (it still means nothing in context of our discussion, the
programmer need not know about this), and of course same representation.


[ I think CBFalconer rightfully noticed that the argument in the call:
extern char *pc;
printf("%p", pc);
should be cast to type void* (strictly interpreting the description at
the fprintf() function).
My intention was to discuss argument passing in variadic functions. So,
for the purpose of the discussion, let's assume that printf is actually a
user defined function which duplicates standard library printf behaviour,
and which internally uses va_* macros from stdarg.h. (We can rename it
to my_printf() if you insist.) ]
The interchangeability referred to in the footnote in C99 6.2.5
involves cases where something of one type is interpreted as another,
without a conversion (either explicit or implicit), such as a function
call with no prototype in scope, a call to a variadic function, or a
reference to a union member.

For unions there is no problem: both types have same representation (we
agree to it), and the union itself accommodates alignment requirements
of all its members.
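
The union case can be sketched as follows: a char* is stored into one member and read back through a void* member, with no conversion at all, which is the kind of interchangeability the footnote to 6.2.5p26 mentions for union members. The union and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical union holding either view of the same pointer. */
union char_or_void_ptr {
    char *pc;
    void *pv;
};

/* Stores a char* and reads the same bytes back as a void*; this relies
   on the two types having the same representation (C99 6.2.5p26). */
void *read_as_void(char *p)
{
    union char_or_void_ptr u;
    u.pc = p;
    return u.pv;
}

int main(void)
{
    char buf[] = "xyz";
    char *p = buf + 2;

    printf("%s\n", read_as_void(p) == (void *)p ? "equal" : "not equal");
    return 0;
}
```
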
Suppose void* required 4-byte (word) alignment, but char* only
required 1-byte alignment (I'm referring to the required alignment for
a pointer object). Suppose the compiler has to generate different
code to access a byte-aligned char* than a word-aligned void*. Now
consider your call printf("%p", pc), where pc is a char*. The calling
code pushes the value of pc onto the stack at an odd address (assume
that there is a stack). The code inside printf that accesses the
argument expects a void*, so it uses an instruction that only works on
word-aligned values. Kaboom.

It's a nice explanation (really). But:
7.15.1#2 for variadic functions (and similarly 6.5.2.2#6 for functions
without prototype) makes special provision for void* and char* types.
In order to fulfill it, the compiler, when generating a function call, *must*
push a char* (void*) argument in such a way that it may be accessed as a
void* (char*) argument. (In practice that would mean using at least
the least common multiple of both types' alignments.)

So, your example still doesn't prove that char* and void* types must
have the same alignment, or that the programmer needs to know about it.

Moreover, passing an argument whose type is incompatible with the parameter
(or expected) type is *explicitly* deemed UB. There is no discussion
that we may pass "struct s1*" to where we expect "struct s2*", because
their alignments are the same (as you interpret it) and so are their
representations. They are incompatible types and it is undefined, period.
(If your interpretation is really right, then there must be a flaw in
the Standard here.)

Therefore, again, there is nothing a programmer gains here by knowing
that "pointer to struct" types have same alignment.

(I'm not responding to the rest of your message, at least for now,
mostly because I don't have time to do it justice.)

No hurry, I'll be waiting. You'll do me a favour.

But maybe there is no point in discussing everything at once. In order
to convince me of what comes after "Similarly", you first have to convince
me of what the Standard means before that word. I still don't see why
void* and char* types should have the same alignment requirements,
and that the Standard actually requires it at all.
 
Keith Thompson

S.Tobias said:
But maybe there is no point in discussing everything at once. In order
to convince me of what comes after "Similarly", you first have to convince
me of what the Standard means before that word. I still don't see why
void* and char* types should have the same alignment requirements,
and that the Standard actually requires it at all.

Just a quick thought. It's entirely possible that the standard is
over-specified in this area, and that there's no real advantage in
requiring void* and char* to have the same representation and
alignment requirements. (I'd have to spend some time thinking about
it to decide whether that's the case or not.) Even so, C99 6.2.5p26
does require them to have the same representation and alignment
requirements, at least according to my interpretation of the text.

In order for this to refer to the alignment of what they point to,
rather than the alignment of the pointer objects themselves, we'd have
to assume that in the phrase "the same representation and alignment
requirements", "representation" refers to the pointers (since there is
no representation for void), and "alignment requirements" refers to
the alignment of what's being pointed to. I suppose that's not
entirely unreasonable, since even in this sense the "alignment
requirements" of a pointer are arguably a property of the pointer
itself.

But we already know the alignment requirements of what void* and char*
point to: both types have to be able to refer to any byte in any
object. An additional explicit statement to that effect in 6.2.5p26
would be logically redundant (rather than just possibly unnecessary).
The only way 6.2.5p26 makes sense to me is if it refers to the
alignment requirements for char* and void* objects, not for what they
point to.

Here's a (rather silly) program that, I believe, is guaranteed to work
given the requirement in 6.2.5p26, and wouldn't be guaranteed to work
without it:

#include <stdio.h>
int main(void)
{
char obj;
char *c_ptr = &obj;
void *v_ptr;

v_ptr = *((void**)&c_ptr);

printf("c_ptr = %p\n", (void*)c_ptr);
printf("v_ptr = %p\n", v_ptr);
return 0;
}

It assigns the value of c_ptr to v_ptr by pretending that there's a
void* pointer object in the location of c_ptr. There's no char* to
void* conversion, implicit or explicit, so the compiler doesn't have a
chance to allow for any difference in representation or alignment. If
we used an int* rather than a char*, 6.2.5p26 wouldn't apply and we'd
have undefined behavior; int* and void* needn't even be the same size.

(gcc with too many warnings enabled complains that "dereferencing
type-punned pointer will break strict-aliasing rules".)
 
Mark F. Haigh

Keith Thompson said:
Flash Gordon said:
I would like to check if I understand the following excerpt correctly:

6.2.5#26 (Types):
All pointers to structure types shall have the same representation
and alignment requirements as each other. All pointers to union
types shall have the same representation and alignment requirements
as each other.

Does it mean that *all* structure (or union) types have the same
alignment?
Eg. type
struct { char c; }
and
struct { long double ldt[11]; }
have the same alignment requirements?

Yes. Think about what has to be done to implement pointers to
anonymous structs. When the compiler is compiling a piece of code that
increments a pointer to an anonymous struct all it knows is that it is a
pointer to a struct, so how else could this have been done?

When 6.2.5 discusses the alignment requirements of various types, it's
clear that it's referring to the alignment of objects of the type
itself. It uses similar wording in 6.2.5p26, so I think it's
referring to the alignment of a pointer object, not the alignment of
what it points to.

Correct.

I believe that an implementation in which pointer values are simple
byte addresses, small structs (struct { char c; }) have, say, 1-byte
alignment, and larger structs have, say, 4-byte alignment could be
conforming. Can you think of a concrete example where this would
cause problems?

No, that will not cause any problems. Many implementations do this,
including the one I'm typing this response from.

Assume in the following that sizeof(int) == 4.

On some hardware, an int can be accessed more efficiently if it's
aligned on a 4-byte boundary, but can still be accessed if it's merely
byte-aligned. An implementation could consistently choose to align
all declared int objects on a 4-byte boundary, but use byte alignment
for struct members that are of type int (to save space). Or vice
versa.

I'm not saying that this would be a sensible thing to do, but it's
legal.

Right. It's an ABI issue, and many compilers have command line
switches that determine which ABI they target.

A quick example:

-malign-double
-mno-align-double
Control whether GCC aligns double, long double, and long long
variables on a two word boundary or a one word boundary. Aligning
double variables on a two word boundary will produce code that
runs somewhat faster on a Pentium at the expense of more memory.

Warning: if you use the -malign-double switch, structures
containing the above types will be aligned differently than the
published application binary interface specifications for the 386
and will not be binary compatible with structures in code compiled
without that switch.

I suppose we're talking about two different meanings of "alignment
requirements": the alignment the compiler chooses to use for a given
type, and the alignment that's actually required by the hardware. The
former needs to be at least as strict as the latter, but it needn't be
the same.

Not necessarily, because the compiler is free to do byte loads of
unaligned types. For example, you may be able to tell your compiler
to use no padding in any struct type.


Mark F. Haigh
 
S.Tobias

Keith Thompson said:
In order for this to refer to the alignment of what they point to,
rather than the alignment of the pointer objects themselves, we'd have
to assume that in the phrase "the same representation and alignment
requirements", "representation" refers to the pointers (since there is
no representation for void), and "alignment requirements" refers to
the alignment of what's being pointed to. I suppose that's not
entirely unreasonable, since even in this sense the "alignment
requirements" of a pointer are arguably a property of the pointer
itself.
But we already know the alignment requirements of what void* and char*
point to: both types have to be able to refer to any byte in any
object. An additional explicit statement to that effect in 6.2.5p26
would be logically redundant (rather than just possibly unnecessary).
The only way 6.2.5p26 makes sense to me is if it refers to the
alignment requirements for char* and void* objects, not for what they
point to.

I think it does make sense to talk about pointed-to type requirements.

When we say that (the values of) "pointer to TYPE" have alignment
requirements, we establish a set of valid values of the pointer type,
ie. some are valid pointer values (aligned), some are not (not aligned).
This automatically implies same requirement for placement of objects
of type TYPE, and also includes cases where a pointer might not point
to an object (NULL, end of array, malloc(0)), or TYPE is not complete.
(It is simply easier to speak of "pointer to TYPE (values)" alignment,
than of "TYPE (object)" alignment, because it is more inclusive.)

I think what the Standard does in 6.2.5p26 is formulate the set of
valid values for type void*: they are valid/invalid whenever the same
value (thanks to the same representation) of type char* is valid/invalid.

An invalid value of type void* could be created by casting from an integer.

And yes, I think char type can have alignment issues. Consider this
setup: Type char is composed of 4 octets. Type char* or void* is
the memory index of an octet where an object starts. The C language
conceptual model requires char objects to be on a 4-octet boundary, and
char* values likewise. I think it is clear that not requiring the same
for void* values would break everything.
 
S.Tobias

S.Tobias said:
The C language
conceptual model requires char objects to be on a 4-octet boundary, and

Well, not exactly C language, but for any purpose we can assume
that this alignment is required throughout the whole memory core
(by hardware, or merely by the implementation itself).
 
Chris Torek

Chris Torek said:
... a pointer to some type "T" ...
But note that if we store this pointer value in an object:
T *pointer_object;
we then have an *object*, not a variable. This object can have an
address:
T **p2 = &pointer_object;
[...]
Chris, did you mean to write that we have an object, not a *value*?
(T is both an object and a variable.)

T is a type-name (e.g., "typedef int T;" for instance -- or just
replace T with some simple type like char or int or double). Did
you mean "p2" rather than T here? I deliberately wanted p2 to be
an object holding a value of type "T **", so that I could then take
its address as well.
 
Keith Thompson

Chris Torek said:
Chris Torek said:
... a pointer to some type "T" ...
But note that if we store this pointer value in an object:
T *pointer_object;
we then have an *object*, not a variable. This object can have an
address:
T **p2 = &pointer_object;
[...]
Chris, did you mean to write that we have an object, not a *value*?
(T is both an object and a variable.)

T is a type-name (e.g., "typedef int T;" for instance -- or just
replace T with some simple type like char or int or double). Did
you mean "p2" rather than T here? I deliberately wanted p2 to be
an object holding a value of type "T **", so that I could then take
its address as well.

D'oh! Yes, I meant to say that p2 is both an object and a variable.

So, did you mean "we then have an *object*, not a variable", or did
you mean "we then have an *object*, not a value"?
 
pete

S.Tobias wrote:
And yes, I think char type can have alignment issues. Consider this
setup: Type char is composed of 4 octets.

What are you talking about?
sizeof(char) is one, always.
 
S.Tobias

What are you talking about?
sizeof(char) is one, always.

Yes, I didn't say it's not. sizeof(char)==1, CHAR_BIT==32, and on
C level you can access objects only with byte-resolution (ie. char).
The implementation must hide the sub-byte world from the programmer.
But I think it would be allowed to use hardware representation for
char*, which happens to be pointer to (index of) the first octet of the
pointed-to object (char in our case). When we add one to char* value,
this octet pointer (index) jumps by four. Conversions between char* and
an integer type just preserve and copy this value. Valid values of the
type char* might be 0x0, 0x4, 0x8, 0xc... etc. (char*)3 is a pointer
to char, but it is invalid because it is not properly aligned for type char.

I think there is no problem with above implementation. The Standard
I believe doesn't put any restrictions on pointer representation,
and never says that char shall have no alignment.
 
Chris Torek

[I wrote, in part]
But note that if we store this pointer value in an object:
T *pointer_object;
we then have an *object*, not a variable. This object can have an
address:
T **p2 = &pointer_object;
[...]

After a bit of confusion on both our parts, in
article said:
D'oh! Yes, I meant to say that p2 is both an object and a variable.

So, did you mean "we then have an *object*, not a variable", or did
you mean "we then have an *object*, not a value"?

Ah, I see my error now. Yes, the second phrase is what I meant to
type in, but even that is not quite right either: what we have is
an object with no initial value. :) The variable named "pointer_object"
is indeed an object.
 
pete

S.Tobias said:
Yes, I didn't say it's not. sizeof(char)==1, CHAR_BIT==32, and on
C level you can access objects only with byte-resolution (ie. char).
The implementation must hide the sub-byte world from the programmer.
But I think it would be allowed to use hardware representation for
char*, which happens to be pointer to (index of) the first octet of the
pointed-to object (char in our case). When we add one to char* value,
this octet pointer (index) jumps by four. Conversions between char* and
an integer type just preserve and copy this value. Valid values of the
type char* might be 0x0, 0x4, 0x8, 0xc... etc. (char*)3 is a pointer
to char, but it is invalid because it is not properly aligned for type char.

I think there is no problem with above implementation. The Standard
I believe doesn't put any restrictions on pointer representation,
and never says that char shall have no alignment.

I'm still not understanding what you're saying.
If a char has the same representation as char*,
and sizeof(char*) is 4, then,
is each byte of the four bytes of your char*,
also composed of 4 octets?

Do you think a pointer to char should have any trouble
stepping through all the bytes of an object of type pointer to char?

/* BEGIN new.c */

#include <stdio.h>

int main(void)
{
char *object, *pointer;
size_t n;

n = sizeof object;
object = NULL;
pointer = (char *)&object;
while (n-- != 0) {
printf("byte = 0x%x\n", *(unsigned char *)pointer);
pointer++;
}
return 0;
}

/* END new.c */

I don't understand what alignment problems
you think that there can be.
 
