Array assignment via struct

Netocrat · Aug 6, 2005

Netocrat said:

Jack Klein wrote:
<snip>
/* Assign array via struct */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define LEN 20

typedef struct {
    char a[LEN];
} S;

int main(void) {
    S sa;
    char A[LEN];

    S *ps = malloc(LEN);


It's possible, but unlikely, that sizeof(S) > LEN due to padding. Better
to use sizeof(S) than LEN.

    char *pa = malloc(LEN);

    strcpy(sa.a, "Joe Wright Rocks");
    puts(sa.a);

    *(S*)A = sa; [snip]
Here is where you invoke undefined behavior, since A isn't dynamically
allocated. There is no guarantee that A meets the alignment
requirements for an S. The compiler might generate code that assumes
that A is, causing some sort of trap on some platforms, or possibly
misaligned data or overwriting the destination array.


Given that element a must be located at the start of struct S, and that it
is a char array of size LEN, it's hard to see how it could be aligned
differently to the char array A of size LEN. Are you referring to this
specific case or in general? If this case, could you explain how the
standard allows the alignments to be different?


Type `char' has no alignment (i.e. alignment(char) == 1), of course,
but at issue is not `char', but rather `char[10]'. Long time ago

Actually, Joe's code #defines LEN to 20; you're thinking of the OP's code.

(don't ask me for details now) I read that on DEC stations character
arrays in structs could have different alignments depending on their
size, so for example `char[15]' could have different alignment than
`char[31]'. All this was for the purpose of memory access speed; ordinarily
`char[ANY]' doesn't have alignment (at least when ANY is a prime number,
for others I don't know), but when in a struct, a compiler
could assume that the array is positioned at a "fast" location and
generate more optimal code. (BTW, the discussion in which I read it
was about why the struct hack didn't work.)

Well you've confirmed that it's not merely hypothetical - padding
actually is added in some real-world implementations. So to expand on
Jack's explanation of specific code being generated, perhaps something
like this:

4 padding bytes are added after the array of 20 char in the struct so
that it can be placed on an 8-byte boundary. The compiler generates
code to retrieve the elements of the array 8 bytes at a time, and unaligned
access to 8-byte-wide data on this particular implementation is not
allowed.

The automatic char[20] variable A is not aligned on an 8-byte boundary, so
when it's accessed through the struct, unaligned access occurs and our
implementation spits the dummy.

So Joe - no go. Thou code be fraught.

He meant "Chapter & Verse". :-D

The & did seem a little out of place...

Joe Wright · Aug 6, 2005

Tim said:
Structures can have alignment requirements that
are different from those of their members, and
here's a possible reason why they would.

On platforms where a 'char *' is a different
format and/or wider than an 'int *', an
implementation might choose to make all structs
be 'int' aligned, so that structure pointers
would be easier to deal with.

So a structure holding a character array would
still need 'int' alignment, even though the
contained character array would need only 'char'
alignment.

The alignment requirement also implies a sizing
requirement, since alignment_of(T) must evenly
divide 'sizeof(T)'. That's why a struct that
holds only a character array might be bigger
than the character array it holds.

You just made all that up, didn't you?

Joe Wright · Aug 6, 2005

Netocrat said:

On Fri, 05 Aug 2005 21:52:07 -0700, Krishanu Debnath wrote:

Jack Klein wrote:


<snip>

/* Assign array via struct */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define LEN 20

typedef struct {
    char a[LEN];
} S;

int main(void) {
    S sa;
    char A[LEN];

    S *ps = malloc(LEN);

It's possible, but unlikely, that sizeof(S) > LEN due to padding. Better
to use sizeof(S) than LEN.

    char *pa = malloc(LEN);

    strcpy(sa.a, "Joe Wright Rocks");
    puts(sa.a);

    *(S*)A = sa;
[snip]

Here is where you invoke undefined behavior, since A isn't dynamically
allocated. There is no guarantee that A meets the alignment
requirements for an S. The compiler might generate code that assumes
that A is, causing some sort of trap on some platforms, or possibly
misaligned data or overwriting the destination array.


Given that element a must be located at the start of struct S, and that it
is a char array of size LEN, it's hard to see how it could be aligned
differently to the char array A of size LEN. Are you referring to this
specific case or in general? If this case, could you explain how the
standard allows the alignments to be different?


Type `char' has no alignment (i.e. alignment(char) == 1), of course,
but at issue is not `char', but rather `char[10]'. Long time ago


Actually, Joe's code #defines LEN to 20; you're thinking of the OP's code.

(don't ask me for details now) I read that on DEC stations character
arrays in structs could have different alignments depending on their
size, so for example `char[15]' could have different alignment than
`char[31]'. All this was for the purpose of memory access speed; ordinarily
`char[ANY]' doesn't have alignment (at least when ANY is a prime number,
for others I don't know), but when in a struct, a compiler
could assume that the array is positioned at a "fast" location and
generate more optimal code. (BTW, the discussion in which I read it
was about why the struct hack didn't work.)


Well you've confirmed that it's not merely hypothetical - padding
actually is added in some real-world implementations. So to expand on
Jack's explanation of specific code being generated, perhaps something
like this:

4 padding bytes are added after the array of 20 char in the struct so
that it can be placed on an 8-byte boundary. The compiler generates
code to retrieve the elements of the array 8 bytes at a time, and unaligned
access to 8-byte-wide data on this particular implementation is not
allowed.

The automatic char[20] variable A is not aligned on an 8-byte boundary, so
when it's accessed through the struct, unaligned access occurs and our
implementation spits the dummy.

So Joe - no go. Thou code be fraught.

I think not. Consider...

struct {
    char a[17];
} sa;

...and explain any case for sizeof sa not being 17. Anecdotes of long-forgotten
DEC Stations don't count.

Chris Torek · Aug 7, 2005

You just made all that up, didn't you?

He may well have made it up. But it was in fact the case on
the Data General MV/10000 (Eclipse), as I recall.

The Eclipse did actually have separate "word pointers" and "byte
pointers". Most pointers were word pointers; "char *" (and thus
"void *", had it existed) used byte pointers. To convert from byte
to word pointer, you shifted right one bit, discarding the byte-offset
and introducing a zero bit at the top (in the "indirect" bit that
appeared only in word pointers). To convert a word pointer to a
byte pointer, you shifted left one bit, discarding the top (indirect)
bit and introducing a zero bit at the bottom -- so that the resulting
byte pointer pointed to the first, even-numbered byte of the two
bytes that made up each word.

This machine exposed an awful lot of code-conformance problems,
even before the C standard existed.

We had one at the University
of Maryland in the mid-1980s, before the 1989 C standard came out.

Netocrat · Aug 7, 2005

Netocrat said:

Jack Klein wrote:
Joe Wright wrote:


#define LEN 20

typedef struct {
    char a[LEN];
} S;

int main(void) {
    S sa;
    char A[LEN];

    S *ps = malloc(LEN);


    *(S*)A = sa;

[snip]

Here is where you invoke undefined behavior, since A isn't dynamically
allocated. There is no guarantee that A meets the alignment
requirements for an S. The compiler might generate code that assumes
that A is, causing some sort of trap on some platforms, or possibly
misaligned data or overwriting the destination array.


So to expand on
Jack's explanation of specific code being generated, perhaps something
like this:

4 padding bytes are added after the array of 20 char in the struct so
that it can be placed on an 8-byte boundary. The compiler generates
code to retrieve the elements of the array 8 bytes at a time, and unaligned
access to 8-byte-wide data on this particular implementation is not
allowed.

The automatic char[20] variable A is not aligned on an 8-byte boundary, so
when it's accessed through the struct, unaligned access occurs and our
implementation spits the dummy.

So Joe - no go. Thou code be fraught.


Correction: thy code be fraught. Thou codest flawed source.

I think not. Consider...

struct {
    char a[17];
} sa;

...and explain any case for sizeof sa not being 17. Anecdotes of long-forgotten
DEC Stations don't count.

Why not? Were they not valid C implementation hosts?

That's a contrived choice: being (2 pow 4) + 1, 17 doesn't divide evenly
into any power-of-two access width, so the unpadded array always needs a
partial access. Using my example above, consider a 20-byte array
accessed in 8-byte chunks. When aligned on an 8-byte (or 4-byte)
boundary, it will take 3 accesses to read/write the entire array in
8-byte chunks. It will take 4 accesses to do the same when it starts 1, 2
or 3 bytes short of an 8-byte boundary. That's a supportable reason
for properly aligning the struct on an 8-byte boundary and hence requiring
4 padding bytes.

As for your 17-byte example, well, this implementation may pad out 7
bytes, but more likely it would pad 3 and access 4 bytes at a time
on a 4-byte boundary.

Totally hypothetical, but for all I know (I don't have a lot of varied
hardware experience) a machine like this does exist. Not the machine that
I work from, though (Intel P4), because unaligned access, whilst slower, is
not an error.

Tim Rentsch · Aug 7, 2005

Joe Wright said:
You just made all that up, didn't you?

In fact, I didn't. I read about such platforms here
in comp.lang.c.

For example, consider a machine that addresses 64-bit
words natively. Pointers and ints are both 64 bits,
and use word addresses. A 64-bit word holds 8
eight-bit chars; a pointer to char uses a word
address, but puts the three bits that indicate which
char within the word in the high order bits of the
64-bit pointer. I'm doing this from memory, so I may
have some of the details wrong; however, other people
have written about C implementations on actual machines
that are very much like this.

It would be very natural on such a machine to have
all structs be multiples of 8 in size, and aligned
on word boundaries.

Chris Croughton · Aug 7, 2005

Netocrat said:

<snip>

/* Assign array via struct */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define LEN 20

typedef struct {
    char a[LEN];
} S;

int main(void) {
    S sa;
    char A[LEN];

    S *ps = malloc(LEN);

It's possible, but unlikely, that sizeof(S) > LEN due to padding. Better
to use sizeof(S) than LEN.


What padding could there be? S is essentially a char array.


Yeah, that's why I called it a pedantic point later in the post. Probably
the DS9000 is the only implementation to include padding. Anyhow you lose
nothing by using sizeof(S) instead of LEN and you are assured of
compliance.

No, it isn't a pedantic point, there are many systems where a struct is
rounded up in length to the "worst case" alignment size. In the case
given, it probably won't happen all that often because LEN is 20 which
is a multiple of 4 (although certain 64 bit machines may need alignment
to 8 byte boundaries). If LEN were an odd number a lot of systems would
round the size up to at least the nearest even number.

As above - padding. I wrote that it might be added for performance
reasons. I don't know if such reasons legitimately exist on a real-world
implementation (I can contrive a far-fetched hypothetical implementation
where they do), but you never know what code an optimising compiler is
going to generate.

Or a non-optimising one. A fully optimising compiler might notice that
the only thing in the structure is a char array, and hence generate a
structure of length LEN, where a non-optimising one would "play it safe"
by making sure that it is rounded up to a safe alignment.

If there are non-char elements in the structure, of course, the size
will always be rounded up to the worst alignment needed by any field in
the structure. This is because it could be used as an array, and the
array accesses must be correctly aligned.

The only portable way to do malloc is to use the size of the actual thing
being allocated:

char *pa = malloc(sizeof(S));

or preferably

S *ps = malloc(sizeof(*ps));

Chris C

Lawrence Kirby · Aug 8, 2005

....

All of S is an array of char. What alignment requirements might there be
for an S? None. Structures don't have alignment requirements, their
members do. What are the alignment requirements of a char array?

Any object type can have alignment requirements. A structure's alignment
requirements must meet the requirements of all of its members, but there's
nothing to stop it being stricter. The reason for doing this is speed:
word-aligned access can be faster even for smaller objects. Consider for
example optimised strcpy(), memcpy() etc. code that operates a word at a
time.

You 'strongly dislike people' who try to get 'clever' with C in a
newsgroup posting? Boy, are you tough.

When the "clever" method is obscure and possibly wrong (or not easy to
prove correct) and there is a "dumb", simple, clear and correct method
available I'd have to agree.

I thought you'd get me for not checking the malloc() returns and not
free()ing ps and pa before exit. You never know your luck.

There's that too.

Lawrence

Keith Thompson · Aug 8, 2005

Chris Croughton said:
On Sun, 07 Aug 2005 00:24:58 +1000, Netocrat

No, it isn't a pedantic point, there are many systems where a struct is
rounded up in length to the "worst case" alignment size. In the case
given, it probably won't happen all that often because LEN is 20 which
is a multiple of 4 (although certain 64 bit machines may need alignment
to 8 byte boundaries). If LEN were an odd number a lot of systems would
round the size up to at least the nearest even number.

For example, given:

struct foo {
    char s[3];
};

it would make sense on many platforms to pad struct foo to 4 bytes and
require 4-byte alignment. That way, assigning a struct foo or passing
it as an argument could be done with a single 4-byte instruction, just
as for a 32-bit (assuming CHAR_BIT==8) integer.

On the other hand, an implementer might decide that copying structures
is rare enough that the extra padding isn't worthwhile. <OT>gcc
doesn't add extra padding, at least by default, at least on the one
platform where I tried this.</OT>

[snip]

Or a non-optimising one. A fully optimising compiler might notice that
the only thing in the structure is a char array, and hence generate a
structure of length LEN, where a non-optimising one would "play it safe"
by making sure that it is rounded up to a safe alignment.

But note that the non-optimizing and fully optimizing compilers in
practice probably can't be the same compiler in different modes.
Given the way most compilers are invoked, you usually want to have the
same data layout in all modes, since a program can be built from
translation units that were compiled in different modes. (Or the
linker can forbid linking units compiled in different modes, but that
makes things more complicated.)

Keith Thompson · Aug 8, 2005

Tim Rentsch said:
In fact, I didn't. I read about such platforms here
in comp.lang.c.

For example, consider a machine that addresses 64-bit
words natively. Pointers and ints are both 64 bits,
and use word addresses. A 64-bit word holds 8
eight-bit chars; a pointer to char uses a word
address, but puts the three bits that indicate which
char within the word in the high order bits of the
64-bit pointer. I'm doing this from memory, so I may
have some of the details wrong; however, other people
have written about C implementations on actual machines
that are very much like this.

Yes, Cray vector machines (at least the ones I've used) are like that.

It would be very natural on such a machine to have
all structs be multiples of 8 in size, and aligned
on word boundaries.

In fact, I just tried the following program on a Cray Y-MP:

#include <stdio.h>
int main(void)
{
    struct foo {
        char s[3];
    };
    printf("sizeof(struct foo) = %d\n", (int)sizeof(struct foo));
    return 0;
}

The output was:

sizeof(struct foo) = 8

Joe Wright · Aug 8, 2005

Lawrence said:
Any object type can have alignment requirements. A structure's alignment
requirements must meet the requirements of all of its members, but there's
nothing to stop it being stricter. The reason for doing this is speed:
word-aligned access can be faster even for smaller objects. Consider for
example optimised strcpy(), memcpy() etc. code that operates a word at a
time.

When the "clever" method is obscure and possibly wrong (or not easy to
prove correct) and there is a "dumb", simple, clear and correct method
available I'd have to agree.

There's that too.

Lawrence

Ok, I give up.

Too clever I suppose. Except for this thread, I don't think I've ever
done that: disguise an array as a struct so that it can be assigned to
or used as a value to be assigned to a struct.

Keith, Stan, Netocrat and Tim notwithstanding, when Chris Torek and
Lawrence Kirby tell me I'm all wet, I'm wet.

It's not just C, I love this group too.

Chris Croughton · Aug 8, 2005

Chris Croughton said:

On Sun, 07 Aug 2005 00:24:58 +1000, Netocrat

No, it isn't a pedantic point, there are many systems where a struct is
rounded up in length to the "worst case" alignment size. In the case
given, it probably won't happen all that often because LEN is 20 which
is a multiple of 4 (although certain 64 bit machines may need alignment
to 8 byte boundaries). If LEN were an odd number a lot of systems would
round the size up to at least the nearest even number.


For example, given:

struct foo {
    char s[3];
};

it would make sense on many platforms to pad struct foo to 4 bytes and
require 4-byte alignment. That way, assigning a struct foo or passing
it as an argument could be done with a single 4-byte instruction, just
as for a 32-bit (assuming CHAR_BIT==8) integer.

Yes, efficiency of generated code is one of the reasons for doing it.
On the other hand, even a compiler which generates memcpy() (or its
assembler equivalent) for all copies might still round it up because it
rounds up all structures "just in case" (the standard doesn't say that
it can't). And of course a word-based machine may only be able to
allocate memory in multi-byte (whole-word) chunks anyway.

On the other hand, an implementer might decide that copying structures
is rare enough that the extra padding isn't worthwhile. <OT>gcc
doesn't add extra padding, at least by default, at least on the one
platform where I tried this.</OT>

<OT>
This may be a feature of gcc; it behaves the same way both on x86 (Debian Linux) and
on a MicroVAX 3100/M40 running OpenBSD (unfortunately my uVAX 3100/90
running OpenVMS with the Digital C compiler isn't working at the
moment, and I don't have any Sun Sparcs or Digital Alpha machines online
at present). Ye gods, the uVAX is slow (5 minutes to test 32MB RAM
gives an idea)...
</OT>

[snip]

Or a non-optimising one. A fully optimising compiler might notice that
the only thing in the structure is a char array, and hence generate a
structure of length LEN, where a non-optimising one would "play it safe"
by making sure that it is rounded up to a safe alignment.


But note that the non-optimizing and fully optimizing compilers in
practice probably can't be the same compiler in different modes.
Given the way most compilers are invoked, you usually want to have the
same data layout in all modes, since a program can be built from
translation units that were compiled in different modes. (Or the
linker can forbid linking units compiled in different modes, but that
makes things more complicated.)

Yes, good point, although I've known compilers which generated
incompatible code when optimising for space vs. speed (sometimes putting
parameters in registers in one mode and not the other, for instance).
I've known gcc to have problems with optimised code interfacing to
non-optimised code, although that may have been a bug (but not forbidden
by the standard)...

Chris C

Eric Laberge · Aug 8, 2005

Regarding my post:

Thanks to all who replied to me, I didn't expect such a number of (highly
interesting) answers.

I mostly note that padding and alignment could make the implementation
non-portable, so I'll do it differently.

The reason I wanted to do it this way is that I had to implement some kind
of side effect on assignments, and, as in C, assignments are nothing more
than binary operators that return a value, so a=b=c+d is effectively
a=(b=(c+d)). Seeing the gold mine of knowledge here, I know you all know
that. I also know that's why functions like memcpy return the destination
pointer, so that the same chaining can be done with them, but I also had to
call another function at the same time, and I looked for an easy way to
achieve what I had in mind.

My final solution, and I feel somewhat ashamed for having thought of this so
late, is simply to wrap up the memcpy and the side-effect functions in an
inline function, so code generation will still be easy and *much*
cleaner. Clean code, no matter if it's human or computer generated, is
essential for me too. "Better safe than sorry".

Thanks again,

Chris Croughton · Aug 8, 2005

Keith, Stan, Netocrat and Tim notwithstanding, when Chris Torek and
Lawrence Kirby tell me I'm all wet, I'm wet.

When Chris Torek writes something, I read it, because not only is he
almost always[0] right but he also explains it so that I know /why/ he
is right and without being patronising or putting my back up. A lot of
people here (and I include myself) aren't that good at saying "You're
wrong" without offending people...

[0] Actually, always as far as I remember, except for the Fortran program
where he got the label on the wrong line.

It's not just C, I love this group too.

Well, I'm still here. Although I have killfiled a lot of the obvious
trolls (a couple of generic rules on gmail and yahoo addresses, with
certain exceptions[1], do wonders for the signal-to-noise ratio)...

[1] Since I use a scorefile rather than a killfile, I score at -999
which means that (a) if they respond to my posts they get 'unkilled' for
that response and (b) they are still visible so I can check if anyone
has been caught who shouldn't have been caught and can put them in the
exceptions list.

Chris C

pete · Aug 8, 2005

Eric said:
Regarding my post:

Thanks to all who replied to me,
I didn't expect such a number of (highly
interesting) answers.

I mostly note that padding and alignment could
make the implementation
non-portable, so I'll do it differently.

I don't get it. The code I posted here:

http://groups-beta.google.com/group/comp.lang.c/msg/084ec4e5817f3f78?hl=en&

is portable code.

Lawrence Kirby · Aug 9, 2005

I don't get it. The code I posted here:

http://groups-beta.google.com/group/comp.lang.c/msg/084ec4e5817f3f78?hl=en&

is portable code.

If you want to use a structure containing an array instead of an array in
your code that's fine. There may be problems if:

1. there is code that already creates arrays that are not structure
wrapped, and more seriously:

2. the size of the array is not known at compile time.

Lawrence

Dave Thompson · Aug 14, 2005

Given that element a must be located at the start of struct S, and that it
is a char array of size LEN, it's hard to see how it could be aligned
differently to the char array A of size LEN. Are you referring to this
specific case or in general? If this case, could you explain how the
standard allows the alignments to be different?

Except that certain pairs like qualified and unqualified pointers to
the same type must be the same, the standard allows the alignment of
anything except char to be anything the implementation wants, though
it nonnormatively expects transitivity, footnote 57 to 6.3.2.3p7.

"Classic" Tandem^WCompaq^WHP NonStop, still supported in emulation, is
(was) 16-bit-word (=2 x 8-bit byte) oriented, and requires basic types
above char (>= short) to be word aligned, and _all_ struct (and union)
even if they contain only char(s). Hence a char [N] might sometimes
not be word-aligned while a struct { char x [N] } must. Although, on
that implementation a "top-level" array variable is always allocated
word-aligned, so the case _given here_ was OK.

- David.Thompson1 at worldnet.att.net

