Alignment of automatic arrays

Noob

Hello,

Consider the following code, where I define an automatic array.

void foo(void)
{
    uint8_t arr[64];
}

How will 'arr' be aligned?

I suppose 'arr' could start on an odd address?
(i.e. "aligned" to an uint8_t i.e. not aligned.)

If I need the buffer to be aligned to 32 bits, do I have to write
something along the lines of ...

void foo(void)
{
    uint32_t temp[64/4];
    uint8_t *arr = (uint8_t *)temp;
}

Is this valid C? (No UB / I-DB / general-nastyness)

Are things different if I malloc the array?
uint8_t *arr = malloc(64);

Regards.
 
Jens Thoms Toerring

Noob said:
Consider the following code, where I define an automatic array.
void foo(void)
{
uint8_t arr[64];
}
How will 'arr' be aligned?
I suppose 'arr' could start on an odd address?
(i.e. "aligned" to an uint8_t i.e. not aligned.)

Yes, since there's nothing that would force the compiler
to take care of the possibly more restrictive alignment
required for a larger type.
If I need the buffer to be aligned to 32 bits, do I have to write
something along the lines of ...
void foo(void)
{
uint32_t temp[64/4];
uint8_t *arr = (uint8_t *)temp;
}
Is this valid C? (No UB / I-DB / general-nastyness)

I don't see any UB in this, and it should make sure that
the memory is aligned for uint32_t (whether that differs from
the alignment for uint8_t depends, of course, on the system
you're using).
Are things different if I malloc the array?
uint8_t *arr = malloc(64);

Yes. Since malloc() doesn't know what the memory it returns
is going to be used for it must be aligned for the most
restrictive alignment on your machine - otherwise you
wouldn't be able to allocate memory for types requiring
such alignment.
Regards, Jens
 
Ben Bacarisse

Noob said:
Consider the following code, where I define an automatic array.

void foo(void)
{
uint8_t arr[64];
}

How will 'arr' be aligned?

I suppose 'arr' could start on an odd address?
(i.e. "aligned" to an uint8_t i.e. not aligned.)

You can't say much about the alignment (though if you look it may well
be aligned due to other factors -- it would be unwise to rely on this
at all).
If I need the buffer to be aligned to 32 bits, do I have to write
something along the lines of ...

void foo(void)
{
uint32_t temp[64/4];

uint32_t temp[64 / sizeof(uint32_t)];

is slightly neater. Temp is not declared until the end of the
full declarator so you can't use sizeof temp[0]. You could write:

uint32_t exemplar, temp[64 / sizeof exemplar];

but the extra object seems too messy just to avoid repeating the type.
uint8_t *arr = (uint8_t *)temp;
}

Another way is with a union:

union {
    uint32_t align;
    uint8_t temp[64];
} align_u;
uint8_t *arr = align_u.temp; /* or use align_u.temp directly */
Is this valid C? (No UB / I-DB / general-nastyness)

Looks fine though, of course, it aligns for a uint32_t, not to 32 bits.
If your target has very lax alignment requirements, temp could still
start at an arbitrary address. The only way to force some numeric
alignment is by doing arithmetic on a uintptr_t and hoping that the
implementation plays along. It's not clear if you want 32-bit
alignment or "whatever this machine needs for a 32-bit int".
Are things different if I malloc the array?
uint8_t *arr = malloc(64);

Yes, though all you know is that the memory is properly aligned for
any type -- that might be no more than byte aligned on some odd
machines.
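
The uintptr_t arithmetic mentioned above could look roughly like the sketch below. It is only an illustration: align_up and aligned64 are made-up names, the alignment is assumed to be a power of two, and the code assumes that converting a pointer to uintptr_t yields a plain numeric address, which the standard does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round p up to the next multiple of align (a power of two).
   Assumption: uintptr_t values behave like flat numeric addresses. */
void *align_up(void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)align - 1;

    assert(align != 0 && (align & (align - 1)) == 0);
    return (void *)((addr + mask) & ~mask);
}

/* Given a raw buffer of at least 64 + 3 bytes, return the start of a
   64-byte region inside it whose address is a multiple of 4. */
uint8_t *aligned64(unsigned char *raw)
{
    return align_up(raw, 4);
}
```

The caller over-allocates by align - 1 bytes (here 3) so that the rounded-up pointer still leaves 64 usable bytes.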
 
Richard Bos

Noob said:
Consider the following code, where I define an automatic array.

void foo(void)
{
uint8_t arr[64];
}

How will 'arr' be aligned?

However your implementation chooses to.
I suppose 'arr' could start on an odd address?
(i.e. "aligned" to an uint8_t i.e. not aligned.)

Yes, that is certainly a possibility.
If I need the buffer to be aligned to 32 bits,

If you do, then as far as ISO C is concerned, you're out of luck. The
Standard makes no demands at all[1] about the alignment requirements of
objects.
do I have to write
something along the lines of ...

void foo(void)
{
uint32_t temp[64/4];
uint8_t *arr = (uint8_t *)temp;
}

Is this valid C? (No UB / I-DB / general-nastyness)

I'm not entirely sure. If you replaced uint8_t with unsigned char, I'd
be certain that it would be valid. If uint8_t exists at all, it must be
equivalent in all measures to unsigned char, but that does not
necessarily make it equally valid for casting-from-anything. I _think_
it does, and even if it doesn't your compiler writer would have to be
actively out to get you to make it fail, but without ferreting through
the Standard I can't claim with any pretense to authority that it is
also valid in cold theory.

However, you're still not guaranteed that your uint32_t is aligned on a
4-octet border. There is no feature in ISO C which can make that happen.
There are, however, different extension features in several C
implementations that do this or something very similar.

Richard

[1] Actually, strictly speaking it does, but not in a way that is useful
here. _If_ o is a well-aligned object of a certain type, then &o is
(naturally) a well-aligned pointer to that type, and so is (&o) + 1.
This is because in an array, there may not be padding _between_
members, only inside them.
This can be taken to mean that objects of type t must not require
more strict alignment than sizeof (t), but that still leaves the
implementation free to align the _first_ element of any array-of-t
to a larger (i.e., more strict) alignment than that, if it so
chooses. And, most importantly to the present problem, it may always
require less strict alignment than sizeof (anything but char et al).
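
The statement that ISO C has no such feature holds for C99, the standard in force when this was written. C11 later added _Alignas (spelled alignas via <stdalign.h>). A minimal sketch, assuming a C11 compiler; aligned_auto_buffer is a made-up name, and the run-time check again assumes uintptr_t holds a plain numeric address:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <string.h>

/* Declares a 64-octet automatic buffer with a requested alignment of 4,
   fills it, checks the address, and returns the last octet. */
int aligned_auto_buffer(void)
{
    alignas(4) uint8_t arr[64];

    memset(arr, 0xAB, sizeof arr);
    assert((uintptr_t)arr % 4 == 0);
    return arr[63];
}
```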
 
James Kuyper

Noob said:
Hello,

Consider the following code, where I define an automatic array.

void foo(void)
{
uint8_t arr[64];
}

How will 'arr' be aligned?

Any array will be correctly aligned to store objects of its element
type; that's the only thing you can know for sure about the alignment.
Since uint8_t can exist on a conforming implementation of C99 only if
CHAR_BIT == 8, sizeof(uint8_t) must be 1, and it therefore can't have
any alignment restrictions. Therefore, you can't determine anything
useful about the alignment of that array.
I suppose 'arr' could start on an odd address?
Correct.

If I need the buffer to be aligned to 32 bits, ...

Why? What are you going to do with it? I'm not just asking this question
out of curiosity - the right way to do this depends upon what you're
going to do with it; I strongly suspect that you're taking the wrong
approach.
... do I have to write
something along the lines of ...

void foo(void)
{
uint32_t temp[64/4];
uint8_t *arr = (uint8_t *)temp;
}

Is this valid C? (No UB / I-DB / general-nastyness)

There are no problems with this, in itself. However, if you need 32-bit
alignment for any reason other than storing objects of type uint32_t,
this might not be a good approach. If you are storing uint32_t objects,
why not just use an array of uint32_t, rather than a pointer of type
uint8_t?
Are things different if I malloc the array?
uint8_t *arr = malloc(64);

Yes. The pointer returned by malloc(), if not null, will point to memory
that is guaranteed to be correctly aligned to store objects of ANY type.
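
To illustrate that guarantee, here is a small sketch in which the same malloc()ed storage is written as uint32_t and read back octet by octet. The function name is invented for the example, and it presumes uint8_t and uint32_t exist on the implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocates 64 octets, stores a uint32_t in the last word, then reads
   the same storage back octet by octet. Returns 1 on success and 0 if
   malloc() fails or the octets do not match. */
int malloc_words_and_octets(void)
{
    uint32_t *words = malloc(64);
    const uint8_t *octets = (const uint8_t *)words;
    int ok;

    if (words == NULL)
        return 0;

    assert(sizeof(uint32_t) == 4);  /* follows from CHAR_BIT == 8 */
    words[15] = UINT32_MAX;         /* fine: malloc storage suits any type */
    ok = octets[60] == 0xFF && octets[61] == 0xFF &&
         octets[62] == 0xFF && octets[63] == 0xFF;
    free(words);
    return ok;
}
```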
 
Noob

James said:
Noob said:
Consider the following code, where I define an automatic array.

void foo(void)
{
uint8_t arr[64];
}

How will 'arr' be aligned?

Any array will be correctly aligned to store objects of its element
type; that's the only thing you can know for sure about the alignment.
Since uint8_t can exist on a conforming implementation of C99 only if
CHAR_BIT == 8, sizeof(uint8_t) must be 1, and it therefore can't have
any alignment restrictions. Therefore, you can't determine anything
useful about the alignment of that array.
I suppose 'arr' could start on an odd address?
Correct.

If I need the buffer to be aligned to 32 bits, ...

Why? What are you going to do with it? I'm not just asking this question
out of curiosity - the right way to do this depends upon what you're
going to do with it; I strongly suspect that you're taking the wrong
approach.

I'm using byte buffers (more precisely, octet buffers). I have to call a
low-level function to copy data from flash memory into my buffer.

For performance reasons, the low-level function needs the address of the buffer
to be a multiple of 4.

The function's prototype is:
void flashcopy(uint32_t *dest, long offset, long len);

I suppose (?) the (uint32_t *) is there to remind callers of the function that
dest must be aligned.

Should I try to change the API?

How do I (portably) get properly aligned octet buffers?

Regards.
 
Tim Prince

Jens said:
Yes. Since malloc() doesn't know what the memory it returns
is going to be used for it must be aligned for the most
restrictive alignment on your machine - otherwise you
wouldn't be able to allocate memory for types requiring
such alignment.

Somehow, you've avoided 32-bit Windows and linux, since the advent of
SSE. They come with non-portable wrappers for malloc(). I fear we will
be seeing a similar thing soon with the more frequent requirement for
32-byte alignments on 64-bit OS where malloc gives 16 byte alignment.
 
jameskuyper

Noob said:
I'm using byte buffers (more precisely, octet buffers). I have to call a
low-level function to copy data from flash memory into my buffer.

For performance reasons, the low-level function needs the address of the buffer
to be a multiple of 4.

The function's prototype is:
void flashcopy(uint32_t *dest, long offset, long len)

I suppose (?) the (uint32_t *) is there to remind callers of the function that
dest must be aligned.

Should I try to change the API?

I don't think so. If the interface calls for a uint32_t*, the safest
thing to do is to allocate an array of that type, and pass a pointer
to that array into the function call. Declaring an array of uint8_t is
pointless; converting a pointer to the uint32_t array into a uint8_t*
is the right way to go, if you need to access the memory byte by byte.
 
jameskuyper

Tim said:
Somehow, you've avoided 32-bit Windows and linux, since the advent of
SSE. They come with non-portable wrappers for malloc(). I fear we will
be seeing a similar thing soon with the more frequent requirement for
32-byte alignments on 64-bit OS where malloc gives 16 byte alignment.

Jens is describing what the standard mandates for a conforming
implementation of the C standard library. You've described a malloc()
that does not qualify as conforming.

I use 32-bit linux at work, and 64-bit linux at home. I know little or
nothing about SSE; how could I determine whether those systems are
affected by this issue? What is the name, on linux systems, of the non-
portable malloc() wrapper? Could you give an example of code that
might fail, on systems where this issue applies, if malloc() is used
directly, rather than through the wrapper?
 
Keith Thompson

Noob said:
James said:
Noob wrote: [...]
If I need the buffer to be aligned to 32 bits, ...

Why? What are you going to do with it? I'm not just asking this question
out of curiosity - the right way to do this depends upon what you're
going to do with it; I strongly suspect that you're taking the wrong
approach.

I'm using byte buffers (more precisely, octet buffers). I have to call a
low-level function to copy data from flash memory into my buffer.

For performance reasons, the low-level function needs the address of
the buffer to be a multiple of 4.

The function's prototype is:
void flashcopy(uint32_t *dest, long offset, long len)

I suppose (?) the (uint32_t *) is there to remind callers of the
function that dest must be aligned.

Should I try to change the API?

How do I (portably) get properly aligned octet buffers?

The function takes a uint32_t* argument. Why not just give it one?
Declare your buffer as an array of uint32_t and pass it (or rather the
address of its first element) to flashcopy().

Are the offset and len arguments defined in terms of 32-bit words or
bytes? You may need to multiply and/or divide by 4 here and there.

How are you going to set the buffer contents? If it's being copied
from some other location, just use memcpy(). If you need to
initialize it in place, it might be reasonable to use type punning,
using either a union or pointer conversions.

Your code is likely not to be portable to systems with CHAR_BIT!=8,
but such systems aren't likely to support flashcopy() anyway.
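
A rough sketch of that suggestion follows. The flashcopy() body here is only a stand-in so the example compiles (the real function is not shown in this thread), and it assumes offset and len are counted in bytes, which is exactly the open question above. BUF_BYTES and read_flash_byte are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the real flashcopy(): fills dest with a byte pattern
   derived from the offset. Assumes offset and len are in bytes. */
void flashcopy(uint32_t *dest, long offset, long len)
{
    unsigned char *p = (unsigned char *)dest;
    long i;

    for (i = 0; i < len; i++)
        p[i] = (unsigned char)(offset + i);
}

enum { BUF_BYTES = 64 };

/* Declares the buffer as uint32_t so it has the alignment flashcopy()
   expects, then reads it back through a uint8_t pointer. */
uint8_t read_flash_byte(long offset, size_t index)
{
    uint32_t words[BUF_BYTES / sizeof(uint32_t)];
    const uint8_t *bytes = (const uint8_t *)words;

    assert(index < BUF_BYTES);
    flashcopy(words, offset, (long)sizeof words);
    return bytes[index];
}
```

No cast is needed in the call to flashcopy(), because the buffer already has the type the interface asks for.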
 
Tim Prince

jameskuyper said:
Jens is describing what the standard mandates for a conforming
implementation of the C standard library. You've described a malloc()
that does not qualify as conforming.

I use 32-bit linux at work, and 64-bit linux at home. I know little or
nothing about SSE; how could I determine whether those systems are
affected by this issue? What is the name, on linux systems, of the non-
portable malloc() wrapper? Could you give an example of code that
might fail, on systems where this issue applies, if malloc() is used
directly, rather than through the wrapper?
?mm_malloc(), ?aligned_malloc(),...
Beats me why they didn't at least prototype them with size_t types.
Some of the mis-aligned data types (double without 8-byte alignment,
long double without 16-byte alignment) are accepted with a
hardware-dependent performance penalty.
I suspect proposals for changing linux-32 ABI may still be in the works.
Try setting -O3 to compile a gcc function where that produces
auto-vectorization and data on stack, but with -Os set to compile the
function above it in the call stack, or with data allocated by 32-bit
standard malloc(). 50/50 chance of failure on x86_64, 75/25 chance on
32-bit linux.
 
James Kuyper

Tim said:
?mm_malloc(), ?aligned_malloc(),...

No function with either name seems to be available on either of the
platforms I described; at least, the "man" program doesn't know anything
about them.
Beats me why they didn't at least prototype them with size_t types.
Some of the mis-aligned data types (double without 8-byte alignment,
long double without 16-byte alignment) are accepted with a
hardware-dependent performance penalty.
I suspect proposals for changing linux-32 ABI may still be in the works.
Try setting -O3 to compile a gcc function where that produces
auto-vectorization and data on stack, but with -Os set to compile the
function above it in the call stack, or with data allocated by 32-bit
standard malloc(). 50/50 chance of failure on x86_64, 75/25 chance on
32-bit linux.

I'm not entirely clear what the test code would look like. How would I
verify a failure to allocate with the correct alignment?

The problem, as you originally described it, is that malloc() aligns to
16 bytes, when it should obey stricter alignment restrictions. Now you're
referring to the "32-bit standard malloc()". Apparently, there are two
different malloc()s? I would not be in the least surprised about the
failure of any program containing a module built to work with one
version of the standard library, but linked to a different version of
the standard library. Is that what you're talking about?
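
For alignments stricter than malloc() provides, C11 (published after this discussion) added aligned_alloc() to <stdlib.h>. A minimal sketch, assuming a C11 library; alloc32 is a made-up name, and whether a given alignment such as 32 is supported is up to the implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns size bytes aligned to a 32-byte boundary, or NULL on failure.
   C11 asks for size to be a multiple of the alignment, so that is
   checked here. Release the block with free(). */
void *alloc32(size_t size)
{
    assert(size % 32 == 0);
    return aligned_alloc(32, size);
}
```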
 
Eric Sosman

Noob said:
[...]
How do I (portably) get properly aligned octet buffers?

Here's one way:

union {
unsigned char buff[BUFSIZE];
int alignment_forcer;
} x;

... and now x.buff will be an array of the desired size starting
on an address suitable for an int. Change alignment_forcer to
long double or unsigned long long or whatever you like, and you
can get x.buff aligned to match.

Note that this works only if you can state the alignment
requirement as "The same as T" for some type T. If you need
"page alignment" or "cache line alignment" or something of that
kind, which don't correspond to any T, you're going to have to
give up portability.
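
As a sketch of the union approach with uint32_t as the forcing type: the union tag and the helper name below are invented for the example, BUFSIZE is set to 64 to match the original question, and _Alignof requires C11.

```c
#include <assert.h>
#include <stdint.h>

#define BUFSIZE 64

/* The uint32_t member is never used; it only forces the alignment. */
union aligned_buf {
    unsigned char buff[BUFSIZE];
    uint32_t alignment_forcer;
};

/* Returns nonzero if b->buff starts at an address suitable for uint32_t.
   Assumption: uintptr_t holds a plain numeric address. */
int buff_is_word_aligned(const union aligned_buf *b)
{
    return (uintptr_t)b->buff % _Alignof(uint32_t) == 0;
}
```

Every member of a union starts at the address of the union itself, so buff inherits the alignment of the strictest member.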
 
Chris M. Thomasson

Eric Sosman said:
Noob said:
[...]
How do I (portably) get properly aligned octet buffers?

Here's one way:

union {
unsigned char buff[BUFSIZE];
int alignment_forcer;
} x;

... and now x.buff will be an array of the desired size starting
on an address suitable for an int. Change alignment_forcer to
long double or unsigned long long or whatever you like, and you
can get x.buff aligned to match.

Note that this works only if you can state the alignment
requirement as "The same as T" for some type T. If you need
"page alignment" or "cache line alignment" or something of that
kind, which don't correspond to any T, you're going to have to
give up portability.


FWIW, here is some code that might be able to help out if one needs such
alignments:

http://groups.google.com/group/comp.arch/msg/e86c4ec2f7c19a0a


You can use it like:
________________________________________________________________
#include <ralloc.h> /* http://pastebin.com/f37a23918 */
#include <assert.h>
#include <stddef.h> /* size_t */


#define L2CACHE 64
#define PAGESZ 8192
#define L2PERPAGE (PAGESZ / L2CACHE)


struct foo
{
    int a, b, c, d;
};


union foo_l2cache
{
    struct foo self;
    char l2pad[L2CACHE];
};


struct foo_page
{
    union foo_l2cache foo[L2PERPAGE];
};


/* Compile-time size check: the array size becomes -1 (an error) if any
   condition fails. Not named static_assert, because <assert.h> defines
   that as a macro from C11 onwards. */
typedef char foo_size_check
[
    sizeof(union foo_l2cache) == L2CACHE &&
    sizeof(struct foo_page) == PAGESZ &&
    sizeof(struct foo_page) / L2CACHE == L2PERPAGE
    ? 1 : -1
];


int main(void)
{
    size_t i;
    struct region region;
    struct foo_page* page;
    char raw_buffer[(PAGESZ * 2) - 1] = { '\0' };

    rinit(&region, raw_buffer, sizeof(raw_buffer));

    /* allocate a page aligned buffer. */
    page = rallocex(&region, sizeof(*page), sizeof(*page));

    assert(RALLOC_ALIGN_ASSERT(page, sizeof(*page)));

    /* each `&page->foo[i].self' address is aligned on L2 cache line. */
    for (i = 0; i < L2PERPAGE; ++i)
    {
        union foo_l2cache* l2cache = page->foo + i;

        assert(RALLOC_ALIGN_ASSERT(&l2cache->self, sizeof(l2cache->self)));
    }

    return 0;
}

________________________________________________________________



Could be fairly useful (e.g., low-level multi-threaded programming)... Or
even in embedded systems that cannot use `malloc()' and friends.
 
