unaligned pointer access

Sven Köhler · Sep 11, 2013

Hi,

I'm currently trying to find out which possibilities exist to access a
4byte aligned int64_t. I know, that I could declare a union of int64_t
and two int32_t and copy each int32_t independently to the struct. This
works. But the assembly code that gcc generates is - well - not optimal.
I tried what happened if I simply cast 4byte aligned pointer to an
int64_t pointer. I know that that's basically forbidden and isn't
portable. But the assembly code was much nicer. Also, my primary target
architecture (arm 32bit) has a "load double word" instruction, that only
works with 8byte aligned pointers. So obviously, I was just lucky gcc
didn't end up using the "load double word" instruction.

Currently, I'm experimenting with packed structs. Consider the following :

typedef struct __attribute__((packed)) {
int64_t x;
} s1;

The code uses a gcc extension (the packed attribute). Given a variable
s1 *p, gcc loads the value of p->x byte-wise (i.e. using arm's ldrb
instruction). This seems strange to me because according to the
documentation, the packed attribute ensures that no alignment padding is
used. But first of all, sizeof s1 is equal to 8. Secondly, the member x
is at offset 0 of the struct. So it seems to me, that one could not
obtain any pointers of type s1* without cheating heavily (e.g. by
casting unaligned pointers to s1*). clang also loads p->x byte by byte.

Am I wrong and the compiler must expect that a pointer to s1 may have an
alignment less than 8? If yes, then a packed struct is exactly what I'm
looking for. I already tested the following:

typedef struct __attribute__((packed)) {
__attribute__((aligned(4))) int64_t x;
} s4;

Both gcc and clang load the value of member x word-wise, half-word-wise,
or byte-wise depending on whether the aligned attribute indicates 4, 2,
or 1-byte alignment.
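
A minimal sketch for checking what the compiler itself reports for these two types, assuming gcc or clang (the attributes are extensions) and C11 for `_Alignof` and `static_assert`. The helper names `s1_alignment` and `s4_alignment` are made up for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t, offsetof */
#include <stdint.h>   /* int64_t */

/* Packed only: the compiler may assume nothing about the address. */
typedef struct __attribute__((packed)) {
    int64_t x;
} s1;

/* Packed plus aligned(4): the member is promised to be 4-byte aligned. */
typedef struct __attribute__((packed)) {
    __attribute__((aligned(4))) int64_t x;
} s4;

/* Neither attribute changes the size or the member offset. */
static_assert(sizeof(s1) == 8 && offsetof(s1, x) == 0, "s1 layout");
static_assert(sizeof(s4) == 8 && offsetof(s4, x) == 0, "s4 layout");

/* Hypothetical helpers: report the alignment the compiler assumes. */
size_t s1_alignment(void) { return _Alignof(s1); }
size_t s4_alignment(void) { return _Alignof(s4); }
```

With gcc and clang these are expected to report 1 and 4 respectively, which would be consistent with the byte-wise and word-wise loads described above.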

Using the packed attribute without a struct, or specifying only the
alignment attribute, didn't work. The packed attribute seems to apply
only to structs, and the alignment attribute can only increase, not
decrease, the alignment.

Any thoughts?

I'm not afraid of using gcc extensions, but the code should be portable.

Regards,
Sven

Eric Sosman · Sep 11, 2013

[...]

Currently, I'm experimenting with packed structs. Consider the following :

typedef struct __attribute__((packed)) {
int64_t x;
} s1;

[...]

typedef struct __attribute__((packed)) {
__attribute__((aligned(4))) int64_t x;
} s4;

[...]

I'm not afraid of using gcc extension, but the code should be portable.

What do you mean by "portable?"

Sven Köhler · Sep 11, 2013

What do you mean by "portable?"

The C code should not make any assumptions about the architecture that
the code is compiled for. For example, x86 supports unaligned access.
Hence, if your program is intended for x86 only, then you
might cast an int32_t* into int64_t* without any worries. (Not sure, if
that is completely true.)

James Kuyper · Sep 11, 2013

The C code should not make any assumptions about the architecture that
the code is compiled for. ...

But making use of gcc extensions assumes that there's an implementation
of gcc for that architecture. That's a pretty good bet, but it's not
always the case. A definition of "portable" that is intended to allow
the use of gcc extensions should acknowledge that fact explicitly: "The
C code should not make any assumptions about the architecture that the
code is compiled for, other than assuming that an implementation of gcc
for that architecture is available and will be used."

James Kuyper · Sep 11, 2013

Hi,

I'm currently trying to find out which possibilities exist to access a
4byte aligned int64_t. ...

What's wrong with accessing it as int64_t? That's the simplest way, and
any compiler conforming to C99 where it's not also the most efficient
way is poorly implemented.

... I know, that I could declare a union of int64_t
and two int32_t and copy each int32_t independently to the struct. This
works. But the assembly code that gcc generates is - well - not optimal.
I tried what happened if I simply cast 4byte aligned pointer to an
int64_t pointer. I know that that's basically forbidden and isn't

The C standard never forbids any kind of code - it just says that, in
some cases, the behavior of your code is not defined by the C standard.
This is one example of that. There's nothing inherently wrong with that.
There could be something other than the C standard which does define the
behavior (such as the POSIX standard or the documentation of your
compiler). If the only places where your code needs to work are places
where such a guarantee applies, then it can be reasonable, and in some
cases even necessary, to write such code.

However, if no such guarantees apply to every system where your code
needs to be usable, then you should not write such code. If you're
unlucky, it may work the way you expected; if you're lucky, it will fail
catastrophically, so you'll learn not to write such code.

portable. But the assembly code was much nice. Also, my primary target
architecture (arm 32bit) has a "load double word" instruction, that only
works with 8byte aligned pointers. So obviously, I was just lucky gcc
didn't end up using the "load double word" instruction.

Currently, I'm experimenting with packed structs. Consider the following :

typedef struct __attribute__((packed)) {
int64_t x;
} s1;

The code uses a gcc extension (the packed attribute). Given a variable
s1 *p, gcc loads the value of p->x byte-wise (i.e. using arm's ldrb
instruction). This seems strange to me because according to the
documentation, the packed attribute ensures that no alignment padding is
used. But first of all, sizeof s1 is equal to 8. Secondly, the member x
is at offset 0 of the struct. So it seems to me, that one could not
obtain any pointers of type s1* without cheating heavily (e.g. by
casting unaligned pointers to s1*). clang also loads p->x byte by byte.

Any questions about how a gcc-specific feature works are best asked in a
forum that is also gcc-specific, because you'll get more reliable
answers there. This is NOT such a forum. Similarly for clang.

Am I wrong and the compiler must expect that a pointer to s1 may have an
alignment less than 8? If yes, then a packed struct is exactly what I'm
looking for. I already tested the following:

typedef struct __attribute__((packed)) {
__attribute__((aligned(4))) int64_t x;
} s4;

In C2011, a new feature was introduced, called _Alignas(). I can't be
sure, but I suspect that __attribute__((aligned(4))) might be equivalent
to _Alignas(4). So as soon as C2011 compilers become sufficiently common
(don't hold your breath while waiting for that to happen), you could use
_Alignas() instead, and this is the right place to ask questions about
_Alignas(). Use of _Alignas(4) should enable use of the "load double
word" instruction, and decent quality compilers can reasonably be
expected to do so, where appropriate. However, such use is not mandated
by the C standard - that's entirely a matter of "Quality of
Implementation" (or QoI), which is outside the scope of the standard.

Both gcc and clang load the value of member x word-wise, half-word-wise,
or byte-wise depending on whether the aligned attribute indicates 4, 2,
or 1-byte alignment.

Using the the packed attribute without a struct or specifying the
alignment attribute didn't work. The packed attribute does only seem to
be for structs. The alignment attribute can only increase but not
decrease the alignment.

While not a feature of C itself, packing is a commonplace extension, but
it's usually about making sure that there's no padding bytes between
members of a struct. It's pretty much meaningless for single-member structs.

....

I'm not afraid of using gcc extension, but the code should be portable.

Portable is a matter of degree. gcc extensions will work on gcc, and gcc
is widely available, so in that sense they can be fairly portable.
However, those extensions are not guaranteed to work with any other
compiler, so in that sense they're unportable. You'll have to determine
precisely what "should be portable" means to you - for me, it certainly
wouldn't include gcc extensions.

Sven Köhler · Sep 11, 2013

But making use of gcc extensions assumes that there's an implementation
of gcc for that architecture. That's a pretty good bet, but it's not
always the case. A definition of "portable" that is intended to allow
the use of gcc extensions should acknowledge that fact explicitly: "The
C code should not make any assumptions about the architecture that the
code is compiled for, other than assuming that an implementation of gcc
for that architecture is available and will be used."

You're absolutely right about that. On the other hand, I was upfront
about the fact that I'm fine with using gcc extensions.

Regards,
Sven

Sven Köhler · Sep 11, 2013

What's wrong with accessing it as int64_t? That's the simplest way, and
any compiler conforming to C99 where it's not also the most efficient
way is poorly implemented.

That's what's wrong: to the best of my knowledge, a 4byte aligned
int64_t implies undefined behaviour.

The C standard never forbids any kind of code - it just says that, in
some cases, the behavior of your code is not defined by the C standard.
This is one example of that.

See? That's what's wrong with accessing it as int64_t!

That's nothing inherently wrong with that.
There could be something other than the C standard which does define the
behavior (such as the POSIX standard or the documentation of your
compiler). If the only places where your code needs to work are places
where such a guarantee applies, then it can be reasonable, and in some
cases even necessary, to write such code.

I was very imprecise as to why having a 4byte aligned int64_t pointer is
bad in my case. Sorry for that. But I also didn't specifically say, that
it was all due to the C standard. Nevertheless, thanks for pointing this
out.

However, if no such guarantees apply to every system where your code
needs to be usable, then you should not write such code. If you're
unlucky, it may work the way you expected; if you're lucky, it will fail
catastrophically, so you'll learn not to write such code.
Indeed!

In C2011, a new feature was introduced, called _Alignas(). I can't be
sure, but I suspect that __attribute__((aligned(4))) might be equivalent
to _Alignas(4). So as soon as C2011 compiler become sufficiently common
(don't hold your breath while waiting for that to happen), you could use
_Alignas() instead, and this is the right place to ask questions about
_Alignas(). Use of _Alignas(4) should enable use of the "load double
word" instruction, and decent quality compilers can reasonably be
expected to do so, where appropriate. However, such use is not mandated
by the C standard - that's entirely a matter of "Quality of
Implementation" (or QoI), which is outside the scope of the standard.

I think you're mixing up numbers here. A double word is 8 bytes. If my
guess is correct, then _Alignas(4) would enforce 4 byte alignment. The
question to ask here is: can _Alignas be used to _lower_ the alignment?
Gcc's align attribute can only be used to increase the alignment. Hence,
I need to use it in combination with packed. Anyhow, I will try to find
information on _Alignas.

While not a feature of C itself, packing is a commonplace extension, but
it's usually about making sure that there's no padding bytes between
members of a struct. It's pretty much meaningless for single-member structs.

Your answer is not contributing anything new here. I discussed that
already in my original posting. Like you, I have no idea why gcc assumes
1byte alignment - even for the single member of a struct.

But, if the total size of the packed struct were odd, then the
compiler would have to assume that a pointer to that struct might be odd
(due to the memory layout of arrays of that struct). Correct?

Regards,
Sven

Sven Köhler · Sep 11, 2013

Am I wrong and the compiler must expect that a pointer to s1 may have an
alignment less than 8?

So at least for gcc's packed attribute it is true that there's no
padding before or after a packed struct if it is used in another
(non-packed) struct, for example:

typedef struct __attribute__((packed)) {
int64_t x;
} s1;

typedef struct {
int8_t x;
s1 y;
} s2;

The size of s2 is actually equal to 9, indicating that s2.y has an odd
offset. This could be the reason why gcc assumes 1byte alignment for s1
pointers.
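
The layout claim can be checked with a short sketch like the one below, again assuming gcc or clang for the packed attribute. The accessor and size helpers are hypothetical names added for the example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t, offsetof */
#include <stdint.h>   /* int8_t, int64_t */

typedef struct __attribute__((packed)) {
    int64_t x;
} s1;

typedef struct {
    int8_t x;
    s1 y;      /* placed directly after x, with no padding in between */
} s2;

static_assert(sizeof(s1) == 8, "packing does not change the size of s1");

/* Hypothetical accessor: y starts one byte into s2, so the compiler
   must generate a load that tolerates an odd address here. */
int64_t s2_get(const s2 *p) { return p->y.x; }

size_t s2_size(void)     { return sizeof(s2); }
size_t s2_y_offset(void) { return offsetof(s2, y); }
```

If `s1` could only live at 8-byte boundaries, `s2` would need 7 padding bytes after `x`; the reported size of 9 rules that out.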

Regards,
Sven

James Kuyper · Sep 11, 2013

That's what's wrong: to the best of my knowledge, a 4byte aligned
int64_t implies undefined behaviour.

I know of no reason why that should be the case - could you explain why you
think it is?

See? That's what's wrong with accessing it as int64_t!

I was referring to your attempts to access it in ways other than as int64_t.

I think you're mixing up numbers here. A double word is 8 bytes. If my

I did miss that out - I was thinking of machines where a word was 2
bytes. That was the case on every machine where I've ever had to worry
about the word size, though I'm well aware that other word sizes exist.
That's the problem with using "word" to describe the alignment - it
means different things on different platforms.

guess is correct, then _Alignas(4) would enforce 4 byte alignment. The
question to ask here is: can _Alignas be used to _lower_ the alignment?

No. "The combined effect of all alignment attributes in a declaration
shall not specify an alignment that is less strict than the alignment
that would otherwise be required for the type of the object or member
being declared." (6.7.5p4)

That "shall" occurs in a Constraints section, so creating such an
alignment attribute would be a constraint violation, requiring a
diagnostic. _Alignas() is a new feature, and I hadn't previously noticed
this clause. I had thought that _Alignas() requirements that were less
strict were simply ignored.

This means that unless you're certain whether or not _Alignof(T) is less
than _Alignof(U), (where T and U are type names) you should write:

_Alignas(T) _Alignas(U) U u;

That seems unnecessarily clumsy to me: there should always be an
implicit _Alignas(U) whenever you declare something to have the type U.

Note that it is implementation-defined whether alignments stricter than
_Alignof(max_align_t) are supported. (6.2.8p3)
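
To illustrate the constraint quoted from 6.7.5p4, here is a sketch assuming a C11 compiler; all object and function names are made up. Repeating the type's own alignment next to the requested one keeps the declaration valid whichever of the two is stricter, because the strictest specifier takes effect.

```c
#include <assert.h>
#include <stdint.h>

/* Valid whether _Alignof(int32_t) is less than, equal to, or greater than
   _Alignof(int64_t): the strictest of the two specifiers wins. */
static _Alignas(int32_t) _Alignas(int64_t) int64_t u;

/* Raising the alignment of a byte array to that of int64_t is fine. */
static _Alignas(int64_t) unsigned char raw[sizeof(int64_t)];

/* By contrast, the following would be a constraint violation on an
   implementation where _Alignof(int64_t) > 4:
       _Alignas(4) int64_t v;                                        */

/* Hypothetical helper: distance of an address from the previous multiple
   of `align`. Relies on the usual (implementation-defined) conversion of
   pointers to uintptr_t. */
uintptr_t misalignment(const void *p, uintptr_t align) {
    assert(align != 0);
    return (uintptr_t)p % align;
}

const void *addr_of_u(void)   { return &u; }
const void *addr_of_raw(void) { return raw; }
```

So `_Alignas` helps when a buffer must be made suitable for `int64_t`, but it offers no way to declare an `int64_t` that is less strictly aligned than the type requires.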

....

Your answer is not contributing anything new here. I discussed that
already in my original posting. Like you, I have no idea why gcc assumes
1byte alignment - even for the single member of a struct.

But, if the total size of the packed struct were odd, then the
compiler would have to assume that a pointer to that struct might be odd
(due to the memory layout of arrays of that struct). Correct?

That implies that "__attribute__((packed))" not only prohibits padding
between members of a struct, but also padding at the end of the struct.
That might be right; I wouldn't know - it's not something I've ever
needed to use.
It would be an extremely unusual implementation that inserts any padding
at all in a struct that contains only one member, of type int64_t. It's
not impossible, but I would not recommend worrying about it unless you
know for certain that it's happening.

Eric Sosman · Sep 11, 2013

The C code should not make any assumptions about the architecture that
the code is compiled for. For example, x86 supports unaligned access,
for example. Hence, if your program is intended for x86 only, then you
might cast an int32_t* into int64_t* without any worries. (Not sure, if
that is completely true.)

Okay, I *think* I get it, but let me try to restate the
problem in case I'm still lost:

You've got a pointer to a batch of bytes that you'd like
to treat as an int64_t, but you fear the address may not meet
int64_t's alignment requirement. You've tried various gcc
extensions but aren't entirely happy with them, because they
produce ultra-conservative byte-at-a-time code even on systems
where the penalty for unaligned access would be tolerable. You
seek an incantation that will produce "good" code on such systems
yet produce "safe" code on others. Have I got it?

If so, I think you're in the wrong group: Your question is
all about what kind of code gcc generates in response to this or
that set of gcc-specific extensions (including the empty set).
Perhaps a gcc forum -- there must be one for gcc developers, if
nothing else -- would be a better source of information about
gcc's code generation.

From C's perspective, the "portable" approach looks something
like

#include <stdint.h>
#include <string.h>

static inline // if desired
int64_t fetch64(void *ptr) {
    int64_t value;
    memcpy(&value, ptr, sizeof value);
    return value;
}

static inline // if desired
void store64(void *ptr, int64_t value) {
    memcpy(ptr, &value, sizeof value);
}

.... which I suspect won't fill you with joy. :-(
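
For what it is worth, here is a sketch of how those two helpers might be used on a buffer that is only 4-byte aligned. The helpers are restated so the block compiles on its own (the read side takes a `const void *` here); the buffer and the name `roundtrip_at_word` are invented for the example, and it assumes an `int64_t` occupies exactly two `uint32_t` words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static inline int64_t fetch64(const void *ptr) {
    int64_t value;
    memcpy(&value, ptr, sizeof value);   /* no alignment assumption */
    return value;
}

static inline void store64(void *ptr, int64_t value) {
    memcpy(ptr, &value, sizeof value);
}

/* Hypothetical demo: store a value at word index `i` of a uint32_t buffer
   and read it back. &words[i] is 4-byte aligned, but nothing guarantees
   that it is 8-byte aligned. */
int64_t roundtrip_at_word(size_t i, int64_t value) {
    uint32_t words[4] = {0};
    assert(i + 2 <= 4);                  /* the value spans two words */
    store64(&words[i], value);
    return fetch64(&words[i]);
}
```

Whether the `memcpy` calls turn into plain loads and stores depends on the compiler and target; gcc and clang commonly expand a fixed-size `memcpy` like this inline when optimizing, but that is a quality-of-implementation matter.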

James Kuyper · Sep 11, 2013

On 09/11/2013 04:39 PM, David Brown wrote:
....

int64_t fetch64(uint32_t *p) {
    union {
        struct { uint32_t a; uint32_t b; } s;
        int64_t x;
    } u;
    u.s.a = *p++;
    u.s.b = *p;
    return u.x;
}

This should work on all architectures, and give the best code for
reading an 8-byte int from an address that is known to be 4-byte
aligned. As far as I know, the code is fully portable C (hopefully
someone will correct me if I'm wrong - this c.l.c. is good at that!).

The code seems to assume that there's no padding bytes between u.s.a and
u.s.b. I know of nothing in the C standard that forbids such padding,
though I agree that it's extremely unlikely to be present.

Sven Köhler · Sep 11, 2013

On 11.09.2013 20:45, James Kuyper wrote:

I know of no reason why that should be the case - could explain why you
think it is?

You (and also David) are right! I did some research and it seems that C
say anything about the alignment. Maybe it's part of the target
architecture's ABI specification. But regardless of what specification
implies this limitation, as a C programmer I have to care about it if my
code is supposed to support a good number of target architectures. For
several target platforms (I specifically mentioned arm), specifications
exist that state or imply that ordinary int64_t pointers can only be
dereferenced if the address is a multiple of 8.

No. "The combined effect of all alignment attributes in a declaration
shall not specify an alignment that is less strict than the alignment
that would otherwise be required for the type of the object or member
being declared." (6.7.5p4)

Then it is very much like gcc's aligned attribute.
Basically, _Alignas() cannot be used to solve my problem, as the desired
alignment (4bytes) is less strict than the alignment that would
otherwise be required for the type (8 bytes in case of arm and several
other architectures).

Regards,
Sven

Eric Sosman · Sep 11, 2013

On 09/11/2013 04:39 PM, David Brown wrote:
...

The code seems to assume that there's no padding bytes between u.s.a and
u.s.b. I know of nothing in the C standard that forbids such padding,
though I agree that it's extremely unlikely to be present.

An array fixes things:

union {
    uint32_t s[2];
    int64_t x;
} u;
u.s[0] = *p++;
...
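
Spelled out in full, the array variant might look like the sketch below. It assumes that `uint32_t` and `int64_t` both exist and that reading a union member other than the one last written reinterprets the stored bytes; the names `fetch64_words` and `store64_words` are made up.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* Array elements are contiguous, so s[0] and s[1] overlay x exactly. */
static_assert(sizeof(uint32_t[2]) == sizeof(int64_t),
              "two 32-bit words must cover one int64_t");

static inline int64_t fetch64_words(const uint32_t *p) {
    union {
        uint32_t s[2];
        int64_t  x;
    } u;
    u.s[0] = p[0];   /* two aligned 32-bit loads */
    u.s[1] = p[1];
    return u.x;      /* same bytes, viewed as int64_t */
}

static inline void store64_words(uint32_t *p, int64_t value) {
    union {
        uint32_t s[2];
        int64_t  x;
    } u;
    u.x  = value;
    p[0] = u.s[0];   /* two aligned 32-bit stores */
    p[1] = u.s[1];
}
```

Because the bytes are copied in memory order, the pair round-trips a value on either endianness without the code having to know which word is the high one.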

Sven Köhler · Sep 11, 2013

On 11.09.2013 23:39, David Brown wrote:

Then you've done something wrong... see the end of the post.

I expected this answer. I've read it several times and the consensus
seems to be that I shouldn't need to do this. Now this is correct 99.9%
of the time, I guess. Would it help, if I'd explain why I really really
need to do this? Well, here it goes: you really really need to access a
4byte aligned int64_t, if you would be writing a Java byte code
interpreter. The Java stack consists of 4-byte words, and a 64-bit integer
spans 2 words of the stack. There are no alignment guarantees.
One could alter the Java byte code to have double values only in stack
cells 2i and 2i+1 and never in stack cells 2i-1 and 2i. I'm currently
not planning to do this.

Also wrong...

While newer arm CPUs support non-aligned doubleword access, older ones
don't:
http://infocenter.arm.com/help/topic/com.arm.doc.dui0068b/Chdggchb.html
(that's not the best link, but I can't find the updated document right now).

Also, gcc uses the ldrd instruction to load an int64_t when the target
architecture supports it. (Depending on gcc's mood, gcc may also use one
ldmia or two ldr instructions.)

int64_t fetch64(uint32_t *p) {
    union {
        struct { uint32_t a; uint32_t b; } s;
        int64_t x;
    } u;
    u.s.a = *p++;
    u.s.b = *p;
    return u.x;
}

This should work on all architectures, and give the best code for
reading an 8-byte int from an address that is known to be 4-byte
aligned. As far as I know, the code is fully portable C (hopefully
someone will correct me if I'm wrong - this c.l.c. is good at that!).

Compiling with gcc for the ARM (arm7tdmi) gives:

fetch64:
ldmia r0, {r0, r1}
bx lr

Like most (or perhaps all) 32-bit processors, ARM is perfectly happy
with 4-byte alignment for 8-byte integers.

That's what I thought, but ARM's documentation proved me wrong. Take a
look at the Architecture Reference Manual for ARMv5/v6. It states that
prior to ARMv6 the LDRD instruction requires 8 byte alignment.
This may be related to unaligned 64bit transfers across cache lines.

(It may require 8-byte
alignment for doubles for cpus that support hardware floating point - I
haven't checked those details.) On some ARMs, it is more efficient when
the load is 8-byte aligned because it can use a single 64-bit memory
access - but it will still work fine with 4-byte alignment.

Well, I also need to access misaligned doubles ;-)
Yes, starting with ARMv6, LDRD can handle 4byte aligned addresses.

Regards,
Sven

James Kuyper · Sep 11, 2013

On 11.09.2013 20:45, James Kuyper wrote:

You (and also David) are right! I did some research and it seems that C
say anything about the alignment.

I assume that there's a "does not" missing between "C" and "say" in that
sentence?

If so, that's incorrect - the C standard has a great deal to say about
alignment. However, you have not yet described the problem in a way that
makes what C says about alignment a problem.

... Maybe it's part of the target
architecture's ABI specification. But regardless of what specification
implies this limitation, as a C programmer I have to care about it if my
code is supposed to support a good number of target architectures. For
several target platforms (I specifically mentioned arm), specifications
exist that state or imply that ordinary int64_t pointers can only be
dereferenced if the address is a multiple of 8.

It's worse than that; on implementations where such alignment
restrictions exist, it's not even possible to create such a mis-aligned
pointer value with defined behavior. It's the creation of such pointer
values that you should be worried about, not dereferencing them.
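
One way to stay clear of creating a misaligned `int64_t *` is to test the address first and skip the conversion when the test fails. A sketch, assuming C11 and the common (implementation-defined) behaviour that converting a pointer to `uintptr_t` yields its address; `is_aligned` and `can_convert_to_int64_ptr` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the address of p is a multiple of `align` (a power of two). */
bool is_aligned(const void *p, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* True if converting p to int64_t * would yield a correctly aligned
   pointer on this implementation. */
bool can_convert_to_int64_ptr(const void *p) {
    return is_aligned(p, _Alignof(int64_t));
}
```

A passing check only addresses alignment. Whether the memory may then be read through an `int64_t` lvalue is a separate question, which is why the copying approaches discussed in this thread avoid the conversion altogether.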

Then it is very much like gcc's aligned attribute.
Basically, _Alignas() cannot be used to solve my problem, as the desired
alignment (4bytes) is less strict than the alignment that would
otherwise be required for the type (8 bytes in case of arm and several
other architectures).

I think that I understand what you're probably concerned about. You're
assuming a conventional system with CHAR_BIT==8 (I mention this only for
completeness). You're worried about the possibility that, in terms of
C2011, _Alignof(int64_t)==8. You say you have a 4-byte aligned int64_t.
There's no way to create an int64_t object whose alignment is less
strict than _Alignof(int64_t) using strictly conforming code. Therefore,
you're either talking about non-strictly conforming code, or you're not
describing it correctly. Either one is possible, but I suspect it's the
latter. I think that what you have is a pointer to a block of memory,
which is only aligned to 4 bytes, which contains the same bytes that
would represent an int64_t value if those bytes were correctly aligned
and accessed through an lvalue of that type.

If that's the case, then Eric Sosman's latest suggestion is the
maximally portable way to do what you want, but it's probably not as
efficient as you'd like. However, David Brown's latest suggestion is
almost as portable, so long as you replace the struct with a two-element
array of uint32_t. You should also declare it "static inline", as he
mentioned in an earlier message.
In principle, there need not be any uint32_t type, which is why David's
suggestion is less portable than Eric's. However, a conforming
implementation of C which does support int64_t, and where
sizeof(int64_t) > 1, is almost certain to support uint32_t as well. On
such a system, a good compiler is likely to translate fetch64() into
machine code that you'll find acceptably efficient.

Eric Sosman · Sep 11, 2013

[...] Would it help, if I'd explain why I really really
need to do this? Well, here it goes: you really really need to access a
4byte aligned int64_t, if you would be writing a Java byte code
interpreter. Java stack consists of 4 byte words, and a 64bit integer
spans across 2 words of the stack. The are no alignment guarantees.

<topicality level="marginal" trend="diminishing">

A Java Virtual Machine Stack holds "frames," one per method
(et cetera) activation (JVM Spec 2.5.2, 2.6). Java does *not*
require an 8-byte value to occupy two 4-byte stack slots; on the
contrary, see JVMS 2.6.2:

"Each entry on the operand stack can hold a value of any
Java Virtual Machine type, including a value of type long
or type double."

and

"It is not possible, for example, to push two int values
and subsequently treat them as a long or to push two float
values and subsequently add them with an iadd instruction."

There *is* this confusing notion of "stack depth" (ibid.),
where 8-byte long and double count two units while a 4-byte int
counts just one. But this has nothing to do with the size of
the stacked data! Proof: On a 64-bit JVM with 64-bit object
references, pushing a reference onto the stack takes *one* unit!
My guess is that the 1-vs-2 stuff is a holdover from some early
iteration of Java's design; there are hints of the same confusion
in the class file format, too:

"In retrospect, making 8-byte constants take two constant
pool entries was a poor choice." -- JVMS 4.4.5

One could alter the Java byte code to have double values only in stack
cells 2i and 2i+1 and never in stack cells 2i-1 and 2i. I'm currently
not planning to do this.

I doubt that altering the byte code is necessary (and may
not be feasible, in light of some of the instrumentation API's).
During class file verification (JVMS 4.10) you will discover the
stack "depth" as of each push and pop, and will know the type of
every stacked operand at every point. It seems to me you could
use that information to decide how to push and pop each value,
either by knowing (statically!) whether the access is going to
be aligned or misaligned, or by pushing and popping a "spacer
word" so the data accesses are always aligned (remember, static
analysis tells you whether any particular access needs a spacer).
The "stack depth" declared in a class file is only tenuously
related to the amount of memory the JVM will actually use.
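
The spacer idea could be sketched roughly as follows. Everything here is illustrative: the `Stack` type, the slot count, and the rule "pad when the slot index is odd" are assumptions, and the `spacer` flag stands in for what static analysis of the byte code would determine ahead of time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { STACK_SLOTS = 16 };

typedef struct {
    _Alignas(8) uint32_t slot[STACK_SLOTS];  /* even slots are 8-byte aligned */
    size_t sp;                               /* index of the next free slot */
} Stack;

void push_int(Stack *s, int32_t v) {
    assert(s->sp < STACK_SLOTS);
    memcpy(&s->slot[s->sp++], &v, sizeof v);
}

/* Pushes a long at an even slot, inserting a spacer word if needed.
   Returns true when a spacer was used, so the matching pop can drop it. */
bool push_long_aligned(Stack *s, int64_t v) {
    bool spacer = (s->sp % 2) != 0;
    if (spacer) s->sp++;
    assert(s->sp + 2 <= STACK_SLOTS);
    memcpy(&s->slot[s->sp], &v, sizeof v);   /* destination is 8-byte aligned */
    s->sp += 2;
    return spacer;
}

int64_t pop_long_aligned(Stack *s, bool spacer) {
    int64_t v;
    assert(s->sp >= 2u + (spacer ? 1u : 0u));
    s->sp -= 2;
    memcpy(&v, &s->slot[s->sp], sizeof v);
    if (spacer) s->sp--;                     /* discard the spacer word */
    return v;
}
```

The sketch keeps `memcpy` so that it stays portable; the point is only that every long now sits at an 8-byte boundary, at the cost of one wasted slot whenever the depth was odd.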

</topicality>

You may get better ideas about JVM implementation from Java
forums than you'll get in a forum about the language in which
you choose to write your JVM. comp.lang.java.programmer may not
be the place where you'll get those ideas, but somebody there is
sure to be able to give you a link or two.

Sven Köhler · Sep 12, 2013

On 12.09.2013 01:16, Eric Sosman wrote:

[...] Would it help, if I'd explain why I really really
need to do this? Well, here it goes: you really really need to access a
4byte aligned int64_t, if you would be writing a Java byte code
interpreter. Java stack consists of 4 byte words, and a 64bit integer
spans across 2 words of the stack. The are no alignment guarantees.

<topicality level="marginal" trend="diminishing">

A Java Virtual Machine Stack holds "frames," one per method
(et cetera) activation (JVM Spec 2.5.2, 2.6). Java does *not*
require an 8-byte value to occupy two 4-byte stack slots; on the
contrary, see JVMS 2.6.2:

"Each entry on the operand stack can hold a value of any
Java Virtual Machine type, including a value of type long
or type double."

If you had to guess, would you say that an implementation of a byte-code
interpreter would implement the notion of entries as introduced above?
It would probably exploit the following:

and

"It is not possible, for example, to push two int values
and subsequently treat them as a long or to push two float
values and subsequently add them with an iadd instruction."

It gives the implementation a great deal of freedom. For example, you
could reserve 64 bits for each value being pushed - even if it was a
32bit int.
But you could also reserve only 32bit for an int and 64bit for a long.
And that would make very much sense. Especially if you need to save
space. Now if the byte code pushes three ints and one long onto the
stack, then the long will be misaligned, right?
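
A small sketch of that layout, under the assumption of a tightly packed operand stack made of 32-bit slots (one per int, two per long). The `PackedStack` type and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { SLOTS = 8 };

typedef struct {
    uint32_t slot[SLOTS];  /* one slot per int, two per long */
    size_t   sp;           /* index of the next free slot */
} PackedStack;

/* Push helpers return the byte offset at which the value was stored. */
size_t push_int(PackedStack *s, int32_t v) {
    size_t off = s->sp * sizeof s->slot[0];
    assert(s->sp + 1 <= SLOTS);
    memcpy(&s->slot[s->sp], &v, sizeof v);
    s->sp += 1;
    return off;
}

size_t push_long(PackedStack *s, int64_t v) {
    size_t off = s->sp * sizeof s->slot[0];
    assert(s->sp + 2 <= SLOTS);
    memcpy(&s->slot[s->sp], &v, sizeof v);  /* may straddle an 8-byte boundary */
    s->sp += 2;
    return off;
}

int64_t peek_long(const PackedStack *s) {
    int64_t v;
    assert(s->sp >= 2);
    memcpy(&v, &s->slot[s->sp - 2], sizeof v);
    return v;
}
```

After three `push_int` calls the long lands at byte offset 12, which is a multiple of 4 but not of 8, even if the stack itself starts on an 8-byte boundary.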

I guess, you just wanted to point out that I should not have said that
the Java stack consists of 4 byte words. Ah well, yes. It's only an
aspect of the implementation I'm talking about. However, it's the
obvious way of implementing the stack, unless you have a lot of space to
waste.

There *is* this confusing notion of "stack depth" (ibid.),
where 8-byte long and double count two units while a 4-byte int
counts just one. But this has nothing to do with the size of
the stacked data! Proof: On a 64-bit JVM with 64-bit object
references, pushing a reference onto the stack takes *one* unit!
My guess is that the 1-vs-2 stuff is a holdover from some early
iteration of Java's design; there are hints of the same confusion
in the class file format, too:

Or is this evidence that they didn't think about 64bit references back then?

"In retrospect, making 8-byte constants take two constant
pool entries was a poor choice." -- JVMS 4.4.5

*eg*

Let me guess, these 8-byte constants can be misaligned too?

I doubt that altering the byte code is necessary (and may
not be feasible, in light of some of the instrumentation API's).
During class file verification (JVMS 4.10) you will discover the
stack "depth" as of each push and pop, and will know the type of
every stacked operand at every point. It seems to me you could
use that information to decide how to push and pop each value,
either by knowing (statically!) whether the access is going to
be aligned or misaligned, or by pushing and popping a "spacer
word" so the data accesses are always aligned (remember, static
analysis tells you whether any particular access needs a spacer).
The "stack depth" declared in a class file is only tenuously
related to the amount of memory the JVM will actually use.

The spacer word could also be added by modifying the byte code.

</topicality>

You may get better ideas about JVM implementation from Java
forums than you'll get in a forum about the language in which
you choose to write your JVM. comp.lang.java.programmer may not
be the place where you'll get those ideas, but somebody there is
sure to be able to give you a link or two.

But this wasn't about "how to implement a JVM" to begin with.
I'd even write inline assembly. But knowing that people take the source
and port it to other platforms, I wanted to be nice and write something
more portable.

Regards,
Sven

Sven Köhler · Sep 12, 2013

On 12.09.2013 00:59, James Kuyper wrote:

I assume that there's a "does not" missing between "C" and "say" in that
sentence?

I hate when that happens. Yes, a "does not" was intended.

If so, that's incorrect - the C standard has a great deal to say about
alignment. However, you have not yet described the problem in a way that
makes what C says about alignment a problem.

It's worse than that; on implementations where such alignment
restrictions exist, it's not even possible to create such a mis-aligned
pointer value with defined behavior. It's the creation of such pointer
values that you should be worried about, not dereferencing them.

So casting int32_t* to int64_t* is undefined behaviour?

I think that I understand what you're probably concerned about. You're
assuming a conventional system with CHAR_BIT==8 (I mention this only for
completeness). You're worried about the possibility that, in terms of
C2011, _Alignof(int64_t)==8. You say you have a 4-byte aligned int64_t.
There's no way to create an int64_t object whose alignment is less
strict than _Alignof(int64_t) using strictly conforming code. Therefore,
you're either talking about non-strictly conforming code, or you're not
describing it correctly. Either one is possible, but I suspect it's the
latter. I think that what you have is a pointer to a block of memory,
which is only aligned to 4 bytes, which contains the same bytes that
would represent an int64_t value if those bytes were correctly aligned
and accessed through an lvalue of that type.

That is correct.
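
To make the situation concrete, here is a minimal sketch of checking whether a block of memory satisfies the alignment of int64_t. The helper name is_aligned_for_i64 is hypothetical, and the sketch assumes that converting a pointer to uintptr_t yields the numeric address, which is implementation-defined but holds on common flat-memory targets such as 32-bit ARM.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero if p meets the alignment
 * requirement of int64_t on this implementation.
 * Assumption: (uintptr_t)p is the numeric address of p. */
static inline int is_aligned_for_i64(const void *p)
{
    return (uintptr_t)p % alignof(int64_t) == 0;
}
```

If this returns zero for a given address, converting that address to int64_t* is not something strictly conforming code can do, and the bytes have to be copied out instead.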

If that's the case, then Eric Sosman's latest suggestion is the
maximally portable way to do what you want, but it's probably not as
efficient as you'd like. However, David Brown's latest suggestion is
almost as portable, so long as you replace the struct with a two-element
array of uint32_t. You should also declare it "static inline", as he
mentioned in an earlier message.

Didn't I mention the union technique in my first posting? Well, I didn't
know that an array is somehow better than a struct. Padding in the
struct would only be added if uint32_t required an alignment larger
than the size of uint32_t. And if a uint32_t required an alignment
larger than its size, why would the compiler not add the padding in the
array? Then the array would contain misaligned elements. That would be
fun, I guess.

Regards,
Sven

Sven Köhler · Sep 12, 2013

Am 11.09.2013 23:07, schrieb David Brown:

That should work fine. I've just tested it using gcc for the arm, and
the code generated is ideal as far as I can see.

Yes, sometimes gcc replaces memcpy of fixed size with loads and stores.
I wish I could say that it happens all the time. So far, I have always
encountered a situation where gcc "forgets" to substitute memcpy. clang
is much better at it, but llvm created invalid assembly code for my target.

Two points are important, however. First, if the data is 4-byte aligned
(but not 8-byte aligned), then make sure the compiler /knows/ it is
4-byte aligned. If it knows nothing about the alignment, then it cannot
optimise the code - on processors like the ARM which do not support
unaligned access, then it has to use byte loads.

I changed the void* parameters to int32_t* and clang happily replaced
memcpy with 4byte loads and stores.
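
A minimal sketch of what that looks like, assuming the data is known to be 4-byte aligned. The names load_i64_from_words and store_i64_to_words are hypothetical, not from any of the code posted in this thread. Whether the memcpy calls are actually replaced with two word loads or stores depends on the compiler, its version, and the optimisation level.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read an int64_t whose bytes are stored at a
 * 4-byte aligned address. The int32_t* parameter type is what tells
 * the compiler about the 4-byte alignment. */
static inline int64_t load_i64_from_words(const int32_t *src)
{
    int64_t value;
    memcpy(&value, src, sizeof value);  /* copies 8 bytes, native byte order */
    return value;
}

/* Hypothetical counterpart: write an int64_t to a 4-byte aligned address. */
static inline void store_i64_to_words(int32_t *dst, int64_t value)
{
    memcpy(dst, &value, sizeof value);
}
```

No int64_t* to the misaligned location is ever created here, so the only thing left to the compiler is how cheaply it copies the 8 bytes.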

Secondly, make sure optimisation is enabled - and use the "static
inline" here. gcc will see that you are doing memcpy with 8 bytes of
data with a 4-byte aligned source and destination, and replace the
memcpy() call with two 32-bit loads.

I wish it were that simple and that gcc were more reliable.

Regards,
Sven

James Kuyper · Sep 12, 2013

On 09/11/2013 07:47 PM, Sven Köhler wrote:
....

So casting int32_t* to int64_t* is undefined behaviour?

Yes, if int64_t is more strictly aligned than int32_t. (6.3.2.3p7)

....

Didn't I mention the union technique in my first posting? Well, I didn't
know that an array is somehow better than a struct. Padding in the
struct would only be added if uint32_t required an alignment larger
than the size of uint32_t.

The way arrays work implies that sizeof(T) must be an integer multiple
of _Alignof(T), so you don't need to worry about that possibility.
However, the standard says nothing to prohibit unnecessary padding
between members of a struct; it does prohibit padding between elements
of an array. Because the padding is in fact unnecessary, you're pretty
unlikely to run into an implementation where the difference matters. But
using an array is no more complicated than using a struct (in fact, it's
marginally simpler), so why not use the approach that is also safer,
even if only by an infinitesimal amount?
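
For illustration, a minimal sketch of the union technique using a two-element array instead of a struct. The names i64_words and load_i64_via_union are hypothetical. The sketch assumes the source is 4-byte aligned and holds the value in the machine's native byte order.

```c
#include <stdint.h>

/* Hypothetical union: the same 8 bytes viewed as one int64_t or as
 * two 32-bit words. */
typedef union {
    int64_t  whole;
    uint32_t words[2];
} i64_words;

/* An array has no padding between its elements, so the two words must
 * cover the int64_t exactly. */
_Static_assert(sizeof(uint32_t[2]) == sizeof(int64_t),
               "two 32-bit words must cover one int64_t");

/* Hypothetical helper: copy two aligned 32-bit words into the union and
 * read them back as an int64_t. */
static inline int64_t load_i64_via_union(const uint32_t *src)
{
    i64_words u;
    u.words[0] = src[0];
    u.words[1] = src[1];
    return u.whole;  /* reinterprets the bytes just written */
}
```

With a struct of two uint32_t members, the equivalent size check could in principle fail on an implementation that inserts padding between the members. With the array it cannot.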
