unaligned pointer access

Discussion in 'C Programming' started by Sven Köhler, Sep 11, 2013.

  1. Sven Köhler

    Sven Köhler Guest

    Hi,

    I'm currently trying to find out which possibilities exist to access a
    4byte aligned int64_t. I know that I could declare a union of int64_t
    and two int32_t and copy each int32_t independently into it. This
    works. But the assembly code that gcc generates is - well - not optimal.
    I tried what happens if I simply cast a 4byte aligned pointer to an
    int64_t pointer. I know that that's basically forbidden and isn't
    portable. But the assembly code was much nicer. Also, my primary target
    architecture (arm 32bit) has a "load double word" instruction that only
    works with 8byte aligned pointers. So obviously, I was just lucky gcc
    didn't end up using the "load double word" instruction.

    Currently, I'm experimenting with packed structs. Consider the following:

    typedef struct __attribute__((packed)) {
        int64_t x;
    } s1;


    The code uses a gcc extension (the packed attribute). Given a variable
    s1 *p, gcc loads the value of p->x byte-wise (i.e. using arm's ldrb
    instruction). This seems strange to me because according to the
    documentation, the packed attribute ensures that no alignment padding is
    used. But first of all, sizeof s1 is equal to 8. Secondly, the member x
    is at offset 0 of the struct. So it seems to me, that one could not
    obtain any pointers of type s1* without cheating heavily (e.g. by
    casting unaligned pointers to s1*). clang also loads p->x byte by byte.

    Am I wrong and the compiler must expect that a pointer to s1 may have an
    alignment less than 8? If yes, then a packed struct is exactly what I'm
    looking for. I already tested the following:

    typedef struct __attribute__((packed)) {
        __attribute__((aligned(4))) int64_t x;
    } s4;

    Both gcc and clang load the value of member x word-wise, half-word-wise,
    or byte-wise depending on whether the aligned attribute indicates 4, 2,
    or 1-byte alignment.
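
    A minimal sketch of how the packed-plus-aligned struct above might be
    used to read a 4byte aligned int64_t out of an array of 32bit words. It
    assumes gcc or clang. The helper name load64_at_word is hypothetical,
    and the may_alias attribute is an addition that is not in the post: it
    is there so that reading uint32_t storage through the struct does not
    conflict with type-based alias analysis.

```c
#include <stddef.h>
#include <stdint.h>

/* gcc/clang extension: packed removes the natural 8-byte alignment, and
   aligned(4) on the member promises 4-byte alignment instead.
   may_alias is an assumption added for this sketch (not from the post). */
typedef struct __attribute__((packed, may_alias)) {
    __attribute__((aligned(4))) int64_t x;
} s4;

/* Hypothetical helper: read the 64-bit value that starts at 32-bit word
   `index`. The address is 4-byte aligned, not necessarily 8-byte aligned. */
static inline int64_t load64_at_word(const uint32_t *words, size_t index)
{
    const s4 *p = (const s4 *)(const void *)&words[index];
    return p->x;
}
```

    Whether the compiler then emits two word loads, as observed above, is
    up to the compiler and the target; the C standard says nothing about it.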


    Using the packed attribute without a struct or specifying the
    alignment attribute on its own didn't work. The packed attribute only
    seems to apply to structs. The alignment attribute can only increase but
    not decrease the alignment.

    Any thoughts?

    I'm not afraid of using gcc extensions, but the code should be portable.


    Regards,
    Sven
     
    Sven Köhler, Sep 11, 2013
    #1

  2. Sven Köhler

    Eric Sosman Guest

    What do you mean by "portable?"
     
    Eric Sosman, Sep 11, 2013
    #2

  3. Sven Köhler

    Sven Köhler Guest

    The C code should not make any assumptions about the architecture that
    the code is compiled for. For example, x86 supports unaligned access.
    Hence, if your program is intended for x86 only, then you might cast an
    int32_t* into int64_t* without any worries. (Not sure if that is
    completely true.)
     
    Sven Köhler, Sep 11, 2013
    #3
  4. Sven Köhler

    James Kuyper Guest

    But making use of gcc extensions assumes that there's an implementation
    of gcc for that architecture. That's a pretty good bet, but it's not
    always the case. A definition of "portable" that is intended to allow
    the use of gcc extensions should acknowledge that fact explicitly: "The
    C code should not make any assumptions about the architecture that the
    code is compiled for, other than assuming that an implementation of gcc
    for that architecture is available and will be used."
     
    James Kuyper, Sep 11, 2013
    #4
  5. Sven Köhler

    James Kuyper Guest

    What's wrong with accessing it as int64_t? That's the simplest way, and
    any compiler conforming to C99 where it's not also the most efficient
    way is poorly implemented.
    The C standard never forbids any kind of code - it just says that, in
    some cases, the behavior of your code is not defined by the C standard.
    This is one example of that. There's nothing inherently wrong with that.
    There could be something other than the C standard which does define the
    behavior (such as the POSIX standard or the documentation of your
    compiler). If the only places where your code needs to work are places
    where such a guarantee applies, then it can be reasonable, and in some
    cases even necessary, to write such code.

    However, if no such guarantees apply to every system where your code
    needs to be usable, then you should not write such code. If you're
    unlucky, it may work the way you expected; if you're lucky, it will fail
    catastrophically, so you'll learn not to write such code.
    Any questions about how a gcc-specific feature works are best asked in a
    forum that is also gcc-specific, because you'll get more reliable
    answers there. This is NOT such a forum. Similarly for clang.
    In C2011, a new feature was introduced, called _Alignas(). I can't be
    sure, but I suspect that __attribute__((aligned(4))) might be equivalent
    to _Alignas(4). So as soon as C2011 compilers become sufficiently common
    (don't hold your breath while waiting for that to happen), you could use
    _Alignas() instead, and this is the right place to ask questions about
    _Alignas(). Use of _Alignas(4) should enable use of the "load double
    word" instruction, and decent quality compilers can reasonably be
    expected to do so, where appropriate. However, such use is not mandated
    by the C standard - that's entirely a matter of "Quality of
    Implementation" (or QoI), which is outside the scope of the standard.
    While not a feature of C itself, packing is a commonplace extension, but
    it's usually about making sure that there's no padding bytes between
    members of a struct. It's pretty much meaningless for single-member structs.

    ....
    Portable is a matter of degree. gcc extensions will work on gcc, and gcc
    is widely available, so in that sense they can be fairly portable.
    However, those extensions are not guaranteed to work with any other
    compiler, so in that sense they're unportable. You'll have to determine
    precisely what "should be portable" means to you - for me, it certainly
    wouldn't include gcc extensions.
     
    James Kuyper, Sep 11, 2013
    #5
  6. Sven Köhler

    Sven Köhler Guest

    You're absolutely right about that. On the other hand, I was upfront
    about the fact that I'm fine with using gcc extensions.


    Regards,
    Sven
     
    Sven Köhler, Sep 11, 2013
    #6
  7. Sven Köhler

    Sven Köhler Guest

    That's what's wrong: to the best of my knowledge, a 4byte aligned
    int64_t implies undefined behaviour.
    See? That's what's wrong with accessing it as int64_t!
    I was very imprecise as to why having a 4byte aligned int64_t pointer is
    bad in my case. Sorry for that. But I also didn't specifically say, that
    it was all due to the C standard. Nevertheless, thanks for pointing this
    out.
    I think you're mixing up numbers here. A double word is 8 bytes. If my
    guess is correct, then _Alignas(4) would enforce 4 byte alignment. The
    question to ask here is: can _Alignas be used to _lower_ the alignment?
    Gcc's align attribute can only be used to increase the alignment. Hence,
    I need to use it in combination with packed. Anyhow, I will try to find
    information on _Alignas.
    Your answer is not contributing anything new here. I discussed that
    already in my original posting. Like you, I have no idea why gcc assumes
    1byte alignment - even for the single member of a struct.

    But, if the total size of the packed struct were odd, then the
    compiler would have to assume that a pointer to that struct might be odd
    (due to the memory layout of arrays of that struct). Correct?


    Regards,
    Sven
     
    Sven Köhler, Sep 11, 2013
    #7
  8. Sven Köhler

    Sven Köhler Guest

    So at least for gcc's packed attribute it is true that there's no
    padding before or after a packed struct if it is used in another
    (non-packed) struct, for example:

    typedef struct __attribute__((packed)) {
        int64_t x;
    } s1;

    typedef struct {
        int8_t x;
        s1 y;
    } s2;

    The size of s2 is actually equal to 9, indicating that s2.y has an odd
    offset. This could be the reason why gcc assumes 1byte alignment for s1
    pointers.
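
    The observation above can be checked at compile time. This is only a
    sketch, assuming gcc or clang in C11 mode on a typical target; the
    asserted values are what the post reports, not something the C standard
    promises.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct __attribute__((packed)) {
    int64_t x;
} s1;

typedef struct {
    int8_t x;
    s1 y;
} s2;

/* With gcc and clang, packed lowers the alignment of s1 to 1, so y can
   sit directly after the one-byte member x. */
_Static_assert(_Alignof(s1) == 1, "packed struct has alignment 1");
_Static_assert(offsetof(s2, y) == 1, "no padding before the packed member");
_Static_assert(sizeof(s2) == 9, "no padding after it either");
```

    Since an s1 can legitimately live at an odd offset like this, byte-wise
    loads through an s1 pointer are the only safe default for the compiler.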


    Regards,
    Sven
     
    Sven Köhler, Sep 11, 2013
    #8
  9. Sven Köhler

    James Kuyper Guest

    I know of no reason why that should be the case - could you explain why
    you think it is?

    I was referring to your attempts to access it in ways other than as int64_t.
    I did miss that out - I was thinking of machines where a word was 2
    bytes. That was the case on every machine where I've ever had to worry
    about the word size, though I'm well aware that other word sizes exist.
    That's the problem with using "word" to describe the alignment - it
    means different things on different platforms.
    No. "The combined effect of all alignment attributes in a declaration
    shall not specify an alignment that is less strict than the alignment
    that would otherwise be required for the type of the object or member
    being declared." (6.7.5p4)

    That "shall" occurs in a Constraints section, so creating such an
    alignment attribute would be a constraint violation, requiring a
    diagnostic. _Alignas() is a new feature, and I hadn't previously noticed
    this clause. I had thought that _Alignas() requirements that were less
    strict were simply ignored.

    This means that unless you're certain whether or not _Alignof(T) is less
    than _Alignof(U), (where T and U are type names) you should write:

    _Alignas(T) _Alignas(U) U u;

    That seems unnecessarily clumsy to me: there should always be an
    implicit _Alignas(U) whenever you declare something to have the type U.

    Note that it is implementation-defined whether alignments stricter than
    _Alignof(max_align_t) are supported. (6.2.8p3)
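
    A small sketch of what the quoted constraint allows and forbids,
    assuming a C11 compiler. The variable names are made up for
    illustration.

```c
#include <stdint.h>

/* Allowed: asking for stricter alignment than the type needs. */
static _Alignas(8) int32_t words[4];

/* Two alignment specifiers on one declaration: the strictest one takes
   effect, so this is valid whichever of the two types is stricter. */
static _Alignas(int32_t) _Alignas(int64_t) int64_t value;

/* Not allowed where _Alignof(int64_t) is greater than 4: the specifier
   would be less strict than the type requires (constraint violation).

   _Alignas(4) int64_t too_weak;
*/
```

    In other words, _Alignas() can raise an alignment requirement but
    never lower it.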

    ....
    That implies that "__attribute__((packed))" not only prohibits padding
    between members of a struct, but also padding at the end of the struct.
    That might be right; I wouldn't know - it's not something I've ever
    needed to use.
    It would be an extremely unusual implementation that inserts any padding
    at all in a struct that contains only one member, of type int64_t. It's
    not impossible, but I would not recommend worrying about it unless you
    know for certain that it's happening.
     
    James Kuyper, Sep 11, 2013
    #9
  10. Sven Köhler

    Eric Sosman Guest

    Okay, I *think* I get it, but let me try to restate the
    problem in case I'm still lost:

    You've got a pointer to a batch of bytes that you'd like
    to treat as an int64_t, but you fear the address may not meet
    int64_t's alignment requirement. You've tried various gcc
    extensions but aren't entirely happy with them, because they
    produce ultra-conservative byte-at-a-time code even on systems
    where the penalty for unaligned access would be tolerable. You
    seek an incantation that will produce "good" code on such systems
    yet produce "safe" code on others. Have I got it?

    If so, I think you're in the wrong group: Your question is
    all about what kind of code gcc generates in response to this or
    that set of gcc-specific extensions (including the empty set).
    Perhaps a gcc forum -- there must be one for gcc developers, if
    nothing else -- would be a better source of information about
    gcc's code generation.

    From C's perspective, the "portable" approach looks something
    like

    #include <stdint.h>
    #include <string.h>

    static inline // if desired
    int64_t fetch64(void *ptr) {
        int64_t value;
        memcpy(&value, ptr, sizeof value); // byte copy, no alignment assumed
        return value;
    }

    static inline // if desired
    void store64(void *ptr, int64_t value) {
        memcpy(ptr, &value, sizeof value);
    }

    .... which I suspect won't fill you with joy. :-(
     
    Eric Sosman, Sep 11, 2013
    #10
  11. Sven Köhler

    James Kuyper Guest

    On 09/11/2013 04:39 PM, David Brown wrote:
    ....
    The code seems to assume that there's no padding bytes between u.s.a and
    u.s.b. I know of nothing in the C standard that forbids such padding,
    though I agree that it's extremely unlikely to be present.
     
    James Kuyper, Sep 11, 2013
    #11
  12. Sven Köhler

    Sven Köhler Guest

    On 11.09.2013 20:45, James Kuyper wrote:
    You (and also David) are right! I did some research and it seems that C
    say anything about the alignment. Maybe it's part of the target
    architecture's ABI specification. But regardless of what specification
    implies this limitation, as a C programmer I have to care about it if my
    code is supposed to support a good number of target architectures. For
    several target platforms (I specifically mentioned arm), specifications
    exist that state or imply that ordinary int64_t pointers can only be
    dereferenced if the address is a multiple of 8.
    Then it is very much like gcc's aligned attribute.
    Basically, _Alignas() cannot be used to solve my problem, as the desired
    alignment (4bytes) is less strict than the alignment that would
    otherwise be required for the type (8 bytes in case of arm and several
    other architectures).


    Regards,
    Sven
     
    Sven Köhler, Sep 11, 2013
    #12
  13. Sven Köhler

    Eric Sosman Guest

    An array fixes things:

    union {
        uint32_t s[2];
        int64_t x;
    } u;
    u.s[0] = *p++;
    ...
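
    One way the elided rest of that fragment could look, wrapped in a
    function. This is a guess at the intent, not the code from the thread,
    and the name fetch64_union is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: assemble an int64_t from two consecutive 32-bit
   words. Only uint32_t loads are performed on the source, so 4-byte
   alignment of p is sufficient. */
static inline int64_t fetch64_union(const uint32_t *p)
{
    union {
        uint32_t s[2];
        int64_t  x;
    } u;

    u.s[0] = p[0];   /* word at the lower address  */
    u.s[1] = p[1];   /* word at the higher address */
    return u.x;      /* reinterpret the same 8 bytes as an int64_t */
}
```

    Because the words are copied in memory order, the result matches what
    a memcpy of the same 8 bytes would give on that machine.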
     
    Eric Sosman, Sep 11, 2013
    #13
  14. Sven Köhler

    Sven Köhler Guest

    On 11.09.2013 23:39, David Brown wrote:
    I expected this answer. I've read it several times and the consensus
    seems to be that I shouldn't need to do this. Now this is correct 99.9%
    of the time, I guess. Would it help if I explained why I really really
    need to do this? Well, here it goes: you really really need to access a
    4byte aligned int64_t if you are writing a Java byte code
    interpreter. The Java stack consists of 4 byte words, and a 64bit integer
    spans across 2 words of the stack. There are no alignment guarantees.
    One could alter the Java byte code to have double values only in stack
    cells 2i and 2i+1 and never in stack cells 2i-1 and 2i. I'm currently
    not planning to do this.
    While newer arm CPUs support non-aligned doubleword access, older ones
    don't:
    http://infocenter.arm.com/help/topic/com.arm.doc.dui0068b/Chdggchb.html
    (that's not the best link, but I can't find the updated document right now).

    Also, gcc uses the ldrd instruction to load an int64_t when the target
    architecture supports it. (Depending on gcc's mood, gcc may also use one
    ldmia or two ldr instructions.)
    That's what I thought, but ARM's documentation proved me wrong. Take a
    look at the Architecture Reference Manual for ARMv5/v6. It states that
    prior to ARMv6 the LDRD instruction requires 8 byte alignment.
    This may be related to unaligned 64bit transfers across cache lines.
    Well, I also need to access misaligned doubles ;-)
    Yes, starting with ARMv6, LDRD can handle 4byte aligned addresses.


    Regards,
    Sven
     
    Sven Köhler, Sep 11, 2013
    #14
  15. Sven Köhler

    James Kuyper Guest

    I assume that there's a "does not" missing between "C" and "say" in that
    sentence?

    If so, that's incorrect - the C standard has a great deal to say about
    alignment. However, you have not yet described the problem in a way that
    makes what C says about alignment a problem.
    It's worse than that; on implementations where such alignment
    restrictions exist, it's not even possible to create such a mis-aligned
    pointer value with defined behavior. It's the creation of such pointer
    values that you should be worried about, not dereferencing them.
    I think that I understand what you're probably concerned about. You're
    assuming a conventional system with CHAR_BIT==8 (I mention this only for
    completeness). You're worried about the possibility that, in terms of
    C2011, _Alignof(int64_t)==8. You say you have a 4-byte aligned int64_t.
    There's no way to create an int64_t object whose alignment is less
    strict than _Alignof(int64_t) using strictly conforming code. Therefore,
    you're either talking about non-strictly conforming code, or you're not
    describing it correctly. Either one is possible, but I suspect it's the
    latter. I think that what you have is a pointer to a block of memory,
    which is only aligned to 4 bytes, which contains the same bytes that
    would represent an int64_t value if those bytes were correctly aligned
    and accessed through an lvalue of that type.

    If that's the case, then Eric Sosman's latest suggestion is the
    maximally portable way to do what you want, but it's probably not as
    efficient as you'd like. However, David Brown's latest suggestion is
    almost as portable, so long as you replace the struct with a two-element
    array of uint32_t. You should also declare it "static inline", as he
    mentioned in an earlier message.
    In principle, there need not be any uint32_t type, which is why David's
    suggestion is less portable than Eric's. However, a conforming
    implementation of C which does support int64_t, and where
    sizeof(int64_t) > 1, is almost certain to support uint32_t as well. On
    such a system, a good compiler is likely to translate fetch64() into
    machine code that you'll find acceptably efficient.
     
    James Kuyper, Sep 11, 2013
    #15
  16. Sven Köhler

    Eric Sosman Guest

    <topicality level="marginal" trend="diminishing">

    A Java Virtual Machine Stack holds "frames," one per method
    (et cetera) activation (JVM Spec 2.5.2, 2.6). Java does *not*
    require an 8-byte value to occupy two 4-byte stack slots; on the
    contrary, see JVMS 2.6.2:

    "Each entry on the operand stack can hold a value of any
    Java Virtual Machine type, including a value of type long
    or type double."

    and

    "It is not possible, for example, to push two int values
    and subsequently treat them as a long or to push two float
    values and subsequently add them with an iadd instruction."

    There *is* this confusing notion of "stack depth" (ibid.),
    where 8-byte long and double count two units while a 4-byte int
    counts just one. But this has nothing to do with the size of
    the stacked data! Proof: On a 64-bit JVM with 64-bit object
    references, pushing a reference onto the stack takes *one* unit!
    My guess is that the 1-vs-2 stuff is a holdover from some early
    iteration of Java's design; there are hints of the same confusion
    in the class file format, too:

    "In retrospect, making 8-byte constants take two constant
    pool entries was a poor choice." -- JVMS 4.4.5
    I doubt that altering the byte code is necessary (and may
    not be feasible, in light of some of the instrumentation API's).
    During class file verification (JVMS 4.10) you will discover the
    stack "depth" as of each push and pop, and will know the type of
    every stacked operand at every point. It seems to me you could
    use that information to decide how to push and pop each value,
    either by knowing (statically!) whether the access is going to
    be aligned or misaligned, or by pushing and popping a "spacer
    word" so the data accesses are always aligned (remember, static
    analysis tells you whether any particular access needs a spacer).
    The "stack depth" declared in a class file is only tenuously
    related to the amount of memory the JVM will actually use.

    </topicality>

    You may get better ideas about JVM implementation from Java
    forums than you'll get in a forum about the language in which
    you choose to write your JVM. comp.lang.java.programmer may not
    be the place where you'll get those ideas, but somebody there is
    sure to be able to give you a link or two.
     
    Eric Sosman, Sep 11, 2013
    #16
  17. Sven Köhler

    Sven Köhler Guest

    On 12.09.2013 01:16, Eric Sosman wrote:
    If you had to guess, would you say that an implementation of a byte-code
    interpreter would implement the notion of entries as introduced above?
    It would probably exploit the following:
    It gives the implementation a great deal of freedom. For example, you
    could reserve 64 bits for each value being pushed - even if it was a
    32bit int.
    But you could also reserve only 32bit for an int and 64bit for a long.
    And that would make very much sense. Especially if you need to save
    space. Now if the byte code pushes three ints and one long onto the
    stack, then the long will be misaligned, right?
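
    To make that scenario concrete, here is a rough sketch of an operand
    stack built from 32bit slots. All names are hypothetical, there is no
    overflow or underflow checking, and memcpy is used for the 64bit
    accesses so that the sketch itself stays well-defined.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical operand stack made of 32-bit slots. */
typedef struct {
    uint32_t slots[16];
    size_t   top;        /* index of the next free slot */
} op_stack;

static void push_int(op_stack *s, int32_t v)
{
    memcpy(&s->slots[s->top], &v, sizeof v);
    s->top += 1;
}

static void push_long(op_stack *s, int64_t v)
{
    memcpy(&s->slots[s->top], &v, sizeof v);   /* occupies two slots */
    s->top += 2;
}

static int64_t pop_long(op_stack *s)
{
    int64_t v;
    s->top -= 2;
    memcpy(&v, &s->slots[s->top], sizeof v);
    return v;
}
```

    After three push_int calls the long starts at slot 3, i.e. 12 bytes
    into the slot array, which is a multiple of 4 but not of 8.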

    I guess, you just wanted to point out that I should not have said that
    the Java stack consists of 4 byte words. Ah well, yes. It's only an
    aspect of the implementation I'm talking about. However, it's the
    obvious way of implementing the stack, unless you have a lot of space to
    waste.
    Or is this evidence that they didn't think about 64bit references back then?
    *eg* :)
    Let me guess, these 8-byte constants can be misaligned too?
    The spacer word could also be added by modifying the byte code.
    But this wasn't about "how to implement a JVM" to begin with.
    I'd even write inline assembly. But knowing that people take the source
    and port it to other platforms, I wanted to be nice and write something
    more portable.


    Regards,
    Sven
     
    Sven Köhler, Sep 12, 2013
    #17
  18. Sven Köhler

    Sven Köhler Guest

    On 12.09.2013 00:59, James Kuyper wrote:
    I hate when that happens. Yes, a "does not" was intended.
    So casting int32_t* to int64_t* is undefined behaviour?
    That is correct.
    Didn't I mention the union technique in my first posting? Well, I didn't
    know that an array is somehow better than a struct. A padding in the
    struct would only be added if uint32_t would require an alignment larger
    than the size of uint32_t. And if an uint32_t required an alignment
    larger than its size, why would the compiler not add the padding in the
    array? Then the array would contain misaligned elements. That would be
    fun, I guess.


    Regards,
    Sven
     
    Sven Köhler, Sep 12, 2013
    #18
  19. Sven Köhler

    Sven Köhler Guest

    On 11.09.2013 23:07, David Brown wrote:
    Yes, sometimes gcc replaces memcpy of fixed size with loads and stores.
    I wish I could say that it happens all the time. So far, I have always
    encountered a situation where gcc "forgets" to substitute memcpy. clang
    is much better at it, but llvm created invalid assembly code for my target.
    I changed the void* parameters to int32_t* and clang happily replaced
    memcpy with 4byte loads and stores.
    I wish it was that simple and gcc would be more reliable.
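
    For reference, the int32_t* variant mentioned above might look roughly
    like this. It is a sketch; the _w suffix on the names is made up here.
    The pointer type tells the compiler that the address is at least 4byte
    aligned, which is what allows memcpy to be lowered to word accesses.

```c
#include <stdint.h>
#include <string.h>

/* Same idea as fetch64/store64, but the parameter type carries the
   information that the address is suitably aligned for int32_t. */
static inline int64_t fetch64_w(const int32_t *ptr)
{
    int64_t value;
    memcpy(&value, ptr, sizeof value);
    return value;
}

static inline void store64_w(int32_t *ptr, int64_t value)
{
    memcpy(ptr, &value, sizeof value);
}
```

    Whether the memcpy call actually disappears still depends on the
    compiler, the optimization level and the target.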


    Regards,
    Sven
     
    Sven Köhler, Sep 12, 2013
    #19
  20. Sven Köhler

    James Kuyper Guest

    On 09/11/2013 07:47 PM, Sven Köhler wrote:
    ....
    Yes, if int64_t is more strictly aligned than int32_t. (6.3.2.3p7)

    ....
    The way arrays work implies that sizeof(T) must be an integer multiple
    of _Alignof(T), so you don't need to worry about that possibility.
    However, the standard says nothing to prohibit unnecessary padding
    between members of a struct; it does prohibit padding between elements
    of an array. Because the padding is in fact unnecessary, you're pretty
    unlikely to run into an implementation where the difference matters. But
    using an array is no more complicated than using a struct (in fact, it's
    marginally simpler), so why not use the approach that is also safer,
    even if only by an infinitesimal amount?
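
    Both points can be written down in a few lines. This is a sketch
    assuming a C11 compiler; the helper name is hypothetical, and the
    pointer-to-integer conversion it relies on is implementation-defined,
    although it behaves as expected on common flat-address platforms.

```c
#include <stdint.h>

/* sizeof(T) is always a multiple of _Alignof(T); otherwise the second
   element of a T[2] array would be misaligned. */
_Static_assert(sizeof(int64_t) % _Alignof(int64_t) == 0, "int64_t");
_Static_assert(sizeof(uint32_t) % _Alignof(uint32_t) == 0, "uint32_t");

/* Hypothetical helper: nonzero if p is suitably aligned for int64_t,
   i.e. if converting p to int64_t* would not run into 6.3.2.3p7. */
static inline int is_aligned_for_int64(const void *p)
{
    return (uintptr_t)p % _Alignof(int64_t) == 0;
}
```

    A check like this has to be done before the conversion to int64_t*,
    since the conversion itself is already undefined for a misaligned
    address.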
     
    James Kuyper, Sep 12, 2013
    #20
