Padding involved

Discussion in 'C Programming' started by anish singh, Mar 7, 2014.

  1. anish singh

    Eric Sosman Guest

    *Much* better!
    There are different answers at different levels.

    At the hardware level, alignment and the padding to attain
    it are artifacts of the memory subsystem: Not just the busses,
    but the address translation circuitry, the various levels of
    cache, the inter-processor data-consistency protocols, and so
    on. This collection of components may find it easier or cheaper
    or faster to access particular types on a restricted set of
    addresses: For example, it might be advantageous to position a
    `double' object on an eight-byte boundary or an `int' on a
    four-byte boundary. (I don't think register widths have much
    to do with this -- but I'm no hardware designer, so don't treat
    my "I don't think" as Gospel.)

    But that's not the whole story. A host seldom consists only
    of the hardware; there's an operating system to think about. The
    O/S usually specifies an "application binary interface" or ABI
    that describes how data should be arranged when invoking system
    services or when dissecting their results. For example, even if
    the hardware is able to cope with an `int' at an arbitrary address,
    the ABI might insist on four-byte alignment (one possible reason
    to do so could be to simplify the "Does the caller really have
    access to all the bytes implied by this pointer?" test, by not
    having to worry about crossing page boundaries). Then again, an
    ABI might choose *not* to cater to all the hardware's whims: For
    example, a widely-used ABI calls for four-byte alignment of all
    stack-allocated data, even `double' objects that would be
    significantly faster if allocated on eight-byte boundaries.

    But even that isn't the whole story. Eventually, it's the
    developers of the compiler itself who decide what policies it will
    enforce. One developer might say "Speed is important: We'll put
    every object on an address that makes accesses the very fastest
    they can possibly be." Another might say "Memory bloat should be
    avoided: We'll pack the objects as tightly as we can while still
    maintaining reasonable (not optimal) speed." Yet another might
    say "This is an embedded machine with only 4KB of RAM, so memory
    is an extremely scarce resource and we'll pack everything down to
    the absolute minimum." In the end, that is, it's a human choice.
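
    Whichever of those layers made the decision, the result can be
    inspected from inside a program. The following is a minimal sketch,
    not part of the original post: struct sample and the three helper
    functions are hypothetical names chosen for illustration, and the
    numbers reported depend entirely on the compiler and ABI in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical example type; the member names are illustrative only. */
struct sample {
    char   tag;    /* 1 byte by definition */
    double value;  /* often placed on an 8-byte boundary */
    int    count;  /* often placed on a 4-byte boundary */
};

/* The size is always padded up to a multiple of the alignment. */
static_assert(sizeof(struct sample) % _Alignof(struct sample) == 0,
              "struct size must be a multiple of its alignment");

/* Bytes of padding the compiler inserted between 'tag' and 'value'. */
size_t padding_after_tag(void)
{
    return offsetof(struct sample, value)
         - (offsetof(struct sample, tag) + sizeof(char));
}

/* Total padding in the struct: its size minus the sum of member sizes. */
size_t total_padding(void)
{
    return sizeof(struct sample)
         - (sizeof(char) + sizeof(double) + sizeof(int));
}

/* Print the layout this particular compiler and ABI chose. */
void print_layout(void)
{
    printf("sizeof = %zu, _Alignof = %zu\n",
           sizeof(struct sample), _Alignof(struct sample));
    printf("offsets: tag %zu, value %zu, count %zu\n",
           offsetof(struct sample, tag),
           offsetof(struct sample, value),
           offsetof(struct sample, count));
    printf("padding after tag = %zu, total padding = %zu\n",
           padding_after_tag(), total_padding());
}
```

    Calling print_layout() from main() on two different compilers, or
    on one compiler with different packing options, is a quick way to
    see the "human choice" in action.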
     
    Eric Sosman, Mar 8, 2014
    #21

  2. anish singh

    BartC Guest

    I mentioned that in my post.

    But you didn't pick on the dozen or so non-uses of "%zu" in the rest of my
    post.

    However I dislike having to remember and use all these weird and wonderful
    format specifiers (a 'zoo' of them almost!). If the right one is that
    important, then the compiler should tell me about it (but only lccwin32
    seems to do so at default warning levels).

    Ideally it should figure it out for itself (a few years ago, I proposed a %?
    specifier for that purpose, for use in the 99.9% of cases where the format
    string was a constant). Because managing these format strings can be a lot
    of work (you change one type from int to long long, then you have to change
    hundreds of %d to %lld or %x to %llx).

    (FWIW, not using %zu doesn't seem to matter on my machine; when I'm
    compiling for 64-bits and a size_t value occupies 8 bytes, while int is 4
    bytes, then presumably the parameter stack is also 64-bit aligned so use of
    %d seems to have no ill-effects. Tested with 3 x64 compilers.)
     
    BartC, Mar 8, 2014
    #22

  3. You are commenting on a post to someone else. That post had a single
    use of %d. What else was there to point out?
    On my machine gcc does too, but in any case it's wise to choose the
    warnings you care about. With gcc, I ask for almost everything and turn
    off the couple that I find annoying.

    <snip>
     
    Ben Bacarisse, Mar 8, 2014
    #23
  4. anish singh

    Joe Pfeiffer Guest

    Ah, yes, forgot that. Yep, might well be twelve.
    No reason at all. My point exactly.
     
    Joe Pfeiffer, Mar 8, 2014
    #24
  5. The compiler isn't allowed to alter the order of the members. So b must
    come after c in memory and d must come last. It also must place the first
    member right at the top of the structure. So struct abbcd x; char *ptr =
    (char *)&x; must give you the address of c.

    But it can insert other padding elements at will. Register size isn't a good
    guide, because often processors allow half word access, even have special half
    word registers, but make it less efficient than full-word access.
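
    Both guarantees can be checked with offsetof. This is a sketch under
    an assumption: the declaration of struct abbcd is not shown in this
    part of the thread, so struct example below is a hypothetical
    stand-in with members in the order c, b, d.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the struct discussed in the thread. */
struct example {
    char  c;
    short b;
    int   d;
};

/* There is never padding before the first member. */
static_assert(offsetof(struct example, c) == 0,
              "first member must sit at offset 0");

/* Members appear in declaration order, so their offsets increase. */
int offsets_increase(void)
{
    return offsetof(struct example, c) < offsetof(struct example, b)
        && offsetof(struct example, b) < offsetof(struct example, d);
}

/* A pointer to the struct, suitably converted, points at its first member. */
int first_member_at_start(const struct example *x)
{
    return (const char *)x == (const char *)&x->c;
}
```

    Both functions return 1 on any conforming compiler; only the gaps
    between the members are left to the implementation.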
     
    Malcolm McLean, Mar 8, 2014
    #25
  6. Sorry I missed that. But you wrote that it "will vary between
    compilers". Not all compilers necessarily have a way to specify
    packing of structure members.
    I was replying to someone else. I don't point out every error in every
    post.
    gcc warns about mismatches between format strings and arguments,
    at least in many cases. But warning about such mismatches in all cases
    is not possible. Format strings are interpreted at run time. A format
    string is commonly a string literal, but needn't be. You just have to
    develop the habit of using the right format yourself if you want to
    avoid undefined behavior.
    You can always convert the argument to a known type. For example, if u
    is of some unsigned type, but you're not sure which one, you can do:

    printf("%llu\n", (unsigned long long)u);

    Or if you happen to know that the value of u is fairly small (say,
    because it's the size of a structure that you know is smaller than 32
    kbytes), you can just convert to int:

    printf("%d\n", (int)sizeof whatever);
    I see the same behavior. I wouldn't be surprised to see it fail on a
    big-endian system. (Actually I just tried it and it "worked"; I'm not
    sure why.)

    But by using the correct format, perhaps with a cast, I don't have to
    worry about it; I know it will work.
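
    The two safe options side by side, as a small sketch. The function
    names format_size and format_size_cast are made up for this example;
    snprintf is used instead of printf only so the result can be
    compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Option 1: use the conversion specifier that matches size_t (C99 "%zu"). */
int format_size(char *buf, size_t buflen, size_t n)
{
    return snprintf(buf, buflen, "%zu", n);
}

/* Option 2: convert the argument to a type whose specifier is known. */
int format_size_cast(char *buf, size_t buflen, size_t n)
{
    return snprintf(buf, buflen, "%llu", (unsigned long long)n);
}
```

    Either way the argument type and the format agree, so the behavior
    is defined regardless of how wide size_t and int happen to be.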
     
    Keith Thompson, Mar 8, 2014
    #26
  7. No, the reverse. The offset must divide the size, or the size must be
    divisible by the offset. E.g. struct { long a; float b; } if both long
    and float are 4 bytes (as is common, though not universal and not
    required by the Standard) then the struct has size at least 8 but is
    unlikely to have alignment more than 4.
    If it contains an element with >alignment< N then the struct or union
    has alignment N or a multiple of N, and padding if necessary to make
    the size a multiple of the alignment.

    Note that alignment can be less than size, most commonly on systems
    that can align everything to 1, but I've used a system where int is 4
    bytes and aligned to 2. Alignment cannot be more than size.
    Given that b has both size AND alignment 8, yes. Which AIUI is true
    for x86-64, but not in all architectures.
    static and auto objects must be sufficiently aligned for their actual
    type, but not necessarily for 'anything'. malloc() does have to align
    for 'anything' because it doesn't know what the actual type will be.
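
    The rules above can be written down as compile-time checks. A
    sketch, assuming a C11 compiler: struct pair mirrors the long/float
    example, and malloc_is_max_aligned is a hypothetical helper that
    tests one allocation against max_align_t, the C11 type whose
    alignment stands for 'anything' with a fundamental alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct pair { long a; float b; };   /* the struct from the example above */

/* The struct is at least as strictly aligned as each of its members... */
static_assert(_Alignof(struct pair) >= _Alignof(long),  "covers long");
static_assert(_Alignof(struct pair) >= _Alignof(float), "covers float");

/* ...and its size is padded up to a multiple of its alignment. */
static_assert(sizeof(struct pair) % _Alignof(struct pair) == 0,
              "size is a multiple of alignment");

/* Return 1 if a malloc() result of the given size is aligned for
   max_align_t, 0 if it is not or if the allocation failed. */
int malloc_is_max_aligned(size_t size)
{
    void *p = malloc(size);
    int ok;

    if (p == NULL)
        return 0;
    ok = ((uintptr_t)p % _Alignof(max_align_t)) == 0;
    free(p);
    return ok;
}
```

    A static or automatic struct pair only has to satisfy
    _Alignof(struct pair); the stronger max_align_t guarantee belongs to
    the allocator.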
     
    David Thompson, Mar 28, 2014
    #27
  8. (snip)
    As I understand it, for some systems an alignment greater than
    size is necessary for optimal use. Specifically, some of the SSE
    instructions, as I understand it, will process pairs of doubles
    aligned to 16 byte boundaries. I believe the same holds for other
    combinations, such as four floats or four ints.

    Also, GPUs might have different alignment requirements than
    traditional processors.

    -- glen
     
    glen herrmannsfeldt, Mar 28, 2014
    #28
  9. anish singh

    James Kuyper Guest

    In C, every object of a given type must be allocated at a location which
    is correctly aligned for its type. In an array, that means that the
    first element of the array and the second element must both be correctly
    aligned - but those two positions are also required to be separated by
    exactly sizeof(type) bytes. That's not possible unless sizeof(type) is
    an integer multiple of _Alignof(type).

    On the platform you describe, must every double be aligned on a 16 byte
    address, so the SSE instructions can always be used? Then that means
    that the SSE instructions will never actually operate on two 8-byte
    doubles at the same time; at most, they will operate on one 8-byte
    double and one 8-byte piece of padding. In that case, sizeof(double)
    must include the padding in order to implement arrays of double
    correctly, so sizeof(double)==16, not 8.

    Alternatively, is it perfectly feasible to have one double object
    aligned to a 16 byte address and the next double object aligned 8 bytes
    later, allowing both to be processed by the same SSE instruction? If so,
    then _Alignof(double) == 8, not 16.
    Having different alignment requirements is not, in itself, a problem for
    C, for which those requirements are implementation-defined. It's only
    inconsistencies of those requirements with other things such as
    sizeof(type) that would be a problem.
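
    The array argument at the start of this post can be checked
    directly. A minimal sketch; element_stride is a hypothetical helper
    name.

```c
#include <assert.h>
#include <stddef.h>

/* If every element of an array must be correctly aligned, the element
   size has to be a whole multiple of the element alignment. */
static_assert(sizeof(double) % _Alignof(double) == 0,
              "sizeof must be a multiple of _Alignof");

/* Distance in bytes between two adjacent array elements. */
ptrdiff_t element_stride(void)
{
    double a[2];

    return (char *)&a[1] - (char *)&a[0];
}
```

    element_stride() always equals sizeof(double); there is nowhere for
    extra alignment padding to hide between elements.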
     
    James Kuyper, Mar 28, 2014
    #29
  10. anish singh

    Eric Sosman Guest

    From C's standpoint, alignment can never exceed size: Arrays
    would not work if it did.

    It can still be true -- is true -- that the host system may
    require or benefit from alignments that are unknown to C. For
    example, O/S interfaces like Unix' mmap() require alignment on
    memory pages. But "memory page" is not a C type, nor even a C
    concept, and there's no direct way for C to control memory page
    alignment. (Even with C11's _Alignas keyword, there's no way C
    can discover the memory page size unaided -- and on systems that
    support multiple page sizes simultaneously, the situation gets
    even thornier.)
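
    For completeness, here is how a program usually gets at the page
    size with help from outside the language. This is a sketch with
    assumptions: it needs POSIX (sysconf and <unistd.h> are not Standard
    C), it assumes the implementation accepts the page size as an
    alignment for C11's aligned_alloc, and page_size and alloc_pages are
    hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>   /* POSIX, not Standard C */

/* Ask the OS for its page size; returns 0 if it cannot be determined. */
size_t page_size(void)
{
    long n = sysconf(_SC_PAGESIZE);

    return n > 0 ? (size_t)n : 0;
}

/* Allocate 'pages' whole pages starting on a page boundary, or return
   NULL. Release the result with free(). Overflow of pages * page size
   is not checked in this sketch. */
void *alloc_pages(size_t pages)
{
    size_t ps = page_size();

    if (ps == 0 || pages == 0)
        return NULL;
    return aligned_alloc(ps, pages * ps);
}
```

    Note that the page size is a run-time value here, which is exactly
    why _Alignas, which needs a constant expression, cannot express it.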
     
    Eric Sosman, Mar 28, 2014
    #30
  11. In other words, although C permits padding between struct members,
    it does not permit padding between array elements.

    If there's a type that's "naturally" 12 bytes long but that requires
    8-byte alignment, that means that the compiler must treat that
    type as having a size of (at least) 16 bytes, with 4 bytes not
    contributing to the value. This is necessary because of the way C
    defines array indexing.
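
    A struct makes a convenient stand-in for such a type. The sketch
    below assumes the common case of an 8-byte double and a 4-byte int;
    struct twelve, tail_padding and array_stride are hypothetical names.
    Where double needs 8-byte alignment the size typically comes out as
    16 with 4 bytes of tail padding, but an ABI that aligns double to 4
    may report 12 and 0.

```c
#include <assert.h>
#include <stddef.h>

/* Members add up to 12 bytes where double is 8 bytes and int is 4. */
struct twelve {
    double d;
    int    i;
};

/* Whatever the size turns out to be, it is a multiple of the alignment. */
static_assert(sizeof(struct twelve) % _Alignof(struct twelve) == 0,
              "size is a multiple of alignment");

/* Bytes after the last member that do not contribute to the value. */
size_t tail_padding(void)
{
    return sizeof(struct twelve)
         - (offsetof(struct twelve, i) + sizeof(int));
}

/* Adjacent array elements are exactly sizeof(struct twelve) apart. */
ptrdiff_t array_stride(void)
{
    struct twelve a[2];

    return (char *)&a[1] - (char *)&a[0];
}
```

    The padding lives inside each element, at its end, which is how the
    array itself can stay free of padding.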
     
    Keith Thompson, Mar 28, 2014
    #31
  12. anish singh

    Kaz Kylheku Guest

    C does not support this. C compilers can support extra alignment for efficient
    access in the way local variables are laid out and perhaps struct members.

    No such thing will be supported for arrays and pointers.

    If a greater alignment than size is required for correctness, then misaligned
    access for pointers and arrays must be implemented.

    E.g. if a short is two bytes, but must be aligned on a four-byte boundary, then
    code generated for array indexing and pointer dereferencing has to somehow
    handle the accesses at odd indices. Perhaps by rounding down to an address
    divisible by four, loading a four byte word, and then shifting down
    the half-word.
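
    That load-and-shift sequence looks roughly like this when written
    out by hand. It is only a sketch of what such a compiler might emit:
    load_halfword and store_halfword are hypothetical names, memory is
    modelled as an array of 32-bit words, and placing even indices in
    the low 16 bits of each word is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read half-word number i from memory that can only be accessed as
   aligned 32-bit words. Assumption: even indices occupy the low half. */
uint16_t load_halfword(const uint32_t *words, size_t i)
{
    uint32_t word  = words[i / 2];              /* round down to the word */
    unsigned shift = (unsigned)(i % 2) * 16u;   /* 0 for even, 16 for odd */

    return (uint16_t)(word >> shift);           /* shift down, drop the rest */
}

/* Write half-word number i with a read-modify-write of the whole word. */
void store_halfword(uint32_t *words, size_t i, uint16_t value)
{
    unsigned shift = (unsigned)(i % 2) * 16u;
    uint32_t mask  = (uint32_t)0xFFFFu << shift;

    words[i / 2] = (words[i / 2] & ~mask) | ((uint32_t)value << shift);
}
```

    The store needs a read-modify-write cycle, which is one reason such
    accesses are slower than on a machine with native half-word
    addressing.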

    The guys who designed C were no strangers to machines that didn't provide
    access to certain small types such as characters.

    In fact, the B language, predecessor to C, handled strings similarly to C:
    characters were packed into arrays of cells, which had to be unpacked and
    re-packed by routines.

    http://cm.bell-labs.com/who/dmr/chist.html

    "[B's] character-handling mechanisms, inherited with few changes from BCPL,
    were clumsy: using library procedures to spread packed strings into
    individual cells and then repack, or to access and replace individual
    characters, began to feel awkward, even silly, on a byte-oriented machine. "

    So at that point Ritchie went for an addressable character type.
    That can basically be seen as the point of departure at which the design
    of C shifted toward "every type, down to the character/byte, is accessible at
    an address that is no more strictly aligned than a multiple of its size".
     
    Kaz Kylheku, Mar 28, 2014
    #32
  13. Such instructions operate not on a single object but on a group of
    objects, and it is the _group_ that must have greater alignment;
    however, that is invisible at the C level as long as you're using ints,
    floats, etc. If the compiler wants to auto-vectorize access to an array
    of such objects, it is responsible for generating code to handle any
    potential alignment issues at the front--and dealing with remainders at
    the end.
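
    The shape of that generated code (head, aligned body, tail) can be
    sketched in plain C. This is an illustration of the control flow
    only, not what any particular compiler emits: sum_doubles is a
    hypothetical function, and the "pair" step is two scalar adds where
    a vectorizer would use one 16-byte load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum n doubles, treating the middle of the array as 16-byte-aligned
   pairs. */
double sum_doubles(const double *a, size_t n)
{
    double total = 0.0;
    size_t i = 0;

    /* Head: step one element at a time until a[i] sits on a 16-byte
       boundary. With 8-byte-aligned doubles this is at most one step. */
    while (i < n && (uintptr_t)&a[i] % 16 != 0)
        total += a[i++];

    /* Body: whole pairs, each starting on a 16-byte boundary. */
    for (; i + 1 < n; i += 2)
        total += a[i] + a[i + 1];

    /* Tail: a possible single leftover element. */
    if (i < n)
        total += a[i];

    return total;
}
```

    The caller sees an ordinary function taking a pointer to double;
    the stricter alignment is a private matter of the loop.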

    One alternative is a compiler extension to create vector types, such as
    GCC's vector_size attribute; they are similar to (short) arrays but
    always have the correct alignment for vector instructions, unlike normal
    arrays, and that carries through to arrays of vectors. For instance:

    typedef int v4si __attribute__ ((vector_size (16)));
    v4si a = {1,2,3,4}; // always aligned
    v4si b[2] = {{1,2,3,4},{5,6,7,8}}; // always aligned
    int c[4] = {1,2,3,4}; // maybe unaligned

    S
     
    Stephen Sprunk, Mar 28, 2014
    #33
  14. (snip, someone wrote)

    (then I wrote)
    Pairs of doubles are aligned on 16 byte boundaries. If you have an
    array of even length, you could process them two at a time if
    appropriately aligned. You can then, for example, add a pair of
    doubles to another pair in one operation.
    In some cases, the compiler might be able to generate appropriate
    code, for example adding complex data. In others, one might want
    to call, as an example, an FFT routine written in assembler that
    could optimally use the SSE instructions on pairs of doubles.

    In the struct case, one might have a struct with a pair of doubles
    (or one complex double) along with some other types, and want the
    pair of doubles appropriately aligned, even in an array of such.
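
    With C11 that struct case can be written with _Alignas on the
    member. A sketch under an assumption: 16 must be a valid alignment
    on the implementation (it is on mainstream x86 compilers, but the
    Standard does not promise it). struct record and all_pairs_aligned
    are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; assumes 16 is a valid alignment here. */
struct record {
    int id;
    _Alignas(16) double pair[2];   /* pair starts on a 16-byte boundary */
    char flag;
};

/* The member's requirement propagates to the struct as a whole, and the
   size is padded so that it also holds for every array element. */
static_assert(_Alignof(struct record) >= 16, "struct inherits alignment");
static_assert(offsetof(struct record, pair) % 16 == 0, "member offset");
static_assert(sizeof(struct record) % 16 == 0, "array stride");

/* Return 1 if the pair in each of the n records is 16-byte aligned. */
int all_pairs_aligned(const struct record *r, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if ((uintptr_t)r[i].pair % 16 != 0)
            return 0;
    return 1;
}
```

    The price is padding: the struct grows well beyond the sum of its
    members so that the pair lands on a boundary in every element.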

    -- glen
     
    glen herrmannsfeldt, Mar 28, 2014
    #34
  15. Standard C has no type "pair of doubles".

    Standard C guarantees that _Alignof(double) <= sizeof(double). On x86,
    we know that sizeof(double) == 8, so _Alignof(double) == 16 is not allowed.
    Standard C only guarantees that your array of doubles will have the same
    alignment as one double, i.e. 8 bytes on x86.

    If you want a guarantee that your array has 16-byte alignment, then you
    must either use/create another type with 16-byte alignment as your array
    element or use an extension to tell the compiler you want stricter
    alignment for a double (or array of doubles) than Standard C requires.

    Note that Standard C doesn't guarantee the existence of _any_ type with
    16-byte alignment or the ability to create such, so the former may not
    be possible, and the latter is inherently outside the Standard.
    If the subroutine is written in assembler, then obviously Standard C
    says nothing about what it can or can't do, nor does Standard C
    guarantee that a pointer-to-double you pass to it will be aligned as
    expected.

    If the subroutine were in Standard C, the compiler must properly handle
    the 8-byte aligned case. However, there is nothing stopping it from
    _also_ detecting the 16-byte aligned case and then using more efficient
    vector instructions.
    Assuming your struct only contains doubles, then the alignment will be
    the same as for one double, like in the array case above.

    S
     
    Stephen Sprunk, Mar 28, 2014
    #35
  16. anish singh

    Eric Sosman Guest

    I'm with you thus far, but ...
    ... are you sure of this last bit? It seems to me that
    the compiler is within its rights to require stricter alignment
    for a struct or union than for any of the individual elements.
    (It cannot use looser alignment, of course.) Do you have C&V
    to the contrary?
     
    Eric Sosman, Mar 28, 2014
    #36
  17. anish singh

    James Kuyper Guest

    If two doubles can be 8 bytes apart, then _Alignof(double)<=8. Since
    you'll probably want to have at least one double in any group of two or
    more doubles to be aligned on a 16 byte boundary, that suggests that an
    implementation for that platform should choose _Alignof(double)==8.
    That is an optimization that a compiler is allowed to take advantage of,
    when it can - but if it doesn't prevent the existence of doubles
    starting on addresses that are not multiples of 16, then it does NOT
    mean that _Alignof(double) == 16.
    That would suggest that there is a strong incentive for _Alignof(double
    _Complex) == 16, but that's a different issue.
     
    James Kuyper, Mar 29, 2014
    #37
  18. anish singh

    James Kuyper Guest

    Actually, that's precisely what double[2] is; and _Alignof(double[2])
    probably would be 16 on such a platform.
    While that's the only guarantee, an implementation is free to impose
    stricter alignment requirements on arrays of a type than on the type itself.
    Or you could use _Alignas(16), which is not an extension, but is new in
    C2011.
    That is true: 16 is not required to be a valid fundamental or extended
    alignment, and it is a constraint violation to specify _Alignas(n) when
    n is neither 0 nor a valid alignment value (6.7.5p3). That's why it's
    generally safer to specify _Alignas(type) rather than
    _Alignas(constant_expression).
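
    Both forms side by side, as a sketch. The object names pair and
    storage and the helper is_aligned are hypothetical; the first
    declaration assumes that 16 is a valid alignment on the
    implementation, which is the very thing that is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16 is a valid alignment here. If it is not, this line is
   a constraint violation and the compiler must diagnose it. */
_Alignas(16) double pair[2];

/* Safer form: borrow the alignment of a type that certainly exists. */
_Alignas(long double) unsigned char storage[sizeof(long double)];

/* Return 1 if p is a multiple of n when viewed as an integer. */
int is_aligned(const void *p, size_t n)
{
    return (uintptr_t)p % n == 0;
}
```

    _Alignas can only make a requirement stricter, never weaker, so the
    second form is valid for a character array on any implementation.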
     
    James Kuyper, Mar 29, 2014
    #38
  19. With regard to the complex type that glen mentioned:

    N1570 6.2.5p13:
    "Each complex type has the same representation and alignment
    requirements as an array type containing exactly two elements of the
    corresponding real type;"

    For structs, we know a struct's alignment has to be a positive multiple
    of the largest member's alignment, and we also know it can't be larger
    than the size of the struct, so the only possibilities here are 8 or 16.
    I don't know why an implementation might choose the latter, but I can't
    find anything in N1570 that says it isn't allowed to. So, correction noted.
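
    Those constraints can be spelled out as checks. A sketch, assuming
    the implementation supports complex types (they are optional in
    C11); struct two_doubles and the two helper functions are
    hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Follows from the representation requirement in 6.2.5p13. */
static_assert(sizeof(double _Complex) == sizeof(double[2]),
              "complex has the representation of a two-element array");

/* Hypothetical struct containing only doubles. */
struct two_doubles {
    double re;
    double im;
};

/* Alignment the implementation chose for the complex type. */
size_t complex_alignment(void)
{
    return _Alignof(double _Complex);
}

/* Alignment the implementation chose for the struct. */
size_t struct_alignment(void)
{
    return _Alignof(struct two_doubles);
}
```

    On an implementation with 8-byte doubles and no padding in the
    struct, struct_alignment() can only be 8 or 16; which one is
    returned is the implementation's choice.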

    S
     
    Stephen Sprunk, Mar 29, 2014
    #39
  20. (then I wrote)
    Standard C doesn't care about speed at all, but users often do.

    Note that x86, back to the 8086, doesn't require alignment, but it
    is often faster if properly aligned. When the 80486 was popular,
    four byte alignment of double was all that was needed.
    (A 32 bit system, with a 32 bit data bus.) C compilers, and more
    important most of the time, malloc() would generate four byte
    alignment.

    For way too long after the pentium became popular, C was still
    generating four byte alignment.
    Note as above, x86 doesn't require 8 byte alignment for doubles,
    so C might as well not do any padding, and malloc() might just
    as well return odd addresses.
    Or give up C and move onto other languages?
    But if you can't reliably generate them, that doesn't help much.
    And if it doesn't?
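
    On the point that x86 tolerates a double at any address: C itself
    still treats a misaligned double access through a pointer as
    undefined, so portable code that has to read a double from an
    arbitrary byte offset usually goes through memcpy. A minimal sketch;
    load_double and store_double are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Read a double stored at an arbitrary byte address, without assuming
   the hardware tolerates misaligned loads. */
double load_double(const unsigned char *bytes)
{
    double d;

    memcpy(&d, bytes, sizeof d);
    return d;
}

/* Write a double to an arbitrary byte address. */
void store_double(unsigned char *bytes, double d)
{
    memcpy(bytes, &d, sizeof d);
}
```

    On hardware that allows misaligned access, compilers commonly turn
    these calls into a single load or store, so the portable form need
    not cost anything.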

    -- glen
     
    glen herrmannsfeldt, Mar 29, 2014
    #40