Struct with unaligned fields

Discussion in 'C Programming' started by James Harris, Aug 22, 2013.

  1. James Harris

    James Harris Guest

    A file allocation table is an on-disk structure that was designed in the
    days before people took much care with field alignment. FAT file systems are
    still prevalent today. I need to write some code to work with such
    structures so have to deal with fields that will not be aligned.

    To illustrate, the FAT header starts like this:

    8-byte BS_OEMName
    2-byte BPB_BytsPerSec
    1-byte BPB_SecPerClus
    2-byte BPB_RsvdSecCnt
    ...

    As you can see, it is impossible to align the last of those fields,
    BPB_RsvdSecCnt. The same is true of some of the fields that follow. (For
    anyone who is interested the full list of fields can be seen in a document
    called fatgen103. Copies are freely available on the internet.)

    I've seen the c faq at http://c-faq.com/struct/padding.html and know that
    there are no ideal solutions. This post is to ask for suggestions as to how
    to address this well and portably.

    I could do something basic and slow but what makes this an interesting
    challenge is that: 1) many of the fields *will* be naturally aligned so
    don't need any special access mechanism and 2) some of the machines the code
    will run on will support unaligned accessed and some will not. It would be
    good to use the same C source to run on any type of machine and get the
    compiler to emit the faster code where it can.

    Anyone already addressed the same sort of issue? I have some ideas but is
    there a standard approach?

    James
     
    James Harris, Aug 22, 2013
    #1
    1. Advertisements

  2. The FAT-12 entries were designed to be easy to address using
    the 8080 or 8086 (or most other 8 bit processor) instructions.

    Not so convenient as bit fields in C.

    -- glen
     
    glen herrmannsfeldt, Aug 22, 2013
    #2
    1. Advertisements

  3. James Harris

    James Kuyper Guest

    If portable is your goal, structures are NOT the right tool to achieve
    it, and that article gives one of main reasons why.

    On the other hand, I find it hard to imaging that a tool which works
    with FATs would need to be very portable. It could only be useful on
    systems which use FATs; I wouldn't expect C implementations targeting
    such systems to have a lot of variation in how they deal with alignment
    issue.
    The new (C2011) _Alignof() field can be used to check the alignment
    requirements of the relevant data types, and you can use it to switch
    between faster code that will be slow, or not work at all if the data is
    misaligned, or slower code that will work whether or not the data is
    misaligned. _Alignof() is, unfortunately, not available for use in #if
    conditions, because types, and therefore alignment, do not exist yet
    during translation phase 4, when #ifs are evaluated.
    It's probably simpler to just write only the slower code; I wouldn't
    recommend assuming that it's a lot slower - test the simpler approach
    first, and bother with optimization only if you find out that it
    actually is significantly slower. A sufficiently smart optimizer might
    even convert the slow code into the equivalent of the fast code, on
    systems where it is in fact faster.
    Yes, I have - by the use of the methods that are probably similar to the
    ones you've already dismissed as "slow".
     
    James Kuyper, Aug 22, 2013
    #3
  4. Well, the portable solution is to do single-byte reads of unaligned
    fields. A sufficiently clever compiler would, assuming the target
    supports it, convert it to a single unaligned read. I'm not sure if any
    existing compilers are that clever, though.

    A common solution is to use a macro for unaligned reads, and then use
    some external knowledge (e.g. autoconf) to determine whether to define
    it as a single unaligned read or as multiple reads with shifting.

    Note that some systems may support unaligned reads but the performance
    penalty is significant enough that shifting is still faster.

    S
     
    Stephen Sprunk, Aug 22, 2013
    #4
  5. James Harris

    Jorgen Grahn Guest

    This comes up often in this group.

    As far as I'm concerned, the standard approach for all binary I/O is
    not to pretend the I/O buffers map to the in-memory representation of
    some C data structure. Treat it as a buffer of N 8-bit octets instead
    if that is what it is:

    // assuming uint8_t is available, and that
    // FAT uses little-endian encoding
    const uint8_t * p = something();
    p += 8;
    unsigned bps = get16_le(p); p += 2;
    unsigned spc = *p++;
    unsigned blahblah = get16_le(p); p += 2;

    You mention not wanting to do something "basic and slow" ... well, the
    above is basic and /should/ be slow -- but (a) is that still true
    today, with caches and stuff, and (b) does it really matter to your
    application?

    /Jorgen
     
    Jorgen Grahn, Aug 22, 2013
    #5
  6. James Harris

    James Harris Guest

    You mean that there's no guarantee of how a struct's fields are laid out? If
    not, I know that's at least one potential issue. AIUI we can only rely on
    the first field of a a struct being at offset zero - i.e. having the same
    address as the struct itself, and the offset of other fields is not defined
    or is implementation dependent or whatever - a bit of hand waving but
    whatever the correct term the effect is the same, i.e. their offsets cannot
    be relied upon...?

    The thing is - or the things are: 1) Structs are frequently used in C code
    for many structures of the types I have to deal with and I have no choice as
    to padding. The layout of fields is defined by the hardware or prior
    standards. And 2) C has nothing else that's particularly helpful for
    describing storage layout. So this type of code tends to rely on the
    compilers laying out structs as wanted (as long as some sensible rules are
    followed).

    Perhaps I could use some form of assertion to check structure sizes or
    similar. That could at least throw an error of some sort if things were not
    as they should be.

    (I also need to work with specific endianness but that's another topic and I
    don't want to go off-topic in this thread. Suffice to say I am aware that
    the two things - alignment and endianness - need to be correct.)
    That's sensible However, you could consider a vfat filesystem as part of a
    disk image file. I was working with one today (or trying to) on an Arm CPU
    and an x86 (as a user, not at the programming level). I'm not sure whether
    the Arm hardware requires alignment or not but I think the example
    illustrates that a FAT file system could still be used on machines which
    have none of the hardware or operating systems one normally associates with
    such file structures.

    James
     
    James Harris, Aug 22, 2013
    #6
  7. James Harris

    James Kuyper Guest

    There are some guarantees, but not enough to be useful:

    "11 An implementation may allocate any addressable storage unit large
    enough to hold a bitfield. If enough space remains, a bit-field that
    immediately follows another bit-field in a structure shall be packed
    into adjacent bits of the same unit. If insufficient space remains,
    whether a bit-field that does not fit is put into the next unit or
    overlaps adjacent units is implementation-defined. The order of
    allocation of bit-fields within a unit (high-order to low-order or
    low-order to high-order) is implementation-defined. The alignment of the
    addressable storage unit is unspecified."
    "12 A bit-field declaration with no declarator, but only a colon and a
    width, indicates an unnamed bit-field. As a special case, a bit-field
    structure member with a width of 0 indicates that no further bit-field
    is to be packed into the unit in which the previous bitfield,
    if any, was placed."
    .....
    "14 Each non-bit-field member of a structure or union object is aligned
    in an implementation defined manner appropriate to its type.
    "15 Within a structure object, the non-bit-field members and the units
    in which bit-fields reside have addresses that increase in the order in
    which they are declared. A pointer to a structure object, suitably
    converted, points to its initial member (or if that member is a
    bit-field, then to the unit in which it resides), and vice versa. There
    may be unnamed padding within a structure object, but not at its
    beginning."
    ....
    "17 There may be unnamed padding at the end of a structure or union."
    (6.7.2.1p11-12,14-15,17)

    Pay careful attention to how many things are "implementation-defined" or
    "unspecified". Notice all of the words that give implementors additional
    freedom: "any", "appropriate", "increase", "suitably", "may".
    All of that's true - but don't expect it to be portable. Where it works,
    it's convenient - but be aware of the fact that it's not required to work.
    If there were an authoritative statement about how "things should be",
    such an assert would be unnecessary - thing either would be that way, or
    it wouldn't be a conforming implementation. The right phrase would be
    "as I expected". However, such an assertion needs to incorporate all the
    details about what you were expecting that is not actually guaranteed to
    be true. In order to make use of structs for this purpose, that's a LOT
    of details, no matter what it was you were expecting.
    You're still talking about an x86 platform- there's only a limited
    amount of variation in how alignment is handled by C compilers targeting
    x86 machines (at least, it's limited by comparison with the rest of the
    C world). Find out what the precise range of variation is (I don't know,
    and a newsgroup not targeted at x86 machines is not a good place to find
    out), and write your code to deal with that range of variation.

    My own code has portability requirements that would render it pointless
    to worry about such things. I must extract fields byte by byte,
    reconstructing the value that multi-byte fields represent, and using
    mask and shift operations to extract the equivalent of bit-fields; no
    simpler approach will work on all the platforms my code is required to
    be portable to.
     
    James Kuyper, Aug 22, 2013
    #7
  8. James Harris

    BartC Guest

    I've used things such as #pragma pack(1). It only needs to be portable
    across the compilers that are going to be used.

    Another way might be to use a byte (ie. char) array, and to use type-punning
    to extract the fields that are wider than one-byte. But you need to
    consider, with both of these, that some accesses could be misaligned when
    the data is in memory.

    Since this is an on-disk format, and disk operations are slow compared with
    memory, it doesn't seem unreasonable to unpack (to a properly aligned and
    padded struct) and repack this data on reads and writes from/to disk.
     
    BartC, Aug 22, 2013
    #8
  9. James Harris

    Siri Cruise Guest

    Some C compilers (such as gcc) support packed (unaligned) structs. If you don't
    mind being locked into such a compiler with nonportable code, that's an easy
    solution.
     
    Siri Cruise, Aug 22, 2013
    #9
  10. James Harris

    Eric Sosman Guest

    One standard approach is to use an array of unsigned char
    for I/O, use a struct arranged by the local compiler for dealing
    with the object internally, and to write a pair of "encode" and
    "decode" functions. (Note that these functions can be highly
    portable, and needn't require rewriting unless you encounter a
    system that has no 8-bit integer type.)

    The first approach can become awkward if the external data
    format re-uses the same spans of bytes for different purposes
    that are not easily detectable by the encoder and decoder. (For
    example, maybe the format of an entry varies depending on something
    in a different entry that's related to it.) In such cases another
    approach is fairly standard: You keep the actual data bytes hidden
    away as opaque data, and write accessor functions for the various
    items within it. That is, you write encoders and decoders for each
    datum instead of a single "wholesale" pair. Again, the accessors
    can be written very portably.

    You make two mentions of speed in connection with processing
    these foreign-format items, but I think your concern is misplaced.
    Where are these FAT entries coming from and going to, after all?
    How many instructions can your CPU execute in the time required to
    read one FAT entry? Maybe if they're pouring in from an SSD at
    6Gbit/sec (who'd use FAT on such a beast?) you might see a delay,
    but ... (Besides, at that rate you'd suck in 4GB in less than six
    seconds; a moment's pause would be welcome anyhow.)

    When considering "something basic and slow," don't be put off
    by the "slow:" It's almost certainly so far down in the noise you'd
    have a hard time finding it. Rather, look at the "basic" and think
    about the virtues of simplicity, and how much you'd love trying to
    recover a file system trashed by code that's too clever by half.
     
    Eric Sosman, Aug 22, 2013
    #10
  11. [...]

    One thing to watch out for: at least for gcc, #pragma pack lets you
    access misaligned struct members by name, but if you take a pointer to
    such a member, dereferencing that pointer can cause your program to
    crash (not on x86, but on other systems).

    See http://stackoverflow.com/q/8568432/827263 for more discussion of
    this.
     
    Keith Thompson, Aug 22, 2013
    #11
  12. There are *some* guarantees.

    To summarize (ignoring bitfields):

    - The first declared member is at offset 0.
    - Members are allocated in declared order.
    - Each member is big enough to hold an object of its type.
    - Each member's offset satisfies its type's required alignment.
    - There are zero or more bytes of padding after each member.

    So a conforming compiler *could* insert 42 bytes of padding between
    x and y in `struct { char x; char y; }`.

    In practice, the padding after a member will be the minimum needed to
    satisfy the required alignment for the following member's type, and
    padding at the end will be the minimum needed to satisfy the alignment
    requirement for the most strictly aligned member (or *maybe* slighly
    more if the compiler decides, say, that all structures should be
    word-aligned -- but gcc, for example, doesn't do this).

    To determine the alignment of a type, use _Alignof if you have a C11
    compiler. If not:

    #define ALIGNOF(type) ((offsetof(struct {char c; type t;}, t)))

    is likely to give consistent and correct results.
     
    Keith Thompson, Aug 22, 2013
    #12
  13. Yes, but just about every system now has to support FAT.

    It is commonly used for USB flash drives, and, similarly,
    for the file system in digital cameras.

    Even without those, there would be use in supporting it so
    that disk images could be read on other systems.

    -- glen
     
    glen herrmannsfeldt, Aug 22, 2013
    #13
  14. James Harris

    Les Cargill Guest

    It is not impossible.
    GNU in general seems to always support #pragma pack. This
    includes a push/pop semantic that would be ideal for bracketing struct
    declarations with.

    http://gcc.gnu.org/onlinedocs/gcc/Structure_002dPacking-Pragmas.html

    There still may be endian issues I haven't thought through.
    I have yet to see a case where I could not control packing
    properly, so long as the compiler has furniture for it.

    Still, it may be better to make something table driven. You would
    possibly need a new table for each architecture. And I would not
    be above a general serialization routine to do the right memmove()
    and memcpy() operations to pack an unpacked struct from a chunk of bytes.
     
    Les Cargill, Aug 23, 2013
    #14
  15. James Harris

    Les Cargill Guest

    FAT32 is usually used on USB memory sticks. Those plug in a to a
    lot of devices.
     
    Les Cargill, Aug 23, 2013
    #15
  16. James Harris

    Les Cargill Guest

    This is a library call, and possibly an ioctl(). So no,
    no compiler is smart enough.
     
    Les Cargill, Aug 23, 2013
    #16
  17. That may not solve the unaligned access issue, though.
    FAT originated on x86, so all multibyte fields are little-endian.

    While POSIX provides convenient functions to convert to/from big-endian
    form, which one can count on to be highly optimized (e.g. to no-ops on
    big-endian machines), I'm not aware of any corresponding standardized
    functions for little-endian form. Performance is unlikely to be a real
    problem in light of glacial disk I/O speeds, though.

    S
     
    Stephen Sprunk, Aug 23, 2013
    #17
  18. James Harris

    James Harris Guest

    This sounds good but how would it be coded in practice for the two cases? I
    think I can see the read and shift case (cast to an unsigned char pointer
    then dereference as a byte array) but am not sure about the one which should
    result in no machine code changes.
    I am not sure that would apply generally. I would have thought that it would
    normally be faster to let the hardware align bytes early in the pipeline
    than to do so explicitly via registers. Also, if a separate register is
    required for the shift and merge it adds to register pressure. Finally,
    using software to align a four-byte field would involve some partial
    register updates which can be slower. All-in-all, if the hardware supports
    realignment it is probably a good thing to exploit in most cases.

    James
     
    James Harris, Aug 23, 2013
    #18
  19. James Harris

    James Harris Guest

    The data may have come from a disk initially but a read in this context is
    not a read from disk but from memory so no library call or ioctl() would be
    needed.

    James
     
    James Harris, Aug 23, 2013
    #19
  20. Off the cuff and probably buggy, but shows the general idea:

    #if (defined(ALLOW_UNALIGNED) && defined(LITTLE_ENDIAN))
    # define get_uint16_le(p) (*(uint16_t *)p)
    #else
    # define get_uint16_le(p) (*(uint8_t *)p | *((uint8_t *)p+1) << 8)
    #endif

    The latter is a common enough idiom that a compiler may recognize it and
    substitute the former if it works better for the current target. Some
    may even be smart enough to do the former for aligned reads but the
    latter for unaligned reads, regardless of which you write, so it may be
    a moot point entirely.
    Hence the "may"; it's worth testing, at least.
    Both could be either true or false, depending on the particular
    implementation, the surrounding code, what is currently cached, etc.
    Are you referring to partial register stalls? AIUI, that only happens
    when two partial-register values are combined into one full-register
    value, which doesn't happen in this case.

    For instance, this idiom doesn't trigger a stall:

    uint8_t a, b, c, d;
    uint32_t x = (a << 24) | (b << 16) | (c << 8) | (d);

    Each byte is zero-extended to full width, shifted, and then or'd with
    another full register. The zero-extension itself does not create a
    stall because it depends on only _one_ partial register, not two.
    It may be; it's worth testing. Even if the shift form is faster, the
    loss in readability may outweigh it. Memory (not to mention disk) is so
    slow these days, and OOO mechanisms are so capable, that low-level
    details like this are probably irrelevant anyway.

    S
     
    Stephen Sprunk, Aug 23, 2013
    #20
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.