packed structs

Discussion in 'C Programming' started by JohnF, Sep 22, 2012.

  1. JohnF

    JohnF Guest

    Any >>portable<< way to accomplish that in c?
    Don't want to use __attribute__((__packed__))
    or #pragma pack, etc, nor #ifdef's to choose
    among whatever alternatives I happen to know
    about. It's a requirement that the code remain
    portable.

    In particular, I'm trying to write blocks that
    conform to a binary file format (gif), and can
    set up structs for them easily enough, but can't
    fwrite(blockstruct,sizeof(blockstruct),1,fileptr),
    or the like, due to blockstruct's inevitable
    padding (which indeed occurs for gif format blocks).

    At the moment, I just have a different func for
    each block type that writes out the members of that
    particular struct individually... b..o..r..i..n..g.
    A generalization of that idea (if portable packing's
    not possible) would also be fine: >>if<< there's some
    way to reference the members of a struct, passed as
    an argument but of unknown (to the func) type, in a
    loop, i.e., for(i=0;i<nmembers;i++)thisstruct->member.
    Then I could offsetof() and sizeof() each member, and
    write it out, so just one (much less boring) func
    could handle all the different block type structs.

    But, afaik, I don't think that thisstruct->member
    thing is possible, nor portable packing. So is there
    any "one size fits all" way to handle this problem?
    I'm sure people must come across it frequently enough
    that it's been thought about, and the best possible
    approach (among, possibly, several bad alternatives)
    has been identified.
    --
    John Forkosh ( mailto: where j=john and f=forkosh )
    JohnF, Sep 22, 2012
    #1

  2. JohnF

    Eric Sosman Guest

    On 9/21/2012 9:54 PM, JohnF wrote:
    > Any >>portable<< way to accomplish that in c?


    No.

    > Don't want to use __attribute__((__packed__))
    > or #pragma pack, etc, nor #ifdef's to choose
    > among whatever alternatives I happen to know
    > about. It's a requirement that the code remain
    > portable.
    >
    > In particular, I'm trying to write blocks that
    > conform to a binary file format (gif), and can
    > set up structs for them easily enough, but can't
    > fwrite(blockstruct,sizeof(blockstruct),1,fileptr),
    > or the like, due to blockstruct's inevitable
    > padding (which indeed occurs for gif format blocks).


    Aha! You don't need (or want) packed structs at all:
    You want a way to pluck information from a plain vanilla
    struct and write it in an externally-defined format. That's
    a horse of another kettle of colored fish (or something
    along those lines).

    > At the moment, I just have a different func for
    > each block type that writes out the members of that
    > particular struct individually... b..o..r..i..n..g.
    > A generalization of that idea (if portable packing's
    > not possible) would also be fine: >>if<< there's some
    > way to reference the members of a struct, passed as
    > an argument but of unknown (to the func) type, in a
    > loop, i.e., for(i=0;i<nmembers;i++)thisstruct->member.
    > Then I could offsetof() and sizeof() each member, and
    > write it out, so just one (much less boring) func
    > could handle all the different block type structs.


    One way is the function-per-struct-type approach, and
    although it may be "b..o..r..i..n..g" it has advantages
    that should not be dismissed lightly. Consider that there
    are (most likely) only a handful of structs and hence only
    a handful of functions; writing them won't take enough
    time to b..o..r..e anyone except an ADHD sufferer.
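For what it's worth, here is a minimal sketch of the function-per-struct approach for one block type. The struct `screen_desc`, its member names, and the helpers `put_u16le()` and `screen_desc_serialize()` are made up for illustration; the layout assumed is GIF's 7-byte Logical Screen Descriptor (two little-endian 16-bit values followed by three single bytes). Each member is written explicitly, so struct padding and host byte order never reach the output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory form of GIF's Logical Screen Descriptor. */
struct screen_desc {
    unsigned width;     /* written as 2 bytes, little-endian */
    unsigned height;    /* written as 2 bytes, little-endian */
    unsigned packed;    /* written as 1 byte */
    unsigned bg_index;  /* written as 1 byte */
    unsigned aspect;    /* written as 1 byte */
};

/* Store the low 16 bits of v, least-significant byte first. */
static size_t put_u16le(unsigned char *out, unsigned v)
{
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    return 2;
}

/* Serialize one descriptor into out (at least 7 bytes); returns bytes written. */
size_t screen_desc_serialize(const struct screen_desc *sd, unsigned char *out)
{
    size_t n = 0;
    assert(sd != NULL && out != NULL);
    n += put_u16le(out + n, sd->width);
    n += put_u16le(out + n, sd->height);
    out[n++] = (unsigned char)(sd->packed & 0xFFu);
    out[n++] = (unsigned char)(sd->bg_index & 0xFFu);
    out[n++] = (unsigned char)(sd->aspect & 0xFFu);
    return n;
}
```

The caller would then pass the buffer to fwrite(buf, 1, n, fileptr), which writes exactly n bytes regardless of sizeof(struct screen_desc).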

    Still, there's an alternative: You write one super-
    function that accepts a void* struct pointer and a "struct
    descriptor," usually an array of byte offsets and type codes:

    #include <stddef.h>   /* size_t, offsetof */

    struct struct_descriptor {
        size_t offset;    /* byte offset of the member within its struct */
        enum { BYTE, BYTEPAIR, BYTEQUAD, ..., STOP } type;   /* output width */
    };

    struct foo {
        int this; // to be written as four bytes
        int that; // to be written as one byte
        ...
    };

    const struct struct_descriptor foo_description[] = {
        { offsetof(struct foo, this), BYTEQUAD },
        { offsetof(struct foo, that), BYTE },
        ...
        { 0, STOP }   /* sentinel: end of table */
    };

    So: You set up one descriptor table per struct type, and you
    pass a struct pointer and the matching descriptor to the
    all-consuming writer function. (Note that the writer might do
    additional work with each field, like writing a multi-byte
    quantity in a format-specific endianness -- something no amount
    of portable or non-portable packing magic can manage.)
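One way the all-consuming writer might look, as a sketch only. It assumes every described member is an `int` in memory (the descriptor records only the output width), that multi-byte values go out least-significant byte first, and that output goes to a caller-supplied buffer rather than a stream. The names `field_desc`, `sample`, `sample_description` and `write_fields` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum field_type { BYTE, BYTEPAIR, BYTEQUAD, STOP };

struct field_desc {
    size_t offset;          /* byte offset of an int member */
    enum field_type type;   /* how many bytes to emit for it */
};

struct sample {
    int first;   /* to be written as four bytes */
    int second;  /* to be written as one byte */
};

const struct field_desc sample_description[] = {
    { offsetof(struct sample, first),  BYTEQUAD },
    { offsetof(struct sample, second), BYTE },
    { 0, STOP }
};

/* Walk the table, emitting each member little-endian; returns bytes written. */
size_t write_fields(unsigned char *out, const void *base,
                    const struct field_desc *desc)
{
    size_t n = 0;
    assert(out != NULL && base != NULL && desc != NULL);
    for (; desc->type != STOP; desc++) {
        int value;
        unsigned long u;
        int i;
        int nbytes = (desc->type == BYTE) ? 1
                   : (desc->type == BYTEPAIR) ? 2 : 4;
        /* memcpy avoids any aliasing or alignment assumptions */
        memcpy(&value, (const unsigned char *)base + desc->offset, sizeof value);
        u = (unsigned long)value;
        for (i = 0; i < nbytes; i++)
            out[n++] = (unsigned char)((u >> (8 * i)) & 0xFFu);
    }
    return n;
}
```

Nothing here ties `sample_description` to `struct sample` at compile time; passing a pointer to some other struct type would compile just as happily.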

    All very neat and nice, but it has a drawback: The compiler
    doesn't know that foo_description[] and struct foo go together,
    so it won't complain if you make a mistake like

    struct foo mumble = ...;
    struct bar grumble = ...;
    ...
    writer(stream, &mumble, foo_description);
    writer(stream, &grumble, foo_description); // oops!

    ... and you will be stuck debugging the result. That's an
    area where the b..o..r..i..n..g approach has an advantage. But,
    hey: If you've got a hormonal insufficiency and need to give
    your adrenal glands extra exercise, a little terror may help.

    --
    Eric Sosman
    Eric Sosman, Sep 22, 2012
    #2

  3. JohnF

    JohnF Guest

    Eric Sosman <> wrote:
    > JohnF wrote:
    >> Any >>portable<< way to accomplish that in c?

    > No.
    >> In particular, I'm trying to write blocks that
    >> conform to a binary file format (gif), and can
    >> set up structs for them easily enough, but can't
    >> fwrite(blockstruct,sizeof(blockstruct),1,fileptr),
    >> or the like, due to blockstruct's inevitable
    >> padding (which indeed occurs for gif format blocks).

    >
    > Aha! You don't need (or want) packed structs at all:
    > You want a way to pluck information from a plain vanilla
    > struct and write it in an externally-defined format.
    > ...
    > You write one super-
    > function that accepts a void* struct pointer and a "struct
    > descriptor," usually an array of byte offsets and type codes:
    > struct struct_descriptor {
    > size_t offset;
    > enum { BYTE, BYTEPAIR, BYTEQUAD, ..., STOP } type; };


    Thanks, Eric (and whoever the other guy is). Both your suggestions
    are along the same lines, and, of course, already occurred to me.
    And, as per the drawbacks you pointed out (plus being a pain in
    the neck to maintain two structs per struct), already dismissed
    by me.

    I'd actually finished (what I think is) a more elegant solution,
    which I called smemf(): it's like memcpy() but under format
    control, including additional format specifiers for hex, for bits,
    and for other stuff. The code actually works fine, but it's still
    incomplete and already runs to 723 lines (though that includes
    >>many<< comments), which is somewhat of a tail-wagging-dog
    situation that I also want to avoid.
    The point of smemf is that a single format string replaces
    that entire extra struct. So it's still "extra work for mother",
    but much less in-your-face when reading the program that uses it.
    Since it's not done (and the copyright not registered) yet,
    I'm not releasing it, but (since I can't copyright the idea,
    anyway) below is its (still somewhat incomplete) main comment block
    describing its usage and functional specs, in case anyone else wants
    to take a stab at implementation. Also, some stuff is done but not
    documented, e.g., little/big endian flag to control which way %d works,
    the bit-field specifier, etc. But you'll get the idea. And after that,
    the additional details are obvious to anyone who thinks about it.

    /* ==========================================================================
    * Function: smemf ( unsigned char *mem, char *format, ... )
    * Purpose: Construct a formatted block of memory, typically containing
    * binary data, e.g., network packets, gif images, etc.
    * Behaves much like (a subset of) sprintf, but the intent
    * to accommodate binary data requires a few significant
    * exceptions, as explained in the Notes section below.
    * --------------------------------------------------------------------------
    * Arguments: mem (O) (unsigned char *) to memory block
    * to be formatted.
    * format (I) (char *) to null-terminated string containing
    * specifications for how the variable arg list
    * following it should be formatted in mem.
    * See Notes below.
    * ... as many value args as needed to satisfy
    * the format specification above
    * --------------------------------------------------------------------------
    * Returns: ( int ) # bytes in returned mem,
    * 0 for any error.
    * --------------------------------------------------------------------------
    * Notes: o Like sprintf, mem must already be allocated by caller,
    * and be large enough to accommodate the accompanying
    * format specification argument.
    * o format is sprintf-like, but with some significant exceptions
    * and additions to facilitate smemf's different purpose.
    * The one most significant similarity and difference is...
    * * Like sprintf, conversion specifications are introduced by
    * the % character and terminated by a conversion specifier
    * (see list and discussions below).
    * * But unlike sprintf, ordinary characters occurring
    * outside conversion specifications aren't immediately
    * copied into mem...
    * * Instead, literals that you want formatted in mem
    * must always be followed by corresponding conversion
    * specifications.
    * * For example, 123abc%s formats the next >>six bytes<<
    * of mem with that >>ascii character<< string.
    * But 123abc%x interprets that same string as
    * >>hex digits<<, formatting the next >>three bytes<<
    * of mem accordingly.
    * * And that's why ordinary characters must be followed
    * by a corresponding conversion specification, i.e.,
    * because unlike sprintf, which always interprets ordinary
    * characters as ascii, smemf formats binary memory blocks,
    * too, and therefore needs a conversion specification
    * to interpret the intended meaning of literals.
    * * Additional notes:
    * - When %s isn't preceded by a literal field,
    * then smemf interprets the next argument from your
    * argument list, in the usual way like sprintf.
    * But 123abc%s uses no arguments from your argument list.
    * - Field widths like 123abc%10s generate a 10-byte field,
    * left-justified with your 123abc literal, and
    * right-filled with four blanks. See the field width
    * discussion below for additional information about
    * optional right-justification, non-blank filler, etc.
    * - Leading/trailing whitespace is ignored, so
    * " 123abc %s " is the same as "123abc%s", but any
    * embedded whitespace like "123 abc%s" is respected
    * (although "123 abc%x" would still be an error,
    * while " 123abc %x " becomes okay).
    * - Leading/trailing (or pure) whitespace is obtained
    * by surrounding the literal with its own quotes,
    * e.g., format = " \" 123abc \" %s " includes one blank
    * before and after 123abc.
    * - On the very rare occasion when you want a literal
    * quote character, escape it, i.e., smemf needs
    * to see \" in your format string, so you'd need to
    * write format = " 123\\\"abc %s " to actually format
    * the string 123"abc in mem. That's confusing, but
    * a straightforward application of the obvious rules.
    * o The conversion specifiers recognized by smemf are
    * s,S, x,X, d,D, all discussed in detail below.
    * Note that x,X behave identically, as do d,D,
    * but s,S have different behaviors, discussed in detail below.
    * But first, some general remarks...
    * * s,S,x,X are default left-justified, i.e., the first byte
    * (or first hex digit for x,X) from your literal or argument
    * goes into the next available byte (or hex digit) of mem.
    * But %+etc (i.e., a + flag following %) right-justifies
    * your literal or argument instead, e.g., 123abc%+10s
    * generates a 10-byte field, left-filled with four blanks,
    * then followed by your right-justified 123abc literal.
    * And 123abc%+10x generates a five-byte field, left-filled
    * with two leading 0 bytes, followed by three bytes
    * containing your 12,3A,BC.
    * o The s conversion specifier...
    * *
    * o The S conversion specifier...
    * o The x,X conversion specifier...
    * o The d,D conversion specifier...
    * * A literal 123%d is taken as the decimal integer 123,
    * or the argument for %d is taken as an int.
    * * Justification flags following % are ignored.
    * The bits comprising your int are "right-justified" in
    * your specified field, i.e., low-order bit is rightmost.
    * * A width %10d means 10 bits. But all fields are byte-sized,
    * and 10-bits is "promoted" to 2-bytes, with your int value
    * "right-justified" (explained above) in that 16-bit field.
    * However, if your int value is greater than 1023=2^10-1,
    * then your value is "truncated" to its low-order 10 bits,
    * even though that 16-bit field could accommodate more bits.
    * * If % width.precision d is given, precision is ignored.
    * * If width is not given, it defaults to 16(bits) if your
    * value is less than 65536, or to 32(bits) otherwise.
    * * Example: 47%d generates two bytes containing 00,2F.
    * And 47%17d generates three bytes 00,00,2F.
    * ======================================================================= */
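The smemf() source was not posted, so the following is only a guess at how the d,D rules in the Notes could be coded, not the actual implementation. The function name `put_bits_field` and the convention that width 0 means "no width given" are assumptions; the big-endian byte order is inferred from the 00,2F example, and widths above 32 bits are simply rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Emit value in a field of width_bits bits, promoted to whole bytes,
 * right-justified, most-significant byte first. Returns bytes written,
 * or 0 if the width is unsupported. */
size_t put_bits_field(unsigned char *mem, unsigned long value,
                      unsigned width_bits)
{
    size_t nbytes, i;
    assert(mem != NULL);
    if (width_bits == 0)                       /* default width rule */
        width_bits = (value < 65536UL) ? 16 : 32;
    if (width_bits > 32)
        return 0;
    if (width_bits < 32)                       /* keep only low-order bits */
        value &= (1UL << width_bits) - 1UL;
    else
        value &= 0xFFFFFFFFUL;
    nbytes = (width_bits + 7) / 8;             /* 10 bits -> 2 bytes, 17 -> 3 */
    for (i = 0; i < nbytes; i++)
        mem[i] = (unsigned char)((value >> (8 * (nbytes - 1 - i))) & 0xFFu);
    return nbytes;
}
```

With these rules, 47 at the default width yields 00,2F and 47 at width 17 yields 00,00,2F, matching the examples in the comment block.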

    --
    John Forkosh ( mailto: where j=john and f=forkosh )
    JohnF, Sep 22, 2012
    #3
  4. On Sep 22, 4:23 am, Eric Sosman <> wrote:

    serializing structs

    >      One way is the function-per-struct-type approach, and
    > although it may be "b..o..r..i..n..g" it has advantages
    > that should not be dismissed lightly.  Consider that there
    > are (most likely) only a handful of structs and hence only
    > a handful of functions; writing them won't take enough
    > time to b..o..r..e anyone except an ADHD sufferer.


    I've dealt with cases where there were considerably more than a
    "handful"- communication protocol.

    Since I have a low b..o..r..e..d..o..m threshold I resorted to code
    generation. The protocol was defined by tables in a PDF document (ug).
    Copy-paste turned them into text and perl turned them into something
    easily processed. Most of the code was generated by people who didn't
    appear to mind writing tons of tedious boring repetitive code. There
    was a bit-banging library that did most of the heavy lifting.
    Nick Keighley, Sep 22, 2012
    #4
  5. JohnF

    JohnF Guest

    Nick Keighley <> wrote:
    > Eric Sosman <> wrote:
    >
    > serializing structs
    >
    >> One way is the function-per-struct-type approach, and
    >> although it may be "b..o..r..i..n..g" it has advantages
    >> that should not be dismissed lightly. Consider that there
    >> are (most likely) only a handful of structs and hence only
    >> a handful of functions; writing them won't take enough
    >> time to b..o..r..e anyone except an ADHD sufferer.

    >
    > I've dealt with cases where there were considerably more than a
    > "handful"- communication protocol.
    >
    > Since I have a low b..o..r..e..d..o..m threshold I resorted to code
    > generation. The protocol was defined by tables in a PDF document (ug).
    > Copy-paste turned them into text and perl turned them into something
    > easily processed. Most of the code was generated by people who didn't
    > appear to mind writing tons of tedious boring repetitive code. There
    > was a bit-banging library that did most of the heavy lifting.


    What do you think of my smemf()-type of solution in other post?
    Not necessarily that particular solution, but the point being that
    this has got to be a pretty frequently occurring problem, and the
    only available solutions, like yours above, seem to be pretty
    awfully ugly. But it ain't rocket science -- there ought to be
    a way to deal with these situations elegantly. smemf() solves it
    with >>zero<< structs. Instead of a struct, you define the block
    or packet format with a sprintf-like format string, and then just
    smemf(buffer_for_block, format_string_describing_block_layout,
    data_for_field_1, data_for_field_2, ...);
    And I suppose there are other kinds of ways to deal with this
    whole class of problems, which wouldn't exist at all if some
    kind of packed structs were C standard. In any case, there
    should exist some standard practice for dealing with it.
    --
    John Forkosh ( mailto: where j=john and f=forkosh )
    JohnF, Sep 22, 2012
    #5
  6. JohnF

    BartC Guest

    "JohnF" <> wrote in message
    news:k3jmbi$qgf$...

    > * Notes: o Like sprintf, mem must already be allocated by caller,
    > * o format is sprintf-like, but with some significant exceptions
    > * * For example, 123abc%s formats the next >>six bytes<<
    > * But 123abc%x interprets that same string as


    <snip complicated ways of avoiding pragma pack()>

    This is similar to the situation of reading or writing a binary file
    containing variable-size values.

    Then you might use functions such as inbyte() or outint() to read or build
    such data serially.

    But, once you've used your smemf() to create the data representing a packed
    struct, how do you access a field in the middle?

    (With my file methods, I might use some seek-function to get to a particular
    offset, but with in-memory structs you'd expect to do so in a more efficient
    manner.)

    --
    Bartc
    BartC, Sep 22, 2012
    #6
  7. "BartC" <> writes:

    > "JohnF" <> wrote in message
    > news:k3jmbi$qgf$...
    >
    >> * Notes: o Like sprintf, mem must already be allocated by caller,
    >> * o format is sprintf-like, but with some significant exceptions
    >> * * For example, 123abc%s formats the next >>six bytes<<
    >> * But 123abc%x interprets that same string as

    >
    > <snip complicated ways of avoiding pragma pack()>


    That misses half the problem. Even if you could rely on generating the
    right packed structure (particularly hard if there are bit-fields
    involved), you can't rely on getting the right representation. The most
    obvious problem being byte ordering in integer fields.
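To make the byte-ordering point concrete, here is a small sketch (the names `put_u32be` and `put_u32le` are hypothetical). Copying an integer's object representation with memcpy() or fwrite() produces whatever order the host uses, whereas shifting and masking fixes the output order on any host.

```c
#include <assert.h>
#include <stddef.h>

/* Store the low 32 bits of v, most-significant byte first. */
void put_u32be(unsigned char *out, unsigned long v)
{
    assert(out != NULL);
    out[0] = (unsigned char)((v >> 24) & 0xFFu);
    out[1] = (unsigned char)((v >> 16) & 0xFFu);
    out[2] = (unsigned char)((v >> 8) & 0xFFu);
    out[3] = (unsigned char)(v & 0xFFu);
}

/* Store the low 32 bits of v, least-significant byte first. */
void put_u32le(unsigned char *out, unsigned long v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v & 0xFFu);
    out[1] = (unsigned char)((v >> 8) & 0xFFu);
    out[2] = (unsigned char)((v >> 16) & 0xFFu);
    out[3] = (unsigned char)((v >> 24) & 0xFFu);
}
```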

    <snip>
    --
    Ben.
    Ben Bacarisse, Sep 22, 2012
    #7
  8. JohnF

    Eric Sosman Guest

    On 9/22/2012 2:37 AM, JohnF wrote:
    > Eric Sosman <> wrote:
    >> ...
    >> You write one super-
    >> function that accepts a void* struct pointer and a "struct
    >> descriptor," usually an array of byte offsets and type codes:
    >> struct struct_descriptor {
    >> size_t offset;
    >> enum { BYTE, BYTEPAIR, BYTEQUAD, ..., STOP } type; };

    >[...]
    > I'd actually finished (what I think is) a more elegant solution,
    > that I called smemf() that's like memcpy() but under format
    > control, including additional format specifiers for hex, for bits,
    > and for other stuff.[...]


    (Shrug.) Seems to me your approach could be feasible for
    structs with only a few elements, but would likely become very
    bulky with larger structs.

    /* My suggestion */
    writer(stream, &instance, descriptorArray);

    /* Your way */
    smemf(buffer, descriptorString,
    instance.this, instance.that, instance.tother,
    instance.x1, instance.x2, instance.y1, instance.y2,
    instance.namelength, instance.namestring);

    You've got the same opportunity for mismatch that I pointed out
    in my suggestion, *plus* the chance to mix up the individual
    fields. (Did you spot the error in my example? No? Ah, well,
    you see: It's supposed to be x1,y1,x2,y2, not x1,x2,y1,y2 --
    wasn't that obvious?)

    But, hey: Whatever floats your boat.

    --
    Eric Sosman
    Eric Sosman, Sep 22, 2012
    #8
  9. JohnF

    Jorgen Grahn Guest

    On Sat, 2012-09-22, JohnF wrote:
    > Nick Keighley <> wrote:
    >> Eric Sosman <> wrote:
    >>
    >> serializing structs
    >>
    >>> One way is the function-per-struct-type approach, and
    >>> although it may be "b..o..r..i..n..g" it has advantages
    >>> that should not be dismissed lightly. Consider that there
    >>> are (most likely) only a handful of structs and hence only
    >>> a handful of functions; writing them won't take enough
    >>> time to b..o..r..e anyone except an ADHD sufferer.

    >>
    >> I've dealt with cases where there were considerably more than a
    >> "handful"- communication protocol.
    >>
    >> Since I have a low b..o..r..e..d..o..m threshold I resorted to code
    >> generation. The protocol was defined by tables in a PDF document (ug).
    >> Copy-paste turned them into text and perl turned them into something
    >> easily processed. Most of the code was generated by people who didn't
    >> appear to mind writing tons of tedious boring repetitive code. There
    >> was a bit-banging library that did most of the heavy lifting.

    >
    > What do you think of my smemf()-type of solution in other post?
    > Not necessarily that particular solution, but the point being that
    > this has got to be a pretty frequently occurring problem, and the
    > only available solutions, like yours above, seem to be pretty
    > awfully ugly.


    Didn't read very carefully, but it's like the Python 'struct' module,
    isn't it?

    http://docs.python.org/library/struct.html

    Well, that works well in Python and the only drawbacks I can see in C
    are:
    - you give up some type safety
    - you give up some speed
    - your output buffer must be big enough

    I don't know why I've never seen this done in C or C++ before.
    I'm happy with the b..o..r..i..n..g approach because I've never had
    many such message formats (or at least they have had so much common
    structure that it was manageable).
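A rough idea of what a C function in the style of Python's struct.pack might look like, as a sketch under assumptions. The name `pack_fields` is hypothetical; the format codes borrowed from Python are `<` and `>` for byte order and `B`, `H`, `I` for 1, 2 and 4 unsigned bytes. Unlike Python, a missing prefix is treated as little-endian here, and every value must be passed as `unsigned long`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Pack the variadic values into out (capacity cap) according to fmt.
 * Returns bytes written, or 0 on a bad format code or too-small buffer. */
size_t pack_fields(unsigned char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;
    int big_endian = 0;
    assert(out != NULL && fmt != NULL);
    if (*fmt == '<' || *fmt == '>')
        big_endian = (*fmt++ == '>');
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        unsigned long v;
        size_t width, i;
        switch (*fmt) {
        case 'B': width = 1; break;
        case 'H': width = 2; break;
        case 'I': width = 4; break;
        default:  va_end(ap); return 0;      /* unknown code */
        }
        if (cap - n < width) {               /* output buffer too small */
            va_end(ap);
            return 0;
        }
        v = va_arg(ap, unsigned long);       /* caller must pass unsigned long */
        for (i = 0; i < width; i++) {
            size_t shift = 8 * (big_endian ? width - 1 - i : i);
            out[n++] = (unsigned char)((v >> shift) & 0xFFu);
        }
    }
    va_end(ap);
    return n;
}
```

The drawbacks listed above show up directly: the compiler cannot check the arguments against the format string, the format is parsed on every call, and the capacity check is only as good as the cap the caller passes.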

    ...
    > And I suppose there are other kinds of ways to deal with this
    > whole class of problems, which wouldn't exist at all if some
    > kind of packed structs were C standard.


    Packed structs would not solve the class of problems. You still have
    endianness, you still have variable-sized message formats, and so on.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Sep 22, 2012
    #9
  10. JohnF

    JohnF Guest

    Jorgen Grahn <> wrote:
    > JohnF wrote:
    >> Nick Keighley <> wrote:
    >>> Eric Sosman <> wrote:
    >>>
    >>> serializing structs
    >>>
    >>>> One way is the function-per-struct-type approach, and
    >>>> although it may be "b..o..r..i..n..g" it has advantages
    >>>> that should not be dismissed lightly. Consider that there
    >>>> are (most likely) only a handful of structs and hence only
    >>>> a handful of functions; writing them won't take enough
    >>>> time to b..o..r..e anyone except an ADHD sufferer.
    >>>
    >>> I've dealt with cases where there were considerably more than a
    >>> "handful"- communication protocol.
    >>>
    >>> Since I have a low b..o..r..e..d..o..m threshold I resorted to code
    >>> generation. The protocol was defined by tables in a PDF document (ug).
    >>> Copy-paste turned them into text and perl turned them into something
    >>> easily processed. Most of the code was generated by people who didn't
    >>> appear to mind writing tons of tedious boring repetitive code. There
    >>> was a bit-banging library that did most of the heavy lifting.

    >>
    >> What do you think of my smemf()-type of solution in other post?
    >> Not necessarily that particular solution, but the point being that
    >> this has got to be a pretty frequently occurring problem, and the
    >> only available solutions, like yours above, seem to be pretty
    >> awfully ugly.


    > Didn't read very carefully, but it's like the Python 'struct' module,
    > isn't it?
    > http://docs.python.org/library/struct.html


    Thanks, Jorgen, that's >>excellent<<. Quick read suggests it's
    exactly like my smemf(), or vice versa since theirs clearly
    came first. I guess "great minds think alike". Of course,
    I guess idiots probably think alike, too... Okay, I've made
    my choice :)
    In any event, I'll read that much more carefully, and
    incorporate their inevitable improvements into my "functional
    spec", such as it is, and then see if it seems worth re-coding
    any changes and finishing it.

    > Well, that works well in Python and the only drawbacks I can see in C
    > are:
    > - you give up some type safety
    > - your output buffer must be big enough


    Yeah, yeah, programmers ought to be careful.
    Not much different than sprintf()'s potential pitfalls.

    > - you give up some speed


    Hardly a likely problem. This is typically for i/o,
    not some compute intensive loop. You'd have to be
    formatting god-knows-how-many GB's before anybody
    would notice the overhead.

    > I don't know why I've never seen this done in C or C++ before.


    Well, I'm inclined to re-do smemf()'s specs as close as possible
    to python's, except that their format strings are more different
    from C's than necessary. In any case, I'm sure their stuff deals
    with problems that hadn't occurred to me yet. So it'll be a great
    improvement to see precisely what they're doing.

    > I'm happy with the b..o..r..i..n..g approach because I've never had
    > many such message formats (or at least they have had so much common
    > structure that it was manageable).
    > ...
    >> And I suppose there are other kinds of ways to deal with this
    >> whole class of problems, which wouldn't exist at all if some
    >> kind of packed structs were C standard.

    >
    > Packed structs would not solve the class of problems. You still have
    > endianness, you still have variable-sized message formats, and so on.
    > /Jorgen


    Yeah, I wrote that "packed structs solves all" sentence too quickly,
    and it's wrong. smemf() and python pack(fmt,v1,v2,...) both already
    handle endianness. And smemf() implements "%*x" to pick up variable
    lengths from the arg list (my very quick glance at python doesn't
    show that yet, but I'll bet it's there somewheres I'll find later).
    You'd have to be more specific about "and so on", but that's also
    on my to-do list. Thanks again for the pointer,
    --
    John Forkosh ( mailto: where j=john and f=forkosh )
    JohnF, Sep 22, 2012
    #10
  11. JohnF

    JohnF Guest

    BartC <> wrote:
    > But, once you've used your smemf() to create the data
    > representing a packed struct, how do you access a field
    > in the middle?


    You don't. This is to format output, so you already know
    what you're putting in, and don't need to re-read it.
    I'm writing a gif encoder, forkosh.com/gifsave89.html
    For a decoder, you would be reading those blocks,
    and then your problem is indeed a problem. I'd have
    to write the corresponding scan-like function to smemf()
    to deal with that.
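For the decoder direction, a hand-written reader is the simplest counterpart, sketched here for one block type rather than as a general scan-like function. The struct `screen_info` and the functions `get_u16le()` and `screen_info_parse()` are hypothetical; the layout assumed is again GIF's 7-byte Logical Screen Descriptor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded form of the Logical Screen Descriptor. */
struct screen_info {
    unsigned width;
    unsigned height;
    unsigned packed;
    unsigned bg_index;
    unsigned aspect;
};

/* Read a 16-bit value stored least-significant byte first. */
static unsigned get_u16le(const unsigned char *in)
{
    return (unsigned)in[0] | ((unsigned)in[1] << 8);
}

/* Decode 7 bytes from in; returns 1 on success, 0 if len is too short. */
int screen_info_parse(const unsigned char *in, size_t len,
                      struct screen_info *si)
{
    assert(in != NULL && si != NULL);
    if (len < 7)
        return 0;
    si->width    = get_u16le(in);
    si->height   = get_u16le(in + 2);
    si->packed   = in[4];
    si->bg_index = in[5];
    si->aspect   = in[6];
    return 1;
}
```

Once decoded into an ordinary struct, any field is reachable by name, which sidesteps the question of accessing a field in the middle of the packed bytes.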
    --
    John Forkosh ( mailto: where j=john and f=forkosh )
    JohnF, Sep 22, 2012
    #11
  12. JohnF

    Jorgen Grahn Guest

    On Sat, 2012-09-22, JohnF wrote:
    > Jorgen Grahn <> wrote:

    ...
    >> http://docs.python.org/library/struct.html


    >> - you give up some speed

    >
    > Hardly a likely problem. This is typically for i/o,
    > not some compute intensive loop. You'd have to be
    > formatting god-knows-how-many GB's before anybody
    > would notice the overhead.


    Now that you mention it, yes, it's probably like that. But I still
    suspect that's part of the reason you don't see that design in real
    code. It's a reflex among C programmers: you don't invent interpreted
    mini-languages which require parsing at every call.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Sep 22, 2012
    #12
  13. Jorgen Grahn <> writes:
    [...]
    > Didn't read very carefully, but it's like the Python 'struct' module,
    > isn't it?
    >
    > http://docs.python.org/library/struct.html


    It's also reminiscent of Perl's built-in "pack" and "unpack" functions.

    http://perldoc.perl.org/functions/pack.html
    http://perldoc.perl.org/functions/unpack.html

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Sep 22, 2012
    #13
  14. JohnF

    BartC Guest

    "JohnF" <> wrote in message
    news:k3klnr$lq7$...
    > Jorgen Grahn <> wrote:


    >> Didn't read very carefully, but it's like the Python 'struct' module,
    >> isn't it?
    >> http://docs.python.org/library/struct.html


    > Well, I'm inclined to re-do smemf()'s specs as close as possible
    > to python's, except that their format strings are more different
    > from C's than necessary. In any case, I'm sure their stuff deals
    > with problems that hadn't occurred to me yet. So it'll be a great
    > improvement to see precisely what they're doing.


    The main problem Python has to deal with is that the language doesn't have
    structs.

    --
    Bartc
    BartC, Sep 22, 2012
    #14
  15. JohnF

    Ian Collins Guest

    On 09/22/12 08:53 PM, JohnF wrote:
    > Nick Keighley<> wrote:
    >> Eric Sosman<> wrote:
    >>
    >> serializing structs
    >>
    >>> One way is the function-per-struct-type approach, and
    >>> although it may be "b..o..r..i..n..g" it has advantages
    >>> that should not be dismissed lightly. Consider that there
    >>> are (most likely) only a handful of structs and hence only
    >>> a handful of functions; writing them won't take enough
    >>> time to b..o..r..e anyone except an ADHD sufferer.

    >>
    >> I've dealt with cases where there were considerably more than a
    >> "handful"- communication protocol.
    >>
    >> Since I have a low b..o..r..e..d..o..m threshold I resorted to code
    >> generation. The protocol was defined by tables in a PDF document (ug).
    >> Copy-paste turned them into text and perl turned them into something
    >> easily processed. Most of the code was generated by people who didn't
    >> appear to mind writing tons of tedious boring repetitive code. There
    >> was a bit-banging library that did most of the heavy lifting.

    >
    > What do you think of my smemf()-type of solution in other post?
    > Not necessarily that particular solution, but the point being that
    > this has got to be a pretty frequently occurring problem, and the
    > only available solutions, like yours above, seem to be pretty
    > awfully ugly. But it ain't rocket science -- there ought to be
    > a way to deal with these situations elegantly. smemf() solves it
    > with >>zero<< structs.


    Having worked on numerous on-the-wire protocols, I've encountered this
    problem many times. The most practical solution for all but the most
    trivial cases is to code generate the formatting code. The source for
    the code generator can either be C (or C++ if inheritance helps) code or
    some other easy to parse format. This is one case where I prefer XML
    (often in the form of an OpenOffice document) as the "other easy to
    parse format".
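This is not the external generator described above, but a lightweight in-language stand-in for the same idea: an X-macro field list from which the preprocessor generates both the struct and its serializer, so the layout is stated once. All names (`GIF_LSD_FIELDS`, `struct gif_lsd`, `put_le`, `gif_lsd_write`) are hypothetical, and little-endian output is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Single source of truth: member name and number of bytes on the wire. */
#define GIF_LSD_FIELDS(X) \
    X(width,    2)        \
    X(height,   2)        \
    X(packed,   1)        \
    X(bg_index, 1)        \
    X(aspect,   1)

/* Expansion 1: the struct definition. */
#define AS_MEMBER(name, nbytes) unsigned long name;
struct gif_lsd { GIF_LSD_FIELDS(AS_MEMBER) };

/* Store the low nbytes bytes of v, least-significant byte first. */
static size_t put_le(unsigned char *out, unsigned long v, size_t nbytes)
{
    size_t i;
    for (i = 0; i < nbytes; i++)
        out[i] = (unsigned char)((v >> (8 * i)) & 0xFFu);
    return nbytes;
}

/* Expansion 2: one write statement per field, in declaration order. */
#define AS_WRITE(name, nbytes) n += put_le(out + n, lsd->name, (nbytes));
size_t gif_lsd_write(const struct gif_lsd *lsd, unsigned char *out)
{
    size_t n = 0;
    assert(lsd != NULL && out != NULL);
    GIF_LSD_FIELDS(AS_WRITE)
    return n;
}
```

Adding a field means adding one line to the list; the struct and the writer cannot drift apart, though the macros are harder to read and debug than generated source would be.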

    --
    Ian Collins
    Ian Collins, Sep 22, 2012
    #15
  16. JohnF

    JohnF Guest

    Keith Thompson <> wrote:
    > Jorgen Grahn <> writes:
    > [...]
    >> Didn't read very carefully, but it's like the Python 'struct' module,
    >> isn't it?
    >>
    >> http://docs.python.org/library/struct.html

    >
    > It's also reminiscent of Perl's built-in "pack" and "unpack" functions.
    > http://perldoc.perl.org/functions/pack.html
    > http://perldoc.perl.org/functions/unpack.html


    Thanks, Keith (and Jorgen again), that's also incredibly useful.
    Lots of food for thought to spec out a C variant. And yet another
    format/template to peruse. I really should put some effort into
    learning these "little languages".
    Despite BartC's remark, "The main problem Python has to deal
    with is that the language doesn't have structs", I think this
    kind of pack function has valuable uses in C (along with a
    scan-like unpack, as suggested by BartC's other remark),
    e.g., formatting (and reading) binary blocks/packets/whatever,
    which is what brought the idea to my mind. And the fact that
    all these little languages have pack/unpack just supports
    the notion they're useful funcs. Moreover, I'm sure you can
    see they're no great big deal to code. So why not do it
    (I'm in the process, but that's no reason others shouldn't
    do it differently/better/whatever)? The naysayers can just
    ignore it. I'm sure everybody has their favorite C feature
    they choose not to use.
    Finally, regarding Ian's remarks, pack/unpack would be
    a >>lightweight<< alternative to accomplish this kind of
    task. Ian's way involves importing additional tool dependencies,
    possibly turning what could be just a few lines of code
    into a subtask unto itself. For his large project case,
    involving lots and lots of different block formats,
    that may be the best approach (I've worked with Swift
    at Bankers Trust, albeit a while back, and also with various
    ticker feeds, etc, etc, and more etc). But that's no reason
    not to have a lightweight alternative.
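
    To give a feel for how small such a pack function can be, here is a rough sketch in the spirit of the Python/Perl pack functions. It is not smemf; the name pack_le and its three format letters are made up for illustration, and it assumes unsigned little-endian fields only:

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical pack_le(): 'B' = 1 byte, 'H' = 2 bytes, 'L' = 4 bytes,
 * all unsigned and written least significant byte first.
 * Pass unsigned int arguments for 'B' and 'H', unsigned long for 'L'.
 * Returns the number of bytes written, or 0 on an unknown format letter. */
size_t pack_le(unsigned char *out, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;

    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        unsigned long value;
        size_t nbytes;

        switch (*fmt) {
        case 'B': nbytes = 1; value = va_arg(ap, unsigned int);  break;
        case 'H': nbytes = 2; value = va_arg(ap, unsigned int);  break;
        case 'L': nbytes = 4; value = va_arg(ap, unsigned long); break;
        default:  va_end(ap); return 0;
        }
        /* Emit the field one byte at a time, so struct padding and host
         * byte order never reach the output. */
        for (size_t i = 0; i < nbytes; i++)
            out[n++] = (unsigned char)((value >> (8 * i)) & 0xFFu);
    }
    va_end(ap);
    return n;
}
```

    A 7-byte block of two 16-bit fields followed by three single bytes would then be something like pack_le(buf, "HHBBB", 640u, 480u, 0xF7u, 0u, 0u). As with sprintf, the caller must supply a large enough buffer and arguments that match the format.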
    --
    John Forkosh ( mailto: where j=john and f=forkosh )
    JohnF, Sep 23, 2012
    #16
  17. JohnF wrote:
    > Eric Sosman<> wrote:
    >> JohnF wrote:
    >>> Any >>portable<< way to accomplish that in c?

    >> No.
    >>> In particular, I'm trying to write blocks that
    >>> conform to a binary file format (gif), and can
    >>> set up structs for them easily enough, but can't
    >>> fwrite(blockstruct,sizeof(blockstruct),1,fileptr),
    >>> or the like, due to blockstruct's inevitable
    >>> padding (which indeed occurs for gif format blocks).

    >>
    >> Aha! You don't need (or want) packed structs at all:
    >> You want a way to pluck information from a plain vanilla
    >> struct and write it in an externally-defined format.
    >> ...
    >> You write one super-
    >> function that accepts a void* struct pointer and a "struct
    >> descriptor," usually an array of byte offsets and type codes:
    >> struct struct_descriptor {
    >> size_t offset;
    >> enum { BYTE, BYTEPAIR, BYTEQUAD, ..., STOP } type; };

    >
    > Thanks, Eric (and whoever the other guy is). Both your suggestions
    > are along the same lines, and, of course, already occurred to me.
    > And, as per the drawbacks you pointed out (plus being a pain in
    > the neck to maintain two structs per struct), already dismissed
    > by me.
    >
    > I'd actually finished (what I think is) a more elegant solution,
    > that I called smemf() that's like memcpy() but under format
    > control, including additional format specifiers for hex, for bits,
    > and for other stuff. The code actually works fine, but still
    > uncompleted is 723 lines (though that includes >>many<< comments),
    > which is somewhat of a tail-wagging-dog situation which I also
    > want to avoid.
    > The point of smemf is that a single format string replaces
    > that entire extra struct. So it's still "extra work for mother",
    > but much less in-your-face when reading the program that uses it.
    > Since it's not done (and the copyright not registered) yet,
    > I'm not releasing it, but (since I can't copyright the idea,
    > anyway) below is its (still somewhat incomplete) main comment block
    > describing its usage and functional specs, in case anyone else wants
    > to take a stab at implementation. Also, some stuff is done but not
    > documented, e.g., little/big endian flag to control which way %d works,
    > the bit-field specifier, etc. But you'll get the idea. And after that,
    > the additional details are obvious to anyone who thinks about it.
    >
    > /* ==========================================================================
    > * Function: smemf ( unsigned char *mem, char *format, ... )
    > * Purpose: Construct a formatted block of memory, typically containing
    > * binary data, e.g., network packets, gif images, etc.
    > * Behaves much like (a subset of) sprintf, but the intent
    > * to accommodate binary data requires a few significant
    > * exceptions, as explained in the Notes section below.
    > * --------------------------------------------------------------------------
    > * Arguments: mem (O) (unsigned char *) to memory block
    > * to be formatted.
    > * format (I) (char *) to null-terminated string containing
    > * specifications for how the variable arg list
    > * following it should be formatted in mem.
    > * See Notes below.
    > * ... as many value args as needed to satisfy
    > * the format specification above
    > * --------------------------------------------------------------------------
    > * Returns: ( int ) # bytes in returned mem,
    > * 0 for any error.
    > * --------------------------------------------------------------------------
    > * Notes: o Like sprintf, mem must already be allocated by caller,
    > * and be large enough to accommodate the accompanying
    > * format specification argument.
    > * o format is sprintf-like, but with some significant exceptions
    > * and additions to facilitate smemf's different purpose.
    > * The one most significant similarity and difference is...
    > * * Like sprintf, conversion specifications are introduced by
    > * the % character and terminated by a conversion specifier
    > * (see list and discussions below).
    > * * But unlike sprintf, ordinary characters occurring
    > * outside conversion specifications aren't immediately
    > * copied into mem...
    > * * Instead, literals that you want formatted in mem
    > * must always be followed by corresponding conversion
    > * specifications.
    > * * For example, 123abc%s formats the next >>six bytes<<
    > * of mem with that >>ascii character<< string.
    > * But 123abc%x interprets that same string as
    > * >>hex digits<<, formatting the next >>three bytes<<
    > * of mem accordingly.
    > * * And that's why ordinary characters must be followed
    > * by a corresponding conversion specification, i.e.,
    > * because unlike sprintf, which always interprets ordinary
    > * characters as ascii, smemf formats binary memory blocks,
    > * too, and therefore needs a conversion specification
    > * to interpret the intended meaning of literals.
    > * * Additional notes:
    > * - When %s isn't preceded by a literal field,
    > * then smemf interprets the next argument from your
    > * argument list, in the usual way like sprintf.
    > * But 123abc%s uses no arguments from your argument list.
    > * - Field widths like 123abc%10s generate a 10-byte field,
    > * left-justified with your 123abc literal, and
    > * right-filled with four blanks. See the field width
    > * discussion below for additional information about
    > * optional right-justification, non-blank filler, etc.
    > * - Leading/trailing whitespace is ignored, so
    > * " 123abc %s " is the same as "123abc%s", but any
    > * embedded whitespace like "123 abc%s" is respected
    > * (although "123 abc%x" would still be an error,
    > * while " 123abc %x " becomes okay).
    > * - Leading/trailing (or pure) whitespace is obtained
    > * by surrounding the literal with its own quotes,
    > * e.g., format = " \" 123abc \" %s " includes one blank
    > * before and after 123abc.
    > * - On the very rare occasion when you want a literal
    > * quote character, escape it, i.e., smemf needs
    > * to see \" in your format string, so you'd need to
    > * write format = " 123\\\"abc %s " to actually format
    > * the string 123"abc in mem. That's confusing, but
    > * a straightforward application of the obvious rules.
    > * o The conversion specifiers recognized by smemf are
    > * s,S, x,X, d,D, all discussed in detail below.
    > * Note that x,X behave identically, as do d,D,
    > * but s,S have different behaviors, discussed in detail below.
    > * But first, some general remarks...
    > * * s,S,x,X are default left-justified, i.e., the first byte
    > * (or first hex digit for x,X) from your literal or argument
    > * goes into the next available byte (or hex digit) of mem.
    > * But %+etc (i.e., a + flag following %) right-justifies
    > * your literal or argument instead, e.g., 123abc%+10s
    > * generates a 10-byte field, left-filled with four blanks,
    > * then followed by your right-justified 123abc literal.
    > * And 123abc%+10x generates a five-byte field, left-filled
    > * with two leading 0 bytes, followed by three bytes
    > * containing your 12,3A,BC.
    > * o The s conversion specifier...
    > * *
    > * o The S conversion specifier...
    > * o The x,X conversion specifier...
    > * o The d,D conversion specifier...
    > * * A literal 123%d is taken as the decimal integer 123,
    > * or the argument for %d is taken as an int.
    > * * Justification flags following % are ignored.
    > * The bits comprising your int are "right-justified" in
    > * your specified field, i.e., low-order bit is rightmost.
    > * * A width %10d means 10 bits. But all fields are byte-sized,
    > * and 10-bits is "promoted" to 2-bytes, with your int value
    > * "right-justified" (explained above) in that 16-bit field.
    > * However, if your int value is greater than 1023=2^10-1,
    > * then your value is "truncated" to its low-order 10 bits,
    > * even though that 16-bit field could accommodate more bits.
    > * * If % width.precision d is given, precision is ignored.
    > * * If width is not given, it defaults to 16(bits) if your
    > * value is less than 65536, or to 32(bits) otherwise.
    > * * Example: 47%d generates two bytes containing 00,2F.
    > * And 47%17d generates three bytes 00,00,2F.
    > * ======================================================================= */
    >


    There is already a library which does very similar stuff.
    It is called libtpl.
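
    For readers following the %d rules in the quoted smemf notes, the width handling can be sketched roughly as below. This is only a model of the documented behaviour, not JohnF's code: put_bits_be is a hypothetical name, it assumes the most-significant-byte-first order shown in the quoted examples, and it supports widths up to 32 bits only.

```c
#include <stddef.h>

/* Hypothetical helper modelling the quoted %d rule.
 * width_bits == 0 means "width not given".
 * Returns the number of bytes written to mem, or 0 if width_bits > 32. */
size_t put_bits_be(unsigned char *mem, unsigned long value, unsigned width_bits)
{
    if (width_bits == 0)                      /* default width rule */
        width_bits = (value < 65536ul) ? 16 : 32;
    if (width_bits > 32)
        return 0;

    /* Truncate the value to its low-order width_bits bits. */
    if (width_bits < 32)
        value &= (1ul << width_bits) - 1ul;
    else
        value &= 0xFFFFFFFFul;

    /* "Promote" the bit width to whole bytes, then store the value
     * right-justified, most significant byte first. */
    size_t nbytes = (width_bits + 7) / 8;
    for (size_t i = 0; i < nbytes; i++)
        mem[i] = (unsigned char)((value >> (8 * (nbytes - 1 - i))) & 0xFFu);
    return nbytes;
}
```

    With this model, a value of 47 and no width gives the two bytes 00,2F, and a width of 17 gives the three bytes 00,00,2F, matching the examples in the notes.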
    Johann Klammer, Sep 23, 2012
    #17
  18. JohnF

    Ian Collins Guest

    On 09/23/12 12:19 PM, JohnF wrote:

    > Finally, regarding Ian's remarks, pack/unpack would be
    > a >>lightweight<< alternative to accomplish this kind of
    > task. Ian's way involves importing additional tool dependencies,
    > possibly turning what could be just a few lines of code
    > into a subtask unto itself.


    The up front effort is much the same. In your case you have to code the
    pack and unpack functions, in mine I had to write a simple code
    generator. In both cases, once it's done it's done. While the code
    generator can be simple, mine has evolved to support enum types and
    inheritance, but it's still only a couple of hundred lines of code. A
    colleague on a previous project wrote all the transforms in XSLT, but
    that made my head explode.

    In coding, I add a table to an OpenOffice document (part of the project
    documentation, so it's dual use) while you write a format string.
    Probably similar effort although I'd say the table approach is less
    error prone.

    At run time, I have optimised code to perform the packing and unpacking
    while your functions have to interpret the format string each time they
    are called. If you have to handle a lot of traffic, especially on a low
    powered device, that can be a killer.
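
    As an illustration of the run-time difference, this is roughly what straight-line pack/unpack code of the kind a generator might emit could look like. It is a hand-written sketch, not output from the generator described above; struct gif_lsd and the gif_lsd_* names are hypothetical. The layout is the 7-byte GIF Logical Screen Descriptor, with 16-bit fields stored little-endian:

```c
#include <stddef.h>
#include <stdint.h>

/* In-memory form; sizeof(struct gif_lsd) may exceed 7 because of padding. */
struct gif_lsd {
    uint16_t width;
    uint16_t height;
    uint8_t  packed;     /* color-table flags and sizes */
    uint8_t  bg_index;   /* background color index */
    uint8_t  aspect;     /* pixel aspect ratio */
};

enum { GIF_LSD_WIRE_SIZE = 7 };

/* Fixed sequence of byte stores: no format string to interpret at run time. */
void gif_lsd_pack(unsigned char out[GIF_LSD_WIRE_SIZE], const struct gif_lsd *d)
{
    out[0] = (unsigned char)(d->width & 0xFFu);
    out[1] = (unsigned char)(d->width >> 8);
    out[2] = (unsigned char)(d->height & 0xFFu);
    out[3] = (unsigned char)(d->height >> 8);
    out[4] = d->packed;
    out[5] = d->bg_index;
    out[6] = d->aspect;
}

/* The inverse: rebuild the struct from the 7 wire bytes. */
void gif_lsd_unpack(struct gif_lsd *d, const unsigned char in[GIF_LSD_WIRE_SIZE])
{
    d->width    = (uint16_t)(in[0] | (in[1] << 8));
    d->height   = (uint16_t)(in[2] | (in[3] << 8));
    d->packed   = in[4];
    d->bg_index = in[5];
    d->aspect   = in[6];
}
```

    A format-string packer has to parse its format on every call before it can do the same stores, which is the overhead being pointed out here.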

    --
    Ian Collins
    Ian Collins, Sep 23, 2012
    #18
  19. JohnF

    JohnF Guest

    Johann Klammer <1.net> wrote:
    > JohnF wrote:
    >>
    >> /* =======================================================================
    >> * Function: smemf ( unsigned char *mem, char *format, ... )
    >> * Purpose: Construct a formatted block of memory, typically containing
    >> * binary data, e.g., network packets, gif images, etc.
    >> * Behaves much like (a subset of) sprintf, but the intent
    >> * to accommodate binary data requires a few significant
    >> * exceptions, as explained in the Notes section below.
    >> <<snip>>

    >
    > There is already a library which does very similar stuff.
    > It is called libtpl.


    Thanks for the pointer, Johann.
    http://tpl.sourceforge.net/userguide.html
    Also looks interesting, though less similar than pack/unpack
    for python and perl, pointed out previously. Nevertheless,
    certainly valuable additional food for thought. I'll certainly
    study them all, and then try to spec out (what seems to me like)
    the best C variant.
    --
    John Forkosh ( mailto: where j=john and f=forkosh )
    JohnF, Sep 23, 2012
    #19
  20. JohnF

    JohnF Guest

    Ian Collins <> wrote:
    > JohnF wrote:
    >
    >> Finally, regarding Ian's remarks, pack/unpack would be
    >> a lightweight alternative to accomplish this kind of
    >> task. Ian's way involves importing additional tool dependencies,
    >> possibly turning what could be just a few lines of code
    >> into a subtask unto itself.

    >
    > The up front effort is much the same. <<snip>>


    Yeah, well, no biggie. There's no reason not to have
    two or several different ways to approach the same problem,
    and let the programmer decide what he prefers in a
    particular situation. This isn't "either/or", it's "both".
    --
    John Forkosh ( mailto: where j=john and f=forkosh )
    JohnF, Sep 23, 2012
    #20