Re: Structure with unsigned chars and internal alignment

Discussion in 'C Programming' started by Eric Sosman, Aug 31, 2011.

  1. Eric Sosman

    Eric Sosman Guest

    On 8/30/2011 6:05 PM, pozz wrote:
    > I have a struct composed of two arrays of unsigned char.
    >
    > struct myStruct {
    >     unsigned char field1[2];
    >     unsigned char field2[30];
    > };
    >
    > Is myStruct *always* 32 bytes long? Does field2 *always* start two
    > bytes past the start of myStruct (i.e., is no padding allowed between
    > field1 and field2)?


    No and no. Many compilers will lay out the struct as you hope,
    but none are under any obligation to do so. If you want thirty-two
    consecutive bytes, consider `unsigned char field_both[32]'.
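A minimal sketch of the single-array alternative, assuming the 2 + 30 split from the question. The names FIELD1_OFFSET, FIELD2_OFFSET, field1(), field2() and field2_size() are hypothetical, not part of any library. Because array elements are contiguous, field2 always begins exactly two bytes into the array, and its size follows from the data layout you define rather than from sizeof or offsetof applied to a struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names describing the data layout: 2 bytes of field1,
   then up to 30 bytes of field2. */
#define FIELD1_OFFSET 0
#define FIELD1_SIZE   2
#define FIELD2_OFFSET (FIELD1_OFFSET + FIELD1_SIZE)
#define BLOCK_MAXSIZE 32

/* Array elements are contiguous: no padding can appear between
   field_both[1] and field_both[2]. */
static unsigned char field_both[BLOCK_MAXSIZE];

/* Hypothetical accessors that give the two regions of the array names. */
static unsigned char *field1(unsigned char *block)
{
    return block + FIELD1_OFFSET;
}

static unsigned char *field2(unsigned char *block)
{
    return block + FIELD2_OFFSET;
}

/* Number of field2 bytes in a block of block_size bytes (3..32). */
static size_t field2_size(size_t block_size)
{
    assert(block_size > FIELD1_SIZE && block_size <= BLOCK_MAXSIZE);
    return block_size - FIELD1_SIZE;
}
```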

    > In this case, in my application I'd like to read the size of myStruct
    > (between 3 and 32) from a file. field1 will always be 2 bytes long, and
    > field2 will be the size of myStruct minus the 2 bytes of field1 (provided
    > no padding is present between field1 and field2).


    You probably do *not* want to read "the size of myStruct" from
    the file; you want to read "the size of some blob of data." The two
    environments (on-disk form and in-memory form) are not necessarily
    identical, even if they're strongly related by intention.

    I said "probably," because perhaps your file actually does hold
    "the size of myStruct." This could be the case if an actual `struct
    myStruct' was written to the file originally, complete with whatever
    padding it might have included. If you never, never need to move the
    data to another system (not even for post-mortem analysis), you can
    probably get away with this.

    > I could allocate the field2 array dynamically, but I don't have malloc/free
    > on my embedded platform. So I decided to statically allocate the biggest
    > size (32 bytes: 2 bytes for field1 and 30 bytes for field2).


    It's not clear that the presence or absence of malloc() has
    anything to do with the presence or absence of padding.

    > If the real size of myStruct, read from the configuration file, is
    > struct_size, how can I deduce the size of field2? Currently I use the
    > following formula:
    >
    > field2_size = struct_size - 2


    I guess `struct_size' is something you compute from the two-byte
    array? Well, it matters not: The validity of the formula depends not
    on C but on the program that wrote the file in the first place. What
    formula did *that* program use?

    (A literal reading of your question leads to the answer "The size
    of field2 is thirty, always." This may sound nit-picky, but I have a
    hunch that if you think about it hard enough you'll arrive at the
    question you *should* be asking instead.)

    > but I don't like it. It would be wrong if I decided to change the size
    > of the field1 member, and it would be wrong if padding were present between
    > the two fields. Maybe the following is better?
    >
    > field2_size = struct_size - offsetof(struct myStruct, field2)


    Same problem: It could be right or wrong (or "not even wrong"),
    because what you need to care about is who wrote the file and how.

    --
    Eric Sosman
     
    Eric Sosman, Aug 31, 2011
    #1

  2. Eric Sosman

    Eric Sosman Guest

    On 8/31/2011 3:07 AM, pozz wrote:
    > On 31/08/2011 03:39, Eric Sosman wrote:
    >>> In this case, in my application I'd like to read the size of myStruct
    >>> (between 3 and 32) from a file. field1 will be always 2-bytes long,
    >>> field2 will be the size of myStruct minus 2 bytes of field1 (in the case
    >>> no padding is present between field1 and field2).

    >>
    >> You probably do *not* want to read "the size of myStruct" from
    >> the file; you want to read "the size of some blob of data." The two
    >> environments (on-disk form and in-memory form) are not necessarily
    >> identical, even if they're strongly related by intention.

    >
    > This is my case.
    >
    >
    >> I said "probably," because perhaps your file actually does hold
    >> "the size of myStruct." This could be the case if an actual `struct
    >> myStruct' was written to the file originally, complete with whatever
    >> padding it might have included. If you never, never need to move the
    >> data to another system (not even for post-mortem analysis), you can
    >> probably get away with this.

    >
    > I understand your point and I expaling what I'm trying to do.


    I'm not so sure you understand my points. For myself, I'm
    *sure* I don't understand "expaling."

    > I have to read a file, created by another application on another
    > platform, that is composed of blocks of data (what you called a "blob of
    > data"). The size of these blocks (between 3 and 32) is written at the
    > beginning of the same file.
    > A single block is composed of 2 bytes followed by (block_size - 2) bytes.


    You haven't said so, but I guess that the initial two bytes
    somehow encode `block_size'. Whether the remaining bytes are all
    "payload" or may themselves include padding may be known to you, but
    remains a mystery to the rest of us.

    > Because I don't know the size of the blocks I'll read, and I can't malloc
    > the right size at run time, I was trying to define the maximum size of a
    > block, splitting it in the two fields:


    Again the hangup over the absence of malloc(). If this has
    anything at all to do with the problem, it has to do with some aspect
    of the problem that you have not yet revealed. Based on what you've
    said and shown, the existence or non-existence of malloc() has zilch
    to do with the matter.

    > struct myStruct {
    > unsigned char field1[2];
    > unsigned char field2[30];
    > };


    ... but here comes the "splitting it in the two fields" part,
    which you're going about (as several people have told you) in an
    unreliable way.

    > I thought I could read the block and copy it directly into myStruct.
    > However, if padding could be present in myStruct, I can't use this approach.
    >
    > Maybe the best approach is:
    >
    > #define FIELD1_OFFSET 0
    > #define FIELD1_SIZE 2
    > #define FIELD2_OFFSET FIELD1_SIZE
    > #define BLOCK_MAXSIZE 32
    >
    > void read_block(struct myStruct *s, size_t block_size) {
    >     unsigned char block[BLOCK_MAXSIZE];
    >     <read BLOCK_MAXSIZE bytes and copy it in block array>
    >     memcpy(s->field1, &block[FIELD1_OFFSET], FIELD1_SIZE);
    >     memcpy(s->field2, &block[FIELD2_OFFSET], block_size - FIELD1_SIZE);
    > }
    >
    > Here block_size is passed as an argument, because I don't know it in
    > advance.


    That's odd. Where do you learn `block_size', *before* reading
    the first two bytes of your blob? And if `block_size' turns out to
    be less than thirty-two, how does your "read BLOCK_MAXSIZE bytes"
    avoid running off the end and into whatever follows the blob?

    Observation: It is premature to seek the "best" way to do
    something when you have not yet come up with "any" way. That's
    premature optimization personified.
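One possible repair of the read_block() sketch quoted above, assuming the bytes arrive through a stdio FILE * (the original elides how they are read). It reads exactly block_size bytes, so it cannot run into the following blob, and it rejects sizes outside the 3..32 range mentioned earlier. The FILE * parameter and the 0/-1 return convention are assumptions added for illustration. Copying member by member with memcpy() means any padding inside struct myStruct does no harm.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FIELD1_SIZE   2
#define BLOCK_MAXSIZE 32

struct myStruct {
    unsigned char field1[FIELD1_SIZE];
    unsigned char field2[BLOCK_MAXSIZE - FIELD1_SIZE];
};

/* Reads one block of exactly block_size bytes from fp and copies it into
   *s member by member.  Returns 0 on success, -1 if block_size is out of
   range or the file ends early. */
static int read_block(FILE *fp, struct myStruct *s, size_t block_size)
{
    unsigned char block[BLOCK_MAXSIZE];

    assert(fp != NULL && s != NULL);
    if (block_size <= FIELD1_SIZE || block_size > BLOCK_MAXSIZE)
        return -1;
    /* Read block_size bytes, not BLOCK_MAXSIZE, so the next blob
       stays in the stream. */
    if (fread(block, 1, block_size, fp) != block_size)
        return -1;
    memcpy(s->field1, block, FIELD1_SIZE);
    memcpy(s->field2, block + FIELD1_SIZE, block_size - FIELD1_SIZE);
    return 0;
}
```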

    --
    Eric Sosman
     
    Eric Sosman, Sep 1, 2011
    #2

  3. BartC

    BartC Guest

    "pozz" <> wrote in message
    news:j3n70j$qrf$...

    > Just to better explain the content of the file:
    >
    > - 2 bytes that encode the size N of all the subsequent blocks
    > - N bytes for block 1
    > - N bytes for block 2
    > - ...up to the end of the file
    >
    > A single N-byte block is composed of two fields:
    > - 2 bytes for field1
    > - N-2 bytes for field2
    >
    > field1 and field2 are application data that aren't important for our
    > discussion right now.


    So why is it necessary to read each block of N bytes in one go?

    Why not read 2 bytes into field1, then N-2 bytes into field2? Then problems
    of padding and alignment will disappear.
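A sketch of reading one block straight into the two members, assuming a stdio FILE * and the 2 + (N-2) layout pozz described. read_block_split() is a hypothetical name. Each fread() fills one member on its own, so whatever padding the compiler places between field1 and field2 is never read into or relied on.

```c
#include <assert.h>
#include <stdio.h>

#define FIELD1_SIZE   2
#define BLOCK_MAXSIZE 32

struct myStruct {
    unsigned char field1[FIELD1_SIZE];
    unsigned char field2[BLOCK_MAXSIZE - FIELD1_SIZE];
};

/* Hypothetical helper: reads an n-byte block directly into the members,
   2 bytes into field1 and n - 2 bytes into field2.  Returns 0 on
   success, -1 on a bad size or a short read. */
static int read_block_split(FILE *fp, struct myStruct *s, size_t n)
{
    assert(fp != NULL && s != NULL);
    if (n <= FIELD1_SIZE || n > BLOCK_MAXSIZE)
        return -1;
    if (fread(s->field1, 1, FIELD1_SIZE, fp) != FIELD1_SIZE)
        return -1;
    if (fread(s->field2, 1, n - FIELD1_SIZE, fp) != n - FIELD1_SIZE)
        return -1;
    return 0;
}
```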

    (And field1 does sound very much like a 16-bit numeric value; if it
    is, it might as well be declared as one, making it easier to work with.)

    --
    Bartc
     
    BartC, Sep 1, 2011
    #3
  4. "BartC" <> writes:
    > "pozz" <> wrote in message
    > news:j3n70j$qrf$...
    >
    >> Just to better explain the content of the file:
    >>
    >> - 2 bytes that encode the size N of all the subsequent blocks
    >> - N bytes for block 1
    >> - N bytes for block 2
    >> - ...up to the end of the file
    >>
    >> A single N-byte block is composed of two fields:
    >> - 2 bytes for field1
    >> - N-2 bytes for field2
    >>
    >> field1 and field2 are application data that aren't important for our
    >> discussion right now.

    >
    > So why is it necessary to read each block of N bytes in one go?
    >
    > Why not read 2 bytes into field1, then N-2 bytes into field2? Then problems
    > of padding and alignment will disappear.


    Those problems also disappear if you just read each block into an N-byte
    array.

    > (And field1 does sound very much like a 16-bit numeric value; if it
    > is, it might as well be declared as one, making it easier to work with.)


    It's stored in big-endian format, so you can't just read it directly
    into a 16-bit integer object (unless you use htons() and/or ntohs()).

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
     
    Keith Thompson, Sep 1, 2011
    #4
  5. pozz <> writes:
    [...]
    > I think I understand your point of view. Data in the file can be of two
    > types:
    > - blobs of data exactly N bytes in size
    > - a myStruct previously written to the file by the same software
    > on the same platform (so with the same layout of padding and data)
    > I'm in the first case. I know the file contains a sequence of blobs of
    > the same size N. N is constant for a file, but may vary from file to
    > file. So the software should be ready to read blobs of 10 or 15 or 30
    > or 32 bytes.
    > How can the software know the size of the blobs in the file? There are
    > two bytes at the beginning of the file (just once, and *not* for each
    > blob), encoded as a 16-bit unsigned big-endian integer.


    [...]

    > Just to better explain the content of the file:
    >
    > - 2 bytes that encode the size N of all the subsequent blocks
    > - N bytes for block 1
    > - N bytes for block 2
    > - ...up to the end of the file
    >
    > A single N-byte block is composed of two fields:
    > - 2 bytes for field1
    > - N-2 bytes for field2
    >
    > field1 and field2 are application data that aren't important for our
    > discussion right now.


    Given that the maximum size is only 32 bytes, I probably wouldn't
    use malloc() even if it were available; the overhead is likely to
    exceed any savings from allocating, say, 24 bytes rather than 32.

    Here's how I'd approach it:

    Read 2 bytes from the file.
    Compute N = (byte0 << 8) + byte1
    loop
        Read N bytes into a 32-byte buffer (unsigned char buf[32];)
        Bytes 0..1 contain field1
        Bytes 2..N-1 contain field2

    (I've omitted any error checking.)
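A fuller C version of the outline above, as a sketch under a few assumptions: the file is read through a stdio FILE *, N must lie in the 3..32 range pozz gave, and each block is handed to a caller-supplied callback. read_blocks(), block_fn and the example callback sum_field1() are hypothetical names. Unlike the outline, it adds minimal error checking: a bad header, a truncated final block, or a read error all return -1.

```c
#include <assert.h>
#include <stdio.h>

#define BLOCK_MAXSIZE 32

/* Hypothetical callback type: called once per block.  buf[0..1] holds
   field1 and buf[2..n-1] holds field2. */
typedef void (*block_fn)(const unsigned char *buf, size_t n, void *ctx);

/* Example callback (hypothetical): adds each block's field1, read as a
   big-endian 16-bit value, to the unsigned long that ctx points to. */
static void sum_field1(const unsigned char *buf, size_t n, void *ctx)
{
    (void)n;
    *(unsigned long *)ctx += ((unsigned long)buf[0] << 8) + buf[1];
}

/* Reads the 2-byte big-endian header N, then N-byte blocks until end of
   file.  Returns the number of blocks processed, or -1 on a bad header,
   a truncated final block, or a read error. */
static long read_blocks(FILE *fp, block_fn fn, void *ctx)
{
    unsigned char hdr[2];
    unsigned char buf[BLOCK_MAXSIZE];   /* fixed size: no malloc() needed */
    size_t n, got;
    long count = 0;

    assert(fp != NULL && fn != NULL);
    if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
        return -1;
    n = ((size_t)hdr[0] << 8) + hdr[1];
    if (n < 3 || n > BLOCK_MAXSIZE)
        return -1;
    while ((got = fread(buf, 1, n, fp)) == n) {
        fn(buf, n, ctx);
        count++;
    }
    /* got == 0 with no error flag means a clean end of file. */
    return (got == 0 && !ferror(fp)) ? count : -1;
}
```

The buffer lives on the stack and is sized for the largest legal block, which fits the no-malloc() constraint from the original question.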

    If you need the data in a friendlier format than a 32-byte buffer, you
    can copy it out of buf into whatever is more convenient.

    If you're *extremely* short on available memory, you can read
    2 bytes directly into field1 and N-2 bytes directly into field2
    (this saves you the 32-byte buffer). Assuming field1 represents
    a 16-bit integer, don't forget about byte ordering.

    Reading directly into a struct is a bad idea if you care about
    portability (including to future versions of the same compiler).
    If reading directly into a struct turns out to be the best approach
    anyway, I'd add some asserts to ensure that the sizes and offsets
    of field1 and field2 are what they need to be, so the program will
    fail to run rather than operate incorrectly if the layout changes.
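A sketch of such checks for the struct from the original question. _Static_assert was added in C11, so the run-time check_layout() function (a hypothetical name) is the fallback for older compilers. Either way, a compiler that lays the struct out differently is caught before any data is misread.

```c
#include <assert.h>
#include <stddef.h>

struct myStruct {
    unsigned char field1[2];
    unsigned char field2[30];
};

/* C11 compile-time checks: the build fails if the compiler lays the
   struct out differently from the 2 + 30 bytes the reading code assumes. */
_Static_assert(offsetof(struct myStruct, field2) == 2,
               "padding between field1 and field2");
_Static_assert(sizeof(struct myStruct) == 32,
               "struct myStruct is not exactly 32 bytes");

/* Pre-C11 fallback (hypothetical name): run-time checks, to be called
   once at start-up so a layout change stops the program early. */
static void check_layout(void)
{
    assert(offsetof(struct myStruct, field2) == 2);
    assert(sizeof(struct myStruct) == 32);
}
```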

     
    Keith Thompson, Sep 1, 2011
    #5
  6. NicStevens

    NicStevens Guest

    On Aug 30, 6:39 pm, Eric Sosman <> wrote:
    > On 8/30/2011 6:05 PM, pozz wrote:
    [...]
    > > field2_size = struct_size - offsetof(struct myStruct, field2)
    >
    >      Same problem: It could be right or wrong (or "not even wrong"),
    > because what you need to care about is who wrote the file and how.


    May I ask which compiler offsetof works on? I tried gcc and msc, and
    both complain about offsetof(struct foo, bar).
     
    NicStevens, Sep 12, 2011
    #6
  7. NicStevens wrote:
    >
    > may I ask what compiler offsetof works on? I tried gcc and msc and
    > both complain about offsetof(struct foo, bar)


    Every compiler which implements Standard C. It's a macro in <stddef.h>.
     
    J. J. Farrell, Sep 12, 2011
    #7
  8. James Kuyper

    James Kuyper Guest

    On 09/12/2011 03:09 PM, NicStevens wrote:
    > On Aug 30, 6:39 pm, Eric Sosman <> wrote:
    >> On 8/30/2011 6:05 PM, pozz wrote:
    >>
    >>> I have a struct composed by two arrays of unsigned char.

    >>
    >>> struct myStruct {
    >>> unsigned char field1[2];
    >>> unsigned char field2[30];
    >>> };

    ....
    >>> field2_size = struct_size - offsetof(struct myStruct, field2)

    >>
    >> Same problem: It could be right or wrong (or "not even wrong"),
    >> because what you need to care about is who wrote the file and how.

    ....
    > may I ask what compiler offsetof works on? I tried gcc and msc and
    > both complain about offsetof(struct foo, bar)


    The following code should compile without complaint on any hosted
    implementation of C which fully conforms to any version of the C
    standard. I can't test it on msc, but it does compile without any
    complaints with gcc, gcc -std=c89, and gcc -std=c99. I'd be surprised
    at anything that dares to call itself a C compiler which would complain
    about it:

        #include <stddef.h>
        struct foo { int bar; };
        int main(void)
        {
            return offsetof(struct foo, bar);
        }

    If you're running into problems with offsetof(struct foo, bar), it's
    probably not due to the offsetof() expression itself, but to its
    connection with the rest of your program. For instance, did you remember
    to #include <stddef.h>?
    If that's not the problem, can you provide a complete, short program
    that demonstrates the problem, along with the command line that you used
    to compile it, and the message that was generated complaining about it?
     
    James Kuyper, Sep 12, 2011
    #8
  9. NicStevens

    NicStevens Guest

    On Sep 12, 12:32 pm, James Kuyper <> wrote:
    > On 09/12/2011 03:09 PM, NicStevens wrote:
    > > On Aug 30, 6:39 pm, Eric Sosman <> wrote:
    > >> On 8/30/2011 6:05 PM, pozz wrote:

    >
    > >>> I have a struct composed by two arrays of unsigned char.

    >
    > >>> struct myStruct {
    > >>> unsigned char field1[2];
    > >>> unsigned char field2[30];
    > >>> };

    > ...
    > >>> field2_size = struct_size - offsetof(struct myStruct, field2)

    >
    > >>      Same problem: It could be right or wrong (or "not even wrong"),
    > >> because what you need to care about is who wrote the file and how.

    > ...
    > > may I ask what compiler offsetof works on? I tried gcc and msc and
    > > both complain about offsetof(struct foo, bar)

    >
    > The following code should compile without complaint on any hosted
    > implementation of C which fully conforms to any version of the C
    > standard. I can't test it on msc, but it does compile without any
    > complaints  with gcc, gcc -std=c89, and gcc -std=c99. I'd be surprised
    > at anything that dares to call itself a C compiler which would complain
    > about it:
    >
    >         #include <stddef.h>
    >         struct foo { int bar; };
    >         int main(void)
    >         {
    >             return offsetof(struct foo, bar);
    >         }
    >
    > If you're running into problems with offsetof(struct foo, bar), it's
    > probably not due to the offsetof() expression itself, but to its
    > connection with the rest of your program. For instance, did you remember
    > to #include <stddef.h>?
    > If that's not the problem, can you provide a complete, short program
    > that demonstrates the problem, along with the command line that you used
    > to compile it, and the message that was generated complaining about it?


    Works on MSC
     
    NicStevens, Sep 14, 2011
    #9