Re: Checksum in a struct

Discussion in 'C Programming' started by Eric Sosman, Jul 11, 2012.

  1. Eric Sosman

    Eric Sosman Guest

    On 7/11/2012 10:56 AM, pozz wrote:
    > I have a function that computes a 16-bit checksum (following whatever
    > algorithm) of a memory space:
    >
    > unsigned int checksum(const void *buffer, size_t size);
    >
    > I want to embed this checksum in a struct:
    >
    > struct PStruct {
    > int x;
    > unsigned int y;
    > char z[13];
    > ...
    > unsigned int checksum;
    > };
    >
    > How to use the checksum() function above? I propose:
    >
    > struct PStruct ps;
    > ...
    > ps.checksum = checksum(&ps, offsetof(struct PStruct, checksum));
    >
    > Is there a better mechanism?


    You'd better hope so :)

    A problem with the approach you've outlined is that the
    checksum computation will include the values of any padding
    bytes -- the size of `z' in your example almost begs for some
    padding bytes to be inserted. Since padding bytes are not
    necessarily preserved when assigning structs or even when
    assigning to struct elements, a checksum that includes padding
    bytes is unlikely to be very useful. Similar concerns apply to
    bit-field elements: The values of un-named bits are not necessarily
    preserved. For that matter, if `z' holds a string (as opposed to a
    generic batch of chars), the bytes after '\0' should probably be
    omitted from a checksum since they're not part of the "value."
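
    To see how much padding a particular implementation actually inserts,
    one can compare member offsets against member sizes. The following is
    only a sketch: it assumes a shortened version of the struct from the
    question (the elided "..." members are left out), and the helper name
    padding_after_z is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Shortened stand-in for the struct in the question; the "..." members
   are omitted because the original post does not show them. */
struct PStruct {
    int x;
    unsigned int y;
    char z[13];
    unsigned int checksum;
};

/* Hypothetical helper: number of padding bytes between the end of z and
   the start of checksum. The result is implementation-specific. */
static size_t padding_after_z(void)
{
    size_t end_of_z = offsetof(struct PStruct, z)
                    + sizeof ((struct PStruct *)0)->z;

    /* A later member can never start before an earlier one ends. */
    assert(offsetof(struct PStruct, checksum) >= end_of_z);
    return offsetof(struct PStruct, checksum) - end_of_z;
}
```

    Whatever this returns on a given compiler, those bytes would be swept
    into checksum(&ps, offsetof(struct PStruct, checksum)).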

    One possibility would be to checksum the fields individually,
    perhaps with a variadic function:

    ps.checksum = checksum(&ps.x, sizeof ps.x,
                           &ps.y, sizeof ps.y,
                           ps.z, strlen(ps.z) + 1,
                           ...,
                           (void*)NULL);

    It seems to me this would be cumbersome, and also prone to error:
    somebody could omit a field by accident, or (for checksums that
    are non-commutative) get them in the wrong order. Also, it can't
    handle bit-fields since you can't point at them.

    A preferable approach would be to write a checksum function
    specifically for struct PStruct objects, even if that function
    winds up making the cumbersome call(s) to the true underlying
    checksummer:

    unsigned int PSChecksum(const struct PStruct *);
    ps.checksum = PSChecksum(&ps);

    Such a function could even handle bit-fields by copying their
    values to addressable local variables before applying the low-
    level computation.
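
    A struct-specific function along these lines might look like the
    sketch below. It is not the original poster's checksum algorithm:
    sum_bytes is a hypothetical incremental helper (a plain 16-bit byte
    sum) standing in for the real low-level computation, the struct is
    shortened to the members shown in the question, and z is assumed to
    hold a '\0'-terminated string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct PStruct {
    int x;
    unsigned int y;
    char z[13];            /* assumed to hold a '\0'-terminated string */
    unsigned int checksum;
};

/* Hypothetical incremental checksummer: a plain 16-bit byte sum, standing
   in for whatever algorithm the real checksum() uses. */
static unsigned int sum_bytes(unsigned int acc, const void *buf, size_t size)
{
    const unsigned char *p = buf;

    assert(buf != NULL);
    while (size-- > 0)
        acc = (acc + *p++) & 0xFFFFu;
    return acc;
}

/* Checksum the members one by one, so that padding bytes and the bytes
   after the string terminator never enter the computation. */
static unsigned int PSChecksum(const struct PStruct *ps)
{
    unsigned int acc = 0;

    acc = sum_bytes(acc, &ps->x, sizeof ps->x);
    acc = sum_bytes(acc, &ps->y, sizeof ps->y);
    acc = sum_bytes(acc, ps->z, strlen(ps->z) + 1);
    return acc;
}
```

    Two objects with equal member values then get equal checksums no
    matter what their padding bytes happen to contain.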

    --
    Eric Sosman
    Eric Sosman, Jul 11, 2012
    #1

  2. Eric Sosman

    Eric Sosman Guest

    On 7/12/2012 10:42 AM, pozz wrote:
    > Il 11/07/2012 17:59, Eric Sosman ha scritto:
    >> A problem with the approach you've outlined is that the
    >> checksum computation will include the values of any padding
    >> bytes -- the size of `z' in your example almost begs for some
    >> padding bytes to be inserted. Since padding bytes are not
    >> necessarily preserved when assigning structs or even when
    >> assigning to struct elements, a checksum that includes padding
    >> bytes is unlikely to be very useful. Similar concerns apply to
    >> bit-field elements: The values of un-named bits are not necessarily
    >> preserved. For that matter, if `z' holds a string (as opposed to a
    >> generic batch of chars), the bytes after '\0' should probably be
    >> omitted from a checksum since they're not part of the "value."

    >
    > Yes, they are considerations I also made. In my application (running on
    > a single processor), I have to read/write the struct from/to a file and
    > use it in memory. I'm not interested in a standard format file (it's a
    > custom configuration for the application) and I'll never need to
    > read/write the struct on a different processor.
    >
    > I know other better standard file formats for configuration settings are
    > available (INI, XML, ...), but I'm working on an embedded simple
    > processor and I don't want to increase the complexity of the software
    > just for the configuration.


    The fact that you intend to use the struct only locally and
    only on one processor doesn't change anything: Padding bytes will
    still contain random and potentially non-constant garbage, bytes
    after the '\0' terminating a string are probably garbage, and so
    on. It's unlikely, but the mere act of storing the checksum into
    the struct could in principle change the padding bytes -- if it
    does, the checksum is self-invalidating!

    If you want to write a struct and a checksum to a file and
    verify the checksum when you read it back, keep the checksum as
    a separate variable and don't put it inside the struct.
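
    As a rough sketch of that arrangement, the struct's bytes can be
    written first and the checksum after them as a free-standing value.
    Everything named here is an assumption for illustration: struct
    Config is a stand-in for the poster's struct, config_save and
    config_load are hypothetical names, and the body of checksum() is a
    simple 16-bit byte sum rather than the poster's real algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct Config {            /* hypothetical stand-in, no checksum member */
    int x;
    unsigned int y;
    char z[13];
};

/* Stand-in for the poster's checksum(): a plain 16-bit byte sum. */
static unsigned int checksum(const void *buffer, size_t size)
{
    const unsigned char *p = buffer;
    unsigned int sum = 0;

    assert(buffer != NULL);
    while (size-- > 0)
        sum = (sum + *p++) & 0xFFFFu;
    return sum;
}

/* Write the struct's bytes, then the checksum of those same bytes.
   Returns 0 on success, -1 on a write error. */
static int config_save(FILE *fp, const struct Config *cfg)
{
    unsigned int sum = checksum(cfg, sizeof *cfg);

    if (fwrite(cfg, sizeof *cfg, 1, fp) != 1) return -1;
    if (fwrite(&sum, sizeof sum, 1, fp) != 1) return -1;
    return 0;
}

/* Read the bytes straight into *cfg, read the stored checksum into a
   separate variable, recompute and compare. Returns 0 if they match,
   -1 on a read error or mismatch. */
static int config_load(FILE *fp, struct Config *cfg)
{
    unsigned int stored;

    if (fread(cfg, sizeof *cfg, 1, fp) != 1) return -1;
    if (fread(&stored, sizeof stored, 1, fp) != 1) return -1;
    return stored == checksum(cfg, sizeof *cfg) ? 0 : -1;
}
```

    No member of the struct is assigned between computing the checksum
    and writing the bytes, or between reading the bytes and verifying
    them, so the padding has no opportunity to change in between.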

    --
    Eric Sosman
    Eric Sosman, Jul 12, 2012
    #2

  3. Jorgen Grahn

    Jorgen Grahn Guest

    On Thu, 2012-07-12, pozz wrote:
    > Il 11/07/2012 17:59, Eric Sosman ha scritto:
    >> A problem with the approach you've outlined is that the
    >> checksum computation will include the values of any padding
    >> bytes -- the size of `z' in your example almost begs for some
    >> padding bytes to be inserted. Since padding bytes are not
    >> necessarily preserved when assigning structs or even when

    ....
    > Yes, they are considerations I also made. In my application (running on
    > a single processor), I have to read/write the struct from/to a file and
    > use it in memory. I'm not interested in a standard format file (it's a
    > custom configuration for the application) and I'll never need to
    > read/write the struct on a different processor.
    >
    > I know other better standard file formats for configuration settings are
    > available (INI, XML, ...), but I'm working on an embedded simple
    > processor


    How simple? Your target has a file system, at least.

    > and I don't want to increase the complexity of the software
    > just for the configuration.


    Just bear in mind that your solution increases complexity in other
    ways. For example, debugging is harder when the files are in a binary
    ad-hoc format. You may need to document the format. Extending the
    amount of configuration data can be tricky, if you need to upgrade a
    system without losing the current config. And so on.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Jul 13, 2012
    #3
  4. Les Cargill

    Les Cargill Guest

    pozz wrote:
    > Il 11/07/2012 17:59, Eric Sosman ha scritto:
    >> A problem with the approach you've outlined is that the
    >> checksum computation will include the values of any padding
    >> bytes -- the size of `z' in your example almost begs for some
    >> padding bytes to be inserted. Since padding bytes are not
    >> necessarily preserved when assigning structs or even when
    >> assigning to struct elements, a checksum that includes padding
    >> bytes is unlikely to be very useful. Similar concerns apply to
    >> bit-field elements: The values of un-named bits are not necessarily
    >> preserved. For that matter, if `z' holds a string (as opposed to a
    >> generic batch of chars), the bytes after '\0' should probably be
    >> omitted from a checksum since they're not part of the "value."

    >
    > Yes, they are considerations I also made. In my application (running on
    > a single processor), I have to read/write the struct from/to a file and
    > use it in memory. I'm not interested in a standard format file (it's a
    > custom configuration for the application) and I'll never need to
    > read/write the struct on a different processor.
    >
    > I know other better standard file formats for configuration settings are
    > available (INI, XML, ...), but I'm working on an embedded simple
    > processor and I don't want to increase the complexity of the software
    > just for the configuration.
    >


    So do this: ( WARNING! - THIS CODE PROBABLY DOES NOT COMPILE! )

    typedef enum {
    v_int32,
    v_short, // or int16
    v_float,
    v_double,
    v_int8, // used for scalar chars
    v_string
    } TYPTYP;

    typedef struct {
    char name[128];
    TYPTYP type;
    void *ptr;
    } cfgElem;

    extern int32 thing1;
    extern double thing2;
    ....

    cfgElem configTable[] = {
    { "Thing1", v_int32 , &thing1 },
    { "Thing2", v_double , &thing2 },
    ....
    };

    then write a simple parser for .ini files ( if it's more than
    80 lines you did it wrong ) that exploits this table.
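
    For what it's worth, a table-driven line handler can be quite short.
    The sketch below is only one possible reading of the idea, not the
    poster's code: it trims the enum to three types, adds a size field
    for string targets, uses int32_t from <stdint.h> in place of int32,
    and the names thing3 and cfgApplyLine are invented for illustration.
    It handles plain "name = value" lines only; anything else (comments,
    [section] headers) is reported as unrecognized.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { v_int32, v_double, v_string } TYPTYP;

typedef struct {
    const char *name;
    TYPTYP type;
    void *ptr;
    size_t size;           /* capacity for v_string targets, else unused */
} cfgElem;

static int32_t thing1;
static double thing2;
static char thing3[16];    /* hypothetical string setting */

static cfgElem configTable[] = {
    { "Thing1", v_int32,  &thing1, 0 },
    { "Thing2", v_double, &thing2, 0 },
    { "Thing3", v_string, thing3,  sizeof thing3 },
};

/* Apply one "name = value" line to the table.
   Returns 0 on success, -1 if the line is malformed or the name unknown. */
static int cfgApplyLine(const char *line)
{
    char name[64], value[64];
    size_t i;

    assert(line != NULL);
    if (sscanf(line, " %63[^= \t] = %63[^\r\n]", name, value) != 2)
        return -1;

    for (i = 0; i < sizeof configTable / sizeof configTable[0]; i++) {
        cfgElem *e = &configTable[i];

        if (strcmp(e->name, name) != 0)
            continue;
        switch (e->type) {
        case v_int32:
            *(int32_t *)e->ptr = (int32_t)strtol(value, NULL, 10);
            break;
        case v_double:
            *(double *)e->ptr = strtod(value, NULL);
            break;
        case v_string:
            strncpy(e->ptr, value, e->size - 1);
            ((char *)e->ptr)[e->size - 1] = '\0';
            break;
        }
        return 0;
    }
    return -1;
}
```

    A caller would feed it each line obtained from fgets() and decide
    for itself whether an unrecognized line is an error or can be
    skipped.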

    You will not regret it unless you are somehow trying
    to make somebody's life more difficult.

    At least write a 'C' ( or perl, python, Tcl - although
    'C' has advantages here ) program to manipulate these files
    for you on a PC. Trust me on this - you need it, and
    it will save you time in the long run. If need be, do it
    at home on a Saturday - you'll get a few Saturdays back in the
    end....

    you can even ... *gasp* ... write a logger that polls
    the configuration store and tells you every time something
    changes.

    You may not be interested in configuration management,
    but configuration management is interested in you.

    --
    Les Cargill
    Les Cargill, Jul 13, 2012
    #4
  5. Eric Sosman

    Eric Sosman Guest

    On 7/15/2012 3:56 AM, pozz wrote:
    > Il 12/07/2012 17:08, Eric Sosman ha scritto:
    >> On 7/12/2012 10:42 AM, pozz wrote:
    >>> Il 11/07/2012 17:59, Eric Sosman ha scritto:
    >>>> A problem with the approach you've outlined is that the
    >>>> checksum computation will include the values of any padding
    >>>> bytes -- the size of `z' in your example almost begs for some
    >>>> padding bytes to be inserted. Since padding bytes are not
    >>>> necessarily preserved when assigning structs or even when
    >>>> assigning to struct elements, a checksum that includes padding
    >>>> bytes is unlikely to be very useful. Similar concerns apply to
    >>>> bit-field elements: The values of un-named bits are not necessarily
    >>>> preserved. For that matter, if `z' holds a string (as opposed to a
    >>>> generic batch of chars), the bytes after '\0' should probably be
    >>>> omitted from a checksum since they're not part of the "value."
    >>>
    >>> Yes, they are considerations I also made. In my application (running on
    >>> a single processor), I have to read/write the struct from/to a file and
    >>> use it in memory. I'm not interested in a standard format file (it's a
    >>> custom configuration for the application) and I'll never need to
    >>> read/write the struct on a different processor.
    >>>
    >>> I know other better standard file formats for configuration settings are
    >>> available (INI, XML, ...), but I'm working on an embedded simple
    >>> processor and I don't want to increase the complexity of the software
    >>> just for the configuration.

    >>
    >> The fact that you intend to use the struct only locally and
    >> only on one processor doesn't change anything: Padding bytes will
    >> still contain random and potentially non-constant garbage, bytes
    >> after the '\0' terminating a string are probably garbage, and so
    >> on. It's unlikely, but the mere act of storing the checksum into
    >> the struct could in principle change the padding bytes -- if it
    >> does, the checksum is self-invalidating!

    >
    > So a possible solution is to store the checksum outside the struct as
    > a different variable.


    Yes, as I suggested in the very next paragraph:

    >> If you want to write a struct and a checksum to a file and
    >> verify the checksum when you read it back, keep the checksum as
    >> a separate variable and don't put it inside the struct.

    >
    > Could I ignore the "randomness" of the padding bytes? I read that
    > the padding bytes can be randomly changed even assigning a value to a
    > field of the struct.


    Now, *where* could you have read such a thing? ;-)

    > My application should work in this way:
    >
    > - at startup, read the configuration file, calculate and verify the
    > checksum: if it isn't correct, use a default struct;


    Right: You'd read the struct's bytes directly into an instance
    of itself using fread(), say, rather than making field assignments.
    Then you'd read the stored checksum into an independent variable,
    re-calculate the struct's checksum, and compare. It's your choice
    what to do about a mismatch.

    > - when a field changes (after assigning it the new value), calculate
    > the new checksum and save both (struct and checksum) to the file;


    Right again: Calculate the new checksum, store it in a free-
    standing variable, and write the bytes of both to the file. Again,
    it's up to you to decide how frequently you want to do this: On
    every change, only at program shutdown, or something in between.

    > - during the normal execution of the application, the fields of the
    > struct are accessed many times.
    >
    > In this situation, could I calculate the checksum on the entire
    > memory area of the struct (with padding bytes)? I read the padding
    > bytes can be randomly changed when a value is assigned to a field, but
    > in this case a re-calculate the checksum. What happens if I access a
    > field? Also for read operations the padding bytes could be changed?


    Padding bytes are "vulnerable" when their fellow travellers are
    stored to (6.2.6.1p6). There's no similar language for read accesses,
    which I interpret as meaning reads won't change them. Note that this
    applies only to the padding in the instance that's being read; if you
    copy a padded struct from one instance to another

    struct padded s1 = ...;
    // Suppose the padding bytes in s1 have values p1,p2,...
    struct padded s2 = s1;
    // s1's padding is still p1,p2,... but s2's can differ.

    .... the padding in the original doesn't change, but the padding in
    the copy need not agree with it. So when you're moving data back
    and forth to files, be sure to do the checksum calculations on the
    exact same struct instance that you use for the I/O, not on a copy.

    --
    Eric Sosman
    Eric Sosman, Jul 15, 2012
    #5
  6. Stefan Ram

    Stefan Ram Guest

    pozz <> writes:
    >that correspond to any padding bytes take unspecified values." (6.2.6.1p6)
    >Here unspecified values means random data.


    »unspecified« does not imply that the data will pass
    tests for randomness.

    »unspecified behavior« is behavior »where each
    implementation documents how the choice is made«.
    Stefan Ram, Jul 15, 2012
    #6
  7. Eric Sosman

    Eric Sosman Guest

    On 7/15/2012 1:30 PM, Stefan Ram wrote:
    > pozz <> writes:
    >> that correspond to any padding bytes take unspecified values." (6.2.6.1p6)
    >> Here unspecified values means random data.

    >
    > »unspecified« does not imply that the data will pass
    > tests for randomness.
    >
    > »unspecified behavior« is behavior »where each
    > implementation documents how the choice is made«.


    No; that's "implementation-defined behavior" (3.4.1p1).
    "Unspecified behavior" (3.4.4p1) is

    use of an unspecified value, or other behavior where this
    International Standard provides two or more possibilities
    and imposes no further requirements on which is chosen in
    any instance

    "No further requirements" implies "No requirement to document."

    --
    Eric Sosman
    Eric Sosman, Jul 15, 2012
    #7
  8. Stefan Ram

    Stefan Ram Guest

    Eric Sosman <> writes:
    >On 7/15/2012 1:30 PM, Stefan Ram wrote:
    >>(...)

    >No; that's "implementation-defined behavior" (3.4.1p1).
    >"Unspecified behavior" (3.4.4p1) is


    You are right. I did not read carefully enough. I just saw:

    »unspecified behavior where each implementation
    documents how the choice is made«

    and thought it was from a table or so, so that one can
    read »is « in front of »where«, but I erred.
    Stefan Ram, Jul 15, 2012
    #8
  9. alex

    alex Guest

    On Wed, 11 Jul 2012 11:59:42 -0400, Eric Sosman wrote:
    > A problem with the approach you've outlined is that the
    > checksum computation will include the values of any padding bytes -- the
    > size of `z' in your example almost begs for some padding bytes to be
    > inserted. Since padding bytes are not necessarily preserved when
    > assigning structs or even when assigning to struct elements, a checksum
    > that includes padding bytes is unlikely to be very useful.


    Are you sure about this?? I would expect a struct assign/deepcopy to be
    implemented "under the hood" using memcpy(), not { s1.a=s2.a;
    s1.b=s2.b; } etc. Pretty sure that's what GCC does.
    alex, Jul 16, 2012
    #9
  10. Keith Thompson

    alex <> writes:
    > On Wed, 11 Jul 2012 11:59:42 -0400, Eric Sosman wrote:
    >> A problem with the approach you've outlined is that the
    >> checksum computation will include the values of any padding bytes -- the
    >> size of `z' in your example almost begs for some padding bytes to be
    >> inserted. Since padding bytes are not necessarily preserved when
    >> assigning structs or even when assigning to struct elements, a checksum
    >> that includes padding bytes is unlikely to be very useful.

    >
    > Are you sure about this?? I would expect a struct assign/deepcopy to be
    > implemented "under the hood" using memcpy(), not { s1.a=s2.a;
    > s1.b=s2.b; } etc. Pretty sure that's what GCC does.


    A compiler can do it either way. Using the equivalent of a memcpy()
    call is certainly a likely approach, but there's no guarantee that
    it's done that way.

    Code that assumes padding bytes are preserved is likely to work
    perfectly until the moment you demonstrate it to an important client.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Will write code for food.
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Jul 17, 2012
    #10
  11. Eric Sosman

    Eric Sosman Guest

    On 7/16/2012 5:01 PM, alex wrote:
    > On Wed, 11 Jul 2012 11:59:42 -0400, Eric Sosman wrote:
    >> A problem with the approach you've outlined is that the
    >> checksum computation will include the values of any padding bytes -- the
    >> size of `z' in your example almost begs for some padding bytes to be
    >> inserted. Since padding bytes are not necessarily preserved when
    >> assigning structs or even when assigning to struct elements, a checksum
    >> that includes padding bytes is unlikely to be very useful.

    >
    > Are you sure about this??


    Yes.

    > I would expect a struct assign/deepcopy to be
    > implemented "under the hood" using memcpy(), not { s1.a=s2.a;
    > s1.b=s2.b; } etc. Pretty sure that's what GCC does.


    My car is blue. Therefore, cars are blue.

    --
    Eric Sosman
    Eric Sosman, Jul 17, 2012
    #11
  12. Tim Rentsch

    Tim Rentsch Guest

    pozz <> writes:

    > Il 12/07/2012 17:08, Eric Sosman ha scritto:

    [snip]
    >> If you want to write a struct and a checksum to a file and
    >> verify the checksum when you read it back, keep the checksum as
    >> a separate variable and don't put it inside the struct.

    >
    > Could I ignore the "randomness" of the padding bytes? I read that
    > the padding bytes can be randomly changed even assigning a value to a
    > field of the struct. My application should work in this way:
    >
    > - at startup, read the configuration file, calculate and verify the
    > checksum: if it isn't correct, use a default struct;
    >
    > - when a field changes (after assigning it the new value), calculate
    > the new checksum and save both (struct and checksum) to the file;
    >
    > - during the normal execution of the application, the fields of the
    > struct are accessed many times.
    >
    > In this situation, could I calculate the checksum on the entire
    > memory area of the struct (with padding bytes)? [snip]


    Yes, provided (1) if the whole struct is updated, either you
    calculate a new checksum from scratch or you make sure the
    padding bytes are copied also (eg, by using memcpy()) and
    use the checksum of the source struct, and (2) the checksum
    is stored in a way so as not to perturb the struct's padding
    bytes (this can be done by putting the checksum outside the
    struct in question, or by updating an in-struct checksum
    using, eg, memcpy()).
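
    Both conditions can be sketched in a few lines. The code below is an
    illustration under assumptions, not a definitive implementation: the
    struct is shortened to the members shown in the original question,
    checksum() is given a stand-in body (a 16-bit byte sum), and the
    helper names ps_seal, ps_verify and ps_copy are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct PStruct {
    int x;
    unsigned int y;
    char z[13];
    unsigned int checksum;
};

/* Stand-in for the poster's checksum(): a plain 16-bit byte sum. */
static unsigned int checksum(const void *buffer, size_t size)
{
    const unsigned char *p = buffer;
    unsigned int sum = 0;

    assert(buffer != NULL);
    while (size-- > 0)
        sum = (sum + *p++) & 0xFFFFu;
    return sum;
}

/* Checksum every byte before the checksum member, padding included, and
   store the result with memcpy() rather than by assigning to the member. */
static void ps_seal(struct PStruct *ps)
{
    unsigned int sum = checksum(ps, offsetof(struct PStruct, checksum));

    memcpy((unsigned char *)ps + offsetof(struct PStruct, checksum),
           &sum, sizeof sum);
}

/* Reading the member does not disturb the padding. */
static int ps_verify(const struct PStruct *ps)
{
    return ps->checksum == checksum(ps, offsetof(struct PStruct, checksum));
}

/* Copy all bytes, padding included, so the stored checksum remains
   valid for the copy as well. */
static void ps_copy(struct PStruct *dst, const struct PStruct *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

    After any ordinary assignment to a member, ps_seal() would have to
    be called again before the stored checksum can be trusted.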

    This question was asked fairly clearly and obviously is the most
    important one to answer; I don't know why it wasn't responded
    to more directly.

    You are right that reading a struct or its members has no
    effect on its padding bytes.
    Tim Rentsch, Jul 21, 2012
    #12
  13. Tim Rentsch

    Tim Rentsch Guest

    Eric Sosman <> writes:

    > On 7/11/2012 10:56 AM, pozz wrote:
    >> I have a function that computes a 16-bit checksum (following whatever
    >> algorithm) of a memory space:
    >>
    >> unsigned int checksum(const void *buffer, size_t size);
    >>
    >> I want to embed this checksum in a struct:
    >>
    >> struct PStruct {
    >> int x;
    >> unsigned int y;
    >> char z[13];
    >> ...
    >> unsigned int checksum;
    >> };
    >>
    >> How to use the checksum() function above? I propose:
    >>
    >> struct PStruct ps;
    >> ...
    >> ps.checksum = checksum(&ps, offsetof(struct PStruct, checksum));
    >>
    >> Is there a better mechanism?

    >
    > You'd better hope so :)
    >
    > A problem with the approach you've outlined is that the
    > checksum computation will include the values of any padding
    > bytes -- the size of `z' in your example almost begs for some
    > padding bytes to be inserted. [snip elaboration]


    You're assuming he wants a checksum on the "logical
    value" of the struct. If what he wants is a checksum
    on the physical value of the struct -- which appears to
    be what he does want -- then this approach will work fine
    (provided of course care is taken so that storing the
    checksum will not perturb the padding bytes, which I
    have already addressed in an earlier reply).
    Tim Rentsch, Jul 21, 2012
    #13