Re: Copy a struct field by file

Discussion in 'C Programming' started by Eric Sosman, Sep 3, 2011.

  1. Eric Sosman

    Eric Sosman Guest

    On 9/3/2011 9:04 AM, pozz wrote:
    > Suppose I have a structure:
    >
    > typedef struct {
    > int version;
    > DUMMY dummy;
    > FOO foo;
    > BAR bars[128];
    > } CONFIG;
    >
    > stored in a "config.dat" file with fwrite(). At startup, the application
    > opens the file and reads the configuration. I think it is a normal
    > approach to store the configuration of an application in a non-volatile
    > way.
    > Of course, there are many file types for storing application
    > configuration (INI, XML, CSV, database...), but in my case a pure binary
    > file is sufficient and simple to use.
    >
    > Now suppose I have a new version of the software and a new version of
    > the CONFIG structure:
    >
    > typedef struct {
    > int version;
    > DUMMY dummy;
    > FOOOLD foo;
    > BAR bars[128];
    > } CONFIGOLD;
    >
    > typedef struct {
    > int version;
    > DUMMY dummy;
    > FOO foo;
    > NEWELEM newelem;
    > BAR bars[256];
    > } CONFIG;
    >
    > Note that some elements are inserted in the middle of the structure, the
    > size of the array bars is changed and the definition of sub-structure
    > (FOO in the example) is also changed.
    >
    > I want to write a function that opens the configuration file and, based
    > on the version, read the configuration or make an upgrade of the
    > configuration file.
    >
    > Normally I would proceed opening the file, reading the version and, in
    > the case it is old, reading the old configuration structure, copying to
    > the new configuration structure (making adaptation), deleting the old
    > file and creating/writing the new structure to the file. Something
    > similar to this (without error checking):
    >
    > int fd;
    > CONFIG cfg;
    > fd = open("config.dat", O_RDONLY);


    It seems odd that you use C's fwrite() for output but then
    resort to non-C methods to read it again. Why not fread()?

    > read(fd, &cfg.version, sizeof(cfg.version));
    > if (cfg.version == 2) {
    > lseek(fd, 0, SEEK_SET);
    > read(fd, &cfg, sizeof(cfg));


    The seeking seems superfluous. Why not just keep on reading
    from the current file position, taking into account the fact that
    you've read the version number already?

    read(fd, (char*)&cfg + sizeof(cfg.version),
    sizeof(cfg) - sizeof(cfg.version));

    > close(fd);
    > } else if (cfg.version == 1) {
    > CONFIGOLD cfgold;
    > BAR bar_default = { ... };
    > lseek(fd, 0, SEEK_SET);
    > read(fd, &cfgold, sizeof(cfgold));
    > /* Copy from old to new configuration, filling the new elements
    > * with default values */
    > cfg.version = 2;
    > cfg.dummy = cfgold.dummy;
    > <...adapt cfgold.foo to cfg.foo, it's application dependent...>
    > cfg.newelem = newelem_default;
    > memcpy(cfg.bars, cfgold.bars, 128 * sizeof(BAR));
    > memcpy(&cfg.bars[128], &bar_default, 128 * sizeof(BAR));
    > close(fd);
    > remove("config.dat");


    Aside: You may live to regret this. What if the system crashes
    just after removing the old configuration file but before creating
    the new one? It might be better to write the new data to "config.tmp"
    and then remove("config.dat"), rename("config.tmp", "config.dat")
    once you're sure the new data has been safely written. Better still:

    /* ... write "config.tmp" ... */
    remove("config.bak");
    rename("config.dat", "config.bak");
    rename("config.tmp", "config.dat");

    ... and still more elaborate schemes are possible.
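
A minimal sketch of the three-step scheme above, using only standard C I/O. The helper name `save_config_safely` is hypothetical, and the sketch assumes the whole configuration is available as one buffer. Note that fflush() only hands the data to the operating system; whether it has reached the storage medium at that point depends on the platform.

```c
#include <stdio.h>

/* Hypothetical helper: replace "config.dat" with new contents while keeping
 * the previous version as "config.bak". Returns 0 on success, -1 on failure. */
int save_config_safely(const void *data, size_t size)
{
    FILE *fp = fopen("config.tmp", "wb");
    size_t written;
    int flush_failed, close_failed;

    if (fp == NULL)
        return -1;

    /* Write the new data and make sure every step succeeded before
     * touching the existing files. */
    written = fwrite(data, 1, size, fp);
    flush_failed = (fflush(fp) != 0);
    close_failed = (fclose(fp) != 0);
    if (written != size || flush_failed || close_failed) {
        remove("config.tmp");
        return -1;
    }

    /* Both calls may fail harmlessly on the very first save, when no
     * "config.bak" or "config.dat" exists yet, so their results are ignored. */
    remove("config.bak");
    rename("config.dat", "config.bak");

    /* From here until the next call succeeds, the data lives in "config.bak". */
    if (rename("config.tmp", "config.dat") != 0)
        return -1;
    return 0;
}
```

The old backup is removed before the first rename() because standard C leaves the behaviour of rename() onto an existing file implementation-defined.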

    > fd = open("config.dat", O_WRONLY | O_CREAT);
    > write(fd, &cfg, sizeof(cfg));
    > close(fd);
    > }
    >
    > This algorithm requires keeping both structures in RAM, which I can't
    > afford on my embedded platform with its small amount of memory.


    You need both only while the load-and-convert is in progress.
    If `cfgold' is an `auto' variable it will go away when the function
    returns; if you get its space from malloc() you can free() it when
    conversion is finished.

    But if even that is too much of a burden, you can perhaps read
    the old configuration piecemeal instead of in one big gulp. It looks
    like the DUMMY element can be read directly into `cfg' without using
    extra storage. You haven't revealed the relationship between FOOOLD
    and FOO, but you can surely perform the conversion with no more than
    sizeof(FOOOLD) additional memory, perhaps less. If the expanded BAR
    array just has the old BAR elements as a prefix you need no extra
    space; if the conversion is more complicated you might need some.
    But in all, you need at most max(sizeof(FOOOLD), 128*sizeof(BAR))
    additional memory, possibly less.
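
The piecemeal approach might look like the sketch below. Every field type here is a placeholder invented for illustration (the thread never defines DUMMY, FOO, FOOOLD, BAR or NEWELEM), and so are the FOOOLD-to-FOO adaptation and the name `convert_v1_piecemeal`. The sketch assumes the old file is a byte image of CONFIGOLD produced by the same compiler. Declaring the CONFIGOLD type costs no RAM; only creating an object of that type would.

```c
#include <stddef.h>
#include <stdio.h>

/* Placeholder field types: assumptions, not taken from the thread. */
typedef struct { int a; } DUMMY;
typedef struct { short x; } FOOOLD;
typedef struct { long x; long y; } FOO;
typedef struct { int id; } BAR;
typedef struct { int flag; } NEWELEM;

typedef struct { int version; DUMMY dummy; FOOOLD foo; BAR bars[128]; } CONFIGOLD;
typedef struct { int version; DUMMY dummy; FOO foo; NEWELEM newelem;
                 BAR bars[256]; } CONFIG;

/* Read `size` bytes located `offset` bytes into the old file image. */
static int read_at(FILE *fp, size_t offset, void *dst, size_t size)
{
    if (fseek(fp, (long)offset, SEEK_SET) != 0)
        return -1;
    return fread(dst, size, 1, fp) == 1 ? 0 : -1;
}

/* Build a version-2 CONFIG from a version-1 file without ever holding a
 * whole CONFIGOLD in memory. Returns 0 on success, -1 on I/O failure. */
int convert_v1_piecemeal(FILE *fp, CONFIG *cfg,
                         const NEWELEM *newelem_default,
                         const BAR *bar_default)
{
    FOOOLD old_foo;   /* the only extra storage needed */
    size_t i;

    /* DUMMY is unchanged, so it goes straight into the new structure. */
    if (read_at(fp, offsetof(CONFIGOLD, dummy), &cfg->dummy, sizeof cfg->dummy) != 0)
        return -1;

    /* FOOOLD needs a temporary; this adaptation is purely hypothetical. */
    if (read_at(fp, offsetof(CONFIGOLD, foo), &old_foo, sizeof old_foo) != 0)
        return -1;
    cfg->foo.x = old_foo.x;
    cfg->foo.y = 0;

    /* The old 128 BARs are a prefix of the new array: read them in place. */
    if (read_at(fp, offsetof(CONFIGOLD, bars), cfg->bars, 128 * sizeof(BAR)) != 0)
        return -1;

    /* Fill the remaining elements one by one from the single default BAR. */
    for (i = 128; i < 256; i++)
        cfg->bars[i] = *bar_default;

    cfg->newelem = *newelem_default;
    cfg->version = 2;
    return 0;
}
```

Here fseek() only positions within data that already exists in the file, so the sketch does not depend on what happens when seeking past the end.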

    > [...]
    > The problem I couldn't solve is related to the reading/writing of each
    > field. Indeed, between fields the compiler could add padding bytes, so
    > reading/writing the entire structure (with padding) is completely
    > different than reading/writing field by field (without padding).


    You don't need an actual instance of the struct to determine
    how many padding bytes, if any, are present. If you're writing
    a struct S { T1 f1; T2 f2; ... } field-by-field using independent
    sources for the f1,f2,... you can do something like this:

    T1 x_f1 = ...;
    T2 x_f2 = ...;
    ...
    size_t written = 0; // bytes written thus far
    fwrite (&x_f1, sizeof x_f1, 1, stream);
    written += sizeof x_f1;
    while (written < offsetof(struct S, f2)) {
    putc('\0', stream); // write padding bytes
    ++written;
    }
    fwrite (&x_f2, sizeof x_f2, 1, stream);
    written += sizeof x_f2;
    ...

    A similar approach works for reading: Just use getc() to consume
    and ignore padding bytes instead of putc() to create them.
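
For the reading direction, a sketch along the same lines follows. The two-field `struct S` and the names `skip_padding` and `read_S_fields` are made up for illustration; the char/double pair was chosen only because most compilers insert padding between such fields.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct; most compilers pad between f1 and f2. */
struct S { char f1; double f2; };

/* Consume and discard bytes until *consumed reaches target.
 * Returns 0 on success, -1 if the file ends early or a read error occurs. */
static int skip_padding(FILE *stream, size_t *consumed, size_t target)
{
    while (*consumed < target) {
        if (getc(stream) == EOF)
            return -1;
        ++*consumed;
    }
    return 0;
}

/* Read a file image of struct S field by field into independent variables. */
int read_S_fields(FILE *stream, char *x_f1, double *x_f2)
{
    size_t consumed = 0;   /* bytes read thus far */

    if (fread(x_f1, sizeof *x_f1, 1, stream) != 1)
        return -1;
    consumed += sizeof *x_f1;

    /* Padding between f1 and f2, if any. */
    if (skip_padding(stream, &consumed, offsetof(struct S, f2)) != 0)
        return -1;

    if (fread(x_f2, sizeof *x_f2, 1, stream) != 1)
        return -1;
    consumed += sizeof *x_f2;

    /* Trailing padding, so the stream ends up just past the image. */
    return skip_padding(stream, &consumed, sizeof(struct S));
}
```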

    > What do you think? Do you have other better suggestions?


    Design a better configuration file format. Seriously. You
    are in this bind and going to all this work *because* you've got
    an on-disk image of an in-memory object, and because the in-memory
    object's form is subject to incompatible changes. If you had
    written the data field-by-field in the first place you would not
    need to worry about padding bytes. If you had changed the `cfg'
    solely by adding things to the end instead of roiling the middle,
    you could read the prefix, check the version, and then maybe read
    more. If you had adopted a more flexible format than image-of-RAM
    you would have even more freedom to adapt and extend. In short,
    your difficulties seem mostly self-inflicted.

    --
    Eric Sosman
    Eric Sosman, Sep 3, 2011
    #1

  2. Eric Sosman

    Eric Sosman Guest

    On 9/4/2011 10:53 AM, pozz wrote:
    > On 03/09/2011 16:18, Eric Sosman wrote:
    >> On 9/3/2011 9:04 AM, pozz wrote:
    >>> [...]
    >>> remove("config.dat");

    >>
    >> Aside: You may live to regret this. What if the system crashes
    >> just after removing the old configuration file but before creating
    >> the new one? It might be better to write the new data to "config.tmp"
    >> and then remove("config.dat"), rename("config.tmp", "config.dat")
    >> once you're sure the new data has been safely written. Better still:
    >>
    >> /* ... write "config.tmp" ... */
    >> remove("config.bak");
    >> rename("config.dat", "config.bak");
    >> rename("config.tmp", "config.dat");
    >>
    >> ... and still more elaborate schemes are possible.

    >
    > This is another good suggestion. Anyway, even in your sequence of
    > instructions there is a weakness of the same type. If the system crashes
    > just after the first rename(), you won't have any "config.dat" file. Of
    > course, with your approach the probability of "bad crashes" is greatly
    > reduced.


    A crucial difference is that in your original version the data
    is gone forever, while in mine it can be recovered by renaming the
    "config.bak" file. Which would you rather be faced with: "It'll take
    a moment and some manual intervention to restore your records," or
    "Account balance? What account balance? We have no record that you
    have ever done business with this bank."
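
The recovery step could also be automated at start-up, roughly as sketched below. The function name `open_config_with_recovery` is hypothetical. The sketch treats any failure to open "config.dat" as "the file is missing", which is a simplification: fopen() can fail for other reasons, and a real program may want to tell those cases apart before touching the backup.

```c
#include <stdio.h>

/* Open "config.dat" for reading. If that fails, assume a crash happened
 * between the two rename() calls and try to restore it from "config.bak".
 * Returns an open stream, or NULL if neither file is usable. */
FILE *open_config_with_recovery(void)
{
    FILE *fp = fopen("config.dat", "rb");
    if (fp != NULL)
        return fp;

    /* No current file: promote the backup, if one exists. */
    if (rename("config.bak", "config.dat") != 0)
        return NULL;

    return fopen("config.dat", "rb");
}
```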

    > Consider that CONFIG and CONFIGOLD structures are only examples that
    > show what could typically happen with a new software version: some field
    > could be inserted in the middle of the structure, some array could be
    > expanded, some sub-structure could change.


    If you want to make trouble for yourself, the amount of trouble
    you can get into is limited only by your own imagination.

    > You are suggesting to ignore padding bytes with a dummy reading/writing
    > cycle. And you use offsetof() macro as I did in my last piece of code.
    > Differently from you, I ignore padding bytes through lseek().


    I'm not ignoring the padding bytes at all: I'm explicitly reading
    and writing them.

    When you use fseek() to position past the end of an output file,
    it's not clear what will happen; the C Standard is silent. Yes, you
    say you're using lseek() rather than fseek() -- but you've also said
    you're using some kind of embedded system whose emulation of other
    standards (like POSIX) may be less than perfectly POSIX-faithful in
    corner cases. My advice is to deal with the bytes explicitly (there
    will be only a few of them, after all) rather than to explore those
    corners too assiduously.

    If you want further advice on how to use POSIX functions, try a
    POSIX-oriented forum.

    >>> What do you think? Do you have other better suggestions?

    >>
    >> Design a better configuration file format. Seriously. You
    >> are in this bind and going to all this work *because* you've got
    >> an on-disk image of an in-memory object, and because the in-memory
    >> object's form is subject to incompatible changes. If you had
    >> written the data field-by-field in the first place you would not
    >> need to worry about padding bytes.

    >
    > You're right, but the temptation to read()/write() the entire
    > configuration in a gulp was too strong.
    > [...]
    >> your difficulties seem mostly self-inflicted.


    Q.E.D.

    --
    Eric Sosman
    Eric Sosman, Sep 4, 2011
    #2

  3. pozz

    pozz Guest

    On 4 Sep, 17:19, Eric Sosman <> wrote:
    > On 9/4/2011 10:53 AM, pozz wrote:
    > > On 03/09/2011 16:18, Eric Sosman wrote:
    > >> On 9/3/2011 9:04 AM, pozz wrote:
    > >>> [...]
    > >>> remove("config.dat");

    >
    > >> Aside: You may live to regret this. What if the system crashes
    > >> just after removing the old configuration file but before creating
    > >> the new one? It might be better to write the new data to "config.tmp"
    > >> and then remove("config.dat"), rename("config.tmp", "config.dat")
    > >> once you're sure the new data has been safely written. Better still:

    >
    > >> /* ... write "config.tmp" ... */
    > >> remove("config.bak");
    > >> rename("config.dat", "config.bak");
    > >> rename("config.tmp", "config.dat");

    >
    > >> ... and still more elaborate schemes are possible.

    >
    > > This is another good suggestion. Anyway, even in your sequence of
    > > instructions there is a weakness of the same type. If the system crashes
    > > just after the first rename(), you won't have any "config.dat" file. Of
    > > course, with your approach the probability of "bad crashes" is greatly
    > > reduced.

    >
    >      A crucial difference is that in your original version the data
    > is gone forever, while in mine it can be recovered by renaming the
    > "config.bak" file.  Which would you rather be faced with: "It'll take
    > a moment and some manual intervention to restore your records," or
    > "Account balance?  What account balance?  We have no record that you
    > have ever done business with this bank."


    Sure, I hadn't thought to also check for "config.bak"!


    > > Consider that CONFIG and CONFIGOLD structures are only examples that
    > > show what could typically happen with a new software version: some field
    > > could be inserted in the middle of the structure, some array could be
    > > expanded, some sub-structure could change.

    >
    >      If you want to make trouble for yourself, the amount of trouble
    > you can get into is limited only by your own imagination.


    :)
    In the past, for a single piece of software, I changed the configuration
    structure with each upgrade, and the changes could happen anywhere in the
    structure.


    > > You are suggesting to ignore padding bytes with a dummy reading/writing
    > > cycle. And you use offsetof() macro as I did in my last piece of code.
    > > Differently from you, I ignore padding bytes through lseek().

    >
    >      I'm not ignoring the padding bytes at all: I'm explicitly reading
    > and writing them.
    >
    >      When you use fseek() to position past the end of an output file,
    > it's not clear what will happen; the C Standard is silent.  Yes, you
    > say you're using lseek() rather than fseek() -- but you've also said
    > you're using some kind of embedded system whose emulation of other
    > standards (like POSIX) may be less than perfectly POSIX-faithful in
    > corner cases.  My advice is to deal with the bytes explicitly (there
    > will be only a few of them, after all) rather than to explore those
    > corners too assiduously.


    I was reading the glibc reference manual and I thought the behaviour of
    lseek() when setting a position after the end of file was standardized,
    but I was wrong. In this case, your solution (dummy loops for
    reading/writing padding bytes) is more portable than mine.


    >      If you want further advice on how to use POSIX functions, try a
    > POSIX-oriented forum.
    >
    > >>> What do you think? Do you have other better suggestions?

    >
    > >> Design a better configuration file format. Seriously. You
    > >> are in this bind and going to all this work *because* you've got
    > >> an on-disk image of an in-memory object, and because the in-memory
    > >> object's form is subject to incompatible changes. If you had
    > >> written the data field-by-field in the first place you would not
    > >> need to worry about padding bytes.

    >
    > > You're right, but the temptation to read()/write() the entire
    > > configuration in a gulp was too strong.
    > > [...]
    > >> your difficulties seem mostly self-inflicted.

    >
    >      Q.E.D.


    Ok, ok, I'm a masochist :)

    Anyway, do you have suggestions for a good file format that doesn't waste
    too much memory space (considering my small non-volatile memory) and is
    fast to read/write, even for a single field?
    pozz, Sep 5, 2011
    #3
  4. Eric Sosman

    Eric Sosman Guest

    On 9/5/2011 7:16 AM, pozz wrote:
    > [...]
    > Anyway, do you have suggestions for a good file format that doesn't waste
    > too much memory space (considering my small non-volatile memory) and is
    > fast to read/write, even for a single field?


    The main point is that the file format and the in-memory format
    need not resemble each other. The file and an in-memory object will
    hold "the same information," but need not represent it the same way.
    Look at some of the configuration files on your own system: Your
    browser's bookmarks or public-key certificates, for example. Do you
    think the browser's in-memory version of that information is an
    image of the file it came from?

    I'm not going to offer specific suggestions about file formats,
    because you've revealed next to nothing about your circumstances:
    structs with elements like DUMMY and FOO and FOOOLD and BAR and
    NEWELEM do not convey much information. (I sort of imagine that may
    be intentional: You don't want to drop too many hints about the super-
    secret project you're engaged in, so you've filed off all the serial
    numbers. Fair enough.) Choose a scheme that can represent whatever
    information you need, and that you think you'll be able to extend
    compatibly to represent the kinds of changes you might want to make
    in future releases. You needn't go all the way to XML, but some
    advantages accrue if you adopt a format that's already in use: Tools
    for reading and writing JSON, for example, are easily found. If you
    choose a binary format, choose a format that you've designed for your
    own needs, not "whatever the compiler's whim happens to be."
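
As one illustration of a binary format designed independently of the compiler's struct layout, here is a sketch of a tagged-record file. It is not a format proposed in the thread, and the `settings` fields, the tag values and all function names are invented. Each record is one tag byte, one length byte and then the payload, with multi-byte numbers stored least significant byte first. A reader skips tags it does not know, and fields missing from the file keep whatever defaults the caller set beforehand, so fields can be added in later releases without breaking older files.

```c
#include <stdio.h>

/* Hypothetical settings and field tags, for illustration only. */
enum { TAG_VOLUME = 1, TAG_BAUD = 2 };
struct settings { unsigned char volume; unsigned long baud; };

/* Write one record: tag byte, length byte, then `len` payload bytes. */
static int put_record(FILE *fp, unsigned char tag,
                      const unsigned char *payload, unsigned char len)
{
    if (putc(tag, fp) == EOF || putc(len, fp) == EOF)
        return -1;
    return fwrite(payload, 1, len, fp) == (size_t)len ? 0 : -1;
}

int save_settings(FILE *fp, const struct settings *s)
{
    unsigned char buf[4];

    buf[0] = s->volume;
    if (put_record(fp, TAG_VOLUME, buf, 1) != 0)
        return -1;

    /* Four bytes, least significant first, whatever the host byte order. */
    buf[0] = (unsigned char)(s->baud & 0xFF);
    buf[1] = (unsigned char)((s->baud >> 8) & 0xFF);
    buf[2] = (unsigned char)((s->baud >> 16) & 0xFF);
    buf[3] = (unsigned char)((s->baud >> 24) & 0xFF);
    return put_record(fp, TAG_BAUD, buf, 4);
}

/* The caller fills *s with defaults first; only fields present in the
 * file are overwritten. Returns 0 on success, -1 on a truncated file. */
int load_settings(FILE *fp, struct settings *s)
{
    unsigned char buf[255];
    int tag, len;

    while ((tag = getc(fp)) != EOF) {
        if ((len = getc(fp)) == EOF)
            return -1;
        if (fread(buf, 1, (size_t)len, fp) != (size_t)len)
            return -1;

        if (tag == TAG_VOLUME && len == 1) {
            s->volume = buf[0];
        } else if (tag == TAG_BAUD && len == 4) {
            s->baud = (unsigned long)buf[0]
                    | ((unsigned long)buf[1] << 8)
                    | ((unsigned long)buf[2] << 16)
                    | ((unsigned long)buf[3] << 24);
        }
        /* Any other tag is unknown to this version and simply ignored. */
    }
    return ferror(fp) ? -1 : 0;
}
```

The overhead of this particular layout is two bytes per field, and padding never reaches the file because nothing is written as a struct image.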

    A comment about "fast to read/write," though: It seems odd that
    you'd worry about speed in this context. Things like configuration
    files are (typically) read once at start-up, perhaps re-written at
    shutdown, and possibly written a few more times at "checkpoint/save"
    intervals. If the accesses are infrequent, their speed is usually
    not critical.

    --
    Eric Sosman
    Eric Sosman, Sep 5, 2011
    #4
  5. Eric Sosman

    Eric Sosman Guest

    On 9/5/2011 5:22 PM, pozz wrote:
    > On 05/09/2011 13:47, Eric Sosman wrote:
    >>
    >> The main point is that the file format and the in-memory format
    >> need not resemble each other.

    > [...]
    > After the discussion with you, I now definitely understand this point.
    > [...]
    > Don't you think JSON wastes too much space (it's a text file format),
    > considering 8KB of memory?


    Perhaps your understanding of the point could still be improved.

    > You're right. I wanted code that is simple (and therefore compact) and
    > fast to execute on a small microcontroller. Reading or writing the whole
    > configuration can take as long as one second without problems.
    >
    > If the user changes one parameter on the display, the program enters the
    > function that saves the configuration. It would be nice if I could change
    > only that parameter in the configuration file, without writing all the
    > parameters.
    > The configuration writing function is a blocking function and the user
    > perception could be very bad if it lasts too long.


    <topicality "marginal">

    I'm not sure what kind of storage device you're using, but it's
    quite likely that writing one field could take longer than writing an
    entire smallish file. Many storage devices perform I/O in units of
    "sectors" or "blocks" of a size somewhere between half a K and eight K,
    maybe more. If you want to overwrite eight bytes in the middle of such
    a block while leaving its neighbors undisturbed, the system must read
    the old data, stuff your eight bytes into the buffer, and write it all
    back out -- two physical I/O operations. If you just write the whole
    business, you can probably do the job with one I/O since anything
    already in the file can just be abandoned.

    (It's not quite as stark as one-versus-two, since there will surely
    be additional I/O's for file system housekeeping. But it'll very
    likely be N-versus-(N+1) for N ~= half a dozen, so your attempt at
    optimization may slow things down by something like 10-20%. Cache
    effects make the picture cloudier still. But don't just assume that
    writing "less payload" automatically means "faster." If you care about
    the answer, measure it!)

    </topicality>

    --
    Eric Sosman
    Eric Sosman, Sep 5, 2011
    #5
