problem reading/writing structures from and to files

Discussion in 'C Programming' started by arne.muller@gmail.com, Oct 6, 2006.

  1. Guest

    Hello,

    I've come across some problems reading structures from binary files.
    Basically I have some structures like this:

    typedef struct {
        int i;
        double x;
        int n;
        double *mz;
        short *intens;
    } Data;

    I've an array of these structures and their mz and intens pointers
    point to arrays with n elements each.

    My program can write the Data array into a binary file: after writing
    the structure itself (using fwrite) it fwrites (appends) the mz and
    then the intens arrays. I don't need to exchange this data file between
    machines, but another program (on the same machine, compiled with the
    same compiler) needs to read this file frequently.

    My idea is to read this file in one go into memory using fread (I could
    even use mmap, since this file is accessed by several processes read
    only), and then to reconstruct the mz and intens pointers properly. An
    iterator would then fetch the next Data element (returning a pointer to
    it). Well, this is where I got stuck: how do I best access this chunk of
    memory to reconstruct the Data structure? Something like this?

    ....
    /* the pointer returned by mmap */
    void *mem;

    /* the file size in bytes */
    size_t filesize;
    ....

    Data *nextEntry(void) {

        static size_t bytes_read = 0;
        Data *d = NULL;

        /* is there still something to read? */
        if (bytes_read < filesize) {
            d = (Data *)&((char *)mem)[bytes_read];
            bytes_read += sizeof(Data); /* jump to mz part */
            d->mz = (double *)&((char *)mem)[bytes_read]; /* re-assign mz */
            bytes_read += sizeof(double) * d->n; /* jump to intens part */
            d->intens = (short *)&((char *)mem)[bytes_read]; /* re-assign intens */
            bytes_read += sizeof(short) * d->n; /* jump to next Data field */
        }

        return d;
    }

    This is not the actual implementation I use, but it's the same
    principle (simplified). I don't feel good about using all these casts.
    The code needs to be portable (but not the binary file itself, it
    will always stay on the same machine). This works under Linux, but
    gives me 'Unaligned access' (Tru64) or a Bus Error (SunOS). So I guess
    this code is not really portable ... :-(

    Any hints on how to get this working?

    thanks a lot for your help,

    Arne
    , Oct 6, 2006
    #1

  2. wrote:
    > Hello,
    >
    > I've come across some problems reading structures from binary files.
    > Basically I've some structures
    >
    > typedef struct {
    > int i;
    > double x;
    > int n;
    > double *mz;
    > short *intens;
    > } Data;


    Ay, ay, ay, caramba! I doubt if the C standard lets you read or
    write pointers-- they're supposed to be valid only in the time and
    space of one invocation of one program.

    You can fix this two ways:

    (1) Change the pointers to be integer indexes into a big array of
    doubles and another of shorts.
    Write your own Doublemalloc() which just returns the next free index in
    the array.

    (2) Keep your blasted pointers, but when it comes time to write out
    the file, copy the pointed-to values to an array (same as (1)) and
    replace the pointers with the array indexes.
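
    A minimal sketch of option (1), under stated assumptions: the pool
    sizes, the IndexedData type and the double_alloc(), short_alloc() and
    indexed_data_init() names are all made up for illustration and do not
    come from the original program. The point is only that an index stays
    meaningful after the pools are written out and read back, whereas a
    pointer does not.

```c
#include <stddef.h>

/* Hypothetical fixed-size pools; the capacities are arbitrary. */
#define DOUBLE_POOL_MAX 1024
#define SHORT_POOL_MAX  1024
#define NO_INDEX ((size_t)-1)   /* returned when a pool is full */

double double_pool[DOUBLE_POOL_MAX];
short  short_pool[SHORT_POOL_MAX];
size_t double_used = 0;
size_t short_used  = 0;

/* The record from the question, with the two pointers replaced by indexes. */
typedef struct {
    int    i;
    double x;
    int    n;
    size_t mz;      /* index of the first of n doubles in double_pool */
    size_t intens;  /* index of the first of n shorts in short_pool */
} IndexedData;

/* Reserve count doubles and return the index of the first one. */
size_t double_alloc(size_t count)
{
    size_t first = double_used;
    if (count > DOUBLE_POOL_MAX - double_used)
        return NO_INDEX;
    double_used += count;
    return first;
}

/* Reserve count shorts and return the index of the first one. */
size_t short_alloc(size_t count)
{
    size_t first = short_used;
    if (count > SHORT_POOL_MAX - short_used)
        return NO_INDEX;
    short_used += count;
    return first;
}

/* Set up a record with room for n values; 0 on success, -1 if n is
   negative or either pool lacks space (nothing is reserved then). */
int indexed_data_init(IndexedData *d, int i, double x, int n)
{
    if (n < 0 ||
        (size_t)n > DOUBLE_POOL_MAX - double_used ||
        (size_t)n > SHORT_POOL_MAX - short_used)
        return -1;
    d->i = i;
    d->x = x;
    d->n = n;
    d->mz = double_alloc((size_t)n);
    d->intens = short_alloc((size_t)n);
    return 0;
}
```

    Element k of a record d is then double_pool[d.mz + k] and
    short_pool[d.intens + k]. The file would hold the records plus the
    used part of each pool.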
    Ancient_Hacker, Oct 6, 2006
    #2

  3. Guest

    Ancient_Hacker wrote:
    > wrote:
    > > Hello,
    > >
    > > I've come across some problems reading structures from binary files.
    > > Basically I've some structures
    > >
    > > typedef struct {
    > > int i;
    > > double x;
    > > int n;
    > > double *mz;
    > > short *intens;
    > > } Data;

    >
    > Ay, ay, ay, caramba! I doubt if the C standard lets you read or
    > write pointers-- they're supposed to be valid only in the time and
    > space of one invocation of one program.


    Yes, but fwrite just takes a void pointer to an 'object' and writes it;
    it shouldn't look inside the structure. The pointers stored in the file
    are meaningless, and that's why I try to reconstruct the correct
    addresses afterwards.

    >
    > You can fix this two ways:
    >
    > (1) Change the pointers to be integer indexes into a big array of
    > doubles and another of shorts.
    > write your own Doublemalloc() which just returns the next free index in
    > the array.
    >
    > (2) Keep your blasted pointers, but when it comes time to write out
    > the file, copy the pointed to values to an array (same as (1)) and
    > replace the pointers with the array indexes.


    I think I'll try that one!

    thanks,

    Arne
    , Oct 6, 2006
    #3
  4. Snis Pilbor Guest

    wrote:
    > Hello,
    >
    > I've come across some problems reading structures from binary files.
    > Basically I've some structures
    >
    > (snip very complicated code to write the raw data in a structure verbatim into a file)
    >


    Writing structures bit-for-bit into a file is always a terrible idea.
    It will make the file extremely unportable. Unportable between
    systems, even unportable between compilations. Certainly any time you
    alter the structure itself and recompile, all earlier files will become
    garbage. And if you're writing pointers verbatim as binary data, then
    the file may not even work right between different runs of the exact
    same executable, making your file next to worthless.

    It's more work, but it's virtually always worthwhile to instead write
    individual elements of the structure to the file in a carefully
    formatted way. Now, if your structure contains pointers, chances are
    those pointers point to other types of structures which, assuming they
    too are preserved in files or at least hardcoded into your program, you
    should be able to look up by text name or identifier number or
    something like that. For instance if your thingies contain linked
    lists of widgets, then load the widget file first, including a name or
    id number for each widget, THEN load your thingy file. Finally, in
    cases where structures point to other structures from the same file, or
    where different structures point to each other so neither is truly
    "better to load first", you need some acrobatics: temporarily save
    just the names or id numbers while loading both files, and only after
    both are fully loaded work out what the pointers should point to.
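
    The "load ids first, fix up pointers afterwards" step might look
    roughly like this. It is only a sketch: the Widget and Thingy types,
    their fields, and the find_widget() and resolve_widgets() helpers are
    invented here to mirror the widget/thingy wording above, and both
    arrays are assumed to be already loaded from their files.

```c
#include <stddef.h>

typedef struct {
    int    id;          /* identifier stored in the widget file */
    double weight;
} Widget;

typedef struct {
    int     id;
    int     widget_id;  /* what the thingy file stores */
    Widget *widget;     /* filled in after loading, never written out */
} Thingy;

/* Linear search for a widget by id; returns NULL if there is none. */
Widget *find_widget(Widget *widgets, size_t count, int id)
{
    size_t k;
    for (k = 0; k < count; k++)
        if (widgets[k].id == id)
            return &widgets[k];
    return NULL;
}

/* Second pass: turn every stored widget_id into a pointer.
   Returns the number of thingies whose widget could not be found. */
size_t resolve_widgets(Thingy *thingies, size_t nthingies,
                       Widget *widgets, size_t nwidgets)
{
    size_t k, missing = 0;
    for (k = 0; k < nthingies; k++) {
        thingies[k].widget =
            find_widget(widgets, nwidgets, thingies[k].widget_id);
        if (thingies[k].widget == NULL)
            missing++;
    }
    return missing;
}
```

    A non-zero return value means the file refers to an id that was never
    loaded, which is worth reporting rather than ignoring.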
    Snis Pilbor, Oct 6, 2006
    #4
  5. On 6 Oct 2006 09:47:44 -0700, wrote:

    >Hello,
    >
    >I've come across some problems reading structures from binary files.
    >Basically I've some structures
    >
    >typedef struct {
    > int i;
    > double x;
    > int n;
    > double *mz;
    > short *intens;
    >} Data;
    >
    >I've an array of these structures and their mz and intens pointers
    >point to arrays with n elements each.
    >
    >My program can write the Data array into a binary file, after writing
    >the structure itself (using fwrite) it fwrites (appends) the mz and
    >then the intens arrays. I don't need to exchange this datafile between
    >machines, but another program (on the same machine, compiled with the
    >same compiler) needs to read this file frequently.
    >
    >My idea is to read this file in one go into memory using fread (I could
    >even use mmap, since this file is accessed by several processes read


    mmap is not a standard function so I have no idea what it does for or
    to you.

    >only), and then to reconstruct the mz and intens pointers properly. An
    >iterator would then fetch the next Data element (returning a pointer to


    C doesn't have iterators. What does next Data element mean?

    >it). Well, this is where I got stuck, how do I best access this chunk of
    >memory to reconstruct the Data structure. Something like this?


    If your description of the file contents is correct, you only have one
    set of data in the file. While it is possible to read the entire set
    of data in a single fread, be aware that this could lead to some
    alignment issues if sizeof(Data) is not a multiple of sizeof(double)
    or sizeof(double) is not a multiple of sizeof(short). I think they
    will be but I'm not certain and I don't think the standard guarantees
    the second. If Data did not contain a member of type double, I would
    not depend on it at all.

    If all the sizeof's are proper multiples, a single fread into a large
    dynamically allocated buffer would tell you how many bytes were read
    and ensure that the data was properly aligned. The "array" of doubles
    would start sizeof(Data) bytes into the buffer, basically immediately
    following the struct. You would set mz to this address. Something in
    Data must tell you how many doubles there are (N). The remaining bytes
    (number read - sizeof(Data) - N*sizeof(double)) are the shorts. The
    shorts start N*sizeof(double) bytes beyond the address in mz and you
    would store this address in intens.
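
    As a rough sketch of the single-buffer layout just described, assuming
    the buffer came from malloc and holds exactly one record (the struct
    as written by fwrite, then n doubles, then n shorts). The function
    name attach_record() is made up for this example, and the sizeof
    checks are the ones named above rather than a full alignment test.

```c
#include <stddef.h>

typedef struct {
    int i;
    double x;
    int n;
    double *mz;
    short *intens;
} Data;

/* buf must come from malloc (so it is aligned for any object) and hold
   len bytes. Returns a pointer to the Data at the start of buf with mz
   and intens pointing into the same buffer, or NULL if the sizes do not
   cooperate or the buffer is too short for the n it claims. */
Data *attach_record(void *buf, size_t len)
{
    unsigned char *bytes = buf;
    Data *d;
    size_t n;

    if (sizeof(Data) % sizeof(double) != 0 ||
        sizeof(double) % sizeof(short) != 0)
        return NULL;
    if (len < sizeof(Data))
        return NULL;

    d = buf;
    if (d->n < 0)
        return NULL;
    n = (size_t)d->n;

    /* Written as a division so a bogus n cannot overflow the product. */
    if (n > (len - sizeof(Data)) / (sizeof(double) + sizeof(short)))
        return NULL;

    d->mz = (double *)(bytes + sizeof(Data));
    d->intens = (short *)(bytes + sizeof(Data) + n * sizeof(double));
    return d;
}
```

    This only covers one record per buffer. With several records packed
    back to back, an odd n leaves the next struct at an offset that is not
    a multiple of sizeof(double), which is one plausible source of the
    Bus Error reported in the question.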

    If the sizeof's do not cooperate, use fread to read the struct into a
    properly aligned buffer (obviously sizeof(Data) bytes). Something in
    the struct must tell you how many doubles (N) follow. Use fread to
    read them into another properly aligned buffer (obviously
    N*sizeof(double) bytes) and set mz to the address of this buffer. You
    can then read the remaining data into a third properly aligned buffer
    and set intens to its address. The number of bytes read divided by
    sizeof(short) gives you the number of objects read.

    snip mmap code


    Remove del for email
    Barry Schwarz, Oct 7, 2006
    #5
  6. Chris Torek Guest

    In article <>
    <> wrote:
    >I've come across some problems reading structures from binary files.


    As others have cautioned, it is often wise to use something other
    than "raw binary" format for data files. Problems that were
    guaranteed to run on a single machine seem often to expand as
    if by magic and suddenly require a heterogenous network. :)

    That said:

    >typedef struct {
    > int i;
    > double x;
    > int n;
    > double *mz;
    > short *intens;
    >} Data;
    >
    >I've an array of these structures and their mz and intens pointers
    >point to arrays with n elements each.
    >
    >My program can write the Data array into a binary file, after writing
    >the structure itself (using fwrite) it fwrites (appends) the mz and
    >then the intens arrays.


    In other words, you use fwrite() to write out the i, x, and n
    fields (which you need in the file) plus also the "mz" and "intens"
    fields (which you do *not* need in the file, since they have to
    be replaced on subsequent "re-load-from-file" runs):

        Data *p;
        FILE *somefile;
        ... set up p, p->i, p->x, p->n, etc ...

        somefile = fopen("somename", "wb");
        if (somefile == NULL)
            ... handle error ...

        /* possible additional code here */

        if (fwrite(p, sizeof *p, 1, somefile) != 1)
            ... handle error ...
        if (fwrite(p->mz, sizeof *p->mz, p->n, somefile) != p->n)
            ... handle error ...
        if (fwrite(p->intens, sizeof *p->intens, p->n, somefile) != p->n)
            ... handle error ...

    This code is OK, although the initial fwrite() -- writing bytes
    from (void *)p for length sizeof *p -- writes three useful values
    and two useless ones. It would be "better" (in some sense) to
    write only the useful values, by replacing the first fwrite()
    with three separate fwrite()s:

        if (fwrite(&p->i, sizeof p->i, 1, somefile) != 1 ||
            fwrite(&p->x, sizeof p->x, 1, somefile) != 1 ||
            fwrite(&p->n, sizeof p->n, 1, somefile) != 1)
            ... handle error ...

    >My idea is to read this file in one go into memory using fread ...


    The simplest way to read it back is to use as many fread()s as
    fwrite()s above:

        Data *p;
        FILE *somefile;
        ...
        p = malloc(sizeof *p);
        if (p == NULL)
            ... handle error ...
        somefile = fopen("somename", "rb");
        if (somefile == NULL)
            ... handle error ...

        /* assuming three separate fwrite()s for the useful elements: */
        if (fread(&p->i, sizeof p->i, 1, somefile) != 1 ||
            fread(&p->x, sizeof p->x, 1, somefile) != 1 ||
            fread(&p->n, sizeof p->n, 1, somefile) != 1)
            ... handle error ...

        /* insert range-checking on i, x, and n here if desired,
           to validate the input data */

        if ((p->mz = malloc(p->n * sizeof *p->mz)) == NULL)
            ... handle error ...
        if ((p->intens = malloc(p->n * sizeof *p->intens)) == NULL)
            ... handle error ...
        if (fread(p->mz, sizeof *p->mz, p->n, somefile) != p->n)
            ... handle error ...
        if (fread(p->intens, sizeof *p->intens, p->n, somefile) != p->n)
            ... handle error ...
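
    For anyone who wants to compile the fragments above, here is one way
    they could be fleshed out into a matched pair of functions. This is a
    sketch, not the code from this post: the names write_record() and
    read_record() and the "return -1 on any failure" convention are
    assumptions, and a real loader would probably also cap n at some
    sane maximum before calling malloc.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int i;
    double x;
    int n;
    double *mz;
    short *intens;
} Data;

/* Write i, x, n field by field, then the two arrays. 0 on success. */
int write_record(FILE *fp, const Data *p)
{
    size_t n;

    if (p->n < 0)
        return -1;
    n = (size_t)p->n;

    if (fwrite(&p->i, sizeof p->i, 1, fp) != 1 ||
        fwrite(&p->x, sizeof p->x, 1, fp) != 1 ||
        fwrite(&p->n, sizeof p->n, 1, fp) != 1)
        return -1;
    if (n > 0 &&
        (fwrite(p->mz, sizeof *p->mz, n, fp) != n ||
         fwrite(p->intens, sizeof *p->intens, n, fp) != n))
        return -1;
    return 0;
}

/* Read one record written by write_record(). On success p->mz and
   p->intens are freshly malloc'ed (or NULL when n is 0) and the caller
   frees them. On failure both are NULL and -1 is returned. */
int read_record(FILE *fp, Data *p)
{
    size_t n;

    p->mz = NULL;
    p->intens = NULL;

    if (fread(&p->i, sizeof p->i, 1, fp) != 1 ||
        fread(&p->x, sizeof p->x, 1, fp) != 1 ||
        fread(&p->n, sizeof p->n, 1, fp) != 1)
        return -1;
    if (p->n < 0)               /* minimal range check on n */
        return -1;
    n = (size_t)p->n;
    if (n == 0)
        return 0;

    p->mz = malloc(n * sizeof *p->mz);
    p->intens = malloc(n * sizeof *p->intens);
    if (p->mz == NULL || p->intens == NULL ||
        fread(p->mz, sizeof *p->mz, n, fp) != n ||
        fread(p->intens, sizeof *p->intens, n, fp) != n) {
        free(p->mz);
        free(p->intens);
        p->mz = NULL;
        p->intens = NULL;
        return -1;
    }
    return 0;
}
```

    Because every value is read into a properly declared or malloc'ed
    object, alignment never comes into it, whatever n happens to be.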

    >(I could even use mmap, since this file is accessed by several
    >processes read only),


    The mmap() routines are dangerously seductive. Using them ties
    your code and data to OS- and machine-dependent items, and makes
    it error-prone in ways that are not always obvious on first blush.
    (For instance, the really odd one is what happens if the file is
    truncated after successfully mapping it.)

    >and then to reconstruct the mz and intens pointers properly.


    If you omit them when writing, you can omit them when reading
    back, as above.

    While mmap() avoids what some people call "unnecessary" copying of
    the data (during I/O), that very copying is what makes the code
    above so simple and reliable. Often, the simplicity is worth the
    performance penalty. (If it is not, one can always complextify
    the code later. :) )
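
    If the file really must be parsed out of one big in-memory block
    (whether it came from a single fread or from mmap), one commonly used
    way to stay clear of unaligned access is to memcpy each value out of
    the byte buffer instead of casting into it. A sketch under these
    assumptions: every record was written field by field as shown earlier
    (i, x, n, then n doubles, then n shorts, no padding), and the
    RecordView type and the next_record(), record_mz() and
    record_intens() names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Header values plus byte offsets of the two arrays inside the buffer. */
typedef struct {
    int    i;
    double x;
    int    n;
    size_t mz_off;
    size_t intens_off;
} RecordView;

/* Parse the record starting at *pos and advance *pos past it.
   Returns 0 on success, -1 if the buffer ends early or n is negative. */
int next_record(const unsigned char *buf, size_t len,
                size_t *pos, RecordView *out)
{
    const size_t header = sizeof out->i + sizeof out->x + sizeof out->n;
    size_t p = *pos, n;

    if (p > len || len - p < header)
        return -1;
    memcpy(&out->i, buf + p, sizeof out->i); p += sizeof out->i;
    memcpy(&out->x, buf + p, sizeof out->x); p += sizeof out->x;
    memcpy(&out->n, buf + p, sizeof out->n); p += sizeof out->n;

    if (out->n < 0)
        return -1;
    n = (size_t)out->n;
    if (n > (len - p) / (sizeof(double) + sizeof(short)))
        return -1;

    out->mz_off = p;
    out->intens_off = p + n * sizeof(double);
    *pos = out->intens_off + n * sizeof(short);
    return 0;
}

/* Fetch element k of the mz array without assuming any alignment. */
double record_mz(const unsigned char *buf, const RecordView *r, size_t k)
{
    double v;
    memcpy(&v, buf + r->mz_off + k * sizeof v, sizeof v);
    return v;
}

/* Fetch element k of the intens array without assuming any alignment. */
short record_intens(const unsigned char *buf, const RecordView *r, size_t k)
{
    short v;
    memcpy(&v, buf + r->intens_off + k * sizeof v, sizeof v);
    return v;
}
```

    The buffer is never written to, so this also works on a read-only
    mapping. The copying is still there, just one value at a time.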
    --
    In-Real-Life: Chris Torek, Wind River Systems
    Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
    email: forget about it http://web.torek.net/torek/index.html
    Reading email is like searching for food in the garbage, thanks to spammers.
    Chris Torek, Oct 8, 2006
    #6
  7. Joe Wright Guest


    Combining C structures and data and writing it to a file, then reading
    that file back into memory, is non-trivial.

    I invite you all to examine the .DBF file structure of dBASE or FoxPro
    or Clipper. The file consists of a binary header (certainly a C-like
    structure) describing record length, number of rows and such. Then
    another array of structures describing the attributes of each column in
    a row. The remainder of the .dbf file is text which begins at an offset
    defined in the header and continues for cols * rows bytes, ending with
    the ever-popular 0x1A byte.

    I have worked with this structure for more than 20 years now. I like it.
    Ten years ago I began writing C programs to manipulate .dbf files.
    Doable of course but not 'simple' by any means.

    Attempts to write structures and data to a single file and then read the
    file and data in a meaningful way will prove to be non-trivial.

    Simpler is better. Define your data in terms of columns per row and rows
    per file, and write it in text, not binary. Beware the Endians.
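
    To make the text suggestion concrete for the Data record discussed in
    this thread, here is a small sketch. The layout (one header line with
    i, x and n, then n lines of "mz intens") and the function names
    write_record_text() and read_record_text() are assumptions chosen for
    illustration, not an established format. %.17g is used so that a
    double should survive the trip through text on typical IEEE 754
    systems.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int i;
    double x;
    int n;
    double *mz;
    short *intens;
} Data;

/* One header line "i x n", then n rows of "mz intens". 0 on success. */
int write_record_text(FILE *fp, const Data *p)
{
    int k;

    if (fprintf(fp, "%d %.17g %d\n", p->i, p->x, p->n) < 0)
        return -1;
    for (k = 0; k < p->n; k++)
        if (fprintf(fp, "%.17g %hd\n", p->mz[k], p->intens[k]) < 0)
            return -1;
    return 0;
}

/* Read one record back. On success the caller frees p->mz and
   p->intens; on failure both are NULL and -1 is returned. */
int read_record_text(FILE *fp, Data *p)
{
    int k, ok;

    p->mz = NULL;
    p->intens = NULL;
    if (fscanf(fp, "%d %lf %d", &p->i, &p->x, &p->n) != 3 || p->n < 0)
        return -1;
    if (p->n == 0)
        return 0;

    p->mz = malloc((size_t)p->n * sizeof *p->mz);
    p->intens = malloc((size_t)p->n * sizeof *p->intens);
    ok = (p->mz != NULL && p->intens != NULL);
    for (k = 0; ok && k < p->n; k++)
        ok = (fscanf(fp, "%lf %hd", &p->mz[k], &p->intens[k]) == 2);

    if (!ok) {
        free(p->mz);
        free(p->intens);
        p->mz = NULL;
        p->intens = NULL;
        return -1;
    }
    return 0;
}
```

    A file like this is larger and slower to parse than the binary one,
    but it can be inspected with any editor and has no byte-order or
    padding questions.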

    --
    Joe Wright
    "Everything should be made as simple as possible, but not simpler."
    --- Albert Einstein ---
    Joe Wright, Oct 9, 2006
    #7