How to fread ((void *)header, sizeof (struct Sheader), 1, file); in C++

H

Hetware

I have some ancient c code that reads files stored in raw binary
format. The first 216 bytes are to be read into an instance of:

struct Sheader
{
long attr_start;
long form_start;
short fsize;
short attr_size;
short lastfieldidx;
short num_inputs;
char oldheader[200];
};

The way the C code does this is by fread ((void *)header, sizeof
(struct Sheader), 1, file); Is there a better way to do this in C++?
 
I

Ian Collins

I have some ancient c code that reads files stored in raw binary
format. The first 216 bytes are to be read into an instance of:

struct Sheader
{
long attr_start;
long form_start;
short fsize;
short attr_size;
short lastfieldidx;
short num_inputs;
char oldheader[200];
};

The way the C code does this is by fread ((void *)header, sizeof
(struct Sheader), 1, file); Is there a better way to do this in C++?

Not really (but you don't need 'struct" in there), it's awful in C and
it will still be awful in C++!
 
H

Hetware

On 07/16/11 10:36 AM, Hetware wrote:
I have some ancient c code that reads files stored in raw binary
format.  The first 216 bytes are to be read into an instance of:
struct Sheader
{
   long    attr_start;
   long    form_start;
   short   fsize;
   short   attr_size;
   short   lastfieldidx;
   short   num_inputs;
   char    oldheader[200];
};
The way the C code does this is by fread ((void *)header, sizeof
(struct Sheader), 1, file);  Is there a better way to do this in C++?

Not really (but you don't need 'struct" in there), it's awful in C and
it will still be awful in C++!

I'm not sure why struct is "awful". At this point, its not a question
of whether I like it or not. I just need a way to get the first bytes
of the file read in as POD. I actually did make Sheader into a class,
but that really makes little difference. The following doesn't appear
to get the correct data into the class members:

#include <iostream>

class Header {
public:
Header(){}
Header(std::istream& in);
std::eek:stream& print(std::eek:stream& out);
private:
long attr_start;
long form_start;
short fsize;
short attr_size;
short lastfieldidx;
short num_inputs;
char oldheader[200];

};


Header::Header(std::istream& in) {
in>>attr_start}

std::eek:stream& Header::print(std::eek:stream& out) {

return out
<<"attr_start = " << (long)attr_start << std::endl
<<"form_start = " << (long)form_start << std::endl
<<"fsize = " << (short)fsize << std::endl
<<"attr_size = " << (short)attr_size << std::endl
<<"lastfieldidx = " << (short)lastfieldidx << std::endl
<<"num_inputs = " << (short)num_inputs << std::endl;

//oldheader[200];

}

#include <fstream>
#include <sstream>

int main(int argc, char* argv[]) {
if (argc < 1) { std::cerr<<"enter a file name"<<std::endl; return
-1; }

std::ifstream inf(argv[1]);
if(inf) {
Header header(inf);
header.print(std::cout);
return 0;
}
}
 
H

Hetware

I have some ancient c code that reads files stored in raw binary
format.  The first 216 bytes are to be read into an instance of:
struct Sheader
{
   long    attr_start;
   long    form_start;
   short   fsize;
   short   attr_size;
   short   lastfieldidx;
   short   num_inputs;
   char    oldheader[200];
};
The way the C code does this is by fread ((void *)header, sizeof
(struct Sheader), 1, file);  Is there a better way to do this in C++?
Not really (but you don't need 'struct" in there), it's awful in C and
it will still be awful in C++!

I'm not sure why struct is "awful".  At this point, its not a question
of whether I like it or not.  I just need a way to get the first bytes
of the file read in as POD.  I actually did make Sheader into a class,
but that really makes little difference.  The following doesn't appear
to get the correct data into the class members:

#include <iostream>

class Header {
public:
  Header(){}
  Header(std::istream& in);
  std::eek:stream& print(std::eek:stream& out);
private:
  long  attr_start;
  long  form_start;
  short fsize;
  short attr_size;
  short lastfieldidx;
  short num_inputs;
  char  oldheader[200];

};

Header::Header(std::istream& in) {
  in>>attr_start
    >>form_start
    >>fsize
    >>attr_size
    >>lastfieldidx
    >>num_inputs;

}

std::eek:stream& Header::print(std::eek:stream& out) {

  return out
    <<"attr_start   = " << (long)attr_start   << std::endl
    <<"form_start   = " << (long)form_start   << std::endl
    <<"fsize        = " << (short)fsize        << std::endl
    <<"attr_size    = " << (short)attr_size    << std::endl
    <<"lastfieldidx = " << (short)lastfieldidx << std::endl
    <<"num_inputs   = " << (short)num_inputs   << std::endl;

    //oldheader[200];

}

#include <fstream>
#include <sstream>

int main(int argc, char* argv[]) {
  if (argc < 1) { std::cerr<<"enter a file name"<<std::endl; return
-1; }

  std::ifstream inf(argv[1]);
  if(inf) {
    Header header(inf);
    header.print(std::cout);
    return 0;
  }







}
This appears to work, but it sure is brutal:

Header::Header(std::istream& in) {
in.read( reinterpret_cast<char*>(this), sizeof(*this) );
}
 
I

Ian Collins

On 07/16/11 10:36 AM, Hetware wrote:
I have some ancient c code that reads files stored in raw binary
format. The first 216 bytes are to be read into an instance of:
struct Sheader
{
long attr_start;
long form_start;
short fsize;
short attr_size;
short lastfieldidx;
short num_inputs;
char oldheader[200];
};
The way the C code does this is by fread ((void *)header, sizeof
(struct Sheader), 1, file); Is there a better way to do this in C++?

Not really (but you don't need 'struct" in there), it's awful in C and
it will still be awful in C++!

I'm not sure why struct is "awful".

Storing raw binary is awful!
At this point, its not a question
of whether I like it or not. I just need a way to get the first bytes
of the file read in as POD. I actually did make Sheader into a class,
but that really makes little difference. The following doesn't appear
to get the correct data into the class members:

#include<iostream>

class Header {
public:
Header(){}
Header(std::istream& in);
std::eek:stream& print(std::eek:stream& out);
private:
long attr_start;
long form_start;
short fsize;
short attr_size;
short lastfieldidx;
short num_inputs;
char oldheader[200];

};


Header::Header(std::istream& in) {
in>>attr_start}

You are using formatted input operators to read raw data (which is one
reason raw data is evil).

If you have to read in the raw binary data, you have to use in.read(),
which will look pretty much the same as the C code.
 
H

Hetware

On 07/16/11 10:36 AM, Hetware wrote:
I have some ancient c code that reads files stored in raw binary
format.  The first 216 bytes are to be read into an instance of:
struct Sheader
{
    long    attr_start;
    long    form_start;
    short   fsize;
    short   attr_size;
    short   lastfieldidx;
    short   num_inputs;
    char    oldheader[200];
};
The way the C code does this is by fread ((void *)header, sizeof
(struct Sheader), 1, file);  Is there a better way to do this in C++?
Not really (but you don't need 'struct" in there), it's awful in C and
it will still be awful in C++!
I'm not sure why struct is "awful".

Storing raw binary is awful!








At this point, its not a question
of whether I like it or not.  I just need a way to get the first bytes
of the file read in as POD.  I actually did make Sheader into a class,
but that really makes little difference.  The following doesn't appear
to get the correct data into the class members:
#include<iostream>

class Header {
public:
   Header(){}
   Header(std::istream&  in);
   std::eek:stream&  print(std::eek:stream&  out);
private:
   long    attr_start;
   long    form_start;
   short   fsize;
   short   attr_size;
   short   lastfieldidx;
   short num_inputs;
   char    oldheader[200];

Header::Header(std::istream&  in) {
   in>>attr_start
     >>form_start
     >>fsize
     >>attr_size
     >>lastfieldidx
     >>num_inputs;
}

You are using formatted input operators to read raw data (which is one
reason raw data is evil).

If you have to read in the raw binary data, you have to use in.read(),
which will look pretty much the same as the C code.

This code was delivered in 1993. Some of it is over 20 years old. I
don't have much say in how it is implemented. No one seems to know
how it works. That's why I'm trying to pick it apart. It's still in
production, so despite all of it's warts, it has done the job it was
intended for.
 
A

Arvind

I have some ancient c code that reads files stored in raw binary
format.  The first 216 bytes are to be read into an instance of:
struct Sheader
{
   long    attr_start;
   long    form_start;
   short   fsize;
   short   attr_size;
   short   lastfieldidx;
   short   num_inputs;
   char    oldheader[200];
};
The way the C code does this is by fread ((void *)header, sizeof
(struct Sheader), 1, file);  Is there a better way to do this in C++?
Not really (but you don't need 'struct" in there), it's awful in C and
it will still be awful in C++!

I'm not sure why struct is "awful".  At this point, its not a question
of whether I like it or not.  I just need a way to get the first bytes
of the file read in as POD.  I actually did make Sheader into a class,
but that really makes little difference.  The following doesn't appear
to get the correct data into the class members:

#include <iostream>

class Header {
public:
  Header(){}
  Header(std::istream& in);
  std::eek:stream& print(std::eek:stream& out);
private:
  long  attr_start;
  long  form_start;
  short fsize;
  short attr_size;
  short lastfieldidx;
  short num_inputs;
  char  oldheader[200];

};

Header::Header(std::istream& in) {
  in>>attr_start
    >>form_start
    >>fsize
    >>attr_size
    >>lastfieldidx
    >>num_inputs;

}

std::eek:stream& Header::print(std::eek:stream& out) {

  return out
    <<"attr_start   = " << (long)attr_start   << std::endl
    <<"form_start   = " << (long)form_start   << std::endl
    <<"fsize        = " << (short)fsize        << std::endl
    <<"attr_size    = " << (short)attr_size    << std::endl
    <<"lastfieldidx = " << (short)lastfieldidx << std::endl
    <<"num_inputs   = " << (short)num_inputs   << std::endl;

    //oldheader[200];

}

#include <fstream>
#include <sstream>

int main(int argc, char* argv[]) {
  if (argc < 1) { std::cerr<<"enter a file name"<<std::endl; return
-1; }

  std::ifstream inf(argv[1]);
  if(inf) {
    Header header(inf);
    header.print(std::cout);
    return 0;
  }







}

First of all you need to open file in binary mode (ios::binary) and
use read function inplace of extraction operator (>>). Then it will
work as it was working in your old 'C' code.
 
K

Krice

The way the C code does this is by fread ((void *)header, sizeof
(struct Sheader), 1, file);  Is there a better way to do this in C++?

No. But it could be possible to convert the old data to more C++
readable form. That way you could crush two bugs at the same time:
dump old binary data and make better, more portable loader for
the data.
 
N

Nobody

I'm not sure why struct is "awful".

Because the size, alignment, and representation (byte-order,
ones-complement vs twos-complement vs sign-bit, etc) of fundamental types
are implementation-dependent, as are the rules which determine the layout
of a structure.
 
P

Paul N

I just need a way to get the first bytes
of the file read in as POD.  I actually did make Sheader into a class,
but that really makes little difference.  The following doesn't appear
to get the correct data into the class members:

#include <iostream>

class Header {
public:
  Header(){}
  Header(std::istream& in);
  std::eek:stream& print(std::eek:stream& out);
private:
  long  attr_start;
  long  form_start;
  short fsize;
  short attr_size;
  short lastfieldidx;
  short num_inputs;
  char  oldheader[200];

};

As others have said, it's a bit risky reading a struct in one go as a
chuck of binary. But I think you're asking for trouble if, as you do
above, you then change what the struct is. I'd be inclined to leave
the struct exacty how it was, and if you want it to do more things
then create a class to do these things that has the original struct as
a member.

The reliable thing to do would instead be to read in the data as a
succession of values and do the right thing with each one in turn, but
this is probably unjustified.
 
J

Juha Nieminen

Paavo Helde said:
There is nothing wrong with fread() itself. However, reading directly
into a struct is not so good idea, this may easily break down with the
next compiler version or platform change.

It's unlikely that a simple compiler version change will change the
geometry of a struct (because it would most probably break many things
even in the standard C library). However, it's true that a change in
platform (eg. 32-bit vs 64-bit, or eg. Intel vs. UltraSparc) will probably
break it.
 
H

hanukas

I have some ancient c code that reads files stored in raw binary
format.  The first 216 bytes are to be read into an instance of:

struct Sheader
{
        long    attr_start;
        long    form_start;
        short   fsize;
        short   attr_size;
        short   lastfieldidx;
        short   num_inputs;
        char    oldheader[200];

};

The way the C code does this is by fread ((void *)header, sizeof
(struct Sheader), 1, file);  Is there a better way to do this in C++?

You have to parse the data in the file into the struct; for that you
have many alternatives. Reading from the file byte-by-byte is rubbish
and not recommended. Allocate the buffer which is size of the header:

char* buffer = new char[216];

Then fill the buffer with block read. fread works just fine. Or, if
you have a framework to use which has memory mapped I/O, even better:
just lock the region of the file (the first 216 bytes?) and use the
acquired pointer to the mapped region.

Either way, you will end up having a pointer:

char* buffer; // points to the 216 bytes of header data

Then you parse the data. It's useful to create few helper functions
for the parsing, example:

long read_long(char*& p)
{
long value = p[0] | (p[1] << 8) | (p[2] << 16) | p[3] << 24);
p += 4;
return value;
}

Something like that. If you encapsulate the parser into a class, you
don't need to pass the pointer around (you will still be passing
instance of the parser; "this" pointer around but that's another
story).

Also, depending on the endianess of the file you might want to modify
the byte-order in above read_long function (/method).

That's the main gist of it pretty much. I use parser class like that
in all the code I wrote when dealing with binary objects. The
constructor takes pointer to the data and size (for overflow checking
purposes and such) and then just parse away! Main point is to just
dead with the data as region of memory, this way works nicely even
with game consoles like XBOX and PS3, old and new operating systems,
etc. I advocate this as the "best" approach, forget diddling about
with the C++ streams - sure feel free to use them to initialize the
pointer for the parser but it's not really the critical detail.

This is simple, efficient, portable .. there really is no downside. If
you however, read from file, parse, read from file, parse and repeat..
the code isn't any simpler to write but the performance will suck
eventually. If you pick up bad habits, you'll stick with them as they
grow into you - that's bad.

This isn't premature optimization in case someone suggests that. This
is just good practice. Have a nice day.
 
H

Hetware

You have to parse the data in the file into the struct; for that you
have many alternatives. Reading from the file byte-by-byte is rubbish
and not recommended. Allocate the buffer which is size of the header:

char* buffer = new char[216];

Then fill the buffer with block read. fread works just fine. Or, if
you have a framework to use which has memory mapped I/O, even better:
just lock the region of the file (the first 216 bytes?) and use the
acquired pointer to the mapped region.

Either way, you will end up having a pointer:

char* buffer; // points to the 216 bytes of header data

Then you parse the data. It's useful to create few helper functions
for the parsing, example:

long read_long(char*& p)
{
long value = p[0] | (p[1] << 8) | (p[2] << 16) | p[3] << 24);
p += 4;
return value;
}

Something like that. If you encapsulate the parser into a class, you
don't need to pass the pointer around (you will still be passing
instance of the parser; "this" pointer around but that's another
story).

Also, depending on the endianess of the file you might want to modify
the byte-order in above read_long function (/method).

That's the main gist of it pretty much. I use parser class like that
in all the code I wrote when dealing with binary objects. The
constructor takes pointer to the data and size (for overflow checking
purposes and such) and then just parse away! Main point is to just
dead with the data as region of memory, this way works nicely even
with game consoles like XBOX and PS3, old and new operating systems,
etc. I advocate this as the "best" approach, forget diddling about
with the C++ streams - sure feel free to use them to initialize the
pointer for the parser but it's not really the critical detail.

This is simple, efficient, portable .. there really is no downside. If
you however, read from file, parse, read from file, parse and repeat..
the code isn't any simpler to write but the performance will suck
eventually. If you pick up bad habits, you'll stick with them as they
grow into you - that's bad.

This isn't premature optimization in case someone suggests that. This
is just good practice. Have a nice day.

Your code still appears to depend on the specifics of the underlying architecture and compiler configuration. Since I can rely on the binary structure of the files as long as the platform remains the same, it's safe to in.read(reinterpret_cast<char*>(&h), sizeof(h)); I don't particularly like that approach, but it works for what I'm doing. I create struct instances to hold the data as it's read in. I then pass the individual data members as parameters to a corresponding class structure.

It's cumbersome and ugly, but gives me the advantage that the new data classes make no assumptions about the underlying data representation.

E.g., Header is a class corresponding to Sheader h struct.

std::ifstream in(_fileName.c_str(), std::ios::binary);
if(in) {
Sheader h;
in.read(reinterpret_cast<char*>(&h), sizeof(h));
_header=Header(h.attr_start
,h.form_start
,h.fsize
,h.attr_size
,h.lastfieldidx
,h.num_inputs
,std::string(h.oldheader, 200)
);

This code lives in the constructor of a Form class which has a Header member object. This isolates the ugliness to one function. I will need a corresponding binary serialization function to save the Form in the current format.

I will likely create an XML representation of the Form class. Unfortunately this will likely remain mostly an academic exercise. My immediate goal is to understand the semantics of the underlying data.
 
J

Jorgen Grahn

You have to parse the data in the file into the struct; for that you
have many alternatives. Reading from the file byte-by-byte is rubbish
and not recommended. Allocate the buffer which is size of the header:

char* buffer = new char[216];

Then fill the buffer with block read. fread works just fine. Or, if
you have a framework to use which has memory mapped I/O, even better:
just lock the region of the file (the first 216 bytes?) and use the
acquired pointer to the mapped region.

Either way, you will end up having a pointer:

char* buffer; // points to the 216 bytes of header data

Then you parse the data. It's useful to create few helper functions
for the parsing, example:

long read_long(char*& p)
{
long value = p[0] | (p[1] << 8) | (p[2] << 16) | p[3] << 24);
p += 4;
return value;
}

Something like that. If you encapsulate the parser into a class, you
don't need to pass the pointer around (you will still be passing
instance of the parser; "this" pointer around but that's another
story).

Also, depending on the endianess of the file you might want to modify
the byte-order in above read_long function (/method).

That's the main gist of it pretty much. I use parser class like that
in all the code I wrote when dealing with binary objects. The
constructor takes pointer to the data and size (for overflow checking
purposes and such) and then just parse away! Main point is to just
dead with the data as region of memory, this way works nicely even
with game consoles like XBOX and PS3, old and new operating systems,
etc. I advocate this as the "best" approach, forget diddling about
with the C++ streams - sure feel free to use them to initialize the
pointer for the parser but it's not really the critical detail.

This is simple, efficient, portable .. there really is no downside. If
you however, read from file, parse, read from file, parse and repeat..
the code isn't any simpler to write but the performance will suck
eventually. If you pick up bad habits, you'll stick with them as they
grow into you - that's bad.

This isn't premature optimization in case someone suggests that. This
is just good practice. Have a nice day.
Your code still appears to depend on the specifics of the underlying
architecture and compiler configuration. [...]

How, exactly? Replace his long with uint32_t (or unsigned int plus an
assertion that it's at least 32 bits) and his posting would describe
my own code, which I believe is fairly portable.

/Jorgen

PS. Please wrap your lines. Some of them were ~500 characters wide.
 
J

James Kanze

I have some ancient c code that reads files stored in raw binary
format. The first 216 bytes are to be read into an instance of:
struct Sheader
{
long attr_start;
long form_start;
short fsize;
short attr_size;
short lastfieldidx;
short num_inputs;
char oldheader[200];
};
The way the C code does this is by fread ((void *)header, sizeof
(struct Sheader), 1, file); Is there a better way to do this in C++?

Of course, that doesn't even work portably in C.
You have to parse the data in the file into the struct; for that you
have many alternatives. Reading from the file byte-by-byte is rubbish
and not recommended.

Why? std::istream has been designed so that things like
std::istream::get() can easily be inlined (since they don't call
a virtual function if there is data in the buffer), so the
overhead shouldn't be that bad. Other than that, it's really a
style issue as to whether you prefer to use std::istream::get or
play with pointers or iterators. (Having said that, I
personally prefer the pointer solution as well. Old habits,
etc.)
Allocate the buffer which is size of the header:
char* buffer = new char[216];
Then fill the buffer with block read. fread works just fine.

As does std::istream::read. In both cases, don't forget to open
the stream in binary mode.
Or, if
you have a framework to use which has memory mapped I/O, even better:
just lock the region of the file (the first 216 bytes?) and use the
acquired pointer to the mapped region.
Either way, you will end up having a pointer:
char* buffer; // points to the 216 bytes of header data
Then you parse the data. It's useful to create few helper functions
for the parsing, example:
long read_long(char*& p)
{
long value = p[0] | (p[1] << 8) | (p[2] << 16) | p[3] << 24);
p += 4;
return value;
}

Just a nit, but I tend to convert the pointer to unsigned char*
(or unsigned char const* when reading). The use of an operator
(<<) provokes integral promotion, and if char is signed (as it
usually is by default), you're likely to get some bits set that
you didn't expect.

More generally, I've usually seen the convention that plain char
is only used for characters; small integers are signed char, and
raw memory is unsigned char.
Something like that. If you encapsulate the parser into a class, you
don't need to pass the pointer around (you will still be passing
instance of the parser; "this" pointer around but that's another
story).
Also, depending on the endianess of the file you might want to modify
the byte-order in above read_long function (/method).
That's the main gist of it pretty much. I use parser class like that
in all the code I wrote when dealing with binary objects. The
constructor takes pointer to the data and size (for overflow checking
purposes and such) and then just parse away! Main point is to just
dead with the data as region of memory, this way works nicely even
with game consoles like XBOX and PS3, old and new operating systems,
etc. I advocate this as the "best" approach, forget diddling about
with the C++ streams - sure feel free to use them to initialize the
pointer for the parser but it's not really the critical detail.

Another variant I've seen used is to access the streambuf
directly. For raw data, that's often as convenient as going
through istream.
 
G

gwowen

You are using formatted input operators to read raw data (which is one
reason raw data is evil).

I now have a vision of Ian's computer in which his video, music and
photographs are stored as Base64 encoded XML ;)

There's nothing wrong with raw binary data as long as you:
i) specify the widths and endianness.
ii) Don't write/read whole structures where you can't be absolutely
100% sure the padding won't change... and probably not even then.

Next time you're working on an SoC with 32k of non-volatile storage,
you'll come round to binary data.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,262
Messages
2,571,047
Members
48,769
Latest member
Clifft

Latest Threads

Top