Writing a class (with data) to a binary file

Sebastian Kemi

How should I write a class to a file?
Would this example work:

object *myobject = 0;
tfile.write(reinterpret_cast<char *>(myobject), sizeof(*object));

/ sebek
 
Jakob Bieling

Sebastian Kemi said:
How should I write a class to a file?
Would this example work:

object *myobject = 0;
tfile.write(reinterpret_cast<char *>(myobject), sizeof(*object));


If you plan on reading the data back into a class again, then no, it is
not guaranteed to work. Give your class a write and a read method, which
you use to write/read the object to/from a file, so you are left with:

my_object.write (tfile);
my_object.read (tfile);

or something like that.
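
A minimal sketch of what such methods could look like, assuming a hypothetical class Point with two int members and a whitespace-separated text format. None of these names come from the posts above; they are only there to make the idea concrete.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical example class; the members and the text format are assumptions.
class Point {
public:
    Point() : x_(0), y_(0) {}
    Point(int x, int y) : x_(x), y_(y) {}

    // Write the state member by member instead of dumping raw memory.
    void write(std::ostream& out) const { out << x_ << ' ' << y_ << '\n'; }

    // Read the members back in the same order they were written.
    // Returns false and leaves the object unchanged if the read fails.
    bool read(std::istream& in) {
        int x = 0, y = 0;
        if (!(in >> x >> y)) return false;
        x_ = x;
        y_ = y;
        return true;
    }

    int x() const { return x_; }
    int y() const { return y_; }

private:
    int x_;
    int y_;
};
```

Any std::ostream or std::istream works here, so the same two methods can be exercised against a std::stringstream in a test and a std::fstream in the real program.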

hth
 
WW

Sebastian said:
How should I write a class to a file?
Would this example work:

object *myobject = 0;
tfile.write(reinterpret_cast<char *>(myobject), sizeof(*object));

Khm. Why do you dereference a NULL pointer?

No, it is not guaranteed to work. First of all, the member function called
will possibly try to dereference your NULL pointer. Why possibly? Because
you are using reinterpret_cast to convert the pointer. The behavior of
reinterpret_cast is *always* implementation defined. It means that on
systems where the in-memory bit image of a char * is not the same as that
of an object * (there are such systems) you will pass garbage into the
function.

Having said all that, please read the other reply as well. Usually it is not
a good idea to write out objects in their binary form, especially
polymorphic ones.
 
Sebastian Kemi

The NULL pointer was just an example. In fact, it's a somewhat more complex
class.
 
Gianni Mariani

Sebastian said:
How should I write a class to a file?
Would this example work:

object *myobject = 0;
tfile.write(reinterpret_cast<char *>(myobject), sizeof(*object));

You're trying to write binary data. You could theoretically write the
object out as text and read the values back in. However, if you want
to write binary data, these are the issues.

If you only need to write the class and read it back with the same
program, then the only issue you'll have is pointers and references.

For example:

struct A
{
A * next;
int value;
};

If you write this to a file and read it back at a later time, you have
no guarantee that the object "next" pointed to is still at that address.
So you can't write pointers out. One (of many) ways around this is to do
something like:

struct A
{
typedef int Aindex;
Aindex next;
int value;
};


Aindex is an index in an array of A* objects.
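
A sketch of how the index version might be saved and restored. It assumes the nodes live in a std::vector, that -1 marks the end of a chain, and that the file is only read back by the same build of the same program. Node, save, load and sum_chain are illustrative names, not anything from this thread.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Index-linked node: next is a position in the vector, or -1 for "none".
struct Node {
    int next;
    int value;
};

// Raw dump of the element count followed by the node array.
void save(std::ostream& out, const std::vector<Node>& nodes) {
    const std::size_t count = nodes.size();
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    if (count != 0) {
        out.write(reinterpret_cast<const char*>(nodes.data()),
                  static_cast<std::streamsize>(count * sizeof(Node)));
    }
}

// Read the count, then the nodes; the indices are valid again immediately.
bool load(std::istream& in, std::vector<Node>& nodes) {
    std::size_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count)) return false;
    std::vector<Node> tmp(count);
    if (count != 0 &&
        !in.read(reinterpret_cast<char*>(tmp.data()),
                 static_cast<std::streamsize>(count * sizeof(Node)))) {
        return false;
    }
    nodes.swap(tmp);
    return true;
}

// Follow the chain starting at index head and add up the values.
int sum_chain(const std::vector<Node>& nodes, int head) {
    int total = 0;
    for (int i = head; i != -1; i = nodes[i].next) total += nodes[i].value;
    return total;
}
```

Because the links are positions rather than addresses, they survive the round trip no matter where the vector ends up in memory after loading.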

Another issue is hidden pointers. When you declare a virtual method, the
compiler may create a "vtable" and store a hidden pointer to it in each
object.

struct B
{
virtual void Meth();

int value;
};

This struct contains implementation-defined magic to manage the
"virtual" method. There are no guarantees that writing it out and reading
it back will work.

So you say these won't be issues, right? The next thing you'll want to do
is read the file on different architectures.

You'll notice that there are big and little endian architectures.

PowerPC, MIPS, HPPA, 680xx, Sparc : big endian

X86, VAX, MIPS, Alpha : little endian

PDP-11 : weird (mixed endian)

Some processors support both endianness (like MIPS).

So writing and reading a binary value will be an interesting exercise.
You're also now dealing with different packing from different compilers.


                                      +---+---+---+---+
int x = 0x01020304;               --> | 4 | 3 | 2 | 1 |  little endian
                                      +---+---+---+---+

                                      +---+---+---+---+
int x = 0x01020304;               --> | 1 | 2 | 3 | 4 |  big endian
                                      +---+---+---+---+

                                      +---+---+---+---+
int x = 0x01020304;               --> | 2 | 1 | 4 | 3 |  PDP endian (not sure)
                                      +---+---+---+---+

                                      +---+---+---+---+
char z[4] = {'A', 'B', 'C', 'D'}; --> | A | B | C | D |  little and big endian
                                      +---+---+---+---+


The 2 byte and 8 byte integer types also have the same issues.

sizeof(int) on platform A is not necessarily equal to sizeof(int) on
platform B.

Since the sizes of the built-in data types are implementation defined,
you will need to create a bunch of typedefs.

typedef unsigned char f_u_int8;
typedef signed char f_s_int8;
typedef unsigned short f_u_int16;
.... you get the idea.

These typedefs are architecture dependent, and in some cases you may have
to make up your own struct.
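
On compilers that provide the C++11 header <cstdint>, the standard fixed-width aliases can stand in for hand-maintained per-architecture typedefs. The sketch below reuses only the f_ names that appear in this post and adds compile-time checks; the choice of checks is my own assumption about what a file format would care about.

```cpp
#include <climits>
#include <cstdint>

// File-format integer types mapped onto the standard fixed-width aliases.
typedef std::uint8_t  f_u_int8;
typedef std::int8_t   f_s_int8;
typedef std::uint16_t f_u_int16;
typedef std::uint32_t f_u_int32;

// Fail the build, rather than corrupt files, if an assumption is wrong.
static_assert(CHAR_BIT == 8, "file format assumes 8-bit bytes");
static_assert(sizeof(f_u_int16) == 2, "f_u_int16 must be 2 bytes");
static_assert(sizeof(f_u_int32) == 4, "f_u_int32 must be 4 bytes");
```

The fixed-width aliases are optional in the standard, so a platform that cannot provide them fails to compile instead of silently changing the file layout.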


Alignment issues

Some machines perform much faster if they perform aligned accesses; some
machines do not support unaligned accesses at all. MIPS, for example, has
two instructions, one for doing aligned loads and stores and another for
doing unaligned loads and stores (the unaligned load and store are
actually 2 machine instructions each, which transfer the top side and
bottom side of the value in separate cycles).

So now what?

Completely non-standard alert - most compilers support the #pragma pack
directive, which forces whatever alignment you want. Reading and
writing these packed structs will then do the right thing for your
particular architecture.


#pragma pack(1)

struct A
{
f_u_int8 x;
f_u_int32 y;
};

#pragma pack() // default packing

struct B
{
f_u_int8 x;
f_u_int32 y;
};

On architectures that do padding you'll find:
sizeof(A) == 5
sizeof(B) == 8

An alternative to this is to add your own padding so that the compiler has
no need to insert any.


struct B
{
f_u_int8 x;
f_u_int8 padding[3];
f_u_int32 y;
};
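
The layout you are counting on can be stated in code, so that a platform which lays the struct out differently refuses to compile. A sketch assuming C++11 (static_assert and <cstdint>); the struct name Record is hypothetical, and the expected offsets hold on common platforms but are not promised by the language.

```cpp
#include <cstddef>
#include <cstdint>

// Same shape as the manually padded struct: 1 + 3 + 4 bytes.
struct Record {
    std::uint8_t  x;
    std::uint8_t  padding[3];  // explicit filler so y starts at offset 4
    std::uint32_t y;
};

// These fire at compile time if the compiler inserts padding of its own.
static_assert(offsetof(Record, x) == 0, "x must be the first byte");
static_assert(offsetof(Record, y) == 4, "y must start at byte 4");
static_assert(sizeof(Record) == 8, "Record must be exactly 8 bytes");
```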

But how about byte order? Well, every time you read and write the int
from these structs you'll need to swap the bytes for your architecture.
Do a Google search for "networkorder gianni" and you'll find something.
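
One way to sidestep per-architecture swapping is to build the file bytes with shifts, which gives the same result on any host. This is a different technique from a conditional byte swap and is not the code the search above would turn up. The function names put_u32_be and get_u32_be are made up for this sketch, and big-endian ("network order") is an arbitrary choice of file byte order.

```cpp
#include <cstdint>

// Store a 32-bit value as 4 bytes, most significant byte first.
void put_u32_be(unsigned char* out, std::uint32_t v) {
    out[0] = static_cast<unsigned char>((v >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(v & 0xFFu);
}

// Rebuild the value from 4 bytes written by put_u32_be.
std::uint32_t get_u32_be(const unsigned char* in) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

The shifts operate on the value, not on its in-memory representation, so the code never needs to know whether the host is little or big endian.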


Floats and doubles and long doubles I believe have endian issues as
well, I've never needed to check.
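
For floats, one common approach is to copy the bits into an integer of the same size and then treat that integer like any other when choosing a byte order. The sketch below assumes a 32-bit IEEE 754 float and refuses to compile otherwise; float_to_bits and bits_to_float are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE 754");

// Copy the object representation of a float into a 32-bit integer.
std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Inverse of float_to_bits.
float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

std::memcpy is used instead of a pointer cast because reading a float through a std::uint32_t pointer is undefined behavior.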

Finally, if you know you have a collection of objects you wish to write
to a file, you may use mapped files or read and write the entire file
rather than an object at a time. If you do this, you can layout the
objects yourself and you can employ "relative" pointers to create more
complex structures.

If you want to read and write binary data then the rules are:

a) No classes with virtual methods.
b) No absolute pointers and no references
c) Packed structures.
d) No virtual inheritance
e) Compatible type-defs
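
Some of these rules can be checked by the compiler. A sketch assuming C++11 <type_traits>; the trait name is_raw_writable and the two example structs are hypothetical. It catches virtual methods and virtual inheritance, but it cannot catch rule b): a struct holding a raw pointer is still trivially copyable, so pointers have to be reviewed by hand.

```cpp
#include <type_traits>

// True only for types whose bytes can be copied around as a block.
template <typename T>
struct is_raw_writable
    : std::integral_constant<bool,
          std::is_trivially_copyable<T>::value &&
          std::is_standard_layout<T>::value &&
          !std::is_polymorphic<T>::value> {};

// Passes: plain data, with an index instead of a pointer.
struct PlainRecord {
    int next_index;
    int value;
};

// Fails: the virtual destructor makes the type polymorphic.
struct VirtualRecord {
    virtual ~VirtualRecord() {}
    int value;
};

static_assert(is_raw_writable<PlainRecord>::value,
              "PlainRecord should be safe to write as raw bytes");
static_assert(!is_raw_writable<VirtualRecord>::value,
              "VirtualRecord must not be written as raw bytes");
```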

The alternative (as another poster suggested) is to write serialize and
deserialize routines, but you'll still face the same issues.

Even though this works on all the implementations of C and C++ compilers
I know, it is still implementation defined. The contingency is to place
this code in a very small section of your code and be prepared to have
to re-implement the entire chunk of code when you run into an
architecture that breaks your assumptions.

However, reading and writing binary files can bring HUGE performance
gains. I once came across some numerical code that read and wrote large
datasets. These datasets were 40-100MB. The performance was
horrendous. Using mapped files and binary data made the reading and
writing virtually zero cost, and it improved the performance of the
product by nearly 10x and in some tests by over 1000x. Be careful -
this is one application and the bottleneck was clearly identified.
This may not be where your application spends its time.

This post is a bit of a mish-mash but I think it covers all the issues.

Good luck.

Let me know if you have any other questions.
 
Kevin Goodsell

Sebastian said:
How should I write a class to a file?
Would this example work:

object *myobject = 0;
tfile.write(reinterpret_cast<char *>(myobject), sizeof(*object));

As you might have gathered from the replies so far, there is no simple
answer. The most general answer is something like this:

1) Define a file format (or part of a format) to represent your object type.

2) Write the class's state out using the specified format.

Now, one problem with the method you demonstrated above is that it will
use a particular format, but that format is defined by the C++
implementation, not you. A different implementation, or an updated
version of the same implementation, or even the same implementation with
different options, may make the format different. This obviously poses a
few problems. Files may not read in correctly, examining the file is
made more complicated, comparing files from different versions of the
program may be difficult, etc. Fundamentally, it's better if *you*
decide the format, and write code to handle the I/O.
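
As one small illustration of deciding the format yourself, a file can begin with a signature and a version number, so that a reader can reject files it does not understand. This is only a sketch: the signature bytes "MYOB", the one-byte version field, and the function names are all invented for the example.

```cpp
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical file signature and format version.
const char kMagic[4] = {'M', 'Y', 'O', 'B'};
const unsigned char kVersion = 1;

// Emit the 4 signature bytes followed by a 1-byte version.
void write_header(std::ostream& out) {
    out.write(kMagic, sizeof kMagic);
    out.put(static_cast<char>(kVersion));
}

// Return true only if the stream starts with the expected signature
// and a version this code knows how to read.
bool read_header(std::istream& in) {
    char magic[4];
    if (!in.read(magic, sizeof magic)) return false;
    if (std::memcmp(magic, kMagic, sizeof kMagic) != 0) return false;
    const int version = in.get();  // EOF (a negative value) also fails here
    return version == kVersion;
}
```

With a version in the header, a later release of the program can change the layout and still recognise, or deliberately refuse, older files.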

Another problem is whether your object's state is properly recorded
using this method. There are two issues here: 1) Did you get the
*entire* state of the object? 2) Did you get *only* the state of the
object? For question 1, the answer may be 'no' if your class contains
things like pointers, references, objects that manage resources (like
vectors or file streams) or objects that themselves contain any of these
things. For question 2, the answer may be 'no' if your class contains
virtual functions (in which case you might output a bunch of junk you
don't really want in the file - usually the 'vtable', but this is an
implementation detail). Also, your object probably has some padding
bytes between elements that you don't really need in the file.

I'm afraid I haven't really given you an answer, but at least those are
things to think about. Generally, you should have Write and Read
functions in your class to encapsulate the processes of outputting and
inputting objects of that class. How exactly those functions work is up
to you.

-Kevin
 
