Sebastian said:
How should I write a class to a file?
Would this example work:
object *myobject = 0;
tfile.write(reinterpret_cast<char *>(myobject), sizeof(*object));
You're trying to write binary data. You could, in theory, write the
object out as text and read the values back in. However, if you want
to write binary data, these are the issues.
If you wish to read and write the class from the same program, then the
only issues you'll have are pointers and references.
For example:
struct A
{
A * next;
int value;
};
If you write this to a file and read it back at a later time, you have
no guarantee that the object "next" pointed to is still there. So you can't
write pointers out. One (of many) ways around this is to do something like:
struct A
{
typedef int Aindex;
Aindex next;
int value;
};
Aindex is an index in an array of A* objects.
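To make that concrete, here's a minimal sketch of the index-in-an-array
idea (the pool container and sum_chain helper are my own illustrative
names, not anything standard):

```cpp
#include <cstdint>
#include <vector>

struct A {
    typedef std::int32_t Aindex;   // -1 means "no next"
    Aindex next;
    int value;
};

// Follow the chain starting at 'start' and sum the values.
// The index replaces the pointer, so these structs can be written
// to a file and read back without any fix-ups: the index is valid
// as long as the array is reloaded in the same order.
int sum_chain(const std::vector<A>& pool, A::Aindex start) {
    int total = 0;
    for (A::Aindex i = start; i != -1; i = pool[i].next)
        total += pool[i].value;
    return total;
}
```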
Another issue is hidden pointers. When you create a virtual method, the
compiler typically creates a "vtable" and stores a pointer to it in the
object.
struct B
{
virtual void Meth();
int value;
};
This struct contains implementation-defined magic to manage the
"virtual" method. There are no guarantees that writing it out raw will work.
So you say these won't be issues, right? The next thing you'll want to do
is read the file on different architectures.
You'll notice that there are big and little endian architectures.
PowerPC, MIPS, HPPA, 680xx, Sparc : big endian
X86, VAX, MIPS, Alpha : little endian
PDP - weird (middle endian)
Some processors support both endianness (like MIPS).
So writing and reading a binary value will be an interesting exercise.
You're also now dealing with different packing from different compilers.
+---+---+---+---+
int x = 0x01020304; --> | 4 | 3 | 2 | 1 | little endian
+---+---+---+---+
+---+---+---+---+
int x = 0x01020304; --> | 1 | 2 | 3 | 4 | big endian
+---+---+---+---+
+---+---+---+---+
int x = 0x01020304; --> | 2 | 1 | 4 | 3 | PDP (middle) endian
+---+---+---+---+
+---+---+---+---+
char z[4] = {'A','B','C','D'}; --> | A | B | C | D | little and big endian
+---+---+---+---+
The 2-byte and 8-byte integer types have the same issues.
int (on platform a) != int (on platform b), in general.
Since the sizes of the built-in data types are implementation defined,
you will need to create a bunch of typedefs.
typedef unsigned char f_u_int8;
typedef signed char f_s_int8;
typedef unsigned short f_u_int16;
.... you get the idea.
These typedefs are architecture dependent, and in some cases you may have
to make up your own struct.
Alignment issues
Some machines perform much faster if they perform aligned accesses; some
machines do not support unaligned accesses at all. MIPS, for example, has
two sets of instructions: one for doing aligned loads and stores and
another for doing unaligned loads and stores (the unaligned load and
store are actually 2 machine instructions each, which transfer the top
and bottom parts of the value in separate cycles).
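One portable way to sidestep alignment when reading from a raw byte
buffer is to memcpy into a properly aligned local variable. A small
sketch (read_u32 is my own illustrative name):

```cpp
#include <cstdint>
#include <cstring>

// Read a 32-bit value from a possibly unaligned buffer position.
// memcpy into an aligned local is well defined; compilers typically
// turn it into a single load on machines that allow unaligned access,
// and into safe byte loads on machines that don't.
std::uint32_t read_u32(const unsigned char* p) {
    std::uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;   // still in the file's byte order; swap separately if needed
}
```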
So now what?
Completely non-standard alert - most compilers support the #pragma pack
directive, which forces whatever alignment you want, and reading and
writing these structs will cause the right thing to happen for your
particular architecture.
#pragma pack(1)
struct A
{
f_u_int8 x;
f_u_int32 y;
};
#pragma pack() // default packing
struct B
{
f_u_int8 x;
f_u_int32 y;
};
On architectures that do padding you'll find:
sizeof(A) == 5
sizeof(B) == 8
An alternative to this is to add your own padding so there will be no
compiler-inserted alignment.
struct B
{
f_u_int8 x;
f_u_int8 padding[3];
f_u_int32 y;
};
But what about byte order? Every time you read and write the int
from these structs you'll need to swap the bytes for your architecture.
Do a Google search for "networkorder gianni" and you'll find something.
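A byte-swap routine is only a few shifts and masks. Here's a sketch
(the function names are mine, and I've picked big endian as the
hypothetical file order):

```cpp
#include <cstdint>

// Reverse the bytes of a 32-bit value. Applied twice, it is the
// identity, so the same routine converts in both directions.
std::uint32_t swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

// Detect host endianness at run time by inspecting the first byte.
bool host_is_little_endian() {
    std::uint16_t probe = 1;
    return *reinterpret_cast<unsigned char*>(&probe) == 1;
}

// Convert a value stored big endian in the file to host order.
std::uint32_t from_big_endian(std::uint32_t v) {
    return host_is_little_endian() ? swap32(v) : v;
}
```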
Floats, doubles and long doubles have endian issues as well, I believe;
I've never needed to check.
Finally, if you know you have a collection of objects you wish to write
to a file, you may use mapped files or read and write the entire file
rather than an object at a time. If you do this, you can layout the
objects yourself and you can employ "relative" pointers to create more
complex structures.
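A "relative" pointer can be sketched as a field that stores the target's
offset from its own address rather than an absolute address (this RelPtr
is my own illustrative sketch, not production code):

```cpp
#include <cstddef>

// A self-relative pointer. If the whole block of objects is written
// out and read back (or mapped) at a different address, the links
// still resolve, because the relative layout inside the block is
// unchanged. An offset of 0 doubles as null, so a RelPtr can never
// point at itself.
struct RelPtr {
    std::ptrdiff_t offset;   // 0 means null

    void set(const void* target) {
        offset = target
            ? reinterpret_cast<const char*>(target)
              - reinterpret_cast<const char*>(this)
            : 0;
    }
    void* get() {
        return offset ? reinterpret_cast<char*>(this) + offset : nullptr;
    }
};
```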
If you want to read and write binary data, then the rules are:
a) No classes with virtual methods.
b) No absolute pointers and no references
c) Packed structures.
d) No virtual inheritance
e) Compatible type-defs
There are alternatives (as another poster suggested), such as writing
serialize and deserialize routines, but you'll still face the same issues.
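For what it's worth, a field-by-field serialize/deserialize sketch can
address padding and byte order at once by defining the file layout
explicitly (Record and its 5-byte big-endian layout are my own example):

```cpp
#include <cstdint>
#include <vector>

struct Record {
    std::uint8_t  flag;
    std::uint32_t value;
};

// Serialize field by field into a fixed, unpadded layout:
// 1 byte flag, then 4 bytes value, most significant byte first.
// No struct is ever written raw, so compiler padding and host
// endianness never reach the file.
void serialize(const Record& r, std::vector<unsigned char>& out) {
    out.push_back(r.flag);
    out.push_back(static_cast<unsigned char>(r.value >> 24));
    out.push_back(static_cast<unsigned char>(r.value >> 16));
    out.push_back(static_cast<unsigned char>(r.value >> 8));
    out.push_back(static_cast<unsigned char>(r.value));
}

// Rebuild the record from the same 5-byte layout.
Record deserialize(const unsigned char* in) {
    Record r;
    r.flag  = in[0];
    r.value = (std::uint32_t(in[1]) << 24) |
              (std::uint32_t(in[2]) << 16) |
              (std::uint32_t(in[3]) <<  8) |
               std::uint32_t(in[4]);
    return r;
}
```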
Even though this works on all the implementations of C and C++ compilers
I know, it is still implementation defined. The contingency is to confine
this code to a very small section of your program and be prepared to
re-implement that entire chunk of code when you run into an architecture
that breaks your assumptions.
However, reading and writing binary files can bring HUGE performance
gains. I once came across some numerical code that read and wrote large
datasets, 40-100MB each. The performance was horrendous. Using mapped
files and binary data made the reading and writing virtually zero cost
and improved the performance of the product by nearly 10x, and in some
tests over 1000x. Be careful - this was one application, and the
bottleneck was clearly identified.
This may not be where your application spends its time.
This post is a bit of a mish-mash but I think it covers all the issues.
Good luck.
Let me know if you have any other questions.