Saving and reloading a container to/from disk


jacob navia

Hi

What would be the best way to save and reload later a container
to/from disk in C++?

Thanks
 

jacob navia

Ian Collins wrote:
It depends, there isn't really a best way. Do you want portability?

http://www.boost.org/doc/libs/1_42_0/libs/serialization/doc/index.html

Is a good place to start.

Thanks.

I downloaded the boost libraries, and compiled them on my machine.
Unzipped, the source code is 269MB. Compilation took 14 minutes.

Machine Mac-pro OS X with 8 CPUs and 12GB RAM.

Then, I compiled the example of the serialization.
The class being saved/restored looks like this:

original schedule
6:24 bob
0x0x100200440 34?135'52.56" 134?22'78.3" 24th Street and 10th Avenue
0x0x1002004d0 35?137'23.456" 133?35'54.12" State street and Cathedral
Vista Lane
0x0x100200530 35?136'15.456" 133?32'15.3" White House

when restored, the restored stuff looks like this:
6:24
0x0x100200e30 34?135'52.56" 134?22'78.3" 24th Street and 10th Avenue
0x0x100200f40 35?137'23.456" 133?35'54.12" State street and Cathedral
Vista Lane
0x0x1002012c0 35?136'15.456" 133?32'15.3" White House


As you can see the name "bob" is missing.

The same bug appears with all other saved/restored class instances.
I am not fluent enough in C++ to figure this out, sorry.

But maybe this is a bug in the example; I can't determine what
the reason is.

jacob
 

Andrew Poelstra

Hi

What would be the best way to save and reload later a container
to/from disk in C++?

Thanks

I'm not sure how best to deal with binary data, but for
numbers and text, I would use JSON (escaping special
characters as appropriate, etc).

It's well-understood, lightweight and simple, and
portable across many languages.

With binary data you could base64-encode it or something,
but you'll be looking at significant bloat for large
structures. Or you could NUL-separate fields, replacing
actual NUL characters with \001s, and actual \001s with
\001\001s.
 

jacob navia

Andrew Poelstra wrote:
I'm not sure how best to deal with binary data, but for
numbers and text, I would use JSON (escaping special
characters as appropriate, etc).

It's well-understood, lightweight and simple, and
portable across many languages.

With binary data you could base64-encode it or something,
but you'll be looking at significant bloat for large
structures. Or you could NUL-separate fields, replacing
actual NUL characters with \001s, and actual \001s with
\001\001s.

Well, but that is a significant development. I thought that the
STL would provide something to save/restore a container.
 

Andrew Poelstra

Andrew Poelstra wrote:

Well, but that is a significant development. I thought that the
STL would provide something to save/restore a container.

I haven't looked that deeply into it, but the only serialization
method I have heard of is to override the << >> operators so you
can work with istreams. From there you have to iterate over your
container, structuring the data as you see fit.

I can't think of any way in general that STL containers could save
themselves without confusing the data with their own control
characters.
 

Kai-Uwe Bux

jacob said:
Andrew Poelstra wrote:

Well, but that is a significant development. I thought that the
STL would provide something to save/restore a container.

That would be a little tricky because everything is templated. Now, suppose
you want to save/restore a vector<T>. The most natural thing would be to use
operator<< and operator>> for serializing and deserializing vector elements.
That, however, would suppose that for elements of type T both operations are
truly inverse. This does not even hold for the standard types (e.g., double
or std::string).

Then, for set<T,C>, it is not clear what to do about the comparison
predicate. Also: would you save/restore allocator objects or would you
decide to ignore the issue?

For simple cases, there is std::copy() and the use of stream iterators.
Also, containers can be initialized from a pair of iterators. Everything
more complex needs a custom solution.
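
The simple case mentioned above could be sketched roughly as follows. The helper names save_ints and load_ints are made up for illustration, and the sketch assumes that whitespace-separated text is an acceptable file format and that the element type survives operator<< followed by operator>> (true for int, not true in general).

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: write every element followed by a newline.
bool save_ints(const std::vector<int>& v, const std::string& path)
{
    std::ofstream out(path);
    if (!out) return false;
    std::copy(v.begin(), v.end(), std::ostream_iterator<int>(out, "\n"));
    out.flush();
    return static_cast<bool>(out);
}

// Hypothetical helper: rebuild the container from a pair of stream iterators.
// A missing or unreadable file simply yields an empty vector.
std::vector<int> load_ints(const std::string& path)
{
    std::ifstream in(path);
    std::istream_iterator<int> first(in), last;
    return std::vector<int>(first, last);
}
```

Calling save_ints(v, "ints.txt") and later load_ints("ints.txt") should give back an equal vector. The same pattern does not carry over to std::string elements containing spaces, which is where a custom solution becomes necessary.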


Best

Kai-Uwe Bux
 

Brian

That would be a little tricky because everything is templated. Now, suppose
you want to save/restore a vector<T>. The most natural thing would be to use
operator<< and operator>> for serializing and deserializing vector elements.
That, however, would suppose that for elements of type T both operations are
truly inverse. This does not even hold for the standard types (e.g., double
or std::string).

Then, for set<T,C>, it is not clear what to do about the comparison
predicate. Also: would you save/restore allocator objects or would you
decide to ignore the issue?

I don't know any serialization library that does
anything with comparison predicates or the allocators.
It's certainly possible that users want to use
different comparison predicates in different contexts,
and that attempting to make them use the same one
would cause them problems.


Brian Wood
http://webEbenezer.net
(651) 251-9384
 

jacob navia

Kai-Uwe Bux wrote:
That would be a little tricky because everything is templated. Now, suppose
you want to save/restore a vector<T>. The most natural thing would be to use
operator<< and operator>> for serializing and deserializing vector elements.
That, however, would suppose that for elements of type T both operations are
truly inverse. This does not even hold for the standard types (e.g., double
or std::string).

Excuse me but I do not understand that. If I write a double value into a
file, I will obtain the same value when I read it later if I store it in
binary form.

fwrite(&Double,1,sizeof(double),stream);
fread(&Double,1,sizeof(double),stream);

will leave the value of Double unchanged. Obviously if you use the same
CPU type for both operations.

I just do not see how that could be wrong. Maybe you care to explain?

Thanks
Then, for set<T,C>, it is not clear what to do about the comparison
predicate.

The container could be read in an incomplete way so that most values are
retrieved but function pointers aren't.
Also: would you save/restore allocator objects or would you
decide to ignore the issue?

See above.
 

Ian Collins

Kai-Uwe Bux wrote:

Excuse me but I do not understand that. If I write a double value into a
file, I will obtain the same value when I read it later if I store it in
binary form.

fwrite(&Double,1,sizeof(double),stream);
fread(&Double,1,sizeof(double),stream);

will leave the value of Double unchanged. Obviously if you use the same
CPU type for both operations.

There's the issue, there isn't a universal interchange format for
doubles. For a serialisation scheme to be worthwhile, it would have to
be platform independent.
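
One common workaround, sketched here only under stated assumptions, is to fix the on-disk byte order yourself. The sketch assumes that both writer and reader use IEEE-754 binary64 doubles (checked at compile time) and that a double has the same byte order as a 64-bit integer on the platform, which holds on mainstream hardware but is not guaranteed by the language. The names put_double and get_double are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "sketch assumes IEEE-754 binary64 doubles");

// Hypothetical helper: write the bit pattern, most significant byte first.
void put_double(std::ostream& out, double d)
{
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);   // copy the representation
    for (int shift = 56; shift >= 0; shift -= 8)
        out.put(static_cast<char>((bits >> shift) & 0xFF));
}

// Hypothetical helper: read eight bytes back; returns false on short input.
bool get_double(std::istream& in, double& d)
{
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i) {
        const int c = in.get();
        if (c == std::istream::traits_type::eof()) return false;
        bits = (bits << 8) | static_cast<unsigned char>(c);
    }
    std::memcpy(&d, &bits, sizeof d);
    return true;
}
```

This does nothing for platforms with a non-IEEE double (the static_assert simply refuses to compile there), so it narrows the portability problem rather than solving it.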
The container could be read in an incomplete way so that most values are
retrieved but function pointers aren't.

Then you lose the container's meta-data. That information has to be
stored somewhere, either with the data, or in the code that retrieves
it. It's a fair call not to serialise it, but it is a limitation.

I use JSON to serialise data between applications and languages
(particularly in web applications), but it only preserves the data part
(at least for C and C++, dynamic languages can recover the object
structure as well). Most of the data I find I wish to transfer or
archive tends to be in simple containers like std::vector where the
meta-data tends to be less important. For more complex data structures,
I use XML. But I still have to provide my own in and out operators for
each non-POD type.
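
As an illustration of what such hand-written in and out operators might look like, here is a minimal sketch. The Person type is invented for the example, and the text format (a quoted name followed by a number) is an arbitrary choice, not JSON or XML. It relies on std::quoted, which needs C++14.

```cpp
#include <cassert>
#include <iomanip>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical non-POD type used only for illustration.
struct Person {
    std::string name;
    int age;
};

// Out operator: std::quoted wraps the name in double quotes and escapes
// embedded quotes, so names containing spaces survive the round trip.
std::ostream& operator<<(std::ostream& out, const Person& p)
{
    return out << std::quoted(p.name) << ' ' << p.age;
}

// In operator: reads back exactly what the out operator wrote.
std::istream& operator>>(std::istream& in, Person& p)
{
    return in >> std::quoted(p.name) >> p.age;
}
```

Once a type has such a pair of operators, containers of it can be written and read element by element, but every new type still needs its own pair.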

What C++ really needs for this is reflection!
 

robertwessel2

Kai-Uwe Bux wrote:


Excuse me but I do not understand that. If I write a double value into a
file, I will obtain the same value when I read it later if I store it in
binary form.

        fwrite(&Double,1,sizeof(double),stream);
        fread(&Double,1,sizeof(double),stream);

will leave the value of Double unchanged. Obviously if you use the same
CPU type for both operations.

I just do not see how that could be wrong. Maybe you care to explain?


Well, the C++ compiler for zOS can be run (with the equivalent of a
command line switch) to use either hex or binary (IEEE) float. Most
definitely not the same format, even if doubles are 64 bits long in
both cases. The saga of long doubles on Windows (and other x86
platforms) is another example, where some compilers treat long double
as a 64 bit IEEE double, and others use the 80 bit double extended
format.

Anyway, I'd say dumping a container to disk is significantly less
useful if it’s not portable. So binary formats present the usual
issues. At least a text-ish format should be an option, and then you
have issues trying to produce 100% faithful portable representations
of things like floats. For example, some IEEE implementations store
information about the type of NaN (beyond the QNaN/SNaN distinction)
in the mantissa.
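
For ordinary finite values, a text format can get close to faithful by printing enough significant digits. The following is a minimal sketch with hypothetical helper names. It assumes the standard library's decimal conversions are correctly rounded, which is typical but not something the sketch verifies, and it deliberately does not try to handle NaN payloads or infinities.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: print a finite double with max_digits10 significant
// digits (17 for IEEE binary64), enough to identify the value uniquely.
std::string double_to_text(double d)
{
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << d;
    return out.str();
}

// Hypothetical helper: parse the text back; returns 0.0 if parsing fails.
double text_to_double(const std::string& s)
{
    std::istringstream in(s);
    double d = 0.0;
    in >> d;
    return d;
}
```

With the default stream precision of six digits the same round trip would lose information for a value such as 0.1 + 0.2, so the precision setting is the essential part here.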
 

Kai-Uwe Bux

jacob said:
Kai-Uwe Bux wrote:

Excuse me but I do not understand that. If I write a double value into a
file, I will obtain the same value when I read it later if I store it in
binary form.

fwrite(&Double,1,sizeof(double),stream);
fread(&Double,1,sizeof(double),stream);

will leave the value of Double unchanged. Obviously if you use the same
CPU type for both operations.

I just do not see how that could be wrong. Maybe you care to explain?
[...]

a) I did not claim that your proposed code could be wrong. I made a claim
about operator<< and operator>> not being inverse for the type double. That
is a consequence of operator<< doing some rounding. For std::string it is a
consequence of the way white space is treated by operator<< and operator>>.

b) Your code will work for double (with the restrictions you mentioned about
being tied to a given CPU, or more accurately to a particular binary
format for doubles). But that code will not work for many other types, e.g.,
std::string. Thus, it does not address the main point I made, namely, that
the containers of the standard library are templated.

c) Another case where the templating causes a problem is something like
this: Suppose you have a hierarchy of classes such as

class Student {};
class Freshman : public Student {};
class Sophomore : public Student {};
class Junior : public Student {};
class Senior : public Student {};

and

typedef vector< Student* > Class;

If you want to serialize objects of type Class, you have to decide what to
do about the pointers. Very likely, you want to dump some unique student id,
probably retrieved by some member function. That is very particular to the
specific problem and a generic solution is unlikely to match your needs.

d) Summary: in any particular case, there is a valid solution; but there
seems to be no _generic_ solution in sight. The standard library does not
even attempt to give a generic solution.
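
Point (a) can be checked with a small experiment. The helper below is hypothetical and only tests whether a value survives operator<< followed by operator>> with default stream settings. It assumes the type is default-constructible and equality-comparable.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: write a value with operator<<, read it back with
// operator>>, and report whether the result equals the original.
template <typename T>
bool round_trips(const T& value)
{
    std::stringstream ss;
    ss << value;
    T back{};
    ss >> back;
    return back == value;
}
```

Under these assumptions round_trips(42) holds, while round_trips(std::string("State street")) fails because extraction stops at the first space, and round_trips(0.1 + 0.2) fails because the default output precision rounds the value.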

The container could be read in an incomplete way so that most values are
retrieved but function pointers aren't.


See above.

Yes, ignoring the issue is a _valid_ design decision in this case. I did
that too when I experimented with serialization.


Best

Kai-Uwe Bux
 

jacob navia

(e-mail address removed) wrote:
Well, the C++ compiler for zOS can be run (with the equivalent of a
command line switch) to use either hex or binary (IEEE) float. Most
definitely not the same format, even if doubles are 64 bits long in
both cases. The saga of long doubles on Windows (and other x86
platforms) is another example, where some compilers treat long double
as a 64 bit IEEE double, and others use the 80 bit double extended
format.

(1) Obviously if you are working with zOS you rather use the same
command line switch for reading and writing your double data.
Why is that so difficult to understand?
(2) Obviously too, double data is not portable among compilers that
use different representations of the data. If one compiler
implements long double as 64 bits and the other as 80 bits
they aren't compatible and you should recompile both the reader
and the writer.
Why is that so difficult to understand?

Here we arrive at philosophical questions. C++ is all about bells
and whistles. Rather than providing a solution that would be very useful
for most users but would fail in some special cases it is decided that
nothing should be provided so that everyone rolls its own.

Modulo bugs the Boost solution seems to be working. Other solutions were
presented in this discussion (http://webEbenezer.net by Brian)
 

Ian Collins

(e-mail address removed) wrote:

(1) Obviously if you are working with zOS you rather use the same
command line switch for reading and writing your double data.
Why is that so difficult to understand?

How do you enforce that?
(2) Obviously too, double data is not portable among compilers that
use different representations of the data. If one compiler
implements long double as 64 bits and the other as 80 bits
they aren't compatible and you should recompile both the reader
and the writer.

What if you don't have the source for one or both of them?
Why is that so difficult to understand?

It isn't difficult to understand, it's impractical. There isn't a
standard format for double.
Here we arrive at philosophical questions. C++ is all about bells
and whistles. Rather than providing a solution that would be very useful
for most users but would fail in some special cases it is decided that
nothing should be provided so that everyone rolls its own.

Have you read the responses here? The problem of serialisation is
complex and multi-layered. Sure a naive solution would work for a
subset of types, but that subset is small. As soon as you add any
complexity to your objects (even something as trivial as float or
pointers), the solution breaks down. So it isn't "very useful for most
users".
 

robertwessel2

(e-mail address removed) wrote:




(1) Obviously if you are working with zOS you rather use the same
     command line switch for reading and writing your double data.
     Why is that so difficult to understand?
(2) Obviously too, double data is not portable among compilers that
     use different representations of the data. If one compiler
     implements long double as 64 bits and the other as 80 bits
     they aren't compatible and you should recompile both the reader
     and the writer.
     Why is that so difficult to understand?


It's not difficult to understand at all. The problem is that you
stated that it should not be a problem "if you use the same CPU type
for both operations." Which is clearly incorrect. It's not even
consistent for a single compiler on a single OS on that one CPU type.

Here we arrive at philosophical questions. C++ is all about bells
and whistles. Rather than providing a solution that would be very useful
for most users but would fail in some special cases it is decided that
nothing should be provided so that everyone rolls its own.

Modulo bugs the Boost solution seems to be working. Other solutions were
presented in this discussion (http://webEbenezer.net by Brian)


The rest of my post (and others) points out that there are portability
issues with binary formats (and those do actually matter to some of
us, even if *you* don't care), and that any common format will likely
have some issues with some of the odder corners of type
representations.
And FWIW, Boost.Serialization does use a text format.

Note that Java and .NET have relatively strong support for
serialization, but then they also include complete specifications of
the datatypes.

That being said, I would not object at all to adding something like
Boost.Serialization to the STL...
 

jacob navia

Ian Collins wrote:
How do you enforce that?

Very easy.

Each time you do that you crash or obtain wrong results.

:)

Why should the language protect the programmer from himself,
from every possible error?

It is well known that if you compile a shared object (dll)
with structure alignment turned off, and you use it with a
main executable with structure alignment turned on at 16 bytes, passing
double data between the shared object and the main program will not
work.

How do you enforce that structure alignment is the same?

What if you don't have the source for one or both of them?


It isn't difficult to understand, it's impractical. There isn't a
standard format for double.

What?

And the IEEE-754 format?

Have you read the responses here? The problem of serialisation is
complex and multi-layered. Sure a naive solution would work for a
subset of types, but that subset is small. As soon as you add any
complexity to your objects (even something as trivial as float or
pointers), the solution breaks down. So it isn't "very useful for most
users".

OK. Let's agree that we disagree here.
 

jacob navia

(e-mail address removed) wrote:
It's not difficult to understand at all. The problem is that you
stated that it should not be a problem "if you use the same CPU type
for both operations." Which is clearly incorrect. It's not even
consistent for a single compiler on a single OS on that one CPU type.

With this logic, assignment of a "double" field in a structure should be
forbidden:

file.h
struct foo { char a; double b; };

file1.c
foo a;
extern foo b;

// ...

a.b = b.b; // this will not work

file2.c
foo b;

I compile file1.c with the compilation flag "No structure alignment".
I compile file2.c with the structure alignment to 16 bytes.

Consequence: Since that can't be enforced, assignment to a structure
field of type double from other structure should be forbidden.


You just can't protect the programmer from all possible mistakes.
 

gwowen

Hi

What would be the best way to save and reload later a container
to/from disk in C++?

Thanks

Here's a solution, what constitutes "best" depends on what you
consider important:

If the container contains Plain-Old-Data types or pointers to PODs --
first fread()/fwrite() the size() [if it's variable]. After that just
iterate over the elements, and fread()/fwrite() the data,
dereferencing as appropriate as you go. If you've got pointers to
polymorphic types, make sure their base type has a [virtual]
serialize(), and that each derived type's implementation writes enough
extra header information as the first element to determine its type.

// Could easily be a static member function...
Base* unserialize()
{
    FILE* file_descriptor = fopen("filename","rb");
    // read header, determine DerivedType
    switch(DerivedType){
    case DerivedType1:
        return DerivedType1::unserialize(file_descriptor);
    case DerivedType2:
        return DerivedType2::unserialize(file_descriptor);
    /// etc
    default:
        throw(std::runtime_error("Unrecognised derived type header in "
                                 "Base* unserialize()"));
    }
}

Dumping the vtable / function pointers, even if you can find them, is
a recipe for disaster.
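
The plain-old-data half of this approach might be sketched as follows. The names Point, save_pods and load_pods are hypothetical, and the file is only readable by a program built with the same compiler settings on the same kind of machine, since the bytes of each element are dumped as they sit in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <type_traits>
#include <vector>

// Hypothetical POD element type.
struct Point { double x; double y; };

// Write the element count first, then the raw bytes of every element.
template <typename T>
bool save_pods(const std::vector<T>& v, const char* path)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw dump needs a POD-like T");
    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    const std::uint64_t n = v.size();
    bool ok = std::fwrite(&n, sizeof n, 1, f) == 1
           && (v.empty() || std::fwrite(v.data(), sizeof(T), v.size(), f) == v.size());
    ok = (std::fclose(f) == 0) && ok;   // fclose runs even if a write failed
    return ok;
}

// Read the count, resize, then read the elements. The count is trusted,
// so a corrupt file could request a very large allocation.
template <typename T>
bool load_pods(std::vector<T>& v, const char* path)
{
    static_assert(std::is_trivially_copyable<T>::value, "raw dump needs a POD-like T");
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return false;
    std::uint64_t n = 0;
    bool ok = std::fread(&n, sizeof n, 1, f) == 1;
    if (ok) {
        v.resize(static_cast<std::size_t>(n));
        ok = v.empty() || std::fread(v.data(), sizeof(T), v.size(), f) == v.size();
    }
    std::fclose(f);
    return ok;
}
```

A vector of Point saved with save_pods and reloaded with load_pods should compare equal field by field. Types that own memory, such as std::string, are rejected by the static_assert on most implementations and need the per-type serialize() route instead.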
 

Jorgen Grahn

I'm not sure how best to deal with binary data, but for
numbers and text, I would use JSON (escaping special
characters as appropriate, etc).

It's well-understood, lightweight and simple, and
portable across many languages.

Don't know anything about JSON, but:
With binary data you could base64-encode it or something,
but you'll be looking at significant bloat for large
structures. Or you could NUL-separate fields, replacing
actual NUL characters with \001s, and actual \001s with
\001\001s.

What would this buy him? He's saving the data to file, not to paper
or a text-only medium like Usenet. Apart from being able to print it,
base64 has exactly the same weaknesses as whatever binary representation
lies under the surface.

To the original poster, I have no general answer. I'd recommend some
format suited to his application, not to its current implementation.

/Jorgen
 
