serialization

aegis · Nov 5, 2009

One serialization technique I have seen for writing instances of a class
to a file is casting the "this" pointer to a "pointer to char" and
writing out the raw bytes. (I don't know why someone would do this.)
I'm fairly certain this is not guaranteed to work. If that is the case,
what are the relevant sections of the standard?

Brian · Nov 5, 2009

I don't know the relevant sections, but the serialization
approaches I'm aware of don't do anything like that.
Maybe some game programs have that, but I don't think it
is used much these days.

Brian Wood
www.webEbenezer.net

Eric Pruneau · Nov 6, 2009

Let's say you have a class like

class A
{
    void DoSomethingA() {}
    int aValue;
};

and also a class that has one (or more) virtual functions

class B
{
    virtual void DoSomethingB() {}
    int bValue;
};

Now assume that an int and a pointer are both 4 bytes in size.

Then sizeof(A) will most likely be 4, but sizeof(B) will most likely be
8.
So if you cast the address of an instance of A to a char* and write 4
bytes to a file starting at that address, then I guess you will store the
value of aValue in your file, which is probably what you want.

But since B has a virtual function, each B object will also store a
pointer to a table (let's call it the vftable). This table contains the
addresses of B's virtual functions. In this case, the vftable will have
one entry, the pointer to the virtual function DoSomethingB. If you have 2
virtual functions in your class, the table will have 2 entries, and so on
(but sizeof(B) will stay 8, since the object holds the pointer to the
table, not the table itself). So it is pretty much useless to serialize
this pointer: on the day you need to read your file back, the stored
address will probably not point to the vftable you want.

Note that this vftable scheme is probably the most popular way to
implement virtual functions, but it is not mandated by the standard,
meaning that a compiler writer is free to use another way.
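
The layout difference described above can be checked on a given compiler
with a short sketch like the one below. The helper names are illustrative
and not from the thread, the type traits assume C++11 or later, and the
actual sizes are implementation-specific, so nothing here is guaranteed
beyond what the traits themselves report.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

class A
{
public:  // public only to keep the sketch free of unused-member warnings
    void DoSomethingA() {}
    int aValue;
};

class B
{
public:
    virtual void DoSomethingB() {}
    int bValue;
};

// True when the language allows the object's bytes to be copied verbatim,
// which is what dumping raw bytes to a file silently relies on.
template <class T>
bool safeToCopyBytes()
{
    return std::is_trivially_copyable<T>::value;
}

// True when the object carries hidden dispatch data (typically a pointer
// to a table of virtual functions, though that is an implementation detail).
template <class T>
bool hasVirtualDispatch()
{
    return std::is_polymorphic<T>::value;
}

// Bytes an object of T occupies beyond its single int member.
template <class T>
std::size_t bytesBeyondInt()
{
    return sizeof(T) - sizeof(int);
}
```

On a typical implementation bytesBeyondInt<A>() is 0, while
bytesBeyondInt<B>() covers the hidden pointer plus any padding.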

Eric Pruneau

aegis · Nov 7, 2009

Good counter-point. I recently found this approach recommended in
another book, although the author uses a pointer to unsigned char.
Effectively, he has some object and calls its write member function:
bindata.write((unsigned char *)&d, sizeof d);

where d is of type "DataPoint". Now does C++ give the guarantee for
pointer to unsigned char? I believe we get this guarantee in
C. However, I know (char *) is definitely wrong.

So would we run into any undefined behavior in the case of
using a pointer to unsigned char? Or do we end up with
the issues you posed regardless of pointer type? i.e.,
the standard gives no guarantee for any pointer type.
Thus, the ideal way/portable way, would be to write out
the value of each member of the given class.

Thoughts?
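
For reference, the C++ standard's rule on copying the underlying bytes of
a POD object (a trivially copyable object since C++11) names both char and
unsigned char, so the pointer type is not the main hazard; what the
standard does not promise is a byte layout that is portable across
compilers or platforms. A minimal sketch of the member-by-member approach
follows. The members of DataPoint and all helper names are hypothetical,
and the bytes written are still in the host's native representation.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for the book's DataPoint type.
struct DataPoint
{
    int id;
    double value;
};

// Write the bytes of a single member (host byte order and size).
template <class T>
void writeRaw(std::ostream& out, const T& member)
{
    out.write(reinterpret_cast<const char*>(&member), sizeof member);
}

// Read the bytes of a single member back.
template <class T>
void readRaw(std::istream& in, T& member)
{
    in.read(reinterpret_cast<char*>(&member), sizeof member);
}

// Each member is handled separately, so padding and any hidden
// implementation data never reach the file.
void save(std::ostream& out, const DataPoint& d)
{
    writeRaw(out, d.id);
    writeRaw(out, d.value);
}

bool load(std::istream& in, DataPoint& d)
{
    readRaw(in, d.id);
    readRaw(in, d.value);
    return static_cast<bool>(in);  // false if the stream ran short
}

// Round trip through an in-memory stream.
bool roundTrips(const DataPoint& original)
{
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    save(buffer, original);
    DataPoint copy = {0, 0.0};
    return load(buffer, copy)
        && copy.id == original.id
        && copy.value == original.value;
}
```

A format meant to be read on another platform would also need to fix the
byte order and the width of each field, which this sketch does not attempt.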

Brian Wood · Nov 8, 2009

Thus, the ideal way/portable way, would be to write out
the value of each member of the given class.

I think so. Each member is handled separately and ideally
the process should be automated so users don't have to
maintain serialization functions by hand. Some existing
serialization libraries get the first part right, but
they don't automate the generation of the serialization
functions. To my knowledge only the C++ Middleware Writer,
http://www.webEbenezer.net/cgi-bin/samb.cgi ,
automates that step of the process.

Besides the fact that C++0x is behind schedule, there's
the fact that if it does eventually get finalized, it
won't have reflection support. That's a serious problem
in my opinion. I'm not a fan of Java or C#, but I think
their reflection support serves those languages well.

Brian Wood
http://webEbenezer.net

Joshua Maurice · Nov 8, 2009

Besides the fact that C++0x is behind schedule, there's
the fact that if it does eventually get finalized, it
won't have reflection support. That's a serious problem
in my opinion. I'm not a fan of Java or C#, but I think
their reflection support serves those languages well.

At the very least, IMAO, C++ should have reflection, but only at
compile time, possibly (preferably) through some template-like
facility. Being able to iterate over the members of a class at compile
time in a generic way would impose no additional cost, so the oft-cited
objection of "pay only for what you use" would not apply.

My own company is forced to write its own fragile, intrusive
serialization framework because of this lack of C++ compile time
reflection.

Maxim Yegorushkin · Nov 8, 2009

At the very least, IMAO, C++ should have reflection, but only at
compile time, possibly (preferably) through some template-like
facility. Being able to iterate over the members of a class at compile
time in a generic way would impose no additional cost, so the oft-cited
objection of "pay only for what you use" would not apply.

Absolutely true.

My own company is forced to write its own fragile, intrusive
serialization framework because of this lack of C++ compile time
reflection.

At the company where I work, we use a Perl script to generate reflection
code from annotated C++ header files. You may be pleased to know that the
main feature of the generated files is a reflect() function template that
iterates over the base sub-objects and members of an object.

Here is what an annotated header looks like:

struct /* @reflect_class */ A
{
    int /* @reflect_member */ abc;
    double /* @reflect_member */ def;
};

struct /* @reflect_class */ B
    : /* @reflect_base */ A
{
    A /* @reflect_member */ aaa;
};

For every reflectable class, the script generates reflection code like
the following:

#ifdef REFLECT_NAMESPACE_BEGIN
REFLECT_NAMESPACE_BEGIN
#endif

// start of generated code for class A

meta::Yes isReflectable(A const&);

The above function declaration (no implementation) makes possible an
IsReflectable<T> trait class, which is used to tell reflectable classes
from non-reflectable ones at compile time (similar to Boost type traits).
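
One way such a trait could be built on that declaration is the classic
sizeof-based overload test, sketched below. This is an assumption about
how the trait might work, not the poster's actual implementation: the
definitions of meta::Yes and meta::No, the catch-all overload, and the
Plain type are all invented for illustration.

```cpp
#include <cassert>

namespace meta {
    typedef char Yes;              // assumed: a type of size 1
    struct No { char pad[2]; };    // assumed: any type of a different size
}

// Catch-all overload, chosen only when no better match exists.
// Declared but never defined: it is used only inside sizeof.
meta::No isReflectable(...);

struct A { int abc; double def; };
meta::Yes isReflectable(A const&);   // as in the generated code

struct Plain { int x; };             // hypothetical non-reflectable class

template<class T>
struct IsReflectable
{
    static T& make();                // declared only, never called
    enum {
        // Overload resolution happens at compile time; the size of the
        // selected overload's return type reveals which one was picked.
        value = (sizeof(isReflectable(make())) == sizeof(meta::Yes))
    };
};
```

With these definitions IsReflectable<A>::value is nonzero, while
IsReflectable<Plain>::value and IsReflectable<int>::value are zero.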

template<class T> struct BaseIndexOf;
template<> struct BaseIndexOf<A>
{
    enum Type {
        ENUM_NIL = -1
        , ENUM_END
        , ENUM_BEGIN = 0
    };
};

template<class T> struct MemberIndexOf;
template<> struct MemberIndexOf<A>
{
    enum Type {
        ENUM_NIL = -1
        , abc
        , def
        , ENUM_END
        , ENUM_BEGIN = 0
    };
};

These are the indexes of the reflectable base classes and members,
accessible as BaseIndexOf<A>::<base_name> and
MemberIndexOf<A>::<member_name>. The enumerations are laid out so that it
is easy to iterate over all members or base classes using the range
[ENUM_BEGIN, ENUM_END).
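
As a rough illustration of that iteration, the sketch below collects every
member index of a class by walking the half-open range. The
memberIndexes() helper is hypothetical and not part of the generated code;
the enum is copied from the generated code for A.

```cpp
#include <cassert>
#include <vector>

struct A { int abc; double def; };

template<class T> struct MemberIndexOf;
template<> struct MemberIndexOf<A>
{
    enum Type {
        ENUM_NIL = -1
        , abc           // 0
        , def           // 1
        , ENUM_END      // 2: one past the last member
        , ENUM_BEGIN = 0
    };
};

// Hypothetical helper: list all member indexes of T in declaration order.
template<class T>
std::vector<typename MemberIndexOf<T>::Type> memberIndexes()
{
    typedef typename MemberIndexOf<T>::Type Index;
    std::vector<Index> result;
    for (int i = MemberIndexOf<T>::ENUM_BEGIN;
         i < MemberIndexOf<T>::ENUM_END; ++i)
    {
        result.push_back(static_cast<Index>(i));
    }
    return result;
}
```

Because ENUM_END is declared right after the last member, it always equals
the member count, and a class with no members yields an empty range.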

template<class Functor>
void reflect(A& object, Functor& f)
{
    f.onObjectBegin(object);
    f.onMember(object, object.abc, MemberIndexOf<A>::abc);
    f.onMember(object, object.def, MemberIndexOf<A>::def);
    f.onObjectEnd(object);
}

template<class Functor>
bool reflect(A& object, Functor& f, MemberIndexOf<A>::Type member_index)
{
    switch(member_index) {
    case 0: f.onMember(object, object.abc, MemberIndexOf<A>::abc);
        return true;
    case 1: f.onMember(object, object.def, MemberIndexOf<A>::def);
        return true;
    default: return false;
    }
}

These are the fundamental reflect function templates, which iterate over
all members or visit one particular member. These function templates
accept a functor that gets invoked for members (onMember() call), base
sub-objects (onBaseSubobject() call, see reflect for B below), and object
begin/end so that the functor can handle object nesting.

The functor passed to reflect() does the actual job of serializing or
deserializing the object. The simple beauty of this approach is that
there is only one functor class for each serialization format. This
functor handles any reflectable class using the rest of the generated
C++ code.
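
To make the role of the functor concrete, here is a minimal sketch of one
that renders an object as text. TextWriter, toText() and the output format
are invented for illustration; only the callback names and the reflect()
shape come from the post, and nested reflectable members are not handled.

```cpp
#include <cassert>
#include <sstream>
#include <string>

struct A { int abc; double def; };

template<class T> struct MemberIndexOf;
template<> struct MemberIndexOf<A>
{
    enum Type { ENUM_NIL = -1, abc, def, ENUM_END, ENUM_BEGIN = 0 };
};

// Same shape as the generated reflect() for A.
template<class Functor>
void reflect(A& object, Functor& f)
{
    f.onObjectBegin(object);
    f.onMember(object, object.abc, MemberIndexOf<A>::abc);
    f.onMember(object, object.def, MemberIndexOf<A>::def);
    f.onObjectEnd(object);
}

// Hypothetical functor for one "format": index=value pairs in braces.
// It knows nothing about A; all class knowledge lives in reflect().
struct TextWriter
{
    std::ostringstream out;

    template<class T>
    void onObjectBegin(T&) { out << "{"; }

    template<class T>
    void onObjectEnd(T&) { out << "}"; }

    template<class T, class M, class I>
    void onMember(T&, M& member, I index)
    {
        out << " " << static_cast<int>(index) << "=" << member;
    }
};

std::string toText(A& object)
{
    TextWriter writer;
    reflect(object, writer);
    return writer.out.str();
}
```

A second format would need only another functor with the same callbacks;
the generated reflect() stays unchanged.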

inline Sref toId(Type<A>)
{
    return Sref("A", 1);
}

inline Sref toId(BaseIndexOf<A>::Type base_index)
{
    switch(base_index) {
    default: return Sref();
    }
}

inline Sref toId(MemberIndexOf<A>::Type member_index)
{
    switch(member_index) {
    case MemberIndexOf<A>::abc: return Sref("abc", 3);
    case MemberIndexOf<A>::def: return Sref("def", 3);
    default: return Sref();
    }
}

These are the functions that return base class and member identifiers for
the generated indexes. A functor uses the base/member index (passed by
reflect() in its onBaseSubobject/onMember callback) to get any meta
information associated with that particular base/member. This relies on
the fact that the indexes (enums) are strongly typed, so that function
overloading picks the correct overload for a particular index type.
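
The overload selection can be seen in a small sketch. Sref is not defined
in the post, so the version below (a pointer plus a length) is an
assumption made purely so the example compiles; the enums and the toId()
bodies follow the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed shape of Sref: a non-owning (pointer, length) string reference.
struct Sref
{
    const char* data;
    std::size_t size;
    Sref() : data(""), size(0) {}
    Sref(const char* d, std::size_t n) : data(d), size(n) {}
    std::string str() const { return std::string(data, size); }
};

struct A { int abc; double def; };
struct B : A { A aaa; };

template<class T> struct MemberIndexOf;
template<> struct MemberIndexOf<A>
{
    enum Type { ENUM_NIL = -1, abc, def, ENUM_END, ENUM_BEGIN = 0 };
};
template<> struct MemberIndexOf<B>
{
    enum Type { ENUM_NIL = -1, aaa, ENUM_END, ENUM_BEGIN = 0 };
};

inline Sref toId(MemberIndexOf<A>::Type member_index)
{
    switch(member_index) {
    case MemberIndexOf<A>::abc: return Sref("abc", 3);
    case MemberIndexOf<A>::def: return Sref("def", 3);
    default: return Sref();
    }
}

inline Sref toId(MemberIndexOf<B>::Type member_index)
{
    switch(member_index) {
    case MemberIndexOf<B>::aaa: return Sref("aaa", 3);
    default: return Sref();
    }
}
```

MemberIndexOf<A>::abc and MemberIndexOf<B>::aaa both have the value 0, yet
each call resolves to its own toId() because the two enum types differ.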

In practice, we use more annotations to associate more meta information
with members. We also generate reflection code for annotated enums.

// end of generated code for class A

// start of generated code for class B

meta::Yes isReflectable(B const&);

template<class T> struct BaseIndexOf;
template<> struct BaseIndexOf<B>
{
    enum Type {
        ENUM_NIL = -1
        , A
        , ENUM_END
        , ENUM_BEGIN = 0
    };
};

Here class B has a reflectable base class A.

template<class T> struct MemberIndexOf;
template<> struct MemberIndexOf<B>
{
    enum Type {
        ENUM_NIL = -1
        , aaa
        , ENUM_END
        , ENUM_BEGIN = 0
    };
};

And a reflectable member aaa.

template<class Functor>
void reflect(B& object, Functor& f)
{
    f.onObjectBegin(object);
    f.onBaseSubobject(static_cast<A&>(object), BaseIndexOf<B>::A);
    f.onMember(object, object.aaa, MemberIndexOf<B>::aaa);
    f.onObjectEnd(object);
}

template<class Functor>
bool reflect(B& object, Functor& f, MemberIndexOf<B>::Type member_index)
{
    switch(member_index) {
    case 0: f.onMember(object, object.aaa, MemberIndexOf<B>::aaa);
        return true;
    default: return false;
    }
}

inline Sref toId(Type<B>)
{
    return Sref("B", 1);
}

inline Sref toId(BaseIndexOf<B>::Type base_index)
{
    switch(base_index) {
    case BaseIndexOf<B>::A: return Sref("A", 1);
    default: return Sref();
    }
}

inline Sref toId(MemberIndexOf<B>::Type member_index)
{
    switch(member_index) {
    case MemberIndexOf<B>::aaa: return Sref("aaa", 3);
    default: return Sref();
    }
}

// end of generated code for class B

#ifdef REFLECT_NAMESPACE_END
REFLECT_NAMESPACE_END
#endif
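
How a functor might handle base sub-objects and nested reflectable members
can be sketched as follows. LeafCounter is a made-up functor, and to keep
the example short the index arguments are plain ints rather than the
generated enums; the reflect() functions otherwise mirror the generated
ones for A and B.

```cpp
#include <cassert>

struct A { int abc; double def; };
struct B : A { A aaa; };

// Simplified stand-ins for the generated code (int indexes, an assumption).
template<class Functor>
void reflect(A& object, Functor& f)
{
    f.onObjectBegin(object);
    f.onMember(object, object.abc, 0);
    f.onMember(object, object.def, 1);
    f.onObjectEnd(object);
}

template<class Functor>
void reflect(B& object, Functor& f)
{
    f.onObjectBegin(object);
    f.onBaseSubobject(static_cast<A&>(object), 0);
    f.onMember(object, object.aaa, 0);
    f.onObjectEnd(object);
}

// Hypothetical functor: counts primitive members and tracks nesting depth.
struct LeafCounter
{
    int leaves;
    int depth;
    int maxDepth;

    LeafCounter() : leaves(0), depth(0), maxDepth(0) {}

    template<class T>
    void onObjectBegin(T&) { if (++depth > maxDepth) maxDepth = depth; }

    template<class T>
    void onObjectEnd(T&) { --depth; }

    // A base sub-object is itself reflectable, so recurse into it.
    template<class Base, class I>
    void onBaseSubobject(Base& base, I) { reflect(base, *this); }

    // Primitive members are leaves.
    template<class T, class I>
    void onMember(T&, int&, I) { ++leaves; }

    template<class T, class I>
    void onMember(T&, double&, I) { ++leaves; }

    // A member of reflectable class type is visited recursively.
    template<class T, class I>
    void onMember(T&, A& nested, I) { reflect(nested, *this); }
};
```

A real functor would presumably use a trait such as IsReflectable<T> to
decide when to recurse instead of listing each reflectable type by hand.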

The perl script to parse annotation and generate the reflect code is
under 700 lines.

Brian Wood · Nov 9, 2009

So where were you guys during the past ten years, while the rest of us
were working on the new standard? It's a little late to complain that
your favorite feature isn't there.

I met Bjarne in 2003 and asked him if he would be willing to give
me a link to my site. He declined, so that didn't help as far
as my motivation to pitch in goes. It was nice of him to meet
with me and I enjoyed meeting him, but I like eating also.
Anyway, I feel like I'm contributing something even if it isn't
in the form that some would like. Besides two or three compilers
that you can access online, not many companies have been
putting out online C++-related software development tools.
It has seemed to me that most companies have been trying to milk
their business model for too long.

Brian Wood
http://www.webEbenezer.net

Brian Wood · Nov 9, 2009

The perl script to parse annotation and generate the reflect code is
under 700 lines.

That's interesting, but I don't like even 10 lines of Perl.
And if you aren't publishing it somehow, there's no way
to get the broad feedback needed to improve it.
I suggest rewriting it in C++ and making it available
online.

Brian Wood
http://www.webEbenezer.net
