Re: Fastest way to serialize arbitrary objects ???

Discussion in 'C++' started by Brian, Apr 30, 2010.

  1. Brian

    Brian Guest

    On Apr 29, 8:04 pm, "Peter Olcott" <> wrote:
    > "Sherm Pendley" <> wrote in message
    > news:...
    >
    > > "Peter Olcott" <> writes:
    > >
    > >> I want to know some ideas to provide very fast object
    > >> serialization and deserialization for arbitrary objects.
    > >
    > > Boost has a handy serialization library:
    > >
    > >    <http://www.boost.org>
    > >
    > > <http://www.boost.org/doc/libs/1_42_0/libs/serialization/doc/index.html>
    > >
    > > sherm--
    >
    > I think that I figured out a way that is pretty simple and
    > fast.
    > I simply serialize everything to a single
    > std::vector<unsigned int>, and then write this out.
    >
    > I provide a quick way to determine the exact size of every
    > sub-object so that I can allocate the single std::vector all
    > at once, and each sub-object knows how to append itself to
    > the single std::vector<unsigned int>.
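
A minimal sketch of the single-vector idea quoted above. The thread does not show Peter's classes, so `Point`, `Polygon`, `wordCount`, `appendTo` and `serialize` are hypothetical names, and C++11 is assumed. Each sub-object reports its exact size in words, so the vector is reserved once and never reallocates while the objects append themselves:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical leaf type: knows its size and how to append itself.
struct Point {
    unsigned int x, y;
    std::size_t wordCount() const { return 2; }
    void appendTo(std::vector<unsigned int>& out) const {
        out.push_back(x);
        out.push_back(y);
    }
};

// Hypothetical composite type built from sub-objects.
struct Polygon {
    unsigned int id;
    std::vector<Point> points;

    // id + point count + the words of every point
    std::size_t wordCount() const {
        std::size_t n = 2;
        for (const Point& p : points) n += p.wordCount();
        return n;
    }
    void appendTo(std::vector<unsigned int>& out) const {
        out.push_back(id);
        out.push_back(static_cast<unsigned int>(points.size()));
        for (const Point& p : points) p.appendTo(out);
    }
};

std::vector<unsigned int> serialize(const Polygon& poly) {
    std::vector<unsigned int> out;
    out.reserve(poly.wordCount());            // one allocation, sized exactly
    poly.appendTo(out);
    assert(out.size() == poly.wordCount());   // size pass and append pass must agree
    return out;
}
```

The assertion is the useful part of the design: if the size calculation and the append code ever drift apart, it fails immediately instead of silently reallocating.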



    There are at least a couple of different ways to approach
    this. The way I do it is to count the size of the message
    and then begin marshalling the data. So I make two passes
    over the types involved. There are some positive aspects
    to counting the size before marshalling data:

    1. I don't waste time putting all of the data into a
    buffer/vector only to find late in the process that
    the length of the message exceeds the maximum message
    length.

    2. I don't have to have buffers as big as the maximum
    message length.

    3. The first parts of the message can be dispatched to
    their destination without waiting for the whole message
    to be marshalled. Say the message is 200,000 bytes and
    the buffer is 16384 bytes. My approach frees the
    first parts of the message to go on their merry way
    without having to wait for the balance of the message
    to be formatted.

    Those are the upsides of my approach. The downside is the
    two passes through the objects. There may be some upside to
    the downside though in that the first pass is a cursory
    counting pass and may be helpful cache-wise since the
    second pass follows immediately after the first pass.
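
A rough sketch of the count-then-marshal approach described above, not the code from msg_shepherd.hh. `ChunkedWriter`, `messageSize` and `send` are hypothetical names, the message is assumed to be a list of strings, lengths are written as 4-byte values in host byte order, and C++11 is assumed. The first pass only adds up sizes; the second pass marshals through a small fixed buffer that is handed to the sink every time it fills:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Small staging buffer; each full chunk goes to the sink right away
// (a socket write in real code, a callback in this sketch).
class ChunkedWriter {
public:
    using Sink = std::function<void(const unsigned char*, std::size_t)>;

    ChunkedWriter(std::size_t capacity, Sink sink)
        : buf_(capacity), used_(0), sink_(std::move(sink)) {
        assert(capacity > 0);
    }

    void put(const void* data, std::size_t n) {
        const unsigned char* p = static_cast<const unsigned char*>(data);
        while (n > 0) {
            std::size_t room = buf_.size() - used_;
            std::size_t take = n < room ? n : room;
            std::memcpy(buf_.data() + used_, p, take);
            used_ += take;
            p += take;
            n -= take;
            if (used_ == buf_.size()) flush();   // early parts leave now
        }
    }

    void flush() {
        if (used_ > 0) {
            sink_(buf_.data(), used_);
            used_ = 0;
        }
    }

private:
    std::vector<unsigned char> buf_;
    std::size_t used_;
    Sink sink_;
};

// Pass 1: count only. 4-byte item count, then 4-byte length + bytes per string.
std::size_t messageSize(const std::vector<std::string>& items) {
    std::size_t total = sizeof(std::uint32_t);
    for (const std::string& s : items) total += sizeof(std::uint32_t) + s.size();
    return total;
}

// Pass 2: marshal. Rejects an oversized message before any byte is written.
bool send(const std::vector<std::string>& items, std::size_t maxLen, ChunkedWriter& w) {
    std::size_t total = messageSize(items);
    if (total > maxLen) return false;

    std::uint32_t len = static_cast<std::uint32_t>(total);
    w.put(&len, sizeof len);                     // message length goes first
    std::uint32_t count = static_cast<std::uint32_t>(items.size());
    w.put(&count, sizeof count);
    for (const std::string& s : items) {
        std::uint32_t n = static_cast<std::uint32_t>(s.size());
        w.put(&n, sizeof n);
        w.put(s.data(), s.size());
    }
    w.flush();                                   // send the partial last chunk
    return true;
}
```

The buffer can be much smaller than the maximum message length, and the length check costs only the counting pass.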

    There are some examples of what I'm talking about in
    the Send functions of this file --
    http://webEbenezer.net/misc/msg_shepherd.hh .


    Brian Wood
     
    Brian, Apr 30, 2010
    #1

  2. Brian

    James Kanze Guest

    On Apr 30, 3:01 am, Brian <> wrote:
    > On Apr 29, 8:04 pm, "Peter Olcott" <> wrote:


    > > I think that I figured out a way that is pretty simple and
    > > fast. I simply serialize everything to a single
    > > std::vector<unsigned int>, and then write this out.


    > > I provide a quick way to determine the exact size of every
    > > sub-object so that I can allocate the single std::vector all
    > > at once, and each sub-object knows how to append itself to
    > > the single std::vector<unsigned int>.


    > There are at least a couple of different ways to approach
    > this. The way I do it is to count the size of the message
    > and then begin marshalling the data. So I make two passes
    > over the types involved. There are some positive aspects
    > to counting the size before marshalling data:


    > 1. I don't waste time putting all of the data into a
    > buffer/vector only to find late in the process that
    > the length of the message exceeds the maximum message
    > length.


    > 2. I don't have to have buffers as big as the maximum
    > message length.


    > 3. The first parts of the message can be dispatched to
    > their destination without waiting for the whole message
    > to be marshalled. Say the message is 200,000 bytes and
    > the buffer is 16384 bytes. My approach frees the
    > first parts of the message to go on their merry way
    > without having to wait for the balance of the message
    > to be formatted.


    Another advantage is that you can define a protocol which puts
    the length of each object at its beginning. This can
    considerably speed up skipping an object you're not interested
    in.

    > Those are the upsides of my approach. The downside is the
    > two passes through the objects. There may be some upside to
    > the downside though in that the first pass is a cursory
    > counting pass and may be helpful cache-wise since the
    > second pass follows immediately after the first pass.


    Or if you have enough objects, it can hurt cache-wise by
    ensuring that the first objects you visited and will write will
    have been replaced in the cache by later objects :). (Tuning
    for cache behavior is incredibly tricky, and what is optimal for
    one machine may be sub-optimal for another, even if the two
    machines use the same basic architecture.)

    --
    James Kanze
     
    James Kanze, Apr 30, 2010
    #2

  3. Brian

    Brian Guest

    On Apr 30, 4:05 am, James Kanze <> wrote:

    > Another advantage is that you can define a protocol which puts
    > the length of each object at its beginning.  This can
    > considerably speed up skipping an object you're not interested
    > in.


    When do you do that? It sounds like you're skipping over
    an object on the receiving side. I prepend the message length
    and the size of containers and strings, but I can't think of
    much reason to skip over objects. At the marshalling/serialization
    level, I'm not sure if you could decide to skip
    something. I guess the application would have to tell the
    marshalling code to skip over something.


    Brian Wood
     
    Brian, Apr 30, 2010
    #3
  4. Brian

    James Kanze Guest

    On Apr 30, 7:29 pm, Brian <> wrote:
    > On Apr 30, 4:05 am, James Kanze <> wrote:


    > > Another advantage is that you can define a protocol which puts
    > > the length of each object at its beginning. This can
    > > considerably speed up skipping an object you're not interested
    > > in.


    > When do you do that?


    It depends. It's not that frequent, but I've needed it once or
    twice.

    > It sounds like you're skipping over an object on the receiving
    > side.


    Exactly.

    > I prepend the message length and the size of containers and
    > strings, but I can't think of much reason to skip over
    > objects.


    Two possible reasons: you write out a large set of possibly
    relevant objects, but each execution only needs a few of them,
    and for journal files---when recovering, you can skip
    transactions that are already in the database.

    > At the marshalling/serialization level, I'm not sure if you
    > could decide to skip something. I guess the application would
    > have to tell the marshalling code to skip over something.


    Typically, such applications use a special record format, with
    identifying information in the first couple of bytes of the
    record. The reading code reads this information, then decides
    whether it wants the record or not.
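
A small sketch of such a record format, under stated assumptions: `RecordHeader`, `appendRecord` and `findRecords` are hypothetical names, the header is a 4-byte type followed by a 4-byte payload length in host byte order, and C++11 is assumed. The reader looks only at the header and steps over any payload it does not want:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical header written in front of every record.
struct RecordHeader {
    std::uint32_t type;    // identifies the kind of record
    std::uint32_t length;  // number of payload bytes that follow
};

void appendRecord(std::vector<unsigned char>& out, std::uint32_t type,
                  const void* payload, std::uint32_t length) {
    RecordHeader h{type, length};
    const unsigned char* hp = reinterpret_cast<const unsigned char*>(&h);
    out.insert(out.end(), hp, hp + sizeof h);
    const unsigned char* pp = static_cast<const unsigned char*>(payload);
    out.insert(out.end(), pp, pp + length);
}

// Returns the payload offsets of the records of the wanted type.
// Unwanted payloads are never parsed, only stepped over.
std::vector<std::size_t> findRecords(const std::vector<unsigned char>& in,
                                     std::uint32_t wanted) {
    std::vector<std::size_t> offsets;
    std::size_t pos = 0;
    while (in.size() - pos >= sizeof(RecordHeader)) {
        RecordHeader h;
        std::memcpy(&h, in.data() + pos, sizeof h);
        pos += sizeof h;
        if (h.length > in.size() - pos) break;   // truncated record, stop
        if (h.type == wanted) offsets.push_back(pos);
        pos += h.length;                         // one addition skips the payload
    }
    return offsets;
}
```

For a journal file, the type field could just as well be a transaction id that the recovery code compares against what is already in the database.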

    --
    James Kanze
     
    James Kanze, May 1, 2010
    #4
  5. Brian

    Liviu Guest

    "James Kanze" <> wrote...
    > On Apr 30, 7:29 pm, Brian <> wrote:
    >> On Apr 30, 4:05 am, James Kanze <> wrote:
    >
    >>> Another advantage is that you can define a protocol which puts
    >>> the length of each object at its beginning. This can considerably
    >>> speed up skipping an object you're not interested in.
    >
    >> When do you do that?
    >
    > [...]
    > Two possible reasons: you write out a large set of possibly
    > relevant objects, but each execution only needs a few of them,
    > and for journal files---when recovering, you can skip
    > transactions that are already in the database.


    Another reason is versioning (somewhat related to your first above).
    Formats are not immutable, but with an appropriate serialization
    scheme older versions can be prepared beforehand to read back
    future formats and skip over the parts which don't apply/translate.
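
One way this could look, as a sketch only: every field is written as a 1-byte tag, a 1-byte length and the payload, and a version 1 reader skips tags it does not know. The tag values, `appendField`, `PersonV1` and `readPersonV1` are all hypothetical, and C++11 is assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tags known to version 1 of the format.
constexpr std::uint8_t kTagName = 1;
constexpr std::uint8_t kTagAge  = 2;

// Each field: 1-byte tag, 1-byte length, then the payload bytes.
void appendField(std::vector<unsigned char>& out, std::uint8_t tag,
                 const void* data, std::uint8_t len) {
    out.push_back(tag);
    out.push_back(len);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    out.insert(out.end(), p, p + len);
}

struct PersonV1 {
    std::string name;
    std::uint8_t age = 0;
};

// Version 1 reader: takes the fields it understands and uses the
// length prefix to step over fields added by later versions.
bool readPersonV1(const std::vector<unsigned char>& in, PersonV1& out) {
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (in.size() - pos < 2) return false;       // truncated field header
        std::uint8_t tag = in[pos];
        std::uint8_t len = in[pos + 1];
        pos += 2;
        if (len > in.size() - pos) return false;     // truncated payload
        const unsigned char* p = in.data() + pos;
        if (tag == kTagName) {
            out.name.assign(reinterpret_cast<const char*>(p), len);
        } else if (tag == kTagAge && len == 1) {
            out.age = p[0];
        }
        // Any other tag belongs to a newer format: skipped, not an error.
        pos += len;
    }
    return true;
}
```

Without the per-field length, the old reader would have no way to find where an unknown field ends.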

    Liviu
     
    Liviu, May 2, 2010
    #5
