serializing an arbitrary data structure into a flat buffer (raw contiguousmemory block)

Discussion in 'C Programming' started by Alfonso Morra, Oct 3, 2005.

  1. Hi,

    I am writing a messaging library which will allow me to send a generic
    message structure with custom "payloads".

    In many cases, a message must store a non-linear data structure (i.e.
    "payload") using pointers. Examples of these are binary trees, hash
    tables etc. Thus, the message itself contains only a pointer to the
    actual data. When the message is sent to the same processor, these
    pointers point to the original locations, which are within the address
    space of the same processor. However, when such a message is sent to
    other processors, these pointers will point to invalid locations.

    I need a way to ``serialize'' (or pack) my message structures into a
    contiguous raw memory block (and then be able to de-serialize or
    "unpack" them at the other end.

    I just need a simple example, using a simple structure that contains
    pointers (say a ptr to another struct, or a char*) so that I can build
    on from that.

    Searches on Google over the last few days have yielded nothig useful.

    Thanks
     
    Alfonso Morra, Oct 3, 2005
    #1
    1. Advertising

  2. Alfonso Morra

    Zara Guest

    Re: serializing an arbitrary data structure into a flat buffer (rawcontiguous memory block)

    Alfonso Morra wrote:
    > Hi,
    >
    > I am writing a messaging library which will allow me to send a generic
    > message structure with custom "payloads".
    >
    > In many cases, a message must store a non-linear data structure (i.e.
    > "payload") using pointers. Examples of these are binary trees, hash
    > tables etc. Thus, the message itself contains only a pointer to the
    > actual data. When the message is sent to the same processor, these
    > pointers point to the original locations, which are within the address
    > space of the same processor. However, when such a message is sent to
    > other processors, these pointers will point to invalid locations.
    >
    > I need a way to ``serialize'' (or pack) my message structures into a
    > contiguous raw memory block (and then be able to de-serialize or
    > "unpack" them at the other end.
    >
    > I just need a simple example, using a simple structure that contains
    > pointers (say a ptr to another struct, or a char*) so that I can build
    > on from that.
    >
    > Searches on Google over the last few days have yielded nothig useful.
    >
    > Thanks
    >

    A typical solution would be to store first all common data for the
    message structure. Next, an "integer" (whichever integral type you
    prefer) with the count of elements contained. Then, for every element,
    the common part of it, followed by a count element, followed bu every
    sub-element... Thus you will have no pointers and all (relevant) data
    contained in the message.

    In some cases, when elements have different structures, you may need to
    prefix them with a type tag and/or a size field.

    For instance:

    <Total-size><message structure fields><number of elements>
    {for every element}
    <Element size><element type tag><element fields><number of
    sub-elements>
    {for every sub-element}
    ...
    <some kind of element checksum>
    <some kind of total checksum>
     
    Zara, Oct 3, 2005
    #2
    1. Advertising

  3. Re: serializing an arbitrary data structure into a flat buffer (rawcontiguous memory block)

    Zara wrote:

    > Alfonso Morra wrote:
    >
    >> Hi,
    >>
    >> I am writing a messaging library which will allow me to send a generic
    >> message structure with custom "payloads".
    >>
    >> In many cases, a message must store a non-linear data structure (i.e.
    >> "payload") using pointers. Examples of these are binary trees, hash
    >> tables etc. Thus, the message itself contains only a pointer to the
    >> actual data. When the message is sent to the same processor, these
    >> pointers point to the original locations, which are within the address
    >> space of the same processor. However, when such a message is sent to
    >> other processors, these pointers will point to invalid locations.
    >>
    >> I need a way to ``serialize'' (or pack) my message structures into a
    >> contiguous raw memory block (and then be able to de-serialize or
    >> "unpack" them at the other end.
    >>
    >> I just need a simple example, using a simple structure that contains
    >> pointers (say a ptr to another struct, or a char*) so that I can build
    >> on from that.
    >>
    >> Searches on Google over the last few days have yielded nothig useful.
    >>
    >> Thanks
    >>

    > A typical solution would be to store first all common data for the
    > message structure. Next, an "integer" (whichever integral type you
    > prefer) with the count of elements contained. Then, for every element,
    > the common part of it, followed by a count element, followed bu every
    > sub-element... Thus you will have no pointers and all (relevant) data
    > contained in the message.
    >
    > In some cases, when elements have different structures, you may need to
    > prefix them with a type tag and/or a size field.
    >
    > For instance:
    >
    > <Total-size><message structure fields><number of elements>
    > {for every element}
    > <Element size><element type tag><element fields><number of
    > sub-elements>
    > {for every sub-element}
    > ...
    > <some kind of element checksum>
    > <some kind of total checksum>


    Thanks - but this is not what I'm looking for. Your code looks like some
    kind of markup language. What I want is a byte stream (i.e. binary data).

    For those reading - I am not concerned with endianess and other low
    level details (its not necessary for my purposes).
     
    Alfonso Morra, Oct 3, 2005
    #3
  4. Alfonso Morra

    Zara Guest

    Re: serializing an arbitrary data structure into a flat buffer (rawcontiguous memory block)

    Alfonso Morra wrote:
    >
    >
    > Zara wrote:
    >
    >> Alfonso Morra wrote:
    >>
    >>> Hi,
    >>>
    >>> I am writing a messaging library which will allow me to send a
    >>> generic message structure with custom "payloads".
    >>>

    (...)
    >>>
    >>> I need a way to ``serialize'' (or pack) my message structures into a
    >>> contiguous raw memory block (and then be able to de-serialize or
    >>> "unpack" them at the other end.
    >>>
    >>>

    (...)
    >>
    >> For instance:
    >>
    >> <Total-size><message structure fields><number of elements>
    >> {for every element}
    >> <Element size><element type tag><element fields><number of
    >> sub-elements>
    >> {for every sub-element}
    >> ...
    >> <some kind of element checksum>
    >> <some kind of total checksum>

    >
    > Thanks - but this is not what I'm looking for. Your code looks like some
    > kind of markup language. What I want is a byte stream (i.e. binary data).
    >
    > For those reading - I am not concerned with endianess and other low
    > level details (its not necessary for my purposes).
    >


    Well, although my instance is full of < and >, it is not a mark-up
    language. Suppose this:

    struct node {
    char *name;
    node * next;
    };

    struct structure {
    char *string;
    node *list;
    } my_structure;

    and let it be:

    my_structure
    "Root node"
    list---------->node 1
    "It's me"
    next------------>node 2
    "It's I"
    next->NULL

    the message, in binary, could look something like:

    00 23 00 09 52 6f 6f 54 20 6e 6f 64 65 00 02 00
    0a 01 00 07 49 54 27 53 20 6d 65 00 09 01 00 06
    49 54 27 53 20 49

    Which has all of the data contained in it, except the checksums (I don`t
    feel like putting them in), and it supposes a little-endian, ASCII machine.

    Now, if you bother with looking at it, you will see it fits with your
    specs, and is described by my former message

    Regards
     
    Zara, Oct 3, 2005
    #4
  5. Re: serializing an arbitrary data structure into a flat buffer (rawcontiguous memory block)

    Zara wrote:

    > Alfonso Morra wrote:
    >
    >>
    >>
    >> Zara wrote:
    >>
    >>> Alfonso Morra wrote:
    >>>
    >>>> Hi,
    >>>>
    >>>> I am writing a messaging library which will allow me to send a
    >>>> generic message structure with custom "payloads".
    >>>>

    > (...)
    >
    >>>>
    >>>> I need a way to ``serialize'' (or pack) my message structures into a
    >>>> contiguous raw memory block (and then be able to de-serialize or
    >>>> "unpack" them at the other end.
    >>>>
    >>>>

    > (...)
    >
    >>>
    >>> For instance:
    >>>
    >>> <Total-size><message structure fields><number of elements>
    >>> {for every element}
    >>> <Element size><element type tag><element fields><number of
    >>> sub-elements>
    >>> {for every sub-element}
    >>> ...
    >>> <some kind of element checksum>
    >>> <some kind of total checksum>

    >>
    >>
    >> Thanks - but this is not what I'm looking for. Your code looks like
    >> some kind of markup language. What I want is a byte stream (i.e.
    >> binary data).
    >>
    >> For those reading - I am not concerned with endianess and other low
    >> level details (its not necessary for my purposes).
    >>

    >
    > Well, although my instance is full of < and >, it is not a mark-up
    > language. Suppose this:
    >
    > struct node {
    > char *name;
    > node * next;
    > };
    >
    > struct structure {
    > char *string;
    > node *list;
    > } my_structure;
    >
    > and let it be:
    >
    > my_structure
    > "Root node"
    > list---------->node 1
    > "It's me"
    > next------------>node 2
    > "It's I"
    > next->NULL
    >
    > the message, in binary, could look something like:
    >
    > 00 23 00 09 52 6f 6f 54 20 6e 6f 64 65 00 02 00
    > 0a 01 00 07 49 54 27 53 20 6d 65 00 09 01 00 06
    > 49 54 27 53 20 49
    >
    > Which has all of the data contained in it, except the checksums (I don`t
    > feel like putting them in), and it supposes a little-endian, ASCII machine.
    >
    > Now, if you bother with looking at it, you will see it fits with your
    > specs, and is described by my former message
    >
    > Regards
    >


    You've completely lost me now. I have no idea how you arived at the hex
    dump from your two structures. What I'm really after is a simple example
    (or a link to a site where I can see an example of serializing a simple
    struct containing pointers). I have searchedGoogle over the last three
    days - to no avail.

    It does not have to be anything too complicated. Simply so that I can
    build on it and use it as the starting point for serializing my
    stuctures - although I have a rough idea of what you're doing, I am
    unfortunately, unable to build on your examples thus far.
     
    Alfonso Morra, Oct 3, 2005
    #5
  6. Re: serializing an arbitrary data structure into a flat buffer (raw contiguous memory block)

    In article <dhqll3$h5i$-infra.bt.com>, Alfonso Morra <> writes:
    >
    > In many cases, a message must store a non-linear data structure (i.e.
    > "payload") using pointers. Examples of these are binary trees, hash
    > tables etc. Thus, the message itself contains only a pointer to the
    > actual data. When the message is sent to the same processor, these
    > pointers point to the original locations, which are within the address
    > space of the same processor. However, when such a message is sent to
    > other processors, these pointers will point to invalid locations.
    >
    > I need a way to ``serialize'' (or pack) my message structures into a
    > contiguous raw memory block (and then be able to de-serialize or
    > "unpack" them at the other end.


    This is not a trivial problem, but it's not a particularly difficult
    one, either.

    In some cases the original data structure, or a suitable facsimile,
    can be reconstructed from the data alone. This is often the case
    with hash tables, sorted lists, binary trees, and so forth. The
    sending side simply sends the nodes and the receiving side inserts
    them into the appropriate data structure.

    In the general case, however, you need a mechanism for preserving
    associations between pieces of data. Pointers are such a mechanism,
    but they (almost always[1]) represent a system-specific, and usually
    process-specific, mapping, and in any case the information that a
    portable C program can extract from them is limited.

    So the obvious solution is to have your serialization process replace
    the pointers with some portable representation of the associations
    between items. One very simple approach is to serialize all of the
    items into a single block of malloc'd memory (using a pointer to
    unsigned char), and replace the pointers with offsets into that
    block. The deserializer extracts items, remembering the locations it
    has extracted them to, and converts from offsets back into pointers.

    A better scheme is probably to label each item with a unique
    identifier and replace each pointer with the identifier of the object
    it points to. This is essentially the same as the "offset" scheme
    except that it makes the mapping explicit (offsets are really just
    unique IDs). That increases the information available to the
    deserializer, which makes it more robust - it's easier for it to
    detect malformed input. Transporting data and converting it among
    representations are fragile, vulnerable operations, and you want to
    make them as robust as possible.

    > I just need a simple example, using a simple structure that contains
    > pointers (say a ptr to another struct, or a char*) so that I can build
    > on from that.


    It's difficult to provide a robust, portable, short example, because
    this is not a problem that lends itself to short, portable code.
    Portable data representations require marshalling and unmarshalling
    from and to the local system's representation. Furthermore, to
    really handle the general case, you have to keep a map from object
    addresses to IDs while serializing (so that each pointer can be
    converted to its ID), and a reverse map while deserializing.

    Here's an outline for the serializer:

    - Walk the data, creating a unique ID for each item and mapping
    it to the item's address. You'll have to choose what data structure
    to use for the map; a hash table (keyed by address) is an obvious
    choice, but might not be worth the overhead and complexity.

    - As each item is serialized, prefix it with its ID (and, presumably,
    type information and any other metadata your system needs to provide).

    - In the serialized representation, replace each pointer field with
    the ID of the pointed-to object.

    This two-pass approach is simpler than a single pass, which would
    have to remember the locations in the serialized data of pointer/ID
    fields that referred to objects that hadn't yet been assigned an ID,
    so you could fill those in later.

    The deserializer would use a similar two-pass process, first
    allocating areas for each item and building a map between area and ID
    in the process, then deserializing each item into its area and
    setting pointer fields using the map.


    1. There are esoteric architectures which use "fat" pointers that
    contain more information than simply an offset into address space,
    but that's an implementation detail that's not useful in portable C
    programming.

    --
    Michael Wojcik

    Against all odds, over a noisy telephone line, tapped by the tax authorities
    and the secret police, Alice will happily attempt, with someone she doesn't
    trust, whom she can't hear clearly, and who is probably someone else, to
    fiddle her tax return and to organise a coup d'etat, while at the same time
    minimising the cost of the phone call. -- John Gordon
     
    Michael Wojcik, Oct 4, 2005
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Ksenia Marasanova

    convert flat structure into hierarchical one

    Ksenia Marasanova, Sep 26, 2004, in forum: Python
    Replies:
    3
    Views:
    554
    Ksenia Marasanova
    Sep 27, 2004
  2. Alfonso Morra

    serialization of structure into a raw memory block

    Alfonso Morra, Oct 1, 2005, in forum: C Programming
    Replies:
    5
    Views:
    505
    Michael Wojcik
    Oct 2, 2005
  3. Alfonso Morra
    Replies:
    9
    Views:
    348
  4. Alfonso Morra
    Replies:
    0
    Views:
    385
    Alfonso Morra
    Oct 3, 2005
  5. Gavin Kistner
    Replies:
    6
    Views:
    116
    Ezra Zygmuntowicz
    Nov 17, 2005
Loading...

Share This Page