Serialisation/deserialisation of argument/results

Discussion in 'C Programming' started by Francois Grieu, May 5, 2011.

  1. Hi,

    I have system X needing to call procedures on a system Y
    acting as server (say, "system Y, please decipher this data
    using the key of index n in your keystore and return me
    the result, or an error code if the key is not found").

    I can exchange variable-size 8-bit-byte blocks between
    X (acting as client) and Y (acting as server). My problem
    is conversion of parameters and result to/from 8-bit-byte
    blocks, for a significant and evolving set of functions,
    often with variable-size parameters. Y must be highly
    resistant to deliberate injection of mis-formatted requests,
    same for X on mis-formatted responses. Speed matters to a
    degree since Y may use a PowerPC 603 and X may be a
    multi-gigahertz AMD64 CPU connected through PCIe.

    The C code for X and Y must be highly portable across systems.
    They do not necessarily share the same data type size and
    endianness, and some crash on unaligned memory access.
    I'm willing to assume CHAR_BIT==8, but not much more.
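
    A minimal sketch of byte-order-neutral encoding under exactly those
    assumptions (CHAR_BIT==8 and nothing else). The names put_u32be and
    get_u32be are hypothetical, and big-endian wire order is an assumption
    taken from the parsing fragments in this post. Values are carried in
    unsigned long, which the C standard guarantees to be at least 32 bits
    wide, and each byte is cast before shifting so the result does not
    depend on the width of int. Only individual bytes are touched, so
    alignment of the buffer does not matter.

```c
/* Store the low 32 bits of v at p[0..3], most significant byte first.
   Works the same on big- and little-endian hosts. */
void put_u32be(unsigned char *p, unsigned long v)
{
    p[0] = (unsigned char)((v >> 24) & 0xFFu);
    p[1] = (unsigned char)((v >> 16) & 0xFFu);
    p[2] = (unsigned char)((v >> 8) & 0xFFu);
    p[3] = (unsigned char)(v & 0xFFu);
}

/* Read 4 bytes, most significant first. The casts matter: without them
   p[0] would be promoted to int, and shifting it left by 24 could
   overflow where int is 16 or 32 bits wide. */
unsigned long get_u32be(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24)
         | ((unsigned long)p[1] << 16)
         | ((unsigned long)p[2] << 8)
         |  (unsigned long)p[3];
}
```

    Neither function checks lengths; the caller must know that four bytes
    are available at p.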

    The current code is rather ad-hoc: for each new function the
    parameters and result are serialized on the originating side,
    and parsed on the receiving side (which is the harder part),
    with no real method. On the parsing side, we have torrents of

    unsigned char* inData;
    long vIdx = inData[4]<<8 | inData[5];
    (sometimes the index is an enum) or
    long vIdx ;
    vIdx = inData[j++]<<8;
    vIdx |= inData[j++];
    (I'm not making up the use of "long" for a 2-byte index)
    and this is typically interleaved with error checking code
    (although optimizing for error cases is pointless).

    This is error prone, in particular it is hard to avoid parsing
    beyond the end of the received data (I have spotted cases where
    the test for length is off-by-one, especially with the second
    parsing method) and not introduce practically untestable
    dependencies on basic type size.
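
    One common way to get the length tests out of the call sites is a small
    cursor object with a sticky error flag. The sketch below is only an
    illustration: struct reader and the rd_* names are hypothetical, and a
    2-byte big-endian index is assumed as in the fragments above. Every
    read compares the bytes remaining against the bytes needed in one
    place, so an off-by-one can only be made once, and the caller tests
    for failure a single time at the end. The state lives in a struct
    owned by the caller, so no globals are involved.

```c
#include <stddef.h>

struct reader {
    const unsigned char *p;  /* next unread byte */
    size_t left;             /* bytes remaining after p */
    int err;                 /* sticky: nonzero once any read has overrun */
};

void rd_init(struct reader *r, const unsigned char *buf, size_t len)
{
    r->p = buf;
    r->left = len;
    r->err = 0;
}

/* Read a 2-byte big-endian value. On overrun, set err and return 0. */
unsigned long rd_u16(struct reader *r)
{
    unsigned long v;
    if (r->err || r->left < 2) { r->err = 1; return 0; }
    v = ((unsigned long)r->p[0] << 8) | r->p[1];
    r->p += 2;
    r->left -= 2;
    return v;
}

/* Hand out n bytes in place, without copying. NULL on overrun. */
const unsigned char *rd_bytes(struct reader *r, size_t n)
{
    const unsigned char *q;
    if (r->err || r->left < n) { r->err = 1; return NULL; }
    q = r->p;
    r->p += n;
    r->left -= n;
    return q;
}

/* True only if no read overran and the whole message was consumed. */
int rd_done(const struct reader *r)
{
    return !r->err && r->left == 0;
}
```

    A handler would call rd_init, read every argument unconditionally, and
    then reject the request if rd_done returns 0, before using any of the
    values read.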

    I have the feeling (did not benchmark) that calling a parsing
    function for each argument would incur a significant speed penalty
    (especially if we want to avoid globals, which is desirable since
    Y is multi-threaded with no support for thread-local globals)
    and I do not want to depend on whether the compiler for Y supports
    inline functions. Thus I'm leaning towards
    - a general parsing function parsing all the arguments into
    a struct, returning a pointer for variable-size or long
    arguments, and doing that according to a description of
    the expected input and/or struct interpreted at runtime
    - some clever (but readable and robust) use of macros
    - some simple C code generating scheme for the serialization
    and de-serialization.
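
    The first option in the list above could look roughly like the sketch
    below. Everything in it is an assumption made for illustration: the
    field kinds, the 2-byte length prefix for variable-size data, the
    big-endian byte order, and the names parse, struct field, struct blob
    and struct decipher_req. A table of (kind, offsetof) pairs describes
    each request, one function walks the table, and variable-size
    arguments are returned as a pointer into the received buffer plus a
    length.

```c
#include <stddef.h>

/* F_BLOB16 is a 2-byte length followed by that many bytes. */
enum kind { F_U16, F_U32, F_BLOB16, F_END };

struct blob { const unsigned char *ptr; size_t len; };

/* off is the offsetof() of the member in the target struct. */
struct field { enum kind kind; size_t off; };

/* Hypothetical request: decipher data with the key at key_index. */
struct decipher_req { unsigned long key_index; struct blob data; };

const struct field decipher_desc[] = {
    { F_U16,    offsetof(struct decipher_req, key_index) },
    { F_BLOB16, offsetof(struct decipher_req, data) },
    { F_END,    0 }
};

/* Fill *out according to d. Scalar members must be unsigned long and
   blob members struct blob. Returns 0 on success, -1 if the input is
   truncated or has trailing bytes. */
int parse(const struct field *d, const unsigned char *in, size_t len,
          void *out)
{
    unsigned char *base = (unsigned char *)out;
    for (; d->kind != F_END; d++) {
        size_t n = (d->kind == F_U32) ? 4 : 2, i;
        unsigned long v = 0;
        if (len < n) return -1;              /* the only scalar length test */
        for (i = 0; i < n; i++) v = (v << 8) | in[i];
        in += n;
        len -= n;
        if (d->kind == F_BLOB16) {
            struct blob *b = (struct blob *)(base + d->off);
            if (len < v) return -1;          /* payload must fit in what is left */
            b->ptr = in;                     /* points into the input, no copy */
            b->len = (size_t)v;
            in += v;
            len -= v;
        } else {
            *(unsigned long *)(base + d->off) = v;
        }
    }
    return len == 0 ? 0 : -1;
}
```

    The cost is one function call per request rather than per argument,
    and adding a new remote function means adding a struct and a table
    rather than new parsing code.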

    Any idea/reference/pointer?

    Francois Grieu
     
    Francois Grieu, May 5, 2011
    #1

  2. Francois Grieu <> writes:

    > Any idea/reference/pointer?


    This is not really a C question. The key term to search for is "remote
    procedure call" or RPC. Many moons ago I wrote a portable RPC mechanism
    for C programs, but there are now lots of these and some are widely
    available (Sun RPC being one of the most widely distributed).

    --
    Ben.
     
    Ben Bacarisse, May 5, 2011
    #2

  3. On 05/05/2011 15:01, Ben Bacarisse wrote:
    > Francois Grieu<> writes:
    >
    >> Any idea/reference/pointer?

    >
    > This is not really a C question. The key term to search for is "remote
    > procedure call" or RPC. Many moons ago I wrote a portable RPC mechanism
    > for C programs, but there are now lots of these and some are widely
    > available (Sun RPC being one of the most widely distributed).


    My problem is clean, robust, portable, lightweight, efficient
    serialization/de-serialization of C arguments and result in
    the context of an (assumed working) RPC framework where a remote
    procedure has a single argument and result: a variable size
    8-bit-byte buffer.

    I see it as largely C-specific, because C has lax rules on type sizes,
    arithmetic, packing, alignment. And C has a relatively precisely
    defined preprocessor that may help.

    Francois Grieu
     
    Francois Grieu, May 5, 2011
    #3
  4. Francois Grieu <> writes:

    > On 05/05/2011 15:01, Ben Bacarisse wrote:
    >> Francois Grieu<> writes:

    <snip>
    >>> Any idea/reference/pointer?

    >>
    >> This is not really a C question. The key term to search for is "remote
    >> procedure call" or RPC. Many moons ago I wrote a portable RPC mechanism
    >> for C programs, but there are now lots of these and some are widely
    >> available (Sun RPC being one of the most widely distributed).

    >
    > My problem is clean, robust, portable, lightweight, efficient
    > serialization/de-serialization of C arguments and result in
    > the context of an (assumed working) RPC framework where a remote
    > procedure has a single argument and result: a variable size
    > 8-bit-byte buffer.


    Looking at other RPC mechanisms is still a good idea. Most will solve
    the type packing/unpacking problem so seeing how they do it could help.
    Some separate the data serialisation code from the call mechanism so you
    might even be able to borrow that code.

    Mine used a tiny interface language to specify the types in enough
    detail to allow the call to be made; and a utility processed this
    specification to generate stub functions that packed and unpacked the
    arguments and results.

    > I see it as largely C-specific, because C has lax rules on type sizes,
    > arithmetic, packing, alignment. And C has a relatively precisely
    > defined preprocessor that may help.


    Not at all. It can also be done in a language that is sufficiently
    reflective. RPC in Lisp is easy if you have a raw send and receive
    protocol. You may have a C question in that C lacks enough information
    in the type system to be able to tell, at a glance, what should be sent
    (e.g. is a char * a null-terminated string or a pointer to a character
    array of some fixed size) but that issue is shared with other
    languages.

    --
    Ben.
     
    Ben Bacarisse, May 5, 2011
    #4
  5. On 05/05/2011 16:31, Ben Bacarisse wrote:
    > Francois Grieu<> writes:
    >
    >> On 05/05/2011 15:01, Ben Bacarisse wrote:
    >>> Francois Grieu<> writes:

    > <snip>
    >>>> Any idea/reference/pointer?
    >>>
    >>> This is not really a C question. The key term to search for is "remote
    >>> procedure call" or RPC. Many moons ago I wrote a portable RPC mechanism
    >>> for C programs, but there are now lots of these and some are widely
    >>> available (Sun RPC being one of the most widely distributed).

    >>
    >> My problem is clean, robust, portable, lightweight, efficient
    >> serialization/de-serialization of C arguments and result in
    >> the context of an (assumed working) RPC framework where a remote
    >> procedure has a single argument and result: a variable size
    >> 8-bit-byte buffer.

    >
    > Looking at other RPC mechanisms is still a good idea. Most will solve
    > the type packing/unpacking problem so seeing how they do it could help.
    > Some separate the data serialisation code from the call mechanism so you
    > might even be able to borrow that code.
    >
    > Mine used a tiny interface language to specify the types in enough
    > detail to allow the call to be made; and a utility processed this
    > specification to generate stub functions that packed and unpacked the
    > arguments and results.


    I had this in mind with "some simple C code generating scheme for the
    serialization and de-serialization"; perhaps we could perform this
    with the C preprocessor itself (rather than a separate program)?
    That's why the question has a relation to C.
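
    One preprocessor-only technique that fits this idea is the "X-macro":
    the field list is written once and expanded several times with
    different definitions of X. The sketch below is a hypothetical
    illustration, not the scheme of any existing tool. The request
    key_req, its three fixed-size fields and their wire widths are
    invented for the example, the byte order is assumed big-endian, and
    variable-size fields are left out to keep it short.

```c
#include <stddef.h>

/* Single source of truth: member name and width on the wire in bytes. */
#define KEY_REQ_FIELDS(X) \
    X(key_index, 2)       \
    X(flags, 1)           \
    X(data_len, 4)

/* Expansion 1: the struct. Every member is unsigned long (>= 32 bits). */
#define X_MEMBER(name, nbytes) unsigned long name;
struct key_req { KEY_REQ_FIELDS(X_MEMBER) };

/* Expansion 2: the total wire size as a compile-time constant. */
#define X_SIZE(name, nbytes) + (nbytes)
enum { KEY_REQ_WIRE_SIZE = 0 KEY_REQ_FIELDS(X_SIZE) };

/* Expansions 3 and 4: pack and unpack, most significant byte first. */
#define X_PACK(name, nbytes) \
    for (i = (nbytes); i-- > 0; ) \
        *out++ = (unsigned char)((s->name >> (8 * i)) & 0xFFu);
#define X_UNPACK(name, nbytes) \
    s->name = 0; \
    for (i = 0; i < (nbytes); i++) s->name = (s->name << 8) | *in++;

/* out must have room for KEY_REQ_WIRE_SIZE bytes. Values wider than
   their wire width are silently truncated to the low bytes. */
void key_req_pack(const struct key_req *s, unsigned char *out)
{
    unsigned i;
    KEY_REQ_FIELDS(X_PACK)
}

/* Returns 0 on success, -1 if len is not exactly the wire size. */
int key_req_unpack(struct key_req *s, const unsigned char *in, size_t len)
{
    unsigned i;
    if (len != (size_t)KEY_REQ_WIRE_SIZE) return -1;
    KEY_REQ_FIELDS(X_UNPACK)
    return 0;
}
```

    Because the length is checked once against a constant derived from the
    same list, the struct, the size test and both conversion routines
    cannot drift apart when a field is added.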

    Francois Grieu
     
    Francois Grieu, May 6, 2011
    #5
  6. On May 6, 6:42 am, Francois Grieu <> wrote:
    > I had this in mind with "some simple C code generating scheme for the
    > serialization and de-serialization"; perhaps we could perform this
    > with the C preprocessor itself (rather than a separate program)?
    > That's why the question has a relation to C.


    For your pleasure or displeasure, here's a toy which uses the
    preprocessor to attempt to achieve automatically-generated "struct
    descriptors" and [de]serialization means. Just a toy, mind you:


    http://git.zytor.com/?p=users/sha0/...7;hb=95e4b5dedc01d6392ea4401fa1016a4c1418c467

    I hope it's interesting. :)
     
    Shao Miller, May 9, 2011
    #6
