Serialisation/deserialisation of arguments/results

Francois Grieu

Hi,

I have system X needing to call procedures on a system Y
acting as server (say, "system Y, please decipher this data
using the key of index n in your keystore and return me
the result, or an error code if the key is not found").

I can exchange variable-size 8-bit-byte blocks between
X (acting as client) and Y (acting as server). My problem
is conversion of parameters and result to/from 8-bit-byte
blocks, for a significant and evolving set of functions,
often with variable-size parameters. Y must be highly
resistant to deliberate injection of mis-formatted requests,
same for X on mis-formatted responses. Speed matters to a
degree since Y may use a PowerPC 603 and X may be a
multi-gigahertz AMD64 CPU connected through PCIe.

The C code for X and Y must be highly portable across systems.
They do not necessarily share the same data type size and
endianness, and some crash on unaligned memory access.
I'm willing to assume CHAR_BIT==8, but not much more.

The current code is rather ad-hoc: for each new function the
parameters and result are serialized on the originating side,
and parsed on the receiving side (which is the harder part),
with no real method. On the parsing side, we have torrents of

unsigned char* inData;
long vIdx = inData[4]<<8 | inData[5];
(sometimes the index is an enum) or
long vIdx ;
vIdx = inData[j++]<<8;
vIdx |= inData[j++];
(I'm not making up the use of "long" for a 2-byte index)
and this is typically interleaved with error checking code
(although optimizing for error cases is pointless).

This is error-prone. In particular, it is hard to avoid parsing
beyond the end of the received data (I have spotted cases where
the test for length is off by one, especially with the second
parsing method) and to avoid introducing practically untestable
dependencies on basic type size.
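
To make the overrun problem concrete, here is a minimal sketch of one way to centralise the length test: a small cursor with a sticky error flag, so the caller checks once at the end instead of before every field. All names (rd_t, rd_init, rd_be, rd_bytes, rd_ok) are hypothetical and not taken from the existing code. It assumes CHAR_BIT==8 and big-endian wire order, as in the snippets above, and holds values in unsigned long, which is at least 32 bits on any conforming compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cursor over a received block; all names are illustrative. */
typedef struct {
    const unsigned char *p;  /* next unread byte */
    size_t left;             /* bytes remaining */
    int bad;                 /* sticky: set once any read would overrun */
} rd_t;

void rd_init(rd_t *r, const unsigned char *buf, size_t len)
{
    assert(r != NULL && (buf != NULL || len == 0));
    r->p = buf;
    r->left = len;
    r->bad = 0;
}

/* Read an n-byte big-endian unsigned value, 1 <= n <= 4.
   Returns 0 and sets r->bad if fewer than n bytes remain. */
unsigned long rd_be(rd_t *r, size_t n)
{
    unsigned long v = 0;
    assert(n >= 1 && n <= 4);      /* programmer error, not input error */
    if (r->bad || r->left < n) {
        r->bad = 1;
        r->left = 0;
        return 0;
    }
    r->left -= n;
    while (n-- > 0)
        v = (v << 8) | *r->p++;    /* byte-wise: no alignment or endianness assumption */
    return v;
}

/* Return a pointer to n raw bytes inside the buffer (no copy).
   Returns NULL and sets r->bad if fewer than n bytes remain. */
const unsigned char *rd_bytes(rd_t *r, size_t n)
{
    const unsigned char *q;
    if (r->bad || r->left < n) {
        r->bad = 1;
        r->left = 0;
        return NULL;
    }
    q = r->p;
    r->p += n;
    r->left -= n;
    return q;
}

/* Well-formed only if nothing overran and nothing is left over. */
int rd_ok(const rd_t *r)
{
    return !r->bad && r->left == 0;
}
```

With this shape the only length comparison lives in two places, which makes an off-by-one easier to spot and to test. Whether the per-field call cost matters on the target is something only a benchmark can tell; the same logic could be expressed as macros if it does.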

I have the feeling (did not benchmark) that calling a parsing
function for each argument would incur a significant speed penalty
(especially if we want to avoid globals, which is desirable since
Y is multi-threaded with no support for thread-local globals)
and I do not want to depend on whether the compiler for Y supports
inline functions. Thus I'm leaning towards
- a general parsing function parsing all the arguments into
a struct, returning a pointer for variable-size or long
arguments, and doing that according to a description of
the expected input and/or struct interpreted at runtime
- some clever (but readable and robust) use of macros
- some simple C code generating scheme for the serialization
and de-serialization.
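
As an illustration of the first option, a description-driven parser could look roughly like the sketch below. Everything here is an assumption made for the example: the names (blob_t, fdesc_t, parse_fields, decipher_req_t), the choice of big-endian fields, and the convention that a variable-size argument is a 2-byte length followed by that many bytes. Integer members of the target struct are assumed to be unsigned long and variable-size members blob_t.

```c
#include <assert.h>
#include <stddef.h>

/* Variable-size argument: points into the received buffer, no copy. */
typedef struct { const unsigned char *ptr; size_t len; } blob_t;

/* F_BLOB16: 2-byte big-endian length, then that many bytes. */
enum fkind { F_U8, F_U16, F_U32, F_BLOB16 };

/* One descriptor per field: wire kind and offset in the target struct. */
typedef struct { enum fkind kind; size_t off; } fdesc_t;

/* Parse buf[0..len) into *out according to d[0..nd).
   Returns 0 on success, -1 if the block is truncated or has trailing bytes. */
int parse_fields(const unsigned char *buf, size_t len,
                 const fdesc_t *d, size_t nd, void *out)
{
    size_t i, n;
    assert(d != NULL && out != NULL);
    for (i = 0; i < nd; i++) {
        unsigned char *dst = (unsigned char *)out + d[i].off;
        unsigned long v = 0;
        n = d[i].kind == F_U8 ? 1 : d[i].kind == F_U32 ? 4 : 2;
        if (len < n)
            return -1;                   /* single place for the length test */
        len -= n;
        while (n-- > 0)
            v = (v << 8) | *buf++;
        if (d[i].kind == F_BLOB16) {
            blob_t *b = (blob_t *)dst;
            if (len < v)
                return -1;               /* declared length exceeds the data */
            b->ptr = buf;
            b->len = (size_t)v;
            buf += v;
            len -= (size_t)v;
        } else {
            *(unsigned long *)dst = v;
        }
    }
    return len == 0 ? 0 : -1;
}

/* Hypothetical request: key index, then the data to decipher. */
typedef struct {
    unsigned long key_index;   /* 2 bytes on the wire */
    blob_t data;               /* 2-byte length + bytes */
} decipher_req_t;

const fdesc_t decipher_req_desc[] = {
    { F_U16,    offsetof(decipher_req_t, key_index) },
    { F_BLOB16, offsetof(decipher_req_t, data) }
};
```

A new remote function then only needs a struct and a descriptor table, and the bounds checks are written once. The cost is one interpreted loop per request rather than one call per argument.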

Any idea/reference/pointer?

Francois Grieu
 
Ben Bacarisse

This is not really a C question. The key term to search for is "remote
procedure call" or RPC. Many moons ago I wrote a portable RPC mechanism
for C programs, but there are now lots of these and some are widely
available (Sun RPC being one of the most widely distributed).
 
Francois Grieu

My problem is clean, robust, portable, lightweight, efficient
serialization/de-serialization of C arguments and result in
the context of an (assumed working) RPC framework where a remote
procedure has a single argument and result: a variable size
8-bit-byte buffer.

I see it as largely C-specific, because C has lax rules on type sizes,
arithmetic, packing, alignment. And C has a relatively precisely
defined preprocessor that may help.

Francois Grieu
 
Ben Bacarisse

Francois Grieu said:
My problem is clean, robust, portable, lightweight, efficient
serialization/de-serialization of C arguments and result in
the context of an (assumed working) RPC framework where a remote
procedure has a single argument and result: a variable size
8-bit-byte buffer.

Looking at other RPC mechanisms is still a good idea. Most will solve
the type packing/unpacking problem so seeing how they do it could help.
Some separate the data serialisation code from the call mechanism so you
might even be able to borrow that code.

Mine used a tiny interface language to specify the types in enough
detail to allow the call to be made; and a utility processed this
specification to generate stub functions that packed and unpacked the
arguments and results.

Francois Grieu said:
I see it as largely C-specific, because C has lax rules on type sizes,
arithmetic, packing, alignment. And C has a relatively precisely
defined preprocessor that may help.

Not at all. It can also be done in a language that is sufficiently
reflective. RPC in Lisp is easy if you have a raw send and receive
protocol. You may have a C question in that C lacks enough information
in the type system to be able to tell, at a glance, what should be sent
(e.g. is a char * a null-terminated string or a pointer to a character
array of some fixed size), but that issue is shared with other
languages.
 
Francois Grieu

Ben Bacarisse said:
Looking at other RPC mechanisms is still a good idea. Most will solve
the type packing/unpacking problem so seeing how they do it could help.
Some separate the data serialisation code from the call mechanism so you
might even be able to borrow that code.

Mine used a tiny interface language to specify the types in enough
detail to allow the call to be made; and a utility processed this
specification to generate stub functions that packed and unpacked the
arguments and results.

I had this in mind with "some simple C code generating scheme for the
serialization and de-serialization"; perhaps we could perform this
with the C preprocessor itself (rather than a separate program)?
That's why the question has a relation to C.
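
For illustration only, one way the preprocessor alone could generate the code is the "X-macro" technique: the field list is written once and expanded several times with different helper macros. The sketch below is an assumption about how that might look, not code from any existing framework; the names (DECIPHER_HDR_FIELDS, decipher_hdr_t and the AS_* helpers) are hypothetical, and it only covers fixed-size big-endian unsigned fields of 1 to 4 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical field list for one message header: X(name, wire_bytes). */
#define DECIPHER_HDR_FIELDS(X) \
    X(key_index, 2)            \
    X(data_len,  2)

/* Expansion 1: the in-memory struct (every field held as unsigned long). */
#define AS_MEMBER(name, nbytes) unsigned long name;
typedef struct { DECIPHER_HDR_FIELDS(AS_MEMBER) } decipher_hdr_t;

/* Expansion 2: total wire size as a compile-time constant. */
#define AS_SIZE(name, nbytes) + (nbytes)
enum { DECIPHER_HDR_WIRE_SIZE = 0 DECIPHER_HDR_FIELDS(AS_SIZE) };

/* Expansion 3: serialise each field, most significant byte first. */
#define AS_PACK(name, nbytes)                                   \
    for (i = (nbytes); i-- > 0; )                               \
        *p++ = (unsigned char)((s->name >> (8 * i)) & 0xFF);

/* p must have room for DECIPHER_HDR_WIRE_SIZE bytes. */
void decipher_hdr_pack(const decipher_hdr_t *s, unsigned char *p)
{
    size_t i;
    assert(s != NULL && p != NULL);
    DECIPHER_HDR_FIELDS(AS_PACK)
}

/* Expansion 4: parse each field back, byte by byte. */
#define AS_UNPACK(name, nbytes)                                 \
    s->name = 0;                                                \
    for (i = 0; i < (nbytes); i++)                              \
        s->name = (s->name << 8) | *p++;

/* Returns 0 on success, -1 if len is not exactly the expected size. */
int decipher_hdr_unpack(decipher_hdr_t *s, const unsigned char *p, size_t len)
{
    size_t i;
    assert(s != NULL && p != NULL);
    if (len != (size_t)DECIPHER_HDR_WIRE_SIZE)
        return -1;              /* one length test covers all fields */
    DECIPHER_HDR_FIELDS(AS_UNPACK)
    return 0;
}
```

Because the total size is known at compile time, the length is tested once and the per-field code has no bounds checks to get wrong. Variable-size arguments would need an extra field kind and a runtime check, which this sketch leaves out.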

Francois Grieu
 
Shao Miller

Francois Grieu said:
I had this in mind with "some simple C code generating scheme for the
serialization and de-serialization"; perhaps we could perform this
with the C preprocessor itself (rather than a separate program)?
That's why the question has a relation to C.

For your pleasure or displeasure, here's a toy which uses the
preprocessor to attempt to achieve automatically-generated "struct
descriptors" and [de]serialization means. Just a toy, mind you:


http://git.zytor.com/?p=users/sha0/...7;hb=95e4b5dedc01d6392ea4401fa1016a4c1418c467

I hope it's interesting. :)
 
