Optimize a serial protocol for data exchange between two C applications

pozzugno

I have two C applications running on embedded platforms. Currently they communicate through a serial connection (38400 bps) with a proprietary binary protocol.

The master application sends commands to the slave application requesting its status or changing some settings. The answer to the status request is a sequence of binary data: numerical values in 1 or 2 or 4 bytes (in some cases, organized in arrays), small null-terminated strings, bitmask and so on.

At first this approach was good, but now I'm finding many problems. Mostly I have to keep the two applications synchronized: if I add a parameter (maybe in the middle of the answer, because an array size is increased) or enlarge a numerical value (from 1 to 2 bytes) or something else, I have to change both applications. I usually want to have a retro-compatible master application (the master should communicate well with new or older slave applications), so the code will be filled with many ifs (if the slave answer version is 1.0 then there is one parameter here, if it is 1.2 the array size is 3 and not 2...)

Is there a better approach to exchange data on the wire between two C applications, considering the limitations of memory, the low speed and the length of messages of a small embedded platform?
 
Shao Miller

Is there a better approach to exchange data on the wire between two C applications, considering the limitations of memory, the low speed and the length of messages of a small embedded platform?

You could use sentinel values (like strings use the null terminator) or
send the array size before sending the array. - Shao
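
The second option (sending the size before the array) could look roughly like the sketch below. It assumes a one-byte count followed by one-byte items; the function names put_u8_array and get_u8_array are made up for illustration and are not from any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a one-byte count followed by the items.
   Returns the number of bytes written, or 0 if the buffer is too small. */
size_t put_u8_array(uint8_t *out, size_t cap, const uint8_t *items, uint8_t count)
{
    if (cap < (size_t)count + 1u)
        return 0;
    out[0] = count;
    memcpy(out + 1, items, count);
    return (size_t)count + 1u;
}

/* Reads a count-prefixed array into items (room for max_items entries).
   Returns the number of bytes consumed, or 0 if the input is truncated
   or the array does not fit in the destination. */
size_t get_u8_array(const uint8_t *in, size_t len,
                    uint8_t *items, size_t max_items, uint8_t *count)
{
    if (len < 1u)
        return 0;
    uint8_t n = in[0];
    if (len < (size_t)n + 1u || n > max_items)
        return 0;
    memcpy(items, in + 1, n);
    *count = n;
    return (size_t)n + 1u;
}
```

Because the receiver learns the element count from the wire, a newer sender can grow the array without the receiver's parser losing its place, as long as the receiver has room for the extra items.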
 
glen herrmannsfeldt

At first this approach was good, but now I'm finding many problems.
Mostly I have to keep the two applications synchronized: if I add
a parameter (maybe in the middle of the answer, because an array
size is increased) or enlarge a numerical value (from 1 to 2 bytes)
or something else, I have to change both applications.
I usually want to have a retro-compatible master application
(the master should communicate well with new or older slave
applications), so the code will be filled with many ifs
(if the slave answer version is 1.0 then there is one parameter
here, if it is 1.2 the array size is 3 and not 2...)
Is there a better approach to exchange data on the wire between
two C applications, considering the limitations of memory,
the low speed and the length of messages of a small embedded
platform?

You might look at XDR, which is meant for communication between
unlike devices. It may or may not do what you want, but it is
conveniently well standardized. (It is in a few internet RFCs.)

-- glen
 
pozzugno

You might look at XDR, which is meant for communication between
unlike devices. It may or may not do what you want, but it is
conveniently well standardized. (It is in a few internet RFCs.)

Thank you for your suggestion. However, XDR seems to encode everything in 4-byte units. I have many small integer values, so many bytes carrying no information would be added.

I'd like to use a serialization format that is more compact and simple to implement in embedded applications.
BSON? JSON? Protocol Buffers? Other suggestions?
 
Les Cargill

I have two C applications running on embedded platforms. Currently
they communicate through a serial connection (38400bps) with a
proprietary binary protocol.


Binary protocols are in general a form of premature optimization. But
we've all done it... it's not really a protocol; it's an interpreter.

The master application sends commands to the slave application
requesting its status or changing some settings. The answer to the
status request is a sequence of binary data: numerical values in 1 or
2 or 4 bytes (in some cases, organized in arrays), small
null-terminated strings, bitmask and so on.

At first this approach was good, but now I'm finding many problems.
Mostly I have to keep the two applications synchronized: if I add a
parameter (maybe in the middle of the answer, because an array size
is increased) or enlarge a numerical value (from 1 to 2 bytes) or
something else, I have to change both applications. I usually want to
have a retro-compatible master application (the master should communicate
well with new or older slave applications), so the code will be
filled with many ifs (if the slave answer version is 1.0 then there
is one parameter here, if it is 1.2 the array size is 3 and not
2...)

Is there a better approach to exchange data on the wire between two C
applications, considering the limitations of memory, the low speed
and the length of messages of a small embedded platform?


As to having multiple protocol versions:

- Have only new commands added. V1.2 of the protocol is a
proper superset of V1.1

- Use dispatch through callback tables to support different "shapes"
within the protocol.

- try to manage the protocol as a stack, as a separately compiled and
"libraried" unit with its own regression suite. The regression suite
might be able to spawn a server and a client who both use a shared
memory to emulate a serial port. This allows it to run on a
workstation, which might reduce the cost of maintenance.

- use a scripting language to build a test driver for each version,
in each role.
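
A minimal sketch of the callback-table idea above, under assumptions: the command codes 0x01 and 0x02, the handler names, the settings array and the result codes are all invented for illustration. A later protocol version would only append rows to the table, so the older commands keep working unchanged.

```c
#include <stddef.h>
#include <stdint.h>

enum { CMD_OK = 0, CMD_BAD_PAYLOAD = -1, CMD_UNKNOWN = -2 };

typedef int (*cmd_handler)(const uint8_t *payload, size_t len);

struct cmd_entry {
    uint8_t     code;     /* command byte received on the wire */
    cmd_handler handler;  /* function that understands this command */
};

static uint8_t settings[4];  /* hypothetical device settings */

/* Status request: carries no payload. */
static int handle_get_status(const uint8_t *payload, size_t len)
{
    (void)payload;
    return len == 0 ? CMD_OK : CMD_BAD_PAYLOAD;
}

/* Set one setting: payload is {index, value}. */
static int handle_set_param(const uint8_t *payload, size_t len)
{
    if (len != 2 || payload[0] >= sizeof settings)
        return CMD_BAD_PAYLOAD;
    settings[payload[0]] = payload[1];
    return CMD_OK;
}

static const struct cmd_entry cmd_table[] = {
    { 0x01, handle_get_status },
    { 0x02, handle_set_param },
    /* a newer protocol version appends entries here */
};

/* Looks up the command code and calls its handler, if one is registered. */
int dispatch_command(uint8_t code, const uint8_t *payload, size_t len)
{
    for (size_t i = 0; i < sizeof cmd_table / sizeof cmd_table[0]; i++) {
        if (cmd_table[i].code == code)
            return cmd_table[i].handler(payload, len);
    }
    return CMD_UNKNOWN;
}
```

Keeping the table and dispatcher in their own translation unit also fits the suggestion to build and test the protocol as a separate library.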
 
BartC

I have two C applications running on embedded platforms. Currently they
communicate through a serial connection (38400bps) with a proprietary
binary protocol.

The master application sends commands to the slave application requesting
its status or changing some settings. The answer to the status request is
a sequence of binary data: numerical values in 1 or 2 or 4 bytes (in some
cases, organized in arrays), small null-terminated strings, bitmask and so
on.

At first this approach was good, but now I'm finding many problems. Mostly
I have to keep the two applications synchronized: if I add a parameter
(maybe in the middle of the answer, because an array size is increased) or
enlarge a numerical value (from 1 to 2 bytes) or something else, I have
to change both applications. I usually want to have a retro-compatible
master application (the master should communicate well with new or older
slave applications), so the code will be filled with many ifs (if the
slave answer version is 1.0 then there is one parameter here, if it is 1.2
the array size is 3 and not 2...)

So at present the format of the returned data is hard-coded? That is, the
number of items, and the format of each item (1, 2 or 4 bytes etc).

The first problem then is the format of each item. In this case just tag
each value: precede it with a byte indicating whether it is 1, 2 or 4 bytes
(sometimes this can be done without needing a separate byte) and perhaps
whether it's a string etc.

This could work if the reader knows what sequence to expect, and perhaps the
type, if not the exact format, of each item.
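
As a rough illustration of the size tag, here is a sketch that reads one tagged unsigned integer. It assumes the tag byte holds the width (1, 2 or 4) and that the value follows in little-endian order; both choices, and the name read_tagged_uint, are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one size-tagged unsigned integer: a tag byte (1, 2 or 4)
   followed by that many little-endian value bytes.
   Returns the number of bytes consumed, or 0 on a bad tag or short input. */
size_t read_tagged_uint(const uint8_t *in, size_t len, uint32_t *value)
{
    if (len < 1u)
        return 0;
    uint8_t width = in[0];
    if (width != 1u && width != 2u && width != 4u)
        return 0;
    if (len < (size_t)width + 1u)
        return 0;
    uint32_t v = 0;
    for (uint8_t i = 0; i < width; i++)
        v |= (uint32_t)in[1u + i] << (8u * i);
    *value = v;
    return (size_t)width + 1u;
}
```

The reader always stores the result in a 32-bit variable, so a field that grows from 1 to 2 bytes on the sender side needs no change in the receiver.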

But if the type of an item can change, or you add a new one in the middle,
or even at the end, then the reader might be able to receive it, but it will
no longer understand what the items mean!

Then you will need to add information about the meaning of each item (an
identifier) and it gets more complicated. You mentioned serialisation and
JSON. But it's difficult to give advice without knowing more details or the
limitations you have (speed-wise for example). (Personally I wouldn't use
3rd party solutions; they tend to have a big learning curve, add more
dependencies, and usually completely dwarf my entire application. But that
might be just me...)
 
Eric Sosman

Thank you for your suggestion. However, XDR seems to encode everything in 4-byte units. I have many small integer values, so many bytes carrying no information would be added.

I'd like to use a serialization format that is more compact and simple to implement in embedded applications.
BSON? JSON? Protocol Buffers? Other suggestions?

The fact that you're coding in C seems irrelevant to the
question, which is all about the wire format. You might get
better "outside the box" answers in a general programming
forum than you will here.

From your description, it looks like you have three concerns:
flexibility, density, and simplicity. I doubt there's any one
choice that gets close to all three corners of the triangle at
the same time, so you'll probably have to settle for something
that's a little bit rigid and/or bloated and/or complex -- and
it's up to you to figure out what the trade-offs are.

I've used protocol buffers (though not in C) and found them
flexible and reasonably dense. I don't know whether implementations
in C exist; if not, I think a full-fledged implementation might be
something of a challenge (protobuffs aren't fully polymorphic, but
they benefit a lot from some O-O capabilities). Still, perhaps
you could study protobuffs and write yourself a subsetted version
that would meet your needs without being too daunting.

If it were up to me (it's not, of course), I think I'd go
for XDR despite concerns about bloat. It can be pretty flexible,
and since support libraries already exist it scores well on
simplicity. However, I'd make a point of segregating the
encoding and decoding from the rest of the program, so I could
change my mind later. Once the implementation settles down I'd
imagine flexibility might become less important, and at that
point I could switch to a purpose-built high-density protocol.
But while things are still in flux, I don't think I'd worry so
much about the format density unless it's really awful.
 
glen herrmannsfeldt

(e-mail address removed) wrote:

(snip, I wrote)
Thank you for your suggestion. However, XDR seems to encode everything
in 4-byte units. I have many small integer values, so many bytes
carrying no information would be added.

I haven't looked at it so recently, but that sounds right for
scalars. For arrays, I thought it packed them more. I am not
sure at all by now for structs.

-- glen
 
Noob

pozzugno said:
I have two C applications running on embedded platforms. Currently
they communicate through a serial connection (38400bps) with a
proprietary binary protocol.

You could try posting in comp.arch.embedded, they might provide
some insight.
 
Greg Martin

(e-mail address removed) wrote:

(snip, I wrote)


I haven't looked at it so recently, but that sounds right for
scalars. For arrays, I thought it packed them more. I am not
sure at all by now for structs.

-- glen

AFAIK each member of an array is 4 bytes. Strings however are encoded
with single bytes preceded by a 4 byte length field and padded to a 4
byte boundary by 0 value bytes.
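
To make the string overhead concrete, here is a small sketch of the layout described above: a 4-byte big-endian length, the string bytes, then zero padding up to a multiple of 4. The helper names xdr_encoded_string_size and encode_xdr_string are hypothetical and are not the API of any XDR library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed on the wire for a string of n characters:
   4 length bytes plus the data rounded up to a multiple of 4. */
size_t xdr_encoded_string_size(size_t n)
{
    return 4u + ((n + 3u) & ~(size_t)3u);
}

/* Encodes s as length + bytes + zero padding.
   Returns the number of bytes written, or 0 if the buffer is too small. */
size_t encode_xdr_string(uint8_t *out, size_t cap, const char *s)
{
    size_t n = strlen(s);
    size_t total = xdr_encoded_string_size(n);
    if (cap < total)
        return 0;
    uint32_t n32 = (uint32_t)n;
    out[0] = (uint8_t)(n32 >> 24);   /* length, most significant byte first */
    out[1] = (uint8_t)(n32 >> 16);
    out[2] = (uint8_t)(n32 >> 8);
    out[3] = (uint8_t)n32;
    memcpy(out + 4, s, n);
    memset(out + 4 + n, 0, total - 4u - n);  /* pad to a 4-byte boundary */
    return total;
}
```

With this layout a 5-character string such as "Hello" occupies 12 bytes on the wire, against 6 bytes for the same string sent null-terminated.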
 
Greg Martin

Thank you for your suggestion. However, XDR seems to encode everything in 4-byte units. I have many small integer values, so many bytes carrying no information would be added.

I'd like to use a serialization format that is more compact and simple to implement in embedded applications.
BSON? JSON? Protocol Buffers? Other suggestions?

JSON has certain advantages. Since everything is text it is extensible
by interpretation. It's delimited by {} so boundaries are clear.
Interpreting fields is up to the protocol interpreter since the data
won't tell you its type, however it is reasonably easy to work out a
scheme of some sort.

e.g. {"int_array":[1,3,34,245],"string":"Hello, World","double":2.34}

It is certainly easy to come up with a more compact protocol and sane
updates to a specification are required in any case but it's easily
extensible and debugged and widely used.
 
Bart van Ingen Schenau

Is there a better approach to exchange data on the wire between two C
applications, considering the limitations of memory, the low speed and
the length of messages of a small embedded platform?

Yes, there are better possibilities.
The easiest way to get a dense, binary, flexible protocol is to define
the protocol elements as Type/Length/Value triplets.
This means that each parameter in a request or response message consists
of a Type-code identifying the parameter, a Length byte stating the
length (in bytes) of the following data and the Value of the parameter.

For full compatibility between revisions of the protocol, you need to
observe a few rules:
- The Type-code field must have a fixed length, otherwise an
implementation can't know how to skip unsupported parameters. You should
leave plenty of room for adding new Type-codes (reserve at least one byte
more than you actually need for the current set of parameters).
- The Length field must have a fixed width, otherwise an implementation
doesn't know where the next parameter after an unsupported one starts.
For the same reason, the Length field must always be a count of bytes.
- The Type-code must identify the parameter type and its encoding (if
multiple incompatible encodings are possible). You may cheat a bit with
the Length field to omit leading 0-bytes on a field that encodes a single
integer (and thus later extend a byte to a two- or four-byte value), but
the documentation should read something like: "Type-code 0x0042: Value of
register T1; 4 bytes".
- Type-codes can never be reused or changed. Once they are assigned a
meaning, that meaning is set in stone. If you find later that it does not
fit anymore (e.g. data structure is extended), use a new Type-code. If
both old and new receivers must be supported, it might be best to use the
new code only for the additional data.

The parsers for these protocols work like this:
1. Read sizeof(Type-code) bytes as Type-code
2. Read sizeof(Length) bytes as Length
3. Read Length bytes as Value
4. Call the handler function that was registered for the Type-code (if
any)
5. Repeat steps 1 to 4 until end of the message
6. (Optional) Signal the end of the message to the handlers.
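
The parsing steps above can be sketched as follows. This is only an illustration under assumptions: a 2-byte big-endian Type-code, a 1-byte Length, and made-up names (tlv_parse, tlv_entry, on_register_t1). The sample handler decodes a big-endian integer whose leading zero bytes may have been omitted.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*tlv_handler)(const uint8_t *value, uint8_t length, void *ctx);

struct tlv_entry {
    uint16_t    type;     /* Type-code this handler is registered for */
    tlv_handler handler;
};

/* Sample handler: stores a big-endian integer of up to 4 bytes in *ctx. */
void on_register_t1(const uint8_t *value, uint8_t length, void *ctx)
{
    uint32_t v = 0;
    if (length > 4u)
        return;                       /* wider than this receiver supports */
    for (uint8_t i = 0; i < length; i++)
        v = (v << 8) | value[i];
    *(uint32_t *)ctx = v;
}

/* Walks all Type/Length/Value triplets in msg. Parameters without a
   registered handler are skipped. Returns the number of parameters
   handled, or -1 if the message is truncated. */
int tlv_parse(const uint8_t *msg, size_t len,
              const struct tlv_entry *table, size_t entries, void *ctx)
{
    size_t pos = 0;
    int handled = 0;
    while (pos < len) {
        if (len - pos < 3u)
            return -1;                /* incomplete Type/Length header */
        uint16_t type = (uint16_t)(((unsigned)msg[pos] << 8) | msg[pos + 1]);
        uint8_t length = msg[pos + 2];
        pos += 3u;
        if (len - pos < length)
            return -1;                /* Value runs past the end */
        for (size_t i = 0; i < entries; i++) {
            if (table[i].type == type) {
                table[i].handler(msg + pos, length, ctx);
                handled++;
                break;
            }
        }
        pos += length;                /* skip the Value, known or not */
    }
    return handled;
}
```

Note that in this sketch handlers run as each parameter is reached, so a message that turns out to be truncated later may already have been partly processed; a stricter receiver could validate the whole message first.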

Bart v Ingen Schenau
 
pozz

On 20/12/2012 15:09, Eric Sosman wrote:
The fact that you're coding in C seems irrelevant to the
question, which is all about the wire format. You might get
better "outside the box" answers in a general programming
forum than you will here.

Yes, I'll also ask this question on comp.arch.embedded. I asked here
because I was trying to find some ready-to-use C libraries. If they
don't exist, I'll try to write my own.

From your description, it looks like you have three concerns:
flexibility, density, and simplicity. I doubt there's any one
choice that gets close to all three corners of the triangle at
the same time, so you'll probably have to settle for something
that's a little bit rigid and/or bloated and/or complex -- and
it's up to you to figure out what the trade-offs are.

I've used protocol buffers (though not in C) and found them
flexible and reasonably dense. I don't know whether implementations
in C exist; if not, I think a full-fledged implementation might be
something of a challenge (protobuffs aren't fully polymorphic, but
they benefit a lot from some O-O capabilities). Still, perhaps
you could study protobuffs and write yourself a subsetted version
that would meet your needs without being too daunting.

It seems there are a couple of C implementations of protobuf, maybe
somewhat limited. Aside from this porting problem, do you think protobuf
could be a good solution in my case?

If it were up to me (it's not, of course), I think I'd go
for XDR despite concerns about bloat. It can be pretty flexible,
and since support libraries already exist it scores well on
simplicity.

One thing I didn't understand (I'm sorry if I ask silly questions).
An XDR message is a sequence of parameters (integers, strings,
structures...). The receiver must know the exact sequence of
parameters generated by the transmitter. So, if a new version of the
transmitter adds a parameter in the middle of the message, an old
version of the receiver can't decode the message correctly.

I think protobuf solves this problem with an ID associated with each
parameter.
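
For reference, a minimal sketch of how that per-parameter ID appears on the wire for an integer field. In the Protocol Buffers encoding each field starts with a key, (field_number << 3) | wire_type, written as a varint (7 bits per byte, least significant group first, high bit set on every byte except the last); wire type 0 means the value is also a varint. The function names here are invented for illustration and are not part of any protobuf library.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes v as a base-128 varint.
   Returns the number of bytes written, or 0 if the buffer is too small. */
size_t put_varint(uint8_t *out, size_t cap, uint64_t v)
{
    size_t n = 0;
    do {
        if (n >= cap)
            return 0;
        uint8_t b = (uint8_t)(v & 0x7Fu);
        v >>= 7;
        if (v != 0)
            b |= 0x80u;               /* more bytes follow */
        out[n++] = b;
    } while (v != 0);
    return n;
}

/* Writes one unsigned integer field: key (field number, wire type 0)
   followed by the value. Returns bytes written, or 0 if it does not fit. */
size_t put_uint_field(uint8_t *out, size_t cap, uint32_t field_number, uint64_t value)
{
    size_t k = put_varint(out, cap, (uint64_t)field_number << 3);
    if (k == 0)
        return 0;
    size_t m = put_varint(out + k, cap - k, value);
    return m == 0 ? 0 : k + m;
}
```

Field 1 with the value 150 comes out as the three bytes 08 96 01, so small field numbers and small values cost very little. A receiver that does not know a field number can still skip the field, because the wire type in the key tells it how to find the end of the value.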
 
pozz

On 20/12/2012 18:15, Greg Martin wrote:
Thank you for your suggestion. However, XDR seems to encode everything
in 4-byte units. I have many small integer values, so many bytes
carrying no information would be added.

I'd like to use a serialization format that is more compact and simple
to implement in embedded applications.
BSON? JSON? Protocol Buffers? Other suggestions?

JSON has certain advantages. Since everything is text it is extensible
by interpretation. It's delimited by {} so boundaries are clear.
Interpreting fields is up to the protocol interpreter since the data
won't tell you its type, however it is reasonably easy to work out a
scheme of some sort.

e.g. {"int_array":[1,3,34,245],"string":"Hello, World","double":2.34}

It is certainly easy to come up with a more compact protocol and sane
updates to a specification are required in any case but it's easily
extensible and debugged and widely used.

I can't use strings to encode everything: the message size would
increase too much for my embedded platform.
 
Ian Collins

pozz said:
On 20/12/2012 18:15, Greg Martin wrote:
JSON has certain advantages. Since everything is text it is extensible
by interpretation. It's delimited by {} so boundaries are clear.
Interpreting fields is up to the protocol interpreter since the data
won't tell you its type, however it is reasonably easy to work out a
scheme of some sort.

e.g. {"int_array":[1,3,34,245],"string":"Hello, World","double":2.34}

It is certainly easy to come up with a more compact protocol and sane
updates to a specification are required in any case but it's easily
extensible and debugged and widely used.

I can't use strings to encode everything: the message size would
increase too much for my embedded platform.

If you want the simplicity and flexibility of JSON but more compact
messages, look at BSON.
 
Eric Sosman

[...]
It seems there are a couple of C implementations of protobuf, maybe
somewhat limited. Aside from this porting problem, do you think protobuf
could be a good solution in my case?

"Yes," or at least "Worth considering." The original design
goal of protocol buffers was to address two of what I think are
your three main concerns: flexibility and information density.
Simplicity got sacrificed (to some extent, anyhow) in favor of
the other two, but if you can find an existing C implementation --
or if you can use a mixture of C and C++ in your application --
the implementation complexity may not be that big a barrier.

Still and all: It's your call, not ours.
 
 
