When to stop - how do I work out when to stop processing


A

Angus

Hello

This is part coding and part protocol design. I am developing a very simple
protocol and working out how to parse out received data into messages. I
can process the stream of bytes just about but I think I need help in
design. My data is in tlv format and I might get 1 or many tlv's to parse.
I know the length of the entire byte stream.

This is what I have so far:
#include <stdio.h>

const char* getType(size_t type)
{
switch(type)
{
case 0: return "uint8_t";
case 1: return "byte array";
default: return "unknown type";
}
}

int main()
{
//message consists of 2 * tlv's - a 10 byte string and a 1 byte unsigned
char
//msg1=null terminated "123456789", msg2=8
unsigned char buffer[] =
{1,10,'1','2','3','4','5','6','7','8','9',0,0,1,8};
int len = sizeof(buffer) / sizeof(buffer[0]);

unsigned char* mybytes = buffer;

//parse buffer received into tlv messages
for(size_t i = 0; i < len; ++i)
{
size_t tlvtype = *mybytes++;
int tlvsize = *mybytes++;
switch(tlvtype)
{
case 0: //unsigned 1 byte
{
char c = *mybytes++;
printf("%s data type: %c, size: %i\n", getType(tlvtype), c,
tlvsize);
break;
}
case 1: //string of bytes
{
char* data = (char*)malloc(tlvsize);
for(int i = 0; i < tlvsize; ++i)
*data++ = *mybytes++;

data -= tlvsize;
printf("%s data type: %s, size: %i\n", getType(tlvtype), data,
tlvsize);
free(data);
break;
}
}
}

return 0;
}

But the for loop is totally wrong. I have no end marker but I don't think I
need one. I want to basically work through the byte stream until all
received bytes are processed. I could just keep a counter of how many bytes
processed? Any suggestions on how to do this sort of thing efficiently?

Angus
 
Ad

Advertisements

B

Ben Bacarisse

Angus said:
This is part coding and part protocol design. I am developing a very simple
protocol and working out how to parse out received data into messages. I
can process the stream of bytes just about but I think I need help in
design. My data is in tlv format and I might get 1 or many tlv's to parse.
I know the length of the entire byte stream.
But the for loop is totally wrong. I have no end marker but I don't think I
need one. I want to basically work through the byte stream until all
received bytes are processed. I could just keep a counter of how many bytes
processed? Any suggestions on how to do this sort of thing
efficiently?

You are thinking bottom up -- throwing more and more low-level
operations together to get more and more complex programs. This method
won't get you to where you want to go. You need to think top-down.
What is ultimate goal? How can that be broken down by inventing
functions that do some useful part of the job? How can these be further
broken down?

It is unlikely that anyone can guess the right organisation for your
program because we don't know the main purpose. The most useful
organisation and structures will depend on whether you are going to
count the data items, or transform them, or store them, or combine
them... Without some sort of end in sight, programming well is
impossible. [I note in passing that one thing that experienced
programmers can do is program to poorly defined goals because they can
abstract out the common features of the most likely goals, and program
to these in way that won't interfere with the design when it is
finalised.]

One other point: don't worry about efficiency. I know it is important,
but I suspect you don't yet have the right intuition to be able to worry
about it this early. It will just get in the way. Worry about writing
the program in a clear and easy-to-understand way.

I notice that you did not take up the suggestion that was made -- to use
memcpy (or strcpy). Why?
 
T

Tim Rentsch

Angus said:
This is part coding and part protocol design. I am developing a very simple
protocol and working out how to parse out received data into messages. I
can process the stream of bytes just about but I think I need help in
design. My data is in tlv format and I might get 1 or many tlv's to parse.
I know the length of the entire byte stream.

This is what I have so far:
#include <stdio.h>

const char* getType(size_t type)
{
switch(type)
{
case 0: return "uint8_t";
case 1: return "byte array";
default: return "unknown type";
}
}

int main()
{
//message consists of 2 * tlv's - a 10 byte string and a 1 byte unsigned char
//msg1=null terminated "123456789", msg2=8
unsigned char buffer[] = {1,10,'1','2','3','4','5','6','7','8','9',0,0,1,8};
int len = sizeof(buffer) / sizeof(buffer[0]);

unsigned char* mybytes = buffer;

//parse buffer received into tlv messages
for(size_t i = 0; i < len; ++i)
{
size_t tlvtype = *mybytes++;
int tlvsize = *mybytes++;
switch(tlvtype)
{
case 0: //unsigned 1 byte
{
char c = *mybytes++;
printf("%s data type: %c, size: %i\n", getType(tlvtype), c, tlvsize);
break;
}
case 1: //string of bytes
{
char* data = (char*)malloc(tlvsize);
for(int i = 0; i < tlvsize; ++i)
*data++ = *mybytes++;

data -= tlvsize;
printf("%s data type: %s, size: %i\n", getType(tlvtype), data, tlvsize);
free(data);
break;
}
}
}

return 0;
}

But the for loop is totally wrong. I have no end marker but I don't think I
need one. I want to basically work through the byte stream until all
received bytes are processed. I could just keep a counter of how many bytes
processed? Any suggestions on how to do this sort of thing efficiently?

This example code suffers from a common problem, in that it tries
to do too many things at once (more specifically, too many things
in a single function). Assuming you have decided to have 'types'
and 'lengths' be one byte each, and that an entire buffer is
available in memory for processing, a good first cut at a
top-level function might go something like this (disclaimer --
not compiled):

void
process_tlv_buffer( unsigned char *buffer, size_t total_bytes ){
size_t done = 0;

while( done + 2 <= total_bytes ){
unsigned type = buffer[done];
unsigned length = buffer[done+1];
process_one_tlv( type, length, buffer + done + 2 );
done += 2 + length;
}

if( done != total_bytes ){
/* error condition -- tlv buffer malformed */
}
}

This solves the high-level problem, and leaves only the
one function 'process_one_tlv()' left to do (and that
function should be straightforward to write).

Notes:

1. Some people may prefer a moving-pointer approach rather than
an indexing approach, expecting the former to be more efficient.
IME compilers are now good enough so the indexing approach will
be just as fast or faster than doing pointer manipulation.
However, if doing direct pointer manipulation is still desired,
it's easy to transform the above function into that, by adding
an extra variable for example.

2. If the 'type' is exactly one eight-bit byte long, we can fold
the type-dispatching into the 'process_one_tlv()' call by having
an array of function pointers, like this:

process_tlvs[type]( type, length, buffer + done + 2 )

with 'process_tlvs' being an array of pointer-to-function.
Calling through a function pointer array like this eliminates
the need for a 'switch()' type dispatch (but we still pass
the 'type' in case some value types are similar and might
benefit from having their processing functions folded into
a single function with tests on 'type' for differentiation).

3. If the 'type' values are sparse rather than dense, the array
approach can still be used by writing a function that converts a
'type' value to a dense "type index", with an extra "type index"
value used to mean "this type not known".

4. It should be obvious how to make the 'type' field or
'length' field have a width greater than one, if that is
desired.

5. A very different function is required if we cannot
assume that an entire TLV buffer is available in memory
for processing. This means the decision about whether
the whole TVL buffer will be available is an important
decision and should be made before trying to write code
for the lower-level processing.

6. To the last question about "how to do this sort of thing
efficiently", I repeat the standard mantra -- first make it work
in a clear and simple way, and do performance improvements second
(and then only if performance is a problem, measuring first to
see where the real problem is). The general rule is that one
should spend as little time as possible on low-level performance
issues, and only in subsequent phases after the program is
completely working. The simple function definition given
above will run plenty fast enough in 99+% of applications
that would use this kind of processing.


(Style suggestion: keep source lines under 70 characters
so that they don't get line-wrapped by news-processing
programs.)
 
J

Joachim Schmitz

Richard said:
Angus said:
Hello

This is part coding and part protocol design. I am developing a
very simple protocol and working out how to parse out received data
into messages. I can process the stream of bytes just about but I
think I need help in design. My data is in tlv format and I might
get 1 or many tlv's to parse. I know the length of the entire byte
stream. This is what I have so far:
#include <stdio.h>

const char* getType(size_t type)
{
switch(type)
{
case 0: return "uint8_t";
case 1: return "byte array";
default: return "unknown type";
}
}

int main()
{
//message consists of 2 * tlv's - a 10 byte string and a 1 byte
unsigned char
//msg1=null terminated "123456789", msg2=8
unsigned char buffer[] =
{1,10,'1','2','3','4','5','6','7','8','9',0,0,1,8};
int len = sizeof(buffer) / sizeof(buffer[0]);

unsigned char* mybytes = buffer;

//parse buffer received into tlv messages
for(size_t i = 0; i < len; ++i)
{
size_t tlvtype = *mybytes++;
int tlvsize = *mybytes++;
switch(tlvtype)
{
case 0: //unsigned 1 byte
{
char c = *mybytes++;
printf("%s data type: %c, size: %i\n", getType(tlvtype),
c, tlvsize);
break;
}
case 1: //string of bytes
{
char* data = (char*)malloc(tlvsize);

Why are you casting?

Probably because he forgot to #include <stdlib.h> and therefor got a warning
from the compiler.

Bye, Jojo
 
D

Denis McMahon

But the for loop is totally wrong. I have no end marker but I don't think I
need one. I want to basically work through the byte stream until all
received bytes are processed. I could just keep a counter of how many bytes
processed? Any suggestions on how to do this sort of thing efficiently?

Assuming you know the total length, and tlv means type-length-value

Presumably:

type is 1 byte
length is 1 byte
value is length bytes

so for each tlv you process, you read 2 + length bytes?

so:

remainder = tlv_data_len;

while (remainder > 0) {
read tlv_type & tlv_len
switch (type) {
case blah: {do something; break}
}
remainder = remainder - (2 + tlvlen);
}

Note also that %c will display the character value, the character value
of 8 isn't a printable character in most character sets. Maybe changing
the '%c' for either '%#04x' or '0x%02x' would be better for a single
byte value?

Of course, if your code was working, it might confusing you because
you're expecting to see '8' and what you get is a non printing character?

Why do you call a function to tell you the data type when you're already
inside a switch that's based on the data type?

I did the following:

/* ========== code starts ========== */

#include <stdio.h>
#include <stdlib.h>

int main()
{
/* message consists of 2 * tlv's - a 10 byte string
and a 1 byte unsigned char */
/* msg1=null terminated "123456789", msg2=8 */
unsigned char buffer[] =
{1,10,'1','2','3','4','5','6','7','8','9',0,0,1,8};
int len = sizeof(buffer) / sizeof(buffer[0]);
int remainder = len;
unsigned char* mybytes = buffer;
/* parse buffer received into tlv messages */
while (remainder > 0)
{
size_t tlvtype = *mybytes++;
int tlvsize = *mybytes++;
switch(tlvtype)
{
case 0: /* unsigned 1 byte */
{
char c = *mybytes++;
printf("uint8_t, value: 0x%02x, size: %i\n", c, tlvsize);
break;
}
case 1: /* string or array of bytes */
{
char* data = (char*)malloc(tlvsize);
int i;
for(i = 0; i < tlvsize; ++i) *data++ = *mybytes++;
data -= tlvsize;
printf("byte array, value: '%s', size: %i\n", data, tlvsize);
free(data);
break;
}
}
remainder = remainder - (2 + tlvsize);
}
return 0;
}

/* =========== code ends =========== */

I bet it's not perfect, but at least:

$ gcc -Wall -pedantic -Wextra test23.c -o t23
$ ./t23
byte array, value: '123456789', size: 10
uint8_t, value: 0x08, size: 1
$

Rgds

Denis McMahon
 
Ad

Advertisements

B

Ben Bacarisse

I did the following:

/* ========== code starts ========== */

#include <stdio.h>
#include <stdlib.h>

int main()
{
/* message consists of 2 * tlv's - a 10 byte string
and a 1 byte unsigned char */
/* msg1=null terminated "123456789", msg2=8 */
unsigned char buffer[] =
{1,10,'1','2','3','4','5','6','7','8','9',0,0,1,8};
int len = sizeof(buffer) / sizeof(buffer[0]);
int remainder = len;
unsigned char* mybytes = buffer;
/* parse buffer received into tlv messages */
while (remainder > 0)
{
size_t tlvtype = *mybytes++;
int tlvsize = *mybytes++;
switch(tlvtype)
{
case 0: /* unsigned 1 byte */
{
char c = *mybytes++;
printf("uint8_t, value: 0x%02x, size: %i\n", c, tlvsize);
break;
}
case 1: /* string or array of bytes */
{
char* data = (char*)malloc(tlvsize);
int i;
for(i = 0; i < tlvsize; ++i) *data++ = *mybytes++;

What does everyone have against memcpy!
data -= tlvsize;
printf("byte array, value: '%s', size: %i\n", data, tlvsize);
free(data);
break;
}
}
remainder = remainder - (2 + tlvsize);

I accept that this is a sketch, but it's probably worth pointing out
that this way you don't detect errors. If there is not enough data,
remainder will go negative, but only after you've tried to access
non-existent bytes.

There are lots of ways to get round this. One could loop:

while (remainder > 2 && remainder >= 2 + mybytes[1]) { ... }

so that after the loop remainder > 0 indicates an error.

Another issue is that remainder - (2 + tlvsize) is of type size_t and
can't ever be negative. If there is not enough data, this expression
will evaluate to a large positive number. On most systems, setting an
int to this value does what one would like, but it might not do so.
Since remainder and i are both int and the size comes from a char,
size_t would seem to the wrong choice here.
}
return 0;
}

<snip>
 
Ad

Advertisements


Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top