When to stop - how do I work out when to stop processing

Discussion in 'C Programming' started by Angus, Jul 18, 2010.

  1. Angus

    Angus Guest

    Hello

    This is part coding and part protocol design. I am developing a very simple
    protocol and working out how to parse out received data into messages. I
    can process the stream of bytes just about but I think I need help in
    design. My data is in tlv format and I might get 1 or many tlv's to parse.
    I know the length of the entire byte stream.

    This is what I have so far:
    #include <stdio.h>

    const char* getType(size_t type)
    {
    switch(type)
    {
    case 0: return "uint8_t";
    case 1: return "byte array";
    default: return "unknown type";
    }
    }

    int main()
    {
    //message consists of 2 * tlv's - a 10 byte string and a 1 byte unsigned
    char
    //msg1=null terminated "123456789", msg2=8
    unsigned char buffer[] =
    {1,10,'1','2','3','4','5','6','7','8','9',0,0,1,8};
    int len = sizeof(buffer) / sizeof(buffer[0]);

    unsigned char* mybytes = buffer;

    //parse buffer received into tlv messages
    for(size_t i = 0; i < len; ++i)
    {
    size_t tlvtype = *mybytes++;
    int tlvsize = *mybytes++;
    switch(tlvtype)
    {
    case 0: //unsigned 1 byte
    {
    char c = *mybytes++;
    printf("%s data type: %c, size: %i\n", getType(tlvtype), c,
    tlvsize);
    break;
    }
    case 1: //string of bytes
    {
    char* data = (char*)malloc(tlvsize);
    for(int i = 0; i < tlvsize; ++i)
    *data++ = *mybytes++;

    data -= tlvsize;
    printf("%s data type: %s, size: %i\n", getType(tlvtype), data,
    tlvsize);
    free(data);
    break;
    }
    }
    }

    return 0;
    }

    But the for loop is totally wrong. I have no end marker but I don't think I
    need one. I want to basically work through the byte stream until all
    received bytes are processed. I could just keep a counter of how many bytes
    processed? Any suggestions on how to do this sort of thing efficiently?

    Angus
     
    Angus, Jul 18, 2010
    #1
    1. Advertising

  2. "Angus" <> writes:

    > This is part coding and part protocol design. I am developing a very simple
    > protocol and working out how to parse out received data into messages. I
    > can process the stream of bytes just about but I think I need help in
    > design. My data is in tlv format and I might get 1 or many tlv's to parse.
    > I know the length of the entire byte stream.

    <snip code>
    > But the for loop is totally wrong. I have no end marker but I don't think I
    > need one. I want to basically work through the byte stream until all
    > received bytes are processed. I could just keep a counter of how many bytes
    > processed? Any suggestions on how to do this sort of thing
    > efficiently?


    You are thinking bottom up -- throwing more and more low-level
    operations together to get more and more complex programs. This method
    won't get you to where you want to go. You need to think top-down.
    What is ultimate goal? How can that be broken down by inventing
    functions that do some useful part of the job? How can these be further
    broken down?

    It is unlikely that anyone can guess the right organisation for your
    program because we don't know the main purpose. The most useful
    organisation and structures will depend on whether you are going to
    count the data items, or transform them, or store them, or combine
    them... Without some sort of end in sight, programming well is
    impossible. [I note in passing that one thing that experienced
    programmers can do is program to poorly defined goals because they can
    abstract out the common features of the most likely goals, and program
    to these in way that won't interfere with the design when it is
    finalised.]

    One other point: don't worry about efficiency. I know it is important,
    but I suspect you don't yet have the right intuition to be able to worry
    about it this early. It will just get in the way. Worry about writing
    the program in a clear and easy-to-understand way.

    I notice that you did not take up the suggestion that was made -- to use
    memcpy (or strcpy). Why?

    --
    Ben.
     
    Ben Bacarisse, Jul 18, 2010
    #2
    1. Advertising

  3. Angus

    Tim Rentsch Guest

    "Angus" <> writes:

    > This is part coding and part protocol design. I am developing a very simple
    > protocol and working out how to parse out received data into messages. I
    > can process the stream of bytes just about but I think I need help in
    > design. My data is in tlv format and I might get 1 or many tlv's to parse.
    > I know the length of the entire byte stream.
    >
    > This is what I have so far:
    > #include <stdio.h>
    >
    > const char* getType(size_t type)
    > {
    > switch(type)
    > {
    > case 0: return "uint8_t";
    > case 1: return "byte array";
    > default: return "unknown type";
    > }
    > }
    >
    > int main()
    > {
    > //message consists of 2 * tlv's - a 10 byte string and a 1 byte unsigned char
    > //msg1=null terminated "123456789", msg2=8
    > unsigned char buffer[] = {1,10,'1','2','3','4','5','6','7','8','9',0,0,1,8};
    > int len = sizeof(buffer) / sizeof(buffer[0]);
    >
    > unsigned char* mybytes = buffer;
    >
    > //parse buffer received into tlv messages
    > for(size_t i = 0; i < len; ++i)
    > {
    > size_t tlvtype = *mybytes++;
    > int tlvsize = *mybytes++;
    > switch(tlvtype)
    > {
    > case 0: //unsigned 1 byte
    > {
    > char c = *mybytes++;
    > printf("%s data type: %c, size: %i\n", getType(tlvtype), c, tlvsize);
    > break;
    > }
    > case 1: //string of bytes
    > {
    > char* data = (char*)malloc(tlvsize);
    > for(int i = 0; i < tlvsize; ++i)
    > *data++ = *mybytes++;
    >
    > data -= tlvsize;
    > printf("%s data type: %s, size: %i\n", getType(tlvtype), data, tlvsize);
    > free(data);
    > break;
    > }
    > }
    > }
    >
    > return 0;
    > }
    >
    > But the for loop is totally wrong. I have no end marker but I don't think I
    > need one. I want to basically work through the byte stream until all
    > received bytes are processed. I could just keep a counter of how many bytes
    > processed? Any suggestions on how to do this sort of thing efficiently?


    This example code suffers from a common problem, in that it tries
    to do too many things at once (more specifically, too many things
    in a single function). Assuming you have decided to have 'types'
    and 'lengths' be one byte each, and that an entire buffer is
    available in memory for processing, a good first cut at a
    top-level function might go something like this (disclaimer --
    not compiled):

    void
    process_tlv_buffer( unsigned char *buffer, size_t total_bytes ){
    size_t done = 0;

    while( done + 2 <= total_bytes ){
    unsigned type = buffer[done];
    unsigned length = buffer[done+1];
    process_one_tlv( type, length, buffer + done + 2 );
    done += 2 + length;
    }

    if( done != total_bytes ){
    /* error condition -- tlv buffer malformed */
    }
    }

    This solves the high-level problem, and leaves only the
    one function 'process_one_tlv()' left to do (and that
    function should be straightforward to write).

    Notes:

    1. Some people may prefer a moving-pointer approach rather than
    an indexing approach, expecting the former to be more efficient.
    IME compilers are now good enough so the indexing approach will
    be just as fast or faster than doing pointer manipulation.
    However, if doing direct pointer manipulation is still desired,
    it's easy to transform the above function into that, by adding
    an extra variable for example.

    2. If the 'type' is exactly one eight-bit byte long, we can fold
    the type-dispatching into the 'process_one_tlv()' call by having
    an array of function pointers, like this:

    process_tlvs[type]( type, length, buffer + done + 2 )

    with 'process_tlvs' being an array of pointer-to-function.
    Calling through a function pointer array like this eliminates
    the need for a 'switch()' type dispatch (but we still pass
    the 'type' in case some value types are similar and might
    benefit from having their processing functions folded into
    a single function with tests on 'type' for differentiation).

    3. If the 'type' values are sparse rather than dense, the array
    approach can still be used by writing a function that converts a
    'type' value to a dense "type index", with an extra "type index"
    value used to mean "this type not known".

    4. It should be obvious how to make the 'type' field or
    'length' field have a width greater than one, if that is
    desired.

    5. A very different function is required if we cannot
    assume that an entire TLV buffer is available in memory
    for processing. This means the decision about whether
    the whole TVL buffer will be available is an important
    decision and should be made before trying to write code
    for the lower-level processing.

    6. To the last question about "how to do this sort of thing
    efficiently", I repeat the standard mantra -- first make it work
    in a clear and simple way, and do performance improvements second
    (and then only if performance is a problem, measuring first to
    see where the real problem is). The general rule is that one
    should spend as little time as possible on low-level performance
    issues, and only in subsequent phases after the program is
    completely working. The simple function definition given
    above will run plenty fast enough in 99+% of applications
    that would use this kind of processing.


    (Style suggestion: keep source lines under 70 characters
    so that they don't get line-wrapped by news-processing
    programs.)
     
    Tim Rentsch, Jul 18, 2010
    #3
  4. Richard Heathfield wrote:
    > Angus wrote:
    >> Hello
    >>
    >> This is part coding and part protocol design. I am developing a
    >> very simple protocol and working out how to parse out received data
    >> into messages. I can process the stream of bytes just about but I
    >> think I need help in design. My data is in tlv format and I might
    >> get 1 or many tlv's to parse. I know the length of the entire byte
    >> stream. This is what I have so far:
    >> #include <stdio.h>
    >>
    >> const char* getType(size_t type)
    >> {
    >> switch(type)
    >> {
    >> case 0: return "uint8_t";
    >> case 1: return "byte array";
    >> default: return "unknown type";
    >> }
    >> }
    >>
    >> int main()
    >> {
    >> //message consists of 2 * tlv's - a 10 byte string and a 1 byte
    >> unsigned char
    >> //msg1=null terminated "123456789", msg2=8
    >> unsigned char buffer[] =
    >> {1,10,'1','2','3','4','5','6','7','8','9',0,0,1,8};
    >> int len = sizeof(buffer) / sizeof(buffer[0]);
    >>
    >> unsigned char* mybytes = buffer;
    >>
    >> //parse buffer received into tlv messages
    >> for(size_t i = 0; i < len; ++i)
    >> {
    >> size_t tlvtype = *mybytes++;
    >> int tlvsize = *mybytes++;
    >> switch(tlvtype)
    >> {
    >> case 0: //unsigned 1 byte
    >> {
    >> char c = *mybytes++;
    >> printf("%s data type: %c, size: %i\n", getType(tlvtype),
    >> c, tlvsize);
    >> break;
    >> }
    >> case 1: //string of bytes
    >> {
    >> char* data = (char*)malloc(tlvsize);

    >
    > Why are you casting?


    Probably because he forgot to #include <stdlib.h> and therefor got a warning
    from the compiler.

    Bye, Jojo
     
    Joachim Schmitz, Jul 18, 2010
    #4
  5. On 18/07/10 11:26, Angus wrote:

    > But the for loop is totally wrong. I have no end marker but I don't think I
    > need one. I want to basically work through the byte stream until all
    > received bytes are processed. I could just keep a counter of how many bytes
    > processed? Any suggestions on how to do this sort of thing efficiently?


    Assuming you know the total length, and tlv means type-length-value

    Presumably:

    type is 1 byte
    length is 1 byte
    value is length bytes

    so for each tlv you process, you read 2 + length bytes?

    so:

    remainder = tlv_data_len;

    while (remainder > 0) {
    read tlv_type & tlv_len
    switch (type) {
    case blah: {do something; break}
    }
    remainder = remainder - (2 + tlvlen);
    }

    Note also that %c will display the character value, the character value
    of 8 isn't a printable character in most character sets. Maybe changing
    the '%c' for either '%#04x' or '0x%02x' would be better for a single
    byte value?

    Of course, if your code was working, it might confusing you because
    you're expecting to see '8' and what you get is a non printing character?

    Why do you call a function to tell you the data type when you're already
    inside a switch that's based on the data type?

    I did the following:

    /* ========== code starts ========== */

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
    /* message consists of 2 * tlv's - a 10 byte string
    and a 1 byte unsigned char */
    /* msg1=null terminated "123456789", msg2=8 */
    unsigned char buffer[] =
    {1,10,'1','2','3','4','5','6','7','8','9',0,0,1,8};
    int len = sizeof(buffer) / sizeof(buffer[0]);
    int remainder = len;
    unsigned char* mybytes = buffer;
    /* parse buffer received into tlv messages */
    while (remainder > 0)
    {
    size_t tlvtype = *mybytes++;
    int tlvsize = *mybytes++;
    switch(tlvtype)
    {
    case 0: /* unsigned 1 byte */
    {
    char c = *mybytes++;
    printf("uint8_t, value: 0x%02x, size: %i\n", c, tlvsize);
    break;
    }
    case 1: /* string or array of bytes */
    {
    char* data = (char*)malloc(tlvsize);
    int i;
    for(i = 0; i < tlvsize; ++i) *data++ = *mybytes++;
    data -= tlvsize;
    printf("byte array, value: '%s', size: %i\n", data, tlvsize);
    free(data);
    break;
    }
    }
    remainder = remainder - (2 + tlvsize);
    }
    return 0;
    }

    /* =========== code ends =========== */

    I bet it's not perfect, but at least:

    $ gcc -Wall -pedantic -Wextra test23.c -o t23
    $ ./t23
    byte array, value: '123456789', size: 10
    uint8_t, value: 0x08, size: 1
    $

    Rgds

    Denis McMahon
     
    Denis McMahon, Jul 18, 2010
    #5
  6. Denis McMahon <> writes:

    > On 18/07/10 11:26, Angus wrote:
    >
    >> But the for loop is totally wrong. I have no end marker but I don't think I
    >> need one. I want to basically work through the byte stream until all
    >> received bytes are processed. I could just keep a counter of how many bytes
    >> processed? Any suggestions on how to do this sort of thing efficiently?

    <snip>
    > I did the following:
    >
    > /* ========== code starts ========== */
    >
    > #include <stdio.h>
    > #include <stdlib.h>
    >
    > int main()
    > {
    > /* message consists of 2 * tlv's - a 10 byte string
    > and a 1 byte unsigned char */
    > /* msg1=null terminated "123456789", msg2=8 */
    > unsigned char buffer[] =
    > {1,10,'1','2','3','4','5','6','7','8','9',0,0,1,8};
    > int len = sizeof(buffer) / sizeof(buffer[0]);
    > int remainder = len;
    > unsigned char* mybytes = buffer;
    > /* parse buffer received into tlv messages */
    > while (remainder > 0)
    > {
    > size_t tlvtype = *mybytes++;
    > int tlvsize = *mybytes++;
    > switch(tlvtype)
    > {
    > case 0: /* unsigned 1 byte */
    > {
    > char c = *mybytes++;
    > printf("uint8_t, value: 0x%02x, size: %i\n", c, tlvsize);
    > break;
    > }
    > case 1: /* string or array of bytes */
    > {
    > char* data = (char*)malloc(tlvsize);
    > int i;
    > for(i = 0; i < tlvsize; ++i) *data++ = *mybytes++;


    What does everyone have against memcpy!

    > data -= tlvsize;
    > printf("byte array, value: '%s', size: %i\n", data, tlvsize);
    > free(data);
    > break;
    > }
    > }
    > remainder = remainder - (2 + tlvsize);


    I accept that this is a sketch, but it's probably worth pointing out
    that this way you don't detect errors. If there is not enough data,
    remainder will go negative, but only after you've tried to access
    non-existent bytes.

    There are lots of ways to get round this. One could loop:

    while (remainder > 2 && remainder >= 2 + mybytes[1]) { ... }

    so that after the loop remainder > 0 indicates an error.

    Another issue is that remainder - (2 + tlvsize) is of type size_t and
    can't ever be negative. If there is not enough data, this expression
    will evaluate to a large positive number. On most systems, setting an
    int to this value does what one would like, but it might not do so.
    Since remainder and i are both int and the size comes from a char,
    size_t would seem to the wrong choice here.

    > }
    > return 0;
    > }


    <snip>
    --
    Ben.
     
    Ben Bacarisse, Jul 18, 2010
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Henry
    Replies:
    5
    Views:
    501
    Jim Cheshire [MSFT]
    Jul 26, 2004
  2. =?Utf-8?B?cHJha2FzaHNoYQ==?=

    Reg: Stop processing when esc key is pressed

    =?Utf-8?B?cHJha2FzaHNoYQ==?=, Nov 30, 2004, in forum: ASP .Net
    Replies:
    3
    Views:
    718
    Eliyahu Goldin
    Nov 30, 2004
  3. =?Utf-8?B?T2xlZw==?=

    how to stop page processing on server

    =?Utf-8?B?T2xlZw==?=, Nov 18, 2005, in forum: ASP .Net
    Replies:
    0
    Views:
    1,360
    =?Utf-8?B?T2xlZw==?=
    Nov 18, 2005
  4. GS
    Replies:
    8
    Views:
    696
    Karl Seguin [MVP]
    Jan 8, 2006
  5. Hubert Hung-Hsien Chang
    Replies:
    2
    Views:
    533
    Michael Foord
    Sep 17, 2004
Loading...

Share This Page