Calculate length of byte string with embedded nulls

Discussion in 'C Programming' started by Angus, Jan 4, 2007.

  1. Angus

    Angus Guest

    Hello

    I have a stream of bytes - unsigned char*. But the 'string' may contain
    embedded nulls. So not like a traditional c string terminated with a null.

    I need to calculate the length of these arrays but can't use strlen because
    it just stops counting at the first null it finds. so how to do it?

    Angus
     
    Angus, Jan 4, 2007
    #1

  2. Angus

    jacob navia Guest

    Angus wrote:
    > Hello
    >
    > I have a stream of bytes - unsigned char*. But the 'string' may contain
    > embedded nulls. So not like a traditional c string terminated with a null.
    >
    > I need to calculate the length of these arrays but can't use strlen because
    > it just stops counting at the first null it finds. so how to do it?
    >
    > Angus
    >
    >


    There is no way to do it since you have no algorithm to determine
    its length.
     
    jacob navia, Jan 4, 2007
    #2

  3. Angus wrote:
    > Hello
    >
    > I have a stream of bytes - unsigned char*. But the 'string' may contain
    > embedded nulls. So not like a traditional c string terminated with a null.
    >
    > I need to calculate the length of these arrays but can't use strlen because
    > it just stops counting at the first null it finds. so how to do it?


    If this stream is of a specific format and has the length embedded in
    it, you can extract it. How to do this depends on the format.
    Otherwise, if the length is not kept elsewhere, you need to keep track
    of it yourself.
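
To illustrate extracting an embedded length, here is a minimal sketch. The wire format (a 2-byte big-endian length followed by the payload) and the name `payload_length` are assumptions made for this example; the thread does not say what format Angus's stream uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical format (an assumption, not from the thread):
   bytes 0..1 hold the payload length, big-endian; the payload follows. */
size_t payload_length(const unsigned char *frame, size_t frame_size)
{
    size_t len;

    assert(frame != NULL);
    if (frame_size < 2)
        return (size_t)-1;              /* header incomplete */

    len = ((size_t)frame[0] << 8) | frame[1];
    if (len > frame_size - 2)
        return (size_t)-1;              /* payload truncated */

    return len;                         /* embedded nulls do not matter */
}
```

The payload may contain any byte values, including zero, because the length comes from the header rather than from scanning the data.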
     
    Harald van Dĳk, Jan 4, 2007
    #3
  4. Angus said:

    > Hello
    >
    > I have a stream of bytes - unsigned char*. But the 'string' may contain
    > embedded nulls. So not like a traditional c string terminated with a
    > null.
    >
    > I need to calculate the length of these arrays but can't use strlen
    > because
    > it just stops counting at the first null it finds. so how to do it?


    Well, now you know what null is for. :)

    Whenever you read data, you need to establish a protocol for stopping. If
    you're reading a text file, typically you stop (or at least pause for
    thought) when you hit a newline. If you're reading an email feed, you stop
    when you get ".\r\n". If you're copying a string, you stop at the null
    terminator. All of these are termination protocols.

    Clearly, you need a terminating protocol, too. If no particular value ('\0',
    '\n') or combination of values (".\r\n") suggests itself as a sentinel,
    then you have little option but to insist that your data feed is
    accompanied by relevant information regarding its length.

    --
    Richard Heathfield
    "Usenet is a strange place" - dmr 29/7/1999
    http://www.cpax.org.uk
    email: rjh at the above domain, - www.
     
    Richard Heathfield, Jan 4, 2007
    #4
  5. Angus

    santosh Guest

    Angus wrote:
    > Hello
    >
    > I have a stream of bytes - unsigned char*. But the 'string' may contain
    > embedded nulls. So not like a traditional c string terminated with a null.
    >
    > I need to calculate the length of these arrays but can't use strlen because
    > it just stops counting at the first null it finds. so how to do it?


    Without a condition for termination, there's no way to determine the
    end of the stream. As the programmer of the application, you should
    know this condition. If the array is passed in from a third-party
    library, its authors ought to have documented it. If neither holds,
    then your code is broken.
     
    santosh, Jan 4, 2007
    #5
  6. Angus wrote:
    > Hello
    >
    > I have a stream of bytes - unsigned char*. But the 'string' may contain
    > embedded nulls. So not like a traditional c string terminated with a null.
    >
    > I need to calculate the length of these arrays but can't use strlen because
    > it just stops counting at the first null it finds. so how to do it?


    Just keep track of the number of characters you store in the buffer and
    pass that value along with the buffer.
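
One way to keep the count next to the data is a small struct. This is only a sketch: `struct bytebuf`, `bytebuf_append` and the fixed capacity are names and choices invented for illustration, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BYTEBUF_CAP 256     /* arbitrary capacity for this sketch */

struct bytebuf {
    unsigned char data[BYTEBUF_CAP];
    size_t len;             /* number of bytes stored so far */
};

/* Append n bytes from src. Returns 0 on success, -1 if they do not fit. */
int bytebuf_append(struct bytebuf *b, const void *src, size_t n)
{
    assert(b != NULL && src != NULL);
    if (n > BYTEBUF_CAP - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);   /* memcpy copies zero bytes too */
    b->len += n;
    return 0;
}
```

Code that receives a `struct bytebuf` reads `len` instead of calling strlen, so zero bytes inside `data` are counted like any other byte.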


    August
     
    August Karlstrom, Jan 4, 2007
    #6
  7. "Angus" <> wrote in message
    news:enjcu8$qj9$1$...
    >
    > I have a stream of bytes - unsigned char*. But the 'string' may contain
    > embedded nulls. So not like a traditional c string terminated with a
    > null.
    >
    > I need to calculate the length of these arrays but can't use strlen
    > because
    > it just stops counting at the first null it finds. so how to do it?


    As other posters have indicated, the assumption of \0 termination is "baked
    into" much of the 'C' programming language.

    I believe this type of string (an array of characters where each character
    may contain any value without restriction) is called a "binary string" in
    other languages.

    The standard 'C' library functions won't work on this type of string.

    You could keep track of the length separately from the string.

    A second approach is to use an encoding for the string to represent the data
    without using \0. The most obvious way to do this is to encode the bytes as
    hexadecimal characters, i.e. \0 would be represented as '0' followed by
    another '0'. That keeps everything simple, as the length of this kind of
    string is double the length of the data. And all the 'C' library functions
    will work.
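
A minimal sketch of the hexadecimal encoding described above. The function name `hex_encode` and the choice of lowercase digits are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Encode n bytes as 2*n hexadecimal digits followed by '\0'.
   The caller must provide room for 2*n + 1 chars in out. */
void hex_encode(const unsigned char *in, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        out[2 * i]     = digits[(in[i] >> 4) & 0xF];   /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0xF];          /* low nibble  */
    }
    out[2 * n] = '\0';      /* the only zero byte in the encoded text */
}
```

The result is an ordinary C string, so strlen on it returns twice the number of data bytes. The cost is doubled storage plus a decoding step on the receiving side.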
     
    David T. Ashley, Jan 4, 2007
    #7
  8. David T. Ashley said:

    <snip>
    >
    > I believe this type of string (an array of characters where each character
    > may contain any value without restriction) is called a "binary string" in
    > other languages.
    >
    > The standard 'C' library functions won't work on this type of string.


    memcpy, memset, memmove, memchr, memcmp, fread, fwrite, qsort, bsearch are
    all counter-examples.
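
As a small illustration of that point, the sketch below uses memchr on a buffer with embedded zero bytes. The helper name `count_zero_bytes` is made up for this example; memchr itself is standard and takes an explicit size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the zero bytes in a buffer whose length is known.
   memchr takes an explicit size, so it does not stop at '\0'. */
size_t count_zero_bytes(const unsigned char *buf, size_t n)
{
    size_t count = 0;
    const unsigned char *p = buf;
    const unsigned char *end = buf + n;

    assert(buf != NULL);
    while (p < end && (p = memchr(p, 0, (size_t)(end - p))) != NULL) {
        count++;
        p++;                /* resume the search after this zero byte */
    }
    return count;
}
```

The same pattern applies to memcmp and memcpy: each works on a byte count supplied by the caller, which is why the length still has to come from somewhere.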

    > You could keep track of the length separately from the string.


    That is necessary if no sentinel is given.

    > A second approach is to use an encoding for the string to represent the
    > data
    > without using \0. The most obvious way to do this is to encode the bytes
    > as hexadecimal characters, i.e. \0 would be represented as '0' followed by
    > another '0'. That keeps everything simple, as the length of this kind of
    > string is double the length of the data. And all the 'C' library
    > functions will work.


    Base-64 encoding would work, too, and wouldn't be quite so noisy. But it's
    better by far to keep track of the size.

     
    Richard Heathfield, Jan 4, 2007
    #8
  9. >>>>> "DTA" == David T Ashley <> writes:

    DTA> As other posters have indicated, the assumption of \0
    DTA> termination is "baked into" much of the 'C' programming
    DTA> language.

    Much of the standard library, you mean.

    DTA> The standard 'C' library functions won't work on this type of
    DTA> string.

    But it's a simple matter of programming to implement your own
    functions to do this, or to use a library someone else has written.

    DTA> You could keep track of the length separately from the
    DTA> string.

    This is pretty much exactly what you have to do, unless you use
    another marker to indicate end-of-string.

    Charlton




    --
    Charlton Wilbur
     
    Charlton Wilbur, Jan 4, 2007
    #9
  10. Angus

    pete Guest

    Angus wrote:
    >
    > Hello
    >
    > I have a stream of bytes - unsigned char*.


    If it's a text stream,
    then I suspect that you may be wanting to calculate
    the length of the "line" rather than the length of a string.
    Lines of text are terminated by a newline character ('\n').
    The way to find the length of the line is to do it
    while the line is being read.

    > But the 'string' may contain embedded nulls.
    > So not like a traditional c string terminated with a null.
    >
    > I need to calculate the length of these arrays
    > but can't use strlen because
    > it just stops counting at the first null it finds. so how to do it?



    --
    pete
     
    pete, Jan 4, 2007
    #10
  11. Angus

    bert Guest

    Angus wrote:
    > Hello
    >
    > I have a stream of bytes - unsigned char*. But the 'string' may contain
    > embedded nulls. So not like a traditional c string terminated with a null.
    >
    > I need to calculate the length of these arrays but can't use strlen because
    > it just stops counting at the first null it finds. so how to do it?


    As other posters have said, you have to know what
    bytes actually represent the end of the array, then
    write your own code to search the array to locate them.

    The only time that I encountered such an array,
    its rule was that a single embedded null was part
    of it, but two adjacent nulls were its terminator.
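
A sketch of a length function for that double-null rule. The name `dnul_length` and the `max` parameter (the size of the underlying buffer, used as a safety bound) are assumptions added for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Length of data terminated by two adjacent zero bytes, not counting
   the terminator. max is the size of the buffer being scanned.
   Returns (size_t)-1 if no terminator is found within max bytes. */
size_t dnul_length(const unsigned char *buf, size_t max)
{
    size_t i;

    assert(buf != NULL);
    for (i = 0; i + 1 < max; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0)
            return i;       /* index of the first zero of the pair */
    }
    return (size_t)-1;
}
```

Because the function stops at the first adjacent pair, data that itself ends in a zero byte, or contains two zero bytes in a row, cannot be represented under this rule.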
    --
     
    bert, Jan 4, 2007
    #11
  12. Charlton Wilbur <> writes:
    >>>>>> "DTA" == David T Ashley <> writes:

    >
    > DTA> As other posters have indicated, the assumption of \0
    > DTA> termination is "baked into" much of the 'C' programming
    > DTA> language.
    >
    > Much of the standard library, you mean.


    And the treatment of string literals.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
     
    Keith Thompson, Jan 4, 2007
    #12
  13. bert said:

    >
    > Angus wrote:
    >> Hello
    >>
    >> I have a stream of bytes - unsigned char*. But the 'string' may contain
    >> embedded nulls. So not like a traditional c string terminated with a
    >> null.
    >>
    >> I need to calculate the length of these arrays but can't use strlen
    >> because
    >> it just stops counting at the first null it finds. so how to do it?

    >
    > As other posters have said, you have to know what
    > bytes actually represent the end of the array, then
    > write your own code to search the array to locate them.
    >
    > The only time that I encountered such an array,
    > its rule was that a single embedded null was part
    > of it, but two adjacent nulls were its terminator.


    The problem with such a scheme is that it renders impossible the in-band
    representation of two consecutive null bytes. One way around this would be
    to use the null character as an escape character, with a subsequent '0'
    character representing a null byte, but a subsequent null character
    representing the end of the data.

    Of course, if you're going to do that, you might as well use some other
    character to represent the escape character (e.g. '\\'), with '\\' '\\'
    representing backslash, '\\' '0' representing the null byte, and a genuine
    null byte representing the end of the data. This does, however, render it
    necessary to translate the escape sequences.
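
The translation step for the backslash scheme could look roughly like this. The function name `unescape`, the output capacity parameter and the error convention are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Decode: "\\\\" -> backslash, "\\0" -> zero byte, a real '\0' ends input.
   Writes at most cap bytes to out. Returns the decoded length, or
   (size_t)-1 on an unknown escape, a dangling backslash, or overflow. */
size_t unescape(const char *in, unsigned char *out, size_t cap)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        unsigned char c;

        if (*in == '\\') {
            in++;
            if (*in == '0')
                c = 0;
            else if (*in == '\\')
                c = '\\';
            else
                return (size_t)-1;  /* bad escape or input ended mid-escape */
        } else {
            c = (unsigned char)*in;
        }
        if (n == cap)
            return (size_t)-1;      /* output buffer full */
        out[n++] = c;
        in++;
    }
    return n;
}
```

Note that the decoded data again contains zero bytes, so the function has to return the length; the need to carry a length does not go away, it only moves to the decoded side.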

    All in all, it is a better scheme by far simply to provide the length
    information in advance of, or in parallel with, the data, thus rendering
    translation unnecessary.

     
    Richard Heathfield, Jan 4, 2007
    #13
  14. Angus

    CBFalconer Guest

    Richard Heathfield wrote:
    > Angus said:
    >
    >> I have a stream of bytes - unsigned char*. But the 'string' may
    >> contain embedded nulls. So not like a traditional c string
    >> terminated with a null.
    >>
    >> I need to calculate the length of these arrays but can't use
    >> strlen because it just stops counting at the first null it finds.
    >> so how to do it?

    >
    > Well, now you know what null is for. :)
    >
    > Whenever you read data, you need to establish a protocol for
    > stopping. If you're reading a text file, typically you stop (or at
    > least pause for thought) when you hit a newline. If you're reading
    > an email feed, you stop when you get ".\r\n". If you're copying a
    > string, you stop at the null terminator. All of these are
    > termination protocols.
    >
    > Clearly, you need a terminating protocol, too. If no particular
    > value ('\0', '\n') or combination of values (".\r\n") suggests
    > itself as a sentinel, then you have little option but to insist
    > that your data feed is accompanied by relevant information
    > regarding its length.


    However a special case is exemplified by:

    char foobar[] = "foo\0bar\0gup\0etc";
    ...
    fwrite(foobar, 1, sizeof(foobar), f);
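
To make the special case concrete, here is a sketch contrasting sizeof on the array with strlen through a pointer. The helper names `foobar_size` and `visible_prefix` are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char foobar[] = "foo\0bar\0gup\0etc";

/* Works only where the array definition itself is in scope. */
size_t foobar_size(void)
{
    return sizeof foobar;   /* 15 written bytes plus the implicit '\0' */
}

/* Through a pointer the array size is lost; strlen stops at the
   first zero byte, so only the leading "foo" is counted. */
size_t visible_prefix(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}
```

Here `foobar_size()` yields 16 while `visible_prefix(foobar)` yields 3. The sizeof approach applies only to arrays whose definition the compiler can see; once the data is passed around as `unsigned char *`, the length has to travel with it.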

    --
    Chuck F (cbfalconer at maineline dot net)
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net>
     
    CBFalconer, Jan 5, 2007
    #14
