Calculate length of byte string with embedded nulls

Angus

Hello

I have a stream of bytes (an unsigned char*), but the 'string' may contain
embedded nulls, so it is not like a traditional C string terminated by a null.

I need to calculate the length of these arrays, but I can't use strlen because
it stops counting at the first null it finds. So how do I do it?

Angus
 
jacob navia

There is no way to do it since you have no algorithm to determine
its length.
 
Guest

If this stream is of a specific format and has the length embedded in
it, you can extract it. How to do this depends on the format.
Otherwise, if the length is not kept elsewhere, you need to keep track
of it yourself.
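As an illustration only, suppose the stream used a hypothetical format in which each record starts with a 4-byte big-endian length followed by that many payload bytes (this layout is an assumption, not something stated in the thread). Extracting the length might then look like this sketch; `read_length_prefixed` is a made-up name:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format (assumed for illustration): a 4-byte big-endian
 * length N, then N payload bytes which may include '\0'.
 * On success returns 0 and stores a pointer to the payload and its length.
 * Returns -1 if buflen is too short to hold the record the header claims. */
int read_length_prefixed(const unsigned char *buf, size_t buflen,
                         const unsigned char **payload, size_t *paylen)
{
    uint32_t n;

    if (buflen < 4)
        return -1;                      /* not even a complete header */

    n = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
        ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];

    if (n > buflen - 4)
        return -1;                      /* header claims more than we have */

    *payload = buf + 4;
    *paylen = (size_t)n;
    return 0;
}
```

The embedded nulls in the payload never matter here, because the length comes from the header rather than from scanning the data.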
 
Richard Heathfield

Well, now you know what null is for. :)

Whenever you read data, you need to establish a protocol for stopping. If
you're reading a text file, typically you stop (or at least pause for
thought) when you hit a newline. If you're reading an email feed, you stop
when you get ".\r\n". If you're copying a string, you stop at the null
terminator. All of these are termination protocols.

Clearly, you need a terminating protocol, too. If no particular value ('\0',
'\n') or combination of values (".\r\n") suggests itself as a sentinel,
then you have little option but to insist that your data feed is
accompanied by relevant information regarding its length.
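A minimal sketch of a sentinel-based protocol, assuming the buffer's capacity is known so the scan cannot run off the end. The function name `length_before_sentinel` is hypothetical, and ".\r\n" is used as the sentinel only because it was mentioned above:

```c
#include <stddef.h>
#include <string.h>

/* Scan at most cap bytes of buf for the sentinel sequence sent
 * (sentlen bytes). Returns the number of data bytes before the sentinel,
 * or (size_t)-1 if it is not found within the buffer.
 * Embedded '\0' bytes are treated as ordinary data. */
size_t length_before_sentinel(const unsigned char *buf, size_t cap,
                              const unsigned char *sent, size_t sentlen)
{
    size_t i;

    if (sentlen == 0 || sentlen > cap)
        return (size_t)-1;

    for (i = 0; i + sentlen <= cap; i++) {
        if (memcmp(buf + i, sent, sentlen) == 0)
            return i;               /* data occupies buf[0] .. buf[i-1] */
    }
    return (size_t)-1;              /* no sentinel: length unknowable */
}
```

Note that the capacity still has to come from somewhere; the sentinel only tells you where the data ends inside it.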
 
santosh

Without a termination condition, there's no way to determine the
end of the stream. As the programmer of the application, you should
know this condition. If the array is passed in from a third-party
library, its authors ought to have documented it. If neither is the
case, then your code is broken.
 
August Karlstrom

Just keep track of the number of characters you store in the buffer and
pass that value along with the buffer.
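One way to keep the count with the data is to bundle both into a struct. The `byte_buf` type, its fixed 256-byte capacity and `byte_buf_append` are assumptions made for this sketch:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: the byte count travels with the bytes, so no
 * terminator is needed and '\0' is just another value. */
struct byte_buf {
    unsigned char data[256];   /* fixed capacity chosen for the sketch */
    size_t len;                /* number of bytes actually stored */
};

/* Append n bytes from src. Returns 0 on success, -1 if they won't fit. */
int byte_buf_append(struct byte_buf *b, const void *src, size_t n)
{
    if (n > sizeof b->data - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);   /* memcpy copies nulls too */
    b->len += n;
    return 0;
}
```

Any function that receives a `struct byte_buf *` then knows the length without scanning for anything.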


August
 
David T. Ashley

As other posters have indicated, the assumption of \0 termination is "baked
into" much of the 'C' programming language.

I believe this type of string (an array of characters where each character
may contain any value without restriction) is called a "binary string" in
other languages.

The standard 'C' library functions won't work on this type of string.

You could keep track of the length separately from the string.

A second approach is to use an encoding for the string to represent the data
without using \0. The most obvious way to do this is to encode the bytes as
hexadecimal characters, i.e. \0 would be represented as '0' followed by
another '0'. That keeps everything simple, as the length of this kind of
string is double the length of the data. And all the 'C' library functions
will work.
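A sketch of the hex approach, assuming 8-bit bytes (CHAR_BIT == 8); `hex_encode` is a hypothetical name:

```c
#include <stddef.h>

/* Each input byte becomes two hex digits, so a '\0' byte becomes "00".
 * out must have room for 2*n + 1 chars. The result is an ordinary
 * null-terminated C string whose strlen is exactly 2*n.
 * Assumes 8-bit bytes, so in[i] >> 4 is always 0..15. */
void hex_encode(const unsigned char *in, size_t n, char *out)
{
    static const char digits[] = "0123456789abcdef";
    size_t i;

    for (i = 0; i < n; i++) {
        out[2 * i]     = digits[in[i] >> 4];     /* high nibble */
        out[2 * i + 1] = digits[in[i] & 0x0F];   /* low nibble */
    }
    out[2 * n] = '\0';
}
```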
 
Richard Heathfield

David T. Ashley said:

> I believe this type of string (an array of characters where each character
> may contain any value without restriction) is called a "binary string" in
> other languages.
>
> The standard 'C' library functions won't work on this type of string.

memcpy, memset, memmove, memchr, memcmp, fread, fwrite, qsort, bsearch are
all counter-examples.
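As a small illustration of that point (the `count_byte` helper is hypothetical), memchr can walk data containing nulls because it is given a length rather than relying on a terminator:

```c
#include <stddef.h>
#include <string.h>

/* Count how many times byte value c occurs in the first n bytes of buf.
 * memchr takes an explicit length and never stops at '\0'. */
size_t count_byte(const unsigned char *buf, size_t n, unsigned char c)
{
    size_t count = 0;
    const unsigned char *p = buf;
    const unsigned char *end = buf + n;

    while (p < end) {
        const unsigned char *hit = memchr(p, c, (size_t)(end - p));
        if (hit == NULL)
            break;
        count++;
        p = hit + 1;          /* continue searching after this match */
    }
    return count;
}
```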

> You could keep track of the length separately from the string.

That is necessary if no sentinel is given.

> A second approach is to use an encoding for the string to represent the
> data without using \0. The most obvious way to do this is to encode the
> bytes as hexadecimal characters, i.e. \0 would be represented as '0'
> followed by another '0'. That keeps everything simple, as the length of
> this kind of string is double the length of the data. And all the 'C'
> library functions will work.

Base-64 encoding would work, too, and wouldn't be quite so noisy. But it's
better by far to keep track of the size.
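For comparison, a compact Base64 encoder might look like this. It is a sketch assuming 8-bit bytes and the standard RFC 4648 alphabet with '=' padding; `base64_encode` is a hypothetical name, not a standard C function:

```c
#include <stddef.h>

/* Output is a null-terminated C string containing no embedded nulls.
 * out needs room for 4 * ((n + 2) / 3) + 1 chars. Assumes 8-bit bytes. */
void base64_encode(const unsigned char *in, size_t n, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i, o = 0;

    for (i = 0; i + 2 < n; i += 3) {          /* full 3-byte groups */
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = tbl[(v >> 18) & 0x3F];
        out[o++] = tbl[(v >> 12) & 0x3F];
        out[o++] = tbl[(v >> 6) & 0x3F];
        out[o++] = tbl[v & 0x3F];
    }
    if (i < n) {                               /* 1 or 2 leftover bytes */
        unsigned long v = (unsigned long)in[i] << 16;
        if (i + 1 < n)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = tbl[(v >> 18) & 0x3F];
        out[o++] = tbl[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < n) ? tbl[(v >> 6) & 0x3F] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
}
```

Base64 grows the data by about a third instead of doubling it, which is the "less noisy" trade-off, but both encodings still have to be decoded before the original bytes can be used.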
 
Charlton Wilbur

DTA> As other posters have indicated, the assumption of \0
DTA> termination is "baked into" much of the 'C' programming
DTA> language.

Much of the standard library, you mean.

DTA> The standard 'C' library functions won't work on this type of
DTA> string.

But it's a simple matter of programming to implement your own
functions to do this, or to use a library someone else has written.
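As one example of such a function, here is a hypothetical `bin_compare` that orders length-counted byte strings the way strcmp orders C strings, but without stopping at '\0'. It is a sketch, not an existing library call:

```c
#include <stddef.h>
#include <string.h>

/* Compare the common prefix with memcmp (which does not stop at '\0');
 * if the prefixes are equal, the shorter string sorts first.
 * Returns <0, 0 or >0, like strcmp. */
int bin_compare(const unsigned char *a, size_t alen,
                const unsigned char *b, size_t blen)
{
    size_t common = alen < blen ? alen : blen;
    int r = memcmp(a, b, common);

    if (r != 0)
        return r;
    if (alen < blen)
        return -1;
    return alen > blen;     /* 1 if a is longer, 0 if identical */
}
```

With the arrays {'a', 0, 'b'} and {'a', 0, 'c'}, strcmp would report them equal because it stops at the first null; bin_compare sees the difference in the third byte.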

DTA> You could keep track of the length separately from the
DTA> string.

This is pretty much exactly what you have to do, unless you use
another marker to indicate end-of-string.

Charlton
 
pete

Angus said:
> Hello
>
> I have a stream of bytes - unsigned char*.

If it's a text stream,
then I suspect that you may be wanting to calculate
the length of the "line" rather than the length of a string.
Lines of text are terminated by a newline character ('\n').
The way to find the length of the line is to do it
while the line is being read.
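A sketch of counting while reading, using getc so that every byte (including any '\0') is counted as it arrives. `read_line_counted` is a hypothetical name, and storing the '\n' in the buffer is a design choice for this example. fgets would also read the bytes, but the caller could then only recover the length with strlen, which is exactly the problem:

```c
#include <stdio.h>
#include <stddef.h>

/* Read one line from fp into buf (capacity cap), counting bytes as they
 * arrive so embedded '\0' bytes cannot hide the true length.
 * Stops after '\n' (which is stored), at EOF, or when buf is full.
 * Returns the number of bytes stored; 0 means EOF before any byte. */
size_t read_line_counted(FILE *fp, unsigned char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    while (n < cap && (c = getc(fp)) != EOF) {
        buf[n++] = (unsigned char)c;
        if (c == '\n')
            break;
    }
    return n;
}
```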
 
bert

As other posters have said, you have to know what
bytes actually represent the end of the array, then
write your own code to search the array to locate them.

The only time that I encountered such an array,
its rule was that a single embedded null was part
of it, but two adjacent nulls were its terminator.
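A sketch of locating the end under that double-null convention, bounded by a known capacity; `double_null_length` is a hypothetical name:

```c
#include <stddef.h>

/* A single '\0' is data; two adjacent '\0' bytes terminate the array.
 * Returns the number of bytes before the terminating pair, scanning at
 * most cap bytes, or (size_t)-1 if no terminator is found.
 * This convention cannot represent two consecutive nulls as data. */
size_t double_null_length(const unsigned char *buf, size_t cap)
{
    size_t i;

    for (i = 0; i + 1 < cap; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0)
            return i;
    }
    return (size_t)-1;
}
```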
 
Keith Thompson

Charlton Wilbur said:
> DTA> As other posters have indicated, the assumption of \0
> DTA> termination is "baked into" much of the 'C' programming
> DTA> language.
>
> Much of the standard library, you mean.

And the treatment of string literals.
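A quick demonstration of how literals interact with the terminator. The names `lit`, `lit_strlen` and `lit_sizeof` are illustrative only:

```c
#include <stddef.h>
#include <string.h>

/* A string literal always gets an implicit '\0' appended, and strlen()
 * stops at the first '\0'. sizeof on the array, by contrast, counts every
 * byte the compiler stored, embedded nulls and the trailing one included. */
static const char lit[] = "ab\0cd";

size_t lit_strlen(void) { return strlen(lit); }   /* stops at embedded '\0': 2 */
size_t lit_sizeof(void) { return sizeof lit; }    /* 5 chars + trailing '\0': 6 */
```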
 
Richard Heathfield

bert said:
> As other posters have said, you have to know what
> bytes actually represent the end of the array, then
> write your own code to search the array to locate them.
>
> The only time that I encountered such an array,
> its rule was that a single embedded null was part
> of it, but two adjacent nulls were its terminator.

The problem with such a scheme is that it renders impossible the in-band
representation of two consecutive null bytes. One way around this would be
to use the null character as an escape character, with a subsequent '0'
character representing a null byte, but a subsequent null character
representing the end of the data.

Of course, if you're going to do that, you might as well use some other
character to represent the escape character (e.g. '\\'), with '\\' '\\'
representing backslash, '\\' '0' representing the null byte, and a genuine
null byte representing the end of the data. This does, however, render it
necessary to translate the escape sequences.

All in all, it is a better scheme by far simply to provide the length
information in advance of, or in parallel with, the data, thus rendering
translation unnecessary.
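A sketch of the backslash-escape scheme just described. The function names are hypothetical and error handling is minimal; the decoder rejects any backslash not followed by '\\' or '0':

```c
#include <stddef.h>

/* '\\' becomes "\\\\", a null byte becomes "\\0", and a genuine '\0'
 * ends the data. out needs room for 2*n + 1 bytes. Returns the encoded
 * length excluding the final '\0', i.e. what strlen((const char *)out)
 * would return. */
size_t escape_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, o = 0;

    for (i = 0; i < n; i++) {
        if (in[i] == '\\') {
            out[o++] = '\\';
            out[o++] = '\\';
        } else if (in[i] == '\0') {
            out[o++] = '\\';
            out[o++] = '0';
        } else {
            out[o++] = in[i];
        }
    }
    out[o] = '\0';                 /* genuine terminator */
    return o;
}

/* Reverse the translation. Returns the decoded byte count, or
 * (size_t)-1 on a malformed escape sequence. */
size_t escape_decode(const unsigned char *in, unsigned char *out)
{
    size_t o = 0;

    while (*in != '\0') {
        if (*in == '\\') {
            in++;
            if (*in == '\\')
                out[o++] = '\\';
            else if (*in == '0')
                out[o++] = '\0';
            else
                return (size_t)-1; /* also catches '\\' just before the end */
            in++;
        } else {
            out[o++] = *in++;
        }
    }
    return o;
}
```

The translation step in both directions is the cost referred to above; a separately supplied length avoids it entirely.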
 
CBFalconer

Richard said:
> Angus said:
>
> Well, now you know what null is for. :)
>
> Whenever you read data, you need to establish a protocol for
> stopping. If you're reading a text file, typically you stop (or at
> least pause for thought) when you hit a newline. If you're reading
> an email feed, you stop when you get ".\r\n". If you're copying a
> string, you stop at the null terminator. All of these are
> termination protocols.
>
> Clearly, you need a terminating protocol, too. If no particular
> value ('\0', '\n') or combination of values (".\r\n") suggests
> itself as a sentinel, then you have little option but to insist
> that your data feed is accompanied by relevant information
> regarding its length.

However a special case is exemplified by:

char foobar[] = "foo\0bar\0gup\0etc";
...
fwrite(foobar, 1, sizeof(foobar), f);
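A self-contained version of the fragment above, writing to a caller-supplied stream; `write_foobar` is a hypothetical wrapper:

```c
#include <stdio.h>

/* When the data is an array whose size the compiler knows, sizeof gives
 * the full byte count, embedded nulls and the trailing '\0' included.
 * Returns the number of bytes fwrite reports as written. */
static const char foobar[] = "foo\0bar\0gup\0etc";

size_t write_foobar(FILE *fp)
{
    /* 15 visible bytes + trailing '\0' = 16; strlen(foobar) would give 3 */
    return fwrite(foobar, 1, sizeof foobar, fp);
}
```

This only works where the array itself is in scope. Once it is passed to a function as an unsigned char*, sizeof yields the size of the pointer, and the length has to be passed along separately.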
 
