Null terminated strings: bad or good?

Keith Thompson · Jan 7, 2009

Golden California Girls said:
I find it just as unreasonable that a valid character can't be included in a
string. A true perversion if there ever was one.

Do you mean the null character?

In practice, the need to store a null character in a string seems to
be rare. People have been using null-terminated string formats for
decades; C didn't originate the idea. Certainly being able to store
embedded null characters is an advantage, but I don't think the lack
of this ability is a "true perversion".
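
To make the embedded-null limitation concrete, here is a minimal sketch. The names with_nul, visible_length and stored_bytes are made up for illustration and appear nowhere in the thread. The array holds seven characters plus a terminator, but strlen() stops counting at the first null.

```c
#include <stddef.h>
#include <string.h>

/* Eight bytes of storage: 'a','b','c','\0','d','e','f' plus the
   terminator the compiler appends to the literal. */
static const char with_nul[] = "abc\0def";

/* Length as the C library sees it: counting stops at the first null. */
size_t visible_length(void) {
    return strlen(with_nul);
}

/* Number of bytes actually stored in the array, terminator included. */
size_t stored_bytes(void) {
    return sizeof with_nul;
}
```

Everything after the first null is still in memory; the string functions just cannot see it.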

Kenny McCormack · Jan 7, 2009

Richard said:
adopters then all well and good. But he isn't. I mean, hell, we ALL make
mistakes. However, he is here to bully, provoke and generally piss
people off with his arrogant and, at times, nonsensical ramblings and
obvious lack of real practical C in the real world.

You have a problem with that?

Tony · Jan 8, 2009

CBFalconer said:
None of that appears in C. The writing from a string can be done
by:

int wrtstring(char *s, FILE *f) {
    char ch;
    int err;

    while (ch = *s++) {
        if (EOF == (err = putc(ch, f))) break;
    }
    return err;
}

Note that putc can be a macro, and thus can use the file system
buffers directly. This can make the overall writing very
efficient. However it must not have side effects on the
arguments. Thus the ch is required. If no error occurs wrtstring
returns the last char written.

The above wrtstring function is pretty much how I now have it implemented
but I'm seriously considering making that way a special case that handles C
strings and making the default case be a counted string. I'm not sure if
that will buy me anything other than getting away from the C paradigm
though.
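
For comparison, a counted-string writer might look roughly like the sketch below. The struct counted_str and the function wrtcounted are hypothetical names, not Tony's actual code, and the 0/EOF return convention is an assumption. Because the length travels with the data, no scan for a terminator is needed and embedded null characters are written like any other byte.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counted-string view: a pointer plus an explicit length. */
struct counted_str {
    const char *data;
    size_t len;
};

/* Writes exactly s.len bytes to f. Returns 0 on success, EOF on failure. */
int wrtcounted(struct counted_str s, FILE *f) {
    if (f == NULL || (s.data == NULL && s.len != 0)) {
        return EOF;
    }
    if (s.len == 0) {
        return 0;  /* nothing to write is not an error */
    }
    return fwrite(s.data, 1, s.len, f) == s.len ? 0 : EOF;
}
```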

Tony

Tony · Jan 8, 2009

Richard said:
Standard beginner error #1 : You forgot the input condition where *s ==
0. Result is UB and dirty underpants flying out your nostrils or
whatever it is.

We won't bother being so pedantic as mentioning if S is NULL to start
with.

Surely he was just showing the main gist of the null-terminated string
writing and didn't intend to suggest that the given function was completed
production code.
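
For reference, the case Richard describes is the empty string: the loop body never runs, so err is returned without ever having been assigned. A hardened variant could look like the following sketch. The name wrtstring_checked and the 0/EOF return convention are assumptions for illustration, not CBFalconer's code.

```c
#include <stdio.h>

/* Hypothetical variant of wrtstring.
   Returns 0 on success (including the empty string), EOF on failure. */
int wrtstring_checked(const char *s, FILE *f) {
    if (s == NULL || f == NULL) {
        return EOF;
    }
    while (*s != '\0') {
        /* The increment is kept out of the putc call so that a macro
           implementation of putc cannot evaluate it more than once. */
        if (putc((unsigned char)*s, f) == EOF) {
            return EOF;
        }
        s++;
    }
    return 0;
}
```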

Tony

Tony · Jan 8, 2009

CBFalconer said:
None of that appears in C.

With point 2 above, I was referring to the fact that strlen() has to count
all the characters every time you want to get the length. Other
implementations simply do some pointer arithmetic or return a maintained
integer value for the count.
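
The difference can be sketched as follows. The span_str type and the function names are hypothetical, chosen only to illustrate the "pointer arithmetic" alternative. The C-string length is a linear scan, while the pointer-pair length is a single subtraction.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-pair string: [begin, end) holds the characters. */
struct span_str {
    const char *begin;
    const char *end;   /* one past the last character */
};

/* O(n): strlen walks the bytes until it finds the terminator. */
size_t cstr_length(const char *s) {
    return strlen(s);
}

/* O(1): a single pointer subtraction, no scan. */
size_t span_length(struct span_str s) {
    return (size_t)(s.end - s.begin);
}

/* Build a span over an existing C string; the O(n) scan is paid once here. */
struct span_str span_from_cstr(const char *s) {
    struct span_str out = { s, s + strlen(s) };
    return out;
}
```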

I don't even know what I was thinking with point 1. Probably some kind of
fixed size record IO.

Tony

Tony · Jan 8, 2009

Keith Thompson said:
Tony said:

Keith Thompson said:

[...]
I don't see that as an issue. Everything doesn't have to be scalable
to the largest integer size on a machine. Strings to me are of
"reasonable" length. For instance, if there is a period marking the
end of a sentence, then that is probably one string. A whole file of
sentences and paragraphs, is not a string. 32 bits for a length
field is just because it's easy to use on a 32-bit platform. If
someone needs a billion byte string, well I'm not even going to try
to conceive of that because it sounds silly.

Limiting a feature to what you consider to be "reasonable" is likely
to prevent perfectly valid uses. I've written programs (not in C)
that slurp the entire contents of a file into memory as a single
string. Why should that be disallowed because you think it's "silly"?

It's not about "disallowing" anything. It's about not perverting the
common case for the exceptional case.

In C as it's currently defined, a string's length may be as large
as SIZE_MAX-1, though an implementation may impose a smaller limit
on object sizes.

Implementing a string package that can only handle strings up to,
say, 65535 bytes imposes what I think is an unnecessary limitation.
A limit of 255 bytes is, I believe, quite unreasonable. Supporting
huge strings shouldn't be much more difficult than supporting short
strings. There's no need to "pervert" anything.

But when would such a huge string be used? Imagine calling strlen() on a
HUGE string! Another abstraction is probably appropriate (a simple buffer of
characters without null termination?) for the entire number of characters in
a file.

Tony

Tony · Jan 8, 2009

CBFalconer said:
Sorry, you're wrong. A string does not end on a period, or a '\n',
etc. It ends on the first '\0'.

That's just the C paradigm. It's unnatural. I can see the reason for
character buffers, but null terminating them is bizarre.

Nothing (other than SIZE_MAX)
limits the length of the string. The reason is simple - the
standard so specifies.

Thus, assuming adequate memory, a whole text file can be stored in
a single string.

That's bizarre, IMO.

Tony

Tony · Jan 8, 2009

Perhaps strings should be akin to width-specified integers:

string16 (a string with up to 65536 chars)
string32 ... etc.

"This is almost not a bad idea, but the major problem with it is
string16 and string32 are now different types, when they arguably
should not be."

True. It was a bad quick thought.

"These are normally known as "C-style" strings. The main advantage is
the length of the string is limited only by available memory, and the
length field is not stored with the string, thus conserving storage
space."

The "main advantage" above, is actually a disadvantage. It causes
programmers to write code that is susceptible to buffer overrun attacks.

"Storing the length with the string does not protect against buffer
overrun attacks."

It does increase "length awareness" though and keeps focus on the issue that
way.

Storage space conservation? Only in the exceptional case nowadays.

"Not on embedded systems."

As someone who targets desktops/laptops and servers only, embedded platforms
seem like a special case.

" The ATMega168 I have sitting on my desk right
now has 512 bytes of read-only storage for data, and 16kB more
available for program + data. Every byte counts. These are not
exceptional cases; it's fairly common hardware, albeit somewhat
specialized."

"Common case" (as I was considering it to be): desktop/laptop/server.

[snipped: I don't want to continue the counted string discussion because I
haven't implemented it that way]

"The main disadvantage of C-style strings is computing the length is O
(n)"

I'd say there are a FEW issues and that is just one of them.

"Please elaborate."

I would if I could remember the other issues I was reflecting on.

Tony

Tony · Jan 8, 2009

"Or you could just use null-terminated strings, which always require 1
byte more than the length of the string, and don't have any of the
string parsing issues mentioned above. It is never possible to have a
counted string that takes up less space than a null-terminated string,
although it is possible to have them take the same amount of space."

That is true: the space efficiency of C-style strings can't be beat. It just
seems odd to be doing all that character iteration to find the damn null all
the time such as in strlen(). And also having to think about null at all
seems like unnecessary baggage. Other implementations of strings seem better
to me except from the space penalty perspective. The answer is probably that
a single concept of "string" is inappropriate.
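
The space comparison in the quoted text comes down to simple arithmetic, sketched below. The function names are hypothetical, and the counted layout (a fixed-size length prefix directly followed by the characters) is an assumption. Overflow near SIZE_MAX is ignored for brevity.

```c
#include <stddef.h>

/* Bytes needed to store n characters as a null-terminated string:
   the characters plus one terminator. */
size_t nul_terminated_bytes(size_t n) {
    return n + 1;
}

/* Bytes needed to store n characters behind a length prefix of
   prefix_bytes bytes (assumed layout: prefix, then the characters). */
size_t counted_bytes(size_t n, size_t prefix_bytes) {
    return n + prefix_bytes;
}
```

With a one-byte prefix the two forms cost the same, but the counted form is then limited to 255 characters. Any wider prefix costs more than the single terminator.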

Tony

Tony · Jan 8, 2009

Tony said:
My C++ string class is a length and data ptr (pretty much, since the
underpinnings are an array class).

I misthought/misspoke: my implementation is not that.

Tony

Tony · Jan 8, 2009

String representations, like many things in life, are a compromise.
Zero terminated strings have some pros and cons. So do counted
strings (read the rest of the thread for some).

If you *really* can't live with zero terminated strings then write (or
find) a library that doesn't use them!

I have encapsulated nul terminated strings but am considering recognizing
those as a special case instead of them being the underpinnings of the code.

Tony

Tony · Jan 8, 2009

Bartc said:
Just curious: what do you do with your length+char-array string

I don't have one of those (I misspoke).

when you need to pass it to an OS function that needs a zero-terminated
one (or Asciiz as they used to be called)?

Zero-terminate it on the fly in the char* conversion operator. (I use C++).
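
Tony does not say how his conversion operator works. One possible approach, sketched here in C, is to copy the counted data into a fresh buffer and append the terminator. The function name to_cstr is hypothetical, and the caller is assumed to free the result.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a freshly allocated, null-terminated copy of data[0..len),
   or NULL on failure. The caller must free() the result. */
char *to_cstr(const char *data, size_t len) {
    if (data == NULL && len != 0) {
        return NULL;
    }
    if (len == SIZE_MAX) {
        return NULL;  /* len + 1 would wrap around to 0 */
    }
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    if (len != 0) {
        memcpy(out, data, len);
    }
    out[len] = '\0';
    return out;
}
```

If the counted data contains an embedded null, the receiving function will see a shorter string than the one that was stored.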

Tony

Tony · Jan 8, 2009

jacob navia said:
The string has the following fields:
1) length: The number of data characters in the string.
2) capacity: The length of the allocated string buffer for this string
3) Flags (used to implement read only strings and other goodies)
4) A pointer to the data characters.

Which is probably better than my 3 pointer implementation when on a 64-bit
platform. (Now I want 32-bit pointers to be still available on a 64-bit
platform!).
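
The four fields listed above could be laid out roughly as in this sketch. It is not jacob navia's actual implementation: the type string4, the flag STR_READONLY, the growth policy and the function string4_append are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { STR_READONLY = 1 };  /* hypothetical flag bit */

struct string4 {
    size_t length;    /* number of data characters in use */
    size_t capacity;  /* size of the allocated buffer */
    unsigned flags;   /* e.g. STR_READONLY */
    char *data;       /* the characters; not null-terminated here */
};

/* Appends n bytes from src. Returns 0 on success, -1 on failure. */
int string4_append(struct string4 *s, const char *src, size_t n) {
    if (s == NULL || (src == NULL && n != 0)) return -1;
    if (s->flags & STR_READONLY) return -1;
    if (n > SIZE_MAX - s->length) return -1;  /* length would overflow */

    size_t needed = s->length + n;
    if (needed > s->capacity) {
        /* Grow by doubling, starting at 16 bytes (arbitrary choice). */
        size_t cap = s->capacity ? s->capacity : 16;
        while (cap < needed) {
            if (cap > SIZE_MAX / 2) { cap = needed; break; }
            cap *= 2;
        }
        char *p = realloc(s->data, cap);
        if (p == NULL) return -1;
        s->data = p;
        s->capacity = cap;
    }
    if (n != 0) memcpy(s->data + s->length, src, n);
    s->length = needed;
    return 0;
}
```

A string4 starts out as { 0, 0, 0, NULL }. Because the length is stored explicitly, appended data may contain null characters.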

Tony

Tony · Jan 8, 2009

Tony said:
I use C++ and very often "ponder" the deficiencies of the language that
are there for reasons of backward compatibility with C. My C++ string
class IS a length and a ptr to data.

Damn, I was pretty convinced of that incorrect thought apparently. (Have
another drink Tony!).

Tony

Tony · Jan 8, 2009

Tony said:
"why don't you use std::string?"

Too unwieldy (read: "committee-designed camel").

Definition of 'camel': a horse designed by a committee.

Keith Thompson · Jan 8, 2009

Tony said:
But when would such a huge string be used? Imagine calling strlen() on a
HUGE string! Another abstraction is probably appropriate (a simple buffer of
characters without null termination?) for the entire number of characters in
a file.

So don't call strlen() on it -- or at least don't call it more than
once.

You do have to be aware of efficiency concerns when working with C
strings. Something like this:

for (i = 0; i < strlen(s); i ++) {
    ... s[i] ...
}

which might be reasonable if strlen() operated on counted strings, can
be horrendously inefficient. So do this instead:

const size_t len = strlen(s);
for (i = 0; i < len; i ++) {
    ... s[i] ...
}

Nothing in the definition of C's null-terminated strings prevents you
from building huge ones. Why impose unnecessary limits?

Guest · Jan 8, 2009

Keith Thompson said:

perhaps this is a bit of playing with definitions,
but I sometimes have to manipulate byte streams (or octet
streams if I'm feeling really pedantic) and I tend not to
think of them as "strings".

It does, however, mean that BLOBs can't be stored as strings.

They'd have to have been using them for /quite a few/ decades to
have been using them before C.

yes? and so? People have been writing programs, some of which
manipulate byte streams, for a lot longer than C has existed.
Really.

Guest · Jan 8, 2009

"Or you could just use null-terminated strings, which always require 1
byte more than the length of the string, and don't have any of the
string parsing issues mentioned above. It is never possible to have a
counted string that takes up less space than a null-terminated string,
although it is possible to have them take the same amount of space."

That is true: the space efficiency of C-style strings can't be beat.

off the top of my head I can think of several ways to
compress the typical string of ASCII characters.

It just
seems odd to be doing all that character iteration to find the damn null all
the time such as in strlen(). And also having to think about null at all
seems like unnecessary baggage.

I don't think about it. have you *heard* of an Abstract Data Type?

Other implementations of strings seem better
to me except

good. So use one and stop wittering on about it.

from the space penalty perspective. The answer is probably that
a single concept of "string" is inappropriate.

woop-i-doop

Tim Rentsch · Jan 8, 2009

Eric Sosman said:
JC said:

JC wrote:
These are normally known as "C-style" strings. The main advantage is
the length of the string is limited only by available memory, and the
length field is not stored with the string, thus conserving storage
space.
You'll notice I did heed your desire
for me to reply in-thread

Thanks!

Please cite the C standard for your claim that the length of a C string
is limited *only* by available memory.

In fact, the maximum length of a string is not even limited by
available memory. [7.1.1.1] (in C99, TC2) defines a "string" and does
not define a limit on its length. There is no number that exists such
that if the length of a string exceeded that number, it would not be
considered a "string" as defined by the standard. The maximum length
of a string is actually infinite.

In a freestanding implementation, perhaps, but in a hosted
implementation the strlen() function must be able to return "the
number of characters that precede the terminating null character"
(7.21.6.3p3). Since it returns this count as a size_t value, it
follows that the count cannot exceed SIZE_MAX, a finite number.

I've read 7.21.6.3p3, but I don't see why you say strlen() must be
able to return the length of any string. Perhaps it's true that /if/
strlen() returns then the result must equal the length of the string,
but why must it return? In particular, if strlen() were implemented
as

size_t
strlen( const char *s ){
    size_t r;
    for( r = 0; s[r]; r++ ) {}
    return r;
}

that doesn't prevent the running (or termination) of any strictly
conforming program. So it's possible for an implementation to allow
strings longer than SIZE_MAX characters, yet still be a conforming
implementation.

Tim Rentsch · Jan 8, 2009

[snip]

The requirement that sizeof(type) always give the correct size has quite
different implications. Since it will always be possible to declare
types that have a size greater than SIZE_MAX, no matter what value
SIZE_MAX has, there's no course of action that an implementation has
available to it to ensure meeting that requirement. I therefore consider
that requirement to be a defect in the standard. I imagine that the
actual intent was that sizeof(type) is only required to give the correct
size when the correct size is smaller than SIZE_MAX. However, the
standard as currently written contains no wording that actually allows
sizeof to ever fail to give the correct size.

Talking about the sizeof operator, 6.5.3.4 p 4 says

The value of the result is implementation-defined ...

As long as SIZE_MAX is at least 65535, an implementation could
define sizeof(T), where T would otherwise have a size > SIZE_MAX,
to be SIZE_MAX. This definition for sizeof doesn't affect the
behavior of any strictly conforming program, so an implementation
defining sizeof this way could still be a conforming implementation.
