Null terminated strings: bad or good?

Tim Rentsch

Harald van Dĳk said:
You snipped the actual question, which was to cite a specific clause in
the standard that such an implementation would violate.

Once again, assume the following (I'll change the example a bit):

sizeof (short) == 2

The following:
short *ptr = calloc(SIZE_MAX, sizeof *ptr);
succeeds, causing ptr to point to the first element of an array
object whose size is SIZE_MAX*2 bytes. You can successfully read
and write all SIZE_MAX short objects in the allocated object, from
ptr[0] to ptr[SIZE_MAX-1].

sizeof (short[SIZE_MAX]) is rejected with a diagnostic at compile
time. (I would expect this to happen for any compiler where sizeof
(short) > 1.)
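The byte count of this hypothetical object, SIZE_MAX * 2, cannot even be computed as a size_t. The minimal sketch below shows the wrap-around and the usual overflow test. The helper names `checked_byte_count` and `unchecked_byte_count` are hypothetical, and whether a particular calloc performs this test is implementation-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores nmemb * size in *bytes and returns true,
   or returns false if the product would exceed SIZE_MAX. */
bool checked_byte_count(size_t nmemb, size_t size, size_t *bytes)
{
    assert(bytes != NULL);
    if (size != 0 && nmemb > SIZE_MAX / size)
        return false;            /* true byte count not representable */
    *bytes = nmemb * size;
    return true;
}

/* Unchecked version for comparison: size_t arithmetic is reduced modulo
   SIZE_MAX + 1, so SIZE_MAX * 2 silently wraps to SIZE_MAX - 1. */
size_t unchecked_byte_count(size_t nmemb, size_t size)
{
    return nmemb * size;
}
```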

You claim that this hypothetical implementation is non-conforming. What
specific clause of the standard does it violate?

I don't know who posted this first, but fill the memory, call strlen, and
any result it can give renders the implementation nonconforming as
strlen's description has no exception for strings longer than SIZE_MAX
characters.
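The strlen argument becomes concrete with a minimal strlen-style loop. `naive_strlen` is a hypothetical name, not the library function. Its counter is a size_t, and that is exactly why it cannot report a length above SIZE_MAX.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative strlen-style loop. The count is a size_t, so a string
   longer than SIZE_MAX characters would make the counter wrap, and no
   correct result could be returned. */
size_t naive_strlen(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        ++n;               /* would wrap past SIZE_MAX on such a string */
    return n;
}
```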

It doesn't have to be nonconforming; see my response to Eric Sosman.
 
Tim Rentsch

JC said:
Additionally:




It looks like there can be more than one way to work through this
logic. I believe this is the root of any disagreements.

(a) On one hand, you can say if sizeof *must* return the size of an
object, and sizeof returns a size_t, then in order to satisfy the
constraints of sizeof, an object's size *must* fit in a size_t. That
is the argument you, James Kuyper, and others are putting forward.

(b) On the other hand, it seems that you can say if sizeof returns the
size of an object, and sizeof returns a size_t, then sizeof is
undefined if an object's size can not fit in a size_t, and that sizeof
is not the limiting factor on an object's size. I do not see anything
in the standard that disallows this line of reasoning either. This is
the argument I am putting forward.

It appears that both arguments are equally valid. With (b), to deduce
what the maximum size of an object can be, you must look elsewhere in
the standard, as sizeof's definition is insufficient to determine it.
AFAIK there are no other statements of the object's maximum size. It
would follow, then, with (b), that while the maximum theoretical size
of an object is infinite (i.e. there's no number such that if the size
of an object exceeded that number, it would no longer be considered an
"object"), the actual maximum size of an object is limited by the
largest amount of memory you could obtain. The largest amount of
memory you can obtain, AFAIK, is by calloc(SIZE_MAX,SIZE_MAX). This
would have to be on a system where the number of bits in a pointer was
at least double the number of bits in a size_t (e.g. 64-bit pointers,
32-bit size_t).
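You can check the pointer-width condition on a given implementation. The rough sketch below assumes that the storage size of `void *` is a fair proxy for pointer width, although the standard does not guarantee that all of those bits take part in addressing. All function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Value bits of size_t, counted from SIZE_MAX (all value bits set), so
   any padding bits are excluded. */
unsigned size_t_value_bits(void)
{
    unsigned bits = 0;
    for (size_t m = SIZE_MAX; m != 0; m >>= 1)
        ++bits;
    assert(bits >= 16);   /* SIZE_MAX is at least 65535 */
    return bits;
}

/* Storage bits of an object pointer. This counts the whole object
   representation, not necessarily only the bits used for addressing. */
unsigned pointer_storage_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}

/* Nonzero if pointers are at least twice as wide as size_t, the rough
   precondition for calloc(SIZE_MAX, SIZE_MAX) to be addressable at all. */
int pointers_at_least_twice_size_t(void)
{
    return pointer_storage_bits() >= 2 * size_t_value_bits();
}
```

On common 64-bit desktop platforms both widths are 64 bits, so the last function returns 0.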

A conforming implementation could have SIZE_MAX == 65535, but
still accept

char x[10000000000000000000000000000000000000000000];

and even run the program (on the special GINORMOUS model of the
DeathStar2000). This implementation is allowed (and can be
conforming) because the result of sizeof is implementation-defined,
and any program that would otherwise have a sizeof result > 65535
isn't a strictly conforming program, so an implementation can
decide whatever it wants to for implementation-defined behavior
in such programs, and still be a conforming implementation.
 
Bartc

Now I'm wondering if I should jettison maintaining the null and with it C
compatibility. Or at least have a separate beast called CString for the
null-terminated encapsulation. Or simply have the overloaded char*
operator null terminate on the fly when interfacing "legacy code". The
latter sounds pretty good.

It sounds pretty slow too if the memory used by the string isn't /quite/
enough to put the zero in. And if you usually have spare bytes at the end,
you might as well always put a zero in there.
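One reading of this suggestion is a counted string that always allocates one spare byte. The zero terminator is then maintained on every change, and the char * conversion for legacy code costs nothing. This is a minimal sketch. `struct counted_str`, `cs_append`, and `cs_cstr` are hypothetical names, and doubling growth is just one reasonable policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string that always keeps room for a trailing zero. */
struct counted_str {
    size_t len;   /* characters in use, not counting the terminator */
    size_t cap;   /* bytes allocated, always >= len + 1 once allocated */
    char *data;
};

/* Appends n bytes from src. Returns 0 on success, or -1 on size overflow
   or allocation failure (the string is then left unchanged). */
int cs_append(struct counted_str *s, const char *src, size_t n)
{
    assert(s != NULL && src != NULL);
    if (n > SIZE_MAX - s->len - 1)
        return -1;                         /* len + n + 1 would overflow */
    size_t need = s->len + n + 1;          /* + 1 for the spare zero byte */
    if (need > s->cap) {
        size_t cap = s->cap ? s->cap : 16;
        while (cap < need)
            cap = (cap > SIZE_MAX / 2) ? need : cap * 2;
        char *p = realloc(s->data, cap);
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = cap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';                /* keep the terminator in place */
    return 0;
}

/* O(1) conversion for C-string APIs: no copy and no strlen. */
const char *cs_cstr(const struct counted_str *s)
{
    return s->data ? s->data : "";
}
```

Because the terminator is written on every append, `cs_cstr` never has to reallocate. The cost is one byte per string.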

However if there are embedded zeros in the string, your char* converter
might have more work (and more decisions) to do.
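When the length is known, detecting the embedded-zero case is cheap. The sketch below uses a hypothetical helper built on the standard `memchr`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if the first len bytes of buf contain a zero byte. A C
   string function given buf would then see only the part before it, so
   a char * converter must decide whether to reject, escape, or truncate. */
bool has_embedded_nul(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len != 0 && memchr(buf, '\0', len) != NULL;
}
```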
 
Bartc

Tony said:
That's bizarre, IMO.

Some people naturally think of strings as short strings (words, names,
messages, text lines, filenames and so on). For this purpose, a 256-character
limit is a bit tight, but with perhaps 1K strings you can do a lot of useful
work.

But others naturally want to use strings to store anything, of any size,
including gigabyte-sized files.

So if you're writing libraries (or implementing languages), you have to
allow for these two extremes.
 
Rui Maciel

Tony said:
But when would such a huge string be used? Imagine calling strlen() on a
HUGE string! Another abstraction is probably appropriate (a simple buffer
of characters without null termination?) for the entire number of
characters in a file.

I believe the action performed by the strlen() routine is always present in other string-type objects in some form or other, which means that the problem you are trying to blame on null-terminated strings is also present in other string implementations. You may not need to run it explicitly, but it is always there.

You may not like the idea of running strlen() every time you want to know the length of a null-terminated string but, as it was already said, that isn't something you are forced to do, let alone need.


Rui Maciel
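The point that strlen() is optional can be shown with a loop that uses the terminator as its stop condition. The length scan is then folded into the real work. `count_char` is a hypothetical example function.

```c
#include <assert.h>
#include <stddef.h>

/* Counts occurrences of c in s in a single pass. The loop stops at the
   terminator, so no separate strlen() call is needed. */
size_t count_char(const char *s, char c)
{
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (*s == c)
            ++n;
    return n;
}
```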
 
James Kuyper

Richard said:
Keith Thompson said: ....

They'd have to have been using them for /quite a few/ decades to
have been using them before C.

Is that not in fact the case?
 
Richard

off the top of my head I can think of several ways to
compress the typical string of ASCII characters.

Depending heavily on the string contents, of course. I would be
interested to hear of the several ways off the top of your head that
retain efficiency and work with the std C library.
I don't think about it. Have you *heard* of an Abstract Data Type?

I don't believe you. You HAVE to think about it in C. Either that or
your mallocs have some serious bugs.
Good. So use one and stop wittering on about it.

Aha. The true reason for Nick's interjection.
woop-i-doop

Your helpful contributions once more noted.
 
Richard

Tony said:
Surely he was just showing the main gist of the null-terminated string
writing and didn't intend to suggest that the given function was completed
production code.

Tony

One can only judge by the code given. Normally I would not have
mentioned it, but CBF is a pretty nasty piece of work who won't "help"
anyone who does not post full and complete code. One can only assume he
meant the code as the one true way. But regardless, that entire function
was nonsense because of the UB. Now, I don't really think that, but
that's the way we are trained by the c.l.c regs to think.
 
Kenny McCormack

Richard said:
Your helpful contributions once more noted.

I think we are all in agreement that "Nick Keighley" hasn't contributed
to anything other than the sewer system.
 
Dik T. Winter

>
> They'd have to have been using them for /quite a few/ decades to
> have been using them before C.

One type of file on CDC computers in the (early) seventies was already a text
file with Z-type records. Now you may wonder what the Z refers to. Yup,
zero-terminated...
 
Hallvard B Furuseth

Wojtek said:
CBFalconer wrote in message
How about this. No exceptions are mentioned, thus it covers all.

You'd think that; but what about sizeof(char[SIZE_MAX][2])?

What about it? Since sizeof shall return the size in bytes, that array
type is an error. At least gcc and sun CC think so, and it makes sense
to me.


Variable-length arrays are a problem though:
6.5.3.4 The sizeof operator
... snip ...
[#2] The sizeof operator yields the size (in bytes) of its operand,
(...) If the type of the operand is a variable length array type,
the operand is evaluated; (...)

int foo(size_t x) { long a[x]; return sizeof(a); }

Now the value of foo(SIZE_MAX) is wrong - if such a large auto
array doesn't produce a run-time exception, anyway.
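The foo example truncates the sizeof result to int and assumes that the VLA size is representable. The hedged variant below assumes VLA support (mandatory in C99, optional since C11). It returns size_t and rejects counts whose byte size would exceed SIZE_MAX. `vla_bytes` is a hypothetical name, and large counts that pass the guard can still exhaust automatic storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns sizeof of a VLA of x longs, or 0 if x is zero or the byte size
   would not fit in size_t. */
size_t vla_bytes(size_t x)
{
    if (x == 0 || x > SIZE_MAX / sizeof(long))
        return 0;
    long a[x];            /* still needs x * sizeof(long) bytes of automatic storage */
    assert(sizeof a == x * sizeof(long));   /* holds because of the guard */
    return sizeof a;      /* computed at run time for a VLA */
}
```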
 
Wojtek Lerch

Hallvard B Furuseth said:
Wojtek said:
CBFalconer wrote in message
Keith Thompson wrote:
You claim that "sizeof returns the size of ANY object". I see
nothing in the standard that directly supports this claim. If you
can prove it from the standard, please do so. I'm not interested
in any response that doesn't include one or more specific citations
from the standard.

How about this. No exceptions are mentioned, thus it covers all.

You'd think that; but what about sizeof(char[SIZE_MAX][2])?

What about it? Since sizeof shall return the size in bytes, that array
type is an error. At least gcc and sun CC think so, and it makes sense
to me.

What do you mean by "error"? A constraint violation? That would imply that
there's a constraint somewhere in the standard that is violated -- do you
happen to know the exact chapter and verse?

And is it an "error" to use that type in any context, or just as an operand
of sizeof?
Variable-length arrays are a problem though:
6.5.3.4 The sizeof operator
... snip ...
[#2] The sizeof operator yields the size (in bytes) of its operand,
(...) If the type of the operand is a variable length array type,
the operand is evaluated; (...)

int foo(size_t x) { long a[x]; return sizeof(a); }

If you add "typedef" in front of "long a[x]", the implementation won't have
an excuse to abort the program due to lack of memory:
Now the value of foo(SIZE_MAX) is wrong - if such a large auto
array doesn't produce a run-time exception, anyway.

As far as I can tell, the standard doesn't explicitly forbid naming types
larger than SIZE_MAX bytes, or even applying sizeof to such types -- it just
describes the semantics of sizeof in a way that is logically impossible for
such types.
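The typedef variant suggested here can be written as follows. A VLA typedef reserves no storage, but the size expression is still evaluated and sizeof yields a run-time value. This sketch (hypothetical name `vla_type_bytes`) is only meant for small positive x. For counts whose byte size exceeds SIZE_MAX, the standard's description of sizeof cannot be satisfied, so you should not rely on any particular result.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof of a variably modified type: no object is created, so no
   storage can run out. Intended for small x > 0 only. */
size_t vla_type_bytes(size_t x)
{
    assert(x > 0);          /* a VLA size must be greater than zero */
    typedef long row[x];    /* size expression evaluated here */
    return sizeof(row);
}
```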
 
Hallvard B Furuseth

Wojtek said:
Hallvard B Furuseth said:
Wojtek said:
You'd think that; but what about sizeof(char[SIZE_MAX][2])?

What about it? Since sizeof shall return the size in bytes, that array
type is an error. At least gcc and sun CC think so, and it makes sense
to me.

What do you mean by "error"? A constraint violation? That would imply
that there's a constraint somewhere in the standard that is violated --
do you happen to know the exact chapter and verse?

Whoops, good point. I've been too long away from C standardese.
And is it an "error" to use that type in any context, or just as an
operand of sizeof?

Those compilers reject 'typedef long foo[SIZE_MAX];'.

[Rearranging the reply a bit]
As far as I can tell, the standard doesn't explicitly forbid naming
types larger than SIZE_MAX bytes, or even applying sizeof to such types
-- it just describes the semantics of sizeof in a way that is logically
impossible for such types.

Yup.

For fixed-size types, two natural fixes would be to make either the type
or the sizeof() a constraint violation. I think rejecting the type
makes sense. However:
Variable-length arrays are a problem though:
6.5.3.4 The sizeof operator
... snip ...
[#2] The sizeof operator yields the size (in bytes) of its operand,
(...) If the type of the operand is a variable length array type,
the operand is evaluated; (...)

int foo(size_t x) { long a[x]; return sizeof(a); }

If you add "typedef" in front of "long a[x]", the implementation won't
have an excuse to abort the program due to lack of memory:

Right again.
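The fix of rejecting the oversized type can be approximated in user code for fixed-size arrays. The sketch below assumes C11 `static_assert`. The `ARRAY_FITS` macro and the `ELEMS`/`table` names are hypothetical. The macro never forms the array type, so it is safe even for a count of SIZE_MAX with a 2-byte element, which mirrors `char[SIZE_MAX][2]`.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Nonzero if n elements of type T have a total size that fits in size_t.
   Only sizeof(T) is used, so the possibly oversized type T[n] is never
   named. */
#define ARRAY_FITS(n, T) ((n) <= SIZE_MAX / sizeof(T))

#define ELEMS 1000u

/* Compile-time rejection with a message of our own choosing. */
static_assert(ARRAY_FITS(ELEMS, long), "array too large for size_t");
long table[ELEMS];
```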
 
Keith Thompson

Hallvard B Furuseth said:
Wojtek said:
CBFalconer wrote in message
Keith Thompson wrote:
You claim that "sizeof returns the size of ANY object". I see
nothing in the standard that directly supports this claim. If you
can prove it from the standard, please do so. I'm not interested
in any response that doesn't include one or more specific citations
from the standard.

How about this. No exceptions are mentioned, thus it covers all.

You'd think that; but what about sizeof(char[SIZE_MAX][2])?

What about it? Since sizeof shall return the size in bytes, that array
type is an error. At least gcc and sun CC think so, and it makes sense
to me.
[...]

But what kind of error? It's not a syntax error, and I don't see any
constraint that it violates.
 
Wojtek Lerch

Hallvard B Furuseth said:
Wojtek Lerch writes:
but what about sizeof(char[SIZE_MAX][2])?
What about it? Since sizeof shall return the size in bytes, that array
type is an error. At least gcc and sun CC think so, and it makes sense
to me.

But what kind of error? It's not a syntax error, and I don't see any
constraint that it violates.

The environmental limits require the implementation to accept an
object of at least 32767 bytes in C89 and 65535 bytes in C99.

That's not quite accurate; implementations are required to accept at
least one *program* that declares an object of that size (among other
things).
SIZE_MAX (introduced in C99) is guaranteed to be at least 65535.
Therefore, no implementation is required to accept char [SIZE_MAX][2],
because it exceeds environmental limits.

But "char[SIZE_MAX][2]" is a type, not an object.
 
vippstar

Hallvard B Furuseth said:
Wojtek said:
CBFalconer wrote in message
Keith Thompson wrote:
You claim that "sizeof returns the size of ANY object".  I see
nothing in the standard that directly supports this claim.  If you
can prove it from the standard, please do so.  I'm not interested
in any response that doesn't include one or more specific citations
from the standard.
How about this.  No exceptions are mentioned, thus it covers all.
You'd think that; but what about sizeof(char[SIZE_MAX][2])?
What about it?  Since sizeof shall return the size in bytes, that array
type is an error.  At least gcc and sun CC think so, and it makes sense
to me.

[...]

But what kind of error?  It's not a syntax error, and I don't see any
constraint that it violates.

The environmental limits require the implementation to accept an
object of at least 32767 bytes in C89 and 65535 bytes in C99.
SIZE_MAX (introduced in C99) is guaranteed to be at least 65535.
Therefore, no implementation is required to accept char [SIZE_MAX][2],
because it exceeds environmental limits.
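The two C99 minimums stated here can be expressed as compile-time checks. The sketch below assumes C11 `static_assert`, and `big_object` is a hypothetical name. As pointed out elsewhere in the thread, the standard only requires that at least one program containing such an object be accepted, so this checks a typical case, not a guarantee.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>

/* C99 requires SIZE_MAX to be at least 65535. */
static_assert(SIZE_MAX >= 65535u, "SIZE_MAX below the C99 minimum");

/* An object at the C99 translation-limit size of 65535 bytes. */
unsigned char big_object[65535];
static_assert(sizeof big_object == 65535u, "unexpected object size");
```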
 
Hallvard B Furuseth

Wojtek said:
The environmental limits require the implementation to accept an
object of at least 32767 bytes in C89 and 65535 bytes in C99.

That's not quite accurate; implementations are required to accept at
least one *program* that declares an object of that size (among other
things).
SIZE_MAX (introduced in C99) is guaranteed to be at least 65535.
Therefore, no implementation is required to accept char [SIZE_MAX][2],
because it exceeds environmental limits.

But it's not required to reject it either. It's OK to accept it
provided it behaves in a logically impossible way, as Wojtek
pointed out elsewhere...
But "char[SIZE_MAX][2]" is a type, not an object.

Ow. Thus allowing (complete) types that cannot be instantiated?
Is that a bug or a feature in the standard?
 
Tim Rentsch

James Kuyper said:
Is that not in fact the case?

The PDP-10, introduced in 1968, used null-terminated strings (the
ASCIZ assembler directive meant "ASCII zero (terminated)").
Also, since the PDP-10 was architecturally almost identical to
the earlier PDP-6 (introduced in 1964), most likely the PDP-6
used these also.
 
Stephen Sprunk

Tony said:
But when would such a huge string be used? Imagine calling strlen() on a
HUGE string! Another abstraction is probably appropriate (a simple buffer of
characters without null termination?) for the entire number of characters in
a file.

If running strlen() is going to cause performance problems, keep track
of the length in a separate variable. Counted strings always have that
extra bookkeeping to do, and null-terminated strings allow you to ignore
it in the common case of short strings where it doesn't help.

S
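Here is a sketch of keeping the length in a separate variable across repeated appends. Tracking the end means that each append scans only the new piece, whereas a strcat loop rescans the whole buffer every time. `append_tracked` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Appends src to dst, whose current length is tracked in *len and whose
   total size is cap bytes. Returns 0 on success, or -1 if src does not
   fit (dst and *len are then unchanged). */
int append_tracked(char *dst, size_t cap, size_t *len, const char *src)
{
    assert(dst != NULL && len != NULL && src != NULL && *len < cap);
    size_t n = strlen(src);
    if (n >= cap - *len)             /* need n bytes plus the terminator */
        return -1;
    memcpy(dst + *len, src, n + 1);  /* copies the terminator too */
    *len += n;
    return 0;
}
```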
 
Keith Thompson

Hallvard B Furuseth said:
Wojtek Lerch writes:
CBFalconer wrote in message
Keith Thompson wrote:
You claim that "sizeof returns the size of ANY object".  I see
nothing in the standard that directly supports this claim.  If you
can prove it from the standard, please do so.  I'm not interested
in any response that doesn't include one or more specific citations
from the standard.
How about this.  No exceptions are mentioned, thus it covers all.
You'd think that; but what about sizeof(char[SIZE_MAX][2])?
What about it?  Since sizeof shall return the size in bytes, that array
type is an error.  At least gcc and sun CC think so, and it makes sense
to me.

[...]

But what kind of error?  It's not a syntax error, and I don't see any
constraint that it violates.

The environmental limits require the implementation to accept an
object of at least 32767 bytes in C89 and 65535 bytes in C99.
SIZE_MAX (introduced in C99) is guaranteed to be at least 65535.
Therefore, no implementation is required to accept char [SIZE_MAX][2],
because it exceeds environmental limits.

Even assuming the environmental limit on object size applies to a type
for which no object is declared, that still doesn't imply that an
implementation *must reject* ``sizeof(char[SIZE_MAX][2])''.

In fact, the implementations I've seen do reject it (i.e., issue a
diagnostic and fail to process the translation unit), and I believe
that's the only reasonable behavior. But I don't see how to justify
it based on the normative wording of the standard.
 
