Null terminated strings: bad or good?

James Kuyper

Ian said:
The size of objects is calculated by the compiler (except for VLAs) and
is an unsigned integer type (6.5.3.4/4).

That's not what 6.5.3.4p4 says. It only says that sizeof returns the
size as an unsigned integer type. It doesn't say that it was calculated
using unsigned arithmetic, and it doesn't give permission to
mis-calculate the size in the way that could occur if unsigned
arithmetic were used.
 
James Kuyper

CBFalconer said:
It's a fine point, but the types are harmless. They don't create a
problem unless used in a declaration or sizeof.

Without the use of sizeof, what requirement specified by the standard
would an implementation be violating if it accepted such a declaration
without issuing a diagnostic? The only section of the standard you've
cited applies only to sizeof expressions, and says nothing to constrain
the behavior of an implementation when translating code that does not
apply the sizeof operator to such an object.
 
James Kuyper

CBFalconer said:
James Kuyper wrote:
... snip ...

size_t is specified to be able to specify the size of ANY object.

You keep repeating that as if it were true, have repeatedly been asked
to cite relevant text supporting that claim, and have repeatedly cited
text describing sizeof that provides no support for such a claim. I
can't make you acknowledge that your statement is wrong, but I certainly
won't accept it as a premise in an argument.

The only thing that the standard specifies about size_t is that it "is
the unsigned integer type of the result of the sizeof operator". Nowhere
does it say that every object must be small enough that sizeof could be
applied to it, even in code that never applies sizeof to that object,
even if that object is allocated and used in such a way that it's not
possible to apply sizeof to the entire object.
Also consider that the system has no known means of allocating that
'over SIZE_MAX' space in a continuous block.

Why not? If the system has a contiguous block of more than SIZE_MAX
bytes available for allocation, what aspect of the standard prevents
calloc() from returning a pointer to such a block?
 
James Kuyper

CBFalconer said:
Wojtek said:
CBFalconer said:
Wojtek Lerch wrote:

Where does the standard say that char[SIZE_MAX][SIZE_MAX] is
not a declarable type?
It says sizeof can return the size of a type. But it returns a
size_t, which has a maximum value of SIZE_MAX. This requires
that the declaration be an error, or at least unusable.
Unusable as an operand of sizeof, maybe. But it doesn't follow
that it must be unusable for other purposes.

No restriction on using it as a number. But size_t is intended to
measure the size of ANY object

No, it's only intended to measure the size of the object that it's
applied to. It's trivial to create code that never applies sizeof to an
object, and it's almost as trivial to create code where it's not
possible to apply sizeof to an entire object dynamically allocated by
calloc(), but only to sub-objects that are small enough for sizeof() to
yield the correct result.

I see nothing in the standard that prohibits an implementation from
accepting such code; I don't even see anything that mandates that an
implementation must issue a diagnostic when translating such code.
 
James Kuyper

CBFalconer said:
Keith Thompson wrote:
... snip ...

No. calloc should check that the multiplication does not overflow
before making any attempt to allocate memory. On overflow, return
NULL. Otherwise, call malloc. That's all that is required.

That's sufficient to meet requirements; but it is not required. It is
also permissible for calloc(SIZE_MAX, SIZE_MAX) to return a non-null
pointer to sufficient memory to hold the specified number of objects of
the specified size, even though sufficient memory to do so is much
greater than SIZE_MAX.
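
For reference, the "check, then call malloc" approach quoted above can be sketched roughly as follows. This is only an illustration of one conforming strategy, not a statement of what any real library does; the name checked_calloc is made up for this example, and SIZE_MAX from <stdint.h> assumes C99 or later.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name: checked_calloc is not a standard function. */
void *checked_calloc(size_t nmemb, size_t size)
{
    /* For nonzero size, nmemb * size wraps modulo SIZE_MAX + 1
       exactly when nmemb > SIZE_MAX / size. */
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;

    size_t total = nmemb * size;

    /* Request at least one byte so a zero-sized request does not hit
       the implementation-defined behaviour of malloc(0). */
    void *p = malloc(total != 0 ? total : 1);
    if (p != NULL)
        memset(p, 0, total);
    return p;
}
```

As noted above, returning NULL on overflow is sufficient but not mandated: an implementation that can genuinely supply nmemb objects of the given size is equally free to do so.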
 
James Kuyper

CBFalconer said:
No need to ignore zero bytes. Just install them in the output.

That would not constitute reading an entire file into a single string.
That constitutes reading it into multiple strings; the total number of
strings being equal to the number of null characters written to the
output array.
The system is terminated by receiving an EOF in place of a char.
You have to do something else about signalling length.
A string is not a type. A char array is a type.

I can't imagine why you felt a need to point that out, since I wrote
nothing that suggested I might be unaware of that fact. A 'string' is
not a data type, nor is it a char array. It is a data format that can be
stored in a char array. Multiple strings can be stored in a single char
array. As defined by the C standard, it is a data format that ends at
the first null character.
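
To make the "multiple strings in a single char array" point concrete, here is a small sketch. The helper names count_strings and nth_string are invented for this example. It treats a buffer with embedded null characters as a sequence of back-to-back strings, which is how the standard's definition makes it look to the str* functions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of strings stored back to back in
   buf[0..n), i.e. the number of null characters it contains. */
size_t count_strings(const char *buf, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}

/* Hypothetical helper: pointer to the k-th (0-based) terminated string
   in buf[0..n), or NULL if there are fewer than k + 1 of them. */
const char *nth_string(const char *buf, size_t n, size_t k)
{
    size_t start = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\0') {
            if (k == 0)
                return buf + start;
            k--;
            start = i + 1;
        }
    }
    return NULL;
}
```

Given the array "one\0two\0three", strlen() sees only the first string and reports 3, while count_strings() over the whole array reports 3 strings. The length of the whole block has to be carried separately, which is the "something else about signalling length" mentioned above.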
 
Richard

James Kuyper said:
You keep repeating that as if it were true, have repeatedly been asked
to cite relevant text supporting that claim, and have repeatedly cited
text describing sizeof that provides no support for such a claim. I
can't make you acknowledge that your statement is wrong, but I
certainly won't accept it as a premise in an argument.

The only thing that the standard specifies about size_t is that it "is
the unsigned integer type of the result of the sizeof
operator". Nowhere does it say that every object must be small enough
that sizeof could be applied to it, even in code that never applies
sizeof to that object, even if that object is allocated and used in
such a way that it's not possible to apply sizeof to the entire
object.

I thought malloc took a size_t ?

It would take a special kind of pedantic nutter to suggest that malloc
is unable to create memory for any type of contiguous object used in C.
 
Keith Thompson

Harald van Dijk said:
The standard allows this for string literals, but it would have little
benefit as functions cannot make use of it reliably. The standard
guarantees that strlen("Hello") is 5, but equally that
strlen("Hello" + 1) is 4. 4 cannot be stored in the bytes before "ello".

And consider this:

char str[] = "Hello";
str[3] = rand() % CHAR_MAX;

strlen(str) is now either 5 or 3, depending on whether the value you
stored happened to be 0.

Even more fun:

char str[] = "Hello"; /* perhaps this is at file scope */
/* ... */
char *ptr = some_arbitrary_pointer_value();
*ptr = some_arbitrary_char_value();

Did strlen(str) change? If so, to what? The compiler might be able
to prove that it didn't change in some cases, but in general you can't
tell without re-scanning the array.

Ok, suppose you recompute the stored strlen() every time the array
might have changed.

char str[] = "Hello";
str[5] = '.';

Now str doesn't even contain a string (which is perfectly legitimate
as long as you don't try to treat it as one). What value do you store
in the hidden strlen()? And how much time will your program spend
re-scanning arrays rather than doing actual work?
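
The stale-length problem can be shown with a small sketch. The struct and function names here are made up, and no real implementation is claimed to work this way. A length stored next to the array is only trustworthy if it is re-validated after any write that might alias the array, and re-validating means scanning.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: an array paired with a cached length. */
struct cached {
    char   text[6];
    size_t len;        /* supposed to equal strlen(text) */
};

/* Returns 1 if the cached length still matches the array contents,
   0 otherwise. Checking requires scanning the array, which is the
   cost the cache was meant to avoid. */
int cache_is_valid(const struct cached *c)
{
    const char *end = memchr(c->text, '\0', sizeof c->text);
    return end != NULL && (size_t)(end - c->text) == c->len;
}
```

Writing a null character through an alias at index 3 leaves len at 5 while strlen() now gives 3. Overwriting the terminator at index 5 leaves the array holding no string at all, so there is no correct value to cache.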
 
Tony

Phil Carmody said:
Was that supposed to be an agreement or disagreement with what
I said? Your ability to follow and continue a logical argument
seems to be sub-par.

You basically just regurgitated to me what I said to you. Oh well. It's not
important. Move on.

Tony
 
Ian Collins

James said:
That's not what 6.5.3.4p4 says. It only says that sizeof returns the
size as an unsigned integer type.

Isn't that what I said?
It doesn't say that it was calculated
using unsigned arithmetic, and it doesn't give permission to
mis-calculate the size in the way that could occur if unsigned
arithmetic were used.

Where did I say it did?
 
Ian Collins

Richard said:
I thought malloc took a size_t ?

It would take a special kind of pedantic nutter to suggest that malloc
is unable to create memory for any type of contiguous object used in C.

Where does James mention malloc? You deliberately snipped the
clarifying comment regarding calloc. c != m.
 
James Kuyper

Ian said:
Isn't that what I said?

Sorry - I look back now at what you said, and I realize that you didn't
actually say what I thought you said. I thought you were agreeing with
CBFalconer, but I now see that you did not actually express agreement.
But if you weren't agreeing with him, what precisely was your point?
 
Ian Collins

James said:
Sorry - I look back now at what you said, and I realize that you didn't
actually say what I thought you said. I thought you were agreeing with
CBFalconer, but I now see that you did not actually express agreement.
But if you weren't agreeing with him, what precisely was your point?

I was attempting to expand upon your point. The size of an object
(other than a VLA) is a compile-time unsigned integer constant and is
calculated by whatever means the compiler writer chooses.
 
David R Tribble

Keith said:
A calloc implementation that blindly multiplies its two arguments,
ignoring any wraparound, is buggy. I think I've seen such
implementations, but three systems I've just tried don't have this
problem. On all three systems, this program:

#include <stdio.h>
#include <stdlib.h>

#define MY_SIZE_MAX ((size_t)-1)
/* SIZE_MAX isn't always available */

int main(void)
{
    void *c = calloc(MY_SIZE_MAX, MY_SIZE_MAX);
    void *m = malloc(MY_SIZE_MAX * MY_SIZE_MAX);
    printf("calloc %s\n", c == NULL ? "failed" : "succeeded");
    printf("malloc %s\n", m == NULL ? "failed" : "succeeded");
    return 0;
}

produces this output:

calloc failed
malloc succeeded

You might want to add the following lines to your code to
see what you get:

printf("calloc(%lu, %lu)\n",
(unsigned long)MY_SIZE_MAX, unsigned long)MY_SIZE_MAX);
printf("malloc(%lu)\n",
(unsigned long)(MY_SIZE_MAX * MY_SIZE_MAX));

The extra output will explain why malloc() did not fail.

-drt
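
For anyone who does not want to run it, the arithmetic can be sketched directly. With an N-bit size_t, (size_t)-1 is 2^N - 1, and (2^N - 1)^2 = 2^(2N) - 2^(N+1) + 1, which is congruent to 1 modulo 2^N. The helper name below is invented, and the sketch assumes size_t is at least as wide as unsigned int (true on mainstream implementations), so that the multiplication is done in size_t rather than in a promoted int.

```c
#include <stddef.h>

/* Hypothetical helper: the byte count malloc actually receives when a
   caller writes malloc(a * b). Unsigned arithmetic wraps, so the
   result is the true product reduced modulo SIZE_MAX + 1. */
size_t wrapped_product(size_t a, size_t b)
{
    return a * b;
}
```

Under that assumption, wrapped_product((size_t)-1, (size_t)-1) is 1, so the malloc call in the program above asks for a single byte. calloc receives the two factors separately and can therefore notice that their true product is not representable.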
 
David R Tribble

James said:
That's sufficient to meet requirements; but it is not required. It is
also permissible for calloc(SIZE_MAX, SIZE_MAX) to return a non-null
pointer to sufficient memory to hold the specified number of objects of
the specified size, even though sufficient memory to do so is much
greater than SIZE_MAX.

In fact, it's probably conforming for calloc() to allocate extra
megabytes of space with every object it allocates, just as
long as it returns a pointer to a block containing *at least*
the number of bytes requested. If the program can't tell
the difference, then nothing in the standard is being violated.

It is also possible that the system call used by malloc() and
calloc() to allocate heap memory might actually take two
parameters (e.g., a memory block size and a block count),
in which case such a system might very well be able to
allocate blocks larger than the compiler's chosen value
of SIZE_MAX. There is nothing in the standard that disallows
this.

-drt
 
James Kuyper

David said:
You might want to add the following lines to your code to
see what you get:

printf("calloc(%lu, %lu)\n",
(unsigned long)MY_SIZE_MAX, unsigned long)MY_SIZE_MAX);
printf("malloc(%lu)\n",
(unsigned long)(MY_SIZE_MAX * MY_SIZE_MAX));

The extra output will explain why malloc() did not fail.

He knows why the malloc() did not fail. The point is that the calloc()
did fail, which implies that it was not implemented as simply performing
the above multiplication.
 
J. J. Farrell

David said:
You might want to add the following lines to your code to
see what you get:

printf("calloc(%lu, %lu)\n",
(unsigned long)MY_SIZE_MAX, unsigned long)MY_SIZE_MAX);
printf("malloc(%lu)\n",
(unsigned long)(MY_SIZE_MAX * MY_SIZE_MAX));

The extra output will explain why malloc() did not fail.

Typo aside, that hardly needs any explaining - it's the core point of
the very long discussion!
 
Keith Thompson

[...]

The original question, given in the subject header, was:

Null terminated strings: bad or good?

The answer, I think, is simpler than this lengthy thread might indicate:

Both.
 
David Thompson

#if META /* but sadly ontopic */

Although people are perfectly free to post under pseudonyms, a
pseudonym on Usenet seems to be a good indication - not infallible,
just a good indication - that the person using it is a troll.
Han from China (also known as "George Orwell" (and probably "George"
sans surname, too), "Borked Pseudowhatever", and "Anonymous") is an
obvious dishonorable non-exception, as is Kenny McCormack (probably
- it *could*, after all, be his real name). <snip>

'Han' appeared as Orwell, Borked, Anonymous, and Nomen Nescio.
I believe you would be mistaken to include Just-George, who now seems
to have become just-Frank. He(?) has a distinctly different style, and
though recent here has been consistent on c.l.fortran since long
before 'Han'; to my recollection he has never been inflammatory or
vituperative, and rarely even contentious. He does seem a bit loopy*,
and may be worth ignoring on that basis, but not for trollery. (* In
case this word doesn't have the same meaning in en_notUS, it's between
eccentric and mildly crazy; think Elwood P Dowd, the Jimmy Stewart
character in Harvey.) He reminds me a bit of 'chair' (Malbrain).

#endif
And now for something completely different: The Topic!

Trying to salvage this thread may be doomed, but:
Here are some (not necessarily mutually exclusive) ideas that have
been used by stretchy string libraries in the past:

1) "struct hack" - housekeeping data at the front, char array at the
back, allowing stretchy-ready strings with a single malloc;

and perhaps pointing to the char part, which makes 'normal' data
accesses more convenient, with negative offsets to the header
2) keeping a max size as well as a current size, allowing the
library to resize when appropriate;

At least using normal C mallocation, realloc may (need to) move, so
that only works if the callers let you modify their pointers, or their
accesses indirect through e.g. a handle table that you can change.

Or just storing the max size without resizing, which still allows
overflow prevention.
3) using start and end pointers rather than a current size;
4) putting sentinels at the beginning and end of the allocated
array, so that buffer overflow - which would indicate a bug in the
library - can (or at least may) be detectable.
IME overrunning the beginning is much rarer than the end.
(And thus less likely worth the cost of checking for.)
Any more?
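
Ideas 1 and 2 can be sketched together as below. This is an illustration under stated assumptions, not any particular library's layout: the names sstr, sstr_new and sstr_append are made up, and a C99 flexible array member stands in for the classic struct hack. Because realloc may move the block, the append function takes the address of the caller's pointer. The variant that hands out a pointer to the char part is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: header and characters share one allocation. */
struct sstr {
    size_t len;     /* characters in use, excluding the terminator */
    size_t cap;     /* characters available, excluding the terminator */
    char   data[];  /* C99 flexible array member */
};

/* Creates a string holding a copy of init with room for at least cap
   characters. Returns NULL on failure. */
struct sstr *sstr_new(const char *init, size_t cap)
{
    size_t len = strlen(init);
    if (cap < len)
        cap = len;
    if (cap > SIZE_MAX - sizeof(struct sstr) - 1)
        return NULL;                        /* total size would wrap */
    struct sstr *s = malloc(sizeof *s + cap + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    s->cap = cap;
    memcpy(s->data, init, len + 1);         /* copies the terminator too */
    return s;
}

/* Appends tail, growing the block if needed. tail must not point into
   *sp. Returns 0 on success, -1 on failure with *sp left unchanged. */
int sstr_append(struct sstr **sp, const char *tail)
{
    assert(sp != NULL && *sp != NULL && tail != NULL);
    struct sstr *s = *sp;
    size_t add = strlen(tail);

    if (add > s->cap - s->len) {            /* not enough room: grow */
        size_t limit = SIZE_MAX - sizeof *s - 1;
        if (add > limit - s->len)
            return -1;                      /* new size would wrap */
        size_t newcap = s->len + add;
        struct sstr *t = realloc(s, sizeof *s + newcap + 1);
        if (t == NULL)
            return -1;
        t->cap = newcap;
        *sp = s = t;                        /* block may have moved */
    }
    memcpy(s->data + s->len, tail, add + 1);
    s->len += add;
    return 0;
}
```

A caller would write something like struct sstr *s = sstr_new("Hello", 0); sstr_append(&s, ", world"); and release the whole thing with a single free(s). Keeping data null terminated means s->data can still be passed to the ordinary str* functions.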

For a somewhat unusual application that involved much parsing of
stringy (text) pieces out of a few blobs of data (often only one large
one), I have seen used {ptr,len} all pointing into (and sharing) the
same space, which is deallocated en masse on returning to the top
level. The routines for this were a separate (and separable) module
within the project, but I wouldn't expect it to be widely reusable,
even if it hadn't been proprietary.
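
A rough sketch of the {ptr,len} idea, with invented names (span, span_next_field, span_equals) and no claim to resemble the proprietary module described above: each piece is a non-owning view into the shared blob, nothing is copied or null terminated, and freeing the blob releases everything at once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a non-owning view of len bytes starting at ptr. */
struct span {
    const char *ptr;
    size_t      len;
};

/* Splits the first field (up to sep, or to the end) off *rest and
   returns it. *rest is advanced past the field and its separator. */
struct span span_next_field(struct span *rest, char sep)
{
    struct span field = { rest->ptr, rest->len };
    const char *hit = rest->len != 0
                    ? memchr(rest->ptr, sep, rest->len)
                    : NULL;
    if (hit != NULL) {
        field.len = (size_t)(hit - rest->ptr);
        rest->ptr = hit + 1;
        rest->len -= field.len + 1;
    } else {
        rest->ptr += rest->len;             /* consumed everything */
        rest->len = 0;
    }
    return field;
}

/* Returns nonzero if the span holds exactly the characters of text. */
int span_equals(struct span s, const char *text)
{
    size_t n = strlen(text);
    return s.len == n && memcmp(s.ptr, text, n) == 0;
}
```

Parsing "alpha,beta,,gamma" with ',' as the separator yields four spans, one of them empty, all pointing into the original buffer. The trade-off is that a span cannot be handed to strlen() or printf("%s") directly; printf("%.*s", (int)s.len, s.ptr) works for lengths that fit in an int.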
 
