Degenerate strcmp

C

Chris Dollin

Antoninus said:
Part of the confusion seems to be the names: for example, strlen takes a
char * and returns an int. If the parameter is a string, then the
integer is the length of the string and that makes perfect sense. But
what strlen actually takes is a general char *, not necessarily a
string, and if you pass strlen a char * that isn't a string then you
need to think more carefully about how to interpret the return value of
strlen (or strlen might not terminate at all).

Doesn't the Standard says specifically that `strlen` takes a string?
If so, passing a char* that /isn't/ a string is undefined behaviour.
You don't have to think carefully about how to interpret the
return value; you have to ensure it's passed a string.
 
R

Richard

One way I've seen strcmp(char *s1, char *s2) implemented is: return
immediately if s1==s2 (equality of pointers); otherwise do the usual
thing of searching through the memory at s1 and s2.

Of course the reason for doing this is to save time in case equal
pointers are passed to strcmp. But it seems to me that this could create
an inconsistency in the degenerate case when s1 points to memory that is
not null-terminated, i.e. by some freak chance, all of the memory from
s1 till the computer reaches the end of all its memory pages (however
that works) don't contain a single null byte. In this case, strcmp
should not say that s1 and s2 are "equal strings" since neither is
actually a string (because not null terminated).

Is my thinking correct?

No.

If you take that view then all the string functions are wrong because
they too could have "non terminated strings".

In addition, the chance of s1==s2 is probably very, very low in a real
program and the check will in fact impede performance over the run time.
 
C

CBFalconer

Eric said:
CBFalconer said:
user923005 wrote:
... snip ...

That is a bar to relying on any particular behaviour in such a
circumstance, not a bar to defining the result. Thus the following
should be a satisfactory system implementation of strcmp():

int strcmp(const char *_s1, const char *_s2) {
if (_s1 == _s2) return 0;
else
while (*_s1 && (*_s1 == *_s2)) {
_s1++; _s2++;
}
return (*_s1 > *_s2) - (*_s1 < *_s2);
} /* strcmp */

A digression from the main point (with which I agree):
There's a potential bug in the `return' statement on systems
where `char' is signed. 7.2.1.4p1:

"The sign of a nonzero value returned by the
comparison functions [...] is determined by
the sign of the difference between the values of
the first pair of characters (both interpreted
as unsigned char) that differ [...]"

That is, "abc\200" must compare greater than "abc\177",
even when the final `char' value in the first string is
negative.

Suggested fix:

return ((unsigned char) *_s1 > (unsigned char) *_s2)
- ((unsigned char) *_s1 < (unsigned char) *_s2);

That is an insect I will probably create again in the future, and
probably have done in the past. Bah. This is the first time I
recall it being pointed out, though. Thanks.
 
C

CBFalconer

Antoninus said:
.... snip ...

illustrates this more simply than strcmp: malloc(0) returns a
pointer to some random place in memory, and there's no absolute
guarantee that a 0-byte will occur later in memory, so what gets
printed could be a random number or in theory the program could
just never terminate.

No it doesn't, it can return a NULL. Check the standard,
carefully. This is probably in place to cover some unusual
implementations.
 
K

Keith Thompson

Chris Dollin said:
Doesn't the Standard says specifically that `strlen` takes a string?
If so, passing a char* that /isn't/ a string is undefined behaviour.
You don't have to think carefully about how to interpret the
return value; you have to ensure it's passed a string.

No, it says that strlen takes a char* (actually a const char*).

A char* is a pointer; it logically *cannot* be a string. A char*
value may or may not *point to* a string. (Strictly speaking it may
or may not point to the first character of a string, but the standard
specifically defines a "pointer to a string" as a pointer to its first
character.)

And yes, passing to strlen() a char* value that doesn't point to a
string invokes undefined behavior. One special case of this is that
strlen(NULL) invokes UB.

Anyone who's still confused should read sections 4, 6, and 8 of the
comp.lang.c FAQ, <http://www.c-faq.com/> (and probably the rest of it
too).
 
R

Richard Tobin

Your right that a string has to be null terminated, but for random
memory maybe by chance there just isn't any null byte.

In which case the program has a bug, because it's not allowed to call
strcmp on "random memory" that doesn't have a nul byte. Programs
with bugs of this kind are not required to behave in any particular way,
so it's quite legal for strcmp to return any value it likes.
so returning
For example, for the program

main() { printf("%d\n",strlen(malloc(0))); }

this will print out a random number, but in principal the strlen call
might never terminate, if the memory at the pointer returned by malloc
doesn't have any null bytes.

The C library functions are only required to behave correctly if you call
them correctly. If you call them with random data, all bets are off.
(In this particular case, it may well produce a segmentation fault, because
malloc(0) may return null.)

As the post you were replying to said:

-- Richard
 
R

Richard Tobin

user923005 said:
How often are you really going to do this:
if (strcmp(p,p)==0) call_captain_obvious();

Never, but I often call strcmp(p,q) where p and q might be the same
string, depending on user input.

-- Richard
 
R

Richard Tobin

In addition, the chance of s1==s2 is probably very, very low in a real
program

Depends on the program. I have programs where if strcmp(p,q) == 0
it's almost always because p == q. it's an interpreter for another
language, and equal strings that come from the same source document
will generally be merged.

-- Richard
 
R

Richard

Depends on the program. I have programs where if strcmp(p,q) == 0
it's almost always because p == q. it's an interpreter for another
language, and equal strings that come from the same source document
will generally be merged.

For sure there will be exceptions. But how many programs are language
interpreters doing word merges?

I would think that generally most string comparisons done in parsing
type situations will be comparing dynamically allocated input strings
with token string representations. They will never be the same physical
location.
 
P

pete

Richard said:
Never, but I often call strcmp(p,q) where p and q might be the same
string, depending on user input.

I don't think situations like that are common enough
to warrant strcmp checking for pointer equality.
I would prefer to have you write

if (p != q)

yourself, before calling strcmp,
for cases where it was shown to be significantly faster.
 
P

pete

Antoninus Twink wrote:
Part of the confusion seems to be the names:
for example, strlen takes a char * and returns an int.

The return type of strlen is size_t, not int.
 
F

fishpond

In which case the program has a bug, because it's not allowed to call
strcmp on "random memory" that doesn't have a nul byte. Programs
with bugs of this kind are not required to behave in any particular way,
so it's quite legal for strcmp to return any value it likes.
so returning


The C library functions are only required to behave correctly if you call
them correctly. If you call them with random data, all bets are off.
(In this particular case, it may well produce a segmentation fault, because
malloc(0) may return null.)

No I believe malloc(0) can never return null - after all, how could it
not be possible to allocate 0 bytes of memory!
 
F

Flash Gordon

On 18 Aug 2007 at 21:59, Richard Tobin wrote:


No I believe malloc(0) can never return null - after all, how could it
not be possible to allocate 0 bytes of memory!

You believe wrong for at least three reasons.
1) All possible pointer values might have been used, and if it is
non-null it has to be unique.
2) There may be house-keeping structures required for which there is no
space.
3) The standard allows it.


Please don't quote peoples signatures, the bit typically after the "--
", unless you are commenting on them. In fact, you should trim anything
not relevant to your reply as I have done.
 
C

Chris Dollin

No I believe malloc(0) can never return null

Nevertheless, it is permitted to do so. Since it is permitted,
an implemention could start with

if (requestedSize == 0) return 0;

Hence `malloc(0)` could return null. This could be a good idea
if the degenerate general case of `allocate N bytes` with N=0
took up more room than was justfied.
- after all, how could it
not be possible to allocate 0 bytes of memory!

If there was no room for the /bookkeeping/ necessary to record
the allocation.
 
P

pete

No I believe malloc(0) can never return null

You believe wrongly.

N869
7.20.3 Memory management functions
[#1]

If the size of the space requested is zero,
the behavior is implementation-defined:
either a null pointer is returned,
or the behavior is as if the size were some nonzero value,
except that the returned pointer shall
not be used to access an object.
- after all, how could it
not be possible to allocate 0 bytes of memory!

That's not what a null pointer return value means for malloc(0).
 
R

Richard Tobin

No I believe malloc(0) can never return null - after all, how could it
not be possible to allocate 0 bytes of memory!

The specification of library functions is a question best determined by
looking at the standard, not by arguing "it must be like this".

-- Richard
 
C

CBFalconer

Richard Tobin wrote:
.... snip ...


No I believe malloc(0) can never return null - after all, how
could it not be possible to allocate 0 bytes of memory!

Easy. Read the C standard.
 
E

Eric Sosman

[...]
No I believe malloc(0) can never return null -

Would reading section 7.20.3 paragraph 1 of the language
Standard alter your belief?

"[...] If the size of the space requested is zero,
the behavior is implementation-defined: either a
null pointer is returned, or the behavior is as if
the size were some nonzero value, except that the
returned pointer shall not be used to access an object."
> after all, how could it
> not be possible to allocate 0 bytes of memory!

Most likely, because it fails to allocate the internal
bookkeeping space it uses for keeping track of the addresses
it has returned that have not yet been free()d.

There's a potentially interesting quibble here, for people
interested in quibbles. The Standard requires (same paragraph)
that memory obtained from malloc() be "disjoint" from all other
objects, which is not quite the same as requiring that it have
an address different from that of all other objects. Since the
value of malloc(0) cannot be used to access an object, it could
be argued that the program cannot test disjointness without
engaging in undefined behavior anyhow. This would seem to open
the door to a malloc(0) that returned the same non-null value
on every call (a value free() would ignore), avoiding the need
for bookkeeping and making it possible to call malloc(0) an
unlimited number of times without fear of failure.

But even if it did so, the proposed use
> main() { printf("%d\n",strlen(malloc(0))); }

.... would be in trouble anyhow, because the strlen() call tries
to use the returned pointer to access an object -- an access
the Standard forbids.
 
F

Flash Gordon

Eric Sosman wrote, On 19/08/07 15:43:
[...]
No I believe malloc(0) can never return null -

Would reading section 7.20.3 paragraph 1 of the language
Standard alter your belief?

"[...] If the size of the space requested is zero,
the behavior is implementation-defined: either a
null pointer is returned, or the behavior is as if
the size were some nonzero value, except that the
returned pointer shall not be used to access an object."
after all, how could it
not be possible to allocate 0 bytes of memory!

Most likely, because it fails to allocate the internal
bookkeeping space it uses for keeping track of the addresses
it has returned that have not yet been free()d.

There's a potentially interesting quibble here, for people
interested in quibbles. The Standard requires (same paragraph)
that memory obtained from malloc() be "disjoint" from all other
objects, which is not quite the same as requiring that it have
an address different from that of all other objects. Since the
value of malloc(0) cannot be used to access an object, it could
be argued that the program cannot test disjointness without
engaging in undefined behavior anyhow.

The following does not invoke undefined behaviour since you are always
allowed to test for equality (the first byte is 1 beyond the end which
is still OK or you could not free it)...

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
void *p1 = malloc(0);
void *p2 = malloc(0);

if (p1==NULL || p2==NULL)
puts("At least one null pointer returned");
else if (p1==p2)
puts("Regions are not disjoint");
else
puts("Regions are disjoint");

free(p1);
free(p2);

return 0;
}
> This would seem to open
the door to a malloc(0) that returned the same non-null value
on every call (a value free() would ignore), avoiding the need
for bookkeeping and making it possible to call malloc(0) an
unlimited number of times without fear of failure.

No, I don't think so. See above.
But even if it did so, the proposed use


... would be in trouble anyhow, because the strlen() call tries
to use the returned pointer to access an object -- an access
the Standard forbids.

Agreed. That is not allowed whatever malloc(0) returns.
 
M

Mark Bluemel

No I believe malloc(0) can never return null - after all, how could it
not be possible to allocate 0 bytes of memory!

You've never used IBM's AIX C runtime obviously...
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top