Degenerate strcmp


Chris Dollin

Keith said:
No, it says that strlen takes a char* (actually a const char*).

It also says (at least it does in this C90 draft here) "... computes the
length of the string pointed to by s". The Hedgehog presumably counts
that as saying specifically that `strlen` takes a string (as opposed
to just any `char*` value).
 

Keith Thompson

Chris Dollin said:
It also says (at least it does in this C90 draft here) "... computes the
length of the string pointed to by s". The Hedgehog presumably counts
that as saying specifically that `strlen` takes a string (as opposed
to just any `char*` value).

No doubt. Unfortunately, the Hedgehog is mistaken (though I have no
doubt it was an innocent mistake). A pointer to a string is not a
string.
 

Chris Dollin

Keith said:
No doubt. Unfortunately, the Hedgehog is mistaken (though I have no
doubt it was an innocent mistake). A pointer to a string is not a
string.

My sloppiness. I forget that a string is the null-terminated char sequence itself,
and tend to use "string" to mean pointer-to-ditto.
 

Kenneth Brody

main() { printf("%d\n",strlen(malloc(0))); }
[...]
The C library functions are only required to behave correctly if you call
them correctly. If you call them with random data, all bets are off.
(In this particular case, it may well produce a segmentation fault, because
malloc(0) may return null.)

No I believe malloc(0) can never return null - after all, how could it
not be possible to allocate 0 bytes of memory!
[...]

What you believe is irrelevant. Quoting 7.20.3:

If the size of the space requested is zero, the behavior is
implementation defined: either a null pointer is returned, or
the behavior is as if the size were some nonzero value, except
that the returned pointer shall not be used to access an object.

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
 

Francine.Neary

Enumerating all of the reasons why this causes UB is left as an
exercise to the reader.

I'll have a go:
1) Uses variadic function with no prototype in scope.
2) Whether or not malloc(0) returns null, it is UB to try to
dereference it, and strlen is going to dereference it all right...
3) ...and there's no reason to expect it to point to a string, as
strlen requires.
4) Uses %d as a format specifier for a size_t... though as the
compiler assumes that strlen (used without a prototype) returns an
int, maybe these two bugs cancel each other out.
5) Similarly, if the compiler assumes malloc returns int, and doesn't
know anything about strlen's arguments as there's no prototype around,
the conversion void * -> int -> char * is, I believe, implementation
defined.
6) Execution falls off the end of a non-void function without
returning a value.
 

Flash Gordon

I'll have a go:
1) Uses variadic function with no prototype in scope.
2) Whether or not malloc(0) returns null, it is UB to try to
dereference it, and strlen is going to dereference it all right...
3) ...and there's no reason to expect it to point to a string, as
strlen requires.
4) Uses %d as a format specifier for a size_t... though as the
compiler assumes that strlen (used without a prototype) returns an
int, maybe these two bugs cancel each other out.

No, they definitely produce one instance of UB: calling a function
that does not return int without a prototype in scope.
5) Similarly, if the compiler assumes malloc returns int, and doesn't
know anything about strlen's arguments as there's no prototype around,
the conversion void * -> int -> char * is, I believe, implementation
defined.

No. Because there is no prototype in scope for malloc, and it does not
return an int, calling it invokes UB.

Passing an int to strlen without a prototype in scope invokes UB (with
one in scope, it requires a diagnostic).

No conversions were used.
6) Execution falls off the end of a non-void function without
returning a value.

That returns an undefined status; some people argue it does not invoke
undefined behaviour.

Please don't quote people's signatures (the bit typically after the "-- ")
unless you are actually commenting on them.
 

Ben Bacarisse

4) Uses %d as a format specifier for a size_t... though as the
compiler assumes that strlen (used without a prototype) returns an
int, maybe these two bugs cancel each other out.

I would say that the %d is not in error. The compiler will arrange
that the function called strlen will return an int, and an int will be
printed (it might be a trap representation, but that is because of the
*other* problem you identified).

I think it amusing that one of the few correct things about this
one-line program will have to change if the rest of it is corrected!
6) Execution falls off the end of a non-void function without
returning a value.

You get to choose here. Falling off the end of main is not a problem in
C99, but the implicit int in main's definition is; take your pick based
on language standard.
 

Keith Thompson

Flash Gordon said:
On Sat, 18 Aug 2007 11:19:49 +0200, Antoninus Twink wrote:
main() { printf("%d\n",strlen(malloc(0))); }
Enumerating all of the reasons why this causes UB is left as an
exercise to the reader.
[...]
6) Execution falls off the end of a non-void function without
returning a value.

That returns an undefined status, some people argue it does not invoke
undefined behaviour.

In C90, it returns an undefined status. IMHO that's not undefined
behavior; it only affects the behavior of the environment, which is
outside the scope of the standard.

In C99, since the function in question is main, a special rule says
that falling off the end is equivalent to 'return 0;'. (But then, in
C99 the 'main()' declaration is a constraint violation.)
 

Keith Thompson

Ben Bacarisse said:
I would say that the %d is not in error. The compiler will arrange
that the function called strlen will return an int and an int will be
printed (it might be a trap representation, but that is because of the
*other* problem you identified).

Not necessarily. strlen actually returns a size_t. Calling it as if
it returned an int (regardless of what's done with the result) invokes
undefined behavior. One of the infinitely many possible consequences
of this undefined behavior is that the compiler uses its knowledge of
the standard library and treats strlen as if it returned a size_t
(which, of course, it does). Passing this size_t to printf with a
"%d" format invokes UB again (and the compiler could pretend that the
format is really "%zu"). This would be overly helpful in my opinion
(if I make a mistake, I want the compiler to tell me about it, not to
fix it), but it's legal.

[...]
 

Keith Thompson

Keith Thompson said:
Not necessarily. strlen actually returns a size_t. Calling it as if
it returned an int (regardless of what's done with the result) invokes
undefined behavior. One of the infinitely many possible consequences
of this undefined behavior is that the compiler uses its knowledge of
the standard library and treats strlen as if it returned a size_t
(which, of course, it does). Passing this size_t to printf with a
"%d" format invokes UB again (and the compiler could pretend that the
format is really "%zu").
[...]

And the real point, I think, is that once you have a single instance of
undefined behavior, all bets are off. It makes some sense to go
through a piece of code and enumerate the instances of UB (since each
one is something that needs to be fixed), but don't expect to come up
with a definitive list.
 

Ben Bacarisse

Keith Thompson said:
Not necessarily. strlen actually returns a size_t. Calling it as if
it returned an int (regardless of what's done with the result) invokes
undefined behavior. One of the infinitely many possible consequences
of this undefined behavior is that the compiler uses its knowledge of
the standard library and treats strlen as if it returned a size_t
(which, of course, it does).

Yes. I should have said no more than that the %d *may* not be wrong.
Some implementations might ignore what they know of strlen, thus
rendering the format, oddly, OK. This is probably no more than
Francine Neary was saying in the first place -- I have not added
anything!

Your other point, elsewhere, that once there is one UB all bets are
off makes this sort of exercise rather odd.
 

William Hughes

Eric Sosman wrote, On 19/08/07 15:43:


[...]
No I believe malloc(0) can never return null -
Would reading section 7.20.3 paragraph 1 of the language
Standard alter your belief?
"[...] If the size of the space requested is zero,
the behavior is implementation-defined: either a
null pointer is returned, or the behavior is as if
the size were some nonzero value, except that the
returned pointer shall not be used to access an object."
after all, how could it
not be possible to allocate 0 bytes of memory!
Most likely, because it fails to allocate the internal
bookkeeping space it uses for keeping track of the addresses
it has returned that have not yet been free()d.
There's a potentially interesting quibble here, for people
interested in quibbles. The Standard requires (same paragraph)
that memory obtained from malloc() be "disjoint" from all other
objects, which is not quite the same as requiring that it have
an address different from that of all other objects. Since the
value of malloc(0) cannot be used to access an object, it could
be argued that the program cannot test disjointness without
engaging in undefined behavior anyhow.

The following does not invoke undefined behaviour, since you are always
allowed to test pointers for equality (at worst the pointer is one past
the end of the region, which is still OK; otherwise you could not free it)...

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    void *p1 = malloc(0);
    void *p2 = malloc(0);

    if (p1 == NULL || p2 == NULL)
        puts("At least one null pointer returned");
    else if (p1 == p2)
        puts("Regions are not disjoint");
    else
        puts("Regions are disjoint");

    free(p1);
    free(p2);

    return 0;
}

All this does is to check if p1 is the same as p2.
The question is

Does the fact that p1 == p2 mean that the memory area
pointed to by p1 is not disjoint from the memory area pointed
to by p2?

This clearly depends on the meaning assigned to disjoint. If we take

A memory area A is disjoint from a memory area B iff there does not
exist a byte that belongs to both A and B

then any two memory areas of zero bytes are disjoint; in particular,
a memory area of zero bytes is disjoint from itself.

So knowing that p1 is equal to p2 does not allow you to conclude
"Regions are not disjoint".

- William Hughes
 

Flash Gordon

Keith Thompson wrote, On 21/08/07 00:09:
Flash Gordon said:
On Sat, 18 Aug 2007 11:19:49 +0200, Antoninus Twink wrote:
main() { printf("%d\n",strlen(malloc(0))); }
Enumerating all of the reasons why this causes UB is left as an
exercise to the reader. [...]
6) Execution falls off the end of a non-void function without
returning a value.
That returns an undefined status, some people argue it does not invoke
undefined behaviour.

In C90, it returns an undefined status. IMHO that's not undefined
behavior; it only affects the behavior of the environment, which is
outside the scope of the standard.

By saying "some" I implied that others did not think that :)
In C99, since the function in question is main, a special rule says
that falling off the end is equivalent to 'return 0;'. (But then, in
C99 the 'main()' declaration is a constraint violation.)

Since it would not compile as C99, I did not bother with C99 rules.
 

Army1987

Flash Gordon said:
On Sat, 18 Aug 2007 11:19:49 +0200, Antoninus Twink wrote:
main() { printf("%d\n",strlen(malloc(0))); }
Enumerating all of the reasons why this causes UB is left as an
exercise to the reader. [...]
6) Execution falls off the end of a non-void function without
returning a value.

That returns an undefined status, some people argue it does not invoke
undefined behaviour.

In C90, it returns an undefined status. IMHO that's not undefined
behavior; it only affects the behavior of the environment, which is
outside the scope of the standard.
Not completely. For example, it specifies when the functions
registered with atexit() etc. are called, when files are closed
etc.
A return from main() is not *completely* equivalent to directly
calling exit(). First, main() returns, then exit() is called.
This can be seen by using functions which use pointers to auto
variables of main() and register them with atexit().
So I think that in this case [main() without a return] the
implementation tries to call exit() with an indeterminate
argument. I don't have a copy of the C90 standard, but I think it
could be UB.
 

mark_bluemel

... malloc(0) returns a pointer to
some random place in memory

Not on some of the systems I work on. It returns (in total conformance
to the standard) NULL.

This makes for some difficulties when porting code which makes the
same assumption as you...
 
