David said:
You're missing his point. Once you've been steeped in C lore, of course
you know what strcat() does. Any good C programmer does, once they've
learned the language. But learning the language, in this case, consists
of unlearning your intuition about what it means to concatenate strings.
The natural intuition about what strcat() should do would be that it
concatenates strings ("str" "cat", get it?). And there is a natural
definition of what it means to concatenate strings. Unfortunately,
C's strcat() does not use that natural definition; it does not match
the natural intuition you might have before you were exposed to C.
That's a problem, because it means that learning C requires unlearning
your intuition. Unlearning an old intuition and learning a new one
is twice as hard as learning something that you never had any prior
intuition about. Strcat() is just one example of this kind of phenomenon;
it occurs in many funny places in the C language. The folks here are
so steeped in C that they've probably forgotten what it was like to
originally learn C and be surprised by some of its oddball semantics.
Those oddball semantics were justified in the days of the PDP-11, when CPU
performance was more important than making the programmer's life easier.
Today, those design choices are debatable.
What do you think that strcat should do?
For someone who has used only other languages and never C, the
unintuitive thing about strcat is not that strcat(p,p) doesn't
work, but that strcat("hello ", "world!") doesn't work!
A newbie would intuitively expect strcat(x,y) to modify neither x nor y
and to return a new string...
Obviously, these semantics will not change in future C standards (though
higher-level string functions might be added).
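To make the newbie's expectation concrete, here is a minimal sketch of such a higher-level function. The name concat_new is hypothetical (it is not in any C standard or proposal): it leaves both arguments untouched and returns a freshly malloc'ed string that the caller must free(). For brevity it assumes the combined length does not overflow size_t.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical "newbie-style" concatenation: neither argument is modified,
   and the result is a freshly allocated string that the caller must free().
   Returns NULL if the allocation fails. */
char *concat_new(const char *x, const char *y)
{
    size_t lx = strlen(x);
    size_t ly = strlen(y);
    char *r = malloc(lx + ly + 1);   /* +1 for the terminating '\0' */

    if (r == NULL)
        return NULL;
    memcpy(r, x, lx);                /* x without its terminator */
    memcpy(r + lx, y, ly + 1);       /* y including its terminator */
    return r;
}
```

Because nothing is written through either argument, concat_new(p, p) is harmless too; the price is one allocation per call, which is exactly the kind of cost the original C library design avoided.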
Now, more seriously.
Consider a C programmer who has just learnt that strcat doesn't do what
he expected (i.e. he now understands that the dest string must be
large enough to hold the new characters, etc.).
To learn that, he read the manual... And I do think that any
decent manual should warn the reader that dest and src must not be
aliased.
I just read the man page (in French) and that's mentioned. Good.
This programmer should have no real problem.
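If such a programmer really wants to append a string to itself, he has to avoid strcat(p, p). One possible way, sketched below, copies only the characters and writes the terminator separately, so the source and destination ranges are disjoint and memcpy is allowed. append_self is a hypothetical helper, and the buffer is assumed to hold at least 2 * strlen(p) + 1 bytes.

```c
#include <string.h>

/* Hypothetical helper: append the string p to itself, in place.
   Assumes the buffer holding p has room for 2 * strlen(p) + 1 bytes.
   strcat(p, p) would be undefined behavior here because the source and
   destination overlap. */
char *append_self(char *p)
{
    size_t len = strlen(p);

    /* p[0..len-1] and p[len..2*len-1] do not overlap, so memcpy is fine
       as long as the terminator is not part of the copy. */
    memcpy(p + len, p, len);
    p[2 * len] = '\0';
    return p;
}
```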
If you read a C99 manual, it would be even more obvious (thanks to
the restrict keyword).
Moreover, C99 programmers are quite accustomed to the "restrict" keyword,
and know that passing aliased pointers to a function may fail unless
its manual says it's allowed.
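For reference, C99 declares strcat as char *strcat(char * restrict s1, const char * restrict s2);. The sketch below is a naive implementation with that same prototype; my_strcat is a hypothetical name, and this is not how any particular library necessarily does it. It shows why the restrict promise matters: with overlapping arguments, this particular loop overwrites the very terminator it is looking for.

```c
#include <string.h>

/* A naive strcat with the same restrict-qualified prototype that C99 gives
   strcat. The restrict qualifiers are a promise from the caller that the two
   strings do not overlap; my_strcat(p, p) breaks that promise. In this loop,
   with src == dst, the first byte copied overwrites dst's terminator, so the
   copy never reaches a '\0' and runs past the end of the buffer. */
char *my_strcat(char *restrict dst, const char *restrict src)
{
    char *d = dst + strlen(dst);   /* find the end of dst */

    while ((*d++ = *src++) != '\0')
        ;                          /* copy src, including its terminator */
    return dst;
}
```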
But I fear that there are many Bad Manuals that don't mention it... (I
would like confirmation, since I'm not sure.)
Now, a second question:
For an intermediate C programmer who knows C well enough to understand the
design principles of the C library... Does the expression strcat(p,p)
seem dubious enough to him that he checks the manual before he
uses it?
The answer is not obvious... I think that many programmers would... But
perhaps some would not... In the latter case, it's really
harmful.
If that latter case is frequent enough, it might be worth revisiting
the specification of strcat.
Now, I think that there is a far more unintuitive semantic in the C
standard library, one that may produce bugs even in the code of advanced
programmers... I think it's unintuitive even for an experienced
programmer who thinks he knows the language well.
That's the semantics of memcpy and memmove when their third argument is 0.
The standard still requires the pointer arguments to be valid (in
particular, non-null) in that case, implying that:
memcpy(dest, NULL, 0);
has undefined behavior (though it doesn't crash on the implementations I
know).
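The case is easy to hit in real code. Below is a minimal sketch, assuming a hypothetical struct bytes in which an empty vector is represented by a null data pointer and a zero length (as before its first allocation). An unconditional memcpy(out, v->data, v->len) would be undefined behavior for the empty vector, so the call has to be guarded.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte vector: an empty vector is represented with
   data == NULL and len == 0 (e.g. before the first allocation). */
struct bytes {
    unsigned char *data;
    size_t len;
};

/* Copy the contents of v into out (which must have room for v->len bytes).
   Calling memcpy(out, v->data, v->len) unconditionally would pass a null
   pointer when v is empty, which is undefined behavior even though the
   length is 0, so the call is guarded. */
void bytes_copy_out(unsigned char *out, const struct bytes *v)
{
    if (v->len > 0)
        memcpy(out, v->data, v->len);
}
```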
1) I've read the man page of my Linux distribution, and that point is NOT
specified in it!
I fear that most documentation doesn't mention the issue at all.
2) If the standard specified that this call is valid and does nothing,
implementations would incur either zero overhead on ordinary
platforms, or a minor overhead on more exotic platforms, namely,
a conditional test:
if (n > 0) {
    /* do the stuff */
}
Instead of:
/* do the stuff */
This overhead is almost negligible on platforms where memcpy and memmove
are not inlined (and memcpy and memmove are not often inlined, at least
with the implementations I know).
It might be non-negligible on platforms where memcpy and memmove are
inlined, though it's not huge at all: programmers are accustomed to the
fact that calling memcpy or memmove on very small memory blocks carries
a fixed, non-negligible overhead, and they already write code whose
performance doesn't suffer much from it.
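Until the standard says so, the proposed semantics can be emulated with wrappers that perform exactly the conditional test described above. memcpy_z and memmove_z are hypothetical names; this is a sketch, not a proposed API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: memcpy with the proposed semantics, where n == 0
   is a no-op even if dest or src is a null pointer. */
void *memcpy_z(void *restrict dest, const void *restrict src, size_t n)
{
    if (n > 0)
        memcpy(dest, src, n);
    return dest;
}

/* Same idea for memmove: overlapping ranges are still allowed, and a
   zero length never touches the pointers. */
void *memmove_z(void *dest, const void *src, size_t n)
{
    if (n > 0)
        memmove(dest, src, n);
    return dest;
}
```

On a platform where memcpy(dest, NULL, 0) already does nothing, the extra test is the only cost, which is the overhead being discussed.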
My point is that specifying that memcpy(dest, NULL, 0) is OK would have
an almost negligible cost on all existing platforms, and zero overhead
on many common platforms.
Now, allowing aliasing in strcat is probably far less beneficial, since
strcat(p,p) is far less counter-intuitive for intermediate and advanced
programmers, and very uncommon (whereas code such as memcpy(dest,
ptr_that_might_be_null, size_that_might_be_zero) is not uncommon).
One might think that it's hard to write an efficient implementation of
strcat accepting pointer aliasing.
But there seems to be a portable, efficient implementation:
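#include <string.h>

/* strcat that also accepts src pointing into dst's string, e.g. safestrcat(p, p). */
char *safestrcat(char *dst, const char *src)
{
    if (*src) {                         /* nothing to do for an empty src */
        char *dend = dst + strlen(dst); /* dst's terminating '\0' */
        /* Copy src minus its first char one byte past the old end. When src
           points into dst's string, the source ends at or before dend (still
           '\0') and the destination starts at dend + 1: no overlap. */
        strcpy(dend + 1, src + 1);
        *dend = *src;                   /* finally overwrite the old '\0' */
    }
    return dst;
}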
Now tell me this cannot be translated into an optimized solution on any platform.
It seems good to me. There seems to be no aliasing problem, and there
seems to be no overhead.
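The "no aliasing problem" claim can at least be checked mechanically for the case the function targets: src pointing somewhere into dst's own string (including src == dst). The hypothetical harness below (check_all_offsets is an invented name) tries every dst and src offset within "abcdef" and compares the result against a reference computed on separate, non-aliased copies. It does not cover a src that starts beyond dst's terminator, inside the area the concatenation writes to; there the strcpy ranges can overlap, so that case remains undefined.

```c
#include <string.h>

/* safestrcat as given above, repeated so that this check compiles on its own. */
char *safestrcat(char *dst, const char *src)
{
    if (*src) {
        char *dend = dst + strlen(dst);
        strcpy(dend + 1, src + 1);
        *dend = *src;
    }
    return dst;
}

/* Try every dst offset i and src offset j inside "abcdef" (offset 6 is the
   terminator, i.e. an empty string) and compare safestrcat against a
   reference built from non-aliased copies. Returns the number of
   mismatching cases; 0 means all 49 combinations agree. */
int check_all_offsets(void)
{
    static const char init[] = "abcdef";
    const size_t n = sizeof init - 1;   /* 6 */
    int failures = 0;

    for (size_t i = 0; i <= n; i++) {
        for (size_t j = 0; j <= n; j++) {
            char buf[32];
            char expected[32];

            strcpy(buf, init);
            /* Reference: dst string followed by src string, no aliasing. */
            strcpy(expected, init + i);
            strcat(expected, init + j);

            safestrcat(buf + i, buf + j);
            if (strcmp(buf + i, expected) != 0)
                failures++;
        }
    }
    return failures;
}
```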
In that case, it might be worth considering either adding a safestrcat
function to the standard, in the spirit of memcpy vs memmove, or updating
the specification of strcat (probably the simplest solution) and
optionally providing a no_alias_strcat.