Standards & Library functions

Richard G. Riley

In another thread it was pointed out that I'd made a booboo with
strcpy: one that I've, if I'm honest, made many times
before. Not out of badness, just because since I first programmed C
back in 1986 (and have done so for about 25% of the time since then)
I never really looked at the manpage for strcpy.
This, combined with K&R's famous pointer lessons, which led to 2 or 3
versions of a linear start-to-finish strcpy implementation, meant that on
some occasions I used strcpy to move blocks of memory which may
overlap in a character buffer. Bad. Sloppy. This combined with swapping
between languages probably made me a little careless.

Anyway, my question is this:

Why would any language committee decide to make strcpy work in an
undefined manner for an overlapping object? It seems to me to be as
valid to do the same for memmove.

In the platform/compiler implementation for strcpy you can do
something very quick/CPU specific from start to finish (which doesn't
mind about overlap) or not as you please. If there is overlap and the
instructions being used would corrupt the operation then, and only
then, branch off to a more robust copy using a call to memmove or
something similar. The use of memmove invariably results in a possibly
unnecessary strlen so why not just wrap it all in the strcpy function?

Any comments appreciated
 
Robert Gamble

Richard said:
In another thread it was pointed out that I'd made a booboo with
strcpy: one that I've, if I'm honest, made many times
before. Not out of badness, just because since I first programmed C
back in 1986 (and have done so for about 25% of the time since then)
I never really looked at the manpage for strcpy.
This, combined with K&R's famous pointer lessons, which led to 2 or 3
versions of a linear start-to-finish strcpy implementation, meant that on
some occasions I used strcpy to move blocks of memory which may
overlap in a character buffer. Bad. Sloppy. This combined with swapping
between languages probably made me a little careless.

Anyway, my question is this:

Why would any language committee decide to make strcpy work in an
undefined manner for an overlapping object? It seems to me to be as
valid to do the same for memmove.

Because the vast majority of the time you are not copying overlapping
objects and it can be a lot more efficient to assume the objects don't
overlap. If the objects might overlap you can always use memmove.

In the platform/compiler implementation for strcpy you can do
something very quick/CPU specific from start to finish (which doesn't
mind about overlap) or not as you please.

But you can usually perform the operation quicker if you don't have to
worry about the possibility of overlap.

If there is overlap and the
instructions being used would corrupt the operation then, and only
then, branch off to a more robust copy using a call to memmove or
something similar.

Are you suggesting that strcpy try to determine if the objects overlap
and behave accordingly? Why do you think the strcpy function should be
making this decision over the programmer and how exactly would strcpy
determine if the objects do overlap?

The use of memmove invariably results in a possibly
unnecessary strlen so why not just wrap it all in the strcpy function?

Why would memmove call strlen? Memmove does not operate on strings, it
operates on a specified number of bytes. The difference between
memmove and memcpy (which has the same overlapping restriction as
strcpy) is that the former operates as if it had first copied the
source into a new object avoiding the issue of overlap.

Robert Gamble
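
To make the strlen point from this exchange concrete, here is a minimal sketch of moving a string within its own buffer using memmove. The helper name shift_string_right is hypothetical, and the caller is assumed to have left enough room in the buffer. It shows that memmove works on an explicit byte count, so copying a string of unknown length requires a strlen first.

```c
#include <string.h>

/* Hypothetical helper (not a library function): slide the string held in
   buf to the right by k bytes. The caller must guarantee that buf has room
   for strlen(buf) + 1 + k bytes. */
void shift_string_right(char *buf, size_t k)
{
    size_t n = strlen(buf) + 1;   /* memmove needs an explicit byte count */
    memmove(buf + k, buf, n);     /* overlapping source/destination is fine */
}
```

With a buffer of 16 bytes holding "abcdef", calling shift_string_right(buf, 2) leaves the first two bytes untouched and places a full copy of the string starting at buf + 2.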
 
Jordan Abel

In another thread it was pointed out that I'd made a booboo with
strcpy: one that I've, if I'm honest, made many times before.
Not out of badness, just because since I first programmed C back in
1986 (and have done so for about 25% of the time since then) I
never really looked at the manpage for strcpy. This, combined with
K&R's famous pointer lessons, which led to 2 or 3 versions of a linear
start-to-finish strcpy implementation, meant that on some occasions I
used strcpy to move blocks of memory which may overlap in a character
buffer. Bad. Sloppy. This combined with swapping between languages
probably made me a little careless.

Anyway, my question is this:

Why would any language committee decide to make strcpy work in an
undefined manner for an overlapping object? It seems to me to be as
valid to do the same for memmove.

It's hard (possibly impossible) to implement a single-pass string
copying implementation that will behave well when moving a block from an
earlier position to a later position in the buffer, and it's a rare
enough need that it is senseless to add the extra overhead of
[effectively] strlen+memmove to the function.

And since you seem to want it to act like a "linear start to finish"
implementation, that would mean saying that it's undefined if the
destination overlaps the source to the right, and not undefined if it
overlaps it to the left - and that would be ugly and would basically
mandate a particular implementation.
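
As an illustration of the "overlaps to the left" case, here is a sketch of a byte-by-byte forward copy in the style of the K&R examples. The name naive_copy is made up for this example, and it is not the library strcpy, whose behavior on overlap remains undefined. With this particular loop, each byte is read before the write position reaches it whenever the destination starts before the source.

```c
/* Hypothetical forward copy, one byte at a time, start to finish.
   This is NOT the library strcpy; it only shows why a left-to-right loop
   happens to tolerate a destination that begins before the source. */
char *naive_copy(char *dst, const char *src)
{
    char *ret = dst;
    while ((*dst++ = *src++) != '\0')
        ;   /* empty body: the copy happens in the loop condition */
    return ret;
}
```

For example, with char buf[] = "xxhello", the call naive_copy(buf, buf + 2) leaves "hello" at the start of buf. The reverse arrangement, naive_copy(buf + 2, buf), is the problem case and should not be tried with this loop.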
 
Jordan Abel

Are you suggesting that strcpy try to determine if the objects overlap
and behave accordingly? Why do you think the strcpy function should be
making this decision over the programmer and how exactly would strcpy
determine if the objects do overlap?

I think he wants to be able to allow the destination to overlap the
source to the left, while still leaving it undefined if it overlaps to
the right. Since that's what some particular implementation he's used in
the past does.
 
Mark McIntyre

Why would any language committee decide to make strcpy work in an
undefined manner for an overlapping object?

Because it's designed for copying strings, and generally speaking, you
want to copy from one string to another, not from a string to itself?

It seems to me to be as
valid to do the same for memmove.

Well, memmove is for moving arbitrary chunks of memory, not for
copying strings. I don't consider the two very similar.

Mark McIntyre
 
Eric Sosman

Richard said:
[...]
Why would any language committee decide to make strcpy work in an
undefined manner for an overlapping object? It seems to me to be as
valid to do the same for memmove. [...]

Speed, or at least the possibility of speed. The fewer
corner cases a library function must worry about, the more
freedom the implementation has to use assorted tricks to
make it go faster. Most library functions are not required
to handle overlapping sources and destinations.

memmove() is an oddity in that it is well-defined even if
source overlaps destination. Note that there is also a memcpy()
function that "does the same thing," but whose behavior is *not*
defined in the case of overlap -- an implementor may be able to
provide a "faster" memcpy() and a "safer" memmove(). One might
imagine a strmove() function that bears the same relation to
strcpy() as memmove() does to memcpy(), but there seems to be
little demand for it. Perhaps if you can find enough like-minded
compatriots you could lobby the C0x committee to include such a
thing in the next Standard.

Why this worship of speed? I'm among those who regularly
discourage over-aggressive optimization of code: if you spend
an extra hour researching, developing, and testing a trick that
saves one microsecond per execution, you need to execute the
code 3.6 billion times just to break even. Most code doesn't
execute that many times, so why am I suddenly doing an about-
face and defending aggressively optimized strcpy()?

Because library functions really do have enormous execution
counts. I wouldn't worry about optimizing abort() or setvbuf(),
but strcpy() and sqrt() and getc() and printf() and ... These
functions are used heavily, and by many many programs, so it
makes sense to optimize them aggressively. (The cynic in me says
that the functions used by standard benchmark suites are especially
likely candidates for "creative" optimization -- a colleague at a
PPOE told of a compiler that replaced printf("Hello, world!\n")
with puts("Hello, world!") so as to avoid interpreting a format
string!) At any rate, the Standard committee felt that strcpy()
was one of those functions where aggressive optimization ought
to be allowed, so they granted it a license to ignore certain
corner cases they thought relatively uncommon.
 
Jordan Abel

Richard said:
[...]
Why would any language committee decide to make strcpy work in an
undefined manner for an overlapping object? It seems to me to be as
valid to do the same for memmove. [...]
Why this worship of speed? I'm among those who regularly
discourage over-aggressive optimization of code: if you spend an extra
hour researching, developing, and testing a trick that saves one
microsecond per execution, you need to execute the code 3.6 billion
times just to break even. Most code doesn't execute that many times,
so why am I suddenly doing an about-face and defending aggressively
optimized strcpy()?

Because library functions really do have enormous execution
counts. I wouldn't worry about optimizing abort() or setvbuf(), but
strcpy() and sqrt() and getc() and printf() and ... These functions
are used heavily, and by many many programs, so it makes sense to
optimize them aggressively. (The cynic in me says that the functions
used by standard benchmark suites are especially likely candidates for
"creative" optimization -- a colleague at a PPOE told of a compiler
that replaced printf("Hello, world!\n") with puts("Hello, world!") so
as to avoid interpreting a format string!)

That might sound like benchmark cheating, but given how commonly printf
is used for a static string ending in a newline in the real world, it's
not really. (and, on the other hand, how often is Hello, World used as a
benchmark?) (note: gcc does this, and it does it consistently for both
"any\n" and "%s\n".)

On some systems, perror() is optimized to the point of requiring extra
handling in freopen() to be able to properly deal with reopening stderr.

At any rate, the Standard committee felt that strcpy() was one of
those functions where aggressive optimization ought to be allowed, so
they granted it a license to ignore certain corner cases they thought
relatively uncommon.

Plus, the trivial implementation, with no optimization at all, DOES
cause undefined behavior if the destination overlaps the source on the
right.
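
The failure mode can be sketched safely with an instrumented forward copy that stops after a fixed number of bytes. The function bounded_forward_copy and its limit parameter are hypothetical additions for illustration. When the destination begins inside the source string, the loop overwrites the terminator before it ever reads it, so an unbounded version would run off the end of the buffer.

```c
#include <stddef.h>

/* Hypothetical instrumented forward copy: copies at most `limit` bytes.
   Returns the number of bytes copied including the '\0', or 0 if the limit
   ran out before a terminator was seen. Both dst[0..limit) and
   src[0..limit) must lie inside valid storage. */
size_t bounded_forward_copy(char *dst, const char *src, size_t limit)
{
    for (size_t i = 0; i < limit; i++) {
        dst[i] = src[i];
        if (dst[i] == '\0')
            return i + 1;   /* terminator copied: done */
    }
    return 0;               /* never reached a terminator */
}
```

With char buf[32] = "abc", calling bounded_forward_copy(buf + 2, buf, 30) returns 0: the 'b' written to buf[3] replaces the original terminator, and every later read picks up a byte the loop itself wrote earlier.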
 
Richard G. Riley

Because the vast majority of the time you are not copying overlapping
objects and it can be a lot more efficient to assume the objects
don't

I think you miss the point. If you don't copy backwards then there is
no overlap issue if the "from" is after the "start": no checks
required. This can be documented rather than a rather offhand "no
overlap at all". Remember that the overhead IS there for memmove: you need a
strlen() call as I mentioned.

overlap. If the objects might overlap you can always use memmove.

But you can usually perform the operation quicker if you don't have to
worry about the possibility of overlap.

The overlap would only cause an issue in the previously mentioned
case, wouldn't it? Again, there is an overhead in memmove too: the
strlen required.

Are you suggesting that strcpy try to determine if the objects overlap
and behave accordingly? Why do you think the strcpy function should be
making this decision over the programmer and how exactly would strcpy
determine if the objects do overlap?

I don't want it to do anything: I just don't want it to be "undefined"
in the case discussed. Fine, it's not; I just would have thought it
wasn't such a big thing.

Why would memmove call strlen? Memmove does not operate on strings, it
operates on a specified number of bytes. The difference between
memmove and memcpy (which has the same overlapping restriction as
strcpy) is that the former operates as if it had first copied the
source into a new object avoiding the issue of overlap.

Yes. I know. And that's why I am asking. You need a strlen because to
use memmove you need to know the length of the area being moved.

And anyway, a 2 pointer comparison is hardly a huge overhead is it?
 
Richard G. Riley

don't

I think you miss the point. If you don't copy backwards then there is
no overlap issue if the "from" is after the "start": no checks
required. This can be documented rather than a rather offhand "no
overlap at all". Remember that the overhead IS there for memmove: you need a
strlen() call as I mentioned.



I've reconsidered all this. While I know what I would have done, I can
also see why they did what they did : so no more arguments/discussion
from me. And also apologies for the even more than usual typo content
on the last post
 
Robert Gamble

Richard said:
don't

I think you miss the point. If you don't copy backwards then there is
no overlap issue if the "from" is after the "start": no checks
required. This can be documented rather than a rather offhand "no
overlap at all". Remember that the overhead IS there for memmove: you need a
strlen() call as I mentioned.

So now you also want the Standard to dictate how to implement strcpy?

The overlap would only cause an issue in the previously mentioned
case, wouldn't it? Again, there is an overhead in memmove too: the
strlen required.


I don't want it to do anything: I just don't want it to be "undefined"
in the case discussed. Fine, it's not; I just would have thought it
wasn't such a big thing.


Yes. I know. And that's why I am asking. You need a strlen because to
use memmove you need to know the length of the area being moved.

Yes, if you are using memmove to copy strings and don't know the length
of the source string you will need to use strlen; I didn't get your
point the first time around.

And anyway, a 2 pointer comparison is hardly a huge overhead is it?

Do you mean to determine if the objects overlap? If the objects don't
overlap then the pointer comparison is undefined. If you do determine
through some other method that the strings don't overlap it still
doesn't mean that the objects which contain the strings don't overlap
which, from my reading of the Standard, would result in undefined
behavior in the current definition of strcpy.

Robert Gamble
 
Eric Sosman

Richard G. Riley wrote On 03/20/06 12:21,:
[... to detect source/destination overlap in strcpy() ...]
And anyway, a 2 pointer comparison is hardly a huge overhead is it?

Note that when using < and other relational operators
on pointers, the two pointers must point to elements of
the same array (or one past the end). Comparisons between
"random" pointers are not well-defined. (It's true that
the Standard library functions need not be written in C
and need not fret about portability issues. Still, it's
a bit much to *require* strcpy() and friends to engage
in non-Standard behavior, don't you think? Especially
since there may be platforms where different "banks" of
memory really do have incommensurate addresses ...)
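
For reference, here is one way an overlap test is commonly sketched outside strictly portable C: convert both addresses to integers and compare those instead of applying < to unrelated pointers. The function name ranges_overlap is hypothetical. The sketch assumes a flat address space in which the conversion to uintptr_t preserves ordering, and that the implementation provides uintptr_t at all; the Standard guarantees neither, which is the portability concern raised above.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: report whether [a, a+an) and [b, b+bn) share any byte.
   ASSUMPTION: flat address space where pointer-to-integer conversion keeps
   addresses in order. The result of that conversion is implementation-
   defined, so this is not strictly portable C. */
int ranges_overlap(const void *a, size_t an, const void *b, size_t bn)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    return pa < pb + bn && pb < pa + an;
}
```

On a typical flat-memory platform, two 8-byte ranges starting at buf and buf + 4 are reported as overlapping, while two adjacent 4-byte ranges are not.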
 
Richard G. Riley

So now you also want the Standard to dictate how to implement
strcpy?

All standards dictate how something is implemented : in terms of the
end result that is. I never suggested how it should be
implemented : just defined. I suggested that strcpy could indeed be defined for
source>dest that was all because of the common move, increment repeat
instructions found for doing such a copy. Anyway, I have reconsidered
all this and can see why they just said "sod it, leave it to the
programmer".

I think my main argument came from being gobsmacked at my own mistake
which was born out of too much "familiarity" :)

cheers,
 
Richard G. Riley

Richard G. Riley wrote On 03/20/06 12:21,:
[... to detect source/destination overlap in strcpy() ...]
And anyway, a 2 pointer comparison is hardly a huge overhead is it?

Note that when using < and other relational operators
on pointers, the two pointers must point to elements of
the same array (or one past the end). Comparisons between
"random" pointers are not well-defined. (It's true that
the Standard library functions need not be written in C
and need not fret about portability issues. Still, it's
a bit much to *require* strcpy() and friends to engage
in non-Standard behavior, don't you think? Especially

No one asked them to :) I was merely questioning the "standard"
itself - but it's all done & dusted now. Thanks for taking the time to
reply.

Now, could you define "random" pointers?

Sounds fishy to me: how else would home brew memory managers work in
a defined manner if you can't compare pointers pointing to differently
malloced memory blocks?

since there may be platforms where different "banks" of
memory really do have incommensurate addresses ...)

The old fallback :-;
 
Keith Thompson

Eric Sosman said:
Richard G. Riley wrote On 03/20/06 12:21,:
[... to detect source/destination overlap in strcpy() ...]
And anyway, a 2 pointer comparison is hardly a huge overhead is it?

Note that when using < and other relational operators
on pointers, the two pointers must point to elements of
the same array (or one past the end). Comparisons between
"random" pointers are not well-defined. (It's true that
the Standard library functions need not be written in C
and need not fret about portability issues. Still, it's
a bit much to *require* strcpy() and friends to engage
in non-Standard behavior, don't you think? Especially
since there may be platforms where different "banks" of
memory really do have incommensurate addresses ...)

Requiring strcpy() to do implementation-specific pointer manipulation
to detect overlaps wouldn't be a tremendous burden. After all, the
standard already requires it for memmove(). If different memory
"banks" have incommensurate addresses, it's not much of a problem; if
the two addresses are in different banks, the objects don't overlap.

It wouldn't be a tremendous burden, but it would be a significant one.

Making strcpy() work for all cases of overlap would impose run-time
overhead on all strcpy() calls, the vast majority of which have
operands that don't overlap. Making it work for overlap in one
direction only (so the usual left-to-right byte-by-byte copy would
work) would make the description more complex. It might also
interfere with some optimizations that copy a word at a time.

If this functionality were to be added to the standard, the best way
to do it would be to leave strcpy() as it is and add a new strmove()
function, analogous to memmove(). But I don't believe it would be
worth doing.
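
For anyone who does need this, a strmove()-style function can be sketched in a couple of lines on top of existing library calls. The name my_strmove is hypothetical, and no such function exists in the C Standard. It is the strlen + memmove combination discussed earlier in the thread, with the extra pass over the source that entails.

```c
#include <string.h>

/* Hypothetical overlap-tolerant string copy, not part of the C Standard.
   Costs one extra pass over src (the strlen) compared with strcpy. */
char *my_strmove(char *dst, const char *src)
{
    return memmove(dst, src, strlen(src) + 1);   /* +1 copies the '\0' */
}
```

Unlike strcpy, this is defined for overlap in either direction, provided the destination has room: with char buf[16] = "abc", the call my_strmove(buf + 2, buf) leaves "ababc" in buf.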
 
