strcpy overlapping memory

JohnF

Any problem with code of the form
unsigned char s[999] = "0123456789";
strcpy(s+2,s+4);
which, in this case, would result in "01456789" ?

Someone reported a problem with one of my programs
that he resolved by replacing this kind of strcpy
with an equivalent memmove after valgrind reported
the overlapping memory. But when I asked him what
input he used, so I could reproduce the problem,
he couldn't remember, and then he couldn't reproduce
the problem himself.
 
JohnF

christian.bau said:
JohnF said:
Any problem with code of the form
unsigned char s[999] = "0123456789";
strcpy(s+2,s+4);
which, in this case, would result in "01456789" ?

Calling strcpy for overlapping memory is undefined behaviour.
As an example, an implementation could search for the trailing zero
in the source string, then start copying the source string from the
end. While doing that, it could load 16 bytes at a time into registers
if the string is long and then write them out.

Thanks, Christian (ditto Azazel). Good point(s), and good example (maybe
a bit contrived, but, as you point out, perfectly possible). Easy enough
to fix, so I suppose I'll do that. I'd just have preferred to reliably
reproduce the problem first, so I could see exactly what's actually
happening. Thanks again, guys,
 
Michael Angelo Ravera

Any problem with code of the form
  unsigned char s[999] = "0123456789";
  strcpy(s+2,s+4);
which, in this case, would result in "01456789" ?

Someone reported a problem with one of my programs
that he resolved by replacing this kind of strcpy
with an equivalent memmove after valgrind reported
the overlapping memory. But when I asked him what
input he used, so I could reproduce the problem,
he couldn't remember, and then he couldn't reproduce
the problem himself.

In most implementations, strcpy (s+2, s+4) will work, but strcpy (s+4,
s+2) will cause trouble. It is completely valid for the
implementation, as others have said, to make the second form work
while the first one fails. The behavior is undefined because the
standards people wanted to give implementors the license to optimise
assuming that you wouldn't use either form.
 
lawrence.jones

Michael Angelo Ravera said:
In most implementations, strcpy (s+2, s+4) will work, but strcpy (s+4,
s+2) will cause trouble. It is completely valid for the
implementation, as others have said, to make the second form work
while the first one fails.

There are also implementations where one or both work under some
circumstances, but not others (e.g., when the strings are "too long"),
which can cause really insidious problems. There are also
implementations that copy in larger chunks than bytes (e.g., words)
which don't work in either case.
 
Seebs

Any problem with code of the form
unsigned char s[999] = "0123456789";
strcpy(s+2,s+4);
which, in this case, would result in "01456789" ?

It might. It also might not.

Someone reported a problem with one of my programs
that he resolved by replacing this kind of strcpy
with an equivalent memmove after valgrind reported
the overlapping memory. But when I asked him what
input he used, so I could reproduce the problem,
he couldn't remember, and then he couldn't reproduce
the problem himself.

Well, he's right. The behavior is undefined, and there are real systems
where that code, or similar code, might explode.

If you want to do overlapping moves, use memmove(). strcpy() is only
legit if the arguments do not overlap.
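
A minimal sketch of the memmove() replacement for the "shift left" idiom, assuming the string is NUL-terminated and the range to remove lies inside it. The helper name squeeze_out is made up for illustration; it is not from the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: remove `count` characters starting at index `pos`
   from the NUL-terminated string `s`, shifting the tail left in place.
   memmove() is specified to handle overlapping source and destination. */
static void squeeze_out(char *s, size_t pos, size_t count)
{
    size_t len = strlen(s);

    /* The range to remove must lie inside the string. */
    assert(pos <= len && count <= len - pos);

    /* Move the tail, including its terminating '\0' (hence the + 1). */
    memmove(s + pos, s + pos + count, len - pos - count + 1);
}
```

With s holding "0123456789", squeeze_out(s, 2, 2) does what strcpy(s+2, s+4) was meant to do and leaves "01456789".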

-s
 
JohnF

There are also implementations where one or both work under some
circumstances, but not others (e.g., when the strings are "too long"),

Thanks, everyone, for the additional remarks, and I can see
I should give up my habit of "shifting left" (squeezing out
substrings of) strings using strcpy.
But the above example's giving me a little problem. For strcpy
to "work under some circumstances", the compiler has to add
extra instructions to check the input strings and choose how to
proceed accordingly. The parenthetical example above needs to
determine strlen before proceeding. That's a lot of overhead.
I'd imagine so much overhead that any optimization gained by using,
say, one method for short strings and another for long ones would be
more than wiped out.
 
Eric Sosman

Thanks, everyone, for the additional remarks, and I can see
I should give up my habit of "shifting left" (squeezing out
substrings of) strings using strcpy.
But the above example's giving me a little problem. For strcpy
to "work under some circumstances", the compiler has to add
extra instructions to check the input strings and choose how to
proceed accordingly. The parenthetical example above needs to
determine strlen before proceeding. That's a lot of overhead.
I'd imagine so much overhead that any optimization gained by using,
say, one method for short strings and another for long ones would be
more than wiped out.

Quite often, some unusual optimizations "fall out" along the way.
For example, lots of implementations of string functions try to work
in bigger-than-char units if possible: If you can copy eight chars
per loop iteration instead of one, for example, you may gain enough
speed to make up for a more complicated loop. It's quite possible
that a strcpy() might move the first few characters in an obvious
way until the advancing pointers reach nice boundaries, and then
switch to a trickier method to move four or eight or sixteen at a
whack. Exactly what happens when source and destination overlap in
such a case could be very difficult to predict.

Anecdote: I was once doing speed tests on some sorting functions,
and wanted to try them with "fast" and "slow" comparators. My slow
comparator called strcmp(long_string, long_string) to waste some time,
and during initialization I'd adjust the length of long_string to make
the slow comparator take ~10 times as long as the fast one. All was
well -- until I tried my program on a previously-unmeasured O/S and it
ran out of memory during initialization. Poking around a bit, I found
that it was trying to grow long_string beyond the total size of memory,
because strcmp(long_string, long_string) was too fast ...

Yes, friends, the strcmp() implementation noticed that a string was
being compared to itself, and returned zero in constant time without
actually looking at the string's characters. The test sounds like a
time-waster (how often would a sane programmer call strcmp() with two
identical pointers?), but upon investigation I found that it pretty
much fell out of other tests that were being done to decide how many
characters strcmp() could gulp at a time. I switched my time-waster
to strcmp(long_string, long_string+1) and all was well.
 
Seebs

Thanks, everyone, for the additional remarks, and I can see
I should give up my habit of "shifting left" (squeezing out
substrings of) strings using strcpy.
But the above example's giving me a little problem. For strcpy
to "work under some circumstances", the compiler has to add
extra instructions to check the input strings and choose how to
proceed accordingly. The parenthetical example above needs to
determine strlen before proceeding. That's a lot of overhead.
I'd imagine so much overhead that any optimization gained by using,
say, one method for short strings and another for long ones would be
more than wiped out.

Ahh, but you can often tell which one you want in advance.

-s
 
J. J. Farrell

JohnF said:
Thanks, everyone, for the additional remarks, and I can see
I should give up my habit of "shifting left" (squeezing out
substrings of) strings using strcpy.
But the above example's giving me a little problem. For strcpy
to "work under some circumstances", the compiler has to add
extra instructions to check the input strings and choose how to
proceed accordingly. The parenthetical example above needs to
determine strlen before proceeding. That's a lot of overhead.
I'd imagine so much overhead that any optimization gained by using,
say, one method for short strings and another for long ones would be
more than wiped out.

I recommend basing optimization decisions on analysis and measurement
rather than imagination. You can't imagine how surprising the results
can be sometimes. The geniuses who spend months optimising these
algorithms in the libraries for each processor model aren't always
wasting their time. I couldn't believe the amount of code in the first
string copy implementation I saw, and was a little taken aback when such
obviously over-engineered over-complicated garbage was so much faster
than every alternative I could come up with.
 
lawrence.jones

JohnF said:
But the above example's giving me a little problem. For strcpy
to "work under some circumstances", the compiler has to add
extra instructions to check the input strings and choose how to
proceed accordingly.

Consider a machine with a move characters instruction with a maximum
length of 256 characters. A strcpy implementation might well use that
instruction, but have to add a loop around it in case the string is
longer than that. If the instruction handles overlapping moves
correctly but the loop always works the same way, then short strings
always work right, but longer strings may or may not work depending on
the direction and offset of the overlap.
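
That scenario can be simulated in portable C. This is only a sketch under stated assumptions: memmove() on each block stands in for the hypothetical overlap-safe machine instruction, the chunk parameter stands in for its 256-character limit, and chunked_copy is a made-up name, not any real library's strcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simulation only: each block is moved with memmove(), which tolerates
   overlap inside that one block. The loop always walks forward, so a
   block may overwrite source bytes that a later block still has to read. */
static char *chunked_copy(char *dst, const char *src, size_t chunk)
{
    size_t n = strlen(src) + 1;   /* bytes to move, including the '\0' */

    assert(chunk > 0);
    for (size_t off = 0; off < n; off += chunk) {
        size_t k = (n - off < chunk) ? n - off : chunk;
        memmove(dst + off, src + off, k);
    }
    return dst;
}
```

With a chunk size of 4 and a zero-filled buffer, copying "abc" two bytes to the right gives the expected "ababc" because the whole string fits in one block. Doing the same with "abcdefgh" does not give "ababcdefgh": the first block overwrites source bytes that the later blocks still need.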
 
Michael Foukarakis

Any problem with code of the form
  unsigned char s[999] = "0123456789";
  strcpy(s+2,s+4);
which, in this case, would result in "01456789" ?

Someone reported a problem with one of my programs
that he resolved by replacing this kind of strcpy
with an equivalent memmove after valgrind reported
the overlapping memory. But when I asked him what
input he used, so I could reproduce the problem,
he couldn't remember, and then he couldn't reproduce
the problem himself.

If your strcpy() copies backwards, you will most certainly corrupt
your buffer. A recent glibc change exposed several such bugs on
software like Flash Player on Linux; see [1].

[1] https://bugzilla.redhat.com/show_bug.cgi?id=638477
 
JohnF

Consider a machine with a move characters instruction with a maximum
length of 256 characters. A strcpy implementation might well use that
instruction, but have to add a loop around it in case the string is
longer than that. If the instruction handles overlapping moves
correctly but the loop always works the same way, then short strings
always work right, but longer strings may or may not work depending on
the direction and offset of the overlap.

Thanks again Larry, Eric, Seebs, JJ, etc.
And, like you guys said, I >>am<< surprised such runtime optimizations
work, i.e., that, on average, they save more time than they cost.
It also somewhat tarnishes my picture of C the way it's often
described as a "portable assembly language". In that picture, I'd
kind of hope that strcpy would just assemble to some straightforward
move instruction, along with whatever '\000' end-of-string check
is available in the particular instruction set. If they want
to add optimizations, they could at least reserve them for -O3,
or something like that.
And one other thing continues to surprise me: if they're adding
all this code to check args, I'd have thought they'd at least check
for NULLs. I've had more than a few programs segfault due to a
blunder during development passing strcpy a NULL arg. With all this
strcpy arg checking going on, couldn't they check for NULL?
 
Keith Thompson

JohnF said:
And one other thing continues to surprise me: if they're adding
all this code to check args, I'd have thought they'd at least check
for NULLs. I've had more than a few programs segfault due to a
blunder during development passing strcpy a NULL arg. With all this
strcpy arg checking going on, couldn't they check for NULL?

Check for NULL and do what?

If strcpy(s, NULL) quietly does nothing, the implementation isn't
doing you any favors. A seg fault or equivalent is the best thing
it can do to help you track down the error in your program.
 
BartC

Keith Thompson said:
Check for NULL and do what?

Pretend it's ""?

If strcpy(s, NULL) quietly does nothing, the implementation isn't
doing you any favors. A seg fault or equivalent is the best thing
it can do to help you track down the error in your program.

(And what would the application then do?)

If these functions were well-behaved with NULL arguments, it would save the
caller having to do the checks, or to do the checks and replace them with
pointers to empty strings.

NULL is not necessarily an error; it would be handy to sometimes deal with
NULL as though it was an empty string.
 
Keith Thompson

BartC said:
Pretend it's ""?

Why?

(And what would the application then do?)

Presumably it would crash -- and the developer would then fix the bug
so it doesn't crash next time.

If these functions were well-behaved with NULL arguments, it would save the
caller having to do the checks, or to do the checks and replace them with
pointers to empty strings.

NULL is not necessarily an error; it would be handy to sometimes deal with
NULL as though it was an empty string.

Or a string of length SIZE_MAX, maybe?

Passing a null pointer to strcpy() *is* necessarily an error. strlen()
could have been defined to return the length of the string pointed to by its
argument, or 0 if the argument is a null pointer. But it wasn't
defined that way. If you want such a function, feel free to write it.
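
Such wrappers take only a few lines. A sketch, with made-up names (strlen_or_zero and strcpy_null_as_empty are not standard functions), that treats a null source as an empty string while still treating a null destination as a caller bug:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: length of s, or 0 if s is a null pointer. */
static size_t strlen_or_zero(const char *s)
{
    return s != NULL ? strlen(s) : 0;
}

/* Hypothetical wrapper: copy src into dst, substituting "" for a null src.
   A null dst is still an error; there is nowhere sensible to copy to. */
static char *strcpy_null_as_empty(char *dst, const char *src)
{
    assert(dst != NULL);
    return strcpy(dst, src != NULL ? src : "");
}
```

Keeping these separate from strlen() and strcpy() means callers opt in to the "null means empty" convention explicitly, and the standard functions keep their stricter contract.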

NULL isn't a pointer to an empty string. It's a pointer value that
doesn't point to anything, and that's a valuable distinction that
shouldn't be ignored by the standard library.
 
J. J. Farrell

Keith said:
BartC said:
Keith Thompson said:
[...]
And one other thing continues to surprise me: if they're adding
all this code to check args, I'd have thought they'd at least check
for NULLs. I've had more than a few programs segfault due to a
blunder during development passing strcpy a NULL arg. With all this
strcpy arg checking going on, couldn't they check for NULL?
Check for NULL and do what?
Pretend it's ""?
Why?
If strcpy(s, NULL) quietly does nothing, the implementation isn't
doing you any favors. A seg fault or equivalent is the best thing
it can do to help you track down the error in your program.
(And what would the application then do?)

Presumably it would crash -- and the developer would then fix the bug
so it doesn't crash next time.
If these functions were well-behaved with NULL arguments, it would save the
caller having to do the checks, or to do the checks and replace them with
pointers to empty strings.

NULL is not necessarily an error; it would be handy to sometimes deal with
NULL as though it was an empty string.

Or a string of length SIZE_MAX, maybe?

Passing a null pointer to strcpy() *is* necessarily an error. strlen()
could have been defined to return the length of the string pointed to by its
argument, or 0 if the argument is a null pointer. But it wasn't
defined that way. If you want such a function, feel free to write it.

NULL isn't a pointer to an empty string. It's a pointer value that
doesn't point to anything, and that's a valuable distinction that
shouldn't be ignored by the standard library.

Not arguing the merits one way or another, but it's a little surprising
that things ended up defined this way. Some relatively mature versions
of UNIX had the assumption effectively built in that, in string context,
a null pointer was equivalent to a pointer to an empty string. There
were thousands of places in the SVR3 source which depended on that. It
was implemented by having the null pointer point to zero and mapping a
page of zeroes at zero so that the string functions didn't have to
detect or special-case it.
 
James Dow Allen

It also somewhat tarnishes my picture of C the way it's often
described as a "portable assembly language". In that picture, I'd
kind of hope that strcpy would just assemble to some straightforward
move instruction, along with whatever '\000' end-of-string check
is available in the particular instruction set. If they want
to add optimizations, they could at least reserve them for -O3,
or something like that.

As someone who also views C as a "portable assembly language" and loves
it for that reason, let me defend C here! A key meaning in that phrase
is C's *determinism* and if the end-result of strcpy() (when used
according to its rules!) is completely defined, what's to complain
about? (Anyway, strcpy() is part of a standard library, not the
language itself, though to say more on that would encourage a round of
semantic quibbling, as well as the objection that complications can
arise even in "parts of the language.")

And anyway, even machine languages can do elaborate checking and get
peculiar results. Speaking of overlapped copy, the IBM 370 has to check
for overlap in the copy instruction MVC DEST(8),SRC and, due to a
special weirdness on one model, it is possible to corrupt data with this
instruction *anyway*!

It isn't clear whether you were unaware of strcpy()'s overlap caveat or
were but chose to ignore it (?!). I also do overlapped copies from time
to time, but just roll my own olap_strcpy() for them. This is trivial
(the one-liner for strcpy() is famous!) and 99+% of the time squeezing
maximum efficiency is unnecessary.
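
For the left-shifting case, a sketch of such a hand-rolled copy might look like the following. The name olap_strcpy_left is hypothetical, and the plain forward loop is only correct when dst is at or before src in the same buffer, or when the two do not overlap at all. For a destination inside the source string, memmove() with strlen(src) + 1 bytes is the usual choice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical forward byte copy. Unlike strcpy(), an ordinary loop has
   defined behavior on overlapping arrays, but it only produces the
   intended result when dst does not point into the string at src. */
static char *olap_strcpy_left(char *dst, const char *src)
{
    char *ret = dst;

    assert(dst != NULL && src != NULL);
    while ((*dst++ = *src++) != '\0')
        ;   /* copy one byte at a time, terminator included */
    return ret;
}
```

Applied to the original example, olap_strcpy_left(s + 2, s + 4) on "0123456789" leaves "01456789".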
And one other thing continues to surprise me: if they're adding
all this code to check args, I'd have thought they'd at least check
for NULLs.

If your debuggable code accidentally passes NULL, Segfault is *exactly
what you want* (in the absence of any other checking) to get a core-dump
at the earliest point.

(BTW, NULL == "" on some early Digital Equipment C systems. This seemed not
necessarily bad, but obviously didn't "catch on"! :)

James Dow Allen
 
Eric Sosman

Keith Thompson said:
[...]
If strcpy(s, NULL) quietly does nothing, the implementation isn't
doing you any favors. A seg fault or equivalent is the best thing
it can do to help you track down the error in your program.

(And what would the application then do?)

If these functions were well-behaved with NULL arguments, it would save the
caller having to do the checks, or to do the checks and replace them with
pointers to empty strings.

NULL is not necessarily an error; it would be handy to sometimes deal with
NULL as though it was an empty string.

Yes. Unfortunately, strcpy() and strcmp() and so on are unable
to distinguish venial from mortal sins. Think about it: Put yourself
in the place of the strcpy() implementation, and imagine you've been
told "Copy *this* to *that*." You salute and start trying to carry
out your orders, and you find that *this* doesn't exist. How can you
know whether substituting "" for *this* is a suitable response, or
whether you should fire three warning shots into the air and sound
the klaxon? Answer: You can't know. You're a lowly private, not in
the confidence of the strategic thinkers at HQ, and it is *not* your
place to "follow" your orders by fudging the results.

Or, shifting the analogy a bit:

"Puhkins."

"Yes, Mister Bank Manager, sahh?"

"Do be a good lad and make sure someone's guarding the gold
vault, won't you?"

"Veddy good, sahh. Right away, sahh." (Hmmm: Nobody's there,
the door is flapping in the breeze, and there's no sign of gold in
the empty interior.) "Nothing to worry about, sahh; NULL is on
guard."

"Splendid, Puhkins. Knew we could count on you. Here's half
a ha'penny for your good wife, and I do hope and trust she's put an
end to her sordid affair with that bounder NULL."
 
Mark Bluemel

"Puhkins."

"Yes, Mister Bank Manager, sahh?"

"Do be a good lad and make sure someone's guarding the gold
vault, won't you?"

"Veddy good, sahh. Right away, sahh." (Hmmm: Nobody's there,
the door is flapping in the breeze, and there's no sign of gold in
the empty interior.) "Nothing to worry about, sahh; NULL is on
guard."

"Splendid, Puhkins. Knew we could count on you. Here's half
a ha'penny for your good wife, and I do hope and trust she's put an
end to her sordid affair with that bounder NULL."

I think they need to adjust your medication....
 
arnuld

Check for NULL and do what?

If strcpy(s, NULL) quietly does nothing, the implementation isn't doing
you any favors. A seg fault or equivalent is the best thing it can do
to help you track down the error in your program.

In my case, strcpy(s, NULL) segfaults. I have made this mistake of not
checking arguments for NULL before passing them to strcpy(). Is there
anything wrong with checking for NULL before using the standard library's
string functions?
 
