strcpy overlapping memory


arnuld


Can't speak for what Bart saw, but I had this issue many times in real-life
code, which I was maintaining but which some programmers had written a few
years back. I was checking for segfaults (at some other place in the
code), but later came to know that it was causing trouble not because the
pointer passed to strcpy() was NULL but because it pointed to an empty
string, which was crashing the program at random locations in the code.
Now I had to put in 2 checks: one for NULL and one for whether the string
is empty :-/

Presumably it would crash -- and the developer would then fix the bug so
it doesn't crash next time.

Yeah, right. That's what I did, but I cursed the original programmer a lot.


Passing a null pointer to strcpy() *is* necessarily an error. strlen()
could have been defined to return the length of string pointed to by its
arguments, or 0 if the argument is a null pointer. But it wasn't
defined that way. If you want such a function, feel free to write it.

I don't get you here. What's the problem and what's the solution? It went
over my head.


NULL isn't a pointer to an empty string. It's a pointer value that
doesn't point to anything, and that's a valuable distinction that
shouldn't be ignored by the standard library.

If it's ignored, it means it's a C Standard Library :p
 

Keith Thompson

arnuld said:
In my case, strcpy(s, NULL) segfaults. I have made this mistake of not
checking arguments for NULL before passing them to strcpy(). Is there
anything wrong in checking for NULL before using the Std Lib's string
functions?

It's best to write your code so you don't pass a null pointer in
the first place. For example, if both arguments are the names
of array objects, there's no need to check. Similarly, if both
arguments are pointer objects that cannot have null pointer values
due to the program logic, you also don't need to check.

You *must not* pass a null pointer to strcpy(). If checking is
the best way to avoid that, by all means check.

There is one possible drawback. Suppose you have some
string-processing function that calls strcpy(). If your function
decides to treat null pointers as if they were pointers to empty
strings, that encourages a certain sloppiness on the part of any
callers.

To put it another way, trying to use a null pointer as if it
pointed to a string is a logical error. IMHO, catching that error
is usually better than pretending it's not there.
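A minimal sketch of catching the error rather than pretending it isn't there, using a real source of null pointers: getenv() returns a null pointer when the variable is not set. The function name copy_env and its return codes are hypothetical. The null case is reported to the caller instead of being turned into an empty string, and the length is checked before strcpy() is called.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the value of environment variable `name`
   into buf (capacity bufsize).
   Returns  0 on success,
           -1 if the variable is not set (getenv() gave a null pointer),
           -2 if the value would not fit in buf. */
int copy_env(char *buf, size_t bufsize, const char *name)
{
    const char *val = getenv(name);

    if (val == NULL)
        return -1;              /* not set: let the caller decide what that means */
    if (strlen(val) >= bufsize)
        return -2;              /* value plus '\0' would overflow buf */
    strcpy(buf, val);           /* safe here: non-null and known to fit */
    return 0;
}
```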
 

Keith Thompson

arnuld said:
Can't speak for what Bart saw, but I had this issue many times in real-life
code, which I was maintaining but which some programmers had written a few
years back. I was checking for segfaults (at some other place in the
code), but later came to know that it was causing trouble not because the
pointer passed to strcpy() was NULL but because it pointed to an empty
string, which was crashing the program at random locations in the code.
Now I had to put in 2 checks: one for NULL and one for whether the string
is empty :-/

It's impossible to tell from your description what the problem was.
A pointer to an empty string is a perfectly valid argument for a
function that operates on strings. It may or may not have been a
valid argument for the particular function you were dealing with.
(For example, a function that parses a string representing a number
can't do anything sensible with an empty string.)
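For the number-parsing case, one way to reject both a null pointer and an empty string up front is a wrapper around strtol(). This is only a sketch; the name parse_long and its return convention are made up for illustration. Note that strtol() itself skips leading whitespace, so " 42" is accepted.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical parser: on success stores the value in *out and returns 0.
   Returns -1 for a null pointer, an empty string, no digits, trailing
   characters, or a value out of range for long. */
int parse_long(const char *s, long *out)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return -1;              /* nothing to parse */

    errno = 0;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return -1;              /* no digits, or trailing junk */
    if (errno == ERANGE)
        return -1;              /* does not fit in a long */

    *out = v;
    return 0;
}
```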

[...]
I don't get you here. What's the problem and what's the solution? It went
over my head.

I see I mixed up strcpy() and strlen() in the above; I hope that didn't
cause too much confusion.

strlen()'s argument is a char* expression that needs to be a pointer to a
string. A null pointer doesn't point to a string, so passing a null
pointer to strlen() is a logical error. The ideal solution is not to
try to call strlen() with a null pointer in the first place. If you
have a char* variable and need the length of the string it points to,
then, if your program logic is designed correctly, there should be no
possibility that it's a null or otherwise invalid pointer.

That's an ideal situation. There are times when run-time checks are
necessary. And what your program should do if the pointer is null
depends on the requirements of the program:

char *s = /* ... */; /* may or may not be a null pointer */
size_t len;
if (s != NULL) {
    len = strlen(s);
    /* ... */
}
else {
    /*
     * What's the right thing to do here? It's impossible to
     * tell without more information.
     */
}

In most cases, the logical error occurred when s was set to a null
pointer value.
If it's ignored, it means it's a C Standard Library :p

Hmm?
 

ImpalerCore

It's best to write your code so you don't pass a null pointer in
the first place.  For example, if both arguments are the names
of array objects, there's no need to check.  Similarly, if both
arguments are pointer objects that cannot have null pointer values
due to the program logic, you also don't need to check.

You *must not* pass a null pointer to strcpy().  If checking is
the best way to avoid that, by all means check.

There is one possible drawback.  Suppose you have some
string-processing function that calls strcpy().  If your function
decides to treat null pointers as if they were pointers to empty
strings, that encourages a certain sloppiness on the part of any
callers.

There is a negative side to your story. When a library function
(call it 'my_str_func') that uses strcpy or the other str family
functions that bomb on NULL doesn't perform any systematic validation
of its arguments, you've condemned all users of 'my_str_func' to do
the checking a priori. And most of us know from experience that many
developers will either ignore argument checking, implement it
erroneously, or do it inconsistently. Even for those who perform all
the necessary checks, the "checking code" needed to support
validation is a huge maintenance burden, proportional to the number
of users and the number of call sites of the 'my_str_func' library
function.

If the checking is done inside the 'my_str_func' library function,
many of those issues go away. Of course, the new issue is that no one
can agree on what the response to NULL pointer argument errors should
be (ignore, abort, print an error message, return an error value, set
a global errno, etc). And unfortunately, the best response depends on
the situation (development, test or production) and programmer
preferences.

To put it another way, trying to use a null pointer as if it
pointed to a string is a logical error.  IMHO, catching that error
is usually better than pretending it's not there.

I agree that ultimately this is what most developers want, but this
gets back to the whole dichotomy of whether to design library
functions that are "robust" to developer mistakes.

Do you want strcpy within 'my_str_func' to crash the application when
the developer goofs (great for development, poor for the user
experience), or gloss over the error (maybe with an annoying message)
and rely on tester or user observations to detect some software flaw
(poor for development, can be better [handle error gracefully] or poor
[corrupting data] for users depending on situation and perspective)?

It's not a clear cut picture, at least to me. I think the glib
'g_return_if_fail' and 'g_return_val_if_fail' macros are pretty good
if you want to incorporate the early-return style of error checking
within 'my_str_func'. To get the flexibility, I use a modified
version of these macros to use global function pointers to abstract
the reporting (useful to alert developers and annoy users like
printing a message to stderr, or adding it to a queue of error
messages shown in a window) and response (abort or do nothing) steps.
For example, in development or test, I can make the response step
abort to make it feel more like design by contract, but in production
it can be a no-op or prompt to save the user's work and quit. It's
not perfect, but it's more flexible than using assert and feels better
than the alternatives I've tried.
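A rough sketch of the kind of macro described above, assuming two global hooks: one that reports a failed check and one that decides the response. All the names here (RETURN_VAL_IF_FAIL, check_report, check_respond, respond_nothing, and 'my_str_func' itself) are hypothetical. This is not GLib's g_return_val_if_fail, only loosely modeled on its early-return style.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical hook types: one reports a failed check, the other
   decides what happens next. */
typedef void (*check_report_fn)(const char *expr, const char *file, int line);
typedef void (*check_response_fn)(void);

static void report_to_stderr(const char *expr, const char *file, int line)
{
    fprintf(stderr, "%s:%d: check failed: %s\n", file, line, expr);
}

static void respond_abort(void)
{
    abort();                    /* development/test: stop right here */
}

static void respond_nothing(void)
{
    /* production: carry on; the early return below handles the error */
}

/* Development defaults: report, then abort. */
static check_report_fn check_report = report_to_stderr;
static check_response_fn check_respond = respond_abort;

/* Report and respond through the hooks, then return val if the
   response step itself returns. */
#define RETURN_VAL_IF_FAIL(expr, val)                    \
    do {                                                 \
        if (!(expr)) {                                   \
            check_report(#expr, __FILE__, __LINE__);     \
            check_respond();                             \
            return (val);                                \
        }                                                \
    } while (0)

/* Hypothetical library function: copies src into dest (capacity
   destsize). Returns 0 on success, -1 if an argument check failed. */
int my_str_func(char *dest, size_t destsize, const char *src)
{
    RETURN_VAL_IF_FAIL(dest != NULL, -1);
    RETURN_VAL_IF_FAIL(src != NULL, -1);
    RETURN_VAL_IF_FAIL(strlen(src) < destsize, -1);
    strcpy(dest, src);
    return 0;
}
```

A production build might set check_respond = respond_nothing at startup, so that a failed check prints a message and the function returns -1 instead of aborting.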

Best regards,
John D.
 

Michael Press


Can't speak for what Bart saw, but I had this issue many times in real-life
code, which I was maintaining but which some programmers had written a few
years back. I was checking for segfaults (at some other place in the
code), but later came to know that it was causing trouble not because the
pointer passed to strcpy() was NULL but because it pointed to an empty
string, which was crashing the program at random locations in the code.
Now I had to put in 2 checks: one for NULL and one for whether the string
is empty :-/

The next step you took was a global search in the code for `strcpy'
and the usual suspects.
 

JohnF

James Dow Allen said:
JohnF said:
It also somewhat tarnishes my picture of C the way it's often
described as a "portable assembly language". In that picture, I'd
kind of hope that strcpy would just assemble to some straightforward
move instruction, along with whatever '\000' end-of-string check
is available in the particular instruction set. If they want
to add optimizations, they could at least reserve them for -O3,
or something like that.

As someone who also views C as a "portable assembly language"
[...] A key meaning in that phrase is C's determinism

Never heard that before. What, precisely, is "determinism"
supposed to mean in this context?
and if the end-result of strcpy() (when used
according to its rules!) is completely defined,
what's to complain about?

My personal (repeat, >>personal<<) complaint would be
that if C instructions aren't compiled/assembled in
a straightforward way to machine instructions, then
I can't reliably visualize the code I'm generating.
Although different instruction sets do differ, they
nevertheless typically have lots in common. As an
old assembly programmer, I kind of know what I should
be able to expect from something calling itself
a "portable assembly language". In particular, I kind
of know how something calling itself strcpy should
be compiled. And that implicit understanding tells me
that strcpy(s,s+n) ought to be safe (as long, of course,
as strlen(s)>n). And it also tells me, of course, that
the reverse strcpy(s+n,s) is totally ridiculous.
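For the record, the standard does not give strcpy(s, s+n) the guarantee described above: if the source and destination overlap, the behavior is undefined, whatever strlen(s) is. memmove() is the standard function specified to handle overlapping objects. A minimal sketch (drop_prefix is a hypothetical name) that removes the first n characters of s in place, assuming n <= strlen(s):

```c
#include <string.h>

/* Removes the first n characters of s in place, which is what
   strcpy(s, s + n) is often used for. Source and destination overlap,
   so memmove() is used instead of strcpy().
   Assumes n <= strlen(s). */
void drop_prefix(char *s, size_t n)
{
    memmove(s, s + n, strlen(s + n) + 1);   /* +1 also moves the '\0' */
}
```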
It isn't clear whether you were unaware of strcpy()'s
overlap caveat or were but chose to ignore it (?!).

Hint: read this thread's Subject :)
As explained just above, I relied on the "portable
assembly language" idea to infer that strcpy(s,s+n)
would behave universally predictably and safely.
Won't make that mistake again. And I'd never have
made that mistake with, say, C++. But I'd like to have
thought that C (without optional -O3 type optimizations)
would compile straightforwardly/transparently/etc.
(the one-liner for strcpy() is famous!) and 99+% of
the time squeezing maximum efficiency is unnecessary.
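The famous one-liner is usually written along the lines of the K&R version below, renamed here to the hypothetical kr_strcpy so it does not collide with the library function. With this particular front-to-back, one-char-at-a-time loop, kr_strcpy(s, s+n) does happen to work. The point made in this thread is that the library strcpy() is not required to be implemented this way (it may copy in larger chunks or in a different order), so that reasoning does not carry over to it.

```c
/* K&R-style copy loop: copies t into s one char at a time, front to
   back, including the terminating '\0'. Returns s, like strcpy(). */
char *kr_strcpy(char *s, const char *t)
{
    char *start = s;

    while ((*s++ = *t++) != '\0')
        ;                       /* all the work happens in the condition */
    return start;
}
```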

Yeah, that's another good point. strcpy is typically (99+%)
used for parsing input, formatting output, etc, not within
heavy computational tasks. So optimizing strcpy is pretty
much a total waste, e.g.,
for (answer=0,i=0; i<10zillion; i++)
whatever;
format and display answer
in some way using sprintf,strcpy,etc;
What genius wastes his time optimizing the strcpy?
If your debuggable code accidentally passes NULL,
Segfault is exactly what you want

Might be what >>you<< want, but it's hard to speak for everybody.
Of course, I always check (except for blunders), and so don't
really care what it does. But I have lots of functions that take
optional char *args, which can be NULL or "\0" for default
behavior (and where I don't want to use variable arg lists).
Some checks would be unnecessary if strcpy, etc, just behaved
"nicely" when passed NULL's. But, again, I couldn't really
care less. My remark was just an observation, not a complaint.
(BTW, NULL == "" on some early Dig Equip C systems. This seemed not
necessarily bad, but obviously didn't "catch on"! :)
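A small sketch of the optional-argument convention described above, where either NULL or "" selects the default. The names or_default and print_report are hypothetical. The || short-circuits, so arg[0] is only read when arg is not a null pointer.

```c
#include <stdio.h>

/* Hypothetical helper for optional string arguments: returns arg, or
   fallback when arg is a null pointer or an empty string. */
static const char *or_default(const char *arg, const char *fallback)
{
    return (arg == NULL || arg[0] == '\0') ? fallback : arg;
}

/* Hypothetical function with an optional title argument. */
void print_report(const char *title)
{
    printf("== %s ==\n", or_default(title, "Untitled report"));
}
```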

Can't recall what TOPS-10 C did (I worked on the third KA10 processor
ever built), but VMS's string (and other) descriptors were typically
a total pain in the neck.
 

Seebs

My personal (repeat, >>personal<<) complaint would be
that if C instructions aren't compiled/assembled in
a straightforward way to machine instructions, then
I can't reliably visualize the code I'm generating.

You're doing it wrong. Don't visualize it as assembly, visualize
it as the abstract machine.
Although different instruction sets do differ, they
nevertheless typically have lots in common. As an
old assembly programmer, I kind of know what I should
be able to expect from something calling itself
a "portable assembly language".

No, you don't.
In particular, I kind
of know how something calling itself strcpy should
be compiled.

No, you don't.

Your expectations are wrong. You have misunderstood the
point of the "portable assembly language".

In particular, library functions are not instructions, they're
potentially extremely elaborate hunks of functionality. C gives
you a pretty good guess as to what happens for "x += 2". It isn't
supposed to tell you which single machine instruction is used
to handle printf("%-02x", a->b[3].c()).

-s
 

arnuld

It's impossible to tell from your description what the problem was. A
pointer to an empty string is a perfectly valid argument for a function
that operates on strings. It may or may not have been a valid argument
for the particular function you were dealing with. (For example, a
function that parses a string representing a number can't do anything
sensible with an empty string.)

It *was* about parsing a string :) . The only difference is, I was using
the output of that program as input to my program, and my program always
said "Invalid Input". Only later did I figure out that when the input to
that program was either NULL or an empty string, it produced strange
output. I even saw an inverted question mark being sent to my program
:-o , which made me realize what words like "uninitialized memory" and
"garbage" mean.



strlen()'s argument is a char* expression that needs to be a pointer to a
string. A null pointer doesn't point to a string, so passing a null
pointer to strlen() is a logical error. The ideal solution is not to
try to call strlen() with a null pointer in the first place. If you
have a char* variable and need the length of the string it points to,
then, if your program logic is designed correctly, there should be no
possibility that it's a null or otherwise invalid pointer.

That's an ideal situation. There are times when run-time checks are
necessary. And what your program should do if the pointer is null
depends on the requirements of the program:

char *s = /* ... */; /* may or may not be a null pointer */
size_t len;
if (s != NULL) {
    len = strlen(s);
    /* ... */
}
else {
    /*
     * What's the right thing to do here? It's impossible to
     * tell without more information.
     */
}

In most cases, the logical error occurred when s was set to a null
pointer value.

I meant, there are so many traps and gotchas in C that I smile whenever
I get a chance to look upon them. Sometimes (many times?) I feel I get
segfaults or semantic errors because of my Common=Sense :) (one fine
example is using = instead of ==, and to avoid that I always write
something like if (1 == x) rather than if (x == 1)).
 

arnuld

That's an ideal situation. There are times when run-time checks are
necessary. And what your program should do if the pointer is null
depends on the requirements of the program:

char *s = /* ... */; /* may or may not be a null pointer */
size_t len;
if (s != NULL) {
    len = strlen(s);
    /* ... */
}
else {
    /*
     * What's the right thing to do here? It's impossible to
     * tell without more information.
     */
}

In most cases, the logical error occurred when s was set to a null
pointer value.


What about this situation, where the size of dest is less than the size
of source? It does segfault, but only after printing the "WOW..."
message, and I can't understand why. It should not print the message,
because it has already segfaulted in strcpy().



#include <stdio.h>
#include <string.h>

int main(void)
{
    char* src = "000000000000000000000000";
    char dest[6] = {0};

    strcpy(dest, src);

    printf("WOW!... SRC is larger than DEST but strcpy still works\n");

    return 0;
}



===================== OUTPUT =========================
[arnuld@dune programs]$ gcc -ansi -pedantic -Wall -Wextra strcpy.c
[arnuld@dune programs]$ ./a.out
WOW!... SRC is larger than DEST but strcpy still works
Segmentation fault
[arnuld@dune programs]$
 

tm

Pretend it's ""?

I remember that under HPUX NULL and "" were interchangeable.
A very very bad idea. It made the job of porting a program
from HPUX to SGI harder. Strings were initialized with NULL
all over the place and nobody cared. The development team
(they mainly used HPUX) was not even aware of the difference
between NULL and "". And when I explained it to them, they
could hardly believe there is a difference. It really
hindered their understanding of pointers and strings.

I had to introduce a layer of macros and functions for
every string function (strcmp, strlen, strcpy, ...) like

size_t my_strlen (const char *s)
{
    if (s == NULL) {
        return 0;
    } else {
        return strlen(s);
    }
}

#define strlen(s) my_strlen(s)

Greetings Thomas Mertes

--
Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
 

Ike Naar

What about this situation, where the size of dest is less than the size
of source? It does segfault, but only after printing the "WOW..."
message, and I can't understand why. It should not print the message,
because it has already segfaulted in strcpy().

#include <stdio.h>
#include <string.h>

int main(void)
{
    char* src = "000000000000000000000000";
    char dest[6] = {0};

    strcpy(dest, src);

    printf("WOW!... SRC is larger than DEST but strcpy still works\n");

    return 0;
}

===================== OUTPUT =========================
[arnuld@dune programs]$ gcc -ansi -pedantic -Wall -Wextra strcpy.c
[arnuld@dune programs]$ ./a.out
WOW!... SRC is larger than DEST but strcpy still works
Segmentation fault
[arnuld@dune programs]$

It probably doesn't segfault *in* strcpy().
It's quite possible that strcpy() finishes the job, filling
dest[] and clobbering memory near dest[]. You shouldn't be
surprised if it clobbers, for instance, the return address pointing
to the location from which main() was called, so that the segfault
occurs on return from main().
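Since strcpy() has no idea how big dest is, nothing stops it from writing past the end; the size has to be checked by the caller or handled by a bounded function. One possible sketch uses snprintf(), which never writes more than the given number of bytes (including the terminating '\0'). snprintf() is C99, so a strict -ansi (C89) build like the one above may not declare it. The name bounded_copy is hypothetical.

```c
#include <stdio.h>

/* Hypothetical bounded copy: copies as much of src as fits into dest
   (capacity destsize > 0), always '\0'-terminating it.
   Returns 0 if src fit completely, 1 if it was truncated (or on an
   encoding error reported by snprintf). */
int bounded_copy(char *dest, size_t destsize, const char *src)
{
    int n = snprintf(dest, destsize, "%s", src);

    /* n is the length snprintf wanted to write, not counting '\0' */
    return n < 0 || (size_t)n >= destsize;
}
```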
 

BartC

tm said:
I remember that under HPUX NULL and "" were interchangeable.
A very very bad idea. It made the job of porting a program
from HPUX to SGI harder.

Only because SGI hadn't thought of it too!
Strings were initialized with NULL
all over the place and nobody cared. The development team
(they mainly used HPUX) was not even aware of the difference
between NULL and "". And when I explained it to them, they
could hardly believe there is a difference. It really
hindered their understanding of pointers and strings.

I had to introduce a layer of macros and functions for
every string function (strcmp, strlen, strcpy, ...) like

size_t my_strlen (const char *s)
{
    if (s == NULL) {
        return 0;
    } else {
        return strlen(s);
    }
}

#define strlen(s) my_strlen(s)

This demonstrates my point exactly. If strlen() behaved like my_strlen(),
then there would be no problem.

If NULL is not taken care of by the standard functions, then these NULL
checks have to be sprinkled all over the code.
 

tm

Only because SGI hadn't thought of it too!

This demonstrates my point exactly. If strlen() behaved like my_strlen(),
then there would be no problem.

Nobody stops you from defining these macros and functions.
But keep in mind: these functions do not work according to
the C standard.

BTW, it is not just the string functions. What happens
when you want to access the characters of a string
yourself? You have to check for NULL before you do s[0].
While C strings are '\0'-terminated, NULL is not, except
when NULL[0] succeeds and returns '\0'. But that opens
another can of worms, since it allows dereferencing NULL
without a segfault.

When NULL and "" are equivalent, you can easily initialize
strings with NULL. But you could also initialize them
with "" instead. I think you probably want automatic
memory management for strings, but this is beyond the concept
of char* strings.

Having two distinct values with the same meaning (like
NULL and "") is IMHO generally not a good idea.
If NULL is not taken care of by the standard functions, then these NULL
checks have to be sprinkled all over the code.

No, I consider a NULL string an error. In most cases I do
not check for NULL strings. I check when the memory for
a string is requested with malloc, but not afterwards.
I really want my program to segfault when a NULL
string is used.
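A sketch of the "a NULL string is an error" policy that does not depend on getting a segfault: dereferencing a null pointer is undefined behavior, so a crash is likely on common systems but not guaranteed. The hypothetical checked_strlen() makes the failure explicit with assert(), which reports and aborts unless NDEBUG is defined, and compiles to nothing when NDEBUG is defined.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: a null argument is treated as a bug, not as "".
   With NDEBUG undefined, a null s stops the program with a diagnostic
   naming this check; with NDEBUG defined, the check disappears. */
size_t checked_strlen(const char *s)
{
    assert(s != NULL && "checked_strlen: null string");
    return strlen(s);
}
```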

BTW, I also think that string library functions which
check for NULL all the time would be much slower.
Note that the C standard string library functions do not
check for NULL. When a NULL is passed, the segfault is caused
by a different mechanism (typically the hardware trapping the
access to an unmapped address), which does not slow down the
string functions.


Greetings Thomas Mertes

 

Eric Sosman

tm said:
[...]
I had to introduce a layer of macros and functions for
every string function (strcmp, strlen, strcpy, ...) like

size_t my_strlen (const char *s)
{
    if (s == NULL) {
        return 0;
    } else {
        return strlen(s);
    }
}

#define strlen(s) my_strlen(s)

This demonstrates my point exactly. If strlen() behaved like
my_strlen(), then there would be no problem.

No, the problem would just change its nature and move from
one place to another. Instead of crashing, the program might (for
example) just print nothing at all in the "Amount" field of a
paycheck -- delighting the worker who gets to fill in whatever he
likes, but displeasing the company who paid for the check-writing
software ...
If NULL is not taken care of by the standard functions, then these NULL
checks have to be sprinkled all over the code.

If NULL means something different from "", the program must check
for the difference. If "" means something different from "purple",
the program must check for the difference. What's the, er, difference?
 

BartC

Eric Sosman said:
No, the problem would just change its nature and move from
one place to another. Instead of crashing, the program might (for
example) just print nothing at all in the "Amount" field of a
paycheck -- delighting the worker who gets to fill in whatever he
likes, but displeasing the company who paid for the check-writing
software ...

But with crashing software *nobody* would get paid. And if an application
used "", then the field would be blank anyway.

If a blank field is an error, it will be obvious when the field appears
'blank'. Not so obvious when there's a seg fault or whatever.
If NULL means something different from "", the program must check
for the difference. If "" means something different from "purple",
the program must check for the difference. What's the, er, difference?

If. Usually a string is a string; you might not do anything different with a
0-length string than with an N-length one.

But it is a pain then to have to check for NULLs (which I tend to do with
functions taking string arguments).

At the moment I might write:

if (s==NULL) return NULL;
slen = strlen(s);
if (slen==0) return NULL;

And mention in the specs that NULL is an acceptable argument equivalent to
an empty string.

But wouldn't it be useful to be able to omit that first check?
 

Ike Naar

But it is a pain then to have to check for NULLs (which I tend to do with
functions taking string arguments).

At the moment I might write:

if (s==NULL) return NULL;
slen = strlen(s);
if (slen==0) return NULL;

And mention in the specs that NULL is an acceptable argument equivalent to
an empty string.

There are many categories of invalid pointers: null pointer, indeterminate
pointer, pointer to array of char with no zero terminator, pointer to short
buffer, etc. Which categories do you find acceptable arguments?
 

BartC

Ike Naar said:
There are many categories of invalid pointers: null pointer, indeterminate
pointer, pointer to array of char with no zero terminator, pointer to short
buffer, etc. Which categories do you find acceptable arguments?

We're talking a valid pointer to a zero-terminated string, or a NULL value.
Nothing much can be done about invalid pointers.

NULL meaning 'empty value' is not so useful for other types which need other
mechanisms to determine the size of the data.

It's possible that a rogue string pointer might end up with a NULL value
that is then treated as an empty string rather than cause a crash. And
possibly that is a more difficult bug to pick up than a rogue pointer into
random memory, which will likely give obviously wrong results.

But then, NULL is also used for signalling, which suffers from the same
problem: is that NULL value intentional, or accidental?
 

Seebs

What about this situation, where the size of dest is less than the size
of source? It does segfault, but only after printing the "WOW..."
message, and I can't understand why. It should not print the message,
because it has already segfaulted in strcpy().

No it hasn't.

Invoking undefined behavior is not the same as segfaulting. It's quite
possible for the undefined behavior to, instead of crashing, set things
up so that you will crash at some unspecified point in the distant future.

You have a fascinating cognitive map of C, but it's basically totally wrong.

-s
 

Keith Thompson

BartC said:
Only because SGI hadn't thought of it too!

Only because SGI *and everyone else in the world* didn't do it the way
HPUX did, and because the HPUX behavior isn't guaranteed by the
standard.

[...]
If NULL is not taken care of by the standard functions, then these NULL
checks have to be sprinkled all over the code.

Or the code could have been written correctly in the first place so it
never tries to pass a null pointer to a string function.

Yes, writing software correctly can be non-trivial, but keeping a
consistent mental model ("" and NULL are two different things)
makes it easier.
 

Default User

What about this situation, where the size of dest is less than the size
of source? It does segfault, but only after printing the "WOW..."
message, and I can't understand why. It should not print the message,
because it has already segfaulted in strcpy().

There is no defined behavior for undefined behavior.



Brian
 
