removing substring from string

becte

I am a little bit confused.
Is this a legal way of removing a substring
from a string? What about the second alternative
using strcpy: is it OK even though the source and
destination strings overlap?

// Remove (first occurrence of) sub from src
void func(char *src, char *sub)
{
    char *p;
    if ((p = strstr(src, sub)) != NULL)
    {
        memmove(p, p + strlen(sub), strlen(p + strlen(sub)) + 1);

        // alternative
        // strcpy(p, p + strlen(sub));
    }
}
 
Eric Sosman

becte said:
I am a little bit confused.
Is this a legal way of removing a substring
from a string? What about the second alternative
using strcpy: is it OK even though the source and
destination strings overlap?

// Remove (first occurrence of) sub from src
void func(char *src, char *sub)
{
    char *p;
    if ((p = strstr(src, sub)) != NULL)
    {
        memmove(p, p + strlen(sub), strlen(p + strlen(sub)) + 1);

Looks all right to me.

        // alternative
        // strcpy(p, p + strlen(sub));

Undefined behavior if source and destination overlap
(i.e., if strlen(p+strlen(sub)) >= strlen(sub)).
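
To make that condition concrete, here is a rough sketch of a check for it. The helper name tail_copy_overlaps is made up for illustration and is not part of the original code; it assumes n is no larger than strlen(p), which holds when p comes from strstr and n is strlen(sub).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if strcpy(p, p + n) would read and
   write at least one common byte, 0 otherwise.
   Assumes n <= strlen(p). */
int tail_copy_overlaps(const char *p, size_t n)
{
    size_t tail;

    assert(p != NULL);
    assert(n <= strlen(p));

    tail = strlen(p + n);   /* characters left after the removed part */

    /* strcpy would write p[0] .. p[tail] and read p[n] .. p[n + tail].
       The two ranges share a byte exactly when tail >= n. */
    return tail >= n;
}
```

With p pointing at "ccdd" and n equal to 2, the tail "dd" has length 2, so the ranges meet at p[2] and the call would overlap. With p pointing at "dd" and n equal to 2, the tail is empty and there is no overlap.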
 
Al Bowers

The Standard's description of function strcpy says
"If copying takes place between objects that
overlap, the behavior is undefined."
p, pointing to an array that includes the string
pointed to by p+strlen(sub), is an overlap.
An implementor is free to implement function strcpy
similar to this:

char *MyStrCpy(char *s, const char *cs)
{
    const char *tmp;
    size_t n = strlen(cs);

    /* Copy backwards, from the terminating null to the first character */
    for (tmp = cs + n; ; tmp--, n--)
    {
        *(s + n) = *tmp;
        if (!n) break;
    }
    return s;
}

Using this as your strcpy function in the function func
will fail to give the expected result.

The Standard's description of function memmove says that
copying takes place AS IF the characters (number: strlen(p+strlen(sub))+1,
counting the terminating null) pointed to by p+strlen(sub) are first
copied to a separate temporary object. Therefore this function would be
safe to use in the code above. There is no overlapping problem.

Of the two, I would stick with function memmove.
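
For reference, a minimal sketch of the memmove version as a complete function. The name remove_first, the const qualifier on sub, the int return value, and the early return for an empty sub are choices made here for illustration and are not in the original post.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove the first occurrence of sub from src, in place.
   Returns 1 if something was removed, 0 otherwise.
   (Name and return convention are illustrative.) */
int remove_first(char *src, const char *sub)
{
    size_t n;
    char *p;

    assert(src != NULL && sub != NULL);

    n = strlen(sub);
    if (n == 0)
        return 0;               /* an empty substring removes nothing */

    p = strstr(src, sub);
    if (p == NULL)
        return 0;               /* sub does not occur in src */

    /* memmove is defined for overlapping objects; the +1 moves the
       terminating null character along with the tail. */
    memmove(p, p + n, strlen(p + n) + 1);
    return 1;
}
```

Called on a buffer holding "aabbccdd" with sub set to "cc", this leaves "aabbdd" in the buffer.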
 
Gregory Pietsch

Since the source and destination operands overlap, memmove() is your
safest bet. Don't use strcpy() for something like this, ever.

Also, for newsgroups, don't use // comments because of wordwrapping
issues.

Gregory Pietsch
 
aegis

Eric said:
Looks all right to me.


Undefined behavior if source and destination overlap
(i.e., if strlen(p+strlen(sub)) >= strlen(sub)).

That doesn't overlap. See
http://groups-beta.google.com/group/comp.std.c/msg/03a4a0c929fd2138?dmode=source

imagine that you have, "aabbccdd"
and you search for the substring "dd"
then p points to the first "d" and
p + strlen(sub) would point one past the
second "d". This means a distinct value of
an object is used to assign to another
distinct object. Overlap does not mean
any two pointers pointing in the same
memory region. It is an overlap when
the objects within the memory region
are not distinct. Consider this:

strcpy(p + 3, p); p overlaps p + 3 here
the converse cannot be said for p+offset
overlapping p.
 
Al Bowers

No, there is an overlap problem. For strcpy(), the Standard does
not define in what order the characters from the string (arg 2)
are copied to the character array (arg 1). Among the possibilities,
an implementor may copy from the beginning of the string to the
terminating null character, as in the function Mystrcpy1 example below.
Another possibility is for the implementor to copy to the character
array from the string's terminating character to the beginning of the
string, as shown in the function Mystrcpy2 example below.

Now, take the string "aabbccdd", and attempt to remove the
substring "cc". If the implementor made strcpy like the Mystrcpy1
example, you will get away with using strcpy. But if the
implementation was similar to function Mystrcpy2, then the
resulting string will be wrong: "aabb" instead of "aabbdd".

Run the following to see the effect.

#include <stdio.h>
#include <string.h>

char *Mystrcpy1(char *s, const char *cs);
char *Mystrcpy2(char *s, const char *cs);
void func1(char *src, char *sub);
void func2(char *src, char *sub);

int main(void)
{
    char s[32], *substr = "cc";

    Mystrcpy2(s, "aabbccdd");
    printf("From the string: \"%s\"\n", s);
    printf("We will attempt to remove substring \"%s\"\n", substr);
    func1(s, substr);
    printf("Using function Mystrcpy1. The result: \"%s\"\n\n", s);

    Mystrcpy2(s, "aabbccdd");
    printf("From the string: \"%s\"\n", s);
    printf("We will attempt to remove substring \"%s\"\n", substr);
    func2(s, substr);
    printf("Using function Mystrcpy2. The result: \"%s\"\n", s);
    return 0;
}

char *Mystrcpy1(char *s, const char *cs)
{   /* Copy from beginning of string cs to the end */
    char *s1;
    const char *cs1;

    for (s1 = s, cs1 = cs; '\0' != (*s1 = *cs1); s1++, cs1++) ;
    return s;
}

char *Mystrcpy2(char *s, const char *cs)
{   /* Copy from the end of string cs to the beginning */
    const char *tmp;
    size_t n = strlen(cs);

    for (tmp = cs + n; ; tmp--, n--)
    {
        *(s + n) = *tmp;
        if (!n) break;
    }
    return s;
}

void func1(char *src, char *sub)
{   /* Using function Mystrcpy1 */
    char *p;

    if ((p = strstr(src, sub)) != NULL)
        Mystrcpy1(p, p + strlen(sub));
    return;
}

void func2(char *src, char *sub)
{   /* Using function Mystrcpy2 */
    char *p;

    if ((p = strstr(src, sub)) != NULL)
        Mystrcpy2(p, p + strlen(sub));
    return;
}
 
Eric Sosman

aegis said:
Eric said:
becte wrote:

I am a little bit confused.
Is this a legal way of removing a substring
from a string? What about the second alternative
using strcpy: is it OK even though the source and
destination strings overlap?

// Remove (first occurrence of) sub from src
void func(char *src, char *sub)
{
    char *p;
    if ((p = strstr(src, sub)) != NULL)
    {
        [...]
        // alternative
        // strcpy(p, p + strlen(sub));

Undefined behavior if source and destination overlap
(i.e., if strlen(p+strlen(sub)) >= strlen(sub)).

That doesn't overlap. See
http://groups-beta.google.com/group/comp.std.c/msg/03a4a0c929fd2138?dmode=source

imagine that you have, "aabbccdd"
and you search for the substring "dd"
then p points to the first "d" and
p + strlen(sub) would point one past the
second "d". [...]

Then strlen(p+strlen(sub)) will be zero and
strlen(sub) will be two. 0 >= 2 yields "false."
 
Lawrence Kirby


It *can* overlap. For strcpy(), overlap occurs if within one call to
strcpy() the same byte in memory is both read and written.

imagine that you have, "aabbccdd"
and you search for the substring "dd"
then p points to the first "d" and
p + strlen(sub) would point one past the
second "d". This means a distinct value of
an object is used to assign to another
distinct object.

However, if you search for "aa" instead of "dd" then, for example, src[2]
is both read and written by the strcpy() and you have overlap.

Overlap does not mean
any two pointers pointing in the same
memory region. It is an overlap when
the objects within the memory region
are not distinct. Consider this:

I don't follow.

strcpy(p + 3, p); p overlaps p + 3 here
the converse cannot be said for p+offset
overlapping p.

"Overlap" is a symmetric relation: if a overlaps b, then b overlaps a.
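
As a sketch of that definition, the check below compares the range of bytes a strcpy call would read with the range it would write. The function name strcpy_ranges_overlap is invented for illustration. It assumes both pointers point into the same array and that the array has room for the copy at dst, since comparing or offsetting pointers outside one array is not defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: would strcpy(dst, src) read and write a common
   byte? Assumes dst and src point into the same array and that
   dst + strlen(src) + 1 stays within (or one past) that array. */
int strcpy_ranges_overlap(const char *dst, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL);

    len = strlen(src) + 1;  /* bytes read and written, including '\0' */

    /* Read range:  [src, src + len)
       Write range: [dst, dst + len)
       Two ranges intersect when each one starts before the other ends. */
    return dst < src + len && src < dst + len;
}
```

The test treats its two arguments the same way, so swapping which range is called "source" and which "destination" cannot change whether they overlap. For a 32-byte buffer s holding "aabbccdd", it reports overlap for (s, s + 2), the "aa" case, and none for (s + 6, s + 8), the "dd" case.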

Lawrence
 
