string-search-and-replace function wanted


Stefan Ram

I would know how to write a plain and simple function
to search and replace a substring within a string. It might
take several hours including testing. But for something so
common, I thought it should be easier to find some code
for this already existing. This turned out to be more
difficult than I thought.

Does anyone know of a function or a string library that
can be used as follows:

#include <lib.h>

...

{ char * const p = malloc( 1024 );
if( p )
{ strncpy( p, "exemple", 1023 ); p[ 1023 ] = 0;
char * const r = srp( p, "e", "ABC", 1024 );
if( r )
{ printf( "%s\n", r );
if( r != p )free( r ); }
free( p ); }}

This should print "ABCxABCmplABC". The function "srp"
might replace within the buffer given (it knows the size
from the fourth argument). If this is not possible, it
tries to realloc or malloc a larger buffer. The buffer
with the result is returned (or 0 in case of an error).
Has anyone already written something like this?
The interface might differ - it just should search and
replace.

If not:

Does there exist a simple text editor without a UI, but a
C-API (written in plain ISO C)?

#include <editor.h>

...

struct editor * const e = editor_load_from_file( "alpha.txt" );
if( e )
{ editor_search_and_replace( e, "e", "ABC" );
char * s = editor_string( e ); /* this function is less important */
if( s ) { printf( "buffer = \"%s\".\n", s ); editor_string_dispose( s ); }
editor_save_to_file( e, "alpha.txt", EDITOR_OVERWRITE_OK );
editor_quit( e ); }

TIA
 

Mike Wahler

Stefan Ram said:
I would know how to write a plain and simple function
to search and replace a substring within a string. It might
take several hours including testing. But for something so
common, I thought it should be easier to find some code
for this already existing. This turned out to be more
difficult than I thought.

Does anyone know of a function or a string library that
can be used as follows:


Does there exist a simple text editor without a UI, but a
C-API (written in plain ISO C)?

The web is packed full of open-source code, snippets,
examples, etc. Did you google around?

www.sourceforge.net is one example of such a code repository.

-Mike
 

Stefan Ram

Mike Wahler said:
The web is packed full of open-source code, snippets,
examples, etc. Did you google around?

Yes, it turned out to be more difficult than I thought.
 

pete

Stefan said:
I would know how to write a plain and simple function
to search and replace a substring within a string. It might
take several hours including testing. But for something so
common, I thought it should be easier to find some code
for this already existing. This turned out to be more
difficult than I thought.

Does anyone know of a function or a string library that
can be used as follows:

#include <lib.h>

...

{ char * const p = malloc( 1024 );
if( p )
{ strncpy( p, "exemple", 1023 ); p[ 1023 ] = 0;
char * const r = srp( p, "e", "ABC", 1024 );
if( r )
{ printf( "%s\n", r );
if( r != p )free( r ); }
free( p ); }}

This should print "ABCxABCmplABC". The function "srp"
might replace within the buffer given (it knows the size
from the fourth argument). If this is not possible, it
tries to realloc or malloc a larger buffer. The buffer
with the result is returned (or 0 in case of an error).
Has anyone already written something like this?
The interface might differ - it just should search and
replace.

/* BEGIN sub.c */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main(void)
{
    char string[] = "exemple";
    char *sub_1 = "e";      /* substring to search for */
    char *sub_2 = "ABC";    /* replacement text */
    char *new_string;
    char *ptr_1;            /* read position in string */
    char *ptr_2;            /* write position in new_string */
    char *ptr_3;            /* next match in string */
    size_t length_1;
    size_t length_2;
    size_t count;

    /* Count matches only when the result can grow;
       otherwise strlen(string) + 1 bytes are enough. */
    count = 0;
    length_1 = strlen(sub_1);
    length_2 = strlen(sub_2);
    if (length_2 > length_1) {
        ptr_1 = strstr(string, sub_1);
        while (ptr_1 != NULL) {
            ++count;
            ptr_1 = strstr(ptr_1 + 1, sub_1);
        }
    }
    new_string =
        malloc(count * (length_2 - length_1) + strlen(string) + 1);
    if (new_string == NULL) {
        fputs("Bonus Nachos\n", stderr);
        exit(EXIT_FAILURE);
    }
    /* Copy the text before each match, then the replacement. */
    ptr_1 = string;
    ptr_2 = new_string;
    ptr_3 = strstr(ptr_1, sub_1);
    while (ptr_3 != NULL) {
        while (ptr_1 != ptr_3) {
            *ptr_2++ = *ptr_1++;
        }
        strcpy(ptr_2, sub_2);
        ptr_1 += length_1;
        ptr_2 += length_2;
        ptr_3 = strstr(ptr_1, sub_1);
    }
    /* Copy whatever follows the last match, including the terminator. */
    strcpy(ptr_2, ptr_1);
    puts(new_string);
    free(new_string);
    return 0;
}

/* END sub.c */
 

pete

pete said:
length_1 = strlen(sub_1);
length_2 = strlen(sub_2);
if (length_2 > length_1) {
ptr_1 = strstr(string, sub_1);
while (ptr_1 != NULL) {
++count;
ptr_1 = strstr(ptr_1 + 1, sub_1);
}
}
new_string =
malloc(count * (length_2 - length_1) + strlen(string) + 1);

ptr_1 = strstr(string, sub_1);
while (ptr_1 != NULL) {
++count;
ptr_1 = strstr(ptr_1 + 1, sub_1);
}
length_1 = strlen(sub_1);
length_2 = strlen(sub_2);
new_string =
malloc(count * ((int)length_2 - (int)length_1)
+ strlen(string) + 1);
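
One detail worth checking in this version: the counting loop steps with `ptr_1 + 1`, so it also counts overlapping matches, while the copy loop advances by `length_1` and replaces only non-overlapping ones. Now that matches are counted even when the replacement is shorter, an over-count can make the computed size too small. A minimal sketch, assuming a hypothetical helper `count_matches` (not from the posts above), that counts the same way the copy loop replaces:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts non-overlapping occurrences of needle
   in text, scanning left to right like the copy loop does.
   Returns 0 for an empty needle, because strstr() would match it at
   every position and the scan would never advance. */
size_t count_matches(const char *text, const char *needle)
{
    size_t const len = strlen(needle);
    size_t count = 0;
    const char *p;

    if (len == 0)
        return 0;
    for (p = strstr(text, needle); p != NULL; p = strstr(p + len, needle))
        ++count;
    return count;
}
```

For the text "aaa" and the search string "aa", stepping by one finds two matches, but only one can actually be replaced. With an empty replacement the result "a" needs 2 bytes, whereas a count of 2 yields 2 * (0 - 2) + 3 + 1 = 0.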
 

Stefan Ram

pete said:
/* BEGIN sub.c */

Thanks! (I have not yet looked at your following posting.)

I rewrote your code into a separate function "sub" and passed
the size allocated for the current text buffer to that
function (parameter "z").

A new buffer "b" is allocated only if needed. Otherwise I
would like to search-and-replace in-place (within the old
buffer). I have marked that with "todo" and will try to fill
it in later.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char * sub
( char * const t, /* text */
char const * const s, /* search */
char const * const r, /* replace */
size_t const z ) /* bufsize */
{ char * b = t; char * q; /* b defaults to t, so it is defined when d <= 0 */
char const * p; char const * i;
size_t const ls = strlen( s );
size_t const lr = strlen( r );
long const d =( long )lr -( long )ls;

{ size_t n = 0; if( d > 0 )
{ p = strstr( t, s );
while( p ){ ++n; p = strstr( p + 1, s ); }
{ size_t const w = n * d + strlen( t )+ 1;
b = w > z ? malloc( w ) : t; }}}

if( b != t ) /* new buffer */
{ if( b )
{ p = t; q = b;
i = strstr( p, s );
while( i )
{ while( p != i )*q++ = *p++;
strcpy( q, r );
p += ls; q += lr;
i = strstr( p, s ); }
strcpy( q, p ); }}

else /* substitute in old buffer */
{ if( d < 0 ){ /* todo */ }
else if( d > 0 ){ /* todo */ }
else { /* todo */ }}

return b; }

int main( void )
{ { char t[ 8 ]= "exemple"; puts( sub( t, "e", "ABC", 8 )); }
/*{ char t[ 8 ]= "exemple"; puts( sub( t, "e", "f", 8 )); }*/
/*{ char t[ 14 ]= "ABCxABCmplABC"; puts( sub( t, "ABC", "e", 14 )); }*/
/*{ char t[ 14 ]= "ABCxABCmplABC"; puts( sub( t, "ABC", "DEF", 14 )); }*/ }
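
The three "todo" branches could be filled in by a single routine. A minimal sketch, assuming a hypothetical helper `replace_in_place` (not part of the posted code) and assuming `s` and `r` do not point into `t`: when the text grows, it first slides the text to the right by the total growth, then copies left to right, so the write position never overtakes the unread text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the thread): replaces every
   non-overlapping occurrence of s in t with r, inside the buffer t
   of z bytes. Returns 1 on success, 0 if the result would not fit
   or s is empty; t is left unchanged in that case.
   Assumes s and r do not point into t. */
int replace_in_place(char *t, const char *s, const char *r, size_t z)
{
    size_t const ls = strlen(s);
    size_t const lr = strlen(r);
    size_t const lt = strlen(t);
    size_t n = 0, gap = 0, seg;
    char *p, *q, *m;

    if (ls == 0)
        return 0;

    if (lr > ls) {
        /* Growing: count the matches to learn the total growth. */
        for (m = strstr(t, s); m != NULL; m = strstr(m + ls, s))
            ++n;
        gap = n * (lr - ls);
        if (gap / (lr - ls) != n || gap >= z || lt >= z - gap)
            return 0;               /* overflow, or buffer too small */
        /* Slide the text to the right end of the needed space. */
        memmove(t + gap, t, lt + 1);
    }

    /* Read from p, write to q; q never overtakes the unread text. */
    p = t + gap;
    q = t;
    while ((m = strstr(p, s)) != NULL) {
        seg = (size_t)(m - p);
        memmove(q, p, seg);         /* text before the match */
        q += seg;
        memcpy(q, r, lr);           /* the replacement */
        q += lr;
        p = m + ls;                 /* skip the matched text */
    }
    memmove(q, p, strlen(p) + 1);   /* tail and terminator */
    return 1;
}
```

With a 14-byte buffer holding "exemple", replacing "e" by "ABC" fits exactly. With the 8-byte buffer used in `main` above, the helper returns 0, which is the case where `sub` would fall back to `malloc`.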
 

nrk

pete said:
ptr_1 = strstr(string, sub_1);
while (ptr_1 != NULL) {
++count;
ptr_1 = strstr(ptr_1 + 1, sub_1);
}
length_1 = strlen(sub_1);
length_2 = strlen(sub_2);
new_string =
malloc(count * ((int)length_2 - (int)length_1)
+ strlen(string) + 1);

Instead of all those mental calisthenics and dubious casts for the
allocation size, I would prefer:

new_string = malloc( count * length_2 +
(strlen(string) - (count *length_1)) + 1 );

That expresses exactly what we wish to compute.

-nrk.
 

pete

nrk said:
pete wrote:

Instead of all those mental calisthenics and dubious casts for the
allocation size, I would prefer:

new_string = malloc( count * length_2 +
(strlen(string) - (count *length_1)) + 1 );

That expresses exactly what we wish to compute.

I like your way better because the types match up well.
I was also thinking of declaring
length_1 and length_2 as type ptrdiff_t
and using my same malloc argument without the casts.
 

pete

Stefan said:
pete said:
/* BEGIN sub.c */

Thanks! (I have not yet looked at your following posting.)

I rewrote your code into a separate function "sub" and passed
the size allocated for the current text buffer to that
function (parameter "z").

A new buffer "b" is allocated only if needed. Otherwise I
would like to search-and-replace in-place (within the old
buffer). I have marked that with "todo" and will try to fill
it in later.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

char * sub
( char * const t, /* text */
char const * const s, /* search */
char const * const r, /* replace */
size_t const z ) /* bufsize */
{ char * b; char * q;
char const * p; char const * i;
size_t const ls = strlen( s );
size_t const lr = strlen( r );
long const d =( long )lr -( long )ls;

{ size_t n = 0; if( d > 0 )
{ p = strstr( t, s );
while( p ){ ++n; p = strstr( p + 1, s ); }
{ size_t const w = n * d + strlen( t )+ 1;
b = w > z ? malloc( w ) : t; }}}

if( b != t ) /* new buffer */
{ if( b )
{ p = t; q = b;
i = strstr( p, s );
while( i )
{ while( p != i )*q++ = *p++;
strcpy( q, r );
p += ls; q += lr;
i = strstr( p, s ); }
strcpy( q, p ); }}

else /* substitute in old buffer */
{ if( d < 0 ){ /* todo */ }
else if( d > 0 ){ /* todo */ }
else { /* todo */ }}

return b; }

int main( void )
{ { char t[ 8 ]= "exemple"; puts( sub( t, "e", "ABC", 8 )); }
/*{ char t[ 8 ]= "exemple"; puts( sub( t, "e", "f", 8 )); }*/
/*{ char t[ 14 ]= "ABCxABCmplABC"; puts( sub( t, "ABC", "e", 14 )); }*/
/*{ char t[ 14 ]= "ABCxABCmplABC"; puts( sub( t, "ABC", "DEF", 14 )); }*/ }

Remember to store the return value of sub() in the
calling function, and check if it doesn't equal t,
so you'll know what you have to free later.
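
One way to sidestep that bookkeeping is a variant that always returns fresh storage. A minimal sketch, assuming a hypothetical function `replace_dup` (not from the thread); it trades an extra copy for a simpler ownership rule, and its size arithmetic is not overflow-checked:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant (not from the thread): always returns a newly
   allocated string, so the caller always frees the result and never
   has to compare it with t. Returns NULL on allocation failure or
   when s is empty. Size arithmetic is not overflow-checked here. */
char *replace_dup(const char *t, const char *s, const char *r)
{
    size_t const ls = strlen(s);
    size_t const lr = strlen(r);
    size_t n = 0;
    const char *p, *m;
    char *b, *q;

    if (ls == 0)
        return NULL;
    /* Count non-overlapping matches, as the copy loop replaces them. */
    for (m = strstr(t, s); m != NULL; m = strstr(m + ls, s))
        ++n;

    /* Each match removes ls bytes and adds lr bytes. */
    b = malloc(strlen(t) - n * ls + n * lr + 1);
    if (b == NULL)
        return NULL;

    p = t;
    q = b;
    while ((m = strstr(p, s)) != NULL) {
        memcpy(q, p, (size_t)(m - p));  /* text before the match */
        q += m - p;
        memcpy(q, r, lr);               /* the replacement */
        q += lr;
        p = m + ls;
    }
    strcpy(q, p);                       /* tail and terminator */
    return b;
}
```

A caller then pairs every successful call with exactly one `free`, at the cost of a copy even when the text would have fit in the original buffer.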
 

nrk

pete said:
I like your way better because the types match up well.
I was also thinking of declaring
length_1 and length_2 as type ptrdiff_t
and using my same malloc argument without the casts.

The fact that size_t is unsigned is a big pain in the butt. It always
forces me to special case an empty string, for instance. Also, ptrdiff_t
might not be a good idea. If the return of either strlen is not in the
range of a ptrdiff_t then you'll end up invoking UB. That can happen,
for instance, with a strlen implemented along the lines of:

size_t strlen(const char *s) {
    size_t ret = 0;

    while ( *s++ ) ++ret;
    return ret;
}

I am not sure what happens in the case of:

size_t strlen(const char *src) {
    const char *s = src;

    while ( *s ) ++s;

    return s - src;
}

and s - src being outside the range of a ptrdiff_t. Looks like UB to me.

-nrk.
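
If signed lengths are wanted anyway, the range can be checked before converting. A minimal sketch, assuming C99 `<stdint.h>` is available and using a hypothetical helper `signed_length_diff` (not from the thread):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores the signed difference b - a in *out,
   but only when both lengths are representable as ptrdiff_t.
   Returns 1 on success, 0 if either length is out of range. */
int signed_length_diff(size_t a, size_t b, ptrdiff_t *out)
{
    if (a > PTRDIFF_MAX || b > PTRDIFF_MAX)
        return 0;
    /* Both values are in [0, PTRDIFF_MAX], so the subtraction
       cannot overflow. */
    *out = (ptrdiff_t)b - (ptrdiff_t)a;
    return 1;
}
```

This only guards the conversion of the lengths; the pointer subtraction inside the second strlen above is a separate matter and cannot be fixed from the caller's side.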
 

pete

nrk said:
The fact that size_t is unsigned is a big pain in the butt. It always
forces me to special case an empty string, for instance.
Also, ptrdiff_t might not be a good idea.
If the return of either strlen is not in the
range of a ptrdiff_t then you'll end up invoking UB. For instance, a
strlen implemented along the lines of:

size_t strlen(const char *s) {
size_t ret = 0;

while ( *s++ ) ++ret;
return ret;
}

I am not sure what happens in the case of:

size_t strlen(const char *src) {
const char *s = src;

while ( *s ) ++s;

return s - src;
}

and s - src being outside the range of a ptrdiff_t.
Looks like UB to me.

Thank you.
I see now that the standard does explicitly mention that possibility,
though I think it's screwy.
 

anony*mouse

new_string =
malloc(count * ((int)length_2 - (int)length_1)
+ strlen(string) + 1);

It's time to read up on integer overflow bugs.

http://msdn.microsoft.com/library/en-us/dncode/html/secure04102003.asp

Proof of concept:

Consider a 64 KB (65,536-byte) input string containing only the letter 'a'.
The string to replace is also 'a'. The replacement string is also 64 KB long.

Code:
	size_t count      = 65536;
	size_t length_2   = 65536;
	size_t length_1   = 1;
	size_t stringlen  = 65536;

	size_t result = count * ((int)length_2 - (int)length_1) 
	                   + stringlen + 1;

	printf("%lu\n", (unsigned long)result);

Output (with a 32-bit size_t): 1

Not good!
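
One way to guard against this is to check each step of the size computation before doing it. A minimal sketch, assuming C99 `<stdint.h>` for `SIZE_MAX` and a hypothetical helper `replaced_size` (not from the thread), where `count` is the number of non-overlapping matches:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes
       text_len - count * old_len + count * new_len + 1
   into *out. Returns 1 on success, 0 if the inputs are inconsistent
   or any step would wrap around. */
int replaced_size(size_t text_len, size_t count,
                  size_t old_len, size_t new_len, size_t *out)
{
    size_t kept, added;

    /* The matches cannot cover more than the whole text. */
    if (old_len != 0 && count > text_len / old_len)
        return 0;
    kept = text_len - count * old_len;  /* cannot wrap after the check */

    /* count * new_len must not wrap. */
    if (new_len != 0 && count > SIZE_MAX / new_len)
        return 0;
    added = count * new_len;

    /* kept + added + 1 must not wrap. */
    if (kept == SIZE_MAX || added > SIZE_MAX - kept - 1)
        return 0;
    *out = kept + added + 1;
    return 1;
}
```

With the numbers from the proof of concept and a 32-bit size_t, the second check fails (65536 > SIZE_MAX / 65536), so the caller gets an error instead of a 1-byte allocation.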
 
