Trying to figure out how to dynamically re-allocate strings while concatenating

  • Thread starter Shriramana Sharma
  • Start date
S

Shriramana Sharma

Hello. I am not a professional programmer and only program now and then. So please forgive me if my question sounds silly or naive.

I am trying to find out how to dynamically lengthen a string while concatenating in C, as it is easily possible to do so in C++ or Python (two other languages I have used).

I patched up the following code:

Code:
# include <stdio.h>
# include <stdlib.h>
# include <string.h>

void strappend ( char * a, char * b )
{
a = ( char * ) realloc ( a, strlen ( a ) + strlen ( b ) ) ;
strcat ( a, b ) ;
}

void main ( void )
{
char * name = "my name is" ;
strappend ( name, " this or " ) ;
strappend ( name, "that." ) ;
printf ( "%s\n", name ) ;
}

But it aborts with an error saying: "./strappend: realloc(): invalid pointer: 0x000000000040073c ***". No matter how many times I execute it that hex number remains the same.

Can anyone kindly point me to what I am doing wrong with realloc? I have never really used malloc/realloc stuff so I am probably botching things up somewhere but I'm not sure where.

Thanks!
 
X

Xavier Roche

Le 04/08/2012 16:12, Shriramana Sharma a écrit :
a = ( char * ) realloc ( a, strlen ( a ) + strlen ( b ) ) ;
.....................................^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+1 for the ending \0
 
X

Xavier Roche

Le 04/08/2012 16:12, Shriramana Sharma a écrit :
char * name = "my name is" ;

Oh, and you can't do that, too.

char * name = strdup("my name is");

Otherwise the string will be a constant, not a valid malloc() pointer.

(building with -Werror=write-strings might be a good idea)
 
S

Shriramana Sharma

Hey very nice thank you very much for that guidance! :) It's now working.
 
H

Heinrich Wolf

Shriramana Sharma said:
Hello. I am not a professional programmer and only program now and then.
So please forgive me if my question sounds silly or naive.

I am trying to find out how to dynamically lengthen a string while
concatenating in C, as it is easily possible to do so in C++ or Python
(two other languages I have used).

I patched up the following code:

Code:
# include <stdio.h>
# include <stdlib.h>
# include <string.h>

void strappend ( char * a, char * b )
{
a = ( char * ) realloc ( a, strlen ( a ) + strlen ( b ) ) ;
strcat ( a, b ) ;
}[/QUOTE]

You are changing a in this function, but a is restored on leaving it. So you
need to declare
void strappend ( char ** a, char * b )

strlen ( a ) + strlen ( b )
is not enough! You need to add 1 char for the terminating 0.
[QUOTE]
void main ( void )
{
char * name = "my name is" ;
strappend ( name, " this or " ) ;
strappend ( name, "that." ) ;
printf ( "%s\n", name ) ;
}

You are reallocating memory in strappend, but not freeing it. This short
program is ok. The reallocated "name" is still in reach and automatically
freed on exiting your program. But be careful in a more complex program!

I doubt that realloc is allowed on the initial value of "name", since it was
not malloc'ed. In fact it is a pointer aiming at constant memory segment.
Maybe
char * name = strdup("my name is");
works better.
 
E

Eric Sosman

Hello. I am not a professional programmer and only program now and then. So please forgive me if my question sounds silly or naive.

I am trying to find out how to dynamically lengthen a string while concatenating in C, as it is easily possible to do so in C++ or Python (two other languages I have used).

I patched up the following code:

Code:
# include <stdio.h>
# include <stdlib.h>
# include <string.h>

void strappend ( char * a, char * b )[/QUOTE]

Aside: `strappend' is a reserved name (anything starting with
`str...' is reserved), and it's conceivable you could get into trouble.
Change the name.
[QUOTE]
{
	a = ( char * ) realloc ( a, strlen ( a ) + strlen ( b ) ) ;[/QUOTE]

The end of a string is marked by a '\0' character, which is *not*
included in the string's length.  The string "Hello" has length five,
but requires six bytes: 'H', 'e', 'l', 'l', 'o', '\0'.  You are
allocating one byte less than you actually need.

[QUOTE]
strcat ( a, b ) ;[/QUOTE]

Functions that acquire resources can fail.  When realloc() cannot
find enough memory to satisfy your request, it returns NULL -- and if
you blindly use that NULL as the first argument to strcat(), you'll
be in Deep Trouble.  Get in the habit of always checking the returned
value before using it -- and this goes for any function that acquires
a resource, whether you're requesting memory or trying to open a file
or anything: If the function can fail, you should check for failure.

But even if realloc() and strcat() work fine, now what?  You began
with a pointer `a', which is a local variable inside strappend().  You
use realloc() to change that local variable; it changes.  But how will
your caller learn of the change?  Answer: It won't.  What happens in
strappend() stays in strappend().  The call to strappend() provided an
initial value for `a', but that's all: There is no "reverse path" from
`a' back to the expression that provided the value.  You will have to
get the value back to the caller by some other route, perhaps by
returning it as the value of the strappend() function (there are other
ways, but this seems simplest).
[QUOTE]
}

void main ( void )[/QUOTE]

Aside: `int main(void)'.
[QUOTE]
{
	char * name = "my name is" ;
	strappend ( name, " this or " ) ;[/QUOTE]

No, no, no!  You can only RE-allocate memory that was dynamically
allocated to begin with, that is, memory that you earlier obtained
from realloc() or malloc() or calloc().  The variable `name' points
to memory that is statically allocated, not dynamic: If you attempt to
realloc() that non-dynamic memory, you get "undefined behavior."  (It
is likely that the specific message you're seeing is due to exactly
this problem.)
[QUOTE]
strappend ( name, "that." ) ;
	printf ( "%s\n", name ) ;[/QUOTE]

As mentioned earlier, `name' has not been affected by what went
on inside strappend().
[QUOTE]
}

But it aborts with an error saying: "./strappend: realloc(): invalid pointer: 0x000000000040073c ***". No matter how many times I execute it that hex number remains the same.

Can anyone kindly point me to what I am doing wrong with realloc? I have never really used malloc/realloc stuff so I am probably botching things up somewhere but I'm not sure where.

The comp.lang.c Frequently Asked Questions (FAQ) page at
<http://www.c-faq.com/> would make good reading. Section 4 is
all about pointers -- in fact, Question 4.8 specifically addresses
one of the problems in your code.

The C programming languages has many strengths, but hand-holding
is not among them. You need to be something of a shade-tree mechanic
or tinkerer (or even gearhead) to use C well. If you're an occasional
programmer and not much interested in fiddly low-level details, you
may be happier with a higher-level, more coddling language. I think
Mark Twain wrote that one lifetime is not long enough to master
German; something similar could be said of C.
 
B

Ben Bacarisse

Shriramana Sharma said:
Hey very nice thank you very much for that guidance! :) It's now
working.

Take care. Many C programs appear to work, right up to the point where
they don't! I hope you have taken all the advice, not just the advice
in the message to which you replied.

Speaking of which, strdup is not a standard C function, so you should
check that it will available everywhere your program might run (or at
least everywhere your program might be compiled and linked).
 
S

Stefan Ram

Shriramana Sharma said:
Can anyone kindly point me to what I am doing wrong with realloc?

The result of an attempt should always be checked according
to me. The »free« below is not really required for this
small program, but would be helpful in many cases, where the
code is repeatedly executed in a program.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char * strapp( char * * const a, char const * const b )
{ *a = realloc( *a, 1 + strlen( *a )+ strlen( b ));
if( *a )strcat( *a, b );
return *a; }

int main( void )
{ char const * const name_ = "my name is";
char * name = malloc( strlen( name_ )+ 1 );
if( name )
{ strcpy( name, name_ );
if( strapp( &name, " this or " ))
if( strapp( &name, "that." ))
printf( "%s\n", name );
free( name ); }}
 
E

Eric Sosman

[...]
I see from your call to the free function,
that you intend to free the allocated memory within main.

If either call to strapp returns a null pointer,
then your call to free, will free a null pointer instead.

... which is harmless, defined as a no-op.
 
E

Eric Sosman

[...]
One thing that nobody else has mentioned,
is that it is good programming practice
to assign the return value of realloc
to a temporary pointer first,
and only to assign that value to your working pointer
if the temporay pointer has been tested
and found to be not a null pointer.

Otherwise you lose
the value of the pointer to the allocated memory,
which can cause a memory leak.

Note that his realloc() call and value assignment were
inside a function that received the pointer value as an
argument. That is, the caller still owns (or can compute)
a pointer to the allocated memory.

Still, (1) the O.P.'s code was pretty badly messed up
already and we're all in the business of imagining different
possible corrections to it, and (2) your advice to assign
the value of realloc(p,...) to something other than p is
usually excellent.
 
K

Keith Thompson

pete said:
One thing that nobody else has mentioned, is that it is good
programming practice to assign the return value of realloc to a
temporary pointer first, and only to assign that value to your working
pointer if the temporay pointer has been tested and found to be not a
null pointer.

Otherwise you lose the value of the pointer to the allocated memory,
which can cause a memory leak.
[...]

That is, of course, very good advice.

On the other hand, whether it matters depends on what you intend to do
once you've detected an allocation failure.

char *buf = malloc(100);
if (buf == NULL) {
/* malloc failed */
}

char *temp = realloc(buf, 200);
if (temp == NULL) {
/* realloc failed */
}

So how can you recover from a failure to allocate memory? You could
just try again, but it will probably just fail again. You could fall
back to a less memory-intensive, but perhaps slower, algorithm, but
I don't think I've ever seen code in the real world that does that.

A very common response to a memory allocation failure is to
terminate the program, ideally after doing some cleanup (such as
saving temporary files, restoring the display to a sane state,
printing and/or logging an error message, etc.) But if the
program is about to terminate anyway, then losing the pointer to
your original buffer isn't going to make much difference, and you
probably might as well write:

buf = malloc(buf, 200);

On the other other hand, cleaning up after yourself is a good habit, and
it's possible that calling free(buf) will free up memory that's needed
by your cleanup code.
 
S

Shriramana Sharma

Hello -- I hadn't expected quite so many replies, but obviously memory management is an important thing so you have taken care to give me detailed advice! Many thanks to all of you! As it is a lot for me to digest, I will take some time to read it and understand it. Thanks again.
 
J

Jens Gustedt

Am 05.08.2012 02:00, schrieb Keith Thompson:
That is, of course, very good advice.

On the other hand, whether it matters depends on what you intend to do
once you've detected an allocation failure.

char *buf = malloc(100);
if (buf == NULL) {
/* malloc failed */
}

char *temp = realloc(buf, 200);
if (temp == NULL) {
/* realloc failed */
}

So how can you recover from a failure to allocate memory? You could
just try again, but it will probably just fail again. You could fall
back to a less memory-intensive, but perhaps slower, algorithm, but
I don't think I've ever seen code in the real world that does that.

A very common response to a memory allocation failure is to
terminate the program, ideally after doing some cleanup (such as
saving temporary files, restoring the display to a sane state,
printing and/or logging an error message, etc.) But if the
program is about to terminate anyway, then losing the pointer to
your original buffer isn't going to make much difference, and you
probably might as well write:

buf = malloc(buf, 200);

On the other other hand, cleaning up after yourself is a good habit, and
it's possible that calling free(buf) will free up memory that's needed
by your cleanup code.

Very well stated, this error checking for valid dynamic allocations is
often overestimated, treated almost religiously.

If you don't know how to recover from such a failure other than to
abort the program, there is really no point in it. (But implying also
that for specialized code where you know what to do, you should do
it.)

What these things often do on the other hand is code obfuscation. I
think in many cases having simple and readable code is much more
important.

Jens
 
B

BartC

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define u32 unsigned
#define u8 unsigned char
#define P printf

u32 __stdcall asmcat(u8** dove, u8* cosa);

int main(void)
{u8 *name=0;
u32 res;
res=asmcat(&name, "123 prova");
P("[%s] len=%u len1=%u\n", (name?name:"NULL"), res, (u32) strlen(name));
res=asmcat(&name, " 456 ");
P("[%s] len=%u len1=%u\n", (name?name:"NULL"), res, (u32) strlen(name));
free(name);
return 0;
}
-----------------
; nasmw -fobj asmfile.asm
; bcc32 -v name.c asmfile.obj
section _DATA use32 public class=DATA

extern _realloc
global asmcat

section _BSS use32 public class=BSS
section _TEXT use32 public class=CODE

; u32 __stdcall asmcat(u8** dove, u8* cosa)
; 0k,4j,8i,12b,16ra, 20P_dove, 24P_cosa
align 4
asmcat:
push ebx
push esi
push edi
push ebp
;iint3
mov edi, dword[esp+ 20]
mov esi, dword[esp+ 24]
cmp edi, 0
je .e
cmp esi, 0
je .e
jmp short .1
.e: mov eax, -1
stc
jmp .z
.1: mov edx, esi
jmp short .3
.2: inc edx
jz .e
.3: cmp byte[edx], 0
jne .2
sub edx, esi
mov ebp, edx
mov edx, [edi]
mov ebx, 0
cmp edx, 0
je .6
jmp short .5
.4: inc edx
jz .e
.5: cmp byte[edx], 0
jne .4
sub edx, [edi]
mov ebx, edx
.6: mov ecx, ebx
add ecx, ebp
jc .e
add ecx, 2
jc .e
mov edx, [edi]
push ecx
push edx
call _realloc
add esp, 8
cmp eax, 0
je .e
cmp dword[edi], 0
jne .7
mov byte[eax], 0
.7: mov [edi], eax
mov edx, eax
jmp short .9
.8: inc edx
.9: cmp byte[edx], 0
jne .8
mov ecx, edx
sub ecx, eax
cmp ecx, ebx
jne .e
mov eax, 0
add ebx, ebp
jc .e
add ebp, 2
jc .e
.a: mov al, [esi]
mov [edx], al
inc esi
inc edx
dec ebp
jz .e
cmp eax, 0
jne .a
mov eax, ebx
clc
.z:
pop ebp
pop edi
pop esi
pop ebx
ret 8

ret
-----------------

section _DATA use32 public class=DATA

extern _realloc
global asmcat

section _BSS use32 public class=BSS
section _TEXT use32 public class=CODE

; u32 __stdcall asmcat(u8** dove, u8* cosa)
; 0k,4j,8i,12b,16ra, 20P_dove, 24P_cosa
align 4
asmcat:
<b,i,j,k
;iint3
j=^20|i=^24|j==0#.e|i==0#.e|#.1
.e: a=-1 |stc |##.z
.1: r= i |#.3
.2: ++r |jz .e
.3: B*r#.2
r-=i |k=r
r=*j |b=0|r==0#.6|#.5
.4: ++r |jz .e
.5: B*r#.4
r-=*j|b=r
.6: c=b |c+=k|jc .e|c+=2|jc .e
r=*j |_realloc<(r, c)|a==0#.e
D*j==0!#.7|B*a=0
.7: *j =a|r=a |#.9
.8: ++r
.9: B*r#.8
c=r|c-=a|c!=b#.e|a=0|b+=k|jc .e|k+=2|jc .e
.a: al=*i|*r=al
++i |++r |--k|jz .e|a#.a
a=b |clc
.z:
ret 8

ret


in few words: i prefer the above version even
it can be seem a little covoluted-difficult

It's more than a little convoluted. The OP said they'd also used Python:

name = "my name is"

name += " this or "
name += "that."

print name

Do you still prefer your version (in whatever language it's in)?

The nearest you can get in C is to forget about dynamic allocation, and use
an upper limit for the size of the name. In lot of cases (99% of the ones
where I need to use string append), this is quite adequate, when you are
confident the results will not overflow:

#include <stdio.h>
#include <string.h>

#define maxnamesize 250

int main(void){
char name[maxnamesize];

strcpy(name,"my name is");
strcat(name," this or ");
strcat(name,"that.");

puts(name);

}

sprintf() can also be used for building strings this way. Otherwise using
all this memory management to, for example, build a string a character at a
time, is overkill. And will generate clutter if the result needs checking in
the caller's code.
 
B

Ben Bacarisse

Jens Gustedt said:
Am 05.08.2012 02:00, schrieb Keith Thompson:

Very well stated, this error checking for valid dynamic allocations is
often overestimated, treated almost religiously.

If you don't know how to recover from such a failure other than to
abort the program, there is really no point in it.

I don't entirely agree, at least not literally. Even in the "no
recovery" case a test for a NULL return, an explicit message saying
what's wrong, and an immediate abort is better than doing nothing. This
is partly because it's so easy: a two-line wrapper around malloc is all
that's needed, and you only need to write it once in a lifetime! And if
you ever port to a system where low-numbered addresses are valid, the
pay-off is huge.

<snip>
 
H

Heinrich Wolf

....
The nearest you can get in C is to forget about dynamic allocation, and
use
an upper limit for the size of the name. In lot of cases (99% of the ones
where I need to use string append), this is quite adequate, when you are
confident the results will not overflow:

This is what I did in almost every program in my job.
#include <stdio.h>
#include <string.h>

#define maxnamesize 250

int main(void){
char name[maxnamesize];

strcpy(name,"my name is");
strcat(name," this or ");
strcat(name,"that.");

puts(name);

}

But strcat() is dangerous. It may result in a buffer overflow. In the
programs, which I made in my job, I did it like that:

#define STRNCAT(a, b) strncat(a, b, sizeof(a) - strlen(a) - 1)
strcpy(name, "");
STRNCAT(name, "my name is");
STRNCAT(name, " this or ");
STRNCAT(name, "that.");

It works great and you need not clutter your code with error handling. It is
just consuming some CPU cycles more than efficient code, which would not
step through the length of a two times.
 
J

Jens Gustedt

Am 05.08.2012 13:49, schrieb Ben Bacarisse:
I don't entirely agree, at least not literally. Even in the "no
recovery" case a test for a NULL return, an explicit message saying
what's wrong, and an immediate abort is better than doing nothing. This
is partly because it's so easy: a two-line wrapper around malloc is all
that's needed, and you only need to write it once in a lifetime! And if
you ever port to a system where low-numbered addresses are valid, the
pay-off is huge.

Right, sometimes you gain a good deal by encapsulating system calls in
macros that do the error handling. I have that even systematically by
using the same name for the macro as for the function it encapsulates,
so it still reads

toto * a = malloc(sizeof toto[n]);

This helps to concentrate on the logic of the program instead of being
overwhelmed by error handling.

Behind the scenes then there is a simple try/catch mechanism
(implemented with setjmp) for the cases where I want to handle errors
more nicely than by abort.

As I don't program on too much for constraint systems, with my setting
on my platform the one for malloc never triggered :)

Much more important and common are all the other library calls, that
many people forget to check.

Jens
 
B

Ben Bacarisse

Heinrich Wolf said:
...
The nearest you can get in C is to forget about dynamic allocation,
and use
an upper limit for the size of the name. In lot of cases (99% of the ones
where I need to use string append), this is quite adequate, when you are
confident the results will not overflow:

This is what I did in almost every program in my job.
#include <stdio.h>
#include <string.h>

#define maxnamesize 250

int main(void){
char name[maxnamesize];

strcpy(name,"my name is");
strcat(name," this or ");
strcat(name,"that.");

puts(name);

}

But strcat() is dangerous. It may result in a buffer overflow. In the
programs, which I made in my job, I did it like that:

#define STRNCAT(a, b) strncat(a, b, sizeof(a) - strlen(a) - 1)
strcpy(name, "");
STRNCAT(name, "my name is");
STRNCAT(name, " this or ");
STRNCAT(name, "that.");

It works great and you need not clutter your code with error
handling. It is just consuming some CPU cycles more than efficient
code, which would not step through the length of a two times.

Unfortunately it goes silently wrong if the code refers to a point
rather than to an array (e.g. if you move the code into a function where
name is a parameter, it all goes wrong).

You can guard against that using tricks like this:

#define STRNCAT(a, b) \
(am_i_an_array(&a), strncat(a, b, sizeof(a) - strlen(a) - 1))

static inline void am_i_an_array(char (*ap)[]) { return; }

though I feel sure there must be neater ways to it. C11's _Generic, if
it gets widely implemented, would be one way.
 
B

BartC

io_x said:
"BartC" <[email protected]> ha scritto nel messaggio

adeguate... there is no way, for the king, in informatic :)

A str_append() function as proposed by the OP, I tested to be about 3 times
*slower* than Python. Usually C is some fifty times faster than Python.

But mainly it introduces extra complexity, and opportunities for bugs and
memory leaks, if it is used where a fixed string size is viable.
#include <stdio.h>
#include <string.h>

#define maxnamesize 250

int main(void){
char name[maxnamesize];

and if the size is >250? you have to change the number in the file...
and if the string is going from extern?

If a fixed upper limit is not possible, then you use a different approach.
But only if necessary.

(What if you need to do arithmetic on two integers, do you worry about the
result not fitting into 32 or 64 bits? You don't use an arbitrary precision
integer library just in case! In Python of course numbers will just get
wider as needed. In C you need to use a more primitive approach, to retain
an advantage.)
 
K

Keith Thompson

Oops, what I *meant* was

buf = realloc(buf, 200);
I don't entirely agree, at least not literally. Even in the "no
recovery" case a test for a NULL return, an explicit message saying
what's wrong, and an immediate abort is better than doing nothing. This
is partly because it's so easy: a two-line wrapper around malloc is all
that's needed, and you only need to write it once in a lifetime! And if
you ever port to a system where low-numbered addresses are valid, the
pay-off is huge.

The point I was making upthread was specifically about realloc() (and of
course I messed that up by typoing "malloc" in place of "realloc").

Assigning the result of realloc() to the pointer object given as the
first argument is considered a bad idea because, if the realloc() call
fails and returns a null pointer, you've lost your only pointer to the
original buffer. My point was merely that losing that pointer is bad
only if you would have made use of it. If your response to an realloc()
failure is going to be to abort the program anyway, then you might as
well write

buf = realloc(buf, new_size);

I did *not* mean to suggest omitting the check for a null pointer. If
an allocation call fails, you want to deal with it immediately -- even
if you're just going to abort the program.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,768
Messages
2,569,575
Members
45,053
Latest member
billing-software

Latest Threads

Top