to malloc or not to malloc?


kj

I used to program C all the time, but that was years ago. Since
then, continued use of languages like Java that take care of memory
management has made me soft and flabby. Now I have to code
something in C, and I get the feeling that I'm courting memory
leaks and segmentation faults at every turn...

I'm writing a (Flex/Bison) parser that reads some serialized
representation of an arbitrarily complex C data structure, and
cranks out the "revived" data structure at the other end. So this
code must generate a ton of C data elements (integers, floats,
strings, etc.), and compose them in arbitrarily convoluted ways.

Suppose I have this structure

typedef struct {
    int bar;
    baz *frobozz;
} foo;

to represent one of these data elements. Now, I want to define a
constructor that will produce these things.

Here's where I feel particularly rusty: to malloc or not to malloc.

More specifically, which of these two general strategies would be
better:

foo new_foo(int bar, baz *frobozz) {
    foo ret;
    ret.bar = bar;
    ret.frobozz = copy_a_baz(frobozz);
    return ret;
}

*or*

foo *new_foo(int bar, baz *frobozz) {
    foo *ret;
    ret = (foo *)malloc(sizeof *ret);
    if (!ret) kick_up_a_fuss();
    ret->bar = bar;
    ret->frobozz = copy_a_baz(frobozz);
    return ret;
}

My gut feeling is that the second one is the way to go, but I
confess that I could not provide a very solid defense for this
preference. At best I'd mumble something indistinct about stacks
and heaps, and change the subject...

What really throws me off is considering the simpler case of
implementing a constructor for, e.g., integers. In this case, the
gut response I described above feels all wrong. For example, it
feels weird to implement new_int like this:

int *new_int(char *int_as_string) {
    int *ret;
    ret = (int *)malloc(sizeof *ret);
    if (!ret) kick_up_a_fuss();
    *ret = (int)strtol(int_as_string, NULL, 10);
    return ret;
}

Instead, in this case my instinct would be to simply write

int new_int(char *int_as_string) {
    int ret = (int)strtol(int_as_string, NULL, 10);
    return ret;
}

or better yet

int new_int(char *int_as_string) {
    return (int)strtol(int_as_string, NULL, 10);
}

Okay, my state of confusion should be evident by now. Any words
of wisdom would be much appreciated.

TIA!

kynn


Ben Pfaff

kj said:
More specifically, which of these two general strategies would be
better:

foo new_foo(int bar, baz *frobozz) {
    foo ret;
    ret.bar = bar;
    ret.frobozz = copy_a_baz(frobozz);
    return ret;
}

Occasionally this style is very handy, because it becomes
possible to use the function in a larger expression, especially
if the type that the function initializes does not contain
anything that needs to be destroyed. For example,
    print_string(substring("abcdef", 2, 3));
is more convenient than:
    s = substring("abcdef", 2, 3);
    print_string(s);
    free_substring(s);
But in general I avoid writing functions that return structures,
if only because the calling convention for returning a structure
on many platforms is not very efficient; it often involves one or
more structure copies.

foo *new_foo(int bar, baz *frobozz) {
    foo *ret;
    ret = (foo *)malloc(sizeof *ret);
    if (!ret) kick_up_a_fuss();
    ret->bar = bar;
    ret->frobozz = copy_a_baz(frobozz);
    return ret;
}

This style is better from a calling convention point of view, but
it may be less efficient than necessary because it forces the use
of malloc().

The style that I prefer is this:

void foo_init(foo *foo, int bar, baz *frobozz)
{
    foo->bar = bar;
    foo->frobozz = copy_a_baz(frobozz);
}

Then the caller can allocate the data with malloc(), or
statically, or as a local variable, which is better all around,
in my opinion.
 

cr88192

Ben Pfaff said:
Occasionally this style is very handy, because it becomes
possible to use the function in a larger expression, especially
if the type that the function initializes does not contain
anything that needs to be destroyed. For example,
    print_string(substring("abcdef", 2, 3));
is more convenient than:
    s = substring("abcdef", 2, 3);
    print_string(s);
    free_substring(s);
But in general I avoid writing functions that return structures,
if only because the calling convention for returning a structure
on many platforms is not very efficient; it often involves one or
more structure copies.

yep...

as well, if one uses multiple compilers, this is an area where compilers
often disagree as to just how it is done... (e.g., MSVC<->GCC struct-return
calls are very likely to either transfer garbage or blow up...).

This style is better from a calling convention point of view, but
it may be less efficient than necessary because it forces the use
of malloc().

The style that I prefer is this:

void foo_init(foo *foo, int bar, baz *frobozz)
{
    foo->bar = bar;
    foo->frobozz = copy_a_baz(frobozz);
}

Then the caller can allocate the data with malloc(), or
statically, or as a local variable, which is better all around,
in my opinion.


just my quick comment:
for a lot of my compiler work, since a whole lot of data tends to be
created in a short amount of time and then typically destroyed in bulk
at the end, an effective strategy I have found is to wrap malloc in a
pseudo-allocator and use it for most of the temporary data structures.

general style is like this:
    if heap is empty:
        initialize heap by allocating a chunk of memory (say, 1MB);
        set up rover and end_pointer;
    for each alloc, check that (rover+size)<=end_pointer;
        if it fails:
            allocate another chunk (say, another 1MB), adding it to the
            end of the chunk list;
            set up rover and end_pointer for the new chunk.
        then copy the rover to the temp_pointer;
        add size to rover and re-align rover (or, pad up size and add to rover);
        return the temp_pointer.

(note: large allocs may be handled specially, such as allocating a
custom-fit chunk of memory, but this is not so difficult to add or deal
with).


so, allocation is typically very fast, and the per-allocation overhead is
typically no more than the alignment padding (16 is usually a good value IMO).

when one is done and no longer needs any of this, one can free the
list of chunks and leave an empty heap.

(then, for the next pass, the cycle repeats again...).

note that the output is typically either stored in, or copied to,
non-temporary memory...


note that another (technically simpler) option is to allocate a single, much
larger block, but I would recommend against this, mostly because:
- if it overflows, then crap starts breaking, fast...
- simply allocating and initializing such a chunk of memory is time-consuming
  (it is kind of lame if the 'malloc' takes more time than the whole rest of
  the compilation process...).
- this is not very good practice IMO, as other parts of the app may need this
  memory and/or address space as well...


so, incrementally allocating smaller chunks is better, both for making it
work reliably and for performance...

note that it also works well for some kinds of interpreters, especially if
the interpreter is used for a similar purpose (namely, highly burst-based
activity, but with little need for a persistent heap).


similarly, one can also make use of a general-purpose garbage collector (as
opposed to a custom strategy), but I would recommend against this, as this
style of usage typically does not mix well with general-purpose conservative
GCs (rapid bursts of allocation that create lots of garbage all at once are
not an ideal usage pattern for most GCs...).

(I had used a general purpose GC for the compiler before, and will note that
using a customized strategy works much better as far as performance and
reliability goes...).


but, anyways, this is just one of many allocation strategies I have
discovered over the years, each useful for its own little thing...


note: if lots of strings are to be used, it may also make sense to merge
them; the gains from merging the strings (if done efficiently) often
more than make up for the overhead of doing so (merged strings ->
faster code, and potentially drastically reduced memory overhead...).

....

 

Fred

More specifically, which of these two general strategies would be
better:

foo new_foo(int bar, baz *frobozz) {
    foo ret;
    ret.bar = bar;
    ret.frobozz = copy_a_baz(frobozz);
    return ret;
}

*or*

foo *new_foo(int bar, baz *frobozz) {
    foo *ret;
    ret = (foo *)malloc(sizeof *ret);
    if (!ret) kick_up_a_fuss();
    ret->bar = bar;
    ret->frobozz = copy_a_baz(frobozz);
    return ret;
}
<snip>

The second is the ONLY one that will work.
In the first one, you return a local variable that goes
out of scope immediately.
 

lndresnick

<snip>

The second is the ONLY one that will work.
In the first one, you return a local variable that goes
out of scope immediately.

Nope, will return a copy of the struct. The prohibition you mention
is against returning pointers to automatic objects. Doesn't apply
here.

-David
 

Fred

Nope, will return a copy of the struct. The prohibition you mention
is against returning pointers to automatic objects. Doesn't apply
here.

Oops, I misread it as returning a foo *.
 
