to malloc or not to malloc?

Discussion in 'C Programming' started by kj, Jun 4, 2009.

  1. kj

    kj Guest

    I used to program in C all the time, but that was years ago. Since
    then, continued use of languages like Java that take care of memory
    management has made me soft and flabby. Now I have to code
    something in C, and I get the feeling that I'm courting memory
    leaks and segmentation faults at every turn...

    I'm writing a (Flex/Bison) parser that reads some serialized
    representation of an arbitrarily complex C data structure, and
    cranks out the "revived" data structure at the other end. So this
    code must generate a ton of C data elements (integers, floats,
    strings, etc.), and compose them in arbitrarily convoluted ways.

    Suppose I have this structure

    typedef struct {
        int bar;
        baz *frobozz;
    } foo;

    to represent one of these data elements. Now, I want to define a
    constructor that will produce these things.

    Here's where I feel particularly rusty: to malloc or not to malloc.

    More specifically, which of these two general strategies would be
    better:

    foo new_foo(int bar, baz *frobozz) {
        foo ret;
        ret.bar = bar;
        ret.frobozz = copy_a_baz(frobozz);
        return ret;
    }

    *or*

    foo *new_foo(int bar, baz *frobozz) {
        foo *ret;
        ret = malloc(sizeof *ret);
        if (!ret) kick_up_a_fuss();
        ret->bar = bar;
        ret->frobozz = copy_a_baz(frobozz);
        return ret;
    }

    My gut feeling is that the second one is the way to go, but I
    confess that I could not provide a very solid defense for this
    preference. At best I'd mumble something indistinct about stacks
    and heaps, and change the subject...

    What really throws me off is to consider the simpler case of
    implementing a constructor for, e.g., integers. In this case, the
    gut response I described above feels all wrong. For example, it
    feels weird to implement new_int like this:

    int *new_int(char *int_as_string) {
        int *ret;
        ret = (int *)malloc(sizeof *ret);
        if (!ret) kick_up_a_fuss();
        *ret = (int)strtol(int_as_string, NULL, 10);
        return ret;
    }

    Instead, in this case my instinct would be to simply write

    int new_int(char *int_as_string) {
        int ret = (int)strtol(int_as_string, NULL, 10);
        return ret;
    }

    or better yet

    int new_int(char *int_as_string) {
        return (int)strtol(int_as_string, NULL, 10);
    }

    Okay, my state of confusion should be evident by now. Any words
    of wisdom would be much appreciated.

    TIA!

    kynn

    --
     
    kj, Jun 4, 2009
    #1

  2. kj

    Ben Pfaff Guest

    kj <> writes:

    > More specifically, what would be better of these two general
    > strategies:
    >
    > foo new_foo(int bar, baz *frobozz) {
    > foo ret;
    > ret.bar = bar;
    > ret.frobozz = copy_a_baz(frobozz);
    > return ret;
    > }


    Occasionally this style is very handy, because it becomes
    possible to use the function in a larger expression, especially
    if the type that the function initializes does not contain
    anything that needs to be destroyed. For example,
    print_string(substring("abcdef", 2, 3));
    is more convenient than:
    s = substring("abcdef", 2, 3);
    print_string(s);
    free_substring(s);
    But in general I avoid writing functions that return structures,
    if only because the calling convention for returning a structure
    on many platforms is not very efficient; it often involves one or
    more structure copies.
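
    A minimal sketch of the convenience described above, assuming a
    hypothetical substr type that only points into an existing string and
    therefore owns nothing that needs freeing. The names substr,
    substring, and print_string are illustrative, not from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical view type: refers to part of an existing string, owns nothing. */
typedef struct {
    const char *start;
    size_t len;
} substr;

/* Returns the struct by value, so there is nothing for the caller to free. */
static substr substring(const char *s, size_t offset, size_t len)
{
    substr r;
    size_t n;
    assert(s != NULL);
    n = strlen(s);
    if (offset > n) offset = n;              /* clamp to the end of the string */
    if (len > n - offset) len = n - offset;  /* clamp the length as well */
    r.start = s + offset;
    r.len = len;
    return r;
}

/* Takes the temporary directly, so the two calls nest in one expression. */
static void print_string(substr s)
{
    printf("%.*s\n", (int)s.len, s.start);
}
```

    With these definitions, print_string(substring("abcdef", 2, 3)); is a
    single statement with no cleanup step. The view stays valid only as
    long as the underlying string does.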

    > foo *new_foo(int bar, baz *frobozz) {
    > foo *ret;
    > ret = (foo *)malloc(sizeof *ret);
    > if (!ret) kick_up_a_fuss();
    > ret->bar = bar;
    > ret->frobozz = copy_a_baz(frobozz);
    > return ret;
    > }


    This style is better from a calling convention point of view, but
    it may be less efficient than necessary because it forces the use
    of malloc().

    The style that I prefer is this:

    void foo_init(foo *f, int bar, baz *frobozz)
    {
        f->bar = bar;
        f->frobozz = copy_a_baz(frobozz);
    }

    Then the caller can allocate the data with malloc(), or
    statically, or as a local variable, which is better all around,
    in my opinion.
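
    As a rough illustration of that flexibility, the sketch below runs one
    init function over a static object, a local variable, and a malloc'd
    object. The definition of baz, the body of copy_a_baz, the int return
    value, and foo_destroy are all assumptions made for the example; the
    thread does not define them.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the thread's baz type (an assumption). */
typedef struct { int value; } baz;

typedef struct {
    int bar;
    baz *frobozz;
} foo;

/* Assumed deep copy: a malloc'd duplicate, or NULL on failure. */
static baz *copy_a_baz(const baz *src)
{
    baz *dst = malloc(sizeof *dst);
    if (dst) *dst = *src;
    return dst;
}

/* Initializes storage the caller already owns; returns 0 on success. */
static int foo_init(foo *f, int bar, const baz *frobozz)
{
    assert(f != NULL && frobozz != NULL);
    f->bar = bar;
    f->frobozz = copy_a_baz(frobozz);
    return f->frobozz ? 0 : -1;
}

/* Releases what foo_init acquired, but not the foo itself. */
static void foo_destroy(foo *f)
{
    free(f->frobozz);
    f->frobozz = NULL;
}

static foo static_foo;                          /* static storage */

/* Returns the sum of the three bar fields, or -1 if any step failed. */
static int demo_three_placements(void)
{
    baz b = { 7 };
    foo local_foo;                              /* automatic storage */
    foo *heap_foo = malloc(sizeof *heap_foo);   /* allocated storage */
    int ok, sum = -1;

    if (!heap_foo) return -1;
    ok = foo_init(&static_foo, 1, &b) == 0;
    ok = foo_init(&local_foo, 2, &b) == 0 && ok;
    ok = foo_init(heap_foo, 3, &b) == 0 && ok;
    if (ok) sum = static_foo.bar + local_foo.bar + heap_foo->bar;

    foo_destroy(&static_foo);
    foo_destroy(&local_foo);
    foo_destroy(heap_foo);
    free(heap_foo);                             /* only the heap case frees the foo */
    return sum;
}
```

    Only the caller that chose malloc pays for it, and only that caller
    has to free the foo itself.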
    --
    Ben Pfaff
    http://benpfaff.org
     
    Ben Pfaff, Jun 4, 2009
    #2

  3. kj

    cr88192 Guest

    "Ben Pfaff" <> wrote in message
    news:...
    > kj <> writes:
    >
    >> More specifically, what would be better of these two general
    >> strategies:
    >>
    >> foo new_foo(int bar, baz *frobozz) {
    >> foo ret;
    >> ret.bar = bar;
    >> ret.frobozz = copy_a_baz(frobozz);
    >> return ret;
    >> }

    >
    > Occasionally this style is very handy, because it becomes
    > possible to use the function in a larger expression, especially
    > if the type that the function initializes does not contain
    > anything that needs to be destroyed. For example,
    > print_string(substring("abcdef", 2, 3));
    > is more convenient than:
    > s = substring("abcdef", 2, 3);
    > print_string(s);
    > free_substring(s);
    > But in general I avoid writing functions that return structures,
    > if only because the calling convention for returning a structure
    > on many platforms is not very efficient; it often involves one or
    > more structure copies.
    >


    yep...

    as well, if one uses multiple compilers, this is an area where compilers can
    often disagree as to just how it is done... (e.g.: struct-return calls
    between MSVC-compiled and GCC-compiled code are very likely to either
    transfer garbage or blow up...).


    >> foo *new_foo(int bar, baz *frobozz) {
    >> foo *ret;
    >> ret = (foo *)malloc(sizeof *ret);
    >> if (!ret) kick_up_a_fuss();
    >> ret->bar = bar;
    >> ret->frobozz = copy_a_baz(frobozz);
    >> return ret;
    >> }

    >
    > This style is better from a calling convention point of view, but
    > it may be less efficient than necessary because it forces the use
    > of malloc().
    >
    > The style that I prefer is this:
    >
    > void foo_init(foo *f, int bar, baz *frobozz)
    > {
    >     f->bar = bar;
    >     f->frobozz = copy_a_baz(frobozz);
    > }
    >
    > Then the caller can allocate the data with malloc(), or
    > statically, or as a local variable, which is better all around,
    > in my opinion.



    just my quick comment:
    for a lot of my compiler work, since there tends to be a whole lot of data
    created in a short amount of time, which is then typically destroyed in bulk
    at the end, an effective strategy I have found is to wrap malloc in a
    pseudo-allocator, and use this for most of the temporary data structures.

    general style is like this:
      if heap is empty:
        initialize heap by allocating a chunk of memory (say, 1MB);
        set up rover and end_pointer;
      for each alloc, check that (rover+size)<=end_pointer;
      if it fails:
        allocate another chunk (say, another 1MB), adding it to the end of the
        chunk list;
        set up rover and end_pointer for new chunk.
      then copy the rover to the temp_pointer;
      add size to rover and re-align rover (or, pad up size and add to rover);
      return the temp_pointer.

    (note: large allocs may be handled specially, such as allocating a
    custom-fit chunk of memory, but this is not so difficult to add or deal
    with).


    so, allocation typically goes very fast, and overhead is typically only as
    much as one's alignment (16 is usually a good value IMO).

    when one is done, and no longer needs any of this, then they can free the
    list of chunks and leave an empty heap.

    (then, for the next pass, the cycle repeats again...).
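
    A minimal sketch of the chunked scheme described above, under a few
    assumptions: the names (arena, arena_alloc, arena_free_all) are made
    up for illustration, new chunks are pushed on the front of the list
    rather than appended, sizes are padded up to the alignment before the
    rover advances, and an oversized request simply gets a custom-fit
    chunk of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_CHUNK_SIZE ((size_t)1 << 20)  /* 1 MB per chunk, as in the post */
#define ARENA_ALIGN      ((size_t)16)       /* must be a power of two */

typedef struct arena_chunk {
    struct arena_chunk *next;
    unsigned char *rover;   /* next free byte in this chunk */
    unsigned char *end;     /* one past the last usable byte */
} arena_chunk;

typedef struct {
    arena_chunk *head;      /* newest chunk; allocations are served from it */
} arena;

/* Pads n up to the next multiple of ARENA_ALIGN. */
static size_t round_up(size_t n)
{
    return (n + ARENA_ALIGN - 1) & ~(ARENA_ALIGN - 1);
}

/* Allocates one chunk whose payload starts on an ARENA_ALIGN boundary. */
static arena_chunk *chunk_new(size_t payload)
{
    unsigned char *raw = malloc(sizeof(arena_chunk) + ARENA_ALIGN + payload);
    arena_chunk *c = (arena_chunk *)raw;
    unsigned char *start;
    if (!raw) return NULL;
    start = raw + sizeof(arena_chunk);
    start += (ARENA_ALIGN - (uintptr_t)start % ARENA_ALIGN) % ARENA_ALIGN;
    c->next = NULL;
    c->rover = start;
    c->end = start + payload;
    return c;
}

/* Bump-allocates size bytes; returns NULL if the system is out of memory. */
static void *arena_alloc(arena *a, size_t size)
{
    size_t need;
    void *p;
    assert(a != NULL);
    if (size == 0) size = 1;
    if (size > SIZE_MAX - sizeof(arena_chunk) - 2 * ARENA_ALIGN) return NULL;
    need = round_up(size);
    if (!a->head || (size_t)(a->head->end - a->head->rover) < need) {
        /* Current chunk is full (or absent): start a new one. */
        arena_chunk *c = chunk_new(need > ARENA_CHUNK_SIZE ? need : ARENA_CHUNK_SIZE);
        if (!c) return NULL;
        c->next = a->head;
        a->head = c;
    }
    p = a->head->rover;
    a->head->rover += need;
    return p;
}

/* Frees every chunk at once and leaves the arena empty for the next pass. */
static void arena_free_all(arena *a)
{
    arena_chunk *c = a->head;
    while (c) {
        arena_chunk *next = c->next;
        free(c);
        c = next;
    }
    a->head = NULL;
}
```

    An arena starts out as arena a = { NULL }; and individual objects are
    never freed. Starting a new chunk abandons whatever space was left in
    the previous one, which is the price of keeping the fast path this
    short.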

    note that the output is typically either stored-in, or copied to,
    non-temporary memory...


    note that another (technically simpler) option is to allocate a single much
    larger block, but I will recommend against this, mostly because:
      if it overflows then crap starts breaking, fast...
      simply allocating and initializing such a chunk of memory is time-consuming
        (it is kind of lame if the 'malloc' takes more time than the whole rest of
        the compilation process...).
      this is not very good practice IMO, as other parts of the app may need this
        memory and/or address space as well...


    so, incrementally allocating smaller chunks is better, both in terms of
    making it work well, and having better performance...

    note that it also works well for some kinds of interpreters, especially if
    the interpreter is used for a similar purpose (namely, highly burst-based
    activity, but with little need for a persistent heap).


    similarly, one can also make use of a general-purpose garbage collector (as
    opposed to a custom strategy), but I will recommend against this, as this
    style of usage typically does not mix well with general-purpose conservative
    GCs (rapid burst allocations and creation of lots of garbage all at once
    is not an ideal usage pattern for most GCs...).

    (I had used a general purpose GC for the compiler before, and will note that
    using a customized strategy works much better as far as performance and
    reliability goes...).


    but, anyways, this is just one of many allocation strategies I have
    discovered over the years, each useful for its own little thing...


    note: if lots of strings are to be used, it may also make sense to merge
    them, where the gains from merging the strings (if done efficiently), often
    more than make up for the overhead of having done so (merged strings ->
    faster code, and potentially drastically reduced memory overhead...).
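
    One common way to merge strings is interning: keep a table of the
    strings seen so far and hand back the stored copy for any duplicate.
    The sketch below assumes that is the kind of merging meant here; the
    names (intern, intern_table) and the fixed bucket count are made up
    for illustration, and the table is never freed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 1024u

typedef struct intern_entry {
    struct intern_entry *next;
    char text[];                 /* C99 flexible array member */
} intern_entry;

static intern_entry *intern_table[INTERN_BUCKETS];

/* 32-bit FNV-1a, a simple and widely used string hash. */
static uint32_t hash_string(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Returns the single shared copy of s, or NULL if out of memory. */
static const char *intern(const char *s)
{
    intern_entry *e;
    size_t len;
    uint32_t idx;
    assert(s != NULL);
    idx = hash_string(s) % INTERN_BUCKETS;
    for (e = intern_table[idx]; e; e = e->next)
        if (strcmp(e->text, s) == 0)
            return e->text;      /* seen before: reuse the stored copy */
    len = strlen(s);
    e = malloc(sizeof *e + len + 1);
    if (!e) return NULL;
    memcpy(e->text, s, len + 1);
    e->next = intern_table[idx];
    intern_table[idx] = e;
    return e->text;
}
```

    Once every string has gone through intern(), two strings are equal
    exactly when their pointers are equal, so comparisons no longer need
    strcmp, and each distinct string is stored only once.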

    ....


    > --
    > Ben Pfaff
    > http://benpfaff.org
     
    cr88192, Jun 4, 2009
    #3
  4. kj

    Fred Guest

    On Jun 4, 8:33 am, kj <> wrote:
    > foo new_foo(int bar, baz *frobozz) {
    >   foo ret;
    >   ret.bar = bar;
    >   ret.frobozz = copy_a_baz(frobozz);
    >   return ret;
    >
    > }
    >
    > *or*
    >
    > foo *new_foo(int bar, baz *frobozz) {
    >   foo *ret;
    >   ret = (foo *)malloc(sizeof *ret);
    >   if (!ret) kick_up_a_fuss();
    >   ret->bar = bar;
    >   ret->frobozz = copy_a_baz(frobozz);
    >   return ret;
    >
    > }
    >

    <snip>

    The second is the ONLY one that will work.
    In the first one, you return a local variable that goes
    out of scope immediately.
    --
    Fred K
     
    Fred, Jun 4, 2009
    #4
  5. kj

    Guest

    On Jun 4, 2:22 pm, Fred <> wrote:
    > The second is the ONLY one that will work.
    > In the first one, you return a local variable that goes
    > out of scope immediately.


    Nope, it will return a copy of the struct. The prohibition you mention
    is against returning pointers to automatic objects. It doesn't apply
    here.
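
    A small sketch of the distinction, using a hypothetical pair type in
    place of foo so that it is self-contained. The broken variant is kept
    inside a comment on purpose.

```c
#include <assert.h>

/* Hypothetical stand-in for the thread's foo. */
typedef struct {
    int bar;
    int extra;
} pair;

/* Fine: the return statement hands the caller a copy of ret's value. */
static pair make_pair(int bar, int extra)
{
    pair ret;
    ret.bar = bar;
    ret.extra = extra;
    return ret;
}

/* NOT fine, shown only as a comment:
 *
 *     pair *make_pair_broken(int bar, int extra)
 *     {
 *         pair ret = { bar, extra };
 *         return &ret;    // ret's lifetime ends when the function returns
 *     }
 */

/* The caller's object p is independent of the callee's ret. */
static void demo_copy_outlives_callee(void)
{
    pair p = make_pair(1, 2);
    assert(p.bar == 1 && p.extra == 2);
}
```

    Note that the copy is shallow: a pointer member such as frobozz is
    copied as a pointer, so whatever it points to must still outlive the
    returned struct.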

    -David
     
    , Jun 4, 2009
    #5
  6. kj

    Fred Guest

    On Jun 4, 11:28 am, wrote:
    >
    > Nope, will return a copy of the struct.  The prohibition you mention
    > is against returning pointers to automatic objects.   Doesn't apply
    > here.
    >


    Oops, I misread it as returning a foo *.
    --
    Fred
     
    Fred, Jun 5, 2009
    #6
