Buffer growing strategy?

Discussion in 'C Programming' started by Jef Driesen, Sep 17, 2009.

  1. Jef Driesen

    Jef Driesen Guest

    Hi,

    I'm implementing a buffer that grows automatically when appending or
    prepending data. To improve performance, I'm looking into a buffer
    growing strategy to double the memory (or more generally multiply with a
    FACTOR > 1). It seems there are a number of variations in use:

    1. Try to increase the current size once, and use the requested size 'n'
    if that is not enough.

    newsize = oldsize * FACTOR;
    if (n > newsize)
    newsize = n;

    2. Increase the current size until the new size is large enough. (I'm
    aware that I have to take extra care in case oldsize is zero, or with
    multiplications with a non-integer FACTOR.)

    newsize = oldsize;
    while (n > newsize)
    newsize *= FACTOR;

    3. Similar to #2, but always using a fixed blocksize instead of the
    current size. (Typically BLOCKSIZE=1 and FACTOR=2, such that the
    allocated buffer is always a power of two.)

    newsize = BLOCKSIZE;
    while (n > newsize)
    newsize *= FACTOR;

    Which of these methods is preferred, and for what reason? Or is that
    just a matter of personal preference?

    Thanks,

    Jef
    Jef Driesen, Sep 17, 2009
    #1
    1. Advertising

  2. Jef Driesen

    jacob navia Guest

    Richard Harter a écrit :

    [snip discussion about reallocation strategies]

    > The real trouble is that the allocator has no way of
    > knowing that you have a resizable array.
    >


    And why not?

    Why do we have a malloc that can't receive a set of flags that
    would indicate it what use will be done for the allocated
    memory?

    If we could tell the malloc/realloc/free system

    #define ALLOC_FLAG_RESIZABLE 1 // Can be resized
    #define ALLOC_FLAG_STATIC 2 // Never resized
    #define ALLOC_FLAG_NOALIGN 4 // No alignment requirement
    #define ALLOC_FLAG_FILLZERO 8
    #define ALLOC_FLAG_PERMANENT 16 // Never freed

    and many others. That would allow the allocator to function
    much more efficiently than now, where it has to be good for
    all purposes!

    This is what I mean with an obsolete standard library.
    jacob navia, Sep 17, 2009
    #2
    1. Advertising

  3. Jef Driesen

    Seebs Guest

    On 2009-09-17, jacob navia <> wrote:
    > Why do we have a malloc that can't receive a set of flags that
    > would indicate it what use will be done for the allocated
    > memory?


    Don't see much benefit.

    > If we could tell the malloc/realloc/free system


    > #define ALLOC_FLAG_RESIZABLE 1 // Can be resized
    > #define ALLOC_FLAG_STATIC 2 // Never resized


    This is silly. Pick one, make it a flag. The other is the
    absence of the flag.

    Anyway, what do you mean "resized"? Resized how much? Are we
    going to end up with

    ALLOC_FLAG_RESIZABLE_NO_MORE_THAN_ONE_MEGABYTE
    ALLOC_FLAG_RESIZABLE_I_DUNNO_MAYBE_TWENTY_K_OR_SO
    ALLOC_FLAG_RESIZABLE_FACTOR_OF_THREE_PROBABLY

    ?

    > #define ALLOC_FLAG_NOALIGN 4 // No alignment requirement


    I don't see any measurable benefit.

    > #define ALLOC_FLAG_FILLZERO 8


    This one I think has actual potential.

    > #define ALLOC_FLAG_PERMANENT 16 // Never freed


    Maybe.

    > and many others. That would allow the allocator to function
    > much more efficiently than now, where it has to be good for
    > all purposes!
    >
    > This is what I mean with an obsolete standard library.


    I'm not at all convinced. I've seen a couple of allocators with flags,
    and in general, it didn't work out particularly well. (Sometimes merely
    due to poor choices about documentation.) I don't necessarily buy
    the argument that there would be a significant performance increase,
    but there'd certainly be a ton of potential for mistakes (resizing
    a no-resize region, etc.).

    -s
    --
    Copyright 2009, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    Seebs, Sep 17, 2009
    #3
  4. Seebs <> writes:
    > On 2009-09-17, jacob navia <> wrote:

    [...]
    >> #define ALLOC_FLAG_PERMANENT 16 // Never freed

    >
    > Maybe.

    [...]

    So what happens if you try to free() a pointer that was allocated with
    ALLOC_FLAG_PERMANENT?

    unsigned char *perm = new_malloc(1000, ALLOC_FLAG_PERMANENT);
    free(perm);

    I can think of several possible answers, considering that free() has
    no mechanism for reporting errors.

    1. It's undefined behavior.

    2. It silently does nothing.

    3. There's a new free-like function to go along with the new
    malloc-like function (I think it's sufficiently obvious that we
    can't add a flags argument to malloc). free(perm) either is UB or
    does nothing; new_free(perm) either does nothing or reports an
    error.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Sep 17, 2009
    #4
  5. Jef Driesen

    jacob navia Guest

    Seebs a écrit :
    > On 2009-09-17, jacob navia <> wrote:
    >> Why do we have a malloc that can't receive a set of flags that
    >> would indicate it what use will be done for the allocated
    >> memory?

    >
    > Don't see much benefit.
    >
    >> If we could tell the malloc/realloc/free system

    >
    >> #define ALLOC_FLAG_RESIZABLE 1 // Can be resized
    >> #define ALLOC_FLAG_STATIC 2 // Never resized

    >
    > This is silly. Pick one, make it a flag. The other is the
    > absence of the flag.
    >


    Surely not, since there are allocations where you just
    do not know anything about the future use yet!


    > Anyway, what do you mean "resized"? Resized how much? Are we
    > going to end up with
    >
    > ALLOC_FLAG_RESIZABLE_NO_MORE_THAN_ONE_MEGABYTE
    > ALLOC_FLAG_RESIZABLE_I_DUNNO_MAYBE_TWENTY_K_OR_SO
    > ALLOC_FLAG_RESIZABLE_FACTOR_OF_THREE_PROBABLY
    >
    > ?


    No, the size should go in the extra arguments

    >
    >> #define ALLOC_FLAG_NOALIGN 4 // No alignment requirement

    >
    > I don't see any measurable benefit.
    >


    Less wasted space between the allocated data

    >> #define ALLOC_FLAG_FILLZERO 8

    >
    > This one I think has actual potential.
    >
    >> #define ALLOC_FLAG_PERMANENT 16 // Never freed

    >
    > Maybe.
    >
    >> and many others. That would allow the allocator to function
    >> much more efficiently than now, where it has to be good for
    >> all purposes!
    >>
    >> This is what I mean with an obsolete standard library.

    >
    > I'm not at all convinced. I've seen a couple of allocators with flags,
    > and in general, it didn't work out particularly well. (Sometimes merely
    > due to poor choices about documentation.) I don't necessarily buy
    > the argument that there would be a significant performance increase,
    > but there'd certainly be a ton of potential for mistakes (resizing
    > a no-resize region, etc.).


    Flags with the allocator exist in windows since windows 3.1 (1992 if I
    remember correctly) and they were and are useful.

    malloc/free needs functionality like

    size_t GetBlockSize(void *);

    that allows you to know how big this malloced block is. If you get
    (size_t)-1 it means that is not a block allocated with malloc.
    jacob navia, Sep 17, 2009
    #5
  6. Jef Driesen

    Jef Driesen Guest

    Richard Harter wrote:
    > On Thu, 17 Sep 2009 02:38:36 -0700 (PDT), "christian.bau"
    > <> wrote:
    >
    >> On Sep 17, 8:32=A0am, Jef Driesen <>
    >> wrote:
    >>> Hi,
    >>>
    >>> I'm implementing a buffer that grows automatically when appending or
    >>> prepending data. To improve performance, I'm looking into a buffer
    >>> growing strategy to double the memory (or more generally multiply with a
    >>> FACTOR > 1). It seems there are a number of variations in use:
    >>>
    >>> 1. Try to increase the current size once, and use the requested size 'n'
    >>> if that is not enough.
    >>>
    >>> newsize =3D oldsize * FACTOR;
    >>> if (n > newsize)
    >>> =A0 =A0 newsize =3D n;
    >>>
    >>> 2. Increase the current size until the new size is large enough. (I'm
    >>> aware that I have to take extra care in case oldsize is zero, or with
    >>> multiplications with a non-integer FACTOR.)
    >>>
    >>> newsize =3D oldsize;
    >>> while (n > newsize)
    >>> =A0 =A0 newsize *=3D FACTOR;
    >>>
    >>> 3. Similar to #2, but always using a fixed blocksize instead of the
    >>> current size. (Typically BLOCKSIZE=3D1 and FACTOR=3D2, such that the
    >>> allocated buffer is always a power of two.)
    >>>
    >>> newsize =3D BLOCKSIZE;
    >>> while (n > newsize)
    >>> =A0 =A0 newsize *=3D FACTOR;
    >>>
    >>> Which of these methods is preferred, and for what reason? Or is that
    >>> just a matter of personal preference?

    >
    > Let me add number 4:
    >
    > if (n > size) {
    > size = n * FACTOR;
    > realloc(buffer,size);
    > }
    >
    > It mostly is a matter of personal preference since the costs are
    > nominal in all cases. I see no advantage to 3 since 2 will do
    > the same thing more cheaply. There are potential disadvantages
    > in using power of two sizes. However if you want powers of two
    > sizes use #2.
    >
    > Otherwise use 1 or 4. In my opinion, for whatever that may be
    > worth.


    Can you explain why #2 is cheaper than #3? And what is the potential
    disadvantage of powers of two? I assumed powers of two might be an
    advantage because the underlying allocator probably uses such block
    sizes internally, but I'm not really sure about that.

    My typical use case is that the data blocks that needs to be
    appended/prepended to the buffer usually have a fixed size. I believe a
    FACTOR=2 is the best choice in that case, because such a block will
    always fit in the free space (if there is free space of course).

    >> A very simple solution is to just call realloc () for the new size
    >> whenever the size changes, and hope that the realloc implementation is
    >> efficient. Which it usually is. I just checked that calling realloc
    >> for the same pointer with all sizes from 1 byte to 1 billion bytes
    >> took just 55 nanoseconds per realloc on a three year old MacBook. In
    >> this case, anything more complex than calling realloc seems a waste of
    >> programmers time.

    >
    > That's simple enough, but it is more expensive than you appear to
    > think. The problem is not with the cost of an individual realloc
    > (though that will increase with buffer size); rather it is with
    > the number of calls.
    >
    > [... nice explanation about allocators ...]


    In my particular case, realloc is not always optimal because my buffer
    not only needs to support appending data at the end, but also prepending
    data at the front. And realloc is not ideal when the data needs to be
    moved anyway. (See my other thread "Increase memory buffer with realloc
    or malloc?")

    I have implemented my buffer in such a way that the free space is either
    located at the end (for efficient appending) or at the front (for
    efficient prepending). Whenever data is appended and the free space is
    at the wrong end it is moved (and if necessary the size increased as
    well). That way, future append operations are as efficient as possible.
    The same for prepending data. Only interleaving appends and prepends is
    inefficient, but that does not happen in my code, so that's fine.
    Jef Driesen, Sep 17, 2009
    #6
  7. On Sep 17, 3:39 pm, Keith Thompson <> wrote:


    > (I think it's sufficiently obvious that we
    >    can't add a flags argument to malloc).



    Certainly Jacob Navia would not agree.

    Seems to me the best approach is to add
    optional arguments and default values to C.
    This would not break backward compatibility if
    we require that a library function called without
    optional arguments must have identical behaviour
    as before.

    We could then add an optional flag argument to
    malloc and an optional error return argument
    to free with default value null, eg,

    free(ptr,&free_erno)
    free(ptr) equivalent to free(ptr,null)

    If ptr is an ALLOC_FLAG_PERMANENT ptr then
    free(ptr) could silently do nothing while
    free(ptr,&free_erno) would return an appropriate
    error.

    I am not sure if the gain in worth it in this
    specific case. On the
    other hand I think optional arguments are
    a great idea.


    - William Hughes
    William Hughes, Sep 17, 2009
    #7
  8. jacob navia <> writes:
    > Seebs a écrit :
    >> On 2009-09-17, jacob navia <> wrote:
    >>> Why do we have a malloc that can't receive a set of flags that
    >>> would indicate it what use will be done for the allocated
    >>> memory?

    >>
    >> Don't see much benefit.
    >>
    >>> If we could tell the malloc/realloc/free system

    >>
    >>> #define ALLOC_FLAG_RESIZABLE 1 // Can be resized
    >>> #define ALLOC_FLAG_STATIC 2 // Never resized

    >>
    >> This is silly. Pick one, make it a flag. The other is the
    >> absence of the flag.
    >>

    >
    > Surely not, since there are allocations where you just
    > do not know anything about the future use yet!


    I don't see the difference between "Let me resize this later" and
    "I don't know whether I'll need to resize this later". Well, maybe
    the former could do something to make resizing a trifle faster,
    but I'm skeptical that that's a distinction worth making.

    What happens if I specify ALLOC_FLAG_RESIZABLE|ALLOC_FLAG_STATIC?

    >> Anyway, what do you mean "resized"? Resized how much? Are we
    >> going to end up with
    >>
    >> ALLOC_FLAG_RESIZABLE_NO_MORE_THAN_ONE_MEGABYTE
    >> ALLOC_FLAG_RESIZABLE_I_DUNNO_MAYBE_TWENTY_K_OR_SO
    >> ALLOC_FLAG_RESIZABLE_FACTOR_OF_THREE_PROBABLY
    >>
    >> ?

    >
    > No, the size should go in the extra arguments


    Um, what extra arguments? I don't think you've shown us the
    function(s) that take these flags.

    >>
    >>> #define ALLOC_FLAG_NOALIGN 4 // No alignment requirement

    >>
    >> I don't see any measurable benefit.

    >
    > Less wasted space between the allocated data


    Hmm. Depending on the implementation, there might be little or no
    benefit to this; for example, there might be housekeeping information
    stored in a freed block that needs to be aligned anyway.

    But if you're going to provide alignment control, IMHO it makes more
    sense to allow the actual alignment to be specified. Document, or
    even provide a way to query, what alignemnts are supported, including
    a constant for the maximum required alignment (what malloc() uses).

    >>> #define ALLOC_FLAG_FILLZERO 8

    >>
    >> This one I think has actual potential.


    With the usual caveat that all-bits-zero isn't necessarily 0 for
    pointers or FP values. (That's not to say it wouldn't be useful
    in spite of that; after all, we already have calloc().)

    [...]

    > malloc/free needs functionality like
    >
    > size_t GetBlockSize(void *);
    >
    > that allows you to know how big this malloced block is. If you get
    > (size_t)-1 it means that is not a block allocated with malloc.


    If I do this:
    void *ptr = malloc(999);
    does GetBlockSize(ptr) give me 999 or the actual number of bytes
    allocated?

    I can see this being useful. I can also see it imposing a significant
    burden on some implementations.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Sep 17, 2009
    #8
  9. Jef Driesen

    Seebs Guest

    On 2009-09-17, jacob navia <> wrote:
    > Seebs a écrit :
    >>> If we could tell the malloc/realloc/free system

    >>
    >>> #define ALLOC_FLAG_RESIZABLE 1 // Can be resized
    >>> #define ALLOC_FLAG_STATIC 2 // Never resized


    >> This is silly. Pick one, make it a flag. The other is the
    >> absence of the flag.


    > Surely not, since there are allocations where you just
    > do not know anything about the future use yet!


    So what?

    If you don't know at the time of allocation whether or not it is possible
    to resize, then it's resizable.

    "Never resized" means "I can prove that it can never be resized".

    > No, the size should go in the extra arguments


    But what size? The largest possible future size? Merely the information
    that there may be a future size but we don't know what it is?

    >>> #define ALLOC_FLAG_NOALIGN 4 // No alignment requirement

    >>
    >> I don't see any measurable benefit.


    > Less wasted space between the allocated data


    Wasted space between allocated data is probably very close to zero
    functionally, because you still need alignment for the information used
    to describe the region...

    > Flags with the allocator exist in windows since windows 3.1 (1992 if I
    > remember correctly) and they were and are useful.


    Maybe. I'm not convinced that they've paid off.

    Which systems have had more problems with memory allocation systems failing
    catastrophically: Windows, or systems which just provided malloc/free?

    Answer: Windows.

    > malloc/free needs functionality like
    >
    > size_t GetBlockSize(void *);
    >
    > that allows you to know how big this malloced block is. If you get
    > (size_t)-1 it means that is not a block allocated with malloc.


    Why does it need this functionality? I have never needed it. If I want to
    track how much space I've requested, I do so -- and the rest of the time I
    don't pay the (substantial) overhead.

    -s
    --
    Copyright 2009, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    Seebs, Sep 17, 2009
    #9
  10. Jef Driesen

    Seebs Guest

    On 2009-09-17, William Hughes <> wrote:
    > Seems to me the best approach is to add
    > optional arguments and default values to C.


    I think this would be pretty horrible.

    > This would not break backward compatibility if
    > we require that a library function called without
    > optional arguments must have identical behaviour
    > as before.


    How would we know it had been called without the optional arguments?

    In the existing world, "optional arguments" have always been marked by
    a previous argument -- you have to pass in an argument which tells you
    that there are more arguments coming.

    You can't do that to free, though.

    > I am not sure if the gain in worth it in this
    > specific case. On the
    > other hand I think optional arguments are
    > a great idea.


    They might be, except a large number of ABIs make no provision for a way
    to tell whether or not they've been passed. The best I can think of is
    that you'd have to always pass the "default" value implicitly, but that's
    a pretty big change and imposes substantial costs too. And honestly,
    I'm not sure I think it's necessarily an improvement. I've found that
    "default arguments" seem to more often hide errors than be useful to me.

    -s
    --
    Copyright 2009, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    Seebs, Sep 17, 2009
    #10
  11. Jef Driesen

    Seebs Guest

    On 2009-09-17, jacob navia <> wrote:
    > Sure, I remember that malloc wa&s completely screwed back then but
    >
    > seriously...
    >
    > do you REALLY think it was because of the extra arguments?


    Well, let's see. We have a simple design and a complex design. The
    complex design fails more often.

    Hmmmmmmm.

    > To avoid buffer overflows it is nice to know the size of your buffer
    > isn't it?


    If you don't know the size of your buffer already, checking the allocated
    space is the wrong way to find out.

    -s
    --
    Copyright 2009, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    Seebs, Sep 17, 2009
    #11
  12. Jef Driesen

    jacob navia Guest

    William Hughes a écrit :
    > On Sep 17, 3:39 pm, Keith Thompson <> wrote:
    >
    >
    >> (I think it's sufficiently obvious that we
    >> can't add a flags argument to malloc).

    >
    >
    > Certainly Jacob Navia would not agree.
    >
    > Seems to me the best approach is to add
    > optional arguments and default values to C.
    > This would not break backward compatibility if
    > we require that a library function called without
    > optional arguments must have identical behaviour
    > as before.
    >
    > We could then add an optional flag argument to
    > malloc and an optional error return argument
    > to free with default value null, eg,
    >
    > free(ptr,&free_erno)
    > free(ptr) equivalent to free(ptr,null)
    >
    > If ptr is an ALLOC_FLAG_PERMANENT ptr then
    > free(ptr) could silently do nothing while
    > free(ptr,&free_erno) would return an appropriate
    > error.
    >
    > I am not sure if the gain in worth it in this
    > specific case. On the
    > other hand I think optional arguments are
    > a great idea.
    >
    >
    > - William Hughes
    >


    I know of a C compiler that implements optional arguments.

    Aren't they GREAT?

    :)

    But I think a new set of entry points could be designed, and
    it would be OK too.

    Microsoft always appended Ex when it needed a new interface,
    like CreateWindowEx that added some args to CreateWindow.

    We could add some suffix (malloc_extended, for instance).
    Yes, there will be no other solution as to use namespace
    since optional arguments would provoke a tempest of protests

    :)
    jacob navia, Sep 17, 2009
    #12
  13. Jef Driesen

    jacob navia Guest

    Seebs a écrit :
    >
    >> Flags with the allocator exist in windows since windows 3.1 (1992 if I
    >> remember correctly) and they were and are useful.

    >
    > Maybe. I'm not convinced that they've paid off.
    >
    > Which systems have had more problems with memory allocation systems failing
    > catastrophically: Windows, or systems which just provided malloc/free?
    >
    > Answer: Windows.
    >


    Sure, I remember that malloc wa&s completely screwed back then but

    seriously...

    do you REALLY think it was because of the extra arguments?

    Now, come on...

    >> malloc/free needs functionality like
    >>
    >> size_t GetBlockSize(void *);
    >>
    >> that allows you to know how big this malloced block is. If you get
    >> (size_t)-1 it means that is not a block allocated with malloc.

    >
    > Why does it need this functionality? I have never needed it. If I want to
    > track how much space I've requested, I do so -- and the rest of the time I
    > don't pay the (substantial) overhead.
    >


    To avoid buffer overflows it is nice to know the size of your buffer
    isn't it?

    :)
    jacob navia, Sep 17, 2009
    #13
  14. Seebs <> writes:
    > On 2009-09-17, William Hughes <> wrote:
    >> Seems to me the best approach is to add
    >> optional arguments and default values to C.

    >
    > I think this would be pretty horrible.
    >
    >> This would not break backward compatibility if
    >> we require that a library function called without
    >> optional arguments must have identical behaviour
    >> as before.

    >
    > How would we know it had been called without the optional arguments?
    >
    > In the existing world, "optional arguments" have always been marked by
    > a previous argument -- you have to pass in an argument which tells you
    > that there are more arguments coming.
    >
    > You can't do that to free, though.

    [...]

    I don't think it would be unreasonable to provide arguments with
    default values. If you don't pass the argument, the parameter gets
    the default value.

    For example, this could potentially replace putc and putchar:

    int new_putc(int c, FILE *stream = stdout);

    so that new_putc('x') is exactly equivalent to new_putc('x', stdout).

    >> I am not sure if the gain in worth it in this
    >> specific case. On the
    >> other hand I think optional arguments are
    >> a great idea.

    >
    > They might be, except a large number of ABIs make no provision for a way
    > to tell whether or not they've been passed. The best I can think of is
    > that you'd have to always pass the "default" value implicitly, but that's
    > a pretty big change and imposes substantial costs too. And honestly,
    > I'm not sure I think it's necessarily an improvement. I've found that
    > "default arguments" seem to more often hide errors than be useful to me.


    I don't see what "substantial costs" it would impose, other than that
    the code always passes the extra arguments.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Sep 17, 2009
    #14
  15. Jef Driesen

    Seebs Guest

    On 2009-09-17, Keith Thompson <> wrote:
    > I don't see what "substantial costs" it would impose, other than that
    > the code always passes the extra arguments.


    Hmm.

    Well, let's see. I guess the compiler would just translate
    new_putc(x) into new_putc(x, stdout)... That seems like it's asking
    for trouble, but I agree that it's not immediately obvious.

    Although...

    What are the allowable forms of the "default"? Constants? Well, stdout isn't
    a constant. Expressions?

    int why_do_you_torment_me_so(int *a,
    int b = a[0],
    int c = (int) sqrt((double) a[1]),
    int d = (a && a[0]) ? array_sum(a + 1, a[0]) : -23);

    -s
    --
    Copyright 2009, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    Seebs, Sep 18, 2009
    #15
  16. Jef Driesen

    websnarf Guest

    On Sep 17, 2:38 am, "christian.bau" <>
    wrote:
    > On Sep 17, 8:32 am, Jef Driesen <>
    > wrote:
    > > I'm implementing a buffer that grows automatically when appending or
    > > prepending data. To improve performance, I'm looking into a buffer
    > > growing strategy to double the memory (or more generally multiply with a
    > > FACTOR > 1). It seems there are a number of variations in use:

    >
    > > 1. Try to increase the current size once, and use the requested size 'n'
    > > if that is not enough.

    >
    > > newsize = oldsize * FACTOR;
    > > if (n > newsize)
    > >     newsize = n;


    Indeed, this retains the guarantee that the number of reallocs is O(log
    (FACTOR, MAXLENGTH)) and its a finite cost to calculate the size of
    the new size itself, so long as you never *shrink* the memory size of
    the buffer. This strategy may save your maximum memory usage if your
    largest allocation comes before any allocations at least half its
    size.

    > > 2. Increase the current size until the new size is large enough. (I'm
    > > aware that I have to take extra care in case oldsize is zero, or with
    > > multiplications with a non-integer FACTOR.)

    >
    > > newsize = oldsize;
    > > while (n > newsize)
    > >     newsize *= FACTOR;


    In practice this does just as well as above, its just that a few cases
    where you do a few too many multiplies. It also doesn't deal with the
    start up case. But its all window dressing. This strategy will tend
    to have fewer overall calls to realloc if your memory allocations tend
    to all be roughly the same size.

    > > 3. Similar to #2, but always using a fixed blocksize instead of the
    > > current size. (Typically BLOCKSIZE=1 and FACTOR=2, such that the
    > > allocated buffer is always a power of two.)

    >
    > > newsize = BLOCKSIZE;
    > > while (n > newsize)
    > >     newsize *= FACTOR;


    Oh -- this performs shrinks, because it doesn't retain the oldsize as
    a lower bound. The problem is that depending on your implementation
    of realloc() this can either cause your system to thrash (see exercise
    below) or it will behave as well as the two good solutions you pose
    above. The number of real data moves depends on whether or not the
    size oscillates between values that are in different FACTOR power
    ranges. The solutions above actually guarantee that the number of
    data moves is finite and uses no more than twice (assuming FACTOR = 2)
    your memory bandwidth for the largest allocation.

    > > Which of these methods is preferred, and for what reason? Or is that
    > > just a matter of personal preference?


    Between the first two is a matter of personal preferences. You may be
    able to see a slight advantage of one over the other depending on the
    characteristics of your application, but both are suitable no matter
    what. The third strategy is an example of what not to do.

    > A very simple solution is to just call realloc () for the new size
    > whenever the size changes, and hope that the realloc implementation is
    > efficient. Which it usually is. I just checked that calling realloc
    > for the same pointer with all sizes from 1 byte to 1 billion bytes
    > took just 55 nanoseconds per realloc on a three year old MacBook. In
    > this case, anything more complex than calling realloc seems a waste of
    > programmers time.


    Exercise to the reader: a) Show that this solution is can be as bad as
    O(MAXLENGTH**2) in terms of memory copy performance. b) Why would a
    system vendor implement a tight and "chop-and-reliquish" strategy for
    realloc()? Is it possible for a compiler vendor to implement a realloc
    () that guarantees a O(log(MAXLENGTH)) performance for this sort of
    problem while making efficient use of memory?

    --
    Paul Hsieh
    http://www.pobox.com/~qed/
    http://bstring.sf.net/
    websnarf, Sep 18, 2009
    #16
  17. Seebs <> writes:
    > On 2009-09-17, Keith Thompson <> wrote:
    >> I don't see what "substantial costs" it would impose, other than that
    >> the code always passes the extra arguments.

    >
    > Hmm.
    >
    > Well, let's see. I guess the compiler would just translate
    > new_putc(x) into new_putc(x, stdout)... That seems like it's asking
    > for trouble, but I agree that it's not immediately obvious.


    There's certainly plenty of precedent for it in other languages.

    > Although...
    >
    > What are the allowable forms of the "default"? Constants? Well,
    > stdout isn't a constant. Expressions?
    >
    > int why_do_you_torment_me_so(int *a,
    > int b = a[0],
    > int c = (int) sqrt((double) a[1]),
    > int d = (a && a[0]) ? array_sum(a + 1, a[0]) : -23);


    In C++, the default argument "is type checked at the time of
    the function declaration and evaluated at the time of the call"
    (quoting Stroustrup, The C++ Programming Language, 3rd edition).
    That seems to me like a reasonable rule (and there's *some* virtue
    in being compatible with C++). Default argument expressions may
    not refer to other parameters.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    Nokia
    "We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
    Keith Thompson, Sep 18, 2009
    #17
  18. On Sep 17, 5:47 pm, Seebs <> wrote:
    > On 2009-09-17, William Hughes <> wrote:
    >
    > > Seems to me the best approach is to add
    > > optional arguments and default values to C.

    >
    > I think this would be pretty horrible.
    >
    > > This would not break backward compatibility if
    > > we require that a library function called without
    > > optional arguments must have identical behaviour
    > > as before.

    >
    > How would we know it had been called without the optional arguments?


    By count.

    Possible rules.
    Optional arguments have to come last and must have default
    values assigned. Optional arguments and "..." cannot be used
    in the same function. A function for which you cannot tell
    from the prototype how many non optional arguments
    there are cannot have optional arguments.
    Non-optional arguments must be passed.
    Optional arguments are detected by count and any optional
    argument that is not passed gets the default value.


    >
    > In the existing world, "optional arguments" have always been marked by
    > a previous argument -- you have to pass in an argument which tells you
    > that there are more arguments coming.
    >


    False. See e.g. Python.


    > You can't do that to free, though.
    >
    > > I am not sure if the gain in worth it in this
    > > specific case.  On the
    > > other hand I think optional arguments are
    > > a great idea.

    >
    > They might be, except a large number of ABIs make no provision for a way
    > to tell whether or not they've been passed.  The best I can think of is
    > that you'd have to always pass the "default" value implicitly, but that's
    > a pretty big change and imposes substantial costs too.


    I don't see this. No deep change is made, a function with
    optional arguments acts just like a function with a fixed
    number (non-optional plus optional) of arguments.
    If values for the optional arguments are not provided
    in the code then the default values are passed.
    (On the other hand, *any* change imposes substantial costs).

    >  And honestly,
    > I'm not sure I think it's necessarily an improvement.  I've found that
    > "default arguments" seem to more often hide errors than be useful to me.
    >


    My experience is exactly opposite. I find optional arguments
    very useful in modifying and extending existing code and I do not
    find them any more error prone than non-optional arguments.

    - William Hughes
    William Hughes, Sep 18, 2009
    #18
  19. Jef Driesen

    Alan Curry Guest

    default arguments (was Re: Buffer growing strategy?)

    In article <>,
    William Hughes <> wrote:
    >
    >Possible rules.
    >Optional arguments have to come last and must have default
    >values assigned. Optional arguments and "..." cannot be used
    >in the same function. A function for which you cannot tell
    >from the prototype how many non optional arguments
    >there are cannot have optional arguments.
    >Non-optional arguments must be passed.
    >Optional arguments are detected by count and any optional
    >argument that is not passed gets the default value.
    >


    void foo(int x, int y, int z=0) { ... }
    void bar(int x, int y, int z=1) { ... }

    typedef void (*func_taking_3_ints)(int, int, int);

    func_taking_3_ints p1 = rand()%1 ? foo : bar;
    (*p1)(0, 0);

    What happens? Does the caller have to figure out which function is being
    called, and provide the default value for the missing optional argument? Or
    do you just make optional arguments mandatory when calling a function through
    a pointer? And how would you do that, since even a plain function name
    evaluates as a pointer? There's no such thing as a non-pointer function call,
    so you'll have to invent one.

    Or does the default value figure into the function's type, making it illegal
    to assign foo or bar to p1? How could I declare a pointer to a function with
    default argument? Could be something like this I suppose:

    void (*p1)(int, int, int=0) = foo; /* OK */
    p1 = bar; /* Mismatch! */

    --
    Alan Curry
    Alan Curry, Sep 18, 2009
    #19
  20. Jef Driesen

    Seebs Guest

    On 2009-09-18, William Hughes <> wrote:
    > On Sep 17, 5:47 pm, Seebs <> wrote:
    >> In the existing world, "optional arguments" have always been marked by
    >> a previous argument -- you have to pass in an argument which tells you
    >> that there are more arguments coming.


    > False. See e.g. Python.


    I was talking specifically about C. There are several examples of
    functions in C which take variable numbers of arguments, but in every
    case, you can tell what arguments are coming by looking at existing
    arguments.

    > I don't see this. No deep change is made, a function with
    > optional arguments acts just like a function with a fixed
    > number (non-optional plus optional) of arguments.
    > If values for the optional arguments are not provided
    > in the code then the default values are passed.


    Hmm.

    It's an interesting thought, I guess it just seems like a pretty big
    change -- in particular, a ton of simple functions might suddenly start
    having additional overhead, having to change calling sequences, and so
    on.

    At the very least, you would be pretty much forced to recompile everything.
    You couldn't be compatible with existing binaries which called free() without
    serious hassle.

    > My experience is exactly opposite. I find optional arguments
    > very useful in modifying and extending existing code and I do not
    > find them any more error prone than non-optional arguments.


    I guess I just like prototypes which catch common errors, such as forgetting
    to specify something...

    -s
    --
    Copyright 2009, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    Seebs, Sep 18, 2009
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Philippe Poulard

    [ I/O ] Growing buffer

    Philippe Poulard, Aug 1, 2005, in forum: Java
    Replies:
    7
    Views:
    412
    Philippe Poulard
    Aug 5, 2005
  2. Raja
    Replies:
    12
    Views:
    24,327
    John Harrison
    Jun 21, 2004
  3. Replies:
    2
    Views:
    579
    sergejusz
    Mar 26, 2007
  4. Neal Becker

    buffer creates only read-only buffer?

    Neal Becker, Jan 8, 2009, in forum: Python
    Replies:
    0
    Views:
    394
    Neal Becker
    Jan 8, 2009
  5. xingye
    Replies:
    9
    Views:
    252
    Michael Lu
    Apr 19, 2004
Loading...

Share This Page