List as a dynamic array of increasing-sized arrays

Discussion in 'C Programming' started by MartinBroadhurst, Oct 21, 2010.

  1. In my experience, the most common data structures used for lists
    supporting addition and removal at the tail, and random access, are
    linked lists and dynamic arrays.

    * Linked lists require following pointers for traversal (3 lookups per
    element, poor cache performance), and have O(n) indexed access.
    Adding and removing elements cause many small allocations and
    deallocations of memory.
    * Dynamic arrays have efficient traversal (1 lookup per element, good
    cache performance) and indexed access is O(1).
    Adding elements requires reallocation when full, however, which
    involves copying all elements.

    Instead, I should like to propose a structure consisting of a dynamic
    array of increasing-sized arrays:

    0 -> 0 1 2 3
    1 -> 4 5 6 7
    2 -> 8 9 10 11 12 13 14 15
    3 -> 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

    * Each array after the first two is twice the size of its predecessor.
    This means that the structure has the same growth strategy as most
    dynamic arrays (double when full), reducing the number of
    allocations.
    * As new arrays are appended to the end, no copying is required,
    except for the pointers when the containing array is resized.
    * The regular increase in the size of the arrays makes indexed access
    efficient.

    I have placed the source code for an implemenation here:
    http://www.martinbroadhurst.com/c/list.html

    To do:
    * Allow specification of a starting capacity.
    * The algorithm for finding the block for an index should use log2,
    not repeated multiplication and addition.
    * The block at the tail should not be freed as soon as it becomes
    empty, as this means that a list whose population oscillates around a
    power of 2 will repeatedly allocate and deallocate that block.
    Instead, the block should be freed when some number of elements from
    the next block have been removed, introducing a bit of "hysteresis".

    Martin
     
    MartinBroadhurst, Oct 21, 2010
    #1
    1. Advertising

  2. On 21 Oct, 18:01, Armand </> wrote:
    > Nice iterators.


    Thanks, Armand. I think iterators are more useful, though, when
    they're "polymorphic", and can decouple collection types from
    consumers.
    I wrote a polymorphic iterator with just a simple "get" function that
    wraps data structure-specific implementations:
    http://www.martinbroadhurst.com/source/iterator.h.html
    http://www.martinbroadhurst.com/source/iterator.c.html
    If I find the time I'm going to unify this with the list one,
    providing reverse traversal, random access and eof/bof. Reverse
    traversal and random-access can be optional for data structures that
    don't support them.
     
    MartinBroadhurst, Oct 21, 2010
    #2
    1. Advertising

  3. On 21 Oct, 18:51, (Richard Harter) wrote:
    >
    > You've reinvented obstacks.  Good show.
    >
    >


    Thanks, Richard. I've heard of obstacks, but was unsure from the GNU
    documentation how they are implemented.

    Martin
     
    MartinBroadhurst, Oct 21, 2010
    #3
  4. MartinBroadhurst

    Max Priority

    Joined:
    Oct 21, 2010
    Messages:
    1
    typedef struct {
    Dynarray *blocks;
    unsigned int lastsize; /* Size of last block */
    unsigned int lastcount; /* Number of elements in the last block */
    unsigned int count;
    } List;

    You don't need to keep track of the count of elements.
    Knowing lastsize and lastcount allows you to calculate it.

    MP
     
    Max Priority, Oct 21, 2010
    #4
  5. MartinBroadhurst

    Gene Guest

    On Oct 20, 7:02 pm, MartinBroadhurst <>
    wrote:
    > In my experience, the most common data structures used for lists
    > supporting addition and removal at the tail, and random access, are
    > linked lists and dynamic arrays.
    >
    > * Linked lists require following pointers for traversal (3 lookups per
    > element, poor cache performance), and have O(n) indexed access.
    >   Adding and removing elements cause many small allocations and
    > deallocations of memory.
    > * Dynamic arrays have efficient traversal (1 lookup per element, good
    > cache performance) and indexed access is O(1).
    >   Adding elements requires reallocation when full, however, which
    > involves copying all elements.
    >
    > Instead, I should like to propose a structure consisting of a dynamic
    > array of increasing-sized arrays:
    >
    > 0 ->  0  1  2  3
    > 1 ->  4  5  6  7
    > 2 ->  8  9 10 11 12 13 14 15
    > 3 -> 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
    >
    > * Each array after the first two is twice the size of its predecessor.
    >   This means that the structure has the same growth strategy as most
    > dynamic arrays (double when full), reducing the number of
    > allocations.
    > * As new arrays are appended to the end, no copying is required,
    > except for the pointers when the containing array is resized.
    > * The regular increase in the size of the arrays makes indexed access
    > efficient.
    >
    > I have placed the source code for an implemenation here:http://www.martinbroadhurst.com/c/list.html
    >


    This is pretty close to a VList: http://en.wikipedia.org/wiki/VList
    Though they were originally defined as immutable, this was primarily
    for thread safety reasons. Mutable VLists work fine.
     
    Gene, Oct 23, 2010
    #5
  6. On 23 Oct, 18:40, (Richard Harter) wrote:

    Thanks very much for reading it, Richard.

    >
    > The link in your post isn't right; it should have been
    > <http://www.martinbroadhurst.com/c/list.c.html>.
    >


    My link was to a page containing the source for the dynarray that
    contains the list, as well as the list itself. Apologies if this was
    confusing.

    > Your design is good for
    > arrays, not so good for linked lists.


    Yes, absolutely, it's an "iterable stack". Inserting in the middle
    would be O(n) theoretically, and in practice worse than for a dynamic
    array since there would be several block shifts rather than just one.

    Perhaps I misnamed it. I'm not sure if the word "list" has as precise
    a meaning in computer science as "stack" or "queue" do. I tend to
    think of a list as an ordered collection that supports, at a minimum,
    adding at the tail.

    One could, however, *embed* a linked list into the structure, by
    making it a collection of linked list nodes rather than void
    pointers.

    I wrote an implementation of a linked list embedded in an ordinary
    dynamic array a while ago, my rationale being that it might provide
    better locality of reference and more efficient allocation than one
    based on individually allocated nodes.

    Embedding a linked list into this structure, however, would remove the
    copying problem associated with growing dynamic arrays.
    Further, when a linked list is embedded in a dynamic array, the next
    and previous pointers need to be integer offsets into the array buffer
    rather than pointers, since otherwise they would be invalidated by
    reallocation. As the "array-of-arrays" structure doesn't reallocate,
    previous and next can be pointers.

    Martin
     
    MartinBroadhurst, Oct 24, 2010
    #6
  7. On 23 Oct, 21:01, Gene <> wrote:
    >
    > This is pretty close to a VList:  http://en.wikipedia.org/wiki/VList
    > Though they were originally defined as immutable, this was primarily
    > for thread safety reasons.  Mutable VLists work fine.- Hide quoted text -
    >


    Thanks for the pointer, Gene. I'll read Bagwell's paper.

    Martin
     
    MartinBroadhurst, Oct 24, 2010
    #7
  8. On 21 Oct, 00:02, MartinBroadhurst <>
    wrote:
    >


    Below are the timings in seconds for inserting 50,000 elements into
    various sequential data structures. These are on an Intel Celeron
    667MHz with 256MB of RAM, running Ubuntu Server 10.10 with glibc
    version 2.10.1.

    Doubly-linked list 0.022233
    Singly-linked list 0.021964
    Doubly-linked list in dynamic array 0.014827
    Deque 0.012803
    Dynamic array 0.006111
    Dynamic array of increasing-sized arrays 0.005775

    The dynamic array of increasing-sized arrays was a little over 4%
    faster across the range 50,000 to 250,000 elements. Dynamic array was
    faster for fewer or more elements than this.

    Full result is here:

    http://www.martinbroadhurst.com/c/list.html#performance

    Martin
     
    MartinBroadhurst, Oct 26, 2010
    #8
  9. MartinBroadhurst

    BartC Guest

    "MartinBroadhurst" <> wrote in message
    news:...
    > On 21 Oct, 00:02, MartinBroadhurst <>
    > wrote:
    >>

    >
    > Below are the timings in seconds for inserting 50,000 elements into
    > various sequential data structures. These are on an Intel Celeron
    > 667MHz with 256MB of RAM, running Ubuntu Server 10.10 with glibc
    > version 2.10.1.
    >
    > Doubly-linked list 0.022233
    > Singly-linked list 0.021964
    > Doubly-linked list in dynamic array 0.014827
    > Deque 0.012803
    > Dynamic array 0.006111
    > Dynamic array of increasing-sized arrays 0.005775
    >
    > The dynamic array of increasing-sized arrays was a little over 4%
    > faster across the range 50,000 to 250,000 elements. Dynamic array was
    > faster for fewer or more elements than this.
    >
    > Full result is here:
    >
    > http://www.martinbroadhurst.com/c/list.html#performance


    You say "inserting" 50000 elements above. Yet the link says the elements are
    added to the tail of each list.

    I thought the dynamic array timing seemed pretty good.

    Try inserting elements at the head of the list, or in the middle.

    --
    Bartc
     
    BartC, Oct 26, 2010
    #9
  10. On 26 Oct, 16:40, "BartC" <> wrote:
    >
    > You say "inserting" 50000 elements above. Yet the link says the elements are
    > added to the tail of each list.
    >


    My mistake, I meant inserting in the sense of inserting into a stack.

    > I thought the dynamic array timing seemed pretty good.
    >


    I also. I had been led to believe that dynamic arrays would suffer
    from excessive copying on reallocation by things like this -

    http://www.drdobbs.com/architecture...ionid=KB05E2AGD4CNPQE1GHRSKHWATMY32JVN?pgno=5

    - but, at least with glibc, and in that size range, that's not the
    case.

    > Try inserting elements at the head of the list, or in the middle.
    >


    As I said in my original post, it's designed for adding at the tail. I
    wouldn't expect it to compare favourably with a dynamic array or
    linked list for insertion in the middle. My thinking was that it would
    be a good data structure when you have an unknown amount of sequential
    data and simply want to store it and read it back.

    I *would* expect the linked list embedded in an array to be good for
    insertion, though.

    Martin
     
    MartinBroadhurst, Oct 26, 2010
    #10
  11. MartinBroadhurst

    Seebs Guest

    On 2010-10-29, Trevor <> wrote:
    > They love to smack down the newbie with their superior *knowledge", but have
    > they actually *found out*?
    > Realloc *hardly ever* copies.


    This, like everything else you've posted, is a gross oversimplification.

    But wait! I can simplify this further: You're rude, you're arrogant,
    and you don't seem to be able to disagree in a way suited to someone
    who's gotten to his adult teeth yet.

    *plonk*

    -s
    --
    Copyright 2010, all wrongs reversed. Peter Seebach /
    http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
    http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
    I am not speaking for my employer, although they do rent some of my opinions.
     
    Seebs, Oct 29, 2010
    #11
  12. On 29 Oct, 22:04, "Trevor" <> wrote:
    >
    > You've invented a "new" data structure, but you've solved a problem that
    > doesn't exist!
    >


    That's pretty much the conclusion I came to on the basis of the
    empirical evidence I have.

    >
    > http://groups.google.co.uk/group/co...uk/group/comp.lang.c/browse_thread/thread/3e7...
    >


    I had read the first thread you mention, not the second. Things like
    this were my motivation in looking for a better structure than a
    dynamic array.

    > They love to smack down the newbie with their superior *knowledge", but have
    > they actually *found out*?


    They were probably speaking in general terms, or on the basis of their
    experience of particular platforms on which they had seen a problem
    with dynamic arrays.

    > Realloc *hardly ever* copies.
    >


    My experiments showed that it copied about 20% of the time, but this
    was just glibc. For all I know, glibc may be optimised for geometric
    expansion. After all, it's being used for std::vector(T>s in C++.
    Other allocators may be optimised for quite different scenarios. It is
    also important *which* 20% of the reallocs result in a copy. If it's
    the larger ones, this will have a greater impact.

    > 1) If you only want to add at the end, use a dynamic array
    > 2) If you need to do random inserts, use a linked list, but (as you've
    > discovered), embed it in a dynamic array
    >


    Again, that would be a valid conclusion from my results, reiterating
    though that they were using one particular allocator.

    There are, however, other considerations:

    1) If your memory is half-used by a dynamic array, and you need to add
    one more element, you will run out of memory. A linked list will
    handle this fine.
    It's easy to be complacent about memory usage when you're on a
    platform with virtual memory, but not all platforms have this.

    2) Dynamic arrays use up to twice as much memory as is needed at any
    one time. Memory used in linked lists is in a constant ratio with the
    elements. If the linked list is an "intrusive" one, where nodes
    contain a significant amount of data, this ratio will be much more
    favourable.

    3) Data in a list can be highly structured, with elements containing
    pointers to other elements. These pointers will be invalidated by
    reallocation.
    You could solve this problem by using the array-of-arrays structure I
    began this thread by describing, though.

    4) Some people favour the "open-work" interface exposed by linked
    lists rather than using a more general container. It is handy to be
    able to just have a node, and be able to tack another one on without
    needing to have access to the container and go through its interface.

    5) Linked lists can "recycle" nodes, by keeping nodes from removed
    elements in a chain and using them preferentially when a new node is
    required. This will mean less allocation and deallocation in a list
    that changes frequently.

    Martin
     
    MartinBroadhurst, Oct 30, 2010
    #12
  13. On Oct 30, 9:56 pm, "Scouser" <> wrote:
    >
    > They do that though, don't they?
    >
    > Scouser


    I'm gutted, I am. I'm as sick as a parrot.

    Martin
     
    MartinBroadhurst, Oct 30, 2010
    #13
  14. On 29 Oct, 21:04, "Trevor" <> wrote:
    > 2) If you need to do random inserts, use a linked list, but (as you've
    > discovered), embed it in a dynamic array
    >


    You've reminded me of something I did a while ago.
    If you have a linked list embedded in an array and you remove an
    element, you can move the last element in the buffer into its place.
    This means that:

    1. You don't need a free node list
    2. You don't need to keep track of the used range of the buffer
    because used = count
    3. Unordered traversal (like when you just want to call a function on
    each element in any order) is just array traversal, because the
    elements in the buffer are dense

    Linked list embedded in an array with compaction:
    http://www.martinbroadhurst.com/art...ist-embedded-in-an-array-with-compaction.html

    Martin
     
    MartinBroadhurst, Nov 3, 2010
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. thechaosengine

    Oddly sized sized password textbox

    thechaosengine, Sep 15, 2005, in forum: ASP .Net
    Replies:
    1
    Views:
    607
    David Hearn
    Sep 15, 2005
  2. Chris Johnson

    Global variable-sized arrays

    Chris Johnson, Nov 23, 2004, in forum: C Programming
    Replies:
    5
    Views:
    403
    Mark A. Odell
    Nov 23, 2004
  3. Replies:
    4
    Views:
    332
    Kai-Uwe Bux
    Dec 13, 2005
  4. Daniel T.

    Dynamic sized array?

    Daniel T., Oct 17, 2006, in forum: C++
    Replies:
    4
    Views:
    340
    Sumit Rajan
    Oct 17, 2006
  5. joe
    Replies:
    1
    Views:
    370
    Goran
    Aug 31, 2011
Loading...

Share This Page