List Performance

Discussion in 'Python' started by Ampedesign, Jun 30, 2008.

  1. Ampedesign

    Ampedesign Guest

    If I happen to have a list that contains over 50,000 items, will the
    size of the list severely impact the performance of appending to the
    list?
     
    Ampedesign, Jun 30, 2008
    #1

  2. Peter Otten

    Peter Otten Guest

    Ampedesign wrote:

    > If I happen to have a list that contains over 50,000 items, will the
    > size of the list severely impact the performance of appending to the
    > list?


    No.

    $ python -m timeit -n20000 -s"items = []" "items.append(42)"
    20000 loops, best of 3: 0.554 usec per loop
    $ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
    20000 loops, best of 3: 0.529 usec per loop

    http://wiki.python.org/moin/TimeComplexity

    Peter
     
    Peter Otten, Jun 30, 2008
    #2

  3. On Monday 30 June 2008 09:23:46, Peter Otten wrote:
    > Ampedesign wrote:
    > > If I happen to have a list that contains over 50,000 items, will the
    > > size of the list severely impact the performance of appending to the
    > > list?

    >
    > No.
    >
    > $ python -m timeit -n20000 -s"items = []" "items.append(42)"
    > 20000 loops, best of 3: 0.554 usec per loop
    > $ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
    > 20000 loops, best of 3: 0.529 usec per loop


    But it surely could, if the box happens to run out of memory and begins to
    swap; that's not, of course, an issue with Python lists themselves...

    --
    _____________

    Maric Michaud
     
    Maric Michaud, Jun 30, 2008
    #3
  4. Peter Otten

    Peter Otten Guest

    Larry Bates wrote:

    > Peter Otten wrote:
    >> Ampedesign wrote:
    >>
    >>> If I happen to have a list that contains over 50,000 items, will the
    >>> size of the list severely impact the performance of appending to the
    >>> list?

    >>
    >> No.
    >>
    >> $ python -m timeit -n20000 -s"items = []" "items.append(42)"
    >> 20000 loops, best of 3: 0.554 usec per loop
    >> $ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
    >> 20000 loops, best of 3: 0.529 usec per loop
    >>
    >> http://wiki.python.org/moin/TimeComplexity
    >>
    >> Peter

    >
    > Peter,
    >
    > So it's actually faster to append to a long list than to an empty one? That
    > certainly would not have been intuitively obvious, now would it?


    You shouldn't blindly trust the numbers.

    Here's what happens if I repeat the measurements a few times:

    $ python -m timeit -n20000 -s"items = []" "items.append(42)"
    20000 loops, best of 3: 0.531 usec per loop
    $ python -m timeit -n20000 -s"items = []" "items.append(42)"
    20000 loops, best of 3: 0.511 usec per loop
    $ python -m timeit -n20000 -s"items = []" "items.append(42)"
    20000 loops, best of 3: 0.512 usec per loop
    $ python -m timeit -n20000 -s"items = []" "items.append(42)"
    20000 loops, best of 3: 0.51 usec per loop
    $ python -m timeit -n20000 -s"items = []" "items.append(42)"
    20000 loops, best of 3: 0.514 usec per loop
    $ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
    20000 loops, best of 3: 0.506 usec per loop
    $ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
    20000 loops, best of 3: 0.512 usec per loop
    $ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
    20000 loops, best of 3: 0.543 usec per loop
    $ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
    20000 loops, best of 3: 0.522 usec per loop
    $ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
    20000 loops, best of 3: 0.51 usec per loop

    The difference is within the error margin. All you can say is that both
    operations take roughly the same time.

    In general, if no error margin (e.g. 0.5 ± 0.1) is given, that is always a
    warning sign, be it in opinion polls or in timeit output.

    Peter
     
    Peter Otten, Jun 30, 2008
    #4
  5. Larry Bates wrote:
    > [...]
    > So it's actually faster to append to a long list than to an empty one? That
    > certainly would not have been intuitively obvious, now would it?


    Maybe not intuitively, but if you know how dynamically growing data
    structures are implemented, it's plausible. They overallocate, and the
    amount of overallocation is dependent on the current size. Relevant
    source snippet from Python 2.6:

    /* This over-allocates proportional to the list size, making room
     * for additional growth.  The over-allocation is mild, but is
     * enough to give linear-time amortized behavior over a long
     * sequence of appends() in the presence of a poorly-performing
     * system realloc().
     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
     */
    new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);

    If, on the other hand, we knew beforehand how big the list will get
    approximately, we could avoid all these reallocations. No problem with
    Python's C API:

    PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);

    But you can't do it directly from Python, unless you (ab)use ctypes.
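    The over-allocation is easy to observe from pure Python with sys.getsizeof
    (a rough sketch; the exact byte values depend on the Python version and
    platform):

```python
import sys

# Append repeatedly and record the list object's size in bytes.
# Because CPython over-allocates, the size only jumps occasionally;
# most appends reuse spare capacity and trigger no realloc at all.
items = []
sizes = []
for _ in range(20):
    items.append(None)
    sizes.append(sys.getsizeof(items))

print(sizes)  # runs of equal values = appends that reused spare capacity
```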

    -- Gerhard
     
    Gerhard Häring, Jun 30, 2008
    #5
  6. On Monday 30 June 2008 15:13:30, Larry Bates wrote:
    > Peter Otten wrote:
    > > Ampedesign wrote:
    > >> If I happen to have a list that contains over 50,000 items, will the
    > >> size of the list severely impact the performance of appending to the
    > >> list?

    > >
    > > No.
    > >
    > > $ python -m timeit -n20000 -s"items = []" "items.append(42)"
    > > 20000 loops, best of 3: 0.554 usec per loop
    > > $ python -m timeit -n20000 -s"items = [42]*10**6" "items.append(42)"
    > > 20000 loops, best of 3: 0.529 usec per loop
    > >
    > > http://wiki.python.org/moin/TimeComplexity
    > >
    > > Peter

    >
    > Peter,
    >
    > So it's actually faster to append to a long list than to an empty one? That
    > certainly would not have been intuitively obvious, now would it?
    >


    That test only demonstrates that it's faster to append to a 1-million-item
    list than to an empty one (and this on a particular platform with a
    particular Python version). Different sizes may give different results. I
    guess this is because of some internal optimisations (items are probably
    allocated in chunks, so sometimes append() involves a realloc, sometimes
    not).

    So the only thing you should remember is that list.append() has a complexity
    of O(1), and thus should be considered a constant time operation for any
    length. Just be aware of the note:

    [1] = These operations rely on the "Amortized" part of "Amortized Worst Case".
    Individual actions may take surprisingly long, depending on the history of
    the container.

    Also note that 50,000 items is a lot for a human being, but not for a modern
    computer.
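    To put a number on that, here is a quick sketch using the timeit module
    programmatically (the absolute figures are machine-dependent, of course):

```python
import timeit

# Total time for 50,000 appends to a single, growing list.
t = timeit.timeit("items.append(42)", setup="items = []", number=50000)
print("50000 appends: %.4f s total, %.0f ns per append" % (t, t / 50000 * 1e9))
```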

    --
    Cédric Lucantis
     
    Cédric Lucantis, Jun 30, 2008
    #6
  7. On Monday 30 June 2008 15:52:56, Gerhard Häring wrote:
    > Larry Bates wrote:
    > > [...]
    > > So it's actually faster to append to a long list than to an empty one? That
    > > certainly would not have been intuitively obvious, now would it?

    >
    > Maybe not intuitively, but if you know how dynamically growing data
    > structures are implemented, it's plausible. They overallocate, and the
    > amount of overallocation is dependent on the current size. Relevant
    > source snippet from Python 2.6:
    >
    > /* This over-allocates proportional to the list size, making room
    > * for additional growth. The over-allocation is mild, but is
    > * enough to give linear-time amortized behavior over a long
    > * sequence of appends() in the presence of a poorly-performing
    > * system realloc().
    > * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
    > */
    > new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
    >
    > If, on the other hand, we knew beforehand how big the list will get
    > approximately, we could avoid all these reallocations. No problem with
    > Python's C API:
    >
    > PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);
    >
    > But you can't do it directly from Python, unless you (ab)use ctypes.
    >
    > -- Gerhard


    Well, as I posted a few days ago, one could envisage, as a pure Python
    optimization for dealing with long lists, replacing an algorithm with a lot
    of appends by something like this:

    mark = object()

    datas = [ mark ] * expected_size

    # working with the datas while maintaining the effective currently used size

    Of course one could even subclass list and redefine __len__, append, and some
    other methods to deal with this "allocated by block" list.
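    A minimal sketch of that idea (the class name and the block-doubling growth
    policy here are purely illustrative, not from the original post):

```python
class PreallocList(object):
    """Illustrative 'allocated by block' list: preallocate, track used size."""

    def __init__(self, expected_size):
        self._mark = object()                  # sentinel for unused slots
        self._data = [self._mark] * expected_size
        self._used = 0                         # effective size

    def append(self, value):
        if self._used == len(self._data):      # out of room: grow by a block
            self._data.extend([self._mark] * max(len(self._data), 1))
        self._data[self._used] = value
        self._used += 1

    def __len__(self):
        return self._used

    def __iter__(self):
        return iter(self._data[:self._used])


pl = PreallocList(4)
for i in range(10):
    pl.append(i)
print(list(pl))  # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```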


    --
    _____________

    Maric Michaud
     
    Maric Michaud, Jun 30, 2008
    #7
  8. Terry Reedy

    Terry Reedy Guest

    Maric Michaud wrote:
    > On Monday 30 June 2008 15:52:56, Gerhard Häring wrote:
    >> Larry Bates wrote:


    >> If, on the other hand, we knew beforehand how big the list will get
    >> approximately, we could avoid all these reallocations. No problem with
    >> Python's C API:
    >>
    >> PyAPI_FUNC(PyObject *) PyList_New(Py_ssize_t size);
    >>
    >> But you can't do it directly from Python, unless you (ab)use ctypes.
    >>
    >> -- Gerhard

    >
    > Well, as I posted a few days ago, one could envisage, as a pure Python
    > optimization for dealing with long lists, replacing an algorithm with a lot
    > of appends by something like this:
    >
    > mark = object()
    >
    > datas = [ mark ] * expected_size


    datas = [None] * expected_size
    has been a standard idiom since before object() existed ;-)
    and works fine *unless* one wants to add None explicitly
    and have that be different from 'unused'.

    >
    > # working with the datas while maintaining the effective currently used size
    >
    > Of course one could even subclass list and redefine __len__, append, and some
    > other methods to deal with this "allocated by block" list.


    An interesting idea if one does this at least a few times and wants to
    use .append and .extend instead of explicit indexing.

    One could also make such a subclass a 'no-grow' list if appropriate
    (when an attempt to grow it would indicate a bug).
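    A sketch of such a 'no-grow' variant (hypothetical; it assumes that any
    attempt to grow past the preallocated size should raise):

```python
class NoGrowList(list):
    """Illustrative fixed-size list: item assignment is fine, growing is a bug."""

    def __init__(self, size, fill=None):
        list.__init__(self, [fill] * size)

    def _no_grow(self, *args, **kwargs):
        raise RuntimeError("attempt to grow a fixed-size list")

    # Route every growing operation to the error; indexing still works.
    append = extend = insert = _no_grow


fixed = NoGrowList(3)
fixed[0] = "ok"           # plain item assignment still works
try:
    fixed.append("oops")  # growing raises instead of silently resizing
except RuntimeError as e:
    print(e)              # -> attempt to grow a fixed-size list
```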

    tjr
     
    Terry Reedy, Jun 30, 2008
    #8
  9. On Monday 30 June 2008 22:21:35, Terry Reedy wrote:
    > > Well, as I posted a few days ago, one could envisage, as a pure Python
    > > optimization for dealing with long lists, replacing an algorithm with a
    > > lot of appends by something like this:
    > >
    > > mark = object()
    > >
    > > datas = [ mark ] * expected_size

    >
    > datas = [None] * expected_size
    > has been a standard idiom since before object() existed ;-)
    > and works fine *unless* one wants to add None explicitly
    > and have that be different from 'unused'.



    Yes, in fact I used a marker because I thought of it primarily as an
    end-of-data marker for the list (like \0 for strings in C), but it doesn't
    matter which object you put in the list, if you know its size at every
    moment.

    A subclass of list will indeed have to override most of the methods of its
    parent (not just some, as I assumed before), using extend for reallocation
    with some sort of sized iterator, as in my previous example with xrange,
    something like this:

    >>> import itertools
    >>> class iter_with_len(object):
    ...     def __init__(self, size, obj=None):
    ...         self.size = size
    ...         self.obj = obj
    ...     def __len__(self):
    ...         return self.size
    ...     def __iter__(self):
    ...         return itertools.repeat(self.obj, len(self))
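    Cleaned up and made runnable, the sketch works because list.extend can use
    the iterable's len() as a size hint and allocate the target in one step (a
    CPython implementation detail, not a language guarantee):

```python
import itertools

class iter_with_len(object):
    """Sized iterator: yields `obj` exactly `size` times and reports a length."""

    def __init__(self, size, obj=None):
        self.size = size
        self.obj = obj

    def __len__(self):
        return self.size

    def __iter__(self):
        return itertools.repeat(self.obj, len(self))


items = []
items.extend(iter_with_len(5))
print(items)  # -> [None, None, None, None, None]
```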




    --
    _____________

    Maric Michaud
     
    Maric Michaud, Jun 30, 2008
    #9
