Deleting from a list while iterating

Discussion in 'Python' started by Rhamphoryncus, Dec 3, 2006.

  1. The problems of this are well known, and a suggestion for making this
    easier was recently posted on python-dev. However, I believe this can
    be done just as well without a change to the language. What's more,
    most of the suggested methods (in my search results as well as the
    suggestion itself) do not scale well, which my approach would solve.

    My approach is to make a set of indexes to removed while iterating,
    then use a list comprehension to filter them out after. Timings of
    this and two other common approaches follow:

    setapproach = """\
    def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    remove = set()
    for index, x in enumerate(items):
    #...do something...
    if x < 0.5:
    remove.add(index)
    items = [x for index, x in enumerate(items) if index not in remove]
    """

    copyapproach = """\
    def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    for x in items[:]:
    if x < 0.5:
    items.remove(x)
    """

    reverseapproach = """\
    def func(count):
    from random import random
    items = [random() for i in xrange(count)]
    for index in range(len(items) - 1, -1, -1):
    if items[index] < 0.5:
    del items[index]
    """

    >>> import timeit
    >>> timeit.Timer(stmt='func(1000)', setup=setapproach).timeit(1)

    0.0016040802001953125
    >>> timeit.Timer(stmt='func(1000)', setup=copyapproach).timeit(1)

    0.0085191726684570312
    >>> timeit.Timer(stmt='func(1000)', setup=reverseapproach).timeit(1)

    0.0011308193206787109

    >>> timeit.Timer(stmt='func(10000)', setup=setapproach).timeit(1)

    0.021183013916015625
    >>> timeit.Timer(stmt='func(10000)', setup=copyapproach).timeit(1)

    1.0268981456756592
    >>> timeit.Timer(stmt='func(10000)', setup=reverseapproach).timeit(1)

    0.038264989852905273

    >>> timeit.Timer(stmt='func(100000)', setup=setapproach).timeit(1)

    0.23896384239196777
    >>> timeit.Timer(stmt='func(100000)', setup=copyapproach).timeit(1)

    274.57498288154602
    >>> timeit.Timer(stmt='func(100000)', setup=reverseapproach).timeit(1)

    2.2382969856262207

    As you can see, although reverse iteration is somewhat faster at
    smaller sizes, a set is substantially faster at larger sizes, and I
    believe is more readable anyway. Copying shouldn't even be considered
    unless you know the size will always be trivial (< 1000).

    I'm sure there's a few other approaches that would do even better under
    certain conditions. One is a generator, if your input and output
    should both be iterators. Another is using slicing to move contiguous
    sections of retained items over the removed items. I leave both of
    these as an exercise for the reader.

    --
    Adam Olsen, aka Rhamphoryncus
    Rhamphoryncus, Dec 3, 2006
    #1
    1. Advertising

  2. In <>, Rhamphoryncus
    wrote:

    > My approach is to make a set of indexes to removed while iterating,
    > then use a list comprehension to filter them out after. Timings of
    > this and two other common approaches follow:
    >
    > setapproach = """\
    > def func(count):
    > from random import random
    > items = [random() for i in xrange(count)]
    > remove = set()
    > for index, x in enumerate(items):
    > #...do something...
    > if x < 0.5:
    > remove.add(index)
    > items = [x for index, x in enumerate(items) if index not in remove]
    > """


    Why do you make it that complicated? If you are going to build a new list
    anyway, this can be done without the `set()` and just one listcomp:

    items = [x for x in items if x < 0.5]

    No need to iterate twice over the `items`. The two other approaches you
    gave are just needed if it's important that the elements are deleted "in
    place", i.e. that you don't rebind `items` to a new object.

    Ciao,
    Marc 'BlackJack' Rintsch
    Marc 'BlackJack' Rintsch, Dec 3, 2006
    #2
    1. Advertising

  3. Rhamphoryncus wrote:

    > As you can see, although reverse iteration is somewhat faster at
    > smaller sizes, a set is substantially faster at larger sizes, and I
    > believe is more readable anyway.


    your set approach doesn't modify the list in place, though; it creates
    a new list, in a rather roundabout way. if modification in place isn't
    important, the normal way is of course to create a *new* list:

    items = [i for i in items if not i < 0.5]

    on my machine, that's about two orders of magnitude faster than your
    "fast" approach for n=100000.

    (or twice as fast, if I don't factor out the time it takes to *create*
    the original list from the benchmark).

    </F>
    Fredrik Lundh, Dec 3, 2006
    #3
  4. Fredrik Lundh wrote:

    > on my machine, that's about two orders of magnitude faster than your
    > "fast" approach for n=100000.


    oops. forget that; it's three times faster, if you're actually creating
    the entire list, and not just a generator that will create it on demand ;-)

    </F>
    Fredrik Lundh, Dec 3, 2006
    #4
  5. Marc 'BlackJack' Rintsch wrote:

    > No need to iterate twice over the `items`. The two other approaches you
    > gave are just needed if it's important that the elements are deleted "in
    > place", i.e. that you don't rebind `items` to a new object.


    and even when you do, that can often be written as, e.g:

    items[:] = (i for i in items if not i < 0.5)

    </F>
    Fredrik Lundh, Dec 3, 2006
    #5
  6. Rhamphoryncus schrieb:
    > setapproach = """\
    > def func(count):
    > from random import random
    > items = [random() for i in xrange(count)]
    > remove = set()
    > for index, x in enumerate(items):
    > #...do something...
    > if x < 0.5:
    > remove.add(index)
    > items = [x for index, x in enumerate(items) if index not in remove]
    > """


    This is different from the other approaches in that it doesn't
    modify items. If you wanted a new list, you could incrementally
    build one already in the first pass, no need to collect the
    indices first (as BlackJack explains).

    If you wanted in-place modification, I'd do

    newitems = []
    for x in items:
    if not (x < 0.5):
    newitems.append(x)
    items[:] = newitems

    > copyapproach = """\
    > def func(count):
    > from random import random
    > items = [random() for i in xrange(count)]
    > for x in items[:]:
    > if x < 0.5:
    > items.remove(x)
    > """


    This happens to work for your example, but is incorrect
    in the general case: you meant to write

    del items[i+removed]

    here, as .remove(x) will search the list again, looking
    for the first value to remove. If your condition for
    removal happens to leave some items equal to x in the
    list, it would remove the wrong item. Because the numbering
    changes while the iteration is in progress, you have
    to count the number of removed items also.

    Regards,
    Martin
    =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=, Dec 3, 2006
    #6
  7. Marc 'BlackJack' Rintsch wrote:
    > Why do you make it that complicated? If you are going to build a new list
    > anyway, this can be done without the `set()` and just one listcomp:


    Fredrik Lundh wrote:
    > your set approach doesn't modify the list in place, though; it creates
    > a new list, in a rather roundabout way. if modification in place isn't
    > important, the normal way is of course to create a *new* list:
    >
    > items = [i for i in items if not i < 0.5]
    >
    > on my machine, that's about two orders of magnitude faster than your
    > "fast" approach for n=100000.


    Sorry, I should have clarified that the original post assumed you
    needed info from the "do something" phase to determine if an element is
    removed or not. As you say, a list comprehension is superior if that
    is not necessary.

    --
    Adam Olsen, aka Rhamphoryncus
    Rhamphoryncus, Dec 3, 2006
    #7
  8. Rhamphoryncus wrote:

    > Sorry, I should have clarified that the original post assumed you
    > needed info from the "do something" phase to determine if an element is
    > removed or not. As you say, a list comprehension is superior if that
    > is not necessary.


    that's spelled

    out = []
    for i in items:
    ... do something ...
    if i > 0.5:
    out.append(i)

    in Python, and is only a little slower than a list comprehension, as
    written above. if performance is really important, move the method
    lookup out of the loop:

    out = []
    append = out.append
    for i in items:
    ... do something ...
    if i > 0.5:
    append(i)

    </F>
    Fredrik Lundh, Dec 3, 2006
    #8
  9. Martin v. Löwis wrote:
    > Rhamphoryncus schrieb:
    > > setapproach = """\
    > > def func(count):
    > > from random import random
    > > items = [random() for i in xrange(count)]
    > > remove = set()
    > > for index, x in enumerate(items):
    > > #...do something...
    > > if x < 0.5:
    > > remove.add(index)
    > > items = [x for index, x in enumerate(items) if index not in remove]
    > > """

    >
    > This is different from the other approaches in that it doesn't
    > modify items. If you wanted a new list, you could incrementally
    > build one already in the first pass, no need to collect the
    > indices first (as BlackJack explains).


    I didn't feel this distinction was worth mentioning, since "oldlist[:]
    = newlist" is so trivial. The only solution that really avoids making
    a copy is the reverse-iteration one.


    > If you wanted in-place modification, I'd do
    >
    > newitems = []
    > for x in items:
    > if not (x < 0.5):
    > newitems.append(x)
    > items[:] = newitems


    I agree, that does seem simpler.


    > > copyapproach = """\
    > > def func(count):
    > > from random import random
    > > items = [random() for i in xrange(count)]
    > > for x in items[:]:
    > > if x < 0.5:
    > > items.remove(x)
    > > """

    >
    > This happens to work for your example, but is incorrect
    > in the general case: you meant to write
    >
    > del items[i+removed]
    >
    > here, as .remove(x) will search the list again, looking
    > for the first value to remove. If your condition for
    > removal happens to leave some items equal to x in the
    > list, it would remove the wrong item. Because the numbering
    > changes while the iteration is in progress, you have
    > to count the number of removed items also.


    I agree that the example I gave here sucks. However, I copied it from
    another posting as a recommended method to get removal to work *right*
    (without noticing how slow it is).

    There seems to have been many distinct approaches to this problem over
    the years. Hopefully we can converge on a single ideal solution (or a
    few situational ones) as TOOWTDI.

    --
    Adam Olsen, aka Rhamphoryncus
    Rhamphoryncus, Dec 3, 2006
    #9
  10. Rhamphoryncus schrieb:
    >> This is different from the other approaches in that it doesn't
    >> modify items. If you wanted a new list, you could incrementally
    >> build one already in the first pass, no need to collect the
    >> indices first (as BlackJack explains).

    >
    > I didn't feel this distinction was worth mentioning, since "oldlist[:]
    > = newlist" is so trivial. The only solution that really avoids making
    > a copy is the reverse-iteration one.


    The distinction is relevant for performance, as the slice assignment
    has a complexity linear with the number of elements.

    Whether making a copy is expensive depends on the precise data: the
    solution that makes the copy in reverse may have quadratic complexity
    (if the number of removed elements increases with the list size);
    the version that appends all remaining elements to a new list and
    then does slice assignment should behave better-than-quadratic
    (in recent versions, it should give you linear performance).

    Regards,
    Martin
    =?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=, Dec 3, 2006
    #10
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Patrick von Harsdorf

    iterating over collection, deleting entries

    Patrick von Harsdorf, Apr 25, 2004, in forum: Python
    Replies:
    3
    Views:
    273
    Larry Bates
    Apr 26, 2004
  2. Fernando Rodríguez

    getting the index while iterating through a list

    Fernando Rodríguez, May 12, 2004, in forum: Python
    Replies:
    4
    Views:
    426
    Steven Rumbalski
    May 12, 2004
  3. Replies:
    4
    Views:
    393
    Steven D'Aprano
    Feb 26, 2007
  4. carl
    Replies:
    5
    Views:
    2,337
    James Kanze
    Nov 25, 2009
  5. marc magrans de abril
    Replies:
    6
    Views:
    471
Loading...

Share This Page