How to del item of a list in loop?

  • Thread starter Reinhold Birkenfeld
  • Start date
R

Reinhold Birkenfeld

skull said:
Hi everybody, it is my first post in this newsgroup.
I am a newbie for python though I have several years development experience in c++.
recently, I was stumped when I tried to del item of a list when iteration.

here is the wrong way I did:

lst = [1, 2, 3]
for i in lst:
print i
if i == 2:
lst.remove(i)

the result is:

1
2
as you would see, '3' is missing. this problem is caused by 'lst.remove(i)'.
apparently, 'marked-and-sweep' is a solution to deal with this issue.
but I think there SHOULD BE more 'wise' trick. I want to get your help.

Quick solution:

for i in lst[:]

iterates over a copy.

Reinhold
 
F

Fredrik Lundh

skull said:
Hi everybody, it is my first post in this newsgroup.
I am a newbie for python though I have several years development experience in c++.
recently, I was stumped when I tried to del item of a list when iteration.

here is the wrong way I did:

lst = [1, 2, 3]
for i in lst:
print i
if i == 2:
lst.remove(i)

the result is:

1
2
as you would see, '3' is missing. this problem is caused by 'lst.remove(i)'.

the internal loop counter is incremented whether you remove stuff or not, so
when you're moving things around by removing items in the middle, the loop
will skip over items. this is basic for-in usage, and any decent tutorial should
explain this.


to fix this, you can build a new output list, loop backwards, or loop over a
copy. the last is easiest to implement:

lst = [1, 2, 3]
for i in lst[:]:
print i
if i == 2:
lst.remove(i)

but the others may have advantages in certain use cases.


if you want more details, see the language reference:

http://docs.python.org/ref/for.html

(note the warning)

</F>
 
R

Reinhold Birkenfeld

Reinhold said:
skull said:
Hi everybody, it is my first post in this newsgroup.
I am a newbie for python though I have several years development experience in c++.
recently, I was stumped when I tried to del item of a list when iteration.

here is the wrong way I did:

lst = [1, 2, 3]
for i in lst:
print i
if i == 2:
lst.remove(i)

the result is:

1
2
as you would see, '3' is missing. this problem is caused by 'lst.remove(i)'.
apparently, 'marked-and-sweep' is a solution to deal with this issue.
but I think there SHOULD BE more 'wise' trick. I want to get your help.

Quick solution:

for i in lst[:]

iterates over a copy.

Addition: In Py2.4, I can't find a problem with

for i in reversed(lst)

Any objections?

Reinhold
 
M

Mitja

lst = [1, 2, 3]
for i in lst:
print i
if i == 2:
lst.remove(i)

the result is:

1
2

As others have suggested, you can use a copy of the list.
Alternatively and depending on what you're trying to accomplish (how
complicated it is), lst = [i for i in lst if i!=2] might look better.
 
F

Fredrik Lundh

skull said:
It makes a copy operation!

so? in python, shallow copies are cheap.
here is a faster and 'ugly' solution:

faster? did you try it, or are you combining a C++ mindset with an
urge to do premature optimizations? (hint: it's slower)

if you care about performance, you shouldn't use "remove", btw. building
a new list is more efficient, especially if you use a list comprehension:

lst = [i for i in lst if i != 2]

(if you have 2.4, try replacing [] with () and see what happens)

</F>
 
M

Michael Hoffman

skull said:
but I still have an other thing to worry about coming with this way: does
performance sucks when the list is big enough?
It makes a copy operation!

here is a faster and 'ugly' solution:

lst = [1, 2, 3]
i = 0
while i < len(lst):
if lst == 2:
lst.remove(i)
else:
i += 1


Actually, that is the slowest of the three methods proposed so far for
large lists on my system.

import timeit

def method_copy(lst):
for i in lst[:]:
if i == 2:
lst.remove(i)
return lst

def method_listcomp(lst):
return [i for i in lst if i!=2]

def method_while(lst):
i = 0
while i < len(lst):
if lst == 2:
lst.remove(i)
else:
i += 1
return lst

lsts = dict(three=range(3),
hundred=range(100),
thousand=range(1000),
million=range(1000000))

if __name__ == "__main__":
for lst_name, lst in lsts.iteritems():
print "%s" % lst_name
for method_name in ["method_copy", "method_listcomp", "method_while"]:
stmt = '%s(lsts["%s"])' % (method_name, lst_name)
setup = "from __main__ import %s, lsts" % method_name

times = 3000000 / len(lst) # reduce the number of times you repeat for big lists
print " %s: %s" % (method_name, timeit.Timer(stmt, setup).repeat(3, times))

RESULTS:

Michael@MINIMOO ~
$ python -V && uname -a
Python 2.4
CYGWIN_NT-5.1 MINIMOO 1.5.12(0.116/4/2) 2004-11-10 08:34 i686 unknown unknown Cygwin

Michael@MINIMOO ~
$ python testlist.py
thousand
method_copy: [1.0479998588562012, 1.0080001354217529, 1.0119998455047607]
method_listcomp: [1.5119998455047607, 1.5110001564025879, 1.503000020980835]
method_while: [3.8680000305175781, 3.8680000305175781, 3.872999906539917]
hundred
method_copy: [1.1269998550415039, 1.127000093460083, 1.190000057220459]
method_listcomp: [2.0360000133514404, 2.0480000972747803, 2.0069999694824219]
method_while: [3.5299999713897705, 3.5540001392364502, 3.5130000114440918]
three
method_copy: [6.1210000514984131, 6.1679999828338623, 6.1239998340606689]
method_listcomp: [9.7590000629425049, 9.5309998989105225, 9.5]
method_while: [6.6609997749328613, 6.625, 6.6800000667572021]
million
method_copy: [1.3420000076293945, 1.3029999732971191, 1.3389999866485596]
method_listcomp: [1.5409998893737793, 1.5500001907348633, 1.5289998054504395]
method_while: [3.9619998931884766, 3.9210000038146973, 3.9590001106262207]

Now your "while" method does use less memory, but it is not as fast as the
copy method.

If you want to get really hairy, you can compare the bytecode instructions
for these three methods:

$ python
Python 2.4 (#1, Dec 4 2004, 20:10:33)
[GCC 3.3.3 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more information. 13 0 LOAD_CONST 1 (0)
3 STORE_FAST 1 (i)

14 6 SETUP_LOOP 68 (to 77) 12 LOAD_GLOBAL 1 (len)
15 LOAD_FAST 0 (lst)
18 CALL_FUNCTION 1
21 COMPARE_OP 0 (<)
24 JUMP_IF_FALSE 48 (to 75)
27 POP_TOP

15 28 LOAD_FAST 0 (lst)
31 LOAD_FAST 1 (i)
34 BINARY_SUBSCR
35 LOAD_CONST 2 (2)
38 COMPARE_OP 2 (==)
41 JUMP_IF_FALSE 17 (to 61)
44 POP_TOP

16 45 LOAD_FAST 0 (lst)
48 LOAD_ATTR 3 (remove)
51 LOAD_FAST 1 (i)
54 CALL_FUNCTION 1
57 POP_TOP
58 JUMP_ABSOLUTE 9
18 62 LOAD_FAST 1 (i)
65 LOAD_CONST 3 (1)
68 INPLACE_ADD
69 STORE_FAST 1 (i)
72 JUMP_ABSOLUTE 9 76 POP_BLOCK

19 >> 77 LOAD_FAST 0 (lst)
80 RETURN_VALUE 4 0 SETUP_LOOP 45 (to 48)
3 LOAD_FAST 0 (lst)
6 SLICE+0
7 GET_ITER 11 STORE_FAST 1 (i)

5 14 LOAD_FAST 1 (i)
17 LOAD_CONST 1 (2)
20 COMPARE_OP 2 (==)
23 JUMP_IF_FALSE 17 (to 43)
26 POP_TOP

6 27 LOAD_FAST 0 (lst)
30 LOAD_ATTR 2 (remove)
33 LOAD_FAST 1 (i)
36 CALL_FUNCTION 1
39 POP_TOP
40 JUMP_ABSOLUTE 8
7 >> 48 LOAD_FAST 0 (lst)
51 RETURN_VALUE 10 0 BUILD_LIST 0
3 DUP_TOP
4 STORE_FAST 1 (_[1])
7 LOAD_FAST 0 (lst)
10 GET_ITER 14 STORE_FAST 2 (i)
17 LOAD_FAST 2 (i)
20 LOAD_CONST 1 (2)
23 COMPARE_OP 3 (!=)
26 JUMP_IF_FALSE 11 (to 40)
29 POP_TOP
30 LOAD_FAST 1 (_[1])
33 LOAD_FAST 2 (i)
36 LIST_APPEND
37 JUMP_ABSOLUTE 11
>> 40 POP_TOP 41 JUMP_ABSOLUTE 11
>> 44 DELETE_FAST 1 (_[1])
47 RETURN_VALUE

For method_while, almost all of the times the list runs through,
you still have to add 1 to i, which is a lot of instructions:

18 62 LOAD_FAST 1 (i)
65 LOAD_CONST 3 (1)
68 INPLACE_ADD
69 STORE_FAST 1 (i)
72 JUMP_ABSOLUTE 9

With the other methods, when you find a false result, all you have to
do is the JUMP_ABSOLUTE. That saves you some time over several
million repetitions.

Well, that was longer than I thought it would be. HTH.
 
N

Nick Coghlan

Reinhold said:
Addition: In Py2.4, I can't find a problem with

for i in reversed(lst)

Some poking around suggests it's fine - and it avoids copying the list. Don't
delete anything earlier in the list than the current element though (which
list.remove() will do quite happily when data is duplicated).

However, there will still be data movement costs when deleting elements from the
middle of the list. In addition, remove() itself has to do a linear search for
the value being removed.

An off-place solution based on a list comprehension is usually going to be your
best performer - it's an O(n) operation, based on the size of the original list.
The in-place mechanisms can turn out to be O(n**2) due to worst-case memory
movement effects (lists don't allow gaps, so deleting items will usually trigger
data movement).

I think this is about the best you can do for an in-place version:
for i, x in enumerate(reversed(lst)):
if x == 2:
del lst[-i]

The effbot's version is still going to be faster though:
lst = [x for x in lst if x != 2]

Cheers,
Nick.
 
S

skull

Hi everybody, it is my first post in this newsgroup.
I am a newbie for python though I have several years development experience in c++.
recently, I was stumped when I tried to del item of a list when iteration.

here is the wrong way I did:

lst = [1, 2, 3]
for i in lst:
print i
if i == 2:
lst.remove(i)

the result is:

1
2
as you would see, '3' is missing. this problem is caused by 'lst.remove(i)'.
apparently, 'marked-and-sweep' is a solution to deal with this issue.
but I think there SHOULD BE more 'wise' trick. I want to get your help.

Thanks in advance.

- skull
 
J

John Machin

Fredrik said:
lst = [i for i in lst if i != 2]

(if you have 2.4, try replacing [] with () and see what happens)

The result is a generator with a name ("lst") that's rather misleading
in the context. Achieving the same result as the list comprehension, by
doing lst = list(i for ... etc etc), appears to be slower.
 
J

John Machin

Nick said:
I think this is about the best you can do for an in-place version:
for i, x in enumerate(reversed(lst)):
if x == 2:
del lst[-i]

Don't think, implement and measure. You may be surprised. Compare these
two for example:

!def method_del_bkwds(lst, x):
! for inx in xrange(len(lst) - 1, -1, -1):
! if lst[inx] == x:
! del lst[inx]
! return lst

!def method_coghlan(lst, x):
! for i, obj in enumerate(reversed(lst)):
! if obj == x:
! del lst[-i]
! return lst
The effbot's version is still going to be faster though:
lst = [x for x in lst if x != 2]

Have you measured this?
 
S

skull

Thank you for your replys.
lst[:] is did a solution, it makes a copy of list specially for iteration and
removes items from the original one.

but I still have an other thing to worry about coming with this way: does
performance sucks when the list is big enough?
It makes a copy operation!

here is a faster and 'ugly' solution:

lst = [1, 2, 3]
i = 0
while i < len(lst):
if lst == 2:
lst.remove(i)
else:
i += 1
Hi everybody, it is my first post in this newsgroup.
I am a newbie for python though I have several years development experience in c++.
recently, I was stumped when I tried to del item of a list when iteration.

here is the wrong way I did:

lst = [1, 2, 3]
for i in lst:
print i
if i == 2:
lst.remove(i)

the result is:

1
2
as you would see, '3' is missing. this problem is caused by 'lst.remove(i)'.
apparently, 'marked-and-sweep' is a solution to deal with this issue.
but I think there SHOULD BE more 'wise' trick. I want to get your help.

Thanks in advance.

- skull
 
J

John Machin

Michael said:
skull said:
but I still have an other thing to worry about coming with this way: does
performance sucks when the list is big enough?
It makes a copy operation!

here is a faster and 'ugly' solution:

lst = [1, 2, 3]
i = 0
while i < len(lst):
if lst == 2:
lst.remove(i)
else:
i += 1


Actually, that is the slowest of the three methods proposed so far for
large lists on my system.


Assuming, as have other posters, that the requirement is to remove all
elements whose value is 2: it doesn't work. The result is [2, 3]
instead of the expected [1, 3].
method_while: [3.8680000305175781, 3.8680000305175781,
3.872999906539917]

Three significant figures is plenty. Showing just the minimum of the
results might be better.
If you want to get really hairy, you can compare the bytecode instructions
for these three methods:
Timing and bytecode-peeking a broken function are a little "premature".
 
S

skull

Reinhold Birkenfeld said:
Quick solution:

for i in lst[:]

iterates over a copy.

Addition: In Py2.4, I can't find a problem with

for i in reversed(lst)

Any objections?

Reinhold

I just downloaded py2.4, and made a test using reversed.
it sure be no problem, I thought maybe the reversed holded a copy of list,
and eventually iterated through the copy.
but the truth is not as I thought so:

import sys

class Test:
pass

lst = [Test(),Test(),Test()]

E1: for i in lst[:]:
E2: for i in reversed(lst):
print sys.getrefcount(i)

###################
E1 outputs:
4
4
4

E2 outputs:
3
3
3

It looks that the reversed does not make a copy of list in contrast with lst[:].
so should we regard: reversed is faster than lst[:]?
I do not have any idea about why it is.

- skull
 
J

John Machin

Nick said:
I think this is about the best you can do for an in-place version:
for i, x in enumerate(reversed(lst)):
if x == 2:
del lst[-i]
I think del lst[-i-1] might be functionally better.
 
M

Michael Hoffman

John said:
Three significant figures is plenty. Showing just the minimum of the
results might be better.

It might be, but how much time do you want to spend on getting your
results for a benchmark that will be run once in the "better" format?

Next time you can run the benchmark yourself and it will be in exactly
the format you want. Give me a break.
 
J

John Machin

Michael said:
It might be, but how much time do you want to spend on getting your
results for a benchmark that will be run once in the "better" format?

About the same time as the "worse" format. The Mona Lisa was painted
once. The Taj Mahal was built once.
Next time you can run the benchmark yourself and it will be in exactly
the format you want.

I've done that already. I've taken your code and improved it along the
suggested lines, added timing for first, middle, and last elements,
added several more methods, and added a testing facility as well. Would
you like a copy?
 
B

Bengt Richter

Hi everybody, it is my first post in this newsgroup.
I am a newbie for python though I have several years development experience in c++.
recently, I was stumped when I tried to del item of a list when iteration.

here is the wrong way I did:

lst = [1, 2, 3]
for i in lst:
print i
if i == 2:
lst.remove(i)

the result is:

1
2
as you would see, '3' is missing. this problem is caused by 'lst.remove(i)'.
apparently, 'marked-and-sweep' is a solution to deal with this issue.
but I think there SHOULD BE more 'wise' trick. I want to get your help.

Thanks in advance.
No one seems to have suggested this in-place way yet,
so I'll trot it out once again ;-)
>>> lst = [1, 2, 3]
>>> i = 0
>>> for item in lst:
... if item !=2:
... lst = item
... i += 1
...
[1, 3]


Regards,
Bengt Richter
 
J

John Machin

Bengt said:
No one seems to have suggested this in-place way yet,
so I'll trot it out once again ;-)
lst = [1, 2, 3]
i = 0
for item in lst:
... if item !=2:
... lst = item
... i += 1
...
[1, 3]


Works, but slowly. Here's another that appears to be the best on large
lists, at least for removing 1 element. It's O(len(list) *
number_to_be_removed).

!def method_try_remove(lst, remove_this):
! try:
! while 1:
! lst.remove(remove_this)
! except:
! pass
 
B

Bengt Richter

Bengt said:
No one seems to have suggested this in-place way yet,
so I'll trot it out once again ;-)
lst = [1, 2, 3]
i = 0
for item in lst:
... if item !=2:
... lst = item
... i += 1
...
del lst[i:]
lst
[1, 3]


Works, but slowly. Here's another that appears to be the best on large
lists, at least for removing 1 element. It's O(len(list) *
number_to_be_removed).


So my loop above should beat the while below pretty handily if deleting 2 from lst=[2]*10**7 ;-)
I.e., it should be O(len(lst)) period, though obviously lst.remove will be C-implemented,
and there will be a crossover point. Psyco ought to do well with the above, IWT.
!def method_try_remove(lst, remove_this):
! try:
! while 1:
! lst.remove(remove_this)
! except:
! pass

Maybe
... i = 0
... for lst, item in itertools.izip(lst, lst): i += item!=vrem
... del lst[i:]
...
>>> lst = [1, 2, 3]
>>> foo(lst, 2)
>>> lst
[1, 3]

I wonder if py2.4 generator expression as a souce of slice assignment, i.e.,
(did someone post this one? Someone must have ;-)

lst[:] = (item for item in lst if item!=remove_this)

would do what my loop is doing, in place. If the slice assignment accepts an
iterator, that might do it, if [:] assignment overwrites the old list
until it has to be extended or in this case possibly trimmed at the end.

If so, its hardly worth writing a function, since the generator expression should
have fast local access to its variables.

Regards,
Bengt Richter
 
?

=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=

Fredrik said:
faster? did you try it, or are you combining a C++ mindset with an
urge to do premature optimizations? (hint: it's slower)

Hmm. Given

import time

length = 2**15

L = range(length)

start = time.clock()
for i in L[:]:
if i%2 == 1:
L.remove(i)
print time.clock()-start

L = range(length)
start = time.clock()
i = 0
while i < len(L):
if L%2 == 1:
del L
else:
i += 1
print time.clock()-start

on my system (python 2.4, Linux 2.6, Pentium 4 3.2GHz), I get

4.85
0.29

So using an explicit index is *much* faster (the OP's version
was incorrect, of course, as it used .remove with an index).
if you care about performance, you shouldn't use "remove"

Yes, but primarily because remove needs to search the entire
list for the item to be removed.

Whether del is more efficient than building a new list
depends on the number of items to be removed.

Regards,
Martin
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top