Ju said:
I want to remove about 50000 elements from a list,which has 100000
elements.
sample code like below:
... a.remove(x)
...[4, 5, 6, 7, 8, 9]
when a and b is small size, it will finished quickly, but when a and b
have many elements.
such as:... a.remove(x)
...
it will very slowly. Shall I change to another data structure and choos
a better arithmetic?
any suggestion is welcome.
thanks a lot!
A list isn't going to respond very well to random removals of large
numbers of elements like this. Remember that it must do linear search
to find the value to be removed and then basically build a completely
new list with the remaining values. Depending on the data in question,
you might be able to leverage things. Are the two lists sorted? If so
you can iterate over both of them and build a third list with the results.
This is not particularly 'elegant' but it is fast for sorted lists:
import time
a=range(100000)
b=range(50000)
start_time=time.time()
for x in b:
a.remove(x)
stop_time=time.time()
print "brute force elapsed time=%.2f seconds" % (stop_time-start_time)
start_time=time.time()
n=[]
i1=0
i2=0
while 1:
try: v1=a[i1]
except IndexError:
break
try: v2=b[i2]
except IndexError:
n.extend(a[i1:])
break
if v1 < v2:
n.append(v1)
i1+=1
continue
elif v1 > v2:
i2+=1
continue
else:
i1+=1
i2+=1
continue
stop_time=time.time()
print "new elapsed time=%.2f seconds" % (stop_time-start_time)
start_time=time.time()
a=set(a)
b=set(b)
a.symmetric_difference(b)
stop_time=time.time()
print "sets elapsed time=%.2f seconds" % (stop_time-start_time)
brute force elapsed time=4.13 seconds
new elapsed time=0.05 seconds
sets elapsed time=0.03 seconds
There may be other ways using bisect or some other module. If data
is random, unsorted or has duplicates, I would look at in-memory
database like SQLlite instead.
If the values are unique you can do something like:
a=set(range(100000))
b=set(range(50000))
a.symmetric_difference(b)
Which is the fastest way.
It all depends on the "real" data which we can't tell from your
test using range().
-Larry Bates