filter list fast

L

lars_woetmann

I have a list I filter using another list and I would like this to be
as fast as possible
right now I do like this:

[x for x in list1 if x not in list2]

i tried using the method filter:

filter(lambda x: x not in list2, list1)

but it didn't make much difference, because of lambda I guess
is there any way I can speed this up
 
B

Ben Cartwright

lars_woetmann said:
I have a list I filter using another list and I would like this to be
as fast as possible
right now I do like this:

[x for x in list1 if x not in list2]

i tried using the method filter:

filter(lambda x: x not in list2, list1)

but it didn't make much difference, because of lambda I guess
is there any way I can speed this up

Both of these techniques are O(n^2). You can reduce it to O(n log n)
by using sets:
set2 = set(list2)
[x for x in list1 if x not in set2]

Checking to see if an item is in a set is much more efficient than a
list.

--Ben
 
F

Fredrik Lundh

lars_woetmann said:
I have a list I filter using another list and I would like this to be
as fast as possible right now I do like this:

[x for x in list1 if x not in list2]

i tried using the method filter:

filter(lambda x: x not in list2, list1)

but it didn't make much difference, because of lambda I guess
is there any way I can speed this up

if list2 is a list object, "not in list2" is an O(N) operation.

maybe you should use sets instead ? does the following work
better ?

set2 = set(list2)
result = [x for x in list1 if x not in set2]

?

</F>
 
K

Klaus Alexander Seistrup

Lars said:
I have a list I filter using another list and I would like
this to be as fast as possible
right now I do like this:

[x for x in list1 if x not in list2]

i tried using the method filter:

filter(lambda x: x not in list2, list1)

but it didn't make much difference, because of lambda I guess
is there any way I can speed this up

If you use a reasonably new python version, you could use sets:

#v+
a = set(range(10))
a set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b = set(range(5, 15))
b set([5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
a.difference(b) set([0, 1, 2, 3, 4])
a-b set([0, 1, 2, 3, 4])
list(a-b) [0, 1, 2, 3, 4]

#v-

Cheers,
 
D

Diez B. Roggisch

Both of these techniques are O(n^2). You can reduce it to O(n log n)
by using sets:
set2 = set(list2)
[x for x in list1 if x not in set2]

Checking to see if an item is in a set is much more efficient than a
list.

Is the set-lookup reliably O(log n)? I was under the impression that it is
hash-based, and this should be O(1) usually, but couldbve O(n) worst-case
(hash the same for _all_ entries).

Regards,

Diez
 
P

Peter Hansen

Diez said:
Both of these techniques are O(n^2). You can reduce it to O(n log n)
by using sets:
set2 = set(list2)
[x for x in list1 if x not in set2]

Checking to see if an item is in a set is much more efficient than a
list.

Is the set-lookup reliably O(log n)? I was under the impression that it is
hash-based, and this should be O(1) usually, but couldbve O(n) worst-case
(hash the same for _all_ entries).

That's largely a theoretical concern. Google for something like

'''dict worst-case performance "tim peters"'''

to learn more. (The third article there (no doubt obsolete in some
ways, given that it was in 2000) says that Python "keeps at least 1/3 of
the internal hash table entries unused, making collisions very rarely a
problem... It's possible to contrive keys that will cause collisions
systematically ... but unlikely to happen by accident in 2.0")

-Peter
 
L

lars_woetmann

comparing
[x for x in list1 if x not in list2]
with
set1, set2 = set(list1), set(list2)
list(set1-set2)

gives something like
len(list2) speedup
------------------------------
100 10
1000 100
10000 1000

the speedup is constant for different len(list1)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top