Which is faster? (if not b in m) or (if m.count(b) > 0)

F

Farel

Which is Faster in Python and Why?

jc = {}; m = []
x = [ [1,2,3,4,5,6,7,8,9],[..],.......] # upwards of 10000 entries

def mcountb():
for item in x:
b = item[:]; b.sort(); bc = 0
for bitem in b: bc += int(bitem)
try: m = jc[bc]
except: m = []
if m.count(b) == 0:
m.append(b); jc[bc]=m

def binm()
for item in x:
b = item[:]; b.sort(); bc = 0
for bitem in b: bc += int(bitem)
try: m = jc[bc]
except: m = []
if not b in m:
m.append(b); jc[bc]=m
 
M

Marc 'BlackJack' Rintsch

Which is Faster in Python and Why?

``if not b in m`` looks at each element of `m` until it finds `b` in it
and stops then. Assuming `b` is in `m`, otherwise all elements of `m` are
"touched".

``if m.count(b) > 0`` will always goes through all elements of `m` in the
`count()` method.

Ciao,
Marc 'BlackJack' Rintsch
 
F

Felipe Almeida Lessa

Em Ter, 2006-02-14 às 20:14 -0800, Farel escreveu:
Which is Faster in Python and Why?

jc = {}; m = []
x = [ [1,2,3,4,5,6,7,8,9],[..],.......] # upwards of 10000 entries
def binm()
for item in x:
b = item[:]; b.sort(); bc = 0
for bitem in b: bc += int(bitem)
try: m = jc[bc]
except: m = []
if not b in m:
m.append(b); jc[bc]=m

Why do you copy the list and sort it if you're just summing its
elements? Instead of

b = item[:]; b.sort(); bc = 0
for bitem in b: bc += int(bitem)

you can do just

bc = sum(int(i) for i in item)

or, if the order *really* matters, what is very strange, you can do

bc = sum(int(i) for i in sorted(item))

Another thing, if you want to have in the values of the dictionary jc
only unique elements, you can use sets:
a = set()
print a set([])
a.add(10)
print a set([10])
a.add(10)
print a set([10])
a.add(10)
print a
set([10])

So, the final example would be (assuming that you don't need to sum in
order):

def binm():
for item in x:
bc = sum(int(i) for i in item)
try:
jc[bc].add(b)
except KeyError:
jc[bc] = set()

Looks prettier, have less statements and is (or should be) faster. This
only works in Python 2.4, but you can adapt it to run on other versions,
too.

Cheers,
Felipe.

--
"Quem excele em empregar a força militar subjulga os exércitos dos
outros povos sem travar batalha, toma cidades fortificadas dos outros
povos sem as atacar e destrói os estados dos outros povos sem lutas
prolongadas. Deve lutar sob o Céu com o propósito primordial da
'preservação'. Desse modo suas armas não se embotarão, e os ganhos
poderão ser preservados. Essa é a estratégia para planejar ofensivas."

-- Sun Tzu, em "A arte da guerra"
 
S

Steven D'Aprano

``if not b in m`` looks at each element of `m` until it finds `b` in it
and stops then. Assuming `b` is in `m`, otherwise all elements of `m` are
"touched".

``if m.count(b) > 0`` will always goes through all elements of `m` in the
`count()` method.

But the first technique executes in (relatively slow) pure Python, while
the count method executes (relatively fast) C code. So even though count
may do more work, it may do it faster.

The only way to tell for sure is to actually time the code, not try to
guess.
 
G

Georg Brandl

Steven said:
But the first technique executes in (relatively slow) pure Python, while
the count method executes (relatively fast) C code. So even though count
may do more work, it may do it faster.

Why does "not b in m" execute in pure Python? list.__contains__ is implemented
in C code as well as list.count.

Georg
 
K

Kent Johnson

Steven said:
But the first technique executes in (relatively slow) pure Python, while
the count method executes (relatively fast) C code. So even though count
may do more work, it may do it faster.

'a in b' actually takes fewer bytecodes because 'in' has it's own
bytecode but b.count requires an attribute lookup:
... a in b
... ... b.count(a)
... 2 0 LOAD_GLOBAL 0 (a)
3 LOAD_GLOBAL 1 (b)
6 COMPARE_OP 6 (in)
9 POP_TOP
10 LOAD_CONST 0 (None)
13 RETURN_VALUE 2 0 LOAD_GLOBAL 0 (b)
3 LOAD_ATTR 1 (count)
6 LOAD_GLOBAL 2 (a)
9 CALL_FUNCTION 1
12 POP_TOP
13 LOAD_CONST 0 (None)
16 RETURN_VALUE

Not much difference in speed when the item is not in the list, a slight
edge to 'a in b' for a large list:

D:\>python -m timeit -s "lst = range(1000)" "1001 in lst"
10000 loops, best of 3: 55.5 usec per loop

D:\>python -m timeit -s "lst = range(1000)" "lst.count(1001) > 1"
10000 loops, best of 3: 55.5 usec per loop

D:\>python -m timeit -s "lst = range(1000000)" "lst.count(-1) > 1"
10 loops, best of 3: 62.2 msec per loop

D:\>python -m timeit -s "lst = range(1000000)" "(-1) in lst"
10 loops, best of 3: 60.8 msec per loop

Of course if the item appears in the list, 'a in b' has a huge advantage:

D:\>python -m timeit -s "lst = range(1000000)" "(1) in lst"
10000000 loops, best of 3: 0.171 usec per loop

Kent
 
S

Steven D'Aprano

Georg said:
Why does "not b in m" execute in pure Python? list.__contains__ is implemented
in C code as well as list.count.

Thank you to Kent Johnson for actually *timing* the two
pieces of code with various sets of data, cutting
through all the wild guesses (including mine). Which
was my point.

To answer Georg's question, perhaps I was wrong -- I
was under the impression that "target in source"
executes more or less like:

for obj in source:
if target == obj: return True
return False

Perhaps this was only true in older versions of Python,
or perhaps I was misinformed, or perhaps I imagined the
whole thing.

The important lesson is that your intuitions about what
will be fast are not necessarily correct, and all the
theorizing in the world is no substitute for good
testing. (On the other hand, lousy testing is
practically worthless.)
 
S

Steve Holden

Steven D'Aprano wrote:
[...]
(On the other hand, lousy testing is practically worthless.)

+1 QOTW
 
F

Farel

Well, I could, but I was looking for answers from persons more
experienced in the ways of Python then myself....
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,019
Latest member
RoxannaSta

Latest Threads

Top