List comprehension timing difference.

Bart Kastermans · Sep 2, 2011

In the following code I create the graph whose vertices are the
words in sgb-words.txt (the file of 5-letter words from the
Stanford GraphBase), with an edge between two words if they differ
by one letter. The two methods I wrote seem to me likely to
perform the same computations, but the list comprehension
is faster (281 seconds vs. 305 seconds on my Dell Mini).

Is the right interpretation of this timing difference
that the comprehension is performed in lower-level
C code?

At this time I have no other conjecture about the cause.

---------------------------------------------------------
import time
import copy

# One word per line, trailing whitespace stripped.
data = map (lambda x: x.strip(), open('sgb-words.txt').readlines())

# Number of positions at which two 5-letter words differ.
def d (w1, w2):
    count = 0
    for idx in range(0,5):
        if w1[idx] != w2[idx]:
            count += 1
    return count

# Method 1: list comprehension over pairs of words.
print "creating graph"
t0 = time.clock ()
graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a < b]
t1 = time.clock ()
print "took " + str (t1 - t0) + " seconds."

# Method 2: nested for loops over pairs of indexes.
t0 = time.clock ()
graph2 = []
for i in range (0, len(data)):
    for j in range(0,len(data)):
        if d(data[i],data[j]) == 1 and i < j:
            graph2.append ([i,j])
t1 = time.clock ()
print "took " + str (t1 - t0) + " seconds."

MRAB · Sep 2, 2011

Are they actually equivalent? Does graph == graph2?

The first version (list comprehension) creates a list of pairs of
values:

[a, b]

whereas the second version (for loops) creates a list of pairs of
indexes:

[i, j]

The second version has subscripting ("data[i]" and "data[j]"), which
will slow it down.
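
To make the difference between the two results concrete, here is a minimal sketch. It is written for Python 3 (the thread's code is Python 2), and the five-word list is a hypothetical stand-in for sgb-words.txt. It suggests that the two lists are not equal as Python objects, although they describe the same set of edges once the indexes are translated back to words.

```python
# Hypothetical stand-in for sgb-words.txt: a handful of 5-letter words.
data = ["cares", "cores", "bares", "bored", "cored"]

def d(w1, w2):
    """Number of positions at which two equal-length words differ."""
    return sum(1 for x, y in zip(w1, w2) if x != y)

# First version from the thread: pairs of words, kept when a < b.
graph = [[a, b] for a in data for b in data if d(a, b) == 1 and a < b]

# Second version from the thread: pairs of indexes, kept when i < j.
graph2 = []
for i in range(len(data)):
    for j in range(len(data)):
        if d(data[i], data[j]) == 1 and i < j:
            graph2.append([i, j])

# The lists hold different kinds of values, so they compare unequal.
same_lists = graph == graph2

# Normalise both to sorted word pairs to compare the underlying edge sets.
edges_from_words = sorted(tuple(sorted(pair)) for pair in graph)
edges_from_indexes = sorted(tuple(sorted((data[i], data[j]))) for i, j in graph2)
same_edges = edges_from_words == edges_from_indexes

print(same_lists, same_edges)
```

Sorting within each pair is only needed because the sample list is not in alphabetical order, so "a < b" and "i < j" can orient the same edge differently.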

Bart Kastermans · Sep 2, 2011

You are absolutely right. I had changed the code from the
equivalent:

graph2 = []
for i in range (0, len(data)):
    for j in range(0,len(data)):
        if d(data[i],data[j]) == 1 and i < j:
            graph2.append ([data[i],data[j]])

But then I also tried the equivalent:

for a in data:
    for b in data:
        if d(a,b) == 1 and a < b:
            graph2.append([a,b])

Which does away with the indexing, and is just about exactly as
fast as the list comprehension.
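
One way to check this kind of comparison on a smaller input is sketched below. It is Python 3 and uses the standard timeit module; the generated word list and the function names with_comprehension, with_indexed_loop and with_direct_loop are hypothetical, not from the original script. Absolute timings will vary by machine and Python version, so none are quoted here.

```python
import itertools
import timeit

# Hypothetical stand-in for sgb-words.txt: all 243 five-letter strings
# over "abc", generated in alphabetical order.
data = ["".join(p) for p in itertools.product("abc", repeat=5)]

def d(w1, w2):
    """Number of positions at which two 5-letter words differ."""
    count = 0
    for idx in range(5):
        if w1[idx] != w2[idx]:
            count += 1
    return count

def with_comprehension():
    return [[a, b] for a in data for b in data if d(a, b) == 1 and a < b]

def with_indexed_loop():
    graph = []
    for i in range(len(data)):
        for j in range(len(data)):
            # Each data[i] / data[j] is an extra subscript operation.
            if d(data[i], data[j]) == 1 and i < j:
                graph.append([data[i], data[j]])
    return graph

def with_direct_loop():
    graph = []
    for a in data:
        for b in data:
            if d(a, b) == 1 and a < b:
                graph.append([a, b])
    return graph

# Time one full run of each variant.
timings = {
    f.__name__: timeit.timeit(f, number=1)
    for f in (with_comprehension, with_indexed_loop, with_direct_loop)
}
for name, seconds in timings.items():
    print(name, "took", seconds, "seconds")
```

Because the generated list is already sorted, "i < j" and "a < b" select the same pairs in the same order, so all three functions should return identical lists and only their running time differs.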

That'll teach me; I almost didn't ask the question, thinking it might
be silly. And it was, but I thought so for the wrong reason. I tell my
students there are no stupid questions; I should listen to myself
more when I do. Thanks!

ting · Sep 2, 2011

if d(a,b) == 1 and a < b:

It will probably be faster if you reverse the evaluation order of that
expression.

if a<b and d(a,b)==1:

That way the d() function is called less than half the time. Of course
this assumes that a<b is a faster evaluation than d(a,b), but I think
that's true for your example.
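
The effect of the evaluation order can be illustrated by counting calls to d(). The sketch below is Python 3; the generated word list, the global counter calls and the helper build are hypothetical names added for illustration. Since "and" short-circuits, d() only runs when the left operand is true.

```python
import itertools

# Hypothetical stand-in for sgb-words.txt: 243 distinct 5-letter strings.
data = ["".join(p) for p in itertools.product("abc", repeat=5)]

calls = 0  # how many times d() has been invoked

def d(w1, w2):
    """Number of differing positions; also bumps the call counter."""
    global calls
    calls += 1
    return sum(1 for x, y in zip(w1, w2) if x != y)

def build(comparison_first):
    """Build the edge list and report how many d() calls it needed."""
    global calls
    calls = 0
    if comparison_first:
        # a < b is tested first; d() is skipped whenever it is false.
        graph = [[a, b] for a in data for b in data if a < b and d(a, b) == 1]
    else:
        # d() is tested first, so it runs for every ordered pair.
        graph = [[a, b] for a in data for b in data if d(a, b) == 1 and a < b]
    return graph, calls

graph_slow, calls_slow = build(comparison_first=False)
graph_fast, calls_fast = build(comparison_first=True)
print(calls_slow, calls_fast)
```

For n distinct words the first ordering calls d() n*n times and the second n*(n-1)/2 times, which matches the "less than half" estimate; both produce the same edge list.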

Bart Kastermans · Sep 3, 2011

Indeed, that makes quite a difference: the run time goes from
275 seconds down to 153 seconds.
