List comprehension timing difference.

  • Thread starter Bart Kastermans
  • Start date
B

Bart Kastermans

In the following code I create the graph with vertices
sgb-words.txt (the file of 5 letter words from the
stanford graphbase), and an edge if two words differ
by one letter. The two methods I wrote seem to me to
likely perform the same computations, the list comprehension
is faster though (281 seconds VS 305 seconds on my dell mini).

Is the right interpretation of this timing difference
that the comprehension is performed in the lower level
C code?

As this time I have no other conjecture about the cause.

---------------------------------------------------------
import time
import copy

data = map (lambda x: x.strip(), open('sgb-words.txt').readlines())

def d (w1, w2):
count = 0
for idx in range(0,5):
if w1[idx] != w2[idx]:
count += 1
return count

print "creating graph"
t0 = time.clock ()
graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a < b]
t1 = time.clock ()
print "took " + str (t1 - t0) + " seconds."

t0 = time.clock ()
graph2 = []
for i in range (0, len(data)):
for j in range(0,len(data)):
if d(data,data[j]) == 1 and i < j:
graph2.append ([i,j])
t1 = time.clock ()
print "took " + str (t1 - t0) + " seconds."
 
M

MRAB

In the following code I create the graph with vertices
sgb-words.txt (the file of 5 letter words from the
stanford graphbase), and an edge if two words differ
by one letter. The two methods I wrote seem to me to
likely perform the same computations, the list comprehension
is faster though (281 seconds VS 305 seconds on my dell mini).

Is the right interpretation of this timing difference
that the comprehension is performed in the lower level
C code?

As this time I have no other conjecture about the cause.

---------------------------------------------------------
import time
import copy

data = map (lambda x: x.strip(), open('sgb-words.txt').readlines())

def d (w1, w2):
count = 0
for idx in range(0,5):
if w1[idx] != w2[idx]:
count += 1
return count

print "creating graph"
t0 = time.clock ()
graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a< b]
t1 = time.clock ()
print "took " + str (t1 - t0) + " seconds."

t0 = time.clock ()
graph2 = []
for i in range (0, len(data)):
for j in range(0,len(data)):
if d(data,data[j]) == 1 and i< j:
graph2.append ([i,j])
t1 = time.clock ()
print "took " + str (t1 - t0) + " seconds."


Are they actually equivalent? Does graph == graph2?

The first version (list comprehension) creates a list of pairs of
values:

[a, b]

whereas the second version (for loops) creates a list of pairs of
indexes:

[i, j]

The second version has subscripting ("data" and "data[j]"), which
will slow it down.
 
B

Bart Kastermans

MRAB said:
On 02/09/2011 01:35, Bart Kastermans wrote:
graph = [[a,b] for a in data for b in data if d(a,b) ==1 and a< b]
graph2 = []
for i in range (0, len(data)):
for j in range(0,len(data)):
if d(data,data[j]) == 1 and i< j:
graph2.append ([i,j])

Are they actually equivalent? Does graph == graph2?

The first version (list comprehension) creates a list of pairs of
values:

[a, b]

whereas the second version (for loops) creates a list of pairs of
indexes:

[i, j]

The second version has subscripting ("data" and "data[j]"), which
will slow it down.


You are absolutely right. I had changed the code from the
equivalent:

graph2 = []
for i in range (0, len(data)):
for j in range(0,len(data)):
if d(data,data[j]) == 1 and i < j:
graph2.append ([data,data[j]])

But then also tried the equivalent

for a in data:
for b in data:
if d(a,b) == 1 and a < b:
graph2.append([a,b])

Which does away with the indexing, and is just about exactly as
fast as the list comprehension.


That'll teach me; I almost didn't ask the question thinking it might
be silly. And it was, but I thought it for the wrong reason. I tell my
students there are no stupid questions, I should listen to myself
more when I do. Thanks!
 
T

ting

if d(a,b) == 1 and a < b:

It will probably be faster if you reverse the evaluation order of that
expression.

if a<b and d(a,b)==1:

That way the d() function is called less than half the time. Of course
this assumes that a<b is a faster evaluation than d(a,b), but I think
that's true for your example.
 
B

Bart Kastermans

It will probably be faster if you reverse the evaluation order of that
expression.

if a<b and d(a,b)==1:

That way the d() function is called less than half the time. Of course
this assumes that a<b is a faster evaluation than d(a,b), but I think
that's true for your example.

Indeed makes quite a difference, goes from 275 seconds down to
153 seconds.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top