how can this iterator be optimized?

B

Basilisk96

Hello all,

I have the following function that uses an intermediate iterator
"rawPairs":

def MakePairs(path):
import os
import operator
join = os.path.join
rawPairs = (
(join(path, s), func(s))
for s in os.listdir(path)
if func(s) is not None and func(s).endswith("some criterion")
)
#Use the second item in the pair as the sort criterion
result = sorted(rawPairs, key=operator.itemgetter(1))
return result

where "func" is a single-argument function that returns either a
string or None, but is an expensive call.
I am pretty sure that the sorted() construct cannot be improved much
further, but...
....does anyone have ideas on improving the "rawPairs" iterator so that
it calls "func(s)" only once per iteration? Perhaps a lambda
construct, but I am not sure how to go about it...?

Cheers,
Basilisk96
 
G

Gabriel Genellina

def MakePairs(path):
import os
import operator
join = os.path.join
rawPairs = (
(join(path, s), func(s))
for s in os.listdir(path)
if func(s) is not None and func(s).endswith("some criterion")
)
#Use the second item in the pair as the sort criterion
result = sorted(rawPairs, key=operator.itemgetter(1))
return result

where "func" is a single-argument function that returns either a
string or None, but is an expensive call.
I am pretty sure that the sorted() construct cannot be improved much
further, but...
...does anyone have ideas on improving the "rawPairs" iterator so that
it calls "func(s)" only once per iteration? Perhaps a lambda
construct, but I am not sure how to go about it...?

Don't use a generator then. If you're going to finally return a list (and
sorted does exactly that), build a list right from the start:

def MakePairs(path):
join = os.path.join
result = []
append = result.append
for s in os.listdir(path):
key = func(s)
if key is not None and key.endswith("some criterion"):
append((join(path, s), key))
#Use the second item in the pair as the sort criterion
result.sort(key=operator.itemgetter(1))
return result
 
B

Basilisk96

Don't use a generator then. If you're going to finally return a list (and  
sorted does exactly that), build a list right from the start:

Good point. However, for the sake of argument, what if I removed the
sorting requirement and simply wanted the generator? I usually strive
for comprehensions if a for loop can be reduced to such. Would there
be any speed advantage in this case of a comprehension-style generator
vs. a for-yield loop (assuming there is a way to call func(s) once per
iteration in the comprehension)? In my example, func() operates on
filename strings, so it's not too bad.. but it's possible for this
pattern to apply to more substantial operations. My conjecture is
that higher func() loads would favor more the use of a simple for-
yield loop.

Cheers,
Basilisk96
 
T

Terry Reedy

Basilisk96 said:
Good point. However, for the sake of argument, what if I removed the
sorting requirement and simply wanted the generator?

Write a generator function with the same loop and a yield.

? I usually strive
for comprehensions if a for loop can be reduced to such.

You are overdoing it, I think. Generator expressions are for saving
programmer time, not machine time.
 
S

Steven D'Aprano

Good point. However, for the sake of argument, what if I removed the
sorting requirement and simply wanted the generator?

The same applies. You're problem is that you call an expensive function
three times instead of once. Some compilers can optimize that away, but
not in Python, because there's no guarantee that the function will be the
same function between calls.

I usually strive
for comprehensions if a for loop can be reduced to such.

Any particular reason?

Would there be
any speed advantage in this case of a comprehension-style generator vs.
a for-yield loop (assuming there is a way to call func(s) once per
iteration in the comprehension)?

I understand that comprehensions are slightly slower. I recently found a
40% speed up on a trivial generator expression when converted to a for
loop. Admittedly the gen expr did very little, and I haven't tested list
comprehensions.


In my example, func() operates on
filename strings, so it's not too bad.. but it's possible for this
pattern to apply to more substantial operations. My conjecture is that
higher func() loads would favor more the use of a simple for- yield
loop.

If there's only one call to func(), and you ignore the (probably) fixed
cost of jumping into a generator each time, then it shouldn't make any
difference.

If you are comparing one call to func() in a for loop versus three calls
to func() in a list comp or generator expression, then of course the for
loop will be more efficient.
 
J

josh logan

Hello all,

I have the following function that uses an intermediate iterator
"rawPairs":

def MakePairs(path):
    import os
    import operator
    join = os.path.join
    rawPairs = (
        (join(path, s), func(s))
        for s in os.listdir(path)
        if func(s) is not None and func(s).endswith("some criterion")
    )
    #Use the second item in the pair as the sort criterion
    result = sorted(rawPairs, key=operator.itemgetter(1))
    return result

where "func" is a single-argument function that returns either a
string or None, but is an expensive call.
I am pretty sure that the sorted() construct cannot be improved much
further, but...
...does anyone have ideas on improving the "rawPairs" iterator so that
it calls "func(s)" only once per iteration?  Perhaps a lambda
construct, but I am not sure how to go about it...?

Cheers,
Basilisk96

Hi,
Try something like this:

import os
from os.path import join
from itertools import ifilter #assuming 2.5 and earlier, for 3.0 just
use the filter builtin
from operator import itemgetter

def isvalid(pair):
return (s[1] is not None) and s[1].endswith('some criteria')

def makepairs(path):
pair_iter = ((join(path,s), func(s)) for s in os.listdir(path))
pair_iter = ifilter(isvalid, pair_iter)
result = sorted(pair_iter, key=itemgetter(1))

return result
 
B

Basilisk96

Any particular reason?

Only two.
1.) I was impressed by their clarity and conciseness when I first
discovered them.
2.) I also read now and then that simple list comprehensions are
faster when compared with their for-loop equivalents because of the
way comprehensions are implemented under the hood. My example is a far
cry from a "simple" comprehension, however. :)

If there's only one call to func(), and you ignore the (probably) fixed
cost of jumping into a generator each time, then it shouldn't make any
difference.

If you are comparing one call to func() in a for loop versus three calls
to func() in a list comp or generator expression, then of course the for
loop will be more efficient.

I agree. I would rather call func() only once per iteration in any
case. I will revise it to a plain for loop with a single call.

Thanks,
-Basilisk96
 
J

josh logan

Only two.
1.) I was impressed by their clarity and conciseness when I first
discovered them.
2.) I also read now and then that simple list comprehensions are
faster when compared with their for-loop equivalents because of the
way comprehensions are implemented under the hood. My example is a far
cry from a "simple" comprehension, however. :)



I agree. I would rather call func() only once per iteration in any
case. I will revise it to a plain for loop with a single call.

Thanks,
-Basilisk96

Just as long as you do realize that it is possible to do what you were
looking with one call to func() using chained generators, as
demonstrated in my post above. Whichever way is clearest for you would
be the way I'd go.
 
P

Paul McGuire

...
where "func" is a single-argument function that returns either a
string or None, but is an expensive call.
I am pretty sure that the sorted() construct cannot be improved much
further, but...
...does anyone have ideas on improving the "rawPairs" iterator so that
it calls "func(s)" only once per iteration?  Perhaps a lambda
construct, but I am not sure how to go about it...?

Cheers,
Basilisk96

If func is expensive, you could try memoizing it. Then subsequent
"calls" just do arg lookups. Michele Simianato has posted a good
memoizing decorator on the Python wiki.

-- Paul
 
B

Basilisk96

If func is expensive, you could try memoizing it.  Then subsequent
"calls" just do arg lookups.  Michele Simianato has posted a good
memoizing decorator on the Python wiki.

-- Paul

That's the trick! I almost forgot about that one. Thanks!
-Basilisk96
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,021
Latest member
AkilahJaim

Latest Threads

Top