Turning f(callback) into a generator


Peter Otten

It's easy to write a function that wraps a generator and provides a
callback interface, e.g.:

import os, sys

def walk(path, visit):
    """ Emulate os.path.walk() (simplified) using os.walk()"""
    for dir, folders, files in os.walk(path):
        visit(dir, folders + files)


if __name__ == "__main__":
    walk(".", lambda d, f: sys.stdout.write(d + "\n"))


However, I did not succeed in turning the old os.path.walk(), i.e. a
function taking a callback, into a generator. Is there a general way to do
it without having to store all intermediate results first?


Peter

PS. No, I don't have a use case. Threads welcome if all else fails :)
 

Diez B. Roggisch

Hi,
> However, I did not succeed in turning the old os.path.walk(), i.e. a
> function taking a callback, into a generator. Is there a general way to do
> it without having to store all intermediate results first?

This works for me:

import os.path

def path_gen(start):
    res = []
    def cb(r, dir, names):
        for n in names:
            r.append(n)

    os.path.walk(start, cb, res)
    for n in res:
        yield n


g = path_gen("/etc")

for n in g:
    print n


Diez
 

Diez B. Roggisch

> def path_gen(start):
>     res = []
>     def cb(r, dir, names):
>         for n in names:
>             r.append(n)
>
>     os.path.walk(start, cb, res)
>     for n in res:
>         yield n
>
>
> g = path_gen("/etc")
>
> for n in g:
>     print n

Just found out that lists support extend(), which allows the ugly loop for
appending names in cb to be written this way:

        r.extend(names)
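
For illustration, here are the two forms of cb side by side. This is only a small sketch: the names cb_append and cb_extend and the sample file names are made up, and the point is just that both leave the same list behind.

```python
def cb_append(r, dirname, names):
    # original form: one append() call per name
    for n in names:
        r.append(n)

def cb_extend(r, dirname, names):
    # equivalent form: extend() adds all names in a single call
    r.extend(names)

a, b = [], []
cb_append(a, "/etc", ["hosts", "passwd"])
cb_extend(b, "/etc", ["hosts", "passwd"])
assert a == b == ["hosts", "passwd"]
```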

Diez
 

Peter Otten

Diez said:
>> However, I did not succeed in turning the old os.path.walk(), i.e. a
>> function taking a callback, into a generator. Is there a general way to
>> do it without having to store all intermediate results first?
>
> This works for me:
>
> import os.path
>
> def path_gen(start):
>     res = []
>     def cb(r, dir, names):
>         for n in names:
>             r.append(n)
>
>     os.path.walk(start, cb, res)

At this point, you have stored all results in res, and thus did not play by
the rules :)

>     for n in res:
>         yield n
>
>
> g = path_gen("/etc")
>
> for n in g:
>     print n

Both os.path.walk() and os.walk() basically need memory for the names in
*one* directory; when you take os.walk() to model os.path.walk() that
doesn't change, but the other way round - bang, memory usage explosion.

This is a strange asymmetry, and I was wondering if I've overlooked a simple
way to transform a callback into a generator, but I now tend to the
assumption that you *need* threads: One thread with the callback puts names
or name lists into the queue until it's full (some arbitrary limit that
also determines the total memory needed), another thread (the generator)
consumes names from the queue.
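
The two-thread arrangement described above can be sketched roughly as follows. This is an illustrative sketch, not code from the thread: it is written for Python 3, where os.path.walk() no longer exists, so the callback-style walk() is the os.walk()-based emulation from the first post. The names callback_to_generator, iter_walk and maxsize are made up for the example.

```python
import os
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def callback_to_generator(func, maxsize=10):
    """Yield every value that func passes to its callback.

    func is any callable that accepts a one-argument callback
    (a hypothetical interface chosen for this sketch).
    """
    q = queue.Queue(maxsize)  # bounded: put() blocks while the queue is full

    def producer():
        try:
            func(q.put)       # each callback invocation enqueues one item
        finally:
            q.put(_DONE)      # always signal completion, even on error

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = q.get()        # blocks until the producer delivers something
        if item is _DONE:
            return
        yield item


def walk(path, visit):
    """Callback-style walker, emulated with os.walk() as in the first post."""
    for dirpath, folders, files in os.walk(path):
        visit(dirpath, folders + files)


def iter_walk(path):
    """Generator view of walk(): yields (dirpath, names) tuples."""
    return callback_to_generator(
        lambda cb: walk(path, lambda d, names: cb((d, names))))
```

With this, `for d, names in iter_walk("."):` can stop at the first match without the whole tree having been scanned, and at most maxsize items are buffered. Two caveats of the sketch: if the consumer abandons the generator early, the producer thread stays blocked in put() (it is a daemon thread, so it does not keep the process alive), and an exception raised inside func is not forwarded to the consumer.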

Peter
 

Bengt Richter

> It's easy to write a function that wraps a generator and provides a
> callback. E. g.:
>
> import os, sys
>
> def walk(path, visit):
>     """ Emulate os.path.walk() (simplified) using os.walk()"""
>     for dir, folders, files in os.walk(path):
>         visit(dir, folders + files)
>
>
> if __name__ == "__main__":
>     walk(".", lambda d, f: sys.stdout.write(d + "\n"))
>
>
> However, I did not succeed in turning the old os.path.walk(), i.e. a
> function taking a callback, into a generator. Is there a general way to do
> it without having to store all intermediate results first?
>
>
> Peter
>
> PS. No, I don't have a use case. Threads welcome if all else fails :)

I suspect that's what is necessary currently, until we get a yield that can
suspend a whole stack of frames at a yield inside nested calls to functions.
Then it would just be a matter of putting a yield in a callback routine and
starting the walk from the base generator-making function/method/whatever.

Maybe deep generators could be created via an __iter__ method of the function type
as an alternative/extension to has-yield-in-function-code-body magic.
Exit from the generator wouldn't happen until you exited the base frame.
Yield in a nested function call would just suspend right there.
Easier said than done, of course ;-)

Regards,
Bengt Richter
 

Diez B. Roggisch

Hi,
> This is a strange asymmetry, and I was wondering if I've overlooked a
> simple way to transform a callback into a generator, but I now tend to the
> assumption that you *need* threads: One thread with the callback puts
> names or name lists into the queue until it's full (some arbitrary limit
> that also determines the total memory needed), another thread (the
> generator) consumes names from the queue.

I also thought about that. It's the only thing that allows for real laziness:
you can wait in the callback until the generator clears a semaphore. But
whether the context switches are worth the effort is questionable.
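
The semaphore handshake mentioned above might look like the following minimal sketch (Python 3; lazy_callback_to_generator and the other names are hypothetical, and func is assumed to take a one-argument callback). The callback blocks until the consumer asks for the next item, so the callback-driven function never gets more than one callback ahead of the consumer.

```python
import threading


def lazy_callback_to_generator(func):
    """Yield the values func passes to its callback, strictly on demand."""
    slot = []                       # holds at most one pending item
    want = threading.Semaphore(0)   # consumer releases it to request an item
    have = threading.Semaphore(0)   # producer releases it when an item is ready
    done = object()                 # sentinel marking the end of the stream

    def callback(item):
        want.acquire()              # wait here until the consumer asks
        slot.append(item)
        have.release()

    def producer():
        try:
            func(callback)
        finally:
            callback(done)          # hand over the end marker the same way

    threading.Thread(target=producer, daemon=True).start()

    while True:
        want.release()              # request the next item
        have.acquire()              # wait until it has been delivered
        item = slot.pop()
        if item is done:
            return
        yield item
```

Every item costs two thread switches here, which is the overhead in question; a bounded queue trades some of that laziness for fewer switches.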

Diez
 

Peter Otten

Diez said:
> I also thought about that. It's the only thing that allows for real
> laziness: you can wait in the callback until the generator clears a
> semaphore. But whether the context switches are worth the effort is
> questionable.

There is another aspect I forgot to mention. Your approach would only start
to yield results after all results are found, i.e. in the example you
would have to scan the whole hard disk even when the first visited file
might have been the one you want.
So I think it is best to either not change the callback approach or use
threads as outlined by Jimmy Retzlaff.

Peter
 
