postprocessing in os.walk

kj

Perl's directory tree traversal facility is provided by the function
find of the File::Find module. This function accepts an optional
callback, called postprocess, that gets invoked "just before leaving
the currently processed directory." The documentation goes on to
say "This hook is handy for summarizing a directory, such as
calculating its disk usage", which is exactly what I use it for in
a maintenance script.

This maintenance script is getting long in the tooth, and I've been
meaning to add a few enhancements to it for a while, so I thought
that in the process I'd port it to Python, using the os.walk
function, but I see that os.walk does not have anything like this
File::Find::find's postprocess hook. Is there a good way to simulate
it (without having to roll my own File::Find::find in Python)?

TIA!

kynn
 
jordilin

Well, you could use the alternative os.path.walk instead (Python 2
only; it was removed in Python 3). You can pass a callback as a
parameter, which will be invoked every time you bump into a new
directory. The signature is os.path.walk(path, visit, arg). Take a
look at the Python library documentation.
 
Dave Angel

kj said:
Perl's directory tree traversal facility is provided by the function
find of the File::Find module. This function accepts an optional
callback, called postprocess, that gets invoked "just before leaving
the currently processed directory." The documentation goes on to
say "This hook is handy for summarizing a directory, such as
calculating its disk usage", which is exactly what I use it for in
a maintenance script.

This maintenance script is getting long in the tooth, and I've been
meaning to add a few enhancements to it for a while, so I thought
that in the process I'd port it to Python, using the os.walk
function, but I see that os.walk does not have anything like this
File::Find::find's postprocess hook. Is there a good way to simulate
it (without having to roll my own File::Find::find in Python)?

TIA!

kynn

Why would you need a special hook when the os.walk() generator yields
exactly once per directory? So whatever work you do on the list of
files you get, you can then put the summary logic immediately after.
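
A minimal sketch of that first suggestion, assuming the summary only needs the files sitting directly in each directory (not its subtrees). The name direct_sizes is made up for illustration:

```python
import os

def direct_sizes(root):
    """Map each directory under root to the total size of the files directly in it."""
    totals = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Any per-file work would go here; the summary follows right after it.
        totals[dirpath] = sum(
            os.lstat(os.path.join(dirpath, name)).st_size  # lstat: do not follow symlinks
            for name in filenames
        )
    return totals
```

This gives one number per directory, but it does not roll subdirectory totals up into their parents.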

Or if you really feel you need a special hook, then write a wrapper for
os.walk(), which takes a hook function as a parameter, and after
yielding each file in a directory, calls the hook. Looks like about 5
lines.

DaveA
 
kj

Dave Angel said:
Why would you need a special hook when the os.walk() generator yields
exactly once per directory? So whatever work you do on the list of
files you get, you can then put the summary logic immediately after.
Or if you really feel you need a special hook, then write a wrapper for
os.walk(), which takes a hook function as a parameter, and after
yielding each file in a directory, calls the hook. Looks like about 5
lines.

I think you're missing the point. The hook in question has to be
called *immediately after* all the subtrees that are rooted in
subdirectories contained in the current directory have been visited
by os.walk.

I'd love to see your "5 lines" for *that*.

kj
 
Peter Otten

kj said:
I think you're missing the point. The hook in question has to be
called *immediately after* all the subtrees that are rooted in
subdirectories contained in the current directory have been visited
by os.walk.

I'd love to see your "5 lines" for *that*.

import os

def find(root, process):
    # topdown=False yields a directory only after all of its subdirectories
    for pdf in os.walk(root, topdown=False):
        process(*pdf)

def process(path, dirs, files):
    print(path)

find(".", process)

Peter
 
Paul Rubin

kj said:
I think you're missing the point. The hook in question has to be
called *immediately after* all the subtrees that are rooted in
subdirectories contained in the current directory have been visited
by os.walk.

I'd love to see your "5 lines" for *that*.

I'm having trouble understanding the specification. To find the disk
usage (in bytes) of a directory:

import os, stat

def find_disk_usage(dirname):
    return sum(sum(os.stat(dirpath+'/'+filename)[stat.ST_SIZE]
                   for filename in fn_list)
               for dirpath, dirlist, fn_list in os.walk(dirname))
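
The function above returns a single total for the whole tree. For a per-directory summary that includes everything below each directory, which is what the postprocess hook in the original question is used for, here is a rough sketch. The names cumulative_usage and usage are hypothetical, and it assumes symlinks are neither followed nor counted beyond the size of the link itself:

```python
import os

def cumulative_usage(root):
    """Return {directory: bytes used by the files in it and in everything below it}."""
    usage = {}
    # topdown=False: every subdirectory is yielded before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = sum(os.lstat(os.path.join(dirpath, name)).st_size
                    for name in filenames)
        for name in dirnames:
            # Children were already summarized. Symlinked or unreadable
            # directories are never walked, so they default to 0.
            total += usage.get(os.path.join(dirpath, name), 0)
        usage[dirpath] = total  # the "postprocess" moment for dirpath
    return usage
```

Because the loop body for a directory runs only after all of its subtrees have been handled, the last statement in the loop plays the role of the hook.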
 
Dave Angel

Peter said:
import os

def find(root, process):
    # topdown=False yields a directory only after all of its subdirectories
    for pdf in os.walk(root, topdown=False):
        process(*pdf)

def process(path, dirs, files):
    print(path)

find(".", process)

Peter

Thanks Peter,

To expand it to five lines, and make it the generator I mentioned:

import os

def find(root, process):
    # pdf is the (path, dirs, files) tuple that os.walk yields
    for pdf in os.walk(root, topdown=False):
        for name in pdf[2]:
            yield os.path.join(pdf[0], name)
        # every file in this directory has been yielded; call the hook
        process(*pdf)

def process(path, dirs, files):
    print("hooked --", path)

for fullpath in find("..", process):
    print(fullpath)


This is a generator which yields each file in a directory tree and,
after all the files in and below a particular directory have been
yielded, "immediately" calls the hook for that directory.

DaveA
 
Ethan Furman

[snippetty snip]
I think you're missing the point. The hook in question has to be
called *immediately after* all the subtrees that are rooted in
subdirectories contained in the current directory have been visited
by os.walk.

I'd love to see your "5 lines" for *that*.

kj

So now that you've seen a couple examples, perhaps you noticed the flag
"topdown=False"? With that (un)set, I repeat the question -- why do you
need a hook?
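
To see what the flag changes, here is a small sketch that builds a throwaway tree and records the order in which os.walk reports directories. The helper visit_order and the directory names a and b are made up for the demonstration:

```python
import os
import tempfile

def visit_order(root, topdown):
    """List directories, relative to root, in the order os.walk yields them."""
    return [os.path.relpath(dirpath, root)
            for dirpath, _dirs, _files in os.walk(root, topdown=topdown)]

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    pre = visit_order(root, topdown=True)    # parents before children
    post = visit_order(root, topdown=False)  # children before parents
    print("topdown=True: ", pre)
    print("topdown=False:", post)
```

With topdown=False the root comes last, so code in the loop body for a directory runs after every subtree under it has been visited.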

~Ethan~
 
