os.walk and recursive deletion

M

Martin Marcher

Hello,

I'm playing around with os.walk and I made up del_tree(path) which I
think is correct (in terms of the algorithm, but not as python wants
it :)).

As soon as some directory is deleted the iterator of os.walk chokes.
OK that is somehow clear to me as it can't be valid anymore since it
can't go to the just deleted directory but it is in the iterator.

I generated the test data with (bash, don't know of other shell
expansion features)

# root="test"
# mkdir $root
# mkdir -p
# $root/{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}/{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}
# touch $root/{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}/{0,1,2,3,4,5,6,7,8,9}
# touch $root/{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}/{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}/{0,1,2,3,4,5,6,7,8,9}

Here goes the script, as far as I tested it it runs fine in test mode.
Now that is just use the print statements like attached below. as soon
as os.rmdir and os.unlink are involved the iterator chokes. How do I
get around that, or what should I use to be able to recursively delete
everything under a certain subdir?

#--snip
import sys
import os
import unittest

def del_tree(path):
walker = os.walk(path)
for entry in walker:
cwd = entry[0]
subdirs = entry[1]
files = entry[2]
print "Checking directory: %s" %(cwd, )
if files:
for file in files:
file = os.path.join(cwd, file)
print "The file is now: %s" % (file, )
#os.unlink(file)
print "OK directory %s has no more files, checking for
subdirs..." % (cwd, )
else:
print "OK directory %s NEVER HAD FILES, checking for
subdirs..." % (cwd, )
if not subdirs:
print "We can delete: %s" % (cwd, )
#os.rmdir(cwd)
else:
for subdir in subdirs:
subdir = os.path.join(cwd, subdir)
print "We need to recurse into: %s" % (subdir, )
del_tree(subdir)
#os.rmdir(path)
print "Removing: %s" % (path, )
#--snap
 
P

Peter Otten

Martin said:
Hello,

I'm playing around with os.walk and I made up del_tree(path) which I
think is correct (in terms of the algorithm, but not as python wants
it :)).

As soon as some directory is deleted the iterator of os.walk chokes.
OK that is somehow clear to me as it can't be valid anymore since it
can't go to the just deleted directory but it is in the iterator.

I generated the test data with (bash, don't know of other shell
expansion features)

# root="test"
# mkdir $root
# mkdir -p
# $root/{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}/{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}
# touch $root/{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}/{0,1,2,3,4,5,6,7,8,9}
# touch $root/{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}/{a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z}/{0,1,2,3,4,5,6,7,8,9}

Here goes the script, as far as I tested it it runs fine in test mode.
Now that is just use the print statements like attached below. as soon
as os.rmdir and os.unlink are involved the iterator chokes. How do I
get around that, or what should I use to be able to recursively delete
everything under a certain subdir?

#--snip
import sys
import os
import unittest

def del_tree(path):
walker = os.walk(path)
for entry in walker:
cwd = entry[0]
subdirs = entry[1]
files = entry[2]
print "Checking directory: %s" %(cwd, )
if files:
for file in files:
file = os.path.join(cwd, file)
print "The file is now: %s" % (file, )
#os.unlink(file)
print "OK directory %s has no more files, checking for
subdirs..." % (cwd, )
else:
print "OK directory %s NEVER HAD FILES, checking for
subdirs..." % (cwd, )
if not subdirs:
print "We can delete: %s" % (cwd, )
#os.rmdir(cwd)
else:
for subdir in subdirs:
subdir = os.path.join(cwd, subdir)
print "We need to recurse into: %s" % (subdir, )
del_tree(subdir)
#os.rmdir(path)
print "Removing: %s" % (path, )
#--snap

While it is possible to mix recursion and os.walk()...

def del_tree(path):
for cwd, subdirs, files in os.walk(path):
print "Checking directory: %s" %(cwd, )
if files:
for file in files:
file = os.path.join(cwd, file)
print "The file is now: %s" % (file, )
os.unlink(file)
print "OK directory %s has no more files, checking for subdirs..." % (cwd, )
else:
print "OK directory %s NEVER HAD FILES, checking for subdirs..." % (cwd, )
for subdir in subdirs:
subdir = os.path.join(cwd, subdir)
print "We need to recurse into: %s" % (subdir, )
del_tree(subdir)
break # ugly but necessary
print "Removing: %s" % (path, )
os.rmdir(path)

....the idea behind it is that you use it instead of recursion:

def del_tree(root):
for path, dirs, files in os.walk(root, False):
for fn in files:
os.unlink(os.path.join(path, fn))
for dn in dirs:
os.rmdir(os.path.join(path, dn))
os.rmdir(root)

The second parameter to os.walk() ensures that the tree is walked
depth-first, i. e. the contents of a directory are seen before the
directory itself. Personally, I would prefer a recursive implementation
based on os.listdir(),

def del_tree(root):
for name in os.listdir(root):
path = os.path.join(root, name)
if os.path.isdir(path):
del_tree(path)
else:
os.unlink(path)
os.rmdir(root)

an approach that is also taken by shutils.rmtree().

Peter
 
M

Marc 'BlackJack' Rintsch

Hello,

I'm playing around with os.walk and I made up del_tree(path) which I
think is correct (in terms of the algorithm, but not as python wants
it :)).

It's not correct in terms of the algorithm if you take the algorithm of
`os.walk()` into the equation.
Here goes the script, as far as I tested it it runs fine in test mode.
Now that is just use the print statements like attached below. as soon
as os.rmdir and os.unlink are involved the iterator chokes. How do I
get around that, or what should I use to be able to recursively delete
everything under a certain subdir?
#--snip
import sys
import os
import unittest

def del_tree(path):
walker = os.walk(path)

`os.walk()` is itself diving recursivly into the subdirectories…
for entry in walker:
cwd = entry[0]
subdirs = entry[1]
files = entry[2]
print "Checking directory: %s" %(cwd, )
if files:
for file in files:
file = os.path.join(cwd, file)
print "The file is now: %s" % (file, )
#os.unlink(file)
print "OK directory %s has no more files, checking for
subdirs..." % (cwd, )
else:
print "OK directory %s NEVER HAD FILES, checking for
subdirs..." % (cwd, )
if not subdirs:
print "We can delete: %s" % (cwd, )
#os.rmdir(cwd)
else:
for subdir in subdirs:
subdir = os.path.join(cwd, subdir)
print "We need to recurse into: %s" % (subdir, )
del_tree(subdir)

…and here you are calling the your function recursively which then calls
again `os.walk()` on that subdirectory. That's a little bit too much.

Just use `os.listdir()` (and `os.path.isdir()`) in your recursive function.
#os.rmdir(path)
print "Removing: %s" % (path, )
#--snap

Or `shutil.rmtree()`. :)

Ciao,
Marc 'BlackJack' Rintsch
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top