os.walk/list

ecu_jon

I am trying to add an MD5 checksum calculation to my file copy code, to make
sure the source and destination are the same file.
I implemented it fine in the single-file copy part, something like:
for files in sourcepath:
    f1=file(files ,'rb')
    try:
        shutil.copy2(files,
                     os.path.join(destpath,os.path.basename(files)))
    except:
        print "error file"
    f2=file(os.path.join(destpath,os.path.basename(files)), 'rb')
    truth = md5.new(f1.read()).digest() == \
            md5.new(f2.read()).digest()
    if truth == 0:
        print "file copy error"

This worked swimmingly. I moved on to my backupall function, something
like:
for (path, dirs, files) in os.walk(source):
    #os.walk drills down thru all the folders of source
    for fname in dirs:
        currentdir = destination+leftover
        try:
            os.mkdir(os.path.join(currentdir,fname),0755)
        except:
            print "error folder"
    for fname in files:
        leftover = path.replace(source, '')
        currentdir = destination+leftover
        f1=file(files ,'rb')
        try:
            shutil.copy2(os.path.join(path,fname),
                         os.path.join(currentdir,fname))
            f2 = file(os.path.join(currentdir,fname,files))
        except:
            print "error file"
        truth = md5.new(f1.read()).digest() == \
                md5.new(f2.read()).digest()
        if truth == 0:
            print "file copy error"

But here, "fname" is a list, not a single file. I didn't really want to
spend a lot of time on the MD5 part; I thought it would be an easy add-on.
I don't really want to write the file names out to a list and
parse through them one at a time doing the calculation, but it sounds like I
will have to do something like that.
 
Peter Otten

If you have something working for one file, don't copy the code into the
os.walk() for-loop; put it into a function instead, say:

def safe_copy(sourcefile, destfolder):
    # your code

Then call that thoroughly tested function from within the os.walk() loop:

for path, folders, files in os.walk(sourceroot):
    destfolder = ... # os.path.relpath() might help here
    # ... (make subdirectories)
    for name in files:
        sourcefile = os.path.join(path, name)
        safe_copy(sourcefile, destfolder)

If you find a bug in safe_copy(), you'll only have to fix it in one place.
Also, you can test it with a single file, which should be easier and faster
than processing a whole directory tree.
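
As a rough illustration of how the loop above might be filled in, here is a minimal sketch. It assumes Python 3 (the thread's code is Python 2), and the names backup_tree, destroot and copy_one are hypothetical; copy_one is only a stand-in for whatever safe_copy() ends up doing.

```python
import os
import shutil


def copy_one(sourcefile, destfolder):
    # Hypothetical stand-in for safe_copy(): copy one file into destfolder.
    destfile = os.path.join(destfolder, os.path.basename(sourcefile))
    shutil.copy2(sourcefile, destfile)
    return destfile


def backup_tree(sourceroot, destroot, copy_func=copy_one):
    # Walk sourceroot and mirror its folder layout under destroot.
    copied = []
    for path, folders, files in os.walk(sourceroot):
        # Location of the current folder relative to the source root
        # ("." for the root itself).
        relative = os.path.relpath(path, sourceroot)
        destfolder = os.path.normpath(os.path.join(destroot, relative))
        # Create the matching destination folder if it is missing.
        os.makedirs(destfolder, exist_ok=True)
        for name in files:
            # name is a single file name here; files is the list.
            sourcefile = os.path.join(path, name)
            copied.append(copy_func(sourcefile, destfolder))
    return copied
```

Because the per-file work is passed in as copy_func, the walking logic and the copying logic can each be tested separately.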

Generally speaking, breaking code into small functions that can be tested
individually is a powerful technique. And you don't have to stop here; you
can break safe_copy() down further:

def safe_copy(sourcefile, destfolder):
    destfile = ...
    copyfile(sourcefile, destfile)
    if not equal_content(sourcefile, destfile):
        # print a warning or raise an exception

Sometimes you'll even find that the smaller, more specialized routines
already exist in the standard library.
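
For instance, here is one possible sketch of safe_copy() built only from standard library routines, assuming Python 3. Using shutil.copy2() for the copy and filecmp.cmp() for the content check is my choice for illustration, not something prescribed in this thread, and raising IOError is just one way to report a mismatch.

```python
import filecmp
import os
import shutil


def safe_copy(sourcefile, destfolder):
    # Build the destination path from the folder and the bare file name.
    destfile = os.path.join(destfolder, os.path.basename(sourcefile))
    shutil.copy2(sourcefile, destfile)
    # shallow=False makes filecmp compare the file contents instead of
    # trusting matching os.stat() data (size, modification time).
    if not filecmp.cmp(sourcefile, destfile, shallow=False):
        raise IOError("file copy error: %s" % sourcefile)
    return destfile
```

This compares the bytes directly rather than via a checksum, so no MD5 code is needed just to confirm that the copy matches the original.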
 
ecu_jon

Yes, I agree that breaking stuff into smaller chunks is a good way to do it.
Even were I to do something like

def safe_copy():
    f1=file(files ,'rb')
    f2 = file(os.path.join(currentdir,fname,files))
    truth = md5.new(f1.read()).digest() == \
            md5.new(f2.read()).digest()
    if truth == 0:
        print "file copy error"

That would probably work for the single-file copy functions, but it would
still break down during the for ... os.walk() loop, again because "fname" is
a list there, and the crypto functions work on one file at a time. Even
changing crypto functions wouldn't change that.
 
Peter Otten

That's what function parameters are for:

def safe_copy(fname):
    ....

for x in fname:
    safe_copy(x)

That way fname is a single filename inside safe_copy() and a list of filenames
at the global module level. "fname" for a list and "files" for a single file
are still badly chosen names because they confuse the reader instead of
clarifying what's going on.
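
To make the point about parameters and naming concrete, here is a small sketch with clearer names, assuming Python 3, where hashlib.md5() takes the place of the old md5 module used earlier in the thread. The names file_md5, md5_for_all and chunk_size are hypothetical.

```python
import hashlib


def file_md5(filename, chunk_size=65536):
    # filename is ONE path; the checksum is computed for that file only.
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_for_all(filenames):
    # filenames is a LIST of paths; the loop hands them over one at a time.
    return {name: file_md5(name) for name in filenames}
```

The checksum function never sees the list; the caller decides how many files there are and calls it once per file.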

I strongly recommend that you work through a Python tutorial for
non-programmers before you continue with your efforts.
 
