os.walk/list

Discussion in 'Python' started by ecu_jon, Mar 20, 2011.

  1. ecu_jon

    ecu_jon Guest

    so i am trying to add md5 checksum calc to my file copy stuff, to make
    sure the source and dest. are same file.
    i implemented it fine with the single file copy part. something like :
    for files in sourcepath:
        f1=file(files ,'rb')
        try:
            shutil.copy2(files,
                         os.path.join(destpath,os.path.basename(files)))
        except:
            print "error file"
        f2=file(os.path.join(destpath,os.path.basename(files)), 'rb')
        truth = (md5.new(f1.read()).digest() ==
                 md5.new(f2.read()).digest())
        if truth == 0:
            print "file copy error"

    this worked swimmingly. i moved on to my backupall function, something
    like
    for (path, dirs, files) in os.walk(source):
        #os.walk drills down thru all the folders of source
        for fname in dirs:
            currentdir = destination+leftover
            try:
                os.mkdir(os.path.join(currentdir,fname),0755)
            except:
                print "error folder"
        for fname in files:
            leftover = path.replace(source, '')
            currentdir = destination+leftover
            f1=file(files ,'rb')
            try:
                shutil.copy2(os.path.join(path,fname),
                             os.path.join(currentdir,fname))
                f2 = file(os.path.join(currentdir,fname,files))
            except:
                print "error file"
            truth = (md5.new(f1.read()).digest() ==
                     md5.new(f2.read()).digest())
            if truth == 0:
                print "file copy error"

    but here, "fname" is a list, not a single file. i didn't really want to
    spend a lot of time on the md5 part. thought it would be an easy add-on.
    i don't really want to write the file names out to a list and
    parse through them one at a time doing the calc, but it sounds like i
    will have to do something like that.
    ecu_jon, Mar 20, 2011
    #1

  2. Peter Otten

    Peter Otten Guest

    ecu_jon wrote:

    > so i am trying to add md5 checksum calc to my file copy stuff, to make
    > sure the source and dest. are same file.
    > i implemented it fine with the single file copy part. something like :
    > for files in sourcepath:
    >     f1=file(files ,'rb')
    >     try:
    >         shutil.copy2(files,
    >                      os.path.join(destpath,os.path.basename(files)))
    >     except:
    >         print "error file"
    >     f2=file(os.path.join(destpath,os.path.basename(files)), 'rb')
    >     truth = (md5.new(f1.read()).digest() ==
    >              md5.new(f2.read()).digest())
    >     if truth == 0:
    >         print "file copy error"
    >
    > this worked swimmingly. i moved on to my backupall function, something
    > like
    > for (path, dirs, files) in os.walk(source):
    >     #os.walk drills down thru all the folders of source
    >     for fname in dirs:
    >         currentdir = destination+leftover
    >         try:
    >             os.mkdir(os.path.join(currentdir,fname),0755)
    >         except:
    >             print "error folder"
    >     for fname in files:
    >         leftover = path.replace(source, '')
    >         currentdir = destination+leftover
    >         f1=file(files ,'rb')
    >         try:
    >             shutil.copy2(os.path.join(path,fname),
    >                          os.path.join(currentdir,fname))
    >             f2 = file(os.path.join(currentdir,fname,files))
    >         except:
    >             print "error file"
    >         truth = (md5.new(f1.read()).digest() ==
    >                  md5.new(f2.read()).digest())
    >         if truth == 0:
    >             print "file copy error"
    >
    > but here, "fname" is a list, not a single file. i didn't really want to
    > spend a lot of time on the md5 part. thought it would be an easy add-on.
    > i don't really want to write the file names out to a list and
    > parse through them one at a time doing the calc, but it sounds like i
    > will have to do something like that.


    If you have something working for one file, don't copy the code into the
    os.walk() for-loop; put it into a function, say:

    def safe_copy(sourcefile, destfolder):
        # your code

    Then call that thoroughly tested function from within the os.walk() loop:

    for path, folders, files in os.walk(sourceroot):
        destfolder = ... # os.path.relpath() might help here
        # ... (make subdirectories)
        for name in files:
            sourcefile = os.path.join(path, name)
            safe_copy(sourcefile, destfolder)

    If you find a bug in safe_copy() you'll only have to fix it in one place.
    Also, you can test it with a single file which should be easier and faster
    than processing a whole directory tree.

    Generally speaking, breaking code into small functions that can be tested
    individually is a powerful technique. And you don't have to stop here; you
    can break safe_copy() into

    def safe_copy(sourcefile, destfolder):
        destfile = ...
        copyfile(sourcefile, destfile)
        if not equal_content(sourcefile, destfile):
            # print a warning or raise an exception

    Sometimes you'll even find that the smaller more specialized routines
    already exist in the standard library.
    Peter Otten, Mar 20, 2011
    #2
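The decomposition suggested in the reply above could look roughly like the sketch below. It is written for Python 3, whereas the thread uses Python 2, and it rests on a few assumptions: `backup_tree` is a hypothetical driver name, a failed check raises `IOError` rather than printing, and `equal_content` is implemented with `filecmp.cmp`, which is one example of a specialized routine that already exists in the standard library.

```python
import filecmp
import os
import shutil


def equal_content(sourcefile, destfile):
    # shallow=False forces a byte-by-byte comparison. The default shallow
    # check only compares os.stat() signatures, which shutil.copy2 preserves.
    return filecmp.cmp(sourcefile, destfile, shallow=False)


def safe_copy(sourcefile, destfolder):
    """Copy one file into destfolder and verify the copy."""
    destfile = os.path.join(destfolder, os.path.basename(sourcefile))
    shutil.copy2(sourcefile, destfile)
    if not equal_content(sourcefile, destfile):
        raise IOError("file copy error: %s" % sourcefile)
    return destfile


def backup_tree(sourceroot, destroot):
    """Hypothetical driver: mirror sourceroot under destroot."""
    for path, folders, files in os.walk(sourceroot):
        # Path of the current folder relative to the source root,
        # re-rooted under the destination.
        relative = os.path.relpath(path, sourceroot)
        destfolder = os.path.normpath(os.path.join(destroot, relative))
        if not os.path.isdir(destfolder):
            os.makedirs(destfolder)
        for name in files:
            # name is a single file name; files is the list it came from.
            safe_copy(os.path.join(path, name), destfolder)
```

Because `safe_copy` takes one source file and one destination folder, it can be tried on a single file first and then reused unchanged inside the `os.walk()` loop.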

  3. ecu_jon

    ecu_jon Guest

    yes i agree breaking stuff into smaller chunks is a good way to do it.
    even were i to do something like

    def safe_copy():
        f1=file(files ,'rb')
        f2 = file(os.path.join(currentdir,fname,files))
        truth = (md5.new(f1.read()).digest() ==
                 md5.new(f2.read()).digest())
        if truth == 0:
            print "file copy error"

    that would probably work for the single file copy functions. but would
    still break down during the for ...os.walk(), again because "fname" is
    a list there, and the crypto functions work 1 file at a time. even
    changing crypto functions wouldn't change that.
    ecu_jon, Mar 20, 2011
    #3
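A quick way to see which name holds the list inside an `os.walk()` loop is to check the types directly. The sketch below is written for Python 3 and builds a small throwaway tree with `tempfile`; the tree layout and the `seen` list are illustrative assumptions, not part of the original code.

```python
import os
import tempfile

# Build a tiny throwaway tree so the loop has something to visit.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "sub"))
for relative in ("a.txt", os.path.join("sub", "b.txt")):
    with open(os.path.join(root, relative), "w") as handle:
        handle.write("data")

seen = []
for path, dirs, files in os.walk(root):
    # files is a list of bare file names for this one directory ...
    assert isinstance(files, list)
    for fname in files:
        # ... and fname is one of those names, a plain string.
        assert isinstance(fname, str)
        # Joining with path gives something that can be opened.
        seen.append(os.path.join(path, fname))

print(sorted(seen))
```

Under these assumptions, a per-file routine has to be given `os.path.join(path, fname)`, not `files`.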
  4. Peter Otten

    Peter Otten Guest

    ecu_jon wrote:

    > yes i agree breaking stuff into smaller chunks is a good way to do it.
    > even were i to do something like
    >
    > def safe_copy():
    >     f1=file(files ,'rb')
    >     f2 = file(os.path.join(currentdir,fname,files))
    >     truth = (md5.new(f1.read()).digest() ==
    >              md5.new(f2.read()).digest())
    >     if truth == 0:
    >         print "file copy error"
    >
    > that would probably work for the single file copy functions. but would
    > still break down during the for ...os.walk(), again because "fname" is
    > a list there, and the crypto functions work 1 file at a time. even
    > changing crypto functions wouldn't change that.


    That's what function parameters are for:

    def safe_copy(fname):
        ....

    for x in fname:
        safe_copy(x)

    That way fname is a filename inside safe_copy() and a list of filenames on
    the global module level. "fname" for a list and "files" for a single file
    are still badly chosen names because they confuse the reader instead of
    clarifying what's going on.

    I strongly recommend that you work through a Python tutorial for
    non-programmers before you continue with your efforts.
    Peter Otten, Mar 21, 2011
    #4
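The parameter idea from the reply above, with names that say what they hold, might be sketched as follows. This is Python 3 and uses `hashlib` in place of the old `md5` module; `md5_of`, `md5_of_all`, and the `chunk_size` default are hypothetical names and values chosen for illustration.

```python
import hashlib


def md5_of(filename, chunk_size=65536):
    """Return the MD5 hex digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(filename, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_all(filenames):
    # The loop deals with the list; md5_of only ever sees one file name.
    return {filename: md5_of(filename) for filename in filenames}
```

The hashing function still works on one file at a time; the caller's loop is what walks the list, so nothing in `md5_of` needs to change when the number of files grows.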