synching with os.walk()


Andre Meyer

Hi all

os.walk() is a nice generator for performing actions on all files in a
directory and subdirectories. However, how can one use os.walk() for walking
through two hierarchies at once? I want to synchronise two directories (just
backup for now), but cannot see how I can traverse a second one. I do this
now with os.listdir() recursively, which works fine, but I am afraid that
recursion can become inefficient for large hierarchies.

thanks for your help
André
 

120psi

I've run into wanting to work with parallel directory structures
before, and what I generally do is something like:

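import os

for root, dirs, files in os.walk(dir1):
    # Map the current directory under dir1 to its counterpart under dir2.
    # relpath avoids the slicing bug when dir1 ends with a path separator.
    dir2_root = os.path.normpath(os.path.join(dir2, os.path.relpath(root, dir1)))
    for f in files:
        dir1_path = os.path.join(root, f)
        dir2_path = os.path.join(dir2_root, f)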

Does this work for your needs?
-- Nils
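
For a one-way backup, the path mapping above could be wrapped up roughly as follows. This is only a sketch: the names paired_paths and backup_newer are made up for illustration, and "copy when the file is missing or the source has a newer modification time" is an assumed rule, not something stated in the thread.

```python
import os
import shutil


def paired_paths(dir1, dir2):
    """Yield (source_path, mirror_path) for every file under dir1."""
    for root, dirs, files in os.walk(dir1):
        rel = os.path.relpath(root, dir1)
        dir2_root = os.path.normpath(os.path.join(dir2, rel))
        for f in files:
            yield os.path.join(root, f), os.path.join(dir2_root, f)


def backup_newer(dir1, dir2):
    """Copy files that are missing from dir2 or older there than in dir1."""
    copied = []
    for src, dst in paired_paths(dir1, dir2):
        # Assumed rule: a newer modification time on the source wins.
        if (not os.path.exists(dst)
                or os.path.getmtime(src) > os.path.getmtime(dst)):
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the timestamp
            copied.append(dst)
    return copied
```

Because os.walk() is a generator, only one directory listing is held at a time, so the size of the hierarchy should not be a concern here.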
 

Paddy

Walk each tree individually, gathering file names relative to the head
of the tree together with their modification times.

Compare the two sets of data to generate:
1. A list of what needs to be copied from the original to the copy.
2. A list of what needs to be copied from the copy to the original.

Do the copying.

You might want to show the user what needs to be done and give them
the option of aborting after generating the copy lists.

- Paddy.
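
The gather-and-compare steps above might look something like this. It is a minimal sketch: snapshot and plan_sync are hypothetical names, and it assumes that "needs to be copied" means the file is missing on the other side or has an older modification time there.

```python
import os


def snapshot(top):
    """Map each file's path relative to top to its modification time."""
    found = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            full = os.path.join(root, name)
            found[os.path.relpath(full, top)] = os.path.getmtime(full)
    return found


def plan_sync(original, copy):
    """Return (to_copy, to_original): relative paths to copy each way."""
    left, right = snapshot(original), snapshot(copy)
    # Missing on the other side, or newer on this side (assumed rule).
    to_copy = sorted(p for p in left
                     if p not in right or left[p] > right[p])
    to_original = sorted(p for p in right
                         if p not in left or right[p] > left[p])
    return to_copy, to_original
```

The two lists can be printed for the user to confirm before anything is copied.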
 

Paddy

P.S. If you are on a Unix type system you can use tar to do the copying
as you can easily compress the data if it needs to go over a sow link,
and tar will take care of creating any needed directories in the
destination if you create new directories as well as new files.
- Paddy.
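
If you would rather stay inside Python than shell out to tar, the standard tarfile module can do much the same thing. The sketch below is an assumption about how that might be arranged, not the tar command line itself; pack_tree and unpack_tree are made-up names, and the archive is assumed to live outside the directory being packed.

```python
import os
import tarfile


def pack_tree(src_dir, archive_path):
    """Write src_dir into a gzip-compressed tar archive with relative names."""
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=".")


def unpack_tree(archive_path, dest_dir):
    """Extract the archive into dest_dir, creating directories as needed."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Newer Python versions support extraction filters; use the
        # "data" filter when this interpreter provides it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)
```

As with tar itself, subdirectories in the archive are recreated at the destination during extraction.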
 

Thomas Ploch

I've run into wanting to work with parallel directory structures
before, and what I generally do is something like:

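import os

for root, dirs, files in os.walk(dir1):
    # Map the current directory under dir1 to its counterpart under dir2.
    # relpath avoids the slicing bug when dir1 ends with a path separator.
    dir2_root = os.path.normpath(os.path.join(dir2, os.path.relpath(root, dir1)))
    for f in files:
        dir1_path = os.path.join(root, f)
        dir2_path = os.path.join(dir2_root, f)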

Wouldn't it be better to wrap the tree traversal in a class? Then you
could traverse two directory trees at once and do fun things with
the result.

Thomas
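
One possible shape for such a class is sketched below. TreePair is a hypothetical name and design, not an existing library class. It visits both trees level by level with an explicit stack, so there is no recursion, and it descends only into directories that exist on both sides.

```python
import os


class TreePair:
    """Walk two directory trees in step, one directory level at a time."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __iter__(self):
        pending = [""]  # relative directories still to visit
        while pending:
            rel = pending.pop()
            l_dir = os.path.join(self.left, rel)
            r_dir = os.path.join(self.right, rel)
            l_names = set(os.listdir(l_dir))
            r_names = set(os.listdir(r_dir))
            common = sorted(l_names & r_names)
            # Yield: relative dir, names only left, only right, in both.
            yield (rel, sorted(l_names - r_names),
                   sorted(r_names - l_names), common)
            for name in common:
                if (os.path.isdir(os.path.join(l_dir, name))
                        and os.path.isdir(os.path.join(r_dir, name))):
                    pending.append(os.path.join(rel, name))
```

Usage would be along the lines of: for rel, left_only, right_only, both in TreePair(dir1, dir2): ...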
 

Paddy

Paddy said:
P.S. If you are on a Unix type system you can use tar to do the copying
as you can easily compress the data if it needs to go over a sow link,

Sow links, transfers your data and then may form a tasty sandwich when
cooked.

(The original should, of course, read ...slow...)
- Pad.
 

Antoine De Groote

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/191017
might be what you are looking for, or at least a starting point...

Regards,
antoine
 

petercable

I wrote a script to perform this function using the dircmp class in the
filecmp module. I did something similar to this:

import filecmp, os, shutil

def backup(d1, d2):
    """Recursively copy new and changed files from d1 into d2."""
    print('backing up %s to %s' % (d1, d2))
    compare = filecmp.dircmp(d1, d2)
    # Entries that exist only in the source: copy them over wholesale.
    for item in compare.left_only:
        fullpath = os.path.join(d1, item)
        if os.path.isdir(fullpath):
            shutil.copytree(fullpath, os.path.join(d2, item))
        elif os.path.isfile(fullpath):
            shutil.copy2(fullpath, d2)
    # Files on both sides that filecmp reports as different: overwrite.
    for item in compare.diff_files:
        shutil.copy2(os.path.join(d1, item), d2)
    # Directories present on both sides: recurse into them.
    for item in compare.common_dirs:
        backup(os.path.join(d1, item), os.path.join(d2, item))

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 3:
        backup(sys.argv[1], sys.argv[2])

My script has some error checking and keeps up to 5 previous versions
of a changed file. I find it very efficient, even with recursion, as it
only actually copies those files that have changed. I sync somewhere
around 5 GB worth of files nightly across the network and I haven't had
any trouble.

Of course, if I just had rsync available, I would use that.

Hope this helps,

Pete
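
The version-keeping part is not shown above. One way it could be done is sketched here, purely as an assumption: copy_with_versions is a made-up helper, and the numbered-suffix naming (name.1 is the most recent old copy, name.5 the oldest) is illustrative rather than what the actual script does.

```python
import os
import shutil


def copy_with_versions(src, dst, keep=5):
    """Copy src over dst, first shifting old copies to dst.1 ... dst.<keep>."""
    if os.path.exists(dst):
        oldest = "%s.%d" % (dst, keep)
        if os.path.exists(oldest):
            os.remove(oldest)  # drop the version that falls off the end
        # Shift dst.n to dst.(n+1), starting from the highest number.
        for n in range(keep - 1, 0, -1):
            older = "%s.%d" % (dst, n)
            if os.path.exists(older):
                os.rename(older, "%s.%d" % (dst, n + 1))
        os.rename(dst, dst + ".1")
    shutil.copy2(src, dst)
```

In the backup() function this would take the place of the shutil.copy2() call in the diff_files loop, with the destination given as a full file path rather than a directory.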
 
