Is there an alternative to os.walk?

Discussion in 'Python' started by Bruce, Oct 4, 2006.

  1. Bruce

    Bruce Guest

    Hi all,
    I have a question about traversing file systems, and could use some
    help. Because of directories with many files in them, os.walk appears
    to be rather slow. I'm thinking there is a potential for speed-up since
    I don't need os.walk to report filenames of all the files in every
    directory it visits. Is there some clever way to use os.walk or another
    tool that would provide functionality like os.walk except for the
    listing of the filenames?
     
    Bruce, Oct 4, 2006
    #1

  2. Irmen de Jong

    Bruce wrote:
    > Hi all,
    > I have a question about traversing file systems, and could use some
    > help. Because of directories with many files in them, os.walk appears
    > to be rather slow.


    Provide more info/code. I suspect it is not os.walk itself that is slow,
    but rather the code that processes its result...

    > I'm thinking there is a potential for speed-up since
    > I don't need os.walk to report filenames of all the files in every
    > directory it visits. Is there some clever way to use os.walk or another
    > tool that would provide functionality like os.walk except for the
    > listing of the filenames?


    You may want to take a look at os.path.walk then.

    --Irmen
     
    Irmen de Jong, Oct 4, 2006
    #2

  3. waylan

    waylan Guest


    You might want to check out the path module [1] (not os.path). The
    following is from the docs:

    > The method path.walk() returns an iterator which steps recursively
    > through a whole directory tree. path.walkdirs() and path.walkfiles()
    > are the same, but they yield only the directories and only the files,
    > respectively.


    Oh, and you can thank Paul Bissex for pointing me to path [2].

    [1]: http://www.jorendorff.com/articles/python/path/
    [2]: http://e-scribe.com/news/289
     
    waylan, Oct 4, 2006
    #3
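
For readers who only need the directories, here is a rough standard-library sketch of the walkdirs idea quoted above. It assumes os.scandir, which was added to Python well after this thread (3.5, with context-manager support in 3.6), and walk_dirs is a made-up name, not part of os or the path module. The operating system still reads each directory's entries; the sketch only avoids building a list of filenames per directory, and it does not descend into symlinked directories.

```python
import os

def walk_dirs(top):
    """Yield top and every directory beneath it, without building file lists."""
    yield top
    try:
        with os.scandir(top) as entries:
            # Keep only real subdirectories; plain files are skipped here.
            subdirs = [e.path for e in entries
                       if e.is_dir(follow_symlinks=False)]
    except OSError:
        return  # unreadable directory: stop descending into it
    for sub in subdirs:
        yield from walk_dirs(sub)
```

Something like `for d in walk_dirs('/some/root'): ...` would then visit each directory path once, in depth-first order.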
  4. Bruce

    Bruce Guest

    A little late, but thanks for the replies; they were very useful. Here's
    what I do in this case:

    import os

    def search(a_dir):
        # Collect every directory under a_dir that passes dirtest.
        valid_dirs = []
        walker = os.walk(a_dir)
        while 1:
            try:
                dirpath, dirnames, filenames = next(walker)
            except StopIteration:
                break
            if dirtest(dirpath):
                valid_dirs.append(dirpath)
        return valid_dirs

    def dirtest(a_dir):
        # Return 1 only if every test file exists in a_dir, else 0.
        testfiles = ['a','b','c']
        for f in testfiles:
            if not os.path.exists(os.path.join(a_dir,f)):
                return 0
        return 1

    I think you're right - it's not os.walk that makes this slow, it's the
    dirtest method that takes so much more time when there are many files
    in a directory. Also, thanks for pointing me to the path module; it was
    interesting.
     
    Bruce, Oct 7, 2006
    #4
  5. Tim Roberts

    Tim Roberts Guest


    Umm, may I point out that you don't NEED the "os.path.exists" call, because
    you are already being HANDED a list of all the filenames in that directory?
    You could "dirtest" with this much faster routine:

    def dirtest(a_dir,filenames):
        for f in ['a','b','c']:
            if f not in filenames:
                return 0
        return 1

    --
    - Tim Roberts,
    Providenza & Boekelheide, Inc.
     
    Tim Roberts, Oct 8, 2006
    #5
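
The same membership check can also be written as a single set operation. This is only a sketch: has_required_files and REQUIRED are illustrative names that do not appear in the thread, and the required names are the 'a', 'b', 'c' test files from the posts above.

```python
REQUIRED = frozenset(['a', 'b', 'c'])  # the test files used in the thread

def has_required_files(filenames, required=REQUIRED):
    """Return True if every required name appears in filenames."""
    # issubset accepts any iterable, so the list from os.walk works as-is.
    return required.issubset(filenames)
```

For example, `has_required_files(['a', 'b', 'c', 'd'])` is True, while `has_required_files(['a', 'b'])` is False.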
  6. hanumizzle

    hanumizzle Guest

    On 10/8/06, Tim Roberts wrote:

    > Umm, may I point out that you don't NEED the "os.path.exists" call, because
    > you are already being HANDED a list of all the filenames in that directory?
    > You could "dirtest" with this much faster routine:
    >
    > def dirtest(a_dir,filenames):
    >     for f in ['a','b','c']:
    >         if f not in filenames:
    >             return 0
    >     return 1


    Or False / True for sufficiently new versions of Python. :)

    -- Theerasak
     
    hanumizzle, Oct 8, 2006
    #6
  7. Ant

    Ant Guest

    The idiomatic way of doing the tree traversal is:

    def search(a_dir):
        valid_dirs = []
        for dirpath, dirnames, filenames in os.walk(a_dir):
            if dirtest(filenames):
                valid_dirs.append(dirpath)
        return valid_dirs

    Also, since you are given a list of the filenames in the directory, why
    not just check that list for your test files:

    def dirtest(filenames):
        testfiles = ['a','b','c']
        for f in testfiles:
            if f not in filenames:
                return False
        # Every test file was found in the listing.
        return True

    You'd have to test this to see if it made a difference in performance,
    but it makes for more readable code.
     
    Ant, Oct 8, 2006
    #7
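
One way to run the test suggested above is a small timing harness. This is a sketch under assumptions: the function names, the file count and the repeat count are all arbitrary choices, and results will vary with the filesystem and OS caching, so no numbers are claimed here.

```python
import os
import tempfile
import timeit

TESTFILES = ['a', 'b', 'c']

def dirtest_exists(a_dir):
    # Original approach: one filesystem lookup per test file.
    return all(os.path.exists(os.path.join(a_dir, f)) for f in TESTFILES)

def dirtest_listed(filenames):
    # Suggested approach: membership test on a list that already exists.
    return all(f in filenames for f in TESTFILES)

def compare(n_files=2000, repeat=200):
    """Time both checks on a throwaway directory holding n_files extra files."""
    with tempfile.TemporaryDirectory() as d:
        for name in TESTFILES + ['f%05d' % i for i in range(n_files)]:
            open(os.path.join(d, name), 'w').close()
        filenames = os.listdir(d)  # stands in for the list os.walk hands over
        t_exists = timeit.timeit(lambda: dirtest_exists(d), number=repeat)
        t_listed = timeit.timeit(lambda: dirtest_listed(filenames), number=repeat)
    return t_exists, t_listed
```

Calling `compare()` returns the two total times in seconds, first for the os.path.exists version and then for the list-membership version.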