os.path.walk not pruning descent tree (and I'm not happy with that behavior?)

Discussion in 'Python' started by Joe Ardent, May 28, 2007.

  1. Joe Ardent

    Good day, everybody! From what I can tell from the archives, this is
    everyone's favorite method from the standard lib, and everyone loves
    answering questions about it. Right? :)

    Anyway, my question regards the way that the visit callback modifies
    the names list. Basically, my simple example is:

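    ##############################
    import os
    import re

    def listUndottedDirs( d ):
        dots = re.compile( r'\.' )

        def visit( arg, dirname, names ):
            for f in names:
                if dots.match( f ):
                    # Try to prune dotted entries by deleting them from names.
                    # (This deletes from the list being looped over; see replies.)
                    i = names.index( f )
                    del names[i]
                else:
                    print "%s: %s" % ( dirname, f )

        os.path.walk( d, visit, None )
    ###############################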

    Basically, I don't want to visit any hidden subdirs (this is a unix
    system), nor am I interested in dot-files. If I call the function
    as "listUndottedDirs( '/usr/home/ardent' )", however, EVEN THOUGH
    IT IS REMOVING DOTTED DIRS AND FILES FROM names, it will recurse into
    the dotted directories; e.g., if I have ".kde3/" in that directory, it
    will begin listing the contents of /usr/home/ardent/.kde3/. Here's
    what the documentation says about this method:

    "The visit function may modify names to influence the set of
    directories visited below dirname, e.g. to avoid visiting certain
    parts of the tree. (The object referred to by names must be modified
    in place, using del or slice assignment.)"

    So... What am I missing? Any help would be greatly appreciated.
     
    Joe Ardent, May 28, 2007

  2. Peter Otten

    I don't know what to make of the smiley, so I'll be explicit: use os.walk()
    instead of os.path.walk().
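For reference, a minimal Python 3 sketch of this suggestion (os.path.walk() no longer exists in Python 3; os.walk() does). With the default topdown=True, os.walk() lets you prune the dirnames list in place so it never descends into the removed directories. The function name list_undotted is made up for this example.

```python
import os


def list_undotted(top):
    """Hypothetical helper: return paths under top, skipping every file and
    directory whose name starts with '.' (a Python 3 take on listUndottedDirs)."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):  # topdown=True by default
        # Slice assignment edits the very list os.walk() holds,
        # so hidden directories are never entered.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        visible_files = [f for f in filenames if not f.startswith(".")]
        for name in dirnames + visible_files:
            found.append(os.path.join(dirpath, name))
    return found
```

Calling list_undotted(os.path.expanduser("~")) would list the home directory without ever entering .kde3/ or any other dotted directory.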


    Your problem is that you are deleting items from a list while iterating over
    it:

    # WRONG
    >>> names = [".alpha", ".beta", "gamma"]
    >>> for name in names:
    ...     if name.startswith("."):
    ...         del names[names.index(name)]
    ...
    >>> names
    ['.beta', 'gamma']
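To see why '.beta' survives, here is a small Python 3 sketch that repeats the loop but records which names it actually examines. It assumes a starting list of ['.alpha', '.beta', 'gamma'], an illustrative guess consistent with the output above; the helper name is hypothetical.

```python
def names_seen_while_deleting(names):
    """Hypothetical helper: run the buggy delete-while-iterating loop and
    return the items the for-loop actually examined."""
    seen = []
    for name in names:
        seen.append(name)
        if name.startswith("."):
            # Deleting shifts every later item one slot to the left, but the
            # loop's hidden index still moves forward, so the item that slid
            # into the current slot is never examined.
            del names[names.index(name)]
    return seen


names = [".alpha", ".beta", "gamma"]
seen = names_seen_while_deleting(names)
print(seen)   # ['.alpha', 'gamma']
print(names)  # ['.beta', 'gamma']
```

The loop never looks at '.beta' at all, which is exactly why dotted directories slip through to os.path.walk().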

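    Here's one way to avoid that mess:

    >>> names = [".alpha", ".beta", "gamma"]
    >>> names[:] = [name for name in names if not name.startswith(".")]
    >>> names
    ['gamma']

    The slice [:] on the left side is necessary to change the list in place,
    so that os.path.walk() sees the pruned list.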
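Why the slice matters: os.path.walk() keeps its own reference to the list it passes to visit, so only changes made to that same list object are visible to it. A minimal Python 3 sketch with two hypothetical functions, one that rebinds its parameter and one that assigns to a slice:

```python
def prune_by_rebinding(names):
    # Rebinding makes the local name point at a brand-new list;
    # the caller's list object is left untouched.
    names = [n for n in names if not n.startswith(".")]
    return names


def prune_in_place(names):
    # Slice assignment replaces the contents of the caller's own list object.
    names[:] = [n for n in names if not n.startswith(".")]


listing = [".alpha", ".beta", "gamma"]
prune_by_rebinding(listing)
print(listing)  # ['.alpha', '.beta', 'gamma'] (unchanged)
prune_in_place(listing)
print(listing)  # ['gamma']
```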

    Peter
     
    Peter Otten, May 28, 2007

  3. Well, in fact, the preferred (and easier) way is to use os.walk - but
    os.path.walk is fine too.


    There is nothing wrong with os.path.walk - you are iterating over the names
    list *and* removing elements from it at the same time, and that's not
    good... Some ways to avoid it:

    - iterate over a copy (the [:] is important):

        for fname in names[:]:
            if fname[:1]=='.':
                names.remove(fname)

    - iterate backwards:

        for i in range(len(names)-1, -1, -1):
            fname = names[i]
            if fname[:1]=='.':
                names.remove(fname)

    - collect first and remove later:

        to_be_deleted = [fname for fname in names if fname[:1]=='.']
        for fname in to_be_deleted:
            names.remove(fname)

    - filter and reassign in place (the [:] is important):

        names[:] = [fname for fname in names if fname[:1]!='.']

    (Notice that I haven't used a regular expression, and that I used the
    remove method rather than index() plus del.)
     
    Gabriel Genellina, May 28, 2007
  4. I'm really sorry for all those private mails; Thunderbird is awfully
    stupid at dealing with mailing-list folders.


    Gabriel Genellina wrote:


    This is not about iterating backward; it is about iterating over the
    index of each element instead of over the element itself (which must
    be done starting from the end). In fact this code is both inefficient and
    contains a subtle bug: if two objects in the list compare equal,
    names.remove(fname) may remove the wrong one (the first equal item
    rather than the one at index i).

    It should be:

        for i in range(len(names)-1, -1, -1):
            if names[i][:1]=='.':
                del names[i]
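Directory names within one directory are unique, so this particular bug cannot bite inside a visit function, but it is easy to trigger elsewhere. A hypothetical Python 3 sketch: 1.0 and 1 compare equal under ==, so when the test looks at the type rather than the value, names.remove() deletes the wrong element, while del by index deletes exactly the one examined. Function names and data are made up for illustration.

```python
def drop_ints_with_remove(items):
    """Backward loop that deletes via remove(), like the earlier variant."""
    items = list(items)
    for i in range(len(items) - 1, -1, -1):
        x = items[i]
        if isinstance(x, int):
            # remove() deletes the FIRST element that compares equal to x,
            # which need not be the element at index i.
            items.remove(x)
    return items


def drop_ints_with_del(items):
    """Backward loop that deletes by index, as in the corrected version."""
    items = list(items)
    for i in range(len(items) - 1, -1, -1):
        if isinstance(items[i], int):
            del items[i]  # deletes exactly the element just examined
    return items


print(drop_ints_with_remove([1.0, 1]))  # [] (the float was deleted as well)
print(drop_ints_with_del([1.0, 1]))     # [1.0]
```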

    Filtering and reassigning in place seems the best here. But the [:] is
    not important: unless "names" is referenced in another namespace, simple
    assignment is enough.
     
    Maric Michaud, May 28, 2007


  5. Yes, sure, this is what I should have written. Thanks for the correction!
    But "names" *is* referenced elsewhere, which is why the [:] matters: the
    visit function is called from inside the os.path.walk code, and you have
    to modify the names parameter in place for the caller to notice it (and
    skip the undesired files and folders).
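To make this concrete: since os.path.walk() is gone in Python 3, here is a hypothetical, simplified stand-in (mini_walk, not the real stdlib source) that follows the documented contract: it lists a directory, passes that list to visit, then descends into whatever directories are still in the list afterwards. Only the visit function that modifies names in place actually prunes the hidden directory.

```python
import os
import stat


def mini_walk(top, visit, arg):
    """Hypothetical, simplified stand-in for Python 2's os.path.walk()."""
    try:
        names = os.listdir(top)
    except OSError:
        return
    # visit() receives the very list this function is about to loop over.
    visit(arg, top, names)
    for name in names:
        path = os.path.join(top, name)
        try:
            st = os.lstat(path)
        except OSError:
            continue
        if stat.S_ISDIR(st.st_mode):
            mini_walk(path, visit, arg)


def visit_rebind(seen, dirname, names):
    seen.append(dirname)
    # Rebinding only changes this local variable; mini_walk() keeps the old list.
    names = [n for n in names if not n.startswith(".")]


def visit_in_place(seen, dirname, names):
    seen.append(dirname)
    # Slice assignment changes the shared list, so dotted dirs are skipped.
    names[:] = [n for n in names if not n.startswith(".")]
```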
     
    Gabriel Genellina, May 28, 2007
