Walk a directory and ignore all files/directories that begin with '.'

Discussion in 'Python' started by albert kao, May 13, 2010.

  1. albert kao

    albert kao Guest

    I want to walk a directory and ignore all the files or directories
    whose names begin with '.' (e.g. '.svn').
    Then I will process all the remaining files.
    My test program walknodot.py does not do the job yet.
    The Python version is 3.1 on Windows XP.
    Please help.

    Code:
    #!c:/Python31/python.exe -u
    import os
    import re
    
    path = "C:\\test\\com.comp.hw.prod.proj.war\\bin"
    for dirpath, dirs, files in os.walk(path):
        print ("dirpath " + dirpath)
        p = re.compile('\\\.(\w)+$')
        if p.match(dirpath):
            continue
        print ("dirpath " + dirpath)
        for dir in dirs:
            print ("dir " + dir)
            if dir.startswith('.'):
                continue
    
            print (files)
            for filename in files:
                print ("filename " + filename)
                if filename.startswith('.'):
                    continue
                print ("dirpath filename " + dirpath + "\\" + filename)
                # process the files here
    
    C:\python>walknodot.py
    dirpath C:\test\com.comp.hw.prod.proj.war\bin
    dirpath C:\test\com.comp.hw.prod.proj.war\bin
    dir .svn
    dir com
    []
    dirpath C:\test\com.comp.hw.prod.proj.war\bin\.svn
    dirpath C:\test\com.comp.hw.prod.proj.war\bin\.svn
    ....

    I do not expect C:\test\com.comp.hw.prod.proj.war\bin\.svn to appear
    twice.
    Please help.
    albert kao, May 13, 2010
    #1

  2. MRAB

    MRAB Guest

    albert kao wrote:
    >     p = re.compile('\\\.(\w)+$')
    >     if p.match(dirpath):
    >         continue

    The problem is with your use of the 'match' method, which will look for
    a match only at the start of the string. You need to use the 'search'
    method instead.

    The regular expression is also incorrect. The string literal:

    '\\\.(\w)+$'

    passes the characters:

    \\.(\w)+$

    to the re module as the regular expression, which will match a
    backslash, then any character, then a word, then the end of the string.
    What you want is:

    \\\.\w+$

    (you don't need the parentheses) which is best expressed as the 'raw'
    string literal:

    r'\\\.\w+$'
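
To see both points concretely, here is a small sketch. The two sample paths come from the question; the variable names are only for illustration. The original pattern is written as the raw string it actually evaluates to, since newer Python versions warn about unrecognized escapes such as '\.' in non-raw literals.

```python
import re

svn_path = r"C:\test\com.comp.hw.prod.proj.war\bin\.svn"
bin_path = r"C:\test\com.comp.hw.prod.proj.war\bin"

# What '\\\.(\w)+$' actually hands to re: backslash, ANY char, word chars, end.
original = re.compile(r'\\.(\w)+$')
# The corrected pattern: backslash, literal dot, word chars, end.
fixed = re.compile(r'\\\.\w+$')

# match() is anchored at the start of the string, so it never fires here.
print(original.match(svn_path))           # None
print(fixed.match(svn_path))              # None

# search() scans the whole string.
print(fixed.search(svn_path).group())     # \.svn
print(fixed.search(bin_path))             # None

# The unescaped dot makes the original pattern hit ordinary directories too.
print(original.search(bin_path).group())  # \bin
```

The last line shows why switching to 'search' alone would not be enough: with the original pattern, the plain "bin" directory would be skipped as well.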
    MRAB, May 13, 2010
    #2

  3. albert kao

    albert kao Guest

    On May 13, 3:10 pm, MRAB wrote:
    > The problem is with your use of the 'match' method, which will look for
    > a match only at the start of the string. You need to use the 'search'
    > method instead.
    >
    > The regular expression is also incorrect. The string literal:
    >
    >      '\\\.(\w)+$'
    >
    > passes the characters:
    >
    >      \\.(\w)+$
    >
    > to the re module as the regular expression, which will match a
    > backslash, then any character, then a word, then the end of the string.
    > What you want is:
    >
    >      \\\.\w+$
    >
    > (you don't need the parentheses) which is best expressed as the 'raw'
    > string literal:
    >
    >      r'\\\.\w+$'

    Following your advice, I also added a case for
    C:\test\com.comp.hw.prod.proj.war\bin\.svn\tmp:

        p = re.compile(r'\\\.\w+$')
        if p.search(dirpath):
            continue
        p = re.compile(r'\\\.\w+\\')
        if p.search(dirpath):
            continue
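
As a possible simplification (not something posted in the thread), the two checks could be folded into one pattern that accepts either a backslash or the end of the string after the dotted name. The names `HIDDEN_COMPONENT` and `in_hidden_dir` below are hypothetical, and the sketch assumes Windows-style paths as in the question.

```python
import re

# Hypothetical combined pattern: backslash, literal dot, word characters,
# then either another backslash or the end of the string.
HIDDEN_COMPONENT = re.compile(r'\\\.\w+(?:\\|$)')

def in_hidden_dir(dirpath):
    """Return True if some component of dirpath starts with a dot."""
    return HIDDEN_COMPONENT.search(dirpath) is not None

base = r"C:\test\com.comp.hw.prod.proj.war\bin"
print(in_hidden_dir(base))                 # False
print(in_hidden_dir(base + r"\.svn"))      # True
print(in_hidden_dir(base + r"\.svn\tmp"))  # True
print(in_hidden_dir(base + r"\com"))       # False
```

One caveat carried over from the original pattern: `\w` does not match characters such as '-', so a directory named like '.my-dir' would slip through this check.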

    Problem is solved.
    Thanks.
    albert kao, May 13, 2010
    #3
  4. Tim Chase

    Tim Chase Guest

    On 05/13/2010 12:58 PM, albert kao wrote:
    > C:\python>walknodot.py
    > dirpath C:\test\com.comp.hw.prod.proj.war\bin
    > dirpath C:\test\com.comp.hw.prod.proj.war\bin
    > dir .svn
    > dir com
    > []
    > dirpath C:\test\com.comp.hw.prod.proj.war\bin\.svn
    > dirpath C:\test\com.comp.hw.prod.proj.war\bin\.svn
    > ...
    >
    > I do not expect C:\test\com.comp.hw.prod.proj.war\bin\.svn to appear
    > twice.


    Note that the first time .svn appears, it's as "dir .svn" while
    the second time it appears, it's via "dirpath ...\.svn"

    If you don't modify the list of dirs in place, os.walk will
    descend into all the dirs by default. (Also, you shouldn't mask
    the built-in dir() function by naming your variables "dir")

    While it can be detected with regexps, I like the clarity of just
    using ".startswith()" on the strings, producing something like:

    for curdir, dirs, files in os.walk(root):
        # modify "dirs" in place to prevent
        # future code in os.walk from seeing those
        # that start with "."
        dirs[:] = [d for d in dirs if not d.startswith('.')]

        print(curdir)
        for f in files:
            if f.startswith('.'): continue
            print(os.path.join(curdir, f))
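
To illustrate why the slice assignment matters, here is a minimal sketch that builds a throwaway tree and compares in-place pruning with simply rebinding the name. The helper `visited_dirs` and the tree layout ('.svn/tmp' and 'com') are assumptions made up for the demonstration.

```python
import os
import tempfile

def visited_dirs(root, prune_in_place):
    """Return the directories os.walk visits under root, relative to root."""
    visited = []
    for curdir, dirs, files in os.walk(root):
        visible = [d for d in dirs if not d.startswith('.')]
        if prune_in_place:
            dirs[:] = visible  # mutates the very list os.walk will recurse over
        else:
            dirs = visible     # rebinds the local name only; os.walk never sees it
        visited.append(os.path.relpath(curdir, root))
    return sorted(visited)

# Hypothetical tree: one hidden directory with a subdirectory, one normal one.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, '.svn', 'tmp'))
    os.makedirs(os.path.join(root, 'com'))
    pruned = visited_dirs(root, prune_in_place=True)
    unpruned = visited_dirs(root, prune_in_place=False)

print(pruned)    # ['.', 'com']
print(unpruned)  # also lists '.svn' and its 'tmp' subdirectory
```

With the slice assignment the walk never enters '.svn', so nothing below it has to be filtered out afterwards.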

    -tkc
    Tim Chase, May 14, 2010
    #4
