parsing directory for certain filetypes

Discussion in 'Python' started by royG, Mar 10, 2008.

  1. royG

    royG Guest

    hi
    i wrote a function to parse a given directory and make a sorted list
    of files with .txt and .doc extensions. it works, but i want to know if it
    is too bloated.. can this be rewritten in a more efficient manner?

    here it is...

    from string import split
    from os.path import isdir,join,normpath
    from os import listdir

    def parsefolder(dirname):
        filenms=[]
        folder=dirname
        isadr=isdir(folder)
        if (isadr):
            dirlist=listdir(folder)
            filenm=""
            for x in dirlist:
                filenm=x
                if(filenm.endswith(("txt","doc"))):
                    nmparts=[]
                    nmparts=split(filenm,'.' )
                    if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
                        filenms.append(filenm)
            filenms.sort()
            filenameslist=[]
            filenameslist=[normpath(join(folder,y)) for y in filenms]
            numifiles=len(filenameslist)
            print filenameslist
            return filenameslist


    folder='F:/mysys/code/tstfolder'
    parsefolder(folder)


    thanks,
    RG
     
    royG, Mar 10, 2008
    #1

  2. royG

    sam Guest

    royG wrote:

    > i wrote a function to parse a given directory and make a sorted list
    > of files with .txt,.doc extensions .it works,but i want to know if it
    > is too bloated..can this be rewritten in more efficient manner?
    >


    Probably this should be rewritten and should be very compact. Maybe you should
    grab the output of this command:

    find $dirname -type f -a \( -name '*.txt' -o -name '*.doc' \)

    and split by "\n"?
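
    A rough sketch of what that could look like from Python 3, assuming a
    Unix-like system with `find` on the PATH. The helper name `find_files` is
    made up for illustration. Note that `find` descends into subdirectories,
    unlike the listdir() version in the original post.

```python
import subprocess

def find_files(dirname):
    # Same expression as the find command above, passed as an argument
    # list so that no shell quoting is needed.
    cmd = ['find', dirname, '-type', 'f', '-a',
           '(', '-name', '*.txt', '-o', '-name', '*.doc', ')']
    output = subprocess.check_output(cmd).decode()
    # find prints one path per line; splitlines() handles the trailing newline.
    return sorted(output.splitlines())
```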


    --
    UFO Occupation
    www.totalizm.org
     
    sam, Mar 10, 2008
    #2

  3. royG

    jay graves Guest

    Re: parsing directory for certain filetypes

    On Mar 10, 8:57 am, royG wrote:
    > i wrote a function to parse a given directory and make a sorted list
    > of files with .txt,.doc extensions .it works,but i want to know if it
    > is too bloated..can this be rewritten in more efficient manner?


    Try the 'glob' module.

    ....
    Jay
     
    jay graves, Mar 10, 2008
    #3
  4. royG

    Robert Bossy Guest

    royG wrote:
    > hi
    > i wrote a function to parse a given directory and make a sorted list
    > of files with .txt,.doc extensions .it works,but i want to know if it
    > is too bloated..can this be rewritten in more efficient manner?
    >
    > here it is...
    >
    > from string import split
    > from os.path import isdir,join,normpath
    > from os import listdir
    >
    > def parsefolder(dirname):
    >     filenms=[]
    >     folder=dirname
    >     isadr=isdir(folder)
    >     if (isadr):
    >         dirlist=listdir(folder)
    >         filenm=""
    >

    This last line is unnecessary: variable scope rules in python are a bit
    different from what we're used to. You're not required to
    declare/initialize a variable, you're only required to assign a value
    before it is referenced.
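
    To illustrate, a minimal sketch (the function name `last_txt` is
    hypothetical): the name `filenm` comes into existence at its first
    assignment, so no `filenm=""` is needed beforehand. The catch is that
    reading it before any assignment has run raises an error.

```python
def last_txt(names):
    for x in names:
        if x.endswith('.txt'):
            filenm = x  # the first assignment creates the local name
    # If no assignment ran, this raises UnboundLocalError
    # (a subclass of NameError).
    return filenm
```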


    >         for x in dirlist:
    >             filenm=x
    >             if(filenm.endswith(("txt","doc"))):
    >                 nmparts=[]
    >                 nmparts=split(filenm,'.' )
    >                 if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
    >

    I don't get it. You've already checked that filenm ends with "txt" or
    "doc"... What is the purpose of these three lines?
    Btw, again, nmparts=[] is unnecessary.
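
    For what it's worth, the two checks are not quite equivalent, which a
    quick sketch shows. `endswith("txt")` without the dot also accepts a name
    like "mytxt" (and `nmparts[1]` then raises IndexError, since there is no
    dot to split on), while the split test looks at the second dot-separated
    piece and so rejects "notes.v2.txt". `os.path.splitext` is one way to look
    at the actual extension; the helper name `has_ext` is only for
    illustration.

```python
import os.path

def has_ext(filename, exts=('.txt', '.doc')):
    # splitext splits at the last dot: 'notes.v2.txt' -> ('notes.v2', '.txt')
    return os.path.splitext(filename)[1] in exts

for name in ('a.txt', 'mytxt', 'notes.v2.txt'):
    # Columns: name, dot-less endswith() check, splitext-based check.
    print(name, name.endswith(('txt', 'doc')), has_ext(name))
```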

    >                     filenms.append(filenm)
    >         filenms.sort()
    >         filenameslist=[]
    >

    Unnecessary initialization.

    >         filenameslist=[normpath(join(folder,y)) for y in filenms]
    >         numifiles=len(filenameslist)
    >

    numifiles is not used so I guess this line is too much.

    >         print filenameslist
    >         return filenameslist
    >


    Personally, I'd use glob.glob:


    import os.path
    import glob

    def parsefolder(folder):
        path = os.path.normpath(os.path.join(folder, '*.py'))
        lst = [ fn for fn in glob.glob(path) ]
        lst.sort()
        return lst


    I leave you the exercise to add .doc files. But I must say (whoever's
    listening) that I was a bit disappointed that glob('*.{txt,doc}') didn't
    work.
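
    Python's glob follows fnmatch-style rules, which have no brace
    expansion, so the braces are matched literally. A small sketch of
    expanding them by hand before globbing, written for Python 3; the helper
    names `expand_braces` and `brace_glob` are hypothetical, and only flat
    (non-nested) groups are handled properly.

```python
import glob
import re

def expand_braces(pattern):
    # Find the first {a,b,...} group; patterns without one are returned as-is.
    m = re.search(r'\{([^{}]*)\}', pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for alternative in m.group(1).split(','):
        # Recurse so that later groups in the pattern are expanded too.
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded

def brace_glob(pattern):
    # Glob each expanded pattern separately and merge the results.
    matches = []
    for p in expand_braces(pattern):
        matches.extend(glob.glob(p))
    return sorted(matches)
```

    With this, brace_glob('*.{txt,doc}') globs '*.txt' and '*.doc' in turn.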

    Cheers,
    RB
     
    Robert Bossy, Mar 10, 2008
    #4
  5. royG

    sam Guest

    Robert Bossy wrote:

    > I leave you the exercise to add .doc files. But I must say (whoever's
    > listening) that I was a bit disappointed that glob('*.{txt,doc}') didn't
    > work.


    "{" and "}" (brace expansion) are a shell extension found in bash and some other shells, not part of the POSIX standard, unfortunately

    --
    UFO Occupation
    www.totalizm.org
     
    sam, Mar 10, 2008
    #5
  6. royG

    jay graves Guest

    Re: parsing directory for certain filetypes

    On Mar 10, 9:28 am, Robert Bossy wrote:
    > Personally, I'd use glob.glob:
    >
    > import os.path
    > import glob
    >
    > def parsefolder(folder):
    >     path = os.path.normpath(os.path.join(folder, '*.py'))
    >     lst = [ fn for fn in glob.glob(path) ]
    >     lst.sort()
    >     return lst
    >


    Why the 'no-op' list comprehension? Typo?

    ....
    Jay
     
    jay graves, Mar 10, 2008
    #6
  7. royG

    Tim Chase Guest

    > i wrote a function to parse a given directory and make a sorted list
    > of files with .txt,.doc extensions .it works,but i want to know if it
    > is too bloated..can this be rewritten in more efficient manner?
    >
    > here it is...
    >
    > from string import split
    > from os.path import isdir,join,normpath
    > from os import listdir
    >
    > def parsefolder(dirname):
    >     filenms=[]
    >     folder=dirname
    >     isadr=isdir(folder)
    >     if (isadr):
    >         dirlist=listdir(folder)
    >         filenm=""
    >         for x in dirlist:
    >             filenm=x
    >             if(filenm.endswith(("txt","doc"))):
    >                 nmparts=[]
    >                 nmparts=split(filenm,'.' )
    >                 if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
    >                     filenms.append(filenm)
    >         filenms.sort()
    >         filenameslist=[]
    >         filenameslist=[normpath(join(folder,y)) for y in filenms]
    >         numifiles=len(filenameslist)
    >         print filenameslist
    >         return filenameslist
    >
    >
    > folder='F:/mysys/code/tstfolder'
    > parsefolder(folder)


    It seems to me that this is awfully baroque with many unneeded
    superfluous variables. Is this not the same functionality (minus
    prints, unused result-counting, NOPs, and belt-and-suspenders
    extension-checking) as

    def parsefolder(dirname):
        if not isdir(dirname): return
        return sorted([
            normpath(join(dirname, fname))
            for fname in listdir(dirname)
            if fname.lower().endswith('.txt')
            or fname.lower().endswith('.doc')
            ])

    In Python2.5 (or 2.4 if you implement the any() function, ripped
    from the docs[1]), this could be rewritten to be a little more
    flexible...something like this (untested):

    def parsefolder(dirname, types=['.doc', '.txt']):
        if not isdir(dirname): return
        return sorted([
            normpath(join(dirname, fname))
            for fname in listdir(dirname)
            if any(
                fname.lower().endswith(s)
                for s in types)
            ])

    which would allow you to do both

    parsefolder('/path/to/wherever/')

    and

    parsefolder('/path/to/wherever/', ['.xls', '.ppt', '.htm'])

    In both cases, you don't define the case where isdir(dirname)
    fails. Caveat Implementor.
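
    One way to pin down that undefined case, sketched for Python 3. Returning
    an empty list is an assumption on my part; raising an exception would be
    just as defensible. The sketch also swaps the list default for a tuple:
    `str.endswith` accepts a tuple of suffixes (the original post already
    relies on that), which makes any() unnecessary here.

```python
from os import listdir
from os.path import isdir, join, normpath

def parsefolder(dirname, types=('.doc', '.txt')):
    # Assumption: a missing or non-directory path yields an empty list
    # rather than None, so callers can always iterate over the result.
    if not isdir(dirname):
        return []
    # tuple() also lets callers keep passing a list of extensions.
    suffixes = tuple(t.lower() for t in types)
    return sorted(
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if fname.lower().endswith(suffixes)
    )
```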

    -tkc


    [1] http://docs.python.org/lib/built-in-funcs.html
     
    Tim Chase, Mar 10, 2008
    #7
  8. royG

    Robert Bossy Guest

    Re: parsing directory for certain filetypes

    jay graves wrote:
    > On Mar 10, 9:28 am, Robert Bossy wrote:
    >
    >> Personally, I'd use glob.glob:
    >>
    >> import os.path
    >> import glob
    >>
    >> def parsefolder(folder):
    >>     path = os.path.normpath(os.path.join(folder, '*.py'))
    >>     lst = [ fn for fn in glob.glob(path) ]
    >>     lst.sort()
    >>     return lst
    >>
    >>

    >
    > Why the 'no-op' list comprehension? Typo?
    >

    My mistake, it is:

    import os.path
    import glob

    def parsefolder(folder):
        path = os.path.normpath(os.path.join(folder, '*.py'))
        lst = glob.glob(path)
        lst.sort()
        return lst
     
    Robert Bossy, Mar 10, 2008
    #8
  9. royG

    royG Guest

    Re: parsing directory for certain filetypes

    On Mar 10, 8:03 pm, Tim Chase wrote:

    > In Python2.5 (or 2.4 if you implement the any() function, ripped
    > from the docs[1]), this could be rewritten to be a little more
    > flexible...something like this (untested):
    >


    that was quite a good lesson for a beginner like me..
    thanks guys

    in the version using glob()
    >path = os.path.normpath(os.path.join(folder, '*.txt'))
    >lst = glob.glob(path)


    is it possible to check for more than one file extension? here i will
    have to create two path variables like
    path1 = os.path.normpath(os.path.join(folder, '*.txt'))
    path2 = os.path.normpath(os.path.join(folder, '*.doc'))

    and then use glob separately..
    or is there another way?

    RG
     
    royG, Mar 11, 2008
    #9
  10. Re: parsing directory for certain filetypes

    On Mar 11, 6:21 am, royG wrote:
    > On Mar 10, 8:03 pm, Tim Chase wrote:
    >
    > > In Python2.5 (or 2.4 if you implement the any() function, ripped
    > > from the docs[1]), this could be rewritten to be a little more
    > > flexible...something like this (untested):

    >
    > that was quite a good lesson for a beginner like me..
    > thanks guys
    >
    > in the version using glob()
    >
    > >path = os.path.normpath(os.path.join(folder, '*.txt'))
    > >lst = glob.glob(path)

    >
    > is it possible to check for more than one file extension? here i will
    > have to create two path variables like
    > path1 = os.path.normpath(os.path.join(folder, '*.txt'))
    > path2 = os.path.normpath(os.path.join(folder, '*.doc'))
    >
    > and then use glob separately..
    > or is there another way?
    >


    I don't think you can match multiple patterns directly with glob, but
    `fnmatch` - the module used by glob to check for matches - has a
    `translate` function which will convert a glob pattern to a regular
    expression (string). So you can do something along the lines of the
    following:

    ---------------------------------------------

    import os
    from fnmatch import translate
    import re

    d = '/tmp'
    patt1 = '*.log'
    patt2 = '*.ini'
    patterns = [patt1, patt2]

    rx = '|'.join(translate(p) for p in patterns)
    patt = re.compile(rx)

    for f in os.listdir(d):
        if patt.match(f):
            print f

    ---------------------------------------------
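
    A variation on the same idea that skips the regular expression, sketched
    for Python 3: test each name against every pattern with fnmatch.fnmatch.
    The helper name `list_matching` and its parameters are hypothetical.
    fnmatch.fnmatch normalizes case the way the operating system does; use
    fnmatch.fnmatchcase if the comparison should always be case-sensitive.

```python
import os
from fnmatch import fnmatch

def list_matching(d, patterns):
    # Keep a name if it matches at least one of the glob patterns.
    return sorted(
        f for f in os.listdir(d)
        if any(fnmatch(f, p) for p in patterns)
    )
```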

    hth

    Gerard
     
    Gerard Flanagan, Mar 11, 2008
    #10
  11. royG

    jay graves Guest

    Re: parsing directory for certain filetypes

    On Mar 11, 12:21 am, royG wrote:
    > On Mar 10, 8:03 pm, Tim Chase wrote:
    > in the version using glob()
    >
    > >path = os.path.normpath(os.path.join(folder, '*.txt'))
    > >lst = glob.glob(path)

    >
    > is it possible to check for more than one file extension? here i will
    > have to create two path variables like
    > path1 = os.path.normpath(os.path.join(folder, '*.txt'))
    > path2 = os.path.normpath(os.path.join(folder, '*.doc'))
    >
    > and then use glob separately..
    > or is there another way?


    use a loop. (untested)

    def parsefolder(folder):
        lst = []
        for pattern in ('*.txt','*.doc'):
            path = os.path.normpath(os.path.join(folder, pattern))
            lst.extend(glob.glob(path))
        lst.sort()
        return lst
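
    One thing to watch with the loop: if two patterns can match the same file
    (say '*.txt' and 'a*'), that file shows up twice in the result. A sketch
    that collects into a set first, written for Python 3; the `patterns`
    parameter is my addition, not part of the code above.

```python
import glob
import os.path

def parsefolder(folder, patterns=('*.txt', '*.doc')):
    found = set()
    for pattern in patterns:
        path = os.path.normpath(os.path.join(folder, pattern))
        # update() ignores paths that an earlier pattern already matched.
        found.update(glob.glob(path))
    return sorted(found)
```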
     
    jay graves, Mar 11, 2008
    #11
  12. royG

    Tim Chase Guest

    Re: parsing directory for certain filetypes

    royG wrote:
    > On Mar 10, 8:03 pm, Tim Chase wrote:
    >
    >> In Python2.5 (or 2.4 if you implement the any() function, ripped
    >> from the docs[1]), this could be rewritten to be a little more
    >> flexible...something like this (untested):
    >>

    >
    > that was quite a good lesson for a beginner like me..
    > thanks guys
    >
    > in the version using glob()
    >> path = os.path.normpath(os.path.join(folder, '*.txt'))
    >> lst = glob.glob(path)

    >
    > is it possible to check for more than one file extension? here i will
    > have to create two path variables like
    > path1 = os.path.normpath(os.path.join(folder, '*.txt'))
    > path2 = os.path.normpath(os.path.join(folder, '*.doc'))
    >
    > and then use glob separately..


    Though it doesn't use glob, the 2nd solution I gave (the one that
    uses the any() function you quoted) should be able to handle an
    arbitrary number of extensions...

    -tkc
     
    Tim Chase, Mar 11, 2008
    #12
