parsing directory for certain filetypes

royG · Mar 10, 2008

Hi,
I wrote a function to parse a given directory and make a sorted list
of files with .txt or .doc extensions. It works, but I want to know if it
is too bloated. Can this be rewritten in a more efficient manner?

here it is...

from string import split
from os.path import isdir,join,normpath
from os import listdir

def parsefolder(dirname):
    filenms=[]
    folder=dirname
    isadr=isdir(folder)
    if (isadr):
        dirlist=listdir(folder)
        filenm=""
        for x in dirlist:
            filenm=x
            if(filenm.endswith(("txt","doc"))):
                nmparts=[]
                nmparts=split(filenm,'.' )
                if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
                    filenms.append(filenm)
        filenms.sort()
        filenameslist=[]
        filenameslist=[normpath(join(folder,y)) for y in filenms]
        numifiles=len(filenameslist)
        print filenameslist
        return filenameslist

folder='F:/mysys/code/tstfolder'
parsefolder(folder)

thanks,
RG

sam · Mar 10, 2008

royG wrote:

i wrote a function to parse a given directory and make a sorted list
of files with .txt,.doc extensions .it works,but i want to know if it
is too bloated..can this be rewritten in more efficient manner?

This should probably be rewritten, and the result can be very compact. Maybe
you could capture the output of:

find $dirname -type f -a \( -name '*.txt' -o -name '*.doc' \)

and split it by "\n"?
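A minimal sketch of that idea from Python, assuming a Unix-like system where the `find` command is on the PATH. The function name `find_files` and its `patterns` parameter are made up for illustration. Note that `find` descends into subdirectories, which the original listdir-based function does not, and that splitting on newlines breaks for file names that themselves contain a newline.

```python
import subprocess

def find_files(dirname, patterns=("*.txt", "*.doc")):
    """Run find(1) and return the sorted list of matching regular files."""
    # Build: find DIR -type f ( -name P1 -o -name P2 ... )
    # The command is passed as a list (no shell), so the parentheses
    # need no backslash escaping here.
    name_tests = []
    for pattern in patterns:
        if name_tests:
            name_tests.append("-o")
        name_tests.extend(["-name", pattern])
    cmd = ["find", dirname, "-type", "f", "("] + name_tests + [")"]
    output = subprocess.check_output(cmd).decode()
    # One path per line; drop the empty string after the final newline.
    return sorted(line for line in output.split("\n") if line)
```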

jay graves · Mar 10, 2008

i wrote a function to parse a given directory and make a sorted list
of files with .txt,.doc extensions .it works,but i want to know if it
is too bloated..can this be rewritten in more efficient manner?

Try the 'glob' module.

....
Jay

Robert Bossy · Mar 10, 2008

royG said:
hi
i wrote a function to parse a given directory and make a sorted list
of files with .txt,.doc extensions .it works,but i want to know if it
is too bloated..can this be rewritten in more efficient manner?

here it is...

from string import split
from os.path import isdir,join,normpath
from os import listdir

def parsefolder(dirname):
    filenms=[]
    folder=dirname
    isadr=isdir(folder)
    if (isadr):
        dirlist=listdir(folder)
        filenm=""

This last line is unnecessary: variable scope rules in Python are a bit
different from what we're used to. You're not required to
declare/initialize a variable; you're only required to assign a value
before it is referenced.
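A small illustration of that rule, as a sketch (the function names `first_txt` and `read_unassigned` are invented for the example): the for statement binds its loop variable by itself, while reading a name that was never assigned fails only at the moment it is referenced.

```python
def first_txt(names):
    # No `name = ""` is needed before the loop: the for statement
    # assigns `name` on every iteration.
    for name in names:
        if name.endswith(".txt"):
            return name
    return None

def read_unassigned():
    # Referencing a name that has never been assigned raises NameError
    # at run time, when the reference is evaluated.
    try:
        return never_assigned_anywhere
    except NameError:
        return "NameError"
```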

        for x in dirlist:
            filenm=x
            if(filenm.endswith(("txt","doc"))):
                nmparts=[]
                nmparts=split(filenm,'.' )
                if((nmparts[1]=='txt') or (nmparts[1]=='doc')):

I don't get it. You've already checked that filenm ends with "txt" or
"doc"... What is the purpose of these three lines?
Btw, again, nmparts=[] is unnecessary.

                    filenms.append(filenm)
        filenms.sort()
        filenameslist=[]

Unnecessary initialization.

        filenameslist=[normpath(join(folder,y)) for y in filenms]
        numifiles=len(filenameslist)

numifiles is never used, so I guess this line is superfluous.

        print filenameslist
        return filenameslist

Personally, I'd use glob.glob:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = [ fn for fn in glob.glob(path) ]
    lst.sort()
    return lst

I leave you the exercise of adding .doc files. But I must say (whoever's
listening) that I was a bit disappointed that glob('*.{txt,doc}') didn't
work.

Cheers,
RB

sam · Mar 10, 2008

Robert Bossy wrote:

I leave you the exercice to add .doc files. But I must say (whoever's
listening) that I was a bit disappointed that glob('*.{txt,doc}') didn't
work.

Unfortunately, "{" and "}" are a shell extension (brace expansion, as in
bash) and not part of the POSIX glob standard.
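One possible workaround, sketched here under the assumption that only simple, non-nested brace groups are needed: expand the braces by hand into plain patterns and glob each one. Both helper names, `expand_braces` and `brace_glob`, are invented for this example and are not part of the glob module.

```python
import glob
import re

def expand_braces(pattern):
    """Expand non-nested {a,b,...} groups into a list of plain patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        # Recurse so that later groups in the tail are expanded too.
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded

def brace_glob(pattern):
    """Glob every expanded pattern; return sorted, de-duplicated matches."""
    matches = set()
    for plain in expand_braces(pattern):
        matches.update(glob.glob(plain))
    return sorted(matches)
```

With this sketch, expand_braces('*.{txt,doc}') gives ['*.txt', '*.doc'], and brace_glob() simply unions the results of globbing each of those.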

jay graves · Mar 10, 2008

Personally, I'd use glob.glob:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = [ fn for fn in glob.glob(path) ]
    lst.sort()
    return lst

Why the 'no-op' list comprehension? Typo?

....
Jay

Tim Chase · Mar 10, 2008

royG said:
i wrote a function to parse a given directory and make a sorted list
of files with .txt,.doc extensions .it works,but i want to know if it
is too bloated..can this be rewritten in more efficient manner?

here it is...

from string import split
from os.path import isdir,join,normpath
from os import listdir

def parsefolder(dirname):
    filenms=[]
    folder=dirname
    isadr=isdir(folder)
    if (isadr):
        dirlist=listdir(folder)
        filenm=""
        for x in dirlist:
            filenm=x
            if(filenm.endswith(("txt","doc"))):
                nmparts=[]
                nmparts=split(filenm,'.' )
                if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
                    filenms.append(filenm)
        filenms.sort()
        filenameslist=[]
        filenameslist=[normpath(join(folder,y)) for y in filenms]
        numifiles=len(filenameslist)
        print filenameslist
        return filenameslist

folder='F:/mysys/code/tstfolder'
parsefolder(folder)

It seems to me that this is awfully baroque with many unneeded
superfluous variables. Is this not the same functionality (minus
prints, unused result-counting, NOPs, and belt-and-suspenders
extension-checking) as

def parsefolder(dirname):
    if not isdir(dirname): return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if fname.lower().endswith('.txt')
        or fname.lower().endswith('.doc')
        ])

In Python2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible...something like this (untested):

def parsefolder(dirname, types=['.doc', '.txt']):
    if not isdir(dirname): return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if any(
            fname.lower().endswith(s)
            for s in types)
        ])

which would allow you to do both

parsefolder('/path/to/wherever/')

and

parsefolder('/path/to/wherever/', ['.xls', '.ppt', '.htm'])

In both cases, you don't define the case where isdir(dirname)
fails. Caveat Implementor.

-tkc

[1] http://docs.python.org/lib/built-in-funcs.html
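For readers on Python 2.4, the fallback mentioned above can be sketched as a plain loop. The name `any_compat` is chosen here only to avoid shadowing the built-in on newer versions, and the two `has_extension_*` helpers are hypothetical names for illustration. The second helper relies on str.endswith() accepting a tuple of suffixes (available since Python 2.5), which makes any() unnecessary for this particular test.

```python
def any_compat(iterable):
    """Pure-Python equivalent of the built-in any()."""
    for element in iterable:
        if element:
            return True
    return False

def has_extension_any(fname, types=(".doc", ".txt")):
    """Case-insensitive extension test using the any()-style loop."""
    lowered = fname.lower()
    return any_compat(lowered.endswith(s) for s in types)

def has_extension_tuple(fname, types=(".doc", ".txt")):
    """Same test, passing all suffixes to endswith() as one tuple."""
    return fname.lower().endswith(tuple(types))
```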

Robert Bossy · Mar 10, 2008

jay said:
Personally, I'd use glob.glob:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = [ fn for fn in glob.glob(path) ]
    lst.sort()
    return lst

Why the 'no-op' list comprehension? Typo?

My mistake, it is:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = glob.glob(path)
    lst.sort()
    return lst

royG · Mar 11, 2008

In Python2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible...something like this (untested):

that was quite a good lesson for a beginner like me..
thanks guys

in the version using glob()

path = os.path.normpath(os.path.join(folder, '*.txt'))
lst = glob.glob(path)

Is it possible to check for more than one file extension? Here I would
have to create two path variables, like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then call glob separately on each.
Or is there another way?

RG

Gerard Flanagan · Mar 11, 2008

In Python2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible...something like this (untested):

that was quite a good lesson for a beginner like me..
thanks guys

in the version using glob()

path = os.path.normpath(os.path.join(folder, '*.txt'))
lst = glob.glob(path)

is it possible to check for more than one file extension? here i will
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then use glob separately..
or is there another way?

I don't think you can match multiple patterns directly with glob, but
`fnmatch` (the module glob uses to check for matches) has a
`translate` function which converts a glob pattern to a regular
expression string. So you can do something along the lines of the
following:

---------------------------------------------

import os
from fnmatch import translate
import re

d = '/tmp'
patt1 = '*.log'
patt2 = '*.ini'
patterns = [patt1, patt2]

rx = '|'.join(translate(p) for p in patterns)
patt = re.compile(rx)

for f in os.listdir(d):
    if patt.match(f):
        print f
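The same idea wrapped in a function, as a sketch written for Python 3. The name `list_matching` and its parameters are made up for illustration. Unlike glob, the compiled regular expression is always case-sensitive, and the function returns bare file names rather than full paths.

```python
import os
import re
from fnmatch import translate

def list_matching(dirname, patterns=("*.txt", "*.doc")):
    """Return the sorted names in dirname that match any glob pattern."""
    # translate() turns each glob pattern into a regular-expression
    # string; joining them with "|" accepts a name matching any of them.
    regex = re.compile("|".join(translate(p) for p in patterns))
    return sorted(name for name in os.listdir(dirname) if regex.match(name))
```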

jay graves · Mar 11, 2008

On Mar 10, 8:03 pm, Tim Chase wrote:
in the version using glob()

is it possible to check for more than one file extension? here i will
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then use glob separately..
or is there another way?

Use a loop (untested):

import glob
import os.path

def parsefolder(folder):
    lst = []
    for pattern in ('*.txt', '*.doc'):
        path = os.path.normpath(os.path.join(folder, pattern))
        lst.extend(glob.glob(path))
    lst.sort()
    return lst

Tim Chase · Mar 11, 2008

royG said:
In Python2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible...something like this (untested):

that was quite a good lesson for a beginner like me..
thanks guys

in the version using glob()

path = os.path.normpath(os.path.join(folder, '*.txt'))
lst = glob.glob(path)

is it possible to check for more than one file extension? here i will
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then use glob separately..

Though it doesn't use glob, the 2nd solution I gave (the one that
uses the any() function you quoted) should be able to handle an
arbitrary number of extensions...

-tkc
