parsing directory for certain filetypes

royG

Hi,
I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can this be rewritten in a more efficient manner?

Here it is:

from string import split
from os.path import isdir,join,normpath
from os import listdir

def parsefolder(dirname):
    filenms=[]
    folder=dirname
    isadr=isdir(folder)
    if (isadr):
        dirlist=listdir(folder)
        filenm=""
        for x in dirlist:
            filenm=x
            if(filenm.endswith(("txt","doc"))):
                nmparts=[]
                nmparts=split(filenm,'.' )
                if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
                    filenms.append(filenm)
        filenms.sort()
        filenameslist=[]
        filenameslist=[normpath(join(folder,y)) for y in filenms]
        numifiles=len(filenameslist)
        print filenameslist
        return filenameslist


folder='F:/mysys/code/tstfolder'
parsefolder(folder)


thanks,
RG
 
sam

royG wrote:
I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can this be rewritten in a more efficient manner?

This could probably be rewritten to be much more compact. Maybe you could
grab the output of:

find $dirname -type f -a \( -name '*.txt' -o -name '*.doc' \)

and split it on "\n"?
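Here is a minimal Python 3 sketch of that idea. It assumes a Unix-like system with `find` on the PATH, and `find_files` is a hypothetical name. It differs from the original function in two ways. First, `find` recurses into subdirectories. GNU and BSD `find` accept `-maxdepth 1` to stop that, but the option is not POSIX. Second, splitting on newlines breaks for file names that themselves contain a newline.

```python
import subprocess

def find_files(dirname):
    """Return a sorted list of .txt/.doc files under dirname, found via `find`."""
    # The arguments are passed as a list, so no shell is involved and the
    # parentheses need no backslash escaping.
    cmd = ["find", dirname, "-type", "f",
           "(", "-name", "*.txt", "-o", "-name", "*.doc", ")"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # splitlines() avoids the trailing empty string that split("\n") leaves.
    return sorted(result.stdout.splitlines())
```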
 
jay graves

I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can this be rewritten in a more efficient manner?

Try the 'glob' module.

....
Jay
 
Robert Bossy

royG said:
Hi,
I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can this be rewritten in a more efficient manner?

Here it is:

from string import split
from os.path import isdir,join,normpath
from os import listdir

def parsefolder(dirname):
    filenms=[]
    folder=dirname
    isadr=isdir(folder)
    if (isadr):
        dirlist=listdir(folder)
        filenm=""
This last line is unnecessary. Python's rules for variables are a bit
different from what you may be used to: you're not required to
declare or initialize a variable, you're only required to assign it a
value before it is referenced.

        for x in dirlist:
            filenm=x
            if(filenm.endswith(("txt","doc"))):
                nmparts=[]
                nmparts=split(filenm,'.' )
                if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
I don't get it. You've already checked that filenm ends with "txt" or
"doc"... What is the purpose of these three lines?
By the way, again, nmparts=[] is unnecessary.
                    filenms.append(filenm)
        filenms.sort()
        filenameslist=[]
Unnecessary initialization.
        filenameslist=[normpath(join(folder,y)) for y in filenms]
        numifiles=len(filenameslist)
numifiles is not used, so I guess this line is superfluous.
        print filenameslist
        return filenameslist
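The split-based check is not only redundant but also fragile. For "report.v2.txt", split(filenm, '.')[1] is 'v2', so the file is wrongly skipped. A name without a dot, such as "notatxt", passes the endswith() test but splits into a single element, so nmparts[1] raises IndexError. Below is a minimal Python 3 sketch of a more robust test based on os.path.splitext. `has_wanted_ext` is a hypothetical name. The lowercasing, which makes the match case-insensitive, is an added assumption.

```python
import os.path

def has_wanted_ext(filename, exts=(".txt", ".doc")):
    """True if the last extension of filename (dot included) is in exts."""
    # splitext splits off only the final extension:
    #   "report.v2.txt" -> ("report.v2", ".txt")
    #   "notatxt"       -> ("notatxt", "")
    return os.path.splitext(filename)[1].lower() in exts

names = ["a.txt", "report.v2.txt", "notatxt", "b.DOC", "c.py"]
print([n for n in names if has_wanted_ext(n)])  # prints ['a.txt', 'report.v2.txt', 'b.DOC']
```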

Personally, I'd use glob.glob:


import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = [ fn for fn in glob.glob(path) ]
    lst.sort()
    return lst


I leave you the exercise of adding .doc files. But I must say (to whoever's
listening) that I was a bit disappointed that glob('*.{txt,doc}') didn't
work.

Cheers,
RB
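Python's glob module indeed does no brace expansion. Below is a Python 3 sketch of a helper that expands simple, non-nested {a,b} groups before calling glob. `expand_braces` and `glob_braces` are hypothetical names, not library functions. Because the whole pattern is expanded, any braces in the folder name itself would be expanded too.

```python
import glob
import os.path
import re

def expand_braces(pattern):
    """Expand simple {a,b,...} groups in pattern into a list of patterns.

    Hypothetical helper: handles several non-nested groups, but is not
    designed for nested groups or escaped braces.
    """
    m = re.search(r"\{([^{}]*)\}", pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for alternative in m.group(1).split(","):
        # Recurse so that any later groups in the pattern get expanded too.
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded

def glob_braces(pattern):
    """glob.glob() plus simple brace expansion; returns a sorted, de-duplicated list."""
    matches = set()
    for p in expand_braces(pattern):
        matches.update(glob.glob(p))
    return sorted(matches)
```

Usage would then be glob_braces(os.path.join(folder, '*.{txt,doc}')).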
 
sam

Robert Bossy wrote:
I leave you the exercise of adding .doc files. But I must say (to whoever's
listening) that I was a bit disappointed that glob('*.{txt,doc}') didn't
work.

Brace expansion ("{" and "}") is a shell extension (bash took it from csh), not
part of the POSIX standard, unfortunately.
 
jay graves

Personally, I'd use glob.glob:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = [ fn for fn in glob.glob(path) ]
    lst.sort()
    return lst

Why the 'no-op' list comprehension? Typo?

....
Jay
 
Tim Chase

I wrote a function to parse a given directory and make a sorted list
of files with .txt and .doc extensions. It works, but I want to know if it
is too bloated. Can this be rewritten in a more efficient manner?

Here it is:

from string import split
from os.path import isdir,join,normpath
from os import listdir

def parsefolder(dirname):
    filenms=[]
    folder=dirname
    isadr=isdir(folder)
    if (isadr):
        dirlist=listdir(folder)
        filenm=""
        for x in dirlist:
            filenm=x
            if(filenm.endswith(("txt","doc"))):
                nmparts=[]
                nmparts=split(filenm,'.' )
                if((nmparts[1]=='txt') or (nmparts[1]=='doc')):
                    filenms.append(filenm)
        filenms.sort()
        filenameslist=[]
        filenameslist=[normpath(join(folder,y)) for y in filenms]
        numifiles=len(filenameslist)
        print filenameslist
        return filenameslist


folder='F:/mysys/code/tstfolder'
parsefolder(folder)

It seems to me that this is awfully baroque, with many superfluous
variables. Isn't this the same functionality (minus the prints, the unused
result-counting, the no-ops, and the belt-and-suspenders
extension-checking) as

def parsefolder(dirname):
    if not isdir(dirname): return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if fname.lower().endswith('.txt')
            or fname.lower().endswith('.doc')
        ])

In Python 2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible, something like this (untested):

def parsefolder(dirname, types=['.doc', '.txt']):
    if not isdir(dirname): return
    return sorted([
        normpath(join(dirname, fname))
        for fname in listdir(dirname)
        if any(
            fname.lower().endswith(s)
            for s in types)
        ])

which would allow you to do both

parsefolder('/path/to/wherever/')

and

parsefolder('/path/to/wherever/', ['.xls', '.ppt', '.htm'])

In both cases, what should happen when isdir(dirname) fails is left
undefined: the function just returns None. Caveat implementor.

-tkc


[1] http://docs.python.org/lib/built-in-funcs.html
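This Python 3 sketch shows two possible refinements to the versions above. First, str.endswith() accepts a tuple of suffixes (since Python 2.5, and the original post already used that form), so any() is not strictly needed. Second, raising an exception is one way to define the isdir() failure case instead of returning None. Raising the built-in NotADirectoryError (Python 3.3+) is a design assumption here, not something proposed in the thread.

```python
import os
from os.path import isdir, join, normpath

def parsefolder(dirname, types=(".doc", ".txt")):
    """Return sorted, normalized paths of entries in dirname ending in one of types."""
    if not isdir(dirname):
        # Assumption: failing loudly is preferable to silently returning None.
        raise NotADirectoryError(dirname)
    # endswith() needs a tuple (not a list); lowercase for case-insensitive matching.
    suffixes = tuple(t.lower() for t in types)
    return sorted(
        normpath(join(dirname, fname))
        for fname in os.listdir(dirname)
        if fname.lower().endswith(suffixes)
    )
```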
 
Robert Bossy

jay said:
Personally, I'd use glob.glob:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = [ fn for fn in glob.glob(path) ]
    lst.sort()
    return lst

Why the 'no-op' list comprehension? Typo?
My mistake, it is:

import os.path
import glob

def parsefolder(folder):
    path = os.path.normpath(os.path.join(folder, '*.py'))
    lst = glob.glob(path)
    lst.sort()
    return lst
 
royG

In Python 2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible, something like this (untested):

That was quite a good lesson for a beginner like me.
Thanks, guys.

In the version using glob():
path = os.path.normpath(os.path.join(folder, '*.txt'))
lst = glob.glob(path)

is it possible to check for more than one file extension? Here I would
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then use glob separately.
Or is there another way?

RG
 
Gerard Flanagan

In Python 2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible, something like this (untested):

That was quite a good lesson for a beginner like me.
Thanks, guys.

In the version using glob():
path = os.path.normpath(os.path.join(folder, '*.txt'))
lst = glob.glob(path)

is it possible to check for more than one file extension? Here I would
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then use glob separately.
Or is there another way?

I don't think you can match multiple patterns directly with glob, but
`fnmatch` (the module glob uses to check for matches) has a
`translate` function that converts a glob pattern to a regular
expression string. So you can do something along the lines of the
following:

---------------------------------------------

import os
from fnmatch import translate
import re

d = '/tmp'
patt1 = '*.log'
patt2 = '*.ini'
patterns = [patt1, patt2]

rx = '|'.join(translate(p) for p in patterns)
patt = re.compile(rx)

for f in os.listdir(d):
    if patt.match(f):
        print f
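Here is the same idea wrapped in a function, as a Python 3 sketch. `match_any` is a hypothetical name. One caveat: the compiled regex is case-sensitive on every platform, whereas glob on Windows normally matches case-insensitively. Passing re.IGNORECASE to re.compile() is one way to get that behavior.

```python
import os
import re
from fnmatch import translate

def match_any(dirname, patterns):
    """Return the sorted names in dirname that match any of the glob patterns."""
    # translate() turns one glob pattern into a regular-expression string;
    # joining several with "|" gives a regex that matches if any one does.
    rx = re.compile("|".join(translate(p) for p in patterns))
    return sorted(name for name in os.listdir(dirname) if rx.match(name))
```

For example, match_any('/tmp', ['*.log', '*.ini']) mirrors the loop above but returns a sorted list instead of printing.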
 
jay graves

On Mar 10, 8:03 pm, Tim Chase wrote:
In the version using glob():


is it possible to check for more than one file extension? Here I would
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then use glob separately.
Or is there another way?

Use a loop (untested):

import os.path
import glob

def parsefolder(folder):
    lst = []
    for pattern in ('*.txt','*.doc'):
        path = os.path.normpath(os.path.join(folder, pattern))
        lst.extend(glob.glob(path))
    lst.sort()
    return lst
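For comparison, here is the same loop written against pathlib, as a Python 3 sketch. pathlib is a swapped-in module that only arrived in Python 3.4, so it was not an option in this thread. Collecting results into a set is an added choice, so that overlapping patterns don't yield duplicate entries.

```python
from pathlib import Path

def parsefolder(folder, patterns=("*.txt", "*.doc")):
    """Return sorted path strings for entries in folder matching any pattern."""
    base = Path(folder)
    found = set()  # a set avoids duplicates if two patterns match the same file
    for pattern in patterns:
        found.update(str(p) for p in base.glob(pattern))
    return sorted(found)
```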
 
Tim Chase

royG said:
In Python 2.5 (or 2.4 if you implement the any() function, ripped
from the docs[1]), this could be rewritten to be a little more
flexible, something like this (untested):

That was quite a good lesson for a beginner like me.
Thanks, guys.

In the version using glob():
path = os.path.normpath(os.path.join(folder, '*.txt'))
lst = glob.glob(path)

is it possible to check for more than one file extension? Here I would
have to create two path variables like
path1 = os.path.normpath(os.path.join(folder, '*.txt'))
path2 = os.path.normpath(os.path.join(folder, '*.doc'))

and then use glob separately.

Though it doesn't use glob, the 2nd solution I gave (the one that
uses the any() function you quoted) should be able to handle an
arbitrary number of extensions...

-tkc
 
