Dumb glob question

Python Dunce

I've run into an issue with glob and matching filenames with brackets '[]'
in them. The problem comes when I'm using part of such a filename as the
path I'm passing to glob. Here's a trimmed down dumb example. Let's say I
have a directory with the following files in it.

foo.par2
foo.vol0+1.par2
foo.vol1+1.par2
zzz [foo].par2
zzz [foo].vol0+1.par2
zzz [foo].vol1+1.par2

While processing one of the files I want to do certain things in batch so
I've been using glob as a means to get all of the files in a set. The
following code will print the filenames for parity volumes in each set
while working with the base checksum, unless there are brackets in the
name.


import glob
import os
import re

re2 = re.compile(r'vol', re.IGNORECASE)

for nuke in glob.glob('*.par2'):
    # Base checksum files are the ones without 'vol' in the name.
    if not re2.search(nuke):
        list = glob.glob(nuke[:-5] + '*vol*')
        for name in list:
            print(os.path.join(os.getcwd(), name))



I'm sure there is something obvious I'm missing. I figured I could use
something like re.escape on the trimmed filename for matching but that
hasn't worked either. Using win32api.FindFiles instead of glob works but
I'd obviously rather do it the _right_ way and have it work properly in
*nix too.
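
For anyone who wants to reproduce this, here is a minimal sketch that creates the six files listed above in a scratch directory and globs each set the same way. The helper names `volumes_for` and `demo`, and the use of a temporary directory, are assumptions made for illustration; they are not part of the original script.

```python
import glob
import os
import tempfile

FILES = [
    "foo.par2", "foo.vol0+1.par2", "foo.vol1+1.par2",
    "zzz [foo].par2", "zzz [foo].vol0+1.par2", "zzz [foo].vol1+1.par2",
]

def volumes_for(directory, base_file):
    """Glob for the parity volumes of base_file by trimming '.par2'."""
    pattern = os.path.join(directory, base_file[:-5] + "*vol*")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))

def demo():
    """Create the sample files in a scratch directory and glob both sets."""
    with tempfile.TemporaryDirectory() as scratch:
        for name in FILES:
            open(os.path.join(scratch, name), "w").close()
        return (volumes_for(scratch, "foo.par2"),
                volumes_for(scratch, "zzz [foo].par2"))

plain, bracketed = demo()
print(plain)      # the two foo.vol*.par2 files
print(bracketed)  # empty list: '[foo]' is read as a character class
```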
 
wittempj

Code like the below will print all files ending in 'par2', except those
that do not contain 'vol' at or after index 5. Is that what you need?

import glob
for nuke in glob.glob(r"""c:\temp\*.par2"""):
    try:
        nuke.index('vol', 5)
        print(nuke)
    except ValueError as e:
        print(e)
 
Python Dunce

Not quite. I'm sorry my example wasn't very clear. While working with any
single file I need to be able to build a list of all the other files in a
particular set. Basically I just need globbing of the base filename.

glob.glob(basename+'.*some_extension')

So if I was working with 'foo.par2' at the moment...

glob.glob(filename[:-5]+'.*par2')

would catch all of the files belonging to the set including 'foo.par2'
'foo.vol0+1.par2' 'foo.vol1+1.par2' etc.

This works great (as expected) until you are working with a filename with
brackets '[]' in it. Then glob just returns an empty list. So if I happen
to be processing 'foo [bar].par2'

glob.glob(filename[:-5]+'.*par2')

doesn't return anything. Using win32api.FindFiles(filename[:-5]+'.*par2')
works perfectly, but I don't want to rely on win32api functions. I hope
that made more sense :).
 
Michael Hoffman

Python Dunce said:
So if I happen
to be processing 'foo [bar].par2'

glob.glob(filename[:-5]+'.*par2')

doesn't return anything. Using win32api.FindFiles(filename[:-5]+'.*par2')
works perfectly, but I don't want to rely on win32api functions. I hope
that made more sense :).

If you look in the source for glob.py, you will find that it calls the
fnmatch module, and this is the docstring for fnmatch.translate():

"""Translate a shell PATTERN to a regular expression.

There is no way to quote meta-characters.
"""

So you cannot do what you want with glob.
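
To see why the bracketed name fails, it may help to run the pattern through fnmatch directly. This is a small sketch using a file name from the first post plus one made-up name ('zzz f.vol0+1.par2'); the variable names are only illustrative.

```python
import fnmatch

pattern = "zzz [foo]" + "*vol*"

# '[foo]' is parsed as "one character that is 'f' or 'o'", so the literal
# brackets in the real file name never match.
literal_hit = fnmatch.fnmatch("zzz [foo].vol0+1.par2", pattern)

# A (made-up) name with a single 'f' in that position does match.
class_hit = fnmatch.fnmatch("zzz f.vol0+1.par2", pattern)

print(literal_hit, class_hit)  # False True
```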

You can replace [] with ? in your glob string, if you are sure that
there won't be other characters there. That's a bit of a hack, and I
wouldn't do it.
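
A less lossy variant of that substitution: although fnmatch has no quoting character, a '[' can still be matched literally by wrapping it in its own character class, '[[]'. Python 3.4 and later provide glob.escape(), which does this for '*', '?' and '['. The sketch below assumes a Python new enough to have it; `set_pattern` is a hypothetical helper name, and fnmatch.filter stands in for a real directory listing.

```python
import fnmatch
import glob

def set_pattern(base_file, suffix="*vol*"):
    """Build a glob pattern in which the trimmed base name is matched literally."""
    return glob.escape(base_file[:-5]) + suffix

names = [
    "foo.vol0+1.par2",
    "zzz [foo].par2",
    "zzz [foo].vol0+1.par2",
    "zzz [foo].vol1+1.par2",
]

pattern = set_pattern("zzz [foo].par2")
matches = fnmatch.filter(names, pattern)
print(pattern)  # zzz [[]foo]*vol*
print(matches)  # the two 'zzz [foo].vol*' names
```

Passing the same pattern to glob.glob() should apply the same matching to the files actually on disk.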

In my mind it would probably be best to do:

import os
import re
re_vol = re.compile(re.escape(startpart) + ".*vol.*")
lst = [filename for filename in os.listdir(".") if re_vol.match(filename)]

I changed "list" to "lst" because the former shadows a built-in.
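
Fleshed out a little, the regex approach might look like the sketch below. `match_volumes` and its parameters are hypothetical names, and re.IGNORECASE is an assumption carried over from the pattern in the first post.

```python
import os
import re

def match_volumes(base_file, names):
    """Return the names that belong to base_file's set and contain 'vol'."""
    startpart = base_file[:-5]  # strip the trailing '.par2'
    # re.escape makes brackets (and '+', '.', etc.) match literally.
    re_vol = re.compile(re.escape(startpart) + r".*vol.*", re.IGNORECASE)
    return [name for name in names if re_vol.match(name)]

sample = [
    "foo.par2", "foo.vol0+1.par2", "foo.vol1+1.par2",
    "zzz [foo].par2", "zzz [foo].vol0+1.par2", "zzz [foo].vol1+1.par2",
]
bracket_set = match_volumes("zzz [foo].par2", sample)
plain_set = match_volumes("foo.par2", sample)
print(bracket_set)
print(plain_set)

# Against a real directory, pass the listing instead of the sample list:
lst = match_volumes("zzz [foo].par2", os.listdir("."))
```

Because re.match anchors at the start of the string, 'foo.par2' does not pick up the 'zzz [foo]' files even though they contain 'foo'.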
 
Python Dunce

Thanks, that should do the trick! I had tried basically the same thing
once but I was getting back empty lists. I think it was just a brain fart
involving a case-sensitive regex that didn't match the files I was testing
it on :/.
 
