Problem with glob and filenames containing '[' and ']'

David Squire · Sep 27, 2006

Hi folks,

I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:

----

#!/usr/bin/perl

use strict;
use warnings;

use CGI::Deurl;

for my $EncodedFile (
'/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt',
'/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9.txt',
) {
my $OriginalFileBase = deurlstr($EncodedFile);
$OriginalFileBase =~ s/\.[^.]+$//; # trim extension
$OriginalFileBase =~ s/([\[\]{}?*~\ ,'`"])/\\$1/g; # escape characters that are meta in glob
print "\$OriginalFileBase = $OriginalFileBase\n";
my @CandidateOrigFiles = glob ("$OriginalFileBase*");
print "\@CandidateOrigFiles:\n", join "\n", @CandidateOrigFiles;
print "\n###########################################################\n";
}

----

Output:

Sep 27 - 9:31pm % ./test.pl
<ENTER THE CGI QUERY. End with CTRL+D>
$OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9\[1\]
@CandidateOrigFiles:

###########################################################
$OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9
@CandidateOrigFiles:
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt.webbed
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9[1].doc
###########################################################

----

As you can see, the first iteration of the for loop produces no matches.
I have included the second, shortened filename, example to demonstrate
that the file I want really does exist. Likewise, at the bash prompt I
can do:

Sep 27 - 9:31pm % ls /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9\[1\]*
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9[1].doc

I am at a loss...

DS

David Squire · Sep 27, 2006

David said:
Hi folks,

I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:

Hi again,

I have reduced this further, getting rid of de-url and a bunch of other
stuff related to my original context. Please see the reduced script and
output below. It seems that having an escaped space as well as an escaped
'[' and ']' causes the failure to match. See the third-to-last test case.

I hesitate to say it, but this begins to feel like a bug... (covers head).

----

#!/usr/bin/perl

use strict;
use warnings;

print "Directory contents:\n", `ls -1 f*`, "\n";
for my $GlobPattern (
'fred*',
'fred[1]*',
'fred\[1\]*',
'fred\[1]*',
'fred[1\]*',
'fre\ d*',
'fre\ d\[*',
'fre\ d\[1*',
'fre\ d\[1\]*',
'fre?d\[1\]*',
'fre\ d?1\]*',
) {
my @CandidateOrigFiles = glob ($GlobPattern);
print "\n######################################\n";
print "$GlobPattern: \@CandidateOrigFiles:\n", join "\n",
@CandidateOrigFiles;
}

----

Output:

Directory contents:
fred]
fred[1]
fre d[1].doc
fred[[1].doc
fred[1].doc

######################################
fred*: @CandidateOrigFiles:
fred[1]
fred[1].doc
fred[[1].doc
fred]
######################################
fred[1]*: @CandidateOrigFiles:

######################################
fred\[1\]*: @CandidateOrigFiles:
fred[1]
fred[1].doc
######################################
fred\[1]*: @CandidateOrigFiles:
fred[1]
fred[1].doc
######################################
fred[1\]*: @CandidateOrigFiles:
fred[1]
fred[1].doc
######################################
fre\ d*: @CandidateOrigFiles:
fre d[1].doc
######################################
fre\ d\[*: @CandidateOrigFiles:
fre d[1].doc
######################################
fre\ d\[1*: @CandidateOrigFiles:
fre d[1].doc
######################################
fre\ d\[1\]*: @CandidateOrigFiles:

######################################
fre?d\[1\]*: @CandidateOrigFiles:
fre d[1].doc
######################################
fre\ d?1\]*: @CandidateOrigFiles:
fre d[1].doc

anno4000 · Sep 27, 2006

David Squire said:
Hi folks,

I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:

I don't know what goes wrong for you. It works for me as expected
(after replacing /damocles/documents/ENH1260/2006/2/Short assignment/
with something that exists on my box).

----

#!/usr/bin/perl

use strict;
use warnings;

use CGI::Deurl;

for my $EncodedFile (
'/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt',
'/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9.txt',
) {
my $OriginalFileBase = deurlstr($EncodedFile);
$OriginalFileBase =~ s/\.[^.]+$//; # trim extension
$OriginalFileBase =~ s/([\[\]{}?*~\ ,'`"])/\\$1/g; # escape characters that are meta in glob

You can use quotemeta() instead of your s///. That quotes a little more
(most visibly "/"), but that doesn't hurt.

Anno
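
To illustrate the quotemeta() suggestion, here is a minimal sketch that compares it with the hand-written substitution from the script above. The sample name 'fre d[1]' and the helper name escape_by_hand are made up for illustration; only quotemeta() itself is a Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the original script.
sub escape_by_hand {
    my ($name) = @_;
    $name =~ s/([\[\]{}?*~\ ,'`"])/\\$1/g;
    return $name;
}

my $name = 'fre d[1]';    # made-up sample file name

my $by_hand   = escape_by_hand($name);
my $quotemeta = quotemeta $name;

print "by hand:   $by_hand\n";
print "quotemeta: $quotemeta\n";

# quotemeta() backslashes every ASCII character that is not a letter,
# digit or underscore, so '/' and '.' are escaped as well.
my $path_quoted = quotemeta 'dir/file.txt';
print "path:      $path_quoted\n";
```

For this sample name both approaches produce the same escaped string; they differ only for characters such as '/' and '.' that the hand-written character class leaves alone.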

David Squire · Sep 27, 2006

Mumia said:
Hi folks,

I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:

----

#!/usr/bin/perl

use strict;
use warnings;

use CGI::Deurl;

for my $EncodedFile (
'/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt',
'/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9.txt',

This creates two strings containing "Short \n assignment"

I think that's going to confuse glob big-time.

No. That's just an artifact of word-wrapping in your newsreader. See my
second, simpler, example.

DS

David Squire · Sep 27, 2006

anno4000 said:
David Squire said:

Hi folks,

I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:

I don't know what goes wrong for you. It works for me as expected
(after replacing /damocles/documents/ENH1260/2006/2/Short assignment/
with something that exists on my box).

Thanks. Would you be able to try my second, simpler, example too? That
seems to narrow down the oddness.

DS

Paul Lalli · Sep 27, 2006

David said:
I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:

----

#!/usr/bin/perl

use strict;
use warnings;

use CGI::Deurl;

for my $EncodedFile (
'/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt',
'/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9.txt',
) {
my $OriginalFileBase = deurlstr($EncodedFile);
$OriginalFileBase =~ s/\.[^.]+$//; # trim extension
$OriginalFileBase =~ s/([\[\]{}?*~\ ,'`"])/\\$1/g; # escape characters that are meta in glob
print "\$OriginalFileBase = $OriginalFileBase\n";
my @CandidateOrigFiles = glob ("$OriginalFileBase*");
print "\@CandidateOrigFiles:\n", join "\n", @CandidateOrigFiles;
print "\n###########################################################\n";
}

----

Output:

Sep 27 - 9:31pm % ./test.pl
<ENTER THE CGI QUERY. End with CTRL+D>
$OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9\[1\]
@CandidateOrigFiles:

###########################################################
$OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9
@CandidateOrigFiles:
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt.webbed
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9[1].doc
###########################################################

Hmm. Not sure I know what to tell you, as I don't seem able to
reproduce the results....

$ ls filewith\[bracket\]*
filewith[bracket].txt
$ perl -le'print for glob(q{filewith\[bracket\].*})'
filewith[bracket].txt

This is perl, v5.8.4 built for sun4-solaris

Paul Lalli

David Squire · Sep 27, 2006

Michele said:
I'm having trouble using glob to find filenames that contain '[' and

Well I'm a big fan of glob() myself, and I recommend using it
especially when I see people using lower level opendir() & C. in
situations in which it's not strictly necessary, but this may be a
situation in which it may indeed be good to do so.

I've just written a work around to do so

']', even though I am escaping those meta-characters. Here is an example
script and output:

However, I don't seem to have that problem:

C:\TEMP>touch foo[bar]

C:\TEMP>touch foo[baz]

C:\TEMP>perl -le "print for glob 'foo\\[*\\]'"
foo[bar]
foo[baz]

Yeah, as you will see from my second post, the critical thing seems to
be the presence of an escaped space as well. Thanks.

DS

David Squire · Sep 27, 2006

Mumia said:
Clearly, an escaped space does not cause the problem. It has something
to do with both an escaped space and an escaped bracket.

Yes, that's what "too" means in the subject line. It is the
combination that is the problem.

DS

anno4000 · Sep 27, 2006

David Squire said:
Hi folks,

I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:

I don't know what goes wrong for you. It works for me as expected
(after replacing /damocles/documents/ENH1260/2006/2/Short assignment/
with something that exists on my box).

Thanks. Would you be able to try my second, simpler, example too? That
seems to narrow down the oddness.

Well yes, it's the blank in the path name that does it. Here is the
relevant bit from File::Glob, which implements CORE::glob():

Since v5.6.0, Perl's CORE::glob() is implemented in terms of
bsd_glob(). Note that they don't share the same prototype--CORE::glob()
only accepts a single argument. Due to historical
reasons, CORE::glob() will also split its argument on whitespace,
treating it as multiple patterns, whereas bsd_glob() considers them as
one pattern.

So it's not a bug. The solution would be to use File::Glob::bsd_glob()
directly.

Anno
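
A minimal sketch of the bsd_glob() suggestion, assuming a Unix-like system where file names may contain brackets. The file names are taken from the reduced test case earlier in the thread; the scratch directory and variable names are illustrative only.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Work in a scratch directory so the example does not touch real files.
my $start_dir = getcwd();
my $dir       = tempdir(CLEANUP => 1);
chdir $dir or die "Cannot chdir to $dir: $!";

# Recreate three of the files from the reduced test case.
for my $file ('fre d[1].doc', 'fred[1]', 'fred[1].doc') {
    open my $fh, '>', $file or die "Cannot create $file: $!";
    close $fh;
}

# bsd_glob() takes the whole string as a single pattern; the backslashes
# quote the space and the brackets so they are matched literally.
my @matches = bsd_glob('fre\ d\[1\]*');
print "$_\n" for @matches;

# Leave the scratch directory so that it can be removed at exit.
chdir $start_dir or die "Cannot chdir back to $start_dir: $!";
```

Because bsd_glob() does not split on whitespace, the space would not strictly need escaping here; it is kept escaped to mirror the failing pattern from the test case.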

David Squire · Sep 27, 2006

anno4000 said:
David Squire said:

Hi folks,

I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:
I don't know what goes wrong for you. It works for me as expected
(after replacing /damocles/documents/ENH1260/2006/2/Short assignment/
with something that exists on my box).

Thanks. Would you be able to try my second, simpler, example too? That
seems to narrow down the oddness.

Well yes, it's the blank in the path name that does it. Here is the
relevant bit from File::Glob, which implements CORE::glob():

Since v5.6.0, Perl's CORE::glob() is implemented in terms of
bsd_glob(). Note that they don't share the same prototype--CORE::glob()
only accepts a single argument. Due to historical
reasons, CORE::glob() will also split its argument on whitespace,
treating it as multiple patterns, whereas bsd_glob() considers them as
one pattern.

So it's not a bug. The solution would be to use File::Glob::bsd_glob()
directly.

Thanks for drawing my attention to this. Very non-intuitive and
non-shell-like, despite what perldoc -f glob says.

I am also puzzled that quite a few of my test cases (in my second post
with example code) including escaped blanks worked exactly as I would
have expected. For example (from that post), with files present:

fred]
fred[1]
fre d[1].doc
fred[[1].doc
fred[1].doc

I get, in one case:

######################################
fre\ d*: @CandidateOrigFiles:
fre d[1].doc

I can't see how that would happen if the parts of the pattern on each
side of the blank were treated separately - but would be glad to be
enlightened.
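
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: the File::Glob of that era is believed to have split whitespace-containing patterns with Text::ParseWords::parse_line(), which keeps an escaped blank inside one word but also strips the backslashes. The sketch below only shows what parse_line() does to three of the test patterns; whether a given Perl's glob() goes through such a step depends on the version.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Split each pattern on whitespace, discarding quoting characters
# (second argument 0), and remember the resulting words.
my %split;
for my $pattern ('fre\ d*', 'fre\ d\[1*', 'fre\ d\[1\]*') {
    $split{$pattern} = [ parse_line('\s+', 0, $pattern) ];
    print "$pattern => ", join(' | ', map { "<$_>" } @{ $split{$pattern} }), "\n";
}
```

Under that assumption the last pattern would reach the matcher as 'fre d[1]*', where [1] is a character class matching the single character '1'. That would be consistent with the results above: only the case with both brackets escaped fails, while an unbalanced '[' still matches literally.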

The much larger script from which this is distilled also worked as I
expected in almost all cases. I have thousands of cases, all with a
blank in the path, where there is no problem. It only arises in
combination with \[ and \] (and some of those files have other escaped
characters).

I have now written a work-around using opendir/readdir, but still find
this odd.

DS
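
The actual work-around was not posted. As a rough sketch of what an opendir/readdir approach could look like, the hypothetical helper files_with_prefix below compares names literally, so no glob escaping is needed. The scratch directory and file names are illustrative, borrowed from the reduced test case.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the names in $dir that start with $prefix.
# The comparison is a plain string prefix test, not a pattern match.
sub files_with_prefix {
    my ($dir, $prefix) = @_;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = grep { index($_, $prefix) == 0 } readdir $dh;
    closedir $dh;
    return sort @names;
}

# Demonstrate on a scratch directory holding files from the test case.
my $dir = tempdir(CLEANUP => 1);
for my $file ('fre d[1].doc', 'fred[1]', 'fred[1].doc', 'fred]') {
    open my $fh, '>', "$dir/$file" or die "Cannot create $file: $!";
    close $fh;
}

my @found = files_with_prefix($dir, 'fre d[1]');
print "$_\n" for @found;
```

The helper returns bare file names, not full paths, so a caller that needs paths would have to prepend the directory itself.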
