Problem with glob and filenames containing '[' and ']'

David Squire

Hi folks,

I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:

----

#!/usr/bin/perl

use strict;
use warnings;

use CGI::Deurl;

for my $EncodedFile (
    '/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt',
    '/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9.txt',
) {
    my $OriginalFileBase = deurlstr($EncodedFile);
    $OriginalFileBase =~ s/\.[^.]+$//; # trim extension
    # escape characters that are meta in glob
    $OriginalFileBase =~ s/([\[\]{}?*~\ ,'`"])/\\$1/g;
    print "\$OriginalFileBase = $OriginalFileBase\n";
    my @CandidateOrigFiles = glob ("$OriginalFileBase*");
    print "\@CandidateOrigFiles:\n", join "\n", @CandidateOrigFiles;
    print "\n###########################################################\n";
}

----

Output:

Sep 27 - 9:31pm % ./test.pl
<ENTER THE CGI QUERY. End with CTRL+D>
$OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9\[1\]
@CandidateOrigFiles:

###########################################################
$OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9
@CandidateOrigFiles:
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt.webbed
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9[1].doc
###########################################################


----

As you can see, the first iteration of the for loop produces no matches.
I have included the second example, with the shortened filename, to
demonstrate that the file I want really does exist. Likewise, at the bash
prompt I can do:

Sep 27 - 9:31pm % ls /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9\[1\]*
/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9[1].doc

I am at a loss...


DS
 
David Squire

David said:
Hi folks,

I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:

Hi again,

I have reduced this further, getting rid of de-url and a bunch of other
stuff related to my original context. Please see the reduced script and
output below. It seems that having an escaped space as well as an escaped
'[' causes the failure to match. See the third-to-last test case.

I hesitate to say it, but this begins to feel like a bug... (covers head).

----


#!/usr/bin/perl

use strict;
use warnings;

print "Directory contents:\n", `ls -1 f*`, "\n";
for my $GlobPattern (
    'fred*',
    'fred[1]*',
    'fred\[1\]*',
    'fred\[1]*',
    'fred[1\]*',
    'fre\ d*',
    'fre\ d\[*',
    'fre\ d\[1*',
    'fre\ d\[1\]*',
    'fre?d\[1\]*',
    'fre\ d?1\]*',
) {
    my @CandidateOrigFiles = glob ($GlobPattern);
    print "\n######################################\n";
    print "$GlobPattern: \@CandidateOrigFiles:\n", join "\n", @CandidateOrigFiles;
}

----

Output:

Directory contents:
fred]
fred[1]
fre d[1].doc
fred[[1].doc
fred[1].doc


######################################
fred*: @CandidateOrigFiles:
fred[1]
fred[1].doc
fred[[1].doc
fred]
######################################
fred[1]*: @CandidateOrigFiles:

######################################
fred\[1\]*: @CandidateOrigFiles:
fred[1]
fred[1].doc
######################################
fred\[1]*: @CandidateOrigFiles:
fred[1]
fred[1].doc
######################################
fred[1\]*: @CandidateOrigFiles:
fred[1]
fred[1].doc
######################################
fre\ d*: @CandidateOrigFiles:
fre d[1].doc
######################################
fre\ d\[*: @CandidateOrigFiles:
fre d[1].doc
######################################
fre\ d\[1*: @CandidateOrigFiles:
fre d[1].doc
######################################
fre\ d\[1\]*: @CandidateOrigFiles:

######################################
fre?d\[1\]*: @CandidateOrigFiles:
fre d[1].doc
######################################
fre\ d?1\]*: @CandidateOrigFiles:
fre d[1].doc
 
anno4000

David Squire said:
Hi folks,

I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:

I don't know what goes wrong for you. It works for me as expected
(after replacing /damocles/documents/ENH1260/2006/2/Short assignment/
with something that exists on my box).
----

#!/usr/bin/perl

use strict;
use warnings;

use CGI::Deurl;

for my $EncodedFile (
'/damocles/documents/ENH1260/2006/2/Short
assignment/20331975_week9%5B1%5D.txt',
'/damocles/documents/ENH1260/2006/2/Short
assignment/20331975_week9.txt',
) {
my $OriginalFileBase = deurlstr($EncodedFile);
$OriginalFileBase =~ s/\.[^.]+$//; # trim extension
$OriginalFileBase =~ s/([\[\]{}?*~\ ,'`"])/\\$1/g; # escape
characters that are meta in glob;

You can use quotemeta() instead of your s///. That quotes a little more
(most visibly "/"), but that doesn't hurt.

Anno
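
To make the quotemeta() suggestion concrete, here is a small sketch. The sub name escaped_prefix and the sample path are made up for illustration. It only shows what quotemeta() does to a string; it does not by itself address the whitespace behaviour discussed later in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: escape a file name prefix with quotemeta(),
# which backslashes every character that is not a letter, digit or
# underscore (so "/" and " " get a backslash too).
sub escaped_prefix {
    my ($prefix) = @_;
    return quotemeta($prefix);
}

print escaped_prefix('Short assignment/week9[1]'), "\n";
# prints: Short\ assignment\/week9\[1\]
```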

 
David Squire

Mumia said:
for my $EncodedFile (
'/damocles/documents/ENH1260/2006/2/Short
assignment/20331975_week9%5B1%5D.txt',
'/damocles/documents/ENH1260/2006/2/Short
assignment/20331975_week9.txt',

This creates two strings containing "Short \n assignment"

I think that's going to confuse glob big-time.

No. That's just an artifact of word-wrapping in your newsreader. See my
second, simpler, example.


DS
 
David Squire

anno4000 said:
I don't know what goes wrong for you. It works for me as expected
(after replacing /damocles/documents/ENH1260/2006/2/Short assignment/
with something that exists on my box).

Thanks. Would you be able to try my second, simpler, example too? That
seems to narrow down the oddness.


DS
 
Paul Lalli

David said:
I'm having trouble using glob to find filenames that contain '[' and
']', even though I am escaping those meta-characters. Here is an example
script and output:


Hmm. Not sure I know what to tell you, as I don't seem able to
reproduce the results....

$ ls filewith\[bracket\]*
filewith[bracket].txt
$ perl -le'print for glob(q{filewith\[bracket\].*})'
filewith[bracket].txt

This is perl, v5.8.4 built for sun4-solaris

Paul Lalli
 
David Squire

Michele said:
Well I'm a big fan of glob() myself, and I recommend using it
especially when I see people using lower-level opendir() & C. in
situations in which it's not strictly necessary, but this may be a
situation in which it may indeed be good to do so.

I've just written a workaround to do so :)

Michele said:
However, I don't seem to have that problem:

C:\TEMP>touch foo[bar]

C:\TEMP>touch foo[baz]

C:\TEMP>perl -le "print for glob 'foo\\[*\\]'"
foo[bar]
foo[baz]

Yeah, as you will see from my second post, the critical thing seems to
be the presence of an escaped space as well. Thanks.


DS
 
David Squire

Mumia said:
Clearly, an escaped space does not cause the problem. It has something
to do with both an escaped space and an escaped bracket.

Yes, that's what "too" means in the subject line :) It is the
combination that is the problem.

DS
 
anno4000

David Squire said:
Thanks. Would you be able to try my second, simpler, example too? That
seems to narrow down the oddness.

Well yes, it's the blank in the path name that does it. Here is the
relevant bit from File::Glob, which implements CORE::glob():

Since v5.6.0, Perl's CORE::glob() is implemented in terms of
bsd_glob(). Note that they don't share the same prototype--CORE::glob()
only accepts a single argument. Due to historical reasons, CORE::glob()
will also split its argument on whitespace, treating it as multiple
patterns, whereas bsd_glob() considers them as one pattern.

So it's not a bug. The solution would be to use File::Glob::bsd_glob()
directly.

Anno
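
For anyone finding this later, here is a minimal sketch of the bsd_glob() suggestion, assuming a Unix-like system where backslash quoting is part of bsd_glob()'s default flags. The helper name glob_prefix and the file name are made up for illustration, and the set of characters being escaped is my own choice, not a list taken from the File::Glob documentation.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: return every file whose name starts with $prefix.
# bsd_glob() does not split its argument on whitespace, so the blank in
# the prefix is left alone; only glob metacharacters get a backslash.
sub glob_prefix {
    my ($prefix) = @_;
    (my $escaped = $prefix) =~ s/([\\\[\]{}?*~])/\\$1/g;
    return bsd_glob("$escaped*");
}

# Looks in the current directory for names beginning with "fre d[1]".
print "$_\n" for glob_prefix('fre d[1]');
```

Because the whole string is treated as one pattern, the blank needs no escaping at all here.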
 
David Squire

anno4000 said:

Well yes, it's the blank in the path name that does it. Here is the
relevant bit from File::Glob, which implements CORE::glob():

Since v5.6.0, Perl's CORE::glob() is implemented in terms of
bsd_glob(). Note that they don't share the same prototype--CORE::glob()
only accepts a single argument. Due to historical reasons, CORE::glob()
will also split its argument on whitespace, treating it as multiple
patterns, whereas bsd_glob() considers them as one pattern.

So it's not a bug. The solution would be to use File::Glob::bsd_glob()
directly.

Thanks for drawing my attention to this. Very non-intuitive and
non-shell-like, despite what perldoc -f glob says.

I am also puzzled that quite a few of my test cases (in my second post
with example code) including escaped blanks worked exactly as I would
have expected. For example (from that post), with files present:

fred]
fred[1]
fre d[1].doc
fred[[1].doc
fred[1].doc

I get, in one case:

######################################
fre\ d*: @CandidateOrigFiles:
fre d[1].doc

I can't see how that would happen if the parts of the pattern on each
side of the blank were treated separately - but would be glad to be
enlightened.
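
One possible explanation, offered as a sketch rather than a diagnosis: as far as I recall, the File::Glob of the 5.8 era only does the whitespace splitting when the pattern actually contains whitespace, and does it with Text::ParseWords::parse_line() in its backslash-stripping mode. I have not re-checked that against the 5.8 source, so treat the mechanism as an assumption. The parse_line() call itself is real and can be tried on its own. The sub name after_split is made up.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical stand-in for the splitting step: break the pattern on
# unescaped whitespace, with $keep false so backslashes are removed
# from the resulting words.
sub after_split {
    my ($pattern) = @_;
    return parse_line('\s+', 0, $pattern);
}

for my $pattern ('fre\ d*', 'fre\ d\[1*', 'fre\ d\[1\]*', 'fre d*') {
    my @words = after_split($pattern);
    print "$pattern => ", join(' | ', map { "<$_>" } @words), "\n";
}
```

If that is what happens, an escaped blank stays inside a single word, which fits the fre\ d* result. But the failing pattern would reach bsd_glob() as fre d[1]*, with the backslashes before the brackets already gone, so [1] would be read as a character class matching the single character 1. The patterns with only one bracket would still match because an unbalanced bracket is taken literally, and patterns without any whitespace would never go through the split, which is consistent with fred\[1\]* working.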

The much larger script from which this is distilled also worked as I
expected in almost all cases. I have thousands of cases, all with a
blank in the path, where there is no problem. It only arises in
combination with \[ and \] (and some of those files have other escaped
characters).

I have now written a work-around using opendir/readdir, but still find
this odd.
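
A workaround along these lines could look like the sketch below. It is not the actual code from the script mentioned above; the name files_starting_with and its arguments are invented for the example. It compares names as plain strings, so no character needs escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: list the entries of $dir whose names begin with
# $prefix. index() does a literal string comparison, so brackets,
# blanks and other glob metacharacters have no special meaning.
sub files_starting_with {
    my ($dir, $prefix) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = readdir $dh;
    closedir $dh;

    my @matches;
    for my $name (sort @names) {
        push @matches, "$dir/$name" if index($name, $prefix) == 0;
    }
    return @matches;
}

# Looks in the current directory for names beginning with "fre d[1]".
print "$_\n" for files_starting_with('.', 'fre d[1]');
```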



DS
 
