Problem with glob and filenames containing '[' and ']'

Discussion in 'Perl Misc' started by David Squire, Sep 27, 2006.

  1. David Squire

    David Squire Guest

    Hi folks,

    I'm having trouble using glob to find filenames that contain '[' and
    ']', even though I am escaping those meta-characters. Here is an example
    script and output:

    ----

    #!/usr/bin/perl

    use strict;
    use warnings;

    use CGI::Deurl;

    for my $EncodedFile (
    '/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt',
    '/damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9.txt',
    ) {
    my $OriginalFileBase = deurlstr($EncodedFile);
    $OriginalFileBase =~ s/\.[^.]+$//; # trim extension
    # escape characters that are meta in glob
    $OriginalFileBase =~ s/([\[\]{}?*~\ ,'`"])/\\$1/g;
    print "\$OriginalFileBase = $OriginalFileBase\n";
    my @CandidateOrigFiles = glob ("$OriginalFileBase*");
    print "\@CandidateOrigFiles:\n", join "\n", @CandidateOrigFiles;
    print "\n###########################################################\n";
    }

    ----

    Output:

    Sep 27 - 9:31pm % ./test.pl
    <ENTER THE CGI QUERY. End with CTRL+D>
    $OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9\[1\]
    @CandidateOrigFiles:

    ###########################################################
    $OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9
    @CandidateOrigFiles:
    /damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt
    /damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9%5B1%5D.txt.webbed
    /damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9[1].doc
    ###########################################################


    ----

    As you can see, the first iteration of the for loop produces no matches.
    I have included the second example, with the shortened filename, to
    demonstrate that the file I want really does exist. Likewise, at the
    bash prompt I can do:

    Sep 27 - 9:31pm % ls /damocles/documents/ENH1260/2006/2/Short\ assignment/20331975_week9\[1\]*
    /damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9[1].doc

    I am at a loss...


    DS
    David Squire, Sep 27, 2006
    #1

  2. David Squire

    David Squire Guest

    glob problem: escaped space seems to be significant too (was Re:problem with glob and filenames containing '[' and ']')

    David Squire wrote:
    > Hi folks,
    >
    > I'm having trouble using glob to find filenames that contain '[' and
    > ']', even though I am escaping those meta-characters. Here is an example
    > script and output:


    Hi again,

    I have reduced this further, getting rid of de-url and a bunch of other
    stuff related to my original context. Please see the reduced script and
    output below. It seems that having an escaped space as well as an escaped
    '[' causes the failure to match. See the third-to-last test case.

    I hesitate to say it, but this begins to feel like a bug... (covers head).

    ----


    #!/usr/bin/perl

    use strict;
    use warnings;

    print "Directory contents:\n", `ls -1 f*`, "\n";
    for my $GlobPattern (
    'fred*',
    'fred[1]*',
    'fred\[1\]*',
    'fred\[1]*',
    'fred[1\]*',
    'fre\ d*',
    'fre\ d\[*',
    'fre\ d\[1*',
    'fre\ d\[1\]*',
    'fre?d\[1\]*',
    'fre\ d?1\]*',
    ) {
    my @CandidateOrigFiles = glob ($GlobPattern);
    print "\n######################################\n";
    print "$GlobPattern: \@CandidateOrigFiles:\n", join "\n",
    @CandidateOrigFiles;
    }

    ----

    Output:

    Directory contents:
    fred]
    fred[1]
    fre d[1].doc
    fred[[1].doc
    fred[1].doc


    ######################################
    fred*: @CandidateOrigFiles:
    fred[1]
    fred[1].doc
    fred[[1].doc
    fred]
    ######################################
    fred[1]*: @CandidateOrigFiles:

    ######################################
    fred\[1\]*: @CandidateOrigFiles:
    fred[1]
    fred[1].doc
    ######################################
    fred\[1]*: @CandidateOrigFiles:
    fred[1]
    fred[1].doc
    ######################################
    fred[1\]*: @CandidateOrigFiles:
    fred[1]
    fred[1].doc
    ######################################
    fre\ d*: @CandidateOrigFiles:
    fre d[1].doc
    ######################################
    fre\ d\[*: @CandidateOrigFiles:
    fre d[1].doc
    ######################################
    fre\ d\[1*: @CandidateOrigFiles:
    fre d[1].doc
    ######################################
    fre\ d\[1\]*: @CandidateOrigFiles:

    ######################################
    fre?d\[1\]*: @CandidateOrigFiles:
    fre d[1].doc
    ######################################
    fre\ d?1\]*: @CandidateOrigFiles:
    fre d[1].doc

    ----

    DS
    David Squire, Sep 27, 2006
    #2

  3. David Squire

    -berlin.de Guest

    David Squire <> wrote in comp.lang.perl.misc:
    > Hi folks,
    >
    > I'm having trouble using glob to find filenames that contain '[' and
    > ']', even though I am escaping those meta-characters. Here is an example
    > script and output:


    I don't know what goes wrong for you. It works for me as expected
    (after replacing /damocles/documents/ENH1260/2006/2/Short assignment/
    with something that exists on my box).

    > ----
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > use CGI::Deurl;
    >
    > for my $EncodedFile (
    > '/damocles/documents/ENH1260/2006/2/Short
    > assignment/20331975_week9%5B1%5D.txt',
    > '/damocles/documents/ENH1260/2006/2/Short
    > assignment/20331975_week9.txt',
    > ) {
    > my $OriginalFileBase = deurlstr($EncodedFile);
    > $OriginalFileBase =~ s/\.[^.]+$//; # trim extension
    > $OriginalFileBase =~ s/([\[\]{}?*~\ ,'`"])/\\$1/g; # escape
    > characters that are meta in glob;


    You can use quotemeta() instead of your s///. That quotes a little more
    (most visibly "/"), but that doesn't hurt.
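
    As a small illustration of the suggestion above: quotemeta() puts a
    backslash in front of every ASCII character that is not a letter, digit
    or underscore, so the brackets, the blank, "/" and "." are all escaped.
    The file name comes from the thread; the helper name glob_prefix_pattern
    is made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a literal file-name prefix into a glob pattern
# that matches any name starting with that prefix.
sub glob_prefix_pattern {
    my ($prefix) = @_;
    return quotemeta($prefix) . '*';   # escape everything, then add the wildcard
}

# Prints: Short\ assignment\/20331975_week9\[1\]*
print glob_prefix_pattern('Short assignment/20331975_week9[1]'), "\n";
```

    Whether such a pattern then finds the file still depends on how the
    globbing function treats the blank.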

    Anno

    [remainder left for reference]

    > print "\$OriginalFileBase = $OriginalFileBase\n";
    > my @CandidateOrigFiles = glob ("$OriginalFileBase*");
    > print "\@CandidateOrigFiles:\n", join "\n", @CandidateOrigFiles;
    > print "\n###########################################################\n";
    > }
    >
    > ----
    >
    > Output:
    >
    > Sep 27 - 9:31pm % ./test.pl
    > <ENTER THE CGI QUERY. End with CTRL+D>
    > $OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\
    > assignment/20331975_week9\[1\]
    > @CandidateOrigFiles:
    >
    > ###########################################################
    > $OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\
    > assignment/20331975_week9
    > @CandidateOrigFiles:
    > /damocles/documents/ENH1260/2006/2/Short
    > assignment/20331975_week9%5B1%5D.txt
    > /damocles/documents/ENH1260/2006/2/Short
    > assignment/20331975_week9%5B1%5D.txt.webbed
    > /damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9[1].doc
    > ###########################################################
    >
    >
    > ----
    >
    > As you can see, the first iteration of the for loop produces no matches.
    > I have included the second, shortened filename, example to demonstrate
    > that the file I want really does exist. Likewise, at the bash prompt I
    > can do:
    >
    > Sep 27 - 9:31pm % ls /damocles/documents/ENH1260/2006/2/Short\
    > assignment/20331975_week9\[1\]*
    > /damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9[1].doc
    >
    > I am at a loss...
    >
    >
    > DS
    -berlin.de, Sep 27, 2006
    #3
  4. David Squire

    David Squire Guest

    Mumia W. (reading news) wrote:
    > On 09/27/2006 06:33 AM, David Squire wrote:
    >> Hi folks,
    >>
    >> I'm having trouble using glob to find filenames that contain '[' and
    >> ']', even though I am escaping those meta-characters. Here is an example
    >> script and output:
    >>
    >> ----
    >>
    >> #!/usr/bin/perl
    >>
    >> use strict;
    >> use warnings;
    >>
    >> use CGI::Deurl;
    >>
    >> for my $EncodedFile (
    >> '/damocles/documents/ENH1260/2006/2/Short
    >> assignment/20331975_week9%5B1%5D.txt',
    >> '/damocles/documents/ENH1260/2006/2/Short
    >> assignment/20331975_week9.txt',

    >
    > This creates two strings containing "Short \n assignment"
    >
    > I think that's going to confuse glob big-time.


    No. That's just an artifact of word-wrapping in your newsreader. See my
    second, simpler, example.


    DS
    David Squire, Sep 27, 2006
    #4
  5. David Squire

    David Squire Guest

    -berlin.de wrote:
    > David Squire <> wrote in comp.lang.perl.misc:
    >> Hi folks,
    >>
    >> I'm having trouble using glob to find filenames that contain '[' and
    >> ']', even though I am escaping those meta-characters. Here is an example
    >> script and output:

    >
    > I don't know what goes wrong for you. It works for me as expected
    > (after replacing /damocles/documents/ENH1260/2006/2/Short assignment/
    > with something that exists on my box).


    Thanks. Would you be able to try my second, simpler, example too? That
    seems to narrow down the oddness.


    DS
    David Squire, Sep 27, 2006
    #5
  6. David Squire

    Paul Lalli Guest

    David Squire wrote:
    > I'm having trouble using glob to find filenames that contain '[' and
    > ']', even though I am escaping those meta-characters. Here is an example
    > script and output:
    >
    > ----
    >
    > #!/usr/bin/perl
    >
    > use strict;
    > use warnings;
    >
    > use CGI::Deurl;
    >
    > for my $EncodedFile (
    > '/damocles/documents/ENH1260/2006/2/Short
    > assignment/20331975_week9%5B1%5D.txt',
    > '/damocles/documents/ENH1260/2006/2/Short
    > assignment/20331975_week9.txt',
    > ) {
    > my $OriginalFileBase = deurlstr($EncodedFile);
    > $OriginalFileBase =~ s/\.[^.]+$//; # trim extension
    > $OriginalFileBase =~ s/([\[\]{}?*~\ ,'`"])/\\$1/g; # escape
    > characters that are meta in glob;
    > print "\$OriginalFileBase = $OriginalFileBase\n";
    > my @CandidateOrigFiles = glob ("$OriginalFileBase*");
    > print "\@CandidateOrigFiles:\n", join "\n", @CandidateOrigFiles;
    > print "\n###########################################################\n";
    > }
    >
    > ----
    >
    > Output:
    >
    > Sep 27 - 9:31pm % ./test.pl
    > <ENTER THE CGI QUERY. End with CTRL+D>
    > $OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\
    > assignment/20331975_week9\[1\]
    > @CandidateOrigFiles:
    >
    > ###########################################################
    > $OriginalFileBase = /damocles/documents/ENH1260/2006/2/Short\
    > assignment/20331975_week9
    > @CandidateOrigFiles:
    > /damocles/documents/ENH1260/2006/2/Short
    > assignment/20331975_week9%5B1%5D.txt
    > /damocles/documents/ENH1260/2006/2/Short
    > assignment/20331975_week9%5B1%5D.txt.webbed
    > /damocles/documents/ENH1260/2006/2/Short assignment/20331975_week9[1].doc
    > ###########################################################
    >


    Hmm. Not sure I know what to tell you, as I don't seem able to
    reproduce the results....

    $ ls filewith\[bracket\]*
    filewith[bracket].txt
    $ perl -le'print for glob(q{filewith\[bracket\].*})'
    filewith[bracket].txt

    This is perl, v5.8.4 built for sun4-solaris

    Paul Lalli
    Paul Lalli, Sep 27, 2006
    #6
  7. David Squire

    David Squire Guest

    Michele Dondi wrote:
    > On Wed, 27 Sep 2006 12:33:26 +0100, David Squire
    > <> wrote:
    >
    >> I'm having trouble using glob to find filenames that contain '[' and

    >
    > Well I'm a big fan of glob() myself, and I recommend using it
    > especially when I see people using lower level opendir() & C. in
    > situations in which it's not strictly necessary, but this may be a
    > situation in which it may indeed be good to do so.


    I've just written a workaround to do so :)

    >> ']', even though I am escaping those meta-characters. Here is an example
    >> script and output:

    >
    > However, I don't seem to have that problem:
    >
    > C:\TEMP>touch foo[bar]
    >
    > C:\TEMP>touch foo[baz]
    >
    > C:\TEMP>perl -le "print for glob 'foo\\[*\\]'"
    > foo[bar]
    > foo[baz]


    Yeah, as you will see from my second post, the critical thing seems to
    be the presence of an escaped space as well. Thanks.


    DS
    David Squire, Sep 27, 2006
    #7
  8. David Squire

    David Squire Guest

    Re: glob problem: escaped space seems to be significant too (was

    Mumia W. (reading news) wrote:

    >
    > Clearly, an escaped space does not cause the problem. It has something
    > to do with both an escaped space and an escaped bracket.


    Yes, that's what "too" means in the subject line :) It is the
    combination that is the problem.

    DS
    David Squire, Sep 27, 2006
    #8
  9. David Squire

    -berlin.de Guest

    David Squire <> wrote in comp.lang.perl.misc:
    > -berlin.de wrote:
    > > David Squire <> wrote in

    > comp.lang.perl.misc:
    > >> Hi folks,
    > >>
    > >> I'm having trouble using glob to find filenames that contain '[' and
    > >> ']', even though I am escaping those meta-characters. Here is an example
    > >> script and output:

    > >
    > > I don't know what goes wrong for you. It works for me as expected
    > > (after replacing /damocles/documents/ENH1260/2006/2/Short assignment/
    > > with something that exists on my box).

    >
    > Thanks. Would you be able to try my second, simpler, example too? That
    > seems to narrow down the oddness.


    Well yes, it's the blank in the path name that does it. Here is the
    relevant bit from File::Glob, which implements CORE::glob():

    Since v5.6.0, Perl's CORE::glob() is implemented in terms of
    bsd_glob(). Note that they don't share the same
    prototype--CORE::glob() only accepts a single argument. Due to historical
    reasons, CORE::glob() will also split its argument on whitespace,
    treating it as multiple patterns, whereas bsd_glob() considers them as
    one pattern.

    So it's not a bug. The solution would be to use File::Glob::bsd_glob()
    directly.
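
    A minimal sketch of that suggestion, assuming a Unix-like system and the
    default bsd_glob() flags, under which a backslash quotes the next
    character. The helper names escape_glob and files_starting_with are
    invented for illustration. Only \ [ ] { } * ? ~ are escaped; the blank is
    left alone because bsd_glob() takes its argument as a single pattern.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);

# Hypothetical helper: backslash-escape the characters that bsd_glob()
# would otherwise read as wildcards, character classes, braces or tilde.
sub escape_glob {
    my ($literal) = @_;
    $literal =~ s/([\\\[\]{}*?~])/\\$1/g;
    return $literal;
}

# Hypothetical helper: list every path that starts with the literal $base.
# bsd_glob() treats the whole string as ONE pattern, so a blank in the
# path is just another character.
sub files_starting_with {
    my ($base) = @_;
    return bsd_glob(escape_glob($base) . '*');
}

# Example call using a file name from the test directory in the thread.
print "$_\n" for files_starting_with('fre d[1]');
```

    Calling bsd_glob() directly avoids the whitespace handling of
    CORE::glob() altogether, so the same escaping works whether or not the
    path contains a blank.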

    Anno
    -berlin.de, Sep 27, 2006
    #9
  10. David Squire

    David Squire Guest

    -berlin.de wrote:
    > David Squire <> wrote in comp.lang.perl.misc:
    >> -berlin.de wrote:
    >>> David Squire <> wrote in

    >> comp.lang.perl.misc:
    >>>> Hi folks,
    >>>>
    >>>> I'm having trouble using glob to find filenames that contain '[' and
    >>>> ']', even though I am escaping those meta-characters. Here is an example
    >>>> script and output:
    >>> I don't know what goes wrong for you. It works for me as expected
    >>> (after replacing /damocles/documents/ENH1260/2006/2/Short assignment/
    >>> with something that exists on my box).

    >> Thanks. Would you be able to try my second, simpler, example too? That
    >> seems to narrow down the oddness.

    >
    > Well yes, it's the blank in the path name that does it. Here is the
    > relevant bit from File::Glob, which implements CORE::glob():
    >
    > Since v5.6.0, Perl's CORE::glob() is implemented in terms of
    > bsd_glob(). Note that they don't share the same
    > prototype--CORE::glob() only accepts a single argument. Due to historical
    > reasons, CORE::glob() will also split its argument on whitespace,
    > treating it as multiple patterns, whereas bsd_glob() considers them as
    > one pattern.
    >
    > So it's not a bug. The solution would be to use File::Glob::bsd_glob()
    > directly.


    Thanks for drawing my attention to this. Very non-intuitive and
    non-shell-like, despite what perldoc -f glob says.

    I am also puzzled that quite a few of my test cases (in my second post
    with example code) including escaped blanks worked exactly as I would
    have expected. For example (from that post), with files present:

    fred]
    fred[1]
    fre d[1].doc
    fred[[1].doc
    fred[1].doc

    I get, in one case:

    ######################################
    fre\ d*: @CandidateOrigFiles:
    fre d[1].doc

    I can't see how that would happen if the parts of the pattern on each
    side of the blank were treated as separate patterns, but I would be glad
    to be enlightened.
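
    One possible explanation, offered as an assumption and not something
    established in this thread: if the whitespace splitting is done with
    Text::ParseWords::parse_line() with its "keep" argument false (an
    assumption about File::Glob internals in Perl versions of that period,
    worth checking against the installed File::Glob source), then an escaped
    blank stays inside its word, but every quoting backslash is removed
    before the pattern is globbed. The sketch below only shows what
    parse_line() does to three patterns from the second post; the helper
    name split_like_shell is invented.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: split a pattern on unescaped whitespace and drop
# the quoting backslashes (second argument 0 means "do not keep" them).
sub split_like_shell {
    my ($pattern) = @_;
    return parse_line('\s+', 0, $pattern);
}

for my $pattern ('fre\ d*', 'fre\ d\[1*', 'fre\ d\[1\]*') {
    my @words = split_like_shell($pattern);
    print "$pattern => ", join(' | ', map { "<$_>" } @words), "\n";
}
# Prints:
# fre\ d* => <fre d*>
# fre\ d\[1* => <fre d[1*>
# fre\ d\[1\]* => <fre d[1]*>
```

    If that is what happens, the last pattern would be globbed as
    'fre d[1]*', where [1] is a character class matching the single
    character '1', so it would look for 'fre d1...' and not 'fre d[1]...'.
    That would be consistent with the one failing case, but it remains a
    guess.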

    The much larger script from which this is distilled also worked as I
    expected in almost all cases. I have thousands of cases, all with a
    blank in the path, where there is no problem. It only arises in
    combination with \[ and \] (and some of those files have other escaped
    characters).

    I have now written a work-around using opendir/readdir, but still find
    this odd.
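
    The workaround itself was not posted. A minimal sketch of an
    opendir/readdir approach, assuming the goal is to list every entry in
    one directory whose name begins with a given literal string; the helper
    name entries_with_prefix is invented and this is not the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the full paths of all entries in $dir whose
# names start with the literal string $prefix. No pattern syntax is
# involved, so blanks and brackets need no escaping.
sub entries_with_prefix {
    my ($dir, $prefix) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @names = grep { index($_, $prefix) == 0 } readdir($dh);
    closedir($dh);
    return map { "$dir/$_" } sort @names;
}

# Example call using a file name from the test directory in the thread.
print "$_\n" for entries_with_prefix('.', 'fre d[1]');
```

    Comparing with index() keeps the match purely literal, which sidesteps
    both the escaping and the whitespace questions raised above.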



    DS
    David Squire, Sep 27, 2006
    #10
