Module to match file names against a wildcard spec?

Discussion in 'Perl Misc' started by Henry Law, Jun 15, 2005.

  1. Henry Law

    Henry Law Guest

    I've searched CPAN and the web for an answer to this without finding
    anything (but I confess I found it hard to structure a query so I may
    have missed something). Maybe someone can point me in the right
    direction.

    A Perl program I'm writing reads in file names from a specified
    directory and then processes them (how doesn't matter). If
    subdirectories are found they are processed recursively.

    I need to be able to restrict its operation by specifying groups of
    files via a wild card; the control syntax looks a bit like this

    # Include contents of C:\foo and all its subdirectories
    include C:\foo
    # But don't do text files in the root of \foo
    exclude C:\foo\*.txt
    # Note that text files elsewhere in the \foo tree, such as
    # C:\foo\bar\bletch.txt should be processed.

    (The above is Windows, obviously, but I need to write this so it works
    on Unix too).

    I'm getting really tangled up trying to turn my exclude specifications
    into regexes which I can then use to exclude relevant files; not only
    am I not a very experienced Perl coder but the logic of the task turns
    out to be quite complicated. For example

    #! /usr/bin/perl
    use strict;
    use warnings;
    my @names =
    ("F:/NOTES/DATA/ABBC2.NSF","F:/NOTES/DATA/FURBLE/ABBC2.NSF");
    my $excl_spec = "F:/NOTES/DATA/.*\.NSF";

    for (@names) {
        if (/$excl_spec/i) {
            print "$_ matches\n";
        } else {
            print "$_ doesn't match\n";
        }
    }

    When run this gives
    F:/NOTES/DATA/ABBC2.NSF matches
    F:/NOTES/DATA/FURBLE/ABBC2.NSF matches

    ... which isn't what I want because the .* eats up the additional
    subdirectory FURBLE as well.

    I've looked at File::Spec and File::CheckTree without finding what I
    want. Can anyone suggest either (1) A module that would help with
    wildcard processing of file names, or (2) A better way of coding this
    kind of thing?
    --

    Henry Law <>< Manchester, England
    Henry Law, Jun 15, 2005
    #1

  2. On 2005-06-15, Henry Law scribbled these
    curious markings:
    > I've searched CPAN and the web for an answer to this without finding
    > anything (but I confess I found it hard to structure a query so I may
    > have missed something). Maybe someone can point me in the right
    > direction.
    >
    > A Perl program I'm writing reads in file names from a specified
    > directory and then processes them (how doesn't matter). If
    > subdirectories are found they are processed recursively.
    > [...]
    > I've looked at File::Spec and File::CheckTree without finding what I
    > want. Can anyone suggest either (1) A module that would help with
    > wildcard processing of file names, or (2) A better way of coding this
    > kind of thing?


    File::Find perhaps? Maybe File::Find::Rule if you find File::Find too
    difficult (though I can't imagine why; I've found it to be delightfully
    easy-to-use)?

    Best Regards,
    Christopher Nehren
    --
    I abhor a system designed for the "user", if that word is a coded
    pejorative meaning "stupid and unsophisticated". -- Ken Thompson
    If you ask the wrong people questions, you get "Joel on Software".
    Unix is user friendly. However, it isn't idiot friendly.
    Christopher Nehren, Jun 16, 2005
    #2

  3. Henry Law

    Henry Law Guest

    On 15 Jun 2005 23:32:05 GMT, Christopher Nehren
    <> wrote:

    >On 2005-06-15, Henry Law scribbled these
    >curious markings:


    >> A Perl program I'm writing reads in file names from a specified
    >> directory and then processes them (how doesn't matter). If
    >> subdirectories are found they are processed recursively.
    >> [...]
    >> I've looked at File::Spec and File::CheckTree without finding what I
    >> want. Can anyone suggest either (1) A module that would help with
    >> wildcard processing of file names, or (2) A better way of coding this
    >> kind of thing?

    >
    >File::Find perhaps?


    Yes, I'm familiar with File::Find and there are two reasons why I'm
    not using it. Firstly I can't prevent it from scanning sub-trees

    include C:\some\path
    exclude C:\some\path\huge\subdirectory

    and secondly when it invokes its "wanted" function I'm still faced
    with working out whether $File::Find::name matches my "exclude"
    specification, which is the bit I'm stuck on. My tree-following logic
    probably isn't perfect but it works well enough.
    --

    Henry Law <>< Manchester, England
    Henry Law, Jun 16, 2005
    #3
  4. Chris

    Chris Guest

    Henry Law wrote:
    > On 15 Jun 2005 23:32:05 GMT, Christopher Nehren
    > <> wrote:
    >
    >
    >>On 2005-06-15, Henry Law scribbled these
    >>curious markings:

    >
    >
    >>>A Perl program I'm writing reads in file names from a specified
    >>>directory and then processes them (how doesn't matter). If
    >>>subdirectories are found they are processed recursively.
    >>>[...]
    >>>I've looked at File::Spec and File::CheckTree without finding what I
    >>>want. Can anyone suggest either (1) A module that would help with
    >>>wildcard processing of file names, or (2) A better way of coding this
    >>>kind of thing?

    >>
    >>File::Find perhaps?

    >
    >
    > Yes, I'm familiar with File::Find and there are two reasons why I'm
    > not using it. Firstly I can't prevent it from scanning sub-trees
    >
    > include C:\some\path
    > exclude C:\some\path\huge\subdirectory


    File::Find has a "preprocess" option that may help here. From the
    documentation:

    "The value should be a code reference. This code reference is used to
    preprocess the current directory. <snip> Your preprocessing function
    is called after readdir(), but before the loop that calls the wanted()
    function. <snip> The code can be used to sort the file/directory names
    alphabetically, numerically, or to filter out directory entries based on
    their name alone."
    >
    > and secondly when it invokes its "wanted" function I'm still faced
    > with working out whether $File::Find::name matches my "exclude"
    > specification, which is the bit I'm stuck on. My tree-following logic
    > probably isn't perfect but it works well enough.


    It looks like the problem is in your regex. You put:

    my $excl_spec = "F:/NOTES/DATA/.*\.NSF";

    which matched "F:/NOTES/DATA/ABBC2.NSF" and
    "F:/NOTES/DATA/FURBLE/ABBC2.NSF". The reason it matched the second
    string is because of the ".*" - you probably don't want to match ANY
    character. For instance, you probably don't want to match the directory
    separator "/". Try it with:

    my $excl_spec = "F:/NOTES/DATA/[^/]*\.NSF";

    That should only match .NSF files in that directory.
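
    As a rough sketch of that idea, using the two names from the original
    script. Two details here are assumptions that go beyond the line
    above: the pattern is written with qr// instead of a double-quoted
    string (inside double quotes "\." collapses to a bare ".", which
    matches any character), and it is anchored with \A and \z so that it
    has to cover the whole name. is_excluded() is just an illustrative
    helper name.

```perl
use strict;
use warnings;

my @names = (
    "F:/NOTES/DATA/ABBC2.NSF",
    "F:/NOTES/DATA/FURBLE/ABBC2.NSF",
);

# [^/]* cannot cross a directory separator; \A and \z anchor the whole name.
my $excl_spec = qr{\AF:/NOTES/DATA/[^/]*\.NSF\z}i;

# Returns 1 when $name is covered by the exclude pattern, 0 otherwise.
sub is_excluded {
    my ($name) = @_;
    return $name =~ $excl_spec ? 1 : 0;
}

for my $name (@names) {
    print "$name ", (is_excluded($name) ? "matches" : "doesn't match"), "\n";
}
```

    With this pattern only the first name should be reported as a match.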

    Hope that helps :)

    -chris
    Chris, Jun 16, 2005
    #4
  5. Henry Law

    Henry Law Guest

    On Wed, 15 Jun 2005 23:10:03 -0700, Chris <>
    wrote:

    >Henry Law wrote:


    >> Yes, I'm familiar with File::Find and there are two reasons why I'm
    >> not using it. Firstly I can't prevent it from scanning sub-trees
    >>
    >> include C:\some\path
    >> exclude C:\some\path\huge\subdirectory

    >
    >File::Find has a "preprocess" option that may help here. From the
    >documentation:


    Ah, I'd looked at this but not in the right way; I see what you mean.
    If I want to exclude certain subdirectories from being processed I can
    drop them out of the list that the "preprocess" subroutine returns.
    Neat; thank you.
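
    Roughly what I have in mind, as a sketch only and assuming Unix-style
    paths. scan() and its skip list are names I've made up for
    illustration; the only File::Find features used are the documented
    "preprocess" and "wanted" options and the $File::Find::dir and
    $File::Find::name variables.

```perl
use strict;
use warnings;
use File::Find;

# scan() is a hypothetical helper: walk $top, never descending into any
# directory whose full path appears in @skip, and return the files found.
sub scan {
    my ($top, @skip) = @_;
    my %skip_dir = map { $_ => 1 } @skip;
    my @seen;
    find({
        # Called after readdir() with the entries of $File::Find::dir;
        # only the names it returns are visited afterwards.
        preprocess => sub { grep { !$skip_dir{"$File::Find::dir/$_"} } @_ },
        # Collect plain files; $_ is the entry name in the current directory.
        wanted     => sub { push @seen, $File::Find::name if -f $_ },
    }, $top);
    return @seen;
}

# Example call mirroring the include/exclude lines above, Unix style:
# my @files = scan('/some/path', '/some/path/huge/subdirectory');
```

    Because the excluded directory is dropped before find() looks at it,
    nothing underneath it is ever read.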

    >It looks like the problem is in your regex. You put:
    >
    > my $excl_spec = "F:/NOTES/DATA/.*\.NSF";
    >
    >which matched "F:/NOTES/DATA/ABBC2.NSF" and
    >"F:/NOTES/DATA/FURBLE/ABBC2.NSF". The reason it matched the second
    >string is because of the ".*" - you probably don't want to match ANY
    >character. For instance, you probably don't want to match the directory
    >separator "/". Try it with:
    >
    > my $excl_spec = "F:/NOTES/DATA/[^/]*\.NSF";


    Yes, I can see that now. I'm re-casting the exclusion checking to
    split the checked file name and the exclude specification into their
    component parts (F:, NOTES, DATA, .*\.NSF) and then I can immediately
    tell if the two aren't at the same depth in the tree, before doing
    regex-type matches on the respective parts. I've not shot all the
    bugs yet but it looks promising.
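
    In outline the component-by-component check looks something like the
    sketch below. It is not my finished code: the function names are
    placeholders, it assumes the spec uses "*" and "?" wildcards as in my
    control file, it accepts either "/" or "\" as separator, and it
    compares case-insensitively (which is right for Windows but would
    need changing for Unix).

```perl
use strict;
use warnings;

# Turn one path component containing * or ? wildcards into an anchored regex.
sub component_regex {
    my ($part) = @_;
    my $re = join '', map {
          $_ eq '*' ? '.*'
        : $_ eq '?' ? '.'
        :             quotemeta($_)
    } split //, $part;
    return qr/\A$re\z/i;
}

# True when $name has the same depth as $spec and every component matches.
sub spec_matches {
    my ($spec, $name) = @_;
    my @spec_parts = split m{[\\/]}, $spec;
    my @name_parts = split m{[\\/]}, $name;
    return 0 unless @spec_parts == @name_parts;    # different depth: no match
    for my $i (0 .. $#spec_parts) {
        return 0 unless $name_parts[$i] =~ component_regex($spec_parts[$i]);
    }
    return 1;
}

print spec_matches('C:\foo\*.txt', 'C:\foo\bar\bletch.txt')
    ? "excluded\n" : "processed\n";
```

    The depth test is what stops a spec for the root of \foo from
    swallowing files further down the tree.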

    Thanks for all the help.
    --

    Henry Law <>< Manchester, England
    Henry Law, Jun 16, 2005
    #5
  6. Anno Siegel

    Anno Siegel Guest

    Henry Law <> wrote in comp.lang.perl.misc:
    > I've searched CPAN and the web for an answer to this without finding
    > anything (but I confess I found it hard to structure a query so I may
    > have missed something). Maybe someone can point me in the right
    > direction.
    >
    > A Perl program I'm writing reads in file names from a specified
    > directory and then processes them (how doesn't matter). If
    > subdirectories are found they are processed recursively.
    >
    > I need to be able to restrict its operation by specifying groups of
    > files via a wild card; the control syntax looks a bit like this
    >
    > # Include contents of C:\foo and all its subdirectories
    > include C:\foo
    > # But don't do text files in the root of \foo
    > exclude C:\foo\*.txt
    > # Note that text files elsewhere in the \foo tree, such as
    > # C:\foo\bar\bletch.txt should be processed.
    >
    > (The above is Windows, obviously, but I need to write this so it works
    > on Unix too).
    >
    > I'm getting really tangled up trying to turn my exclude specifications
    > into regexes which I can then use to exclude relevant files; not only
    > am I not a very experienced Perl coder but the logic of the task turns
    > out to be quite complicated. For example


    Do you actually need that?

    A glob-to-regex translator wouldn't be very hard to write (I think).
    The hard part is getting the specification right for all kinds of
    file system with so many variants of glob around. That is probably
    why there isn't one in Regexp::Common, where it would belong.
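
    To make the "not very hard" part concrete, here is a sketch for one
    narrow dialect. The dialect itself is an assumption: "*" stands for
    any run of non-separator characters, "?" for a single one, either
    "/" or "\" is accepted as separator, and everything else is literal.
    glob_to_re() is a made-up name, not an existing module function.

```perl
use strict;
use warnings;

# Translate a simple wildcard spec into an anchored regex.
sub glob_to_re {
    my ($glob) = @_;
    my $re = '';
    for my $ch (split //, $glob) {
        if    ($ch eq '*')      { $re .= '[^/\\\\]*' }    # run of non-separators
        elsif ($ch eq '?')      { $re .= '[^/\\\\]'  }    # one non-separator
        elsif ($ch =~ m{[/\\]}) { $re .= '[/\\\\]'   }    # either separator
        else                    { $re .= quotemeta $ch }  # literal character
    }
    return qr/\A$re\z/;
}

my $re = glob_to_re('C:\foo\*.txt');
for my $name ('C:\foo\notes.txt', 'C:\foo\bar\bletch.txt') {
    print "$name: ", ($name =~ $re ? "excluded" : "processed"), "\n";
}
```

    Character classes, brace expansion and escaping are all left out,
    which is exactly where the glob variants start to disagree.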

    However, since everything you want to include or exclude are actual
    files in a file system (right?), you don't have to do that, you can
    use Perl's glob() function together with File::Find. Here's a sketch:

    use File::Find;

    my $dir = 'c:\foo';
    my $exclude = 'C:\foo\*.txt';
    @exclude{ glob( $exclude)} = ();

    find sub {
        return if exists $exclude{ $File::Find::name};
        # process file
    }, $dir;

    Anno
    Anno Siegel, Jun 16, 2005
    #6
  7. Anno Siegel

    Anno Siegel Guest

    Henry Law <> wrote in comp.lang.perl.misc:
    > I've searched CPAN and the web for an answer to this without finding
    > anything (but I confess I found it hard to structure a query so I may
    > have missed something). Maybe someone can point me in the right
    > direction.
    >
    > A Perl program I'm writing reads in file names from a specified
    > directory and then processes them (how doesn't matter). If
    > subdirectories are found they are processed recursively.
    >
    > I need to be able to restrict its operation by specifying groups of
    > files via a wild card; the control syntax looks a bit like this
    >
    > # Include contents of C:\foo and all its subdirectories
    > include C:\foo
    > # But don't do text files in the root of \foo
    > exclude C:\foo\*.txt
    > # Note that text files elsewhere in the \foo tree, such as
    > # C:\foo\bar\bletch.txt should be processed.
    >
    > (The above is Windows, obviously, but I need to write this so it works
    > on Unix too).
    >
    > I'm getting really tangled up trying to turn my exclude specifications
    > into regexes which I can then use to exclude relevant files; not only
    > am I not a very experienced Perl coder but the logic of the task turns
    > out to be quite complicated. For example


    Do you actually need that?

    A glob-to-regex translator wouldn't be very hard to write (I think).
    The hard part is getting the specification right for all kinds of
    file system with so many variants of glob around. That is probably
    why there isn't one in Regexp::Common, where it would belong.

    However, since everything you want to include or exclude are actual
    files in a file system (right?), you don't have to do that, you can
    use Perl's glob() function together with File::Find. Here's a sketch:

    use File::Find;

    my $dir = 'c:\foo';
    my $exclude = 'C:\foo\*.txt';
    my %exclude;
    @exclude{ glob( $exclude)} = ();

    find sub {
        return if exists $exclude{ $File::Find::name};
        # process file
    }, $dir;

    Anno
    Anno Siegel, Jun 16, 2005
    #7
  8. Henry Law

    Henry Law Guest

    On 16 Jun 2005 09:48:30 GMT, -berlin.de (Anno
    Siegel) wrote:

    >However, since everything you want to include or exclude are actual
    >files in a file system (right?),


    Indeed. Just files or complete sub-directories.

    >you don't have to do that, you can
    >use Perl's glob() function together with File::Find. Here's a sketch:
    >
    > use File::Find;
    >
    > my $dir = 'c:\foo';
    > my $exclude = 'C:\foo\*.txt';
    > @exclude{ glob( $exclude)} = ();


    Uuh ... this is where I feel like the sorcerer's apprentice: totally
    out of my depth. There are Perl constructs here that I simply don't
    recognise:

    @exclude{ glob( $exclude)} = ();
    ^       ^                    ^
    1       2                    3

    1. Why don't we have to declare "@exclude" with "my"?
    2. That looks like a hash but "@exclude" is an array; or
       is this some kind of subroutine? Surely not ...
    3. Empty list ... but why, and where is it going to?

    If you could help me by pointing out the perldoc references where this
    seam of witchcraft is described I'll go and read up!

    The rest of your post - the part dealing with File::Find - I do
    understand. Thanks in the mean time.
    --

    Henry Law <>< Manchester, England
    Henry Law, Jun 16, 2005
    #8
  9. Henry Law <> wrote in
    news::

    > On 16 Jun 2005 09:48:30 GMT, -berlin.de (Anno
    > Siegel) wrote:


    >> my $exclude = 'C:\foo\*.txt';
    >> @exclude{ glob( $exclude)} = ();

    >
    > Uuh ... this is where I feel like the sorcerer's apprentice: totally
    > out of my depth. There are Perl constructs here that I simply don't
    > recognise:
    >
    > @exclude{ glob( $exclude)} = ();
    > ^       ^                    ^
    > 1       2                    3
    >
    > 1. Why don't we have to declare "@exclude" with "my"?
    > 2. That looks like a hash but "@exclude" is an array; or
    >    is this some kind of subroutine? Surely not ...
    > 3. Empty list ... but why, and where is it going to?
    >
    > If you could help me by pointing out the perldoc references where this
    > seam of witchcraft is described I'll go and read up!


    perldoc perldata

    Read the section on slices.

    If Anno had specified

    use strict;

    he would have had to have:

    my %exclude;

    before the assignment to the slice.
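
    A small illustration of what the slice assignment does. The file
    names are stand-ins for whatever glob() would return; nothing else
    here goes beyond what perldata describes.

```perl
use strict;
use warnings;

my %exclude;                       # the hash itself, declared for strict
my @files = ('a.txt', 'b.txt');    # stand-in for the list glob() returns

# Hash slice: the leading @ says "several elements", the braces say
# "of the hash %exclude". Assigning the empty list creates every key
# with an undefined value, which is enough for an exists() test.
@exclude{@files} = ();

print exists $exclude{'a.txt'} ? "excluded\n" : "kept\n";
print exists $exclude{'c.txt'} ? "excluded\n" : "kept\n";
```

    So there is no array called @exclude involved at all; the sigil only
    describes how many elements of %exclude are being addressed.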

    Sinan
    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Jun 16, 2005
    #9
  10. "A. Sinan Unur" <> wrote in
    news:Xns96776A0C2C771asu1cornelledu@127.0.0.1:

    > If Anno had specified
    >
    > use strict;


    And so he did, in <d8ri2k$nif$-Berlin.DE>.

    Sorry, I had not seen that one.

    Sinan

    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
    A. Sinan Unur, Jun 16, 2005
    #10
  11. Anno Siegel

    Anno Siegel Guest

    A. Sinan Unur <> wrote in comp.lang.perl.misc:
    > "A. Sinan Unur" <> wrote in
    > news:Xns96776A0C2C771asu1cornelledu@127.0.0.1:
    >
    > > If Anno had specified
    > >
    > > use strict;

    >
    > And so he did, in <d8ri2k$nif$-Berlin.DE>.
    >
    > Sorry, I had not seen that one.


    Yes, the declaration was meant to be there, it got lost in a copy/paste
    operation. I corrected that fast, but not fast enough for modern Usenet,
    it seems.

    It used to be I could safely send a "supersede" within 5 minutes or so
    and still catch it on my server most of the time. These days, more
    uncorrected postings seem to escape; something must be spinning faster.
    I'll adjust my discipline and add a note to superseding postings that
    marks them as such.

    Anno
    Anno Siegel, Jun 16, 2005
    #11
  12. Anno Siegel <-berlin.de> kirjoitti 16.06.2005:
    > Henry Law <> wrote in comp.lang.perl.misc:
    >>
    >> I'm getting really tangled up trying to turn my exclude specifications
    >> into regexes which I can then use to exclude relevant files; not only
    >> am I not a very experienced Perl coder but the logic of the task turns
    >> out to be quite complicated. For example

    >
    > A glob-to-regex translator wouldn't be very hard to write (I think).


    It seems this has, in fact, been done already.

    http://search.cpan.org/dist/Text-Glob/
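
    Going by the module's documentation, usage would be along these
    lines. This is a sketch that assumes Text::Glob is installed from
    CPAN and that its defaults are left alone, in which case "*" does
    not match a "/". As far as I can tell it treats a backslash as an
    escape character, so Windows-style specs would presumably need
    converting to forward slashes first.

```perl
use strict;
use warnings;
use Text::Glob qw(match_glob glob_to_regex);

my @names = (
    'F:/NOTES/DATA/ABBC2.NSF',
    'F:/NOTES/DATA/FURBLE/ABBC2.NSF',
);

# match_glob() returns the names that match the glob pattern.
my @hits = match_glob('F:/NOTES/DATA/*.NSF', @names);
print "$_ matches\n" for @hits;

# The same pattern as a compiled regex, for reuse inside a loop.
my $re = glob_to_regex('F:/NOTES/DATA/*.NSF');
```

    Only the file directly in F:/NOTES/DATA should come back from
    match_glob() here.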

    --
    Ilmari Karonen
    To reply by e-mail, please replace ".invalid" with ".net" in address.
    Ilmari Karonen, Jun 17, 2005
    #12
  13. Anno Siegel

    Anno Siegel Guest

    Ilmari Karonen <> wrote in comp.lang.perl.misc:
    > Anno Siegel <-berlin.de> kirjoitti 16.06.2005:
    > > Henry Law <> wrote in comp.lang.perl.misc:
    > >>
    > >> I'm getting really tangled up trying to turn my exclude specifications
    > >> into regexes which I can then use to exclude relevant files; not only
    > >> am I not a very experienced Perl coder but the logic of the task turns
    > >> out to be quite complicated. For example

    > >
    > > A glob-to-regex translator wouldn't be very hard to write (I think).

    >
    > It seems this has, in fact, been done already.
    >
    > http://search.cpan.org/dist/Text-Glob/


    Ah, yes. I haven't run it, but the doc looks good.

    Anno
    Anno Siegel, Jun 18, 2005
    #13