RegExp: Matching

Discussion in 'Perl Misc' started by Tore Aursand, Sep 23, 2003.

  1. Tore Aursand

    Hi!

    I'm totally stuck with a regular expression. It actually has to do with
    my Apache configuration (I've posted to alt.apache.configuration), but as
    regular expressions are quite similar, I hope it's OK to post my question
    here as well.

    The problem is that I want to match (inside a <DirectoryMatch> directive)
    '/var/www/html/test/' and all the subdirectories. However, I _do not_
    want the regular expression to match on subdirectories which begin with an
    underscore ('_').

    Example:

    /var/www/html/test/ - Match
    /var/www/html/test - Match
    /var/www/html/test/2 - Match
    /var/www/html/test/23/ - Match
    /var/www/html/test/_foo/ - Do _not_ match

    Thanks for any help!


    --
    Tore Aursand <>

    "You know the world is going crazy when the best rapper is white, the best
    golfer is black, France is accusing US of arrogance and Germany doesn't
    want to go to war."
    Tore Aursand, Sep 23, 2003

  2. Tore Aursand wrote:
    > /var/www/html/test/ - Match
    > /var/www/html/test - Match
    > /var/www/html/test/2 - Match
    > /var/www/html/test/23/ - Match
    > /var/www/html/test/_foo/ - Do _not_ match


    /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/

    Chief S.
    Chief Squawtendrawpet, Sep 23, 2003

  3. On Tue, 23 Sep 2003 03:42:02 +0200,
    Tore Aursand <> wrote:
    > Hi!
    >
    > I'm totally stuck with a regular expression. It has actually to do with
    > my Apache configuration (I've posted to alt.apache.configuration), but as
    > regular expressions are quite similar, I hope it's OK to post my question
    > here as well.


    I have no idea whether Apache's RE is similar to Perl's. I'll answer
    for Perl RE, and leave it up to you to translate.

    If you had crossposted, you might have avoided people saying the same
    thing in the two disparate places.

    > The problem is that I want to match (inside a <DirectoryMatch> directive)
    > '/var/www/html/test/' and all the subdirectories. However, I _do not_
    > want the regular expression to match on subdirectories which begin with an
    > underscore ('_').
    >
    > Example:
    >
    > /var/www/html/test/ - Match
    > /var/www/html/test - Match
    > /var/www/html/test/2 - Match
    > /var/www/html/test/23/ - Match
    > /var/www/html/test/_foo/ - Do _not_ match



    I wouldn't normally try to capture this in a single regexp, but I
    guess that's what you're looking for, so...

    my $dir = "/var/www/html/test";

    print "matches" if m#\A^$dir(/[^_]|/?\Z)#;

    More readable:

    m{
        \A^$dir   # Start with the directory
        (         # followed by either
          /[^_]   # a slash and a non-underscore character
        |         # or
          /?\Z    # the end of the string, optionally preceded by slash
        )
    }x;

    If Apache doesn't have '\A' and '\Z', you can probably replace them
    with '^' and '$'.
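A minimal, self-contained sketch of Martien's pattern run over the sample paths from the original question. Assumptions of this sketch, not part of the post: `\Q...\E` quotes `$dir` so any metacharacters in it match literally, the redundant `^` after `\A` is dropped, `\z` replaces `\Z` (so a trailing newline is not accepted), the group is non-capturing, and `matches_top_level` is a made-up name.

```perl
use strict;
use warnings;

# Directory prefix from the post; \Q...\E makes it match literally.
my $dir = "/var/www/html/test";

# $dir followed by either "/" plus a non-underscore character,
# or the end of the string (optionally after one slash).
sub matches_top_level {
    my ($path) = @_;
    return $path =~ m{\A\Q$dir\E(?:/[^_]|/?\z)} ? 1 : 0;
}

for my $path (
    "/var/www/html/test/",
    "/var/www/html/test",
    "/var/www/html/test/2",
    "/var/www/html/test/23/",
    "/var/www/html/test/_foo/",
) {
    printf "%-28s %s\n", $path, matches_top_level($path) ? "match" : "no match";
}
```

Only `/var/www/html/test/_foo/` reports "no match". A path such as `/var/www/html/test/subdir/_foo/` still matches, because the pattern only inspects the first character after `$dir`.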

    Martien
    --
    |
    Martien Verbruggen | True seekers can always find something to
    Trading Post Australia | believe in.
    |
    Martien Verbruggen, Sep 23, 2003
  4. Tore Aursand

    On Mon, 22 Sep 2003 19:05:12 -0700, Chief Squawtendrawpet wrote:
    >> /var/www/html/test/ - Match
    >> /var/www/html/test - Match
    >> /var/www/html/test/2 - Match
    >> /var/www/html/test/23/ - Match
    >> /var/www/html/test/_foo/ - Do _not_ match


    > /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/


    Thanks for the answer. This regular expression doesn't work as
    intended, however. It still matches on '/var/www/html/test/subdir/_foo/',
    which it should skip.


    --
    Tore Aursand <>
    Tore Aursand, Sep 23, 2003
  5. Tore Aursand

    On Tue, 23 Sep 2003 02:09:29 +0000, Martien Verbruggen wrote:
    >> /var/www/html/test/ - Match
    >> /var/www/html/test - Match
    >> /var/www/html/test/2 - Match
    >> /var/www/html/test/23/ - Match
    >> /var/www/html/test/_foo/ - Do _not_ match


    > I wouldn't normally try to capture this in a single regexp, but I
    > guess that's what you're looking for, so...


    That's right. Unless I can find another way, which I don't think I can,
    I need to catch this in _one_ regular expression. Ack! :)

    > my $dir = "/var/www/html/test";
    > print "matches" if m#\A^$dir(/[^_]|/?\Z)#;


    I can't get this one to work as intended either. I even tried it in
    Perl with a list of possible directory names.

    It matches on '/var/www/html/test/subdir/_foo/', but it shouldn't.

    Any idea?



    --
    Tore Aursand <>
    Tore Aursand, Sep 23, 2003
  6. Anno Siegel

    Martien Verbruggen <> wrote in comp.lang.perl.misc:
    > On Tue, 23 Sep 2003 03:42:02 +0200,
    > Tore Aursand <> wrote:
    > > Hi!
    > >
    > > I'm totally stuck with a regular expression. It has actually to do with
    > > my Apache configuration (I've posted to alt.apache.configuration), but as
    > > regular expressions are quite similar, I hope it's OK to post my question
    > > here as well.

    >
    > I have no idea whether Apache's RE is similar to Perl's. I'll answer
    > for Perl RE, and leave it up to you to translate.
    >
    > If you had crossposted you might have avoided people saying the same
    > thing in the two disparate places.
    >
    > > The problem is that I want to match (inside a <DirectoryMatch> directive)
    > > '/var/www/html/test/' and all the subdirectories. However, I _do not_
    > > want the regular expression to match on subdirectories which begin with an
    > > underscore ('_').
    > >
    > > Example:
    > >
    > > /var/www/html/test/ - Match
    > > /var/www/html/test - Match
    > > /var/www/html/test/2 - Match
    > > /var/www/html/test/23/ - Match
    > > /var/www/html/test/_foo/ - Do _not_ match

    >
    >
    > I wouldn't normally try to capture this in a single regexp, but I
    > guess that's what you're looking for, so...


    Indeed. In Perl, this would be a typical case where a single-regex
    match is possible, but a combination with other techniques simplifies
    things.

    > my $dir = "/var/www/html/test";
    >
    > print "matches" if m#\A^$dir(/[^_]|/?\Z)#;
    >
    > More readable:
    >
    > m{
    > \A^$dir # Start with the directory
    > ( # followed by either
    > /[^_] # a slash and a non-underscore character
    > | # or
    > /?\Z # the end of the string, optionally preceded by slash
    > )
    > }x;
    >
    > If Apache doesn't have '\A' and '\Z', you can probably replace them
    > with '^' and '$'.


    Less readable, but more general:

    my $slash = qr{/(?!_)}; # slash not followed by "_"
    my $name = qr{[^/]*}; # a string of non-slashes

    /^(?:$slash$name)*$/

    This matches all fully qualified path names where no component name
    starts with a "_". For use with apache, the regex must be expanded:

    (?-xism:^(?:(?-xism:/(?!_))(?-xism:[^/]*))*$)

    Parts of that may still have to go... I'm not sure how much of
    "(?...)" syntax apache understands.
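A sketch of Anno's lookahead version, anchored for illustration to the directory from the question. The prefix, the `\Q...\E` quoting and the `allowed` helper are assumptions of this sketch, not part of the post. Because the lookahead is applied at every slash, a nested `_` component is rejected as well as a top-level one.

```perl
use strict;
use warnings;

# Anno's building blocks.
my $slash = qr{/(?!_)};   # a slash not followed by "_"
my $name  = qr{[^/]*};    # a run of non-slash characters

# Assumption: anchor the general pattern to the question's directory.
my $dir = "/var/www/html/test";
my $re  = qr{^\Q$dir\E(?:$slash$name)*$};

sub allowed {
    my ($path) = @_;
    return $path =~ $re ? 1 : 0;
}

print "$re\n";    # the stringified form differs between Perl versions
for my $path ( "/var/www/html/test/23/", "/var/www/html/test/subdir/_foo/" ) {
    print allowed($path) ? "match:    $path\n" : "no match: $path\n";
}
```

Recent Perls stringify a `qr//` object as `(?^:...)` rather than the `(?-xism:...)` form quoted above, so the expanded text to paste elsewhere depends on the Perl version used to print it.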

    Anno
    Anno Siegel, Sep 23, 2003
  7. Tore Aursand wrote:
    > > /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/

    >
    > Thanks for the answer. This regular expression doesn't work the way
    > intended, however. It still matches on '/var/www/html/test/subdir/_foo/',
    > which it should skip.


    Not on my Perl. But you should use Martien's regex; it's simpler.

    for (<DATA>) {
        chomp;
        print "Match: $&\n" if /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/;
    }
    __DATA__
    /var/www/html/test/
    /var/www/html/test
    /var/www/html/test/2
    /var/www/html/test/23/
    /var/www/html/test/_foo/


    # OUTPUT

    Match: /var/www/html/test/
    Match: /var/www/html/test
    Match: /var/www/html/test/2
    Match: /var/www/html/test/23/
    Chief Squawtendrawpet, Sep 23, 2003
  8. Tore Aursand

    On Tue, 23 Sep 2003 11:55:53 -0700, Chief Squawtendrawpet wrote:
    >>> /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/


    >> Thanks for the answer. This regular expression doesn't work the way
    >> intended, however. It still matches on '/var/www/html/test/subdir/_foo/',
    >> which it should skip.


    > Not on my Perl.


    Then something is wrong with your Perl, I assume. Remember that I don't
    want the regular expression to match on "subdirectories of subdirectories"
    either.

    > for (<DATA>){
    > chomp;
    > print "Match: $&\n" if /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/;
    > }
    > __DATA__
    > /var/www/html/test/
    > /var/www/html/test
    > /var/www/html/test/2
    > /var/www/html/test/23/
    > /var/www/html/test/_foo/


    So...

    /var/www/html/test/subdir/_foo

    ...also matches, but it shouldn't. :)


    --
    Tore Aursand <>
    Tore Aursand, Sep 23, 2003
  9. Tore Aursand

    On Tue, 23 Sep 2003 10:35:40 +0000, Anno Siegel wrote:
    > Indeed. In Perl, this would be a typical case where a single-regex
    > match is possible, but a combination with other techniques simplifies
    > things.


    AFAIK, there's no way I can accomplish what I'm trying to do without doing
    all this matching in _one_ regular expression.

    Your regexp works _perfectly_ in Perl, but doesn't seem to do the same in
    Apache. Hard to debug in Apache, really, and I don't get any errors or
    warnings when running a test on the configuration file.

    However, I might have found a way around the whole problem, as I might
    need to match _even more_. :)

    The problem is that I've written a quite tricky ApacheHandler. It needs
    to handle _everything_ in '/var/www/html/Application/' and all the subdirs
    (and their content), except sub-directories starting with '_'.

    I now see that I might be able to do this matching in the ApacheHandler
    itself...? Does anyone know if it's possible to hand control back to
    Apache from a handler you've written yourself?
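A sketch of the kind of path test such a handler could run before doing any work. The root directory comes from the post; the function name `wants_path` is hypothetical, and how the result is wired into the handler is left open. In mod_perl a handler can return `DECLINED` to let Apache carry on with its normal processing, but that part is not shown or tested here.

```perl
use strict;
use warnings;

# Root directory from the post; wants_path is a hypothetical helper.
my $app_root = "/var/www/html/Application";

sub wants_path {
    my ($path) = @_;
    # The path must be the root itself or lie below it ...
    return 0 unless $path =~ m{\A\Q$app_root\E(?:/|\z)};
    # ... and no component below the root may start with "_".
    my $rest = substr( $path, length $app_root );
    return $rest =~ m{/_} ? 0 : 1;
}

for my $path (
    "/var/www/html/Application/index.html",
    "/var/www/html/Application/sub/_private/x.html",
) {
    printf "%s -> %s\n", $path, wants_path($path) ? "handle" : "decline";
}
```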

    Anyway, I guess alt.apache.configuration is the right place.


    --
    Tore Aursand <>
    Tore Aursand, Sep 23, 2003
  10. Tore Aursand wrote:
    > Then something is wrong with your Perl, I assume. Remember that I don't
    > want the regular expression to match on "subdirectories of subdirectories"
    > either.


    My mistake for not reading carefully enough, though you could have nudged
    us in the right direction had you included just one more entry in your
    original sample data, and your OP didn't place any real emphasis on the
    subdir-of-subdir issue. Sorry for the confusion.

    Chief S.
    Chief Squawtendrawpet, Sep 23, 2003
  11. On Tue, 23 Sep 2003 11:55:09 +0200,
    Tore Aursand <> wrote:
    > On Tue, 23 Sep 2003 02:09:29 +0000, Martien Verbruggen wrote:
    >>> /var/www/html/test/ - Match
    >>> /var/www/html/test - Match
    >>> /var/www/html/test/2 - Match
    >>> /var/www/html/test/23/ - Match
    >>> /var/www/html/test/_foo/ - Do _not_ match

    >
    >> I wouldn't normally try to capture this in a single regexp, but I
    >> guess that's what you're looking for, so...

    >
    > That's right. If I had the chance, which I don't think I have, I need to
    > catch this in _one_ regular expression. Ack! :)
    >
    >> my $dir = "/var/www/html/test";
    >> print "matches" if m#\A^$dir(/[^_]|/?\Z)#;

    >
    > I don't get this one to work the way intended, either. Even tried it in
    > Perl with a list of possible directory names.
    >
    > It matches on '/var/www/html/test/subdir/_foo/', but it shouldn't.


    Oh. I didn't get that at all from the original post. Maybe you
    should have included that as an example as well.

    > Any idea?


    That makes it altogether more difficult to do in one regex, and I
    suspect you'd need to use features of Perl's RE that won't be
    available in Apache. I wouldn't even try to do this in a single regex
    anymore in Perl, and I do think that trying to come up with a single
    one for Perl would be futile, given that it has to work in Apache.

    In Perl, I'd say:

    print "match" if m#\A$dir# and not m#/_#;

    or, if you want to make sure you don't match

    /var/www/html/test.dir

    print "match" if m#\A$dir(/|\Z)# and not m#/_#;

    Expressing that "not" in a regular expression is tough, if at all
    possible. I expect it's possible, but, as I said, I think it'd require
    some of Perl's specific RE features.
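A sketch comparing Martien's two-test approach with a single Perl regex that uses a negative lookahead, one of the Perl-specific features he alludes to. The lookahead regex, the `\Q...\E` quoting, the `\z` anchor and both helper names are assumptions of this sketch; lookahead may well not be available in the regex engine Apache uses.

```perl
use strict;
use warnings;

my $dir = "/var/www/html/test";

# Two tests: $dir followed by "/" or end of string, and no "/_" anywhere.
sub two_tests {
    my ($path) = @_;
    return ( $path =~ m{\A\Q$dir\E(?:/|\z)} && $path !~ m{/_} ) ? 1 : 0;
}

# One regex: the lookahead (?!.*/_) at the start rejects any path that
# contains "/_"; /s lets "." also cross newlines, mirroring m{/_} above.
sub one_regex {
    my ($path) = @_;
    return $path =~ m{\A(?!.*/_)\Q$dir\E(?:/|\z)}s ? 1 : 0;
}

for my $path (
    "/var/www/html/test/",
    "/var/www/html/test/2",
    "/var/www/html/test/_foo/",
    "/var/www/html/test/subdir/_foo/",
    "/var/www/html/test.dir",
) {
    printf "%-32s two tests: %d  one regex: %d\n",
        $path, two_tests($path), one_regex($path);
}
```

The lookahead has to sit before the prefix is consumed: placed after `(?:/|\z)`, it would no longer see the slash in front of `_foo` and would wrongly accept `/var/www/html/test/_foo/`.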

    Martien
    --
    |
    Martien Verbruggen | If at first you don't succeed, destroy all
    Trading Post Australia | evidence that you tried.
    |
    Martien Verbruggen, Sep 23, 2003
  12. [This followup was posted to comp.lang.perl.misc]

    In article <>,
    says...
    > Hi!
    >
    > I'm totally stuck with a regular expression. It has actually to do with
    > my Apache configuration (I've posted to alt.apache.configuration), but as
    > regular expressions are quite similar, I hope it's OK to post my question
    > here as well.
    >
    > The problem is that I want to match (inside a <DirectoryMatch> directive)
    > '/var/www/html/test/' and all the subdirectories. However, I _do not_
    > want the regular expression to match on subdirectories which begin with an
    > underscore ('_').
    >
    > Example:
    >
    > /var/www/html/test/ - Match
    > /var/www/html/test - Match
    > /var/www/html/test/2 - Match
    > /var/www/html/test/23/ - Match
    > /var/www/html/test/_foo/ - Do _not_ match
    >
    > Thanks for any help!


    #!/usr/bin/perl
    use strict;
    use warnings;

    my @paths = ( "/var/www/html/test/", "/var/www/html/test",
                  "/var/www/html/test/2", "/var/www/html/test/23/",
                  "/var/www/html/test/_foo/" );

    foreach my $path (@paths) {
        # split drops trailing empty fields, so a trailing "/" is harmless
        my @parts    = split( /\//, $path );
        my $lastpart = $parts[-1];    # only the last component is checked
        if ( $lastpart =~ m/^_/ ) {
            print "Do not match [$path]\n";
        }
        else {
            print "A match for [$path]!\n";
        }
    }

    exit 0;
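A variation on Barry's split-based idea, sketched here rather than taken from his post: instead of looking only at the last component, it rejects a path if any component starts with `_`, which also covers the `/var/www/html/test/subdir/_foo` case raised elsewhere in the thread. Like the original, it does not check the `/var/www/html/test` prefix; the helper name `no_underscore_component` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true when no component of $path starts with "_".
sub no_underscore_component {
    my ($path) = @_;
    # split on "/", then drop the empty fields produced by a leading
    # "/" (or by doubled slashes)
    my @parts = grep { length } split m{/}, $path;
    my $bad   = grep { /^_/ } @parts;    # number of "_" components
    return $bad ? 0 : 1;
}

for my $path (
    "/var/www/html/test/",
    "/var/www/html/test/23/",
    "/var/www/html/test/_foo/",
    "/var/www/html/test/subdir/_foo/",
) {
    if ( no_underscore_component($path) ) {
        print "A match for [$path]!\n";
    }
    else {
        print "Do not match [$path]\n";
    }
}
```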
    Barry Kimelman, Sep 24, 2003
  13. Anno Siegel

    [posted and mailed]

    Tore Aursand <> wrote in comp.lang.perl.misc:
    > On Tue, 23 Sep 2003 10:35:40 +0000, Anno Siegel wrote:
    > > Indeed. In Perl, this would be a typical case where a single-regex
    > > match is possible, but a combination with other techniques simplifies
    > > things.

    >
    > AFAIK, there's no way I can accomplish what I'm trying to do without doing
    > all this matching in _one_ regular expression.
    >
    > Your regexp works _perfect_ in Perl, but doesn't seem to do the same in
    > Apache. Hard to debug in Apache, really, and I don't get any errors or
    > warnings when running a test on the configuration file.


    Here is another one that doesn't use the lookaround features of
    Perl regexes:

    m{^(/[^_/][^/]*)*/?$}

    It may need some tweaking for marginal cases (double slashes are
    probably not treated right), but apache should understand it.
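A sketch that prepends the directory from the question to Anno's lookaround-free pattern and runs it in Perl (the prefix and the `allowed_posix` name are assumptions). It uses only grouping, character classes, `*` and `?`, so the same pattern text is a plausible candidate for a `<DirectoryMatch>` directive, although that has not been tested here. The doubled-slash path shows the marginal case Anno mentions: it is rejected.

```perl
use strict;
use warnings;

# Anno's pattern, anchored to the directory from the question.
# Each repetition of the group is "/" + a first character that is
# neither "_" nor "/" + the rest of that path component.
my $re = qr{^/var/www/html/test(/[^_/][^/]*)*/?$};

sub allowed_posix {
    my ($path) = @_;
    return $path =~ $re ? 1 : 0;
}

for my $path (
    "/var/www/html/test/",      "/var/www/html/test",
    "/var/www/html/test/2",     "/var/www/html/test/23/",
    "/var/www/html/test/_foo/", "/var/www/html/test/subdir/_foo/",
    "/var/www/html/test//x",
) {
    printf "%-32s %s\n", $path, allowed_posix($path) ? "match" : "no match";
}
```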

    Anno
    Anno Siegel, Sep 25, 2003