RegExp: Matching

Discussion in 'Perl Misc' started by Tore Aursand, Sep 23, 2003.

  1. Tore Aursand

    Tore Aursand Guest

    Hi!

    I'm totally stuck with a regular expression. It actually has to do with
    my Apache configuration (I've posted to alt.apache.configuration), but as
    regular expressions are quite similar in Apache and Perl, I hope it's OK
    to post my question here as well.

    The problem is that I want to match (inside a <DirectoryMatch> directive)
    '/var/www/html/test/' and all the subdirectories. However, I _do not_
    want the regular expression to match on subdirectories which begin with an
    underscore ('_').

    Example:

    /var/www/html/test/ - Match
    /var/www/html/test - Match
    /var/www/html/test/2 - Match
    /var/www/html/test/23/ - Match
    /var/www/html/test/_foo/ - Do _not_ match

    Thanks for any help!


    --
    Tore Aursand

    "You know the world is going crazy when the best rapper is white, the best
    golfer is black, France is accusing US of arrogance and Germany doesn't
    want to go to war."
     
    Tore Aursand, Sep 23, 2003
    #1

  2. /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/

    Chief S.
     
    Chief Squawtendrawpet, Sep 23, 2003
    #2

  3. I have no idea whether Apache's RE is similar to Perl's. I'll answer
    for Perl RE, and leave it up to you to translate.

    If you had crossposted you might have avoided people saying the same
    thing in the two disparate places.

    I wouldn't normally try to capture this in a single regexp, but I
    guess that's what you're looking for, so...

    my $dir = "/var/www/html/test";

    print "matches" if m#\A$dir(/[^_]|/?\Z)#;

    More readable:

    m{
    \A $dir  # start with the directory
    (        # followed by either
    /[^_]    # a slash and a non-underscore character
    |        # or
    /?\Z     # the end of the string, optionally preceded by a slash
    )
    }x;

    If Apache doesn't have '\A' and '\Z', you can probably replace them
    with '^' and '$'.

    Martien
     
    Martien Verbruggen, Sep 23, 2003
    #3
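
    A small harness can make the behaviour of this pattern concrete. The
    sketch below assumes the five sample paths from the first post plus one
    hypothetical nested path, '/var/www/html/test/subdir/_foo/', which is not
    in the original sample data. '\Q...\E' is added only so that $dir is
    matched literally.

    ```perl
    use strict;
    use warnings;

    my $dir = "/var/www/html/test";

    # Martien's pattern; \Q...\E quotes any metacharacters in $dir.
    my $re = qr{\A\Q$dir\E(/[^_]|/?\Z)};

    my @paths = (
        "/var/www/html/test/",
        "/var/www/html/test",
        "/var/www/html/test/2",
        "/var/www/html/test/23/",
        "/var/www/html/test/_foo/",
        "/var/www/html/test/subdir/_foo/",    # hypothetical nested case
    );

    # Record 1 (match) or 0 (no match) for each path.
    my %result;
    $result{$_} = ( $_ =~ $re ) ? 1 : 0 for @paths;

    printf "%-35s %s\n", $_, ( $result{$_} ? "match" : "no match" ) for @paths;
    ```

    Only the component directly after $dir is inspected, so a path with an
    underscore directory further down is still accepted.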
  4. Tore Aursand

    Tore Aursand Guest

    Thanks for the answer. This regular expression doesn't work the way
    intended, however. It still matches on '/var/www/html/test/subdir/_foo/',
    which it should skip.
     
    Tore Aursand, Sep 23, 2003
    #4
  5. Tore Aursand

    Tore Aursand Guest

    That's right. If I had a choice, which I don't think I have, I would do
    it differently, but I need to catch this in _one_ regular expression. Ack! :)
    I can't get this one to work the way intended, either. I even tried it in
    Perl with a list of possible directory names.

    It matches on '/var/www/html/test/subdir/_foo/', but it shouldn't.

    Any idea?
     
    Tore Aursand, Sep 23, 2003
    #5
  6. Anno Siegel

    Anno Siegel Guest

    Indeed. In Perl, this would be a typical case where a single-regex
    match is possible, but a combination with other techniques simplifies
    things.
    Less readable, but more general:

    my $slash = qr{/(?!_)}; # slash not followed by "_"
    my $name = qr{[^/]*}; # a string of non-slashes

    /^(?:$slash$name)*$/

    This matches all fully qualified path names where no component name
    starts with a "_". For use with Apache, the regex must be expanded:

    (?-xism:^(?:(?-xism:/(?!_))(?-xism:[^/]*))*$)

    Parts of that may still have to go... I'm not sure how much of the
    "(?...)" syntax Apache understands.

    Anno
     
    Anno Siegel, Sep 23, 2003
    #6
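
    To see how the two qr pieces behave when anchored to the directory from
    the original question, here is a minimal sketch. The $base prefix, the
    sub name 'wanted' and the test paths are assumptions added for
    illustration. Note also that recent Perl versions stringify compiled
    patterns as '(?^:...)' rather than '(?-xism:...)', so the expanded form
    is best checked on the Perl at hand.

    ```perl
    use strict;
    use warnings;

    my $slash = qr{/(?!_)};    # slash not followed by "_"
    my $name  = qr{[^/]*};     # a string of non-slashes

    # Assumption: restrict the match to one base directory.
    my $base = "/var/www/html/test";
    my $re   = qr{^\Q$base\E(?:$slash$name)*$};

    # Returns 1 if no component below $base starts with "_", else 0.
    sub wanted {
        my ($path) = @_;
        return $path =~ $re ? 1 : 0;
    }

    print "$_: ", ( wanted($_) ? "match" : "no match" ), "\n"
        for "/var/www/html/test/23/", "/var/www/html/test/subdir/_foo/";
    ```

    Because the lookahead is applied at every slash, an underscore directory
    is rejected at any depth, not only directly below $base.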
  7. Not on my Perl. But you should use Martien's regex; it's simpler.

    for (<DATA>) {
        chomp;
        print "Match: $&\n" if /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/;
    }
    __DATA__
    /var/www/html/test/
    /var/www/html/test
    /var/www/html/test/2
    /var/www/html/test/23/
    /var/www/html/test/_foo/


    # OUTPUT

    Match: /var/www/html/test/
    Match: /var/www/html/test
    Match: /var/www/html/test/2
    Match: /var/www/html/test/23/
     
    Chief Squawtendrawpet, Sep 23, 2003
    #7
  8. Tore Aursand

    Tore Aursand Guest

    Then something is wrong with your Perl, I assume. Remember that I don't
    want the regular expression to match on "subdirectories of subdirectories"
    either.
    So...

    /var/www/html/test/subdir/_foo

    ...also matches, but it shouldn't. :)


     
    Tore Aursand, Sep 23, 2003
    #8
  9. Tore Aursand

    Tore Aursand Guest

    AFAIK, there's no way I can accomplish what I'm trying to do without doing
    all this matching in _one_ regular expression.

    Your regexp works _perfectly_ in Perl, but doesn't seem to do the same in
    Apache. Hard to debug in Apache, really, and I don't get any errors or
    warnings when running a test on the configuration file.

    However. I might have found a way round the whole problem, as I might
    need to match _even more_. :)

    The problem is that I've written a quite tricky ApacheHandler. It needs
    to handle _everything_ in '/var/www/html/Application/' and all the subdirs
    (and their content), except sub-directories starting with '_'.

    I now see that I might be able to do this matching in the ApacheHandler
    itself...? Anyone know if it's possible to give the control back to
    Apache from a Handler written by yourself?

    Anyway. Guess alt.apache.configuration is the right place.


     
    Tore Aursand, Sep 23, 2003
    #9
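
    On handing control back: assuming the handler is a mod_perl handler
    (which the post does not state), the usual convention is to return
    DECLINED instead of OK so that Apache carries on with its own handling.
    The sketch below is an illustration under that assumption, not the
    poster's handler. It defines stand-in constants locally so that it runs
    outside Apache, and 'dispatch' is a hypothetical helper that only decides
    which status to return for a given path.

    ```perl
    use strict;
    use warnings;

    # Stand-ins for the constants mod_perl normally provides; defined here
    # only so the sketch runs outside Apache.
    use constant { OK => 0, DECLINED => -1 };

    my $base = "/var/www/html/Application";

    # Hypothetical helper: decide whether our handler should serve $path.
    sub dispatch {
        my ($path) = @_;

        # Not under the application directory: leave it to Apache.
        return DECLINED unless $path =~ m{\A\Q$base\E(?:/|\z)};

        # Some component starts with "_": leave it to Apache.
        return DECLINED if $path =~ m{/_};

        return OK;    # the real handler would generate the response here
    }

    print dispatch("/var/www/html/Application/_private/x") == DECLINED
        ? "declined\n"
        : "handled\n";
    ```

    Doing the test in Perl code sidesteps the question of what the
    <DirectoryMatch> regex engine supports.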
  10. My mistake for not reading carefully enough, though you could have nudged
    us in the right direction had you included just one more entry in your
    original sample data, and your OP didn't place any real emphasis on the
    subdir-of-subdir issue. Sorry for the confusion.

    Chief S.
     
    Chief Squawtendrawpet, Sep 23, 2003
    #10
  11. Oh. I didn't get that at all out of the original post. Maybe you
    should have included that as an example as well.
    That makes it altogether more difficult to do in one regex, and I
    suspect you'd need to use features of Perl's RE that won't be
    available in Apache. I wouldn't even try to do this in a single regex
    anymore in Perl, and I do think that trying to come up with a single
    one for Perl would be futile, given that it has to work in Apache.

    In Perl, I'd say:

    print "match" if m#\A$dir# and not m#/_#;

    or, if you want to make sure you don't match

    /var/www/html/test.dir

    print "match" if m#\A$dir(/|\Z)# and not m#/_#;

    Expressing that "not" in a regular expression is tough, if at all
    possible. I expect it's possible, but, as I said, I think it'd require
    some of Perl's specific RE features.

    Martien
     
    Martien Verbruggen, Sep 23, 2003
    #11
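
    The "not" can be folded into a single pattern with a negative lookahead,
    which is one of the Perl-specific features alluded to. This is a sketch
    under the assumption that the regex engine supports '(?!...)'; whether
    the Apache in question does is not established here. The sub names
    'two_tests' and 'one_regex' are made up for the comparison.

    ```perl
    use strict;
    use warnings;

    my $dir = "/var/www/html/test";

    # The two-test form: under $dir, and no component starting with "_".
    sub two_tests {
        my ($path) = @_;
        my $ok = $path =~ m#\A\Q$dir\E(/|\Z)# && $path !~ m#/_#;
        return $ok ? 1 : 0;
    }

    # Single pattern: the lookahead rejects any path containing "/_".
    sub one_regex {
        my ($path) = @_;
        return $path =~ m#\A(?!.*/_)\Q$dir\E(/|\Z)# ? 1 : 0;
    }

    for my $path ( "/var/www/html/test/2", "/var/www/html/test/subdir/_foo" ) {
        print "$path: ", two_tests($path), " ", one_regex($path), "\n";
    }
    ```

    Both subs should agree on every path; the two-test form stays easier to
    read and does not depend on lookahead support.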
  12. [This followup was posted to comp.lang.perl.misc]

    #!/usr/bin/perl -w

    @paths = ( "/var/www/html/test/", "/var/www/html/test",
               "/var/www/html/test/2", "/var/www/html/test/23/",
               "/var/www/html/test/_foo/" );

    foreach $path ( @paths ) {
        @parts = split(/\//, $path);
        $lastpart = $parts[$#parts];
        if ( $lastpart =~ m/^_/ ) {
            print "Do not match [$path]\n";
        }
        else {
            print "A match for [$path]!\n";
        }
    }

    exit 0;
     
    Barry Kimelman, Sep 24, 2003
    #12
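
    Since the script above looks only at the last path component, a nested
    path such as '/var/www/html/test/_foo/bar' would be reported as a match.
    A possible variant, sketched here with a hypothetical helper named
    'has_underscore_part', checks every component instead:

    ```perl
    use strict;
    use warnings;

    # Returns 1 if any component of $path starts with "_", else 0.
    sub has_underscore_part {
        my ($path) = @_;
        my @parts  = split m{/}, $path;
        my $count  = grep { /^_/ } @parts;    # grep in scalar context counts hits
        return $count ? 1 : 0;
    }

    for my $path ( "/var/www/html/test/23/", "/var/www/html/test/_foo/bar" ) {
        if ( has_underscore_part($path) ) {
            print "Do not match [$path]\n";
        }
        else {
            print "A match for [$path]!\n";
        }
    }
    ```

    Like the original script, this does not check that the path lies under
    '/var/www/html/test'; that test would have to be added separately.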
  13. Anno Siegel

    Anno Siegel Guest

    [posted and mailed]

    Here is another one that doesn't use the lookaround features of
    Perl regexes:

    m{^(/[^_/][^/]*)*/?$}

    It may need some tweaking for marginal cases (double slashes are
    probably not treated right), but Apache should understand it.

    Anno
     
    Anno Siegel, Sep 25, 2003
    #13
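
    A quick way to probe the marginal cases is to run the pattern over a few
    paths. The sketch below prepends the directory from the original question
    (an assumption; the pattern as posted matches any absolute path) and
    includes a double-slash path as a hypothetical edge case.

    ```perl
    use strict;
    use warnings;

    # Assumption: anchor the pattern to the base directory.
    my $base = "/var/www/html/test";
    my $re   = qr{^\Q$base\E(/[^_/][^/]*)*/?$};

    my @paths = (
        "/var/www/html/test/23/",
        "/var/www/html/test/subdir/_foo/",
        "/var/www/html/test//23",             # double slash
    );

    # Collect the paths the pattern accepts.
    my @accepted = grep { $_ =~ $re } @paths;

    print "accepted: $_\n" for @accepted;
    ```

    The double-slash path is rejected because every component must begin with
    a character that is neither '_' nor '/', which is the marginal case
    mentioned above.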
