RegExp: Matching

Tore Aursand

Hi!

I'm totally stuck with a regular expression. It actually has to do with
my Apache configuration (I've posted to alt.apache.configuration), but as
regular expressions are quite similar, I hope it's OK to post my question
here as well.

The problem is that I want to match (inside a <DirectoryMatch> directive)
'/var/www/html/test/' and all the subdirectories. However, I _do not_
want the regular expression to match on subdirectories which begin with an
underscore ('_').

Example:

/var/www/html/test/ - Match
/var/www/html/test - Match
/var/www/html/test/2 - Match
/var/www/html/test/23/ - Match
/var/www/html/test/_foo/ - Do _not_ match

Thanks for any help!


--
Tore Aursand <[email protected]>

"You know the world is going crazy when the best rapper is white, the best
golfer is black, France is accusing US of arrogance and Germany doesn't
want to go to war."
 
Chief Squawtendrawpet

Tore said:
> /var/www/html/test/ - Match
> /var/www/html/test - Match
> /var/www/html/test/2 - Match
> /var/www/html/test/23/ - Match
> /var/www/html/test/_foo/ - Do _not_ match

/^\/var\/www\/html\/test(\/[^_].*|\/|$)$/

Chief S.
 
Martien Verbruggen

> Hi!
>
> I'm totally stuck with a regular expression. It actually has to do with
> my Apache configuration (I've posted to alt.apache.configuration), but as
> regular expressions are quite similar, I hope it's OK to post my question
> here as well.

I have no idea whether Apache's RE is similar to Perl's. I'll answer
for Perl RE, and leave it up to you to translate.

If you had crossposted you might have avoided people saying the same
thing in the two disparate places.

> The problem is that I want to match (inside a <DirectoryMatch> directive)
> '/var/www/html/test/' and all the subdirectories. However, I _do not_
> want the regular expression to match on subdirectories which begin with an
> underscore ('_').
>
> Example:
>
> /var/www/html/test/ - Match
> /var/www/html/test - Match
> /var/www/html/test/2 - Match
> /var/www/html/test/23/ - Match
> /var/www/html/test/_foo/ - Do _not_ match


I wouldn't normally try to capture this in a single regexp, but I
guess that's what you're looking for, so...

my $dir = "/var/www/html/test";

print "matches" if m#\A^$dir(/[^_]|/?\Z)#;

More readable:

m{
    \A^$dir   # Start with the directory
    (         # followed by either
      /[^_]   #   a slash and a non-underscore character
    |         # or
      /?\Z    #   the end of the string, optionally preceded by slash
    )
}x;

If Apache doesn't have '\A' and '\Z', you can probably replace them
with '^' and '$'.
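A small test loop is an easy way to check this pattern against the sample paths. The last path, `/var/www/html/test/subdir/_foo/`, is not in Tore's original list. It is added here for illustration because it is the nested case that comes up later in the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dir = "/var/www/html/test";

# The single-line pattern from above, compiled once with qr//.
my $re = qr#\A^$dir(/[^_]|/?\Z)#;

my @paths = (
    "/var/www/html/test/",
    "/var/www/html/test",
    "/var/www/html/test/2",
    "/var/www/html/test/23/",
    "/var/www/html/test/_foo/",
    "/var/www/html/test/subdir/_foo/",    # nested case, added for illustration
);

for my $path (@paths) {
    printf "%-33s %s\n", $path, ( $path =~ $re ? "match" : "no match" );
}
```

Only `/var/www/html/test/_foo/` is rejected. The `/[^_]` branch looks at just one character after the first slash below `test`, so the nested `subdir/_foo/` path is still accepted.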

Martien
 
Tore Aursand

> /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/

Thanks for the answer. This regular expression doesn't work as
intended, however. It still matches on '/var/www/html/test/subdir/_foo/',
which it should skip.
 
Tore Aursand

> I wouldn't normally try to capture this in a single regexp, but I
> guess that's what you're looking for, so...

That's right. As far as I can tell I have no other choice: I need to
catch this in _one_ regular expression. Ack! :)

> my $dir = "/var/www/html/test";
> print "matches" if m#\A^$dir(/[^_]|/?\Z)#;

I can't get this one to work as intended either. I even tried it in
Perl with a list of possible directory names.

It matches on '/var/www/html/test/subdir/_foo/', but it shouldn't.

Any idea?
 
Anno Siegel

Martien Verbruggen said:
> I have no idea whether Apache's RE is similar to Perl's. I'll answer
> for Perl RE, and leave it up to you to translate.
>
> If you had crossposted you might have avoided people saying the same
> thing in the two disparate places.

> I wouldn't normally try to capture this in a single regexp, but I
> guess that's what you're looking for, so...

Indeed. In Perl, this would be a typical case where a single-regex
match is possible, but a combination with other techniques simplifies
things.

> my $dir = "/var/www/html/test";
>
> print "matches" if m#\A^$dir(/[^_]|/?\Z)#;
>
> More readable:
>
> m{
>     \A^$dir   # Start with the directory
>     (         # followed by either
>       /[^_]   #   a slash and a non-underscore character
>     |         # or
>       /?\Z    #   the end of the string, optionally preceded by slash
>     )
> }x;
>
> If Apache doesn't have '\A' and '\Z', you can probably replace them
> with '^' and '$'.

Less readable, but more general:

my $slash = qr{/(?!_)}; # slash not followed by "_"
my $name = qr{[^/]*}; # a string of non-slashes

/^(?:$slash$name)*$/

This matches all fully qualified path names where no component name
starts with a "_". For use with Apache, the regex must be expanded:

(?-xism:^(?:(?-xism:/(?!_))(?-xism:[^/]*))*$)

Parts of that may still have to go... I'm not sure how much of the
"(?...)" syntax Apache understands.
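Here is a sketch of the composed pattern in use. `$no_underscore` is Anno's general pattern. `$under_dir` is a hypothetical variant that also requires the `/var/www/html/test` prefix. Printing a `qr//` object shows the expanded text. Newer Perls (5.14 and later) print it with a `(?^:` prefix rather than the `(?-xism:` form quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $slash = qr{/(?!_)};    # slash not followed by "_"
my $name  = qr{[^/]*};     # a string of non-slashes

# Anno's general pattern: no path component may start with "_".
my $no_underscore = qr{^(?:$slash$name)*$};

# Hypothetical variant: the same rule, but only at or below $dir.
my $dir       = "/var/www/html/test";
my $under_dir = qr{^\Q$dir\E(?:$slash$name)*$};

# Show the fully expanded pattern text.
print "$no_underscore\n";

for my $path ( "/var/www/html/test/23/", "/var/www/html/test/subdir/_foo/" ) {
    printf "%-33s %s\n", $path, ( $path =~ $under_dir ? "match" : "no match" );
}
```

Because every `$slash$name` repetition consumes at least one slash, the `*` cannot loop on an empty match. A path like `/var/www/html/test.dir` is also rejected by the prefixed variant.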

Anno
 
Chief Squawtendrawpet

Tore said:
> > /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/
>
> Thanks for the answer. This regular expression doesn't work as
> intended, however. It still matches on '/var/www/html/test/subdir/_foo/',
> which it should skip.

Not on my Perl. But you should use Martien's regex; it's simpler.

for (<DATA>){
    chomp;
    print "Match: $&\n" if /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/;
}
__DATA__
/var/www/html/test/
/var/www/html/test
/var/www/html/test/2
/var/www/html/test/23/
/var/www/html/test/_foo/


# OUTPUT

Match: /var/www/html/test/
Match: /var/www/html/test
Match: /var/www/html/test/2
Match: /var/www/html/test/23/
 
Tore Aursand

> > > /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/
> >
> > Thanks for the answer. This regular expression doesn't work as
> > intended, however. It still matches on '/var/www/html/test/subdir/_foo/',
> > which it should skip.
>
> Not on my Perl.

Then something is wrong with your Perl, I assume. Remember that I don't
want the regular expression to match on "subdirectories of subdirectories"
either.

> for (<DATA>){
>     chomp;
>     print "Match: $&\n" if /^\/var\/www\/html\/test(\/[^_].*|\/|$)$/;
> }
> __DATA__
> /var/www/html/test/
> /var/www/html/test
> /var/www/html/test/2
> /var/www/html/test/23/
> /var/www/html/test/_foo/

So...

/var/www/html/test/subdir/_foo

...also matches, but it shouldn't. :)
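Tore's objection is easy to reproduce. In Chief's pattern, `\/[^_]` only checks the one character after the first slash below `test`. The `.*` then accepts the rest of the path, so an underscore deeper down is never examined. The second path below is the case from Tore's message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Chief's pattern, unchanged.
my $chief = qr/^\/var\/www\/html\/test(\/[^_].*|\/|$)$/;

for my $path ( "/var/www/html/test/_foo/", "/var/www/html/test/subdir/_foo" ) {
    # "/_foo/" fails at [^_]; "/subdir/_foo" passes because ".*" swallows "/_foo"
    printf "%-32s %s\n", $path, ( $path =~ $chief ? "match" : "no match" );
}
```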


 
Tore Aursand

> Indeed. In Perl, this would be a typical case where a single-regex
> match is possible, but a combination with other techniques simplifies
> things.

AFAIK, there's no way I can accomplish what I'm trying to do without doing
all this matching in _one_ regular expression.

Your regexp works _perfectly_ in Perl, but doesn't seem to do the same in
Apache. Hard to debug in Apache, really, and I don't get any errors or
warnings when running a test on the configuration file.

However, I might have found a way around the whole problem, as I might
need to match _even more_. :)

The problem is that I've written a rather tricky ApacheHandler. It needs
to handle _everything_ in '/var/www/html/Application/' and all the subdirs
(and their content), except subdirectories starting with '_'.

I now see that I might be able to do this matching in the ApacheHandler
itself...? Does anyone know if it's possible to hand control back to
Apache from a handler you've written yourself?

Anyway, I guess alt.apache.configuration is the right place.


 
Chief Squawtendrawpet

Tore said:
> Then something is wrong with your Perl, I assume. Remember that I don't
> want the regular expression to match on "subdirectories of subdirectories"
> either.

My mistake for not reading carefully enough, though you could have nudged
us in the right direction had you included just one more entry in your
original sample data, and your OP didn't place any real emphasis on the
subdir-of-subdir issue. Sorry for the confusion.

Chief S.
 
Martien Verbruggen

> > I wouldn't normally try to capture this in a single regexp, but I
> > guess that's what you're looking for, so...
>
> That's right. As far as I can tell I have no other choice: I need to
> catch this in _one_ regular expression. Ack! :)
>
> > my $dir = "/var/www/html/test";
> > print "matches" if m#\A^$dir(/[^_]|/?\Z)#;
>
> I can't get this one to work as intended either. I even tried it in
> Perl with a list of possible directory names.
>
> It matches on '/var/www/html/test/subdir/_foo/', but it shouldn't.

Oh. I didn't get that at all out of the original post. Maybe you
should have included that as an example as well.

> Any idea?

That makes it altogether more difficult to do it in one regex, and I
suspect you'd need to use features of Perl's RE that won't be
available in Apache. I wouldn't even try to do this in a single regex
in Perl anymore, and I think that trying to come up with a single
Perl-only one would be futile anyway, given that it has to work in Apache.

In Perl, I'd say:

print "match" if m#\A$dir# and not m#/_#;

or, if you want to make sure you don't match

/var/www/html/test.dir

print "match" if m#\A$dir(/|\Z)# and not m#/_#;

Expressing that "not" in a regular expression is tough, if at all
possible. I expect it's possible, but, as I said, I think it'd require
some of Perl's specific RE features.
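The two-step check can be wrapped in a small sub and run over a few paths. The sub name `wanted` is made up for this sketch. The `\Q...\E` around `$dir` is an added precaution in case the directory name ever contains regex metacharacters. Otherwise the logic follows the second one-liner above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dir = "/var/www/html/test";

# Hypothetical helper: true if $path is $dir or lies below it,
# and no "/_" (a component starting with "_") appears anywhere.
sub wanted {
    my ($path) = @_;
    return $path =~ m#\A\Q$dir\E(/|\Z)# && $path !~ m#/_#;
}

for my $path (
    "/var/www/html/test",
    "/var/www/html/test/23/",
    "/var/www/html/test/_foo/",
    "/var/www/html/test/subdir/_foo/",
    "/var/www/html/test.dir",
) {
    printf "%-33s %s\n", $path, ( wanted($path) ? "match" : "no match" );
}
```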

Martien
 
Barry Kimelman

[This followup was posted to comp.lang.perl.misc]

> Hi!
>
> I'm totally stuck with a regular expression. It actually has to do with
> my Apache configuration (I've posted to alt.apache.configuration), but as
> regular expressions are quite similar, I hope it's OK to post my question
> here as well.
>
> The problem is that I want to match (inside a <DirectoryMatch> directive)
> '/var/www/html/test/' and all the subdirectories. However, I _do not_
> want the regular expression to match on subdirectories which begin with an
> underscore ('_').
>
> Example:
>
> /var/www/html/test/ - Match
> /var/www/html/test - Match
> /var/www/html/test/2 - Match
> /var/www/html/test/23/ - Match
> /var/www/html/test/_foo/ - Do _not_ match
>
> Thanks for any help!

#!/usr/bin/perl
use strict;
use warnings;

my @paths = ( "/var/www/html/test/", "/var/www/html/test",
              "/var/www/html/test/2", "/var/www/html/test/23/",
              "/var/www/html/test/_foo/" );

foreach my $path ( @paths ) {
    # split drops trailing empty fields, so a trailing slash is harmless
    my @parts = split( m{/}, $path );
    my $lastpart = $parts[-1];
    if ( $lastpart =~ m/^_/ ) {
        print "Do not match [$path]\n";
    }
    else {
        print "A match for [$path]!\n";
    }
}

exit 0;
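Barry's loop only inspects the last path component, so a path such as `/var/www/html/test/_foo/bar` would still be reported as a match. The sketch below is a variant of the same split-based idea that checks every component below the base directory. The `allowed` sub and the prefix test are assumptions added here, not part of Barry's post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dir = "/var/www/html/test";

# Hypothetical helper: the path must be $dir or below it, and no
# component after $dir may start with "_".
sub allowed {
    my ($path) = @_;
    return 0 unless $path =~ m{\A\Q$dir\E(?:/|\z)};
    my $rest  = substr( $path, length $dir );
    my @parts = grep { length } split m{/}, $rest;    # drop empty fields
    my $bad   = grep { /^_/ } @parts;                  # count "_" components
    return $bad ? 0 : 1;
}

for my $path (
    "/var/www/html/test/23/",
    "/var/www/html/test/_foo/",
    "/var/www/html/test/_foo/bar",
    "/var/www/html/test/subdir/_foo",
) {
    printf "%-33s %s\n", $path, ( allowed($path) ? "match" : "no match" );
}
```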
 
Anno Siegel

[posted and mailedsnd.no]

Tore Aursand said:
> AFAIK, there's no way I can accomplish what I'm trying to do without doing
> all this matching in _one_ regular expression.
>
> Your regexp works _perfectly_ in Perl, but doesn't seem to do the same in
> Apache. Hard to debug in Apache, really, and I don't get any errors or
> warnings when running a test on the configuration file.

Here is another one that doesn't use the lookaround features of
Perl regexes:

m{^(/[^_/][^/]*)*/?$}

It may need some tweaking for marginal cases (double slashes are
probably not treated right), but Apache should understand it.
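Anno's lookaround-free pattern can be exercised the same way. It does not mention `/var/www/html/test` at all. It accepts any absolute path whose components avoid a leading `_`. A path containing a double slash simply fails to match, because an empty component cannot satisfy `[^_/]`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Anno's pattern: "/component" pieces whose first character is neither
# "_" nor "/", followed by an optional trailing slash.
my $plain = qr{^(/[^_/][^/]*)*/?$};

for my $path (
    "/var/www/html/test/",
    "/var/www/html/test/23/",
    "/var/www/html/test/subdir/_foo/",
    "/var/www//html/test/",    # double slash: rejected by this pattern
) {
    printf "%-33s %s\n", $path, ( $path =~ $plain ? "match" : "no match" );
}
```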

Anno
 
