custom regexp modifiers

Josef Lechner

I have to compare pathnames that use different formats; Windows
(.\foo\bar), UNIX (./foo/bar), and sometimes mixed (.\foo/bar). To my
script, all of those pathnames in parentheses are equivalent. However,
perl's regular expressions do not see it that way. The way I've been
handling this is by first converting to UNIX style then comparing.

This works fine, but it really irks me that this is really no different
from what the /i modifier does, in that the /i modifier just treats
upper and lower case letters as if they're the same. I think it is
reasonable to want to do the same sort of deal with different
characters (i.e. /'s and \'s). So, is there a way to make my own
modifiers? perlre mentions (near the end) that qr// can be overloaded
to make your own escape sequences but I'm not sure if this strategy
would work for me.
 
Paul

I think the easiest way is the way you are doing it, by converting all
paths to the same format before comparing them.

$path =~ s|[\\\/]|/|sg;
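For readers who want the normalize-then-compare approach in one place, here is a minimal sketch. The helper names normalize_path and same_path are made up for illustration and do not come from the thread. The /s modifier is left off because the pattern contains no dot.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every backslash as a forward slash.
sub normalize_path {
    my ($path) = @_;
    $path =~ s{\\}{/}g;
    return $path;
}

# Hypothetical helper: true when two paths agree after normalization.
sub same_path {
    my ( $left, $right ) = @_;
    return normalize_path($left) eq normalize_path($right);
}

print same_path( '.\foo\bar', './foo/bar' ) ? "same\n" : "different\n";
print same_path( '.\foo/bar', './foo/baz' ) ? "same\n" : "different\n";
```

The first call prints "same" and the second prints "different".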
 
Uri Guttman

P> Actually the following may be better:
P> $path =~ s|\\(?:\\)?|/|sg;

???

why the grouping? \\? is 0 or 1 \

why not just \\{1,2} ?

why even care about multiple \ at all?

if so, then tr/// is better:

tr{\\}{/}
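A short sketch of the tr/// suggestion, applied to a copy of the path rather than to $_. The variable names are illustrative only. The second half assumes you do want runs of separators collapsed, which is what the /s modifier of tr does (it squeezes repeated translated characters; it is unrelated to the /s of s///).

```perl
use strict;
use warnings;

# One backslash per separator: tr maps each "\" to "/".
my $plain = '.\foo\bar';
( my $converted = $plain ) =~ tr{\\}{/};
print "$converted\n";    # ./foo/bar

# Doubled backslashes: in single quotes, \\\\ is two literal backslashes.
my $doubled = '.\\\\foo\\\\bar';
( my $squeezed = $doubled ) =~ tr{\\/}{/}s;
print "$squeezed\n";     # ./foo/bar
```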

uri
 
Brian McCauley

Josef Lechner said:
I have to compare pathnames that use different formats; Windows
(.\foo\bar), UNIX (./foo/bar), and sometimes mixed (.\foo/bar). To my
script, all of those pathnames in parentheses are equivalent. However,
perl's regular expressions do not see it that way. The way I've been
handling this is by first converting to UNIX style then comparing.

This works fine, but it really irks me that this is really no different
from what the /i modifier does, in that the /i modifier just treats
upper and lower case letters as if they're the same. I think it is
reasonable to want to do the same sort of deal with different
characters (i.e. /'s and \'s). So, is there a way to make my own
modifiers? perlre mentions (near the end) that qr// can be overloaded
to make your own escape sequences but I'm not sure if this strategy
would work for me.

It would work. You could preprocess every / in a regex into [\\/],
unless it's already inside a character class, in which case you'd want
to preprocess it to \\/.
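To make the preprocessing rule concrete, here is a deliberately naive sketch that works on the pattern string directly instead of a parse tree. The function any_slash is hypothetical. It assumes the pattern uses only backslash escapes and simple [...] classes; it does not handle a literal ] placed first in a class, POSIX classes such as [[:alpha:]], or /x comments, which is the kind of detail that makes real regex parsing a lot of work.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of any module: rewrite a pattern string so
# that every "/" (bare or written as "\/") also matches "\".
sub any_slash {
    my ($pattern) = @_;
    my $out      = '';
    my $in_class = 0;

    # Walk the pattern one token at a time: an escaped pair or one character.
    while ( $pattern =~ /\G(\\.|.)/gs ) {
        my $token = $1;
        if ( $token eq '/' or $token eq '\\/' ) {
            # Inside a class emit \\/ ; outside a class emit [\\/].
            $out .= $in_class ? '\\\\/' : '[\\\\/]';
            next;
        }
        if    ( $token eq '[' ) { $in_class = 1 }
        elsif ( $token eq ']' ) { $in_class = 0 }
        $out .= $token;
    }
    return $out;
}

my $source = any_slash('^\./foo/[^/]+\z');
my $re     = qr/$source/;
for my $path ( './foo/bar', '.\foo\bar', '.\foo/bar', './foo/bar/baz' ) {
    print "$path: ", ( $path =~ $re ? 'match' : 'no match' ), "\n";
}
```

The first three paths match and the last one does not, because the rewritten class [^\\/] refuses both kinds of slash.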

The trouble is that parsing a regex is not trivial so this is a lot of
work.

YAPE::Regex will make a parse tree of the regex but I can't immediately
see from the YAPE::Regex documentation how to traverse and mutate this
tree cleanly through the API other than by digging in the internals.

Here's a first cut, but what we really need is better accessors in
YAPE::Regex.

package anyslash;
use strict;
use warnings;
use overload;
use YAPE::Regex;

# Install convert() as the handler for regex constants in the caller's scope.
sub import {
    overload::constant 'qr' => \&convert;
}

# Ugly look at the inside of the parse tree!
sub mutate {
    my $node = shift;

    # Recurse into child nodes first; a text node may expand into several.
    if ( ref $node->{CONTENT} eq 'ARRAY' ) {
        @{ $node->{CONTENT} } = map { mutate($_) } @{ $node->{CONTENT} };
    }
    my $type = $node->type;

    # Inside a character class, turn each / into \\/ .
    if ( $type eq 'class' ) {
        $node->{TEXT} =~ s/\//\\\\\//g;
    }

    # In literal text, replace each / with a class node holding \\/ .
    if ( $type eq 'text' ) {
        my @nodes;
        for my $token ( $node->{TEXT} =~ /[^\/]+|\//g ) {
            if ( $token eq '/' ) {
                push @nodes => YAPE::Regex::class->new("\\\\/");
            }
            else {
                push @nodes => YAPE::Regex::text->new($token);
            }
        }
        return @nodes;
    }
    return $node;
}

# Parse the regex source, rewrite the tree, and return the new source.
sub convert {
    my $parser = YAPE::Regex->new(shift);
    $parser->parse;
    return mutate( $parser->root )->fullstring;
}

1;
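With the module above saved as anyslash.pm, a script would enable it with "use anyslash;". To show the hook itself without depending on YAPE::Regex, here is a self-contained sketch that registers a much cruder handler from a BEGIN block. The handler and the helper matches_foo_bar are hypothetical, and the handler knows nothing about character classes; it only illustrates how overload::constant rewrites regex literals compiled after it in the same lexical scope.

```perl
use strict;
use warnings;
use overload;

BEGIN {
    # Hypothetical stand-in for convert(): keep backslash escapes as they
    # are and turn every other "/" into the class [\\/].
    overload::constant(
        qr => sub {
            my ($source) = @_;
            $source =~ s{(\\.)|/}{ defined $1 ? $1 : '[\\\\/]' }gse;
            return $source;
        }
    );
}

# The pattern below is written UNIX style; the handler widens it when
# the literal is compiled.
sub matches_foo_bar {
    my ($path) = @_;
    return $path =~ m{^\./foo/bar\z} ? 1 : 0;
}

for my $path ( './foo/bar', '.\foo\bar', '.\foo/bar', './foo-bar' ) {
    print "$path: ", ( matches_foo_bar($path) ? 'match' : 'no match' ), "\n";
}
```

Because the handler is installed lexically, regex literals in other files or enclosing scopes keep their normal meaning.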
 
Tad McClellan

Paul said:
Actually the following may be better:


$path =~ s|\\(?:\\)?|/|sg;


The "s" modifier applies only to dots in the pattern.

You do not have any dots in your pattern.

That "s" modifier is a no-op, so why is it there?
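A small sketch of the point about the "s" modifier: it changes what the dot matches (by letting it match a newline) and has no effect on a pattern without a dot. The variable names are illustrative only.

```perl
use strict;
use warnings;

# With a dot in the pattern, /s decides whether "." may match a newline.
my $text      = "a\nb";
my $without_s = $text =~ /a.b/  ? 'match' : 'no match';
my $with_s    = $text =~ /a.b/s ? 'match' : 'no match';
print "without /s: $without_s\n";    # no match
print "with /s:    $with_s\n";       # match

# Without a dot in the pattern, /s changes nothing.
( my $sg = '.\foo\bar' ) =~ s|\\|/|sg;
( my $g  = '.\foo\bar' ) =~ s|\\|/|g;
print $sg eq $g ? "identical\n" : "different\n";    # identical
```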
 
Paul

I'll certainly do that in future, but I find it hard to believe that
you can't figure out the context of my post when the previous post
asks "why is it there?" and my reply begins "it shouldn't be there".
 
Uri Guttman

what context? not everyone sees all posts on usenet and not always in the
same order. read the guidelines posted here regularly for more on how to
work with this group. context is all.

uri
 
axel

Paul said:
I'll certainly do that in future, but I find it hard to believe that
you can't figure out the context of my post when the previous post
asks "why is it there?" and my reply begins "it shouldn't be there".

This post is the first one I read on the c.l.p.m group today. To figure
out the context means that I have to trawl back through previous posts
to establish that context - hardly a very productive use of time. It
becomes even worse when to make real sense one needs to compare what is
being commented on and the comment at the same time.

Moreover the concept of *the* previous post is a bit meaningless in
Usenet... any order based on time will vary between different servers,
and how many contributions might have been made to a particular thread
in the meantime?

There is the concept of the article that you are following up to, but
many newsreaders do not make it easy to jump directly to this article
(which in any case may no longer be available) and it can still mean having
to switch between two articles.

Axel
 
Tad McClellan

Paul said:
I find it hard to believe that
you can't figure out the context of my post


That is because you do not understand how Usenet works.

(Google Groups is not Usenet and most participants here do not
view the group via Google Groups.)

Your reply may arrive *before* the post that you are replying to.

The post that you are replying to may *never* arrive.

Your post may be found as part of a web search (now that *is*
Google Groups) even years from now, in which case the reader
may have access to no other posts in the thread.

Many folks are often participating in many threads at once,
so context is extremely important to help them remember what
was said in this one.


But the reason that trumps all of them is that it is socially
unacceptable to not quote context on Usenet.

So you either quote context or you take heat from your peers.
It's every poster's choice.
 
