custom regexp modifiers

Josef Lechner · Jan 26, 2007

I have to compare pathnames that use different formats: Windows
(.\foo\bar), UNIX (./foo/bar), and sometimes mixed (.\foo/bar). To my
script, all of those pathnames in parentheses are equivalent. However,
perl's regular expressions do not see it that way. The way I've been
handling this is by first converting to UNIX style then comparing.

This works fine, but it really irks me that this is really no different
from what the /i modifier does, in that the /i modifier just treats
upper and lower case letters as if they're the same. I think it is
reasonable to want to do the same sort of deal with different
characters (i.e. /'s and \'s). So, is there a way to make my own
modifiers? perlre mentions (near the end) that qr// can be overloaded
to make your own escape sequences but I'm not sure if this strategy
would work for me.

Paul · Jan 27, 2007

I think the easiest way is the way you are doing it, by converting all
paths to the same format before comparing them.

$path =~ s|[\\\/]|/|sg;

Paul · Jan 27, 2007

Actually the following may be better:

$path =~ s|\\(?:\\)?|/|sg;

Uri Guttman · Jan 27, 2007

P> Actually the following may be better:
P> $path =~ s|\\(?:\\)?|/|sg;

???

why the grouping? \\? is 0 or 1 \

why not just \\{1,2} ?

why even care about multiple \ at all?

if so, then tr/// is better:

tr{\\}{/}

uri
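
A minimal sketch of the normalize-then-compare approach discussed above, built on tr/// as suggested. The helper names normalize_path and same_path are made up for illustration. Adding the /s modifier to tr/// squeezes runs of separators into one, which is an extra assumption about what "equivalent" should mean here.

```perl
use strict;
use warnings;

# Hypothetical helper: map every backslash or slash to a forward slash.
# The /s modifier on tr/// squeezes runs of translated characters, so
# a doubled backslash or a doubled slash collapses to a single "/".
sub normalize_path {
    my ($path) = @_;
    (my $copy = $path) =~ tr{\\/}{/}s;
    return $copy;
}

# Hypothetical helper: two paths are "the same" if they normalize equally.
sub same_path {
    my ($left, $right) = @_;
    return normalize_path($left) eq normalize_path($right);
}

for my $candidate ('.\foo\bar', '.\foo/bar', './foo/baz') {
    printf "%-10s %s\n", $candidate,
        same_path($candidate, './foo/bar') ? 'same' : 'different';
}
```

Note that squeezing would also collapse the leading double backslash of a UNC path, so drop the /s if that distinction matters.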

Brian McCauley · Jan 27, 2007

Josef Lechner said:
This works fine, but it really irks me that this is really no different
from what the /i modifier does, in that the /i modifier just treats
upper and lower case letters as if they're the same. I think it is
reasonable to want to do the same sort of deal with different
characters (i.e. /'s and \'s). So, is there a way to make my own
modifiers? perlre mentions (near the end) that qr// can be overloaded
to make your own escape sequences but I'm not sure if this strategy
would work for me.

It would work. You could preprocess every / in a regex into [\\/],
unless it's already inside a character class, in which case you'd want
to preprocess it to \\/.
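
To illustrate what such a rewritten pattern looks like, here is a small sketch that sidesteps regex parsing by starting from a literal path instead of a regex. The helper name path_to_regex is hypothetical, and this only covers the case where the thing being matched is a plain path, not an arbitrary pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a literal path into a regex in which every
# separator, however it was written, matches either "/" or "\".
sub path_to_regex {
    my ($path) = @_;
    # Split on either separator; -1 keeps trailing empty fields.
    my @parts = map { quotemeta($_) } split m{[\\/]}, $path, -1;
    # The single-quoted string below is the five characters [\\/]
    my $pattern = join '[\\\\/]', @parts;
    return qr/\A$pattern\z/;
}

my $re = path_to_regex('./foo/bar');
for my $candidate ('.\foo\bar', './foo/bar', '.\foo/bar', './foo/baz') {
    printf "%-10s %s\n", $candidate,
        $candidate =~ $re ? 'matches' : 'no match';
}
```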

The trouble is that parsing a regex is not trivial so this is a lot of
work.

YAPE::Regex will make a parse tree of the regex but I can't immediately
see from the YAPE::Regex documentation how to traverse and mutate this
tree cleanly through the API other than by digging in the internals.

Here's a first cut, but what we really need is better accessors in
YAPE::Regex.

package anyslash;
use strict;
use warnings;
use overload;
use YAPE::Regex;

# Install convert() as the handler for regex constants in the caller's scope.
sub import {
    overload::constant 'qr' => \&convert;
}

# Ugly look at the inside of the parse tree!
# Recursively rewrite a node; a text node may expand into several nodes.
sub mutate {
    my $node = shift;
    if ( ref $node->{CONTENT} eq 'ARRAY' ) {
        @{ $node->{CONTENT} } = map { mutate($_) } @{ $node->{CONTENT} };
    }
    my $type = $node->type;
    if ( $type eq 'class' ) {
        # Inside an existing character class, / becomes \\/
        $node->{TEXT} =~ s/\//\\\\\//g;
    }
    if ( $type eq 'text' ) {
        # Split literal text at each / and swap it for a class node
        my @nodes;
        for my $token ( $node->{TEXT} =~ /[^\/]+|\//g ) {
            if ( $token eq '/' ) {
                push @nodes => YAPE::Regex::class->new("\\\\/");
            } else {
                push @nodes => YAPE::Regex::text->new($token);
            }
        }
        return @nodes;
    }
    return $node;
}

# Called with the source text of each constant piece of a regex.
sub convert {
    my $parser = YAPE::Regex->new(shift);
    $parser->parse;
    return mutate($parser->root)->fullstring;
}

1;
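
For readers who want to see the overload::constant hook on its own, here is a much cruder single-file sketch that does no parsing at all. It is not the module above: the handler name naive_anyslash and the test sub is_foo_bar are made up, and the rewrite deliberately ignores character classes, escaped backslashes, and every other piece of regex syntax, so it only suits simple patterns written with delimiters other than /.

```perl
use strict;
use warnings;
use overload ();

# Defined before the hook is installed, so the regex inside this sub
# is compiled without the rewrite being applied to it.
sub naive_anyslash {
    my ($source) = @_;    # source text of one constant piece of a regex
    # Rewrite each "/" not preceded by a backslash as the class [\\/].
    # Naive on purpose: a "/" inside [...] would be mangled.
    $source =~ s{(?<!\\)/}{[\\\\/]}g;
    return $source;
}

# Install the handler for the rest of this file's lexical scope.
BEGIN { overload::constant( qr => \&naive_anyslash ) }

# Hypothetical test sub, compiled with the handler in effect.
sub is_foo_bar {
    my ($path) = @_;
    return $path =~ m{\A\./foo/bar\z} ? 1 : 0;
}

for my $candidate ('./foo/bar', '.\foo\bar', '.\foo/bar', './foo') {
    print "$candidate: ", is_foo_bar($candidate), "\n";
}
```

The gap between this and the parse-tree version is exactly the "parsing a regex is not trivial" problem: without a parser there is no reliable way to tell a literal / from one inside a class.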

Paul · Jan 27, 2007

Yep, TIMTOWTDI.

Uri Guttman said:
why even care about multiple \ at all?

Windows.

Tad McClellan · Jan 27, 2007

Paul said:
Actually the following may be better:

$path =~ s|\\(?:\\)?|/|sg;

The "s" modifier applies only to dots in the pattern.

You do not have any dots in your pattern.

That "s" modifier is a no-op, so why is it there?
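
A short sketch of the point about /s, using made-up sample strings: the modifier only changes whether "." matches a newline, so on a pattern with no dot it changes nothing.

```perl
use strict;
use warnings;

my $text = "foo\nbar";

# Without /s, "." does not match a newline.
my $plain = $text =~ /foo.bar/ ? 1 : 0;

# With /s, "." matches any character, including a newline.
my $single_line = $text =~ /foo.bar/s ? 1 : 0;

print "without /s: $plain, with /s: $single_line\n";

# A pattern with no dot behaves identically with or without /s.
(my $with_s    = '.\foo\bar') =~ s|\\|/|sg;
(my $without_s = '.\foo\bar') =~ s|\\|/|g;
print "same result\n" if $with_s eq $without_s;
```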

Paul · Jan 27, 2007

Correct, it shouldn't be there. Just a lack of concentration.

Paul · Jan 28, 2007

I'll certainly do that in future, but I find it hard to believe that
you can't figure out the context of my post when the previous post
asks "why is it there?" and my reply begins "it shouldn't be there".

Uri Guttman · Jan 28, 2007

what context? not everyone sees all posts on usenet and not always in the
same order. read the guidelines posted here regularly for more on how to
work with this group. context is all.

uri

axel · Jan 28, 2007

Paul said:
I'll certainly do that in future, but I find it hard to believe that
you can't figure out the context of my post when the previous post
asks "why is it there?" and my reply begins "it shouldn't be there".

This post is the first one I read on the c.l.p.m group today. To figure
out the context means that I have to trawl back through previous posts
to establish that context - hardly a very productive use of time. It
becomes even worse when to make real sense one needs to compare what is
being commented on and the comment at the same time.

Moreover the concept of *the* previous post is a bit meaningless in
Usenet... any order based on time will vary between different servers,
and how many contributions might have been made to a particular thread
in the meantime?

There is the concept of the article that you are following up to, but
many newsreaders do not make it easy to jump directly to this article
(which in any case may no longer be available) and it can still mean having
to switch between two articles.

Axel

Tad McClellan · Jan 28, 2007

Paul said:
I find it hard to believe that
you can't figure out the context of my post

That is because you do not understand how Usenet works.

(Google Groups is not Usenet and most participants here do not
view the group via Google Groups.)

Your reply may arrive *before* the post that you are replying to.

The post that you are replying to may *never* arrive.

Your post may be found as part of a web search (now that *is*
Google Groups) even years from now, in which case the reader
may have access to no other posts in the thread.

Many folks are often participating in many threads at once,
so context is extremely important to help them remember what
was said in this one.

But the reason that trumps all of them is that it is socially
unacceptable to not quote context on Usenet.

So you either quote context or you take heat from your peers.
It's every poster's choice.
