2-1/2 regexp questions


J Krugman

1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a'). I could use some labored code like this:

$string = 'abracadabra';
($substring = substr($string, 7, 3)) =~ s/(a[^a]+)/;
substr($string, 7, 3) = $substring;

This requires creating an auxiliary variable $substring and two calls
to substr, which seems like a waste. Is there any way to target s///
to a particular substring in a string?

2. Given any finite string S and given any regexp R, there is a finite
set A of (o, w) pairs, such that the substring S[o, w] of S, beginning
at offset o and having length w matches R [1]. For example, if

S = '1a21b4xy' and
R = /(\d+)/,

then the pairs in the set A would be

(0, 1) --> '1'
(2, 1) --> '2'
(2, 2) --> '21'
(3, 1) --> '1'
(5, 1) --> '4'

Does Perl offer any simple and/or built-in way to generate the set
A?

2.5 There's a *different* problem, simpler for me to code than the one
described in (2): generate the set B of all pairs (length($PREMATCH),
length($MATCH)) generated during a "global" (i.e. /g-modified) match.
For example:

while ($S =~ /(\d+)/g) {
push @pairs, [ map length, ($`, $&) ];
}

For the string S and the regexp R in the example given in question 2,
the set B would be { (0, 1), (2, 2), (5, 1) }.

Does Perl offer some built-in mechanism to get directly at the set B?

TIA,

jill


[1] If we changed R in (2) to the anchored regexp /^(\d+)/ we would
get the same set A as before, and not the set { (0, 1) }, because each
substring S[o, w] is tested against R in isolation.
 

Paul Lalli

1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a'). I could use some labored code like this:

$string = 'abracadabra';
($substring = substr($string, 7, 3)) =~ s/(a[^a]+)/;
substr($string, 7, 3) = $substring;

This requires creating an auxiliary variable $substring and two calls
to substr, which seems like a waste. Is there any way to target s///
to a particular substring in a string?

This one's easy.

substr($string, 7, 3) =~ s/(a[^a]+)/;


The others... I'm gonna let someone else have a go.

Paul Lalli



Paul Lalli

1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a'). I could use some labored code like this:

$string = 'abracadabra';
($substring = substr($string, 7, 3)) =~ s/(a[^a]+)/;
substr($string, 7, 3) = $substring;

This requires creating an auxiliary variable $substring and two calls
to substr, which seems like a waste. Is there any way to target s///
to a particular substring in a string?

This one's easy.

substr($string, 7, 3) =~ s/(a[^a]+)/;

Bah. I copy&pasted your typo...
substr($string, 7, 3) =~ s/(a[^a]+)/*\U$1*/;

Paul Lalli
 

Brian McCauley

J Krugman said:
1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a'). I could use some labored code like this:

$string = 'abracadabra';
($substring = substr($string, 7, 3)) =~ s/(a[^a]+)/;
substr($string, 7, 3) = $substring;

This requires creating an auxiliary variable $substring and two calls
to substr, which seems like a waste. Is there any way to target s///
to a particular substring in a string?

Err.... just do it.

substr() is an lvalue function, so you can use it on the left-hand side of =~.

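A minimal sketch of that point, using the string and substitution from the question. The second half is an added illustration (not from the thread) that the pattern only ever sees the selected window, so a match that would need characters outside it does not happen.

```perl
use strict;
use warnings;

my $string = 'abracadabra';

# substr() as an lvalue: s/// edits the 3-character window starting at
# offset 7 in place, and the result is spliced back into $string.
substr($string, 7, 3) =~ s/(a[^a]+)/*\U$1*/;
print "$string\n";

# The window is matched in isolation: 'abr' cannot match inside the
# 2-character window 'ab', so $other is left untouched.
my $other = 'abracadabra';
substr($other, 0, 2) =~ s/abr/X/;
print "$other\n";
```

The replacement may be longer or shorter than the window; the rest of the string shifts accordingly.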
2. Given any finite string S and given any regexp R, there is a finite
set A of (o, w) pairs, such that the substring S[o, w] of S, beginning
at offset o and having length w matches R [1].

Please write in Perl. (Have you seen the posting guidelines that are
posted frequently?)

I shall assume you meant to say:

Given a string $S and a regexp $R there is a finite set of pairs $o,$w
such that substr($S,$o,$w)=~/\A$R\Z/ is true.
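That definition can be turned directly into a brute-force sketch. This is only an illustration, not something posted in the thread: it tries every (offset, width) pair with width >= 1 and tests each substring in isolation. It uses \z rather than \Z so that a trailing newline is not ignored, and all_pairs() is a made-up helper name.

```perl
use strict;
use warnings;

# all_pairs() is a hypothetical helper name, not a Perl built-in.
# It returns every [offset, width] pair (width >= 1) whose substring,
# taken in isolation, matches $R from its first to its last character.
sub all_pairs {
    my ($S, $R) = @_;
    my @A;
    for my $o (0 .. length($S) - 1) {
        for my $w (1 .. length($S) - $o) {
            push @A, [$o, $w] if substr($S, $o, $w) =~ /\A$R\z/;
        }
    }
    return @A;
}

my @A = all_pairs('1a21b4xy', qr/\d+/);
print "($_->[0], $_->[1])\n" for @A;

# Because each substring is tested on its own, an anchored pattern
# yields the same pairs, as the footnote in the question requires.
my @anchored = all_pairs('1a21b4xy', qr/^\d+/);
```

This costs one regex match per candidate substring, so it is quadratic in the length of the string; it is meant to pin down the definition, not to be fast.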

For example, if

S = '1a21b4xy' and
R = /(\d+)/,

my $S = '1a21b4xy';
my $R = qr/\d+/;
then the pairs in the set A would be

(0, 1) --> '1'
(2, 1) --> '2'
(2, 2) --> '21'
(3, 1) --> '1'
(5, 1) --> '4'

Does Perl offer any simple and/or built-in way to generate the set
A?

If we ignore $w for a moment you can easily get the set of $o such
that substr($S,$o)=~/\A$R/ is true.

my @A;
while ( $S =~ /(?=$R)/g ) {
    push @A, [ pos($S), undef ];
}

To get the width it's a bit more involved:

my @A;
{
my @w;

my $record_width = qr/(?{ push @w, length $1 })/;
while ( $S =~ /(?=($R)$record_width)/g ) {
push @A, map { [ pos($S), $_ ] } @w;
@w=();
}
}

For some reason I get an extra copy of [5,1] in @A. I can't
immediately see why.

It also seems odd that, whilst $1 seems to do the right thing inside
the (?{}), $+[1] does not.

2.5 There's a *different* problem, simpler for me to code than the one
described in (2): generate the set B of all pairs (length($PREMATCH),
length($MATCH)) generated during a "global" (i.e. /g-modified) match.
For example:

while ($S =~ /(\d+)/g) {
push @pairs, [ map length, ($`, $&) ];
}

You should not use length($PREMATCH), length($MATCH).

Use $-[0] and $+[0]-$-[0] instead.
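As a short sketch of that suggestion: @- and @+ hold the start and end offsets of the last successful match (they are described in perlvar, also under the names @LAST_MATCH_START and @LAST_MATCH_END), so the pairs in set B can be collected without touching $` or $&. The variable names here are only illustrative.

```perl
use strict;
use warnings;

my $S = '1a21b4xy';
my @B;

while ($S =~ /\d+/g) {
    # $-[0] is the offset where the whole match starts,
    # $+[0] is the offset just past its end.
    push @B, [ $-[0], $+[0] - $-[0] ];
}

print "($_->[0], $_->[1])\n" for @B;
```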
Does Perl offer some built-in mechanism to get directly at the set B?

No.

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\
 

Paul Lalli

2. Given any finite string S and given any regexp R, there is a finite
set A of (o, w) pairs, such that the substring S[o, w] of S, beginning
at offset o and having length w matches R [1]. For example, if

S = '1a21b4xy' and
R = /(\d+)/,

then the pairs in the set A would be

(0, 1) --> '1'
(2, 1) --> '2'
(2, 2) --> '21'
(3, 1) --> '1'
(5, 1) --> '4'

Does Perl offer any simple and/or built-in way to generate the set
A?

2.5 There's a *different* problem, simpler for me to code than the one
described in (2): generate the set B of all pairs (length($PREMATCH),
length($MATCH)) generated during a "global" (i.e. /g-modified) match.
For example:

while ($S =~ /(\d+)/g) {
push @pairs, [ map length, ($`, $&) ];
}

For the string S and the regexp R in the example given in question 2,
the set B would be { (0, 1), (2, 2), (5, 1) }.

Does Perl offer some built-in mechanism to get directly at the set B?

Still no idea how to go about 2, but you might be able to fiddle with my
solution to 2.5 to see what you can come up with....
$s = '1a21b4xy';
$r = qr/(\d+)/;
while ($s =~ /$r/g){
$w = length($1);
$o = pos ($s) - $w;
print "($o, $w) --> $1\n";
}
__END__
(0, 1) --> 1
(2, 2) --> 21
(5, 1) --> 4

I don't know how you want to define built-in mechanism in this case, but
this seems quick enough to me. (Your method is certainly short enough
too, but using the $`, $&, and $' variables is considered bad form for
mostly historical reasons...)

Paul Lalli
 

Tassilo v. Parseval

Thus spoke J Krugman:
1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a'). I could use some labored code like this:

$string = 'abracadabra';
($substring = substr($string, 7, 3)) =~ s/(a[^a]+)/;
substr($string, 7, 3) = $substring;

This requires creating an auxiliary variable $substring and two calls
to substr, which seems like a waste. Is there any way to target s///
to a particular substring in a string?

substr() returns an lvalue, and you are already making use of that in
the above code. Since an lvalue can also be the target of a substitution,
nothing stops you from writing:

substr($string, 7, 3) =~ s/(.*)/*\U$1*/;
2. Given any finite string S and given any regexp R, there is a finite
set A of (o, w) pairs, such that the substring S[o, w] of S, beginning
at offset o and having length w matches R [1]. For example, if

S = '1a21b4xy' and
R = /(\d+)/,

then the pairs in the set A would be

(0, 1) --> '1'
(2, 1) --> '2'
(2, 2) --> '21'
(3, 1) --> '1'
(5, 1) --> '4'

Does Perl offer any simple and/or built-in way to generate the set
A?

Hardly. This can be done recursively:

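use Data::Dumper;

my $s = '1a21b4xy';
my @s;

# Seed the recursion with every maximal run of digits.
find_set($1, pos($s) - length($1)) while $s =~ /(\d+)/g;
print Dumper \@s;

# Record $m at $offset, after recursing into its shorter pieces.
# Note: /g finds non-overlapping matches only, so longer runs are
# not fully covered (e.g. '23' inside '123' is never visited).
sub find_set {
    my ($m, $offset) = @_;
    if (length $m > 1) {
        my $num = length($m) - 1;
        find_set($1, $offset + pos($m) - length($1)) while $m =~ /(\d{$num})/g;
    }
    push @s, [ $offset, length($m), $m ];
}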

I wouldn't call that a beauty though.
2.5 There's a *different* problem, simpler for me to code than the one
described in (2): generate the set B of all pairs (length($PREMATCH),
length($MATCH)) generated during a "global" (i.e. /g-modified) match.
For example:

while ($S =~ /(\d+)/g) {
push @pairs, [ map length, ($`, $&) ];
}

For the string S and the regexp R in the example given in question 2,
the set B would be { (0, 1), (2, 2), (5, 1) }.

Does Perl offer some built-in mechanism to get directly at the set B?

Maybe not directly, but it's not very hard either:

my @s;
while ($s =~ /(\d+)/g) {
push @s, [ pos($s) - length($1), length($1) ];
}
[1] If we changed R in (2) to the anchored regexp /^(\d+)/ we would
get the same set A as before, and not the set { (0, 1) }, because each
substring S[o, w] is tested against R in isolation.

In this case you have to change the algorithm. Mine will _not_ work with
an anchored pattern. As a matter of fact, I doubt that mine works for
every (non-anchored) pattern.

Tassilo
 

Irving Kimura

Abigail said:
my $re = qr /($r)(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})(?!)/;

I assume that "$- [-1]" and "$+ [-1]" refer to elements of some
special arrays @- and @+ , but I have not been able to find anything
on these arrays by grepping through the online documentation.

Where in the Perl documentation are @- and @+ explained?

Thanks!

Irv
 

J Krugman

First I want to give thanks for all the great replies to my post;
I learned a lot from them. (Not least of which is that, even
though once upon a time I slogged through the 2nd Edition of
Programming Perl, I really should slog through the 3rd Edition.
Too bad O'Reilly never published a Delta edition.)

Abigail said:
use re 'eval';
my $s = '1a21b4xy';
my $r = '\d+';
my $re = qr /($r)(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})(?!)/;
$s =~ $re;

I realize that the (?!) at the end of $re's definition is necessary
to get the right results, but I don't understand why. I figure it
has to do with getting the regexp engine to move to the right state,
but I really have no clue...

jill
 

J Krugman

Abigail said:
J Krugman wrote on MMMCMXVII September
MCMXCIII in <URL:{}
{} > my $re = qr /($r)(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
{} > " --> '$^N'\n"})(?!)/;
{}
{} I realize that the (?!) at the end of $re's definition is necessary
{} to get the right results, but I don't understand why.

(?!) is a (sub)expression that will always fail. It forces the regex
engine to try all possibilities. Otherwise, it will match '1', report
the match, and call it a day because it has achieved a match.

*Very* cool. Thanks.

jill
perl -le '"Just another Perl Hacker" =~ /(.+$)(?{ print $1 })(?!)/'
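
The effect of (?!) described above can be seen in a small sketch. This is an illustration rather than the code quoted in the thread: it collects the pairs in an array instead of printing them, and it computes the offset from pos() and $^N inside the code block. It assumes a reasonably recent Perl (5.18 or later), where a (?{ }) block in a literal pattern can see the lexical @A reliably.

```perl
use strict;
use warnings;

my $s = '1a21b4xy';
my @A;

# Each time (\d+) matches, the code block records the candidate.
# (?!) then always fails, so the engine backtracks into \d+ (trying
# shorter runs) and moves on to later start positions until every
# possibility is exhausted. The match as a whole never succeeds.
# Inside the block, pos() is the current match position and $^N is
# the text of the most recently closed group.
$s =~ /(\d+)(?{ push @A, [ pos() - length($^N), length($^N), $^N ] })(?!)/;

printf "(%d, %d) --> '%s'\n", @$_ for @A;
```

The pairs come out in the order the engine tries them, not sorted by width. Without the trailing (?!), the match would succeed on the first '1' and only one pair would be recorded.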
 

Anno Siegel

J Krugman said:
1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a').

I'm late to the thread, but here's a little supplement:

To do a replacement at a certain offset, consider pos() and /\G/ as a
possible alternative to substr(). In particular,

$_ = 'abracadabra';
pos = 7;
s/\G(...)/*\U$1*/;

does the replacement as required. You don't get to control the
length directly; it is implicit in the pattern (here, the three dots).

The standard solution is, of course, substr(), but occasionally the
alternative comes in handy, especially when the offset can be
determined with a scalar /.../g which puts it directly in pos().
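
A rough sketch of that last case, with the offset found by a scalar /g match instead of being hard-coded. The choice of matching on 'd' is only an assumption made for the example; any pattern that ends where the replacement should begin would do.

```perl
use strict;
use warnings;

my $str = 'abracadabra';

# A /g match in scalar context leaves pos($str) just past the match,
# here right after the 'd'.
$str =~ /d/g or die "no 'd' found";
my $offset = pos($str);

# \G anchors the substitution at pos($str), so only the three
# characters starting there are considered.
$str =~ s/\G(...)/*\U$1*/;

print "$offset: $str\n";
```

Note that assigning a new value to the string resets pos(), so the /g match and the \G substitution have to operate on the same, unmodified variable.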

Anno
 

nobull

Abigail said:
use re 'eval';
my $r = '\d+';

my $re = qr /($r)(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})(?!)/;

I prefer to avoid the need for "use re 'eval'" by compiling the (?{})
in a separate qr//.

my $r = '\d+';
my $re = qr /(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})/;
$re = qr /($r)$re(?!)/;
 

nobull

Abigail said:
use re 'eval';
my $r = '\d+';

my $re = qr /($r)(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})(?!)/;

I prefer to avoid the need for "use re 'eval'" by compiling the (?{})
in a separate qr//.

my $r = '\d+';
my $re = qr /(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})/;
$re = qr /($r)$re(?!)/;

Sorry, those subscripts in @- and @+ should be 1 not -1 in case $r
contains any capturing groups.

my $r = '\d+';
my $re = qr /(?{print "(", $-[1], ", ", $+[1] - $-[1], ")\n"})/;
$re = qr /($r)$re(?!)/;
 
