2-1/2 regexp questions

J Krugman · May 21, 2004

1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a'). I could use some labored code like this:

$string = 'abracadabra';
($substring = substr($string, 7, 3)) =~ s/(a[^a]+)/;
substr($string, 7, 3) = $substring;

This requires creating an auxiliary variable $substring and two calls
to substr, which seems like a waste. Is there any way to target s///
to a particular substring in a string?

2. Given any finite string S and given any regexp R, there is a finite
set A of (o, w) pairs, such that the substring S[o, w] of S, beginning
at offset o and having length w matches R [1]. For example, if

S = '1a21b4xy' and
R = /(\d+)/,

then the pairs in the set A would be

(0, 1) --> '1'
(2, 1) --> '2'
(2, 2) --> '21'
(3, 1) --> '1'
(5, 1) --> '4'

Does Perl offer any simple and/or built-in way to generate the set
A?

2.5 There's a *different* problem, simpler for me to code than the one
described in (2): generate the set B of all pairs (length($PREMATCH),
length($MATCH)) generated during a "global" (i.e. /g-modified) match.
For example:

while ($S =~ /(\d+)/g) {
push @pairs, [ map length, ($`, $&) ];
}

For the string S and the regexp R in the example given in question 2,
the set B would be { (0, 1), (2, 2), (5, 1) }.

Does Perl offer some built-in mechanism to get directly at the set B?

TIA,

jill

[1] If we changed R in (2) to the anchored regexp /^(\d+)/ we would
get the same set A as before, and not the set { (0, 1) }, because each
substring S[o, w] is tested against R in isolation.

Paul Lalli · May 21, 2004

1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a'). I could use some labored code like this:

$string = 'abracadabra';
($substring = substr($string, 7, 3)) =~ s/(a[^a]+)/;
substr($string, 7, 3) = $substring;

This requires creating an auxiliary variable $substring and two calls
to substr, which seems like a waste. Is there any way to target s///
to a particular substring in a string?

This one's easy.

substr($string, 7, 3) =~ s/(a[^a]+)/;

The others... I'm gonna let someone else have a go.

Paul Lalli

Paul Lalli · May 21, 2004

1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a'). I could use some labored code like this:

$string = 'abracadabra';
($substring = substr($string, 7, 3)) =~ s/(a[^a]+)/;
substr($string, 7, 3) = $substring;

This requires creating an auxiliary variable $substring and two calls
to substr, which seems like a waste. Is there any way to target s///
to a particular substring in a string?

This one's easy.

substr($string, 7, 3) =~ s/(a[^a]+)/;

Bah. I copy&pasted your typo...
substr($string, 7, 3) =~ s/(a[^a]+)/*\U$1*/;

Paul Lalli

Brian McCauley · May 21, 2004

J Krugman said:
1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a'). I could use some labored code like this:

$string = 'abracadabra';
($substring = substr($string, 7, 3)) =~ s/(a[^a]+)/;
substr($string, 7, 3) = $substring;

This requires creating an auxiliary variable $substring and two calls
to substr, which seems like a waste. Is there any way to target s///
to a particular substring in a string?

Err.... just do it.

substr() is an lvalue function, so you can use it on the LHS of =~.
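
A minimal sketch of that lvalue use, with the string and substitution taken from the question. Only the strict/warnings lines and the print are additions.

```perl
use strict;
use warnings;

my $string = 'abracadabra';

# substr() is used as an lvalue here, so s/// only sees (and only edits)
# the 3 characters starting at offset 7.
substr($string, 7, 3) =~ s/(a[^a]+)/*\U$1*/;

print "$string\n";   # abracad*ABR*a
```

No auxiliary variable is needed, and the replacement may be longer or shorter than the original substring.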

2. Given any finite string S and given any regexp R, there is a finite
set A of (o, w) pairs, such that the substring S[o, w] of S, beginning
at offset o and having length w matches R [1].

Please write in Perl. (Have you seen the posting guidelines that are
posted frequently?)

I shall assume you mean to say:

Given a string $S and a regexp $R there is a finite set of pairs $o,$w
such that substr($S,$o,$w)=~/\A$R\Z/ is true.
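
Read literally, that definition can be checked by brute force over every ($o, $w). The sketch below is only an illustration of the definition and was not proposed by anyone in the thread. It makes two assumptions: it uses \z instead of \Z so that a substring ending in a newline is not accepted, and it skips zero-width substrings.

```perl
use strict;
use warnings;

my $S = '1a21b4xy';
my $R = qr/\d+/;
my @A;

for my $o (0 .. length($S) - 1) {
    for my $w (1 .. length($S) - $o) {
        # Test each substring against $R in isolation, anchored at both ends.
        push @A, [ $o, $w ] if substr($S, $o, $w) =~ /\A$R\z/;
    }
}

print join(', ', map { "($_->[0], $_->[1])" } @A), "\n";
# (0, 1), (2, 1), (2, 2), (3, 1), (5, 1)
```

This runs one match per substring, so it is quadratic in the length of $S. It is fine for checking other approaches, less so for long strings.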

For example, if

S = '1a21b4xy' and
R = /(\d+)/,

my $S = '1a21b4xy';
my $R = qr/\d+/;

then the pairs in the set A would be

(0, 1) --> '1'
(2, 1) --> '2'
(2, 2) --> '21'
(3, 1) --> '1'
(5, 1) --> '4'

Does Perl offer any simple and/or built-in way to generate the set
A?

If we ignore $w for a moment you can easily get the set of $o such
that substr($S,$o)=~/\A$R/ is true.

my @A;
while ( $S =~ /(?=$R)/g ) {
    push @A, [ pos($S), undef ];
}

To get the width it's a bit more involved

my @A;
{
    my @w;

    my $record_width = qr/(?{ push @w, length $1 })/;
    while ( $S =~ /(?=($R)$record_width)/g ) {
        push @A, map { [ pos($S), $_ ] } @w;
        @w=();
    }
}

For some reason I get an extra copy of [5,1] in @A. I can't
immediately see why.

It also seems odd that whilst $1 seems to do the right thing inside
the (?{}) $+[1] does not.

2.5 There's a *different* problem, simpler for me to code than the one
described in (2): generate the set B of all pairs (length($PREMATCH),
length($MATCH)) generated during a "global" (i.e. /g-modified) match.
For example:

while ($S =~ /(\d+)/g) {
push @pairs, [ map length, ($`, $&) ];
}

You should not use length($PREMATCH), length($MATCH).

Use $-[0] and $+[0]-$-[0] instead.
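
As a sketch of that suggestion, the loop from the question can be rewritten with @- and @+ (both documented in perlvar). The variable names follow the question; the print line is only there for illustration.

```perl
use strict;
use warnings;

my $S = '1a21b4xy';
my @pairs;

while ($S =~ /\d+/g) {
    # $-[0] is the offset where the whole match starts,
    # $+[0] is the offset just past its end.
    push @pairs, [ $-[0], $+[0] - $-[0] ];
}

print join(', ', map { "($_->[0], $_->[1])" } @pairs), "\n";
# (0, 1), (2, 2), (5, 1)
```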

Does Perl offer some built-in mechanism to get directly at the set B?

No.

--
\\ ( )
. _\\__[oo
.__/ \\ /\@
. l___\\
# ll l\\
###LL LL\\

Paul Lalli · May 21, 2004

2. Given any finite string S and given any regexp R, there is a finite
set A of (o, w) pairs, such that the substring S[o, w] of S, beginning
at offset o and having length w matches R [1]. For example, if

S = '1a21b4xy' and
R = /(\d+)/,

then the pairs in the set A would be

(0, 1) --> '1'
(2, 1) --> '2'
(2, 2) --> '21'
(3, 1) --> '1'
(5, 1) --> '4'

Does Perl offer any simple and/or built-in way to generate the set
A?

2.5 There's a *different* problem, simpler for me to code than the one
described in (2): generate the set B of all pairs (length($PREMATCH),
length($MATCH)) generated during a "global" (i.e. /g-modified) match.
For example:

while ($S =~ /(\d+)/g) {
push @pairs, [ map length, ($`, $&) ];
}

For the string S and the regexp R in the example given in question 2,
the set B would be { (0, 1), (2, 2), (5, 1) }.

Does Perl offer some built-in mechanism to get directly at the set B?

Still no idea how to go about 2, but you might be able to fiddle with my
solution to 2.5 to see what you can come up with....
$s = '1a21b4xy';
$r = qr/(\d+)/;
while ($s =~ /$r/g){
$w = length($1);
$o = pos ($s) - $w;
print "($o, $w) --> $1\n";
}
__END__
(0, 1) --> 1
(2, 2) --> 21
(5, 1) --> 4

I don't know how you want to define built-in mechanism in this case, but
this seems quick enough to me. (Your method is certainly short enough
too, but using the $`, $&, and $' variables is considered bad form for
mostly historical reasons...)

Paul Lalli

Tassilo v. Parseval · May 21, 2004

Also sprach J Krugman:

1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a'). I could use some labored code like this:

$string = 'abracadabra';
($substring = substr($string, 7, 3)) =~ s/(a[^a]+)/;
substr($string, 7, 3) = $substring;

This requires creating an auxiliary variable $substring and two calls
to substr, which seems like a waste. Is there any way to target s///
to a particular substring in a string?

substr() returns an lvalue, and you are already making use of that in
the above code. And since an lvalue can also be the target of a substitution,
nothing will stop you from writing:

substr($string, 7, 3) =~ s/(.*)/*\U$1*/;

2. Given any finite string S and given any regexp R, there is a finite
set A of (o, w) pairs, such that the substring S[o, w] of S, beginning
at offset o and having length w matches R [1]. For example, if

S = '1a21b4xy' and
R = /(\d+)/,

then the pairs in the set A would be

(0, 1) --> '1'
(2, 1) --> '2'
(2, 2) --> '21'
(3, 1) --> '1'
(5, 1) --> '4'

Does Perl offer any simple and/or built-in way to generate the set
A?

Hardly. This can be done recursively:

use Data::Dumper;

my $s = '1a21b4xy';
my @s;

find_set($1, pos($s) - length($1)) while $s =~ /(\d+)/g;
print Dumper \@s;

sub find_set {
    my ($m, $offset) = @_;
    if (length($m) > 1) {
        my $num = length($m)-1;
        find_set($1, $offset + pos($m) - length($1)) while $m =~ /(\d{$num})/g;
    }
    push @s, [ $offset, length($m), $m ];
}

I wouldn't call that a beauty though.

2.5 There's a *different* problem, simpler for me to code than the one
described in (2): generate the set B of all pairs (length($PREMATCH),
length($MATCH)) generated during a "global" (i.e. /g-modified) match.
For example:

while ($S =~ /(\d+)/g) {
push @pairs, [ map length, ($`, $&) ];
}

For the string S and the regexp R in the example given in question 2,
the set B would be { (0, 1), (2, 2), (5, 1) }.

Does Perl offer some built-in mechanism to get directly at the set B?

Maybe not directly, but it's not very hard either:

my @s;
while ($s =~ /(\d+)/g) {
push @s, [ pos($s) - length($1), length($1) ];
}

[1] If we changed R in (2) to the anchored regexp /^(\d+)/ we would
get the same set A as before, and not the set { (0, 1) }, because each
substring S[o, w] is tested against R in isolation.

In this case you have to change the algorithm. Mine will _not_ work with
an anchored pattern. As a matter of fact, I doubt that mine works for
any (non-anchored) pattern.

Tassilo

Irving Kimura · May 22, 2004

Abigail said:
my $re = qr /($r)(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})(?!)/;

I assume that "$- [-1]" and "$+ [-1]" refer to elements of some
special arrays @- and @+ , but I have not been able to find anything
on these arrays by grepping through the online documentation.

Where in the Perl documentation are @- and @+ explained?

Thanks!

Irv

Tad McClellan · May 22, 2004

Where in the Perl documentation are @- and @+ explained?

Perl's var_iables are documented in:

perldoc perlvar

J Krugman · May 22, 2004

First I want to give thanks for all the great replies to my post;
I learned a lot from them. (Not least that, even
though once upon a time I slogged through the 2nd Edition of
Programming Perl, I really should slog through the 3rd Edition.
Too bad O'Reilly never published a Delta edition.)

Abigail said:
use re 'eval';

my $s = '1a21b4xy';
my $r = '\d+';

my $re = qr /($r)(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})(?!)/;

$s =~ $re;

I realize that the (?!) at the end of $re's definition is necessary
to get the right results, but I don't understand why. I figure it
has to do with getting the regexp engine to move to the right state,
but I really have no clue...

jill

J Krugman · May 22, 2004

Abigail said:
J Krugman ([email protected]) wrote on MMMCMXVII September
MCMXCIII in <URL:{}
{} > my $re = qr /($r)(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
{} > " --> '$^N'\n"})(?!)/;
{}
{} I realize that the (?!) at the end of $re's definition is necessary
{} to get the right results, but I don't understand why.

(?!) is a (sub)expression that will always fail. It forces the regex
engine to try all possibilities. Otherwise, it will match '1', report
the match, and call it a day because it has achieved a match.
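
A small sketch of that difference, assuming a Perl recent enough that a literal (?{ }) block can see lexical variables. The array names are made up for the illustration.

```perl
use strict;
use warnings;

my $s = '1a21b4xy';
my (@first_only, @all);

# Without (?!): the engine stops at the first successful match.
$s =~ /(\d+)(?{ push @first_only, $1 })/;

# With (?!): every attempt is forced to fail, so the engine backtracks
# through each alternative at each starting offset before giving up.
$s =~ /(\d+)(?{ push @all, $1 })(?!)/;

print "without (?!): @first_only\n";   # 1
print "with (?!):    @all\n";          # 1 21 2 1 4
```

The second match as a whole fails, so only the side effects of the code block are of interest.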

*Very* cool. Thanks.

jill
perl -le '"Just another Perl Hacker" =~ /(.+$)(?{ print $1 })(?!)/'

Anno Siegel · May 23, 2004

J Krugman said:
1. Suppose I wanted to apply s/// at a particular (offset, length)
substring of a string (for example, applying s/(a[^a]+)/*\U$1*/ to the
(offset = 7, length = 3) substring in 'abracadabra', to get
'abracad*ABR*a').

I'm late to the thread, but here's a little supplement:

To do a replacement at a certain offset, consider pos() and /\G/ as a
possible alternative to substr(). In particular,

$_ = 'abracadabra';
pos = 7;
s/\G(...)/*\U$1*/;

does the replacement as required. You don't get to control the
length directly, it is implicit in the replacement.

The standard solution is, of course, substr(), but occasionally the
alternative comes in handy, especially when the offset can be
determined with a scalar /.../g which puts it directly in pos().

Anno

nobull · May 24, 2004

Abigail said:
use re 'eval';
my $r = '\d+';

my $re = qr /($r)(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})(?!)/;

I prefer to avoid the need for "use re 'eval'" by compiling the (?{})
in a separate qr//.

my $r = '\d+';
my $re = qr /(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})/;
$re = qr /($r)$re(?!)/;

nobull · May 27, 2004

nobull said:
Abigail said:

use re 'eval';
my $r = '\d+';

my $re = qr /($r)(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})(?!)/;

I prefer to avoid the need for "use re 'eval'" by compiling the (?{})
in a separate qr//.

my $r = '\d+';
my $re = qr /(?{print "(", $- [-1], ", ", $+ [-1] - $- [-1], ")",
" --> '$^N'\n"})/;
$re = qr /($r)$re(?!)/;

Sorry, those subscripts in @- and @+ should be 1 not -1 in case $r
contains any capturing groups.

my $r = '\d+';
my $re = qr /(?{print "(", $-[1], ", ", $+[1] - $-[1], ")\n"})/;
$re = qr /($r)$re(?!)/;
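
Putting the corrected subscripts together, here is a sketch that collects the pairs into an array instead of printing them. The pattern in $r is a hypothetical one, chosen only because it contains a capturing group of its own. @A and $record are likewise names made up for this example, and a Perl recent enough for qr// code blocks to close over lexicals is assumed.

```perl
use strict;
use warnings;

my $s = '1a21b4xy';
my $r = '(\d)\d*';      # hypothetical pattern with its own capturing group
my @A;

# The code block lives in its own qr//, so "use re 'eval'" is not requested.
# Group 1 is the outer ($r) group added below, hence $-[1] and $+[1].
my $record = qr/(?{ push @A, [ $-[1], $+[1] - $-[1] ] })/;
my $re     = qr/($r)$record(?!)/;

$s =~ $re;   # always fails overall; the pairs are collected as a side effect

print join(', ', map { "($_->[0], $_->[1])" } @A), "\n";
```

The pairs arrive in the order the engine tries them, not sorted by width, so sort @A if the order matters.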
