Regex: Backreferences do not work inside quantifiers?

Wolfgang Thomas · Mar 7, 2006

I have a line of the following format:
string length followed by colon followed by the actual
string.
To extract the string with the correct length I use the
following regular expression:

my $s = "3:abcd";
$s =~ /([\d]+):(.{\1})/;
print "$1\n";
print "$2\n";

However this does not match. Neither $1 nor $2 become
defined. If I replace \1 with 3 it works as expected,
I get 3 in $1 and "abc" in $2.

I have studied the "Perl Programming" book and
the active perl regex documentation, but could not
find a restriction that backreferences must not be
used inside quantifiers.

What am I doing wrong?

it_says_BALLS_on_your forehead · Mar 7, 2006

Wolfgang said:
I have a line of the following format:
string length followed by colon followed by the actual
string.
To extract the string with the correct length I use the
following regular expression:

my $s = "3:abcd";
$s =~ /([\d]+):(.{\1})/;
print "$1\n";
print "$2\n";

However this does not match. Neither $1 nor $2 become
defined. If I replace \1 with 3 it works as expected,
I get 3 in $1 and "abc" in $2.

I have studied the "Perl Programming" book and
the active perl regex documentation, but could not
find a restriction that backreferences must not be
used inside quantifiers.

i haven't studied this yet, but are you sure regexes are the best tool
for what you're doing?

Wolfgang Thomas · Mar 7, 2006

i haven't studied this yet, but are you sure regexes are the best tool
for what you're doing?

Maybe not, but still I wonder why it does not work.

A. Sinan Unur · Mar 7, 2006

I have a line of the following format:
string length followed by colon followed by the actual
string.
To extract the string with the correct length I use the
following regular expression:

my $s = "3:abcd";
$s =~ /([\d]+):(.{\1})/;

Where did you get the notion that backreferences could be used in this
way?

....

What am I doing wrong?

You are using regular expressions to solve a problem to which they are
ill-suited.

Important question: What do you want to do if the string to the right of
the colon is shorter than the length specified?

Your attempted use of .{\1} means you want the match to fail in that
case. I don't know if this matters.

#!/usr/bin/perl

use strict;
use warnings;

while ( <DATA> ) {
chomp;
next unless length;
my $length = 0 + substr $_, 0, index($_, ':');
my $string = substr $_, 1 + index($_, ':'), $length;
print "Length = $length\nString = $string\n";
}

__DATA__
3:abcd
10:012345689
3:abc
5:aaa
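If you do want the `.{\1}` behaviour Sinan asks about (reject a line whose string part is shorter than the declared length), a minimal sketch is to add a length check to the substr approach. The helper name `parse_line` is made up for illustration:

```perl
use strict;
use warnings;

# parse_line is a hypothetical helper name. It returns (length, string),
# or an empty list when the line is malformed or the string part is
# shorter than the declared length (the same cases where .{N} would fail).
sub parse_line {
    my ($line) = @_;
    my $colon = index $line, ':';
    return if $colon < 1;                    # no colon, or nothing before it
    my $len = substr $line, 0, $colon;
    return unless $len =~ /\A\d+\z/;         # the length must be all digits
    my $rest = substr $line, $colon + 1;
    return if length($rest) < $len;          # too short: reject the line
    return ($len, substr($rest, 0, $len));
}

for my $line ('3:abcd', '10:012345689', '3:abc', '5:aaa') {
    if ( my ($len, $str) = parse_line($line) ) {
        print "Length = $len\nString = $str\n";
    }
    else {
        print "Rejected: $line\n";
    }
}
```

With this variant, "10:012345689" (only nine characters) and "5:aaa" are rejected instead of being silently truncated.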

Matt Garrish · Mar 7, 2006

Wolfgang Thomas said:
I have a line of the following format:
string length followed by colon followed by the actual
string.

So why aren't you using split and substr?
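A minimal sketch of the split/substr route Matt suggests, using the OP's sample string:

```perl
use strict;
use warnings;

my $s = "3:abcd";

# Limit split to two fields so a colon inside the string part is kept.
my ($len, $rest) = split /:/, $s, 2;

# Take exactly $len characters (substr returns fewer if $rest is shorter).
my $str = substr $rest, 0, $len;

print "$len\n$str\n";
```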

To extract the string with the correct length I use the
following regular expression:

my $s = "3:abcd";
$s =~ /([\d]+):(.{\1})/;

\d is shorthand for a character class; why are you then putting it in one?

print "$1\n";
print "$2\n";

However this does not match. Neither $1 nor $2 become
defined. If I replace \1 with 3 it works as expected,
I get 3 in $1 and "abc" in $2.

That's because you can't dynamically assign the value. To perl it's just
braces and a comma to match. For example:

my $s = "3:a{,}bcd";
$s =~ /(\d+):(.{\1,})/;
print "$1\n";
print "$2\n";

There might be some way to do this using the extended regexes, but off the
top of my head I couldn't say, and would recommend the two functions named
above... : )
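For the record, Perl's extended patterns do offer a way: the postponed subexpression `(??{ ... })` runs code during the match and matches whatever pattern that code returns, so it can build a real `.{N}` quantifier from the digits already captured. A sketch, with `extract` as a hypothetical helper name:

```perl
use strict;
use warnings;

# extract is a hypothetical helper name. (??{ CODE }) runs CODE while the
# match is in progress and matches the pattern CODE returns; here it builds
# a real quantifier such as "(?s:.{3})" from the digits captured in $1.
# (?s:...) lets "." match a newline too.
sub extract {
    my ($s) = @_;
    return $s =~ /\A(\d+):((??{ "(?s:.{$1})" }))/ ? ($1, $2) : ();
}

for my $s ('3:abcd', '10:0123456789xyz', '5:aaa') {
    if ( my ($len, $str) = extract($s) ) {
        print "length=$len string='$str'\n";
    }
    else {
        print "'$s' did not match\n";
    }
}
```

This is arguably harder to read than split and substr, which is why the simpler functions are still the better recommendation.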

Matt

Matt Garrish · Mar 7, 2006

Matt Garrish said:
Wolfgang Thomas said:

I have a line of the following format:
string length followed by colon followed by the actual
string.

So why aren't you using split and substr?

To extract the string with the correct length I use the
following regular expression:

my $s = "3:abcd";
$s =~ /([\d]+):(.{\1})/;

\d is shorthand for a character class; why are you then putting it in one?

print "$1\n";
print "$2\n";

However this does not match. Neither $1 nor $2 become
defined. If I replace \1 with 3 it works as expected,
I get 3 in $1 and "abc" in $2.

That's because you can't dynamically assign the value. To perl it's just
braces and a comma to match. For example:

my $s = "3:a{,}bcd";

my $s = "3:a{3,}bcd";

Matt

Tad McClellan · Mar 7, 2006

Wolfgang Thomas said:
I have a line of the following format:
string length followed by colon followed by the actual
string.

my $s = "3:abcd";
$s =~ /([\d]+):(.{\1})/;

The square brackets serve no purpose there.

You would need the /s modifier to handle "3:1\n34567".
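A quick sketch of why /s matters here, with the count hard-coded to 3 as in the OP's working variant:

```perl
use strict;
use warnings;

my $s = "3:1\n34567";

# Without /s, "." does not match "\n", so three characters after the colon
# cannot be matched here.
my $without = $s =~ /\A(\d+):(.{3})/  ? $2 : 'no match';

# With /s, "." matches any character, including the newline.
my $with    = $s =~ /\A(\d+):(.{3})/s ? $2 : 'no match';

(my $shown = $with) =~ s/\n/\\n/g;   # make the newline visible when printed
print "without /s: $without\n";
print "with /s:    $shown\n";
```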

print "$1\n";
print "$2\n";

You should *never* use the dollar-digit variables unless you
have first ensured that the pattern match *succeeded*:

if ( $s =~ /(\d+):(.{\1})/s ) {
print "$1\n";
...
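The reason for this rule: a failed match does not reset the capture variables, so $1 can still hold a value from an earlier successful match. A small sketch:

```perl
use strict;
use warnings;

my @captured;

"42:xyz" =~ /(\d+)/;                # succeeds: $1 is now "42"
if ( "no digits" =~ /(\d+)/ ) {     # fails, so this block is skipped ...
    push @captured, $1;
}
push @captured, $1;                 # ... but $1 still holds "42" from before
print "stale \$1: $captured[-1]\n";
```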

I have studied the "Perl Programming" book and
the active perl regex documentation,

What is the "active perl regex documentation"?

Is that different from the standard documentation for Perl?

but could not
find a restriction that backreferences must not be
used inside quantifiers.

Me either.

What am I doing wrong?

Nothing, other than attempting to use a backreference inside
of a quantifier.

Do it a different way, perhaps:

---------------------
#!/usr/bin/perl
use warnings;
use strict;

my($length, $string) = decompose( '3:abcd' );
print "string '$string' of length '$length'\n";

sub decompose {
my($s) = @_;
return() unless $s =~ s/^(\d+)://; # data does not match
my $len = $1;
my $str = substr $s, 0, $len;
return($len, $str);
}

Wolfgang Thomas · Mar 7, 2006

All,

thank you for your replies. You showed me how to better solve the problem.

Nevertheless I think that this restriction (or is it a bug?) should be
documented.

A. Sinan Unur · Mar 7, 2006

thank you for your replies. You showed me how to better solve the
problem.

What way to solve what problem? Please quote some context when you reply.

Nevertheless I think that this restriction (or is it a bug?) should be
documented.

Feel free to document it.

Sinan

Ilya Zakharevich · Mar 7, 2006

[A complimentary Cc of this posting was sent to
Wolfgang Thomas]

$s =~ /([\d]+):(.{\1})/;

This should match, e.g.,

123:a{123}

"{" is special in REx only in a very few contexts. When working on
RExen, I tried to "f1x" this misfeature (inherited from the [IMO,
completely broken] HS implementation); however, there was no way to
even insert a warning without a heavy backward-compatibility penalty.

The best one can hope for is what the latest CPerl is doing to
circumvent this misfortune: it highlights "{" differently in the
different meanings...
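A sketch of what this means in practice: since `{\1}` is not a valid quantifier, the OP's pattern should behave like the same pattern with the braces escaped explicitly (any character, a literal "{", the captured digits, a literal "}"). The escaped form is used below to avoid the "unescaped left brace" warnings newer perls may emit:

```perl
use strict;
use warnings;

# Explicitly escaped equivalent of the OP's /([\d]+):(.{\1})/:
# "." then a literal "{", a backreference to group 1, then a literal "}".
my $literal = qr/([\d]+):(.\{\1\})/;

for my $s ('3:abcd', '123:a{123}') {
    if ( $s =~ $literal ) {
        print "'$s' matches, \$2 = '$2'\n";
    }
    else {
        print "'$s' does not match\n";
    }
}
```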

Hope this helps,
Ilya

Wolfgang Thomas · Mar 7, 2006

Ilya said:
$s =~ /([\d]+):(.{\1})/;

This should match, e.g.,

123:a{123}

"{" is special in REx only in a very few contexts. When working on
RExen, I tried to "f1x" this misfeature (inherited from the [IMO,
completely broken] HS implementation); however, there was no way to
even insert a warning without a heavy backward-compatibility penalty.

The best one can hope for is what the latest CPerl is doing to
circumvent this misfortune: it highlights "{" differently in the
different meanings...

Hope this helps,

This was in fact very helpful. Thanks a lot.

Tad McClellan · Mar 8, 2006

Ilya Zakharevich said:
[A complimentary Cc of this posting was sent to
Wolfgang Thomas]

$s =~ /([\d]+):(.{\1})/;

This should match, e.g.,

123:a{123}

"{" is special in REx only in a very few contexts.

Aha!

So it is only incompletely documented (from perlre.pod):

The following standard quantifiers are recognized:

* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times

(If a curly bracket occurs in any other context, it is treated
as a regular character.)

Looks like the OP's use of curly was in one of those "other" contexts...
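Braces only count as a quantifier when the count is already there at compile time. Interpolating a variable satisfies that, because interpolation happens before the pattern is compiled. A two-step sketch (the helper name `take_counted` is hypothetical):

```perl
use strict;
use warnings;

# take_counted is a hypothetical helper name. Step 1 reads the count;
# step 2 interpolates it into a new pattern. Interpolation happens before
# the regex is compiled, so ".{$n}" becomes a real quantifier like ".{3}".
sub take_counted {
    my ($s) = @_;
    my ($n)   = $s =~ /\A(\d+):/         or return;
    my ($str) = $s =~ /\A\d+:(.{$n})/s   or return;
    return ($n, $str);
}

for my $s ('3:abcd', "3:1\n34567", '5:aaa') {
    my @parts = take_counted($s);
    if (@parts) {
        print "count=$parts[0], got ", length($parts[1]), " chars\n";
    }
    else {
        print "no match\n";
    }
}
```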

John W. Krahn · Mar 8, 2006

Wolfgang said:
I have a line of the following format:
string length followed by colon followed by the actual
string.
To extract the string with the correct length I use the
following regular expression:

my $s = "3:abcd";
$s =~ /([\d]+):(.{\1})/;
print "$1\n";
print "$2\n";

However this does not match. Neither $1 nor $2 become
defined. If I replace \1 with 3 it works as expected,
I get 3 in $1 and "abc" in $2.

If you didn't have that colon in the way you could use unpack():

$ perl -le'
my $s = "3:abcd";
print unpack "A/A*", $s;
'
:ab

John

Ilya Zakharevich · Mar 8, 2006

[A complimentary Cc of this posting was sent to
Tad McClellan]

So it is only incompletely documented (from perlre.pod):

The following standard quantifiers are recognized:

* Match 0 or more times
+ Match 1 or more times
? Match 1 or 0 times
{n} Match exactly n times
{n,} Match at least n times
{n,m} Match at least n but not more than m times

(If a curly bracket occurs in any other context, it is treated
as a regular character.)

As usual, when documenting a historical misfeature, it is better to
insert an f-word (well, a c-word in this case ;-):

(CURRENTLY, If a curly bracket occurs in any other context, it is treated
as a regular character.)

Yours,
Ilya

Charles DeRykus · Mar 9, 2006

Wolfgang said:
I have a line of the following format:
string length followed by colon followed by the actual
string.
To extract the string with the correct length I use the
following regular expression:

my $s = "3:abcd";
$s =~ /([\d]+):(.{\1})/;
print "$1\n";
print "$2\n";

However this does not match. Neither $1 nor $2 become
defined. If I replace \1 with 3 it works as expected,
I get 3 in $1 and "abc" in $2.

I have studied the "Perl Programming" book and
the active perl regex documentation, but could not
find a restriction that backreferences must not be
used inside quantifiers.

What am I doing wrong?

An extended regex possibility:

my $pos;
if ( $s =~ /(\d+):(?{ $pos=pos })/ ) {
print "count=$1 substring=",substr($s, $pos, $1);
}

Xicheng · Mar 9, 2006

John said:
Wolfgang said:

I have a line of the following format:
string length followed by colon followed by the actual
string.
To extract the string with the correct length I use the
following regular expression:

my $s = "3:abcd";
$s =~ /([\d]+):(.{\1})/;
print "$1\n";
print "$2\n";

However this does not match. Neither $1 nor $2 become
defined. If I replace \1 with 3 it works as expected,
I get 3 in $1 and "abc" in $2.

If you didn't have that colon in the way you could use unpack():

$ perl -le'
my $s = "3:abcd";
print unpack "A/A*", $s;
'
:ab

this behavior of unpack() is really interesting, but I think he can
skip that colon by adding an 'x', like:

$ perl -le'
my $s = "3:abcd";
print unpack "Ax/A*", $s;
'
===print====
abc
=========

Xicheng

Xicheng · Mar 9, 2006

Xicheng said:
John said:

Wolfgang said:

I have a line of the following format:
string length followed by colon followed by the actual
string.
To extract the string with the correct length I use the
following regular expression:

my $s = "3:abcd";
$s =~ /([\d]+):(.{\1})/;
print "$1\n";
print "$2\n";

However this does not match. Neither $1 nor $2 become
defined. If I replace \1 with 3 it works as expected,
I get 3 in $1 and "abc" in $2.

If you didn't have that colon in the way you could use unpack():

$ perl -le'
my $s = "3:abcd";
print unpack "A/A*", $s;
'
:ab

this behavior of unpack() is really interesting, but I think he can
skip that colon by adding an 'x', like:

$ perl -le'
my $s = "3:abcd";
print unpack "Ax/A*", $s;
'
===print====
abc
=========

after checking the "Perl Pocket Reference", I found I don't even need
this '*', and I can use a number to replace 'x', because of the way perl
converts a string like "3:" to a number:

print unpack "A2/A", $s;

but this is not robust, because it works only on fixed-width records,
which means the number of characters before the colon must be fixed. So
this cannot handle:

$s = "10:abcdefghijk";

which should use:

print unpack "A3/A", $s;
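The fixed-width limitation can be worked around by measuring the digits first and building the template at run time. This sketch relies on perlfunc's description of `/` in unpack (the count is taken from the value unpacked just before it, which is why Xicheng's "Ax/A*" works); the helper name `unpack_counted` is made up for illustration:

```perl
use strict;
use warnings;

# unpack_counted is a hypothetical helper name. It measures how many
# digits precede the colon and builds the template at run time:
#   "A$w" unpacks the digits, "x" skips the colon, and "/a" takes the
#   number unpacked just before it as the count for "a".
sub unpack_counted {
    my ($s) = @_;
    my $w = index $s, ':';
    return if $w < 1;                 # no colon, or no digits before it
    return unpack "A$w x /a", $s;
}

print unpack_counted('3:abcd'), "\n";
print unpack_counted('10:abcdefghijk'), "\n";
```

"a" is used instead of "A" for the string so trailing spaces in the data are kept.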

Xicheng
