Regex: Backreferences do not work inside quantifiers?

Discussion in 'Perl Misc' started by Wolfgang Thomas, Mar 7, 2006.

  1. I have a line of the following format:
    string length followed by colon followed by the actual
    string.
    To extract the string with the correct length I use the
    following regular expression:

    my $s = "3:abcd";
    $s =~ /([\d]+):(.{\1})/;
    print "$1\n";
    print "$2\n";


    However this does not match. Neither $1 nor $2 become
    defined. If I replace \1 with 3 it works as expected,
    I get 3 in $1 and "abc" in $2.

    I have studied the "Perl Programming" book and
    the active perl regex documentation, but could not
    find a restriction that backreferences must not be
    used inside quantifiers.

    What am I doing wrong?
     
    Wolfgang Thomas, Mar 7, 2006
    #1

  2. Wolfgang Thomas wrote:
    > I have a line of the following format:
    > string length followed by colon followed by the actual
    > string.
    > To extract the string with the correct length I use the
    > following regular expression:
    >
    > my $s = "3:abcd";
    > $s =~ /([\d]+):(.{\1})/;
    > print "$1\n";
    > print "$2\n";
    >
    >
    > However this does not match. Neither $1 nor $2 become
    > defined. If I replace \1 with 3 it works as expected,
    > I get 3 in $1 and "abc" in $2.
    >
    > I have studied the "Perl Programming" book and
    > the active perl regex documentation, but could not
    > find a restriction that backreferences must not be
    > used inside quantifiers.



    i haven't studied this yet, but are you sure regexes are the best tool
    for what you're doing?
     
    it_says_BALLS_on_your forehead, Mar 7, 2006
    #2

  3. it_says_BALLS_on_your forehead wrote:
    > Wolfgang Thomas wrote:
    >> I have a line of the following format:
    >> string length followed by colon followed by the actual
    >> string.
    >> To extract the string with the correct length I use the
    >> following regular expression:


    >
    > i haven't studied this yet, but are you sure regexes are the best tool
    > for what you're doing?
    >


    Maybe not, but still I wonder why it does not work.
     
    Wolfgang Thomas, Mar 7, 2006
    #3
  4. Wolfgang Thomas <> wrote in news:dukl07$lhr$01$1@news.t-online.com:

    > I have a line of the following format:
    > string length followed by colon followed by the actual
    > string.
    > To extract the string with the correct length I use the
    > following regular expression:
    >
    > my $s = "3:abcd";
    > $s =~ /([\d]+):(.{\1})/;


    Where did you get the notion that backreferences could be used in this
    way?

    ....

    > What am I doing wrong?


    You are using regular expressions to solve a problem to which they are
    ill-suited.

    Important question: What do you want to do if the string to the right of
    the colon is shorter than the length specified?

    Your attempted use of .{\1} means you want the match to fail in that
    case. I don't know if this matters.

    #!/usr/bin/perl

    use strict;
    use warnings;

    while ( <DATA> ) {
        chomp;
        next unless length;
        my $length = 0 + substr $_, 0, index($_, ':');
        my $string = substr $_, 1 + index($_, ':'), $length;
        print "Length = $length\nString = $string\n";
    }


    __DATA__
    3:abcd
    10:012345689
    3:abc
    5:aaa



    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Mar 7, 2006
    #4
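    On the question raised in post #4 (what should happen when the text after
    the colon is shorter than the declared length), here is a minimal sketch.
    It assumes the stricter behaviour that the original pattern implied, namely
    that a short payload counts as a failure. The name parse_length_prefixed is
    hypothetical and does not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns (length, string),
# or an empty list when the line is malformed or the payload is too short.
sub parse_length_prefixed {
    my ($line) = @_;

    my $colon = index $line, ':';
    return if $colon < 1;                    # no colon, or nothing before it

    my $length = substr $line, 0, $colon;
    return if $length =~ /\D/;               # prefix must consist of digits only

    my $payload = substr $line, $colon + 1;
    return if length($payload) < $length;    # assumption: a short payload is an error

    return ($length + 0, substr($payload, 0, $length));
}

for my $line ('3:abcd', '10:012345689', '3:abc', '5:aaa') {
    my ($length, $string) = parse_length_prefixed($line);
    if (defined $string) {
        print "Length = $length, String = $string\n";
    }
    else {
        print "Rejected: $line\n";
    }
}
```

    If truncated payloads should be accepted instead, dropping the length
    check gives the same lenient behaviour as the substr calls in post #4.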
  5. "Wolfgang Thomas" <> wrote in message
    news:dukl07$lhr$01$-online.com...
    > I have a line of the following format:
    > string length followed by colon followed by the actual
    > string.


    So why aren't you using split and substr?

    > To extract the string with the correct length I use the
    > following regular expression:
    >
    > my $s = "3:abcd";
    > $s =~ /([\d]+):(.{\1})/;


    \d is shorthand for a character class; why are you then putting it in one?

    > print "$1\n";
    > print "$2\n";
    >
    >
    > However this does not match. Neither $1 nor $2 become
    > defined. If I replace \1 with 3 it works as expected,
    > I get 3 in $1 and "abc" in $2.
    >


    That's because you can't dynamically assign the value. To perl it's just
    braces and a comma to match. For example:

    my $s = "3:a{,}bcd";
    $s =~ /(\d+):(.{\1,})/;
    print "$1\n";
    print "$2\n";

    There might be some way to do this using the extended regexes, but off the
    top of my head I couldn't say, and would recommend the two functions named
    above... : )

    Matt
     
    Matt Garrish, Mar 7, 2006
    #5
  6. "Matt Garrish" <> wrote in message
    news:y2lPf.639$...
    >
    > "Wolfgang Thomas" <> wrote in message
    > news:dukl07$lhr$01$-online.com...
    >> I have a line of the following format:
    >> string length followed by colon followed by the actual
    >> string.

    >
    > So why aren't you using split and substr?
    >
    >> To extract the string with the correct length I use the
    >> following regular expression:
    >>
    >> my $s = "3:abcd";
    >> $s =~ /([\d]+):(.{\1})/;

    >
    > \d is shorthand for a character class; why are you then putting it in one?
    >
    >> print "$1\n";
    >> print "$2\n";
    >>
    >>
    >> However this does not match. Neither $1 nor $2 become
    >> defined. If I replace \1 with 3 it works as expected,
    >> I get 3 in $1 and "abc" in $2.
    >>

    >
    > That's because you can't dynamically assign the value. To perl it's just
    > braces and a comma to match. For example:
    >
    > my $s = "3:a{,}bcd";


    my $s = "3:a{3,}bcd";

    Matt
     
    Matt Garrish, Mar 7, 2006
    #6
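    The split and substr suggestion from post #5 could look roughly like the
    sketch below. The helper name split_prefixed and the limit argument to
    split are my own choices, not something stated in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: split once on the first colon, then cut with substr.
sub split_prefixed {
    my ($s) = @_;

    # The limit of 2 keeps any further colons inside the payload.
    my ($len, $rest) = split /:/, $s, 2;
    return unless defined($rest) && $len =~ /^\d+\z/;

    # substr simply returns fewer characters if the payload is shorter than $len.
    return ($len, substr($rest, 0, $len));
}

my ($len, $str) = split_prefixed('3:abcd');
print "$len\n$str\n";
```

    Unlike the pattern in the question, this version does not fail on a short
    payload; it returns whatever is there.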
  7. Wolfgang Thomas <> wrote:
    > I have a line of the following format:
    > string length followed by colon followed by the actual
    > string.


    > my $s = "3:abcd";
    > $s =~ /([\d]+):(.{\1})/;



    The square brackets serve no purpose there.

    You would need the m//s modifier to handle "3:1\n34567".


    > print "$1\n";
    > print "$2\n";



    You should *never* use the dollar-digit variables unless you
    have first ensured that the pattern match *succeeded*:

    if ( $s =~ /(\d+):(.{\1})/s ) {
        print "$1\n";
        ...


    > I have studied the "Perl Programming" book and
    > the active perl regex documentation,



    What is the "active perl regex documentation"?

    Is that different from the standard documentation for Perl?


    > but could not
    > find a restriction that backreferences must not be
    > used inside quantifiers.



    Me either.


    > What am I doing wrong?



    Nothing, other than attempting to use a backreference inside
    of a quantifier. :)

    Do it a different way, perhaps:


    ---------------------
    #!/usr/bin/perl
    use warnings;
    use strict;

    my($length, $string) = decompose( '3:abcd' );
    print "string '$string' of length '$length'\n";

    sub decompose {
        my($s) = @_;
        return() unless $s =~ s/^(\d+)://; # data does not match
        my $len = $1;
        my $str = substr $s, 0, $len;
        return($len, $str);
    }
    ---------------------


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Mar 7, 2006
    #7
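    Two of the points in post #7 can be checked with a short sketch: a failed
    match leaves the dollar-digit variables at their previous values, and "."
    does not match a newline unless the /s modifier is given. The variable
    names are made up for the illustration.

```perl
use strict;
use warnings;

# A failed match does not reset $1.
my $first_ok  = '7:x'  =~ /^(\d+):/;   # succeeds, so $1 is "7"
my $second_ok = 'oops' =~ /^(\d+):/;   # fails, and $1 is left untouched
my $stale     = $1;                    # still "7", left over from the first match

print "second match ", ($second_ok ? "succeeded" : "failed"),
      ", yet \$1 is '$stale'\n";

# Without /s the dot stops at the newline, so three characters cannot be taken.
my $multi     = "3:1\n34567";
my $without_s = $multi =~ /^\d+:(.{3})/  ? 1 : 0;
my $with_s    = $multi =~ /^\d+:(.{3})/s ? 1 : 0;
my ($captured) = $multi =~ /^\d+:(.{3})/s;

print "without /s: $without_s, with /s: $with_s\n";
```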
  8. All,

    thank you for your replies. You showed me how to better solve the problem.

    Nevertheless I think that this restriction (or is it a bug?) should be
    documented.
     
    Wolfgang Thomas, Mar 7, 2006
    #8
  9. Wolfgang Thomas <> wrote in
    news:duks94$ve4$00$-online.com:

    > thank you for your replies. You showed me how to better solve the
    > problem.


    What way to solve what problem? Please quote some context when you reply.

    > Nevertheless I think that this restriction (or is it a bug?) should be
    > documented.


    Feel free to document it.

    Sinan
    --
    A. Sinan Unur <>
    (reverse each component and remove .invalid for email address)

    comp.lang.perl.misc guidelines on the WWW:
    http://mail.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
     
    A. Sinan Unur, Mar 7, 2006
    #9
  10. [A complimentary Cc of this posting was sent to
    Wolfgang Thomas
    <>], who wrote in article <dukl07$lhr$01$-online.com>:
    > $s =~ /([\d]+):(.{\1})/;


    This should match, e.g.,

    123:a{123}

    "{" is special in REx only in very few contexts. When working over
    RExen, I tried to "fix" this misfeature (inheritance of [IMO,
    completely broken] HS implementation); however, there was no way to
    even insert a warning without heavy backward-compatibility penalty.

    The best one can hope for is what the latest CPerl is doing to
    circumvent this misfortune: it highlights "{" differently in the
    different meanings...

    Hope this helps,
    Ilya
     
    Ilya Zakharevich, Mar 7, 2006
    #10
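    The reading described in post #10 can be tried out with the sketch below.
    It escapes the braces so that the literal interpretation is spelled out;
    as far as I know, newer perls may warn about or reject an unescaped
    literal brace, so the escaped form is the safer way to write it today.

```perl
use strict;
use warnings;

# Same pattern as in the question, but with the braces escaped so that the
# literal reading is explicit: any character, "{", the text of group 1, "}".
my $re = qr/(\d+):(.\{\1\})/;

for my $s ('123:a{123}', '3:abcd') {
    if ($s =~ $re) {
        print "'$s' matches: \$1 = '$1', \$2 = '$2'\n";
    }
    else {
        print "'$s' does not match\n";
    }
}
```

    The first string matches and the second does not, which is consistent with
    what the original poster saw.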
  11. Ilya Zakharevich wrote:

    >> $s =~ /([\d]+):(.{\1})/;

    >
    > This should match, e.g.,
    >
    > 123:a{123}
    >
    > "{" is special in REx only in very few contexts. When working over
    > RExen, I tried to "fix" this misfeature (inheritance of [IMO,
    > completely broken] HS implementation); however, there was no way to
    > even insert a warning without heavy backward-compatibility penalty.
    >
    > The best one can hope for is what the latest CPerl is doing to
    > circumvent this misfortune: it highlights "{" differently in the
    > different meanings...
    >
    > Hope this helps,


    This was in fact very helpful. Thanks a lot.
     
    Wolfgang Thomas, Mar 7, 2006
    #11
  12. Ilya Zakharevich <> wrote:
    > [A complimentary Cc of this posting was sent to
    > Wolfgang Thomas
    > <>], who wrote in article <dukl07$lhr$01$-online.com>:
    >> $s =~ /([\d]+):(.{\1})/;

    >
    > This should match, e.g.,
    >
    > 123:a{123}
    >
    > "{" is special in REx only in very few contexts.



    Aha!

    So it is only incompletely documented (from perlre.pod):

    The following standard quantifiers are recognized:

    * Match 0 or more times
    + Match 1 or more times
    ? Match 1 or 0 times
    {n} Match exactly n times
    {n,} Match at least n times
    {n,m} Match at least n but not more than m times

    (If a curly bracket occurs in any other context, it is treated
    as a regular character.)

    Looks like the OP's use of curly was in one of those "other" contexts...


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Mar 8, 2006
    #12
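    Since only a literal count forms a quantifier, one way around the
    restriction is to read the count first and interpolate it into a second
    pattern, because variable interpolation happens before the pattern is
    compiled. This is a sketch of that idea rather than anything proposed in
    the thread; extract_by_count is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: read the count first, then build the quantifier from it.
sub extract_by_count {
    my ($s) = @_;

    # First pass: capture the declared length.
    my ($n) = $s =~ /^(\d+):/ or return;

    # Second pass: {$n} is interpolated to e.g. {3} before compilation,
    # so the regex engine sees an ordinary quantifier.
    my ($str) = $s =~ /^\d+:(.{$n})/s;

    return $str;    # undef if the payload is shorter than $n
}

print extract_by_count('3:abcd'), "\n";
```

    Very large counts may exceed the quantifier limit of the regex engine, in
    which case substr remains the simpler tool.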
  13. Wolfgang Thomas wrote:
    > I have a line of the following format:
    > string length followed by colon followed by the actual
    > string.
    > To extract the string with the correct length I use the
    > following regular expression:
    >
    > my $s = "3:abcd";
    > $s =~ /([\d]+):(.{\1})/;
    > print "$1\n";
    > print "$2\n";
    >
    >
    > However this does not match. Neither $1 nor $2 become
    > defined. If I replace \1 with 3 it works as expected,
    > I get 3 in $1 and "abc" in $2.


    If you didn't have that colon in the way you could use unpack():

    $ perl -le'
    my $s = "3:abcd";
    print unpack "A/A*", $s;
    '
    :ab



    John
    --
    use Perl;
    program
    fulfillment
     
    John W. Krahn, Mar 8, 2006
    #13
  14. [A complimentary Cc of this posting was sent to
    Tad McClellan
    <>], who wrote in article <>:
    > > "{" is special in REx only in very few contexts.


    > So it is only incompletely documented (from perlre.pod):
    >
    > The following standard quantifiers are recognized:
    >
    > * Match 0 or more times
    > + Match 1 or more times
    > ? Match 1 or 0 times
    > {n} Match exactly n times
    > {n,} Match at least n times
    > {n,m} Match at least n but not more than m times
    >
    > (If a curly bracket occurs in any other context, it is treated
    > as a regular character.)


    As usual, when documenting a historical misfeature, it is better to
    insert an f-word (well, a c-word in this case ;-):

    (CURRENTLY, If a curly bracket occurs in any other context, it is treated
    as a regular character.)

    Yours,
    Ilya
     
    Ilya Zakharevich, Mar 8, 2006
    #14
  15. Wolfgang Thomas wrote:
    > I have a line of the following format:
    > string length followed by colon followed by the actual
    > string.
    > To extract the string with the correct length I use the
    > following regular expression:
    >
    > my $s = "3:abcd";
    > $s =~ /([\d]+):(.{\1})/;
    > print "$1\n";
    > print "$2\n";
    >
    >
    > However this does not match. Neither $1 nor $2 become
    > defined. If I replace \1 with 3 it works as expected,
    > I get 3 in $1 and "abc" in $2.
    >
    > I have studied the "Perl Programming" book and
    > the active perl regex documentation, but could not
    > find a restriction that backreferences must not be
    > used inside quantifiers.
    >
    > What am I doing wrong?


    An extended regex possibility:

    my $pos;
    if ( $s =~ /(\d+):(?{ $pos=pos })/ ) {
        print "count=$1 substring=",substr($s, $pos, $1);
    }

    --
    Charles DeRykus
     
    Charles DeRykus, Mar 9, 2006
    #15
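    Another extended-regex variant, again only a sketch: the postponed
    subexpression (??{ ... }) runs a piece of code during matching and then
    matches the pattern that the code returns, so the count captured in $1 can
    be turned into a real quantifier. The helper name extract_postponed is
    hypothetical, and the perlre documentation has long carried caveats about
    this construct, so treat it as a curiosity rather than a recommendation.

```perl
use strict;
use warnings;

# Hypothetical helper built on the postponed subexpression (??{ ... }).
sub extract_postponed {
    my ($s) = @_;

    # By the time the engine reaches (??{ ... }), $1 already holds the count.
    # The code returns a pattern such as "(?s:.{3})", which is then matched
    # in place; (?s: ... ) lets "." match newlines inside that subpattern.
    return $s =~ /^(\d+):((??{ "(?s:.{$1})" }))/ ? $2 : undef;
}

print "substring=", extract_postponed('3:abcd'), "\n";
```

    Here the match fails when the payload is shorter than the declared length,
    which is the behaviour the original pattern was aiming for.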
  16. John W. Krahn wrote:
    > Wolfgang Thomas wrote:
    > > I have a line of the following format:
    > > string length followed by colon followed by the actual
    > > string.
    > > To extract the string with the correct length I use the
    > > following regular expression:
    > >
    > > my $s = "3:abcd";
    > > $s =~ /([\d]+):(.{\1})/;
    > > print "$1\n";
    > > print "$2\n";
    > >
    > >
    > > However this does not match. Neither $1 nor $2 become
    > > defined. If I replace \1 with 3 it works as expected,
    > > I get 3 in $1 and "abc" in $2.

    >
    > If you didn't have that colon in the way you could use unpack():
    >
    > $ perl -le'
    > my $s = "3:abcd";
    > print unpack "A/A*", $s;
    > '
    > :ab


    This behavior of unpack() is really interesting :), but I think he can
    skip that colon by adding an 'x', like:

    $ perl -le'
    my $s = "3:abcd";
    print unpack "Ax/A*", $s;
    '
    ===print====
    abc
    =========

    Xicheng
     
    Xicheng, Mar 9, 2006
    #16
  17. Xicheng wrote:
    > John W. Krahn wrote:
    > > Wolfgang Thomas wrote:
    > > > I have a line of the following format:
    > > > string length followed by colon followed by the actual
    > > > string.
    > > > To extract the string with the correct length I use the
    > > > following regular expression:
    > > >
    > > > my $s = "3:abcd";
    > > > $s =~ /([\d]+):(.{\1})/;
    > > > print "$1\n";
    > > > print "$2\n";
    > > >
    > > >
    > > > However this does not match. Neither $1 nor $2 become
    > > > defined. If I replace \1 with 3 it works as expected,
    > > > I get 3 in $1 and "abc" in $2.

    > >
    > > If you didn't have that colon in the way you could use unpack():
    > >
    > > $ perl -le'
    > > my $s = "3:abcd";
    > > print unpack "A/A*", $s;
    > > '
    > > :ab

    >
    > This behavior of unpack() is really interesting :), but I think he can
    > skip that colon by adding an 'x', like:
    >
    > $ perl -le'
    > my $s = "3:abcd";
    > print unpack "Ax/A*", $s;
    > '
    > ===print====
    > abc
    > =========


    After checking the "Perl Pocket Reference", I found I don't even need
    this '*', and I can use a number to replace 'x' because of the way Perl
    handles "numeric+strings"......

    print unpack "A2/A", $s;

    But this is not robust, because it works only on fixed-width records,
    which means the number of characters before the colon must be fixed. So
    this cannot handle:

    $s = "10:abcdefghijk";

    which should use:

    print unpack "A3/A", $s;

    Xicheng
     
    Xicheng, Mar 9, 2006
    #17
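    To get around the fixed-width limitation noted in post #17, one option is
    to separate the count from the payload first and build the unpack template
    from it. This is a sketch under that assumption; unpack_prefixed is a
    hypothetical name, and the "a" format is used instead of "A" so that
    trailing spaces in the payload are kept.

```perl
use strict;
use warnings;

# Hypothetical helper: split off the count, then unpack with a template
# built from it, so the width of the count no longer matters.
sub unpack_prefixed {
    my ($s) = @_;

    my ($len, $rest) = split /:/, $s, 2;
    return unless defined($rest) && $len =~ /^\d+\z/;

    # "a$len" becomes e.g. "a10": take up to 10 characters, unmodified.
    my ($str) = unpack "a$len", $rest;
    return $str;
}

print unpack_prefixed('3:abcd'), "\n";
print unpack_prefixed('10:abcdefghijk'), "\n";
```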
