backreference oddity

Discussion in 'Perl Misc' started by poncenby, Oct 6, 2006.

  1. poncenby

    poncenby Guest

    i have a file which has lines of text with fields separated by a space.
    some of the fields are prefixed with a number and a space, like the
    line below...

    bar1 bar2 XX 10 bar3tooten
    foo1 foo2 XX 15 foo3uptofifteen

    as you can see, the numbers (10 and 15) are the length of the field
    after the number.
    so i want to use these numbers as length specifier to match the field
    after the number, with a regex like either of these:

    /(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/
    /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

    both regexs will make the program fall over when attempting to print
    $4.

    i've figured out a solution with a regex over two lines but am curious
    why this doesn't work.

    thanks in advance

    poncenby
    poncenby, Oct 6, 2006
    #1
  2. "poncenby" wrote:

    > i have a file which has lines of text with fields separated by a
    > space. some of the fields are prefixed with a number and a space, like
    > the line below...
    >
    > bar1 bar2 XX 10 bar3tooten
    > foo1 foo2 XX 15 foo3uptofifteen
    >
    > as you can see, the numbers (10 and 15) are the length of the field
    > after the number.
    > so i want to use these numbers as length specifier to match the field
    > after the number, with a regex like either of these:
    >
    > /(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/


    You can't use the capture variable here, but your problem is ...

    > /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/


    Ahem ... Did you read the error message? Without any testing, I can see
    that you should have [0-9]+ rather than the [0-9)+ you used above.

    Have you read the posting guidelines yet? You should always post a short
    but complete script that illustrates the problem, so others can try it
    with the minimum of effort.

    Sinan

    Sinan
    A. Sinan Unur, Oct 6, 2006
    #2
  3. poncenby

    -berlin.de Guest

    poncenby wrote in comp.lang.perl.misc:
    > i have a file which has lines of text with fields separated by a space.
    > some of the fields are prefixed with a number and a space, like the
    > line below...
    >
    > bar1 bar2 XX 10 bar3tooten
    > foo1 foo2 XX 15 foo3uptofifteen
    >
    > as you can see, the numbers (10 and 15) are the length of the field
    > after the number.
    > so i want to use these numbers as length specifier to match the field
    > after the number, with a regex like either of these:
    >
    > /(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/
    > /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/
    >
    > both regexs will make the program fall over when attempting to print
    > $4.


    Earlier than that, as has been noted.

    >
    > i've figured out a solution with a regex over two lines but am curious
    > why this doesn't work.


    If a regex gets that big it's time to try something else. The
    pack/unpack functions have a template that can deal with an embedded
    length field. The following code shows how.

    We first use split() to retrieve the three blank-separated variables
    and the rest of the line. The rest starts with the length-delimited
    field. We can use unpack to split off the length-delimited part
    (the 'a3/a' template does that) and capture whatever is left over
    after that ('a*'). I have added some extra noise at the line ends
    to show that the length field is interpreted correctly. See
    "perldoc -f pack" for the details.

    while ( <DATA> ) {
        chomp;
        my ( $one, $two, $three, $rest) = split ' ', $_, 4;
        my $four;
        ( $four, $rest) = unpack 'a3/a a*', $rest;
        print "$one, $two, $three, $four, $rest\n";
    }

    __DATA__
    bar1 bar2 XX 10 bar3tooten+some
    foo1 foo2 XX 15 foo3uptofifteen+more

    Anno
    -berlin.de, Oct 7, 2006
    #3
  4. poncenby

    Eric Amick Guest

    On Fri, 06 Oct 2006 22:50:13 GMT, "A. Sinan Unur" wrote:

    >> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

    >
    >Ahem ... Did you read the error message? Without any testing, I can see
    >that you should have [0-9]+ rather than the [0-9)+ you used above.


    Maybe this is version-dependent, but that won't do what the OP wants
    even after fixing the syntax error with [0-9). Repeat counts in curly
    brackets have to be constants. Try this and see what I mean:

    perl -Mre=debug -e "/(.+)\s(.{\1})/"
    --
    Eric Amick
    Columbia, MD
    Eric Amick, Oct 7, 2006
    #4
  5. poncenby

    Bob Walton Guest

    poncenby wrote:
    > i have a file which has lines of text with fields separated by a space.
    > some of the fields are prefixed with a number and a space, like the
    > line below...
    >
    > bar1 bar2 XX 10 bar3tooten
    > foo1 foo2 XX 15 foo3uptofifteen
    >
    > as you can see, the numbers (10 and 15) are the length of the field
    > after the number.
    > so i want to use these numbers as length specifier to match the field
    > after the number, with a regex like either of these:
    >
    > /(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/

    ]-----------------------^
    > /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

    ]-----------------------^
    >
    > both regexs will make the program fall over when attempting to print
    > $4.
    >
    > i've figured out a solution with a regex over two lines but am curious
    > why this doesn't work.


    Doesn't work because of the syntax error. And because the contents of
    the {...} construction have to be literal digits or digits,digits .

    For a one-liner, try something like:

    use strict;
    use warnings;
    my $v;
    while(<DATA>){
        chomp;
        s/^(.+)\s(.+)\sXX\s(\d+)\s(.*)/$v=substr($4,0,$3);"$1 $2 XX $3 $4";/e;
        print "line:$_:\nv:$v:\n";
    }
    __END__
    bar1 bar2 XX 10 bar3tootenblahblahblah
    foo1 foo2 XX 15 foo3uptofifteenyadayadayada

    (Data was padded to illustrate that it works.) The second expression in
    the replacement code rebuilds the original text, so the substitution
    leaves the line unchanged while $v picks up the length-limited field.
    Also, I anchored the start so it won't match starting partway through
    a line. Generates:

    D:\junk>perl junk574.pl
    line:bar1 bar2 XX 10 bar3tootenblahblahblah:
    v:bar3tooten:
    line:foo1 foo2 XX 15 foo3uptofifteenyadayadayada:
    v:foo3uptofifteen:

    D:\junk>
    ....
    > poncenby

    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
    Bob Walton, Oct 7, 2006
    #5
  6. Eric Amick wrote:

    > On Fri, 06 Oct 2006 22:50:13 GMT, "A. Sinan Unur"
    > <> wrote:
    >
    >>> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

    >>
    >>Ahem ... Did you read the error message? Without any testing, I can see
    >>that you should have [0-9]+ rather than the [0-9)+ you used above.

    >
    > Repeat counts in curly brackets have to be constants.


    I knew that, of course ;-)

    Thanks for the correction. I focused on the most obvious error and missed
    the other one.

    Sinan
    A. Sinan Unur, Oct 7, 2006
    #6
  7. poncenby

    Dr.Ruud Guest

    poncenby schreef:

    > i have a file which has lines of text with fields separated by a
    > space. some of the fields are prefixed with a number and a space,
    > like the line below...
    >
    > bar1 bar2 XX 10 bar3tooten
    > foo1 foo2 XX 15 foo3uptofifteen
    >
    > as you can see, the numbers (10 and 15) are the length of the field
    > after the number.


    Are these meant for fields with embedded blanks? If not, see split().


    > so i want to use these numbers as length specifier to match the field
    > after the number, with a regex like either of these:
    >
    > /(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/


    In addition to the other comments: the "(.+)\s" might first match up to
    the last space, and backtrack from there. Change to "(\S+)\s", or to
    "(.+?)\s".
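
    To make the difference concrete, here is a small sketch that looks at
    the first group in isolation, using the sample line from the original
    post. The helper first_capture is hypothetical and exists only for
    this demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: return whatever the first capture group grabbed,
# or undef if the pattern did not match.
sub first_capture {
    my ($line, $re) = @_;
    my ($got) = $line =~ $re;
    return $got;
}

my $line = 'bar1 bar2 XX 10 bar3tooten';

print first_capture($line, qr/^(.+)\s/),  "\n";   # greedy:     bar1 bar2 XX 10
print first_capture($line, qr/^(.+?)\s/), "\n";   # non-greedy: bar1
print first_capture($line, qr/^(\S+)\s/), "\n";   # non-space:  bar1
```

    In the full pattern the later groups force the greedy version to
    backtrack, so it can still end up at the right place, but only after
    extra work.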

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Oct 7, 2006
    #7
  8. On Oct 7, 1:58 am, Eric Amick wrote:
    > Repeat counts in curly
    > brackets have to be constants. Try this and see what I mean:
    >
    > perl -Mre=debug -e "/(.+)\s(.{\1})/"


    You can use (??{})

    / (.+) \s ( (??{ ".{$1}" }) )/x

    But this is neither very readable nor very efficient.
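
    A minimal sketch of that construct in use, assuming the count is a run
    of digits at the start of the string (so the first group is narrowed
    to \d+ here). The sub name take_counted is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: read a leading count, then use a postponed
# subexpression (??{ ... }) to build ".{N}" from $1 while the match is
# already running. Returns the counted text, or undef if it does not fit.
sub take_counted {
    my ($text) = @_;
    return $text =~ / ^ (\d+) \s ( (??{ ".{$1}" }) ) /x ? $2 : undef;
}

print take_counted('3 abcdef'), "\n";             # abc
print take_counted('10 bar3tooten+some'), "\n";   # bar3tooten
```

    The code block is evaluated during matching, which is why $1 already
    holds the count captured earlier in the same pattern.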
    Brian McCauley, Oct 7, 2006
    #8
  9. poncenby

    Ala Qumsieh Guest

    Eric Amick wrote:

    > On Fri, 06 Oct 2006 22:50:13 GMT, "A. Sinan Unur"
    > <> wrote:
    >
    >
    >>>/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

    >>
    >>Ahem ... Did you read the error message? Without any testing, I can see
    >>that you should have [0-9]+ rather than the [0-9)+ you used above.

    >
    >
    > Maybe this is version-dependent, but that won't do what the OP wants
    > even after fixing the syntax error with [0-9). Repeat counts in curly
    > brackets have to be constants.


    No. They can also be variables:

    % perl -le '$_ = "aaa"; $c = 2; print $& if /a{$c}/'
    aa

    --Ala
    Ala Qumsieh, Oct 7, 2006
    #9
  10. poncenby

    Ala Qumsieh Guest

    Bob Walton wrote:

    > Doesn't work because of the syntax error. And because the contents of
    > the {...} construction have to be literal digits or digits,digits .


    Not true. They can be variables. See my other post in this thread.

    --Ala
    Ala Qumsieh, Oct 7, 2006
    #10
  11. poncenby

    Stan R. Guest

    poncenby wrote:
    > i have a file which has lines of text with fields separated by a
    > space. some of the fields are prefixed with a number and a space,
    > like the line below...
    >
    > bar1 bar2 XX 10 bar3tooten
    > foo1 foo2 XX 15 foo3uptofifteen
    >
    > as you can see, the numbers (10 and 15) are the length of the field
    > after the number.
    > so i want to use these numbers as length specifier to match the field
    > after the number, with a regex like either of these:
    >
    > /(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/
    > /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/
    >
    > both regexs will make the program fall over when attempting to print
    > $4.
    >
    > i've figured out a solution with a regex over two lines but am curious
    > why this doesn't work.
    >
    > thanks in advance
    >
    > poncenby


    This'll do the trick:

    /(\S+)\s(\S+)\sXX\s([0-9]+)\s((??{".{$3}"}))/

    __EXAMPLE__
    #!/usr/local/bin/perl

    use strict;

    my $s =
        qq{bar1 bar2 XX 10 bar3tooten\n}.
        qq{foo1 foo2 XX 15 foo3uptofifteen\n};

    while ($s =~ /(\S+)\s(\S+)\sXX\s([0-9]+)\s((??{".{$3}"}))/g) {
        print qq{1($1) 2($2) 3($3) 4($4)\n};
    }

    __OUTPUT__
    1(bar1) 2(bar2) 3(10) 4(bar3tooten)
    1(foo1) 2(foo2) 3(15) 4(foo3uptofifteen)

    --
    Stan
    Stan R., Oct 7, 2006
    #11
  12. poncenby

    Stan R. Guest

    Ala Qumsieh wrote:
    > Eric Amick wrote:
    >
    >> On Fri, 06 Oct 2006 22:50:13 GMT, "A. Sinan Unur"
    >> <> wrote:
    >>
    >>
    >>>> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/
    >>>
    >>> Ahem ... Did you read the error message? Without any testing, I can
    >>> see that you should have [0-9]+ rather than the [0-9)+ you used
    >>> above.

    >>
    >>
    >> Maybe this is version-dependent, but that won't do what the OP wants
    >> even after fixing the syntax error with [0-9). Repeat counts in curly
    >> brackets have to be constants.

    >
    > No. They can also be variables:
    >
    > % perl -le '$_ = "aaa"; $c = 2; print $& if /a{$c}/'
    > aa


    Precisely, that's why this regex works:

    /(\S+)\s(\S+)\sXX\s([0-9]+)\s((??{".{$3}"}))/

    See my other post for a working example.

    --
    Stan
    Stan R., Oct 7, 2006
    #12
  13. Ala Qumsieh wrote:
    > Eric Amick wrote:
    >
    >> On Fri, 06 Oct 2006 22:50:13 GMT, "A. Sinan Unur"
    >> <> wrote:
    >>
    >>
    >>>> /(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/
    >>>
    >>> Ahem ... Did you read the error message? Without any testing, I can
    >>> see that you should have [0-9]+ rather than the [0-9)+ you used above.

    >>
    >>
    >> Maybe this is version-dependent, but that won't do what the OP wants
    >> even after fixing the syntax error with [0-9). Repeat counts in curly
    >> brackets have to be constants.

    >
    > No. They can also be variables:
    >
    > % perl -le '$_ = "aaa"; $c = 2; print $& if /a{$c}/'
    > aa


    Variable interpolation happens first so it is a constant when the regular
    expression engine sees it. :)
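
    One way to take advantage of that, sketched here under the assumption
    that the line layout is exactly as in the original post: capture the
    count with a first match, then interpolate it into the quantifier of a
    second one. The sub name match_counted_field is made up for
    illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: two matches instead of one. The count is captured
# first, then interpolated into {...} before the second regex is compiled.
sub match_counted_field {
    my ($line) = @_;

    # Step 1: pull out the leading fields, the count, and the remainder.
    my ($one, $two, $len, $rest) = $line =~ /^(\S+)\s(\S+)\sXX\s(\d+)\s(.*)/
        or return;

    # Step 2: $len is an ordinary variable by now, so the engine sees a
    # constant such as .{10} when this pattern is compiled.
    my ($field) = $rest =~ /^(.{$len})/
        or return;

    return ($one, $two, $len, $field);
}

for my $line ('bar1 bar2 XX 10 bar3tooten+some',
              'foo1 foo2 XX 15 foo3uptofifteen+more') {
    my @f = match_counted_field($line) or next;
    print join(', ', @f), "\n";
}
```

    With the padded sample lines this prints "bar1, bar2, 10, bar3tooten"
    and "foo1, foo2, 15, foo3uptofifteen".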



    John
    --
    Perl isn't a toolbox, but a small machine shop where you can special-order
    certain sorts of tools at low cost and in short order. -- Larry Wall
    John W. Krahn, Oct 7, 2006
    #13
  14. poncenby

    Bob Walton Guest

    Ala Qumsieh wrote:
    > Bob Walton wrote:
    >
    >> Doesn't work because of the syntax error. And because the contents of
    >> the {...} construction have to be literal digits or digits,digits .

    >
    > Not true. They can be variables. See my other post in this thread.
    >
    > --Ala
    >


    Hmmm...yes, I see that this works -- thank you:

    use strict;
    use warnings;
    while(<DATA>){
        chomp;
        # skip lines that do not match, so $4 is never printed unset
        next unless /^(.+)\s(.+)\sXX\s(\d+)\s((??{".{$3}"}))/;
        print "line:$_:\n\$4:$4:\n";
    }
    __END__
    bar1 bar2 XX 10 bar3tootenblahblahblah
    foo1 foo2 XX 15 foo3uptofifteenyadayadayada

    --
    Bob Walton
    Email: http://bwalton.com/cgi-bin/emailbob.pl
    Bob Walton, Oct 7, 2006
    #14