backreference oddity

poncenby

i have a file which has lines of text with fields separated by a space.
some of the fields are prefixed with a number and a space, like the
line below...

bar1 bar2 XX 10 bar3tooten
foo1 foo2 XX 15 foo3uptofifteen

as you can see, the numbers (10 and 15) are the length of the field
after the number.
so i want to use these numbers as length specifier to match the field
after the number, with a regex like either of these:

/(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/
/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

both regexs will make the program fall over when attempting to print
$4.

i've figured out a solution with a regex over two lines but am curious
why this doesn't work.

thanks in advance

poncenby
 
A. Sinan Unur

poncenby said:
i have a file which has lines of text with fields separated by a
space. some of the fields are prefixed with a number and a space, like
the line below...

bar1 bar2 XX 10 bar3tooten
foo1 foo2 XX 15 foo3uptofifteen

as you can see, the numbers (10 and 15) are the length of the field
after the number.
so i want to use these numbers as length specifier to match the field
after the number, with a regex like either of these:

/(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/

You can't use the capture variable here, but your problem is ...
/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

Ahem ... Did you read the error message? Without any testing, I can see
that you should have [0-9]+ rather than the [0-9)+ you used above.

Have you read the posting guidelines yet? You should always post a short
but complete script that illustrates the problem, so others can try it
with the minimum of effort.

Sinan
 
anno4000

poncenby said:
i have a file which has lines of text with fields separated by a space.
some of the fields are prefixed with a number and a space, like the
line below...

bar1 bar2 XX 10 bar3tooten
foo1 foo2 XX 15 foo3uptofifteen

as you can see, the numbers (10 and 15) are the length of the field
after the number.
so i want to use these numbers as length specifier to match the field
after the number, with a regex like either of these:

/(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/
/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

both regexs will make the program fall over when attempting to print
$4.

Earlier than that, as has been noted.

i've figured out a solution with a regex over two lines but am curious
why this doesn't work.

If a regex gets that big it's time to try something else. The
pack/unpack functions have a template that can deal with an embedded
length field. The following code shows how.

We first use split() to retrieve the three blank-separated variables
and the rest of the line. The rest starts with the length-delimited
field. We can use unpack to split off the length-delimited part
(the 'a3/a' template does that) and capture whatever is left over
after that ('a*'). I have added some extra noise at the line ends
to show that the length field is interpreted correctly. See
"perldoc -f pack" for the details.

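while ( <DATA> ) {
    chomp;
    # first three blank-separated fields, then the untouched remainder
    my ( $one, $two, $three, $rest ) = split ' ', $_, 4;
    my $four;
    # 'a3/a' reads a 3-character length field ("10 "), then that many
    # characters; 'a*' takes whatever follows
    ( $four, $rest ) = unpack 'a3/a a*', $rest;
    print "$one, $two, $three, $four, $rest\n";
}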

__DATA__
bar1 bar2 XX 10 bar3tooten+some
foo1 foo2 XX 15 foo3uptofifteen+more

Anno
 
Eric Amick

/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

Ahem ... Did you read the error message? Without any testing, I can see
that you should have [0-9]+ rather than the [0-9)+ you used above.

Maybe this is version-dependent, but that won't do what the OP wants
even after fixing the syntax error with [0-9). Repeat counts in curly
brackets have to be constants. Try this and see what I mean:

perl -Mre=debug -e "/(.+)\s(.{\1})/"
 
Bob Walton

poncenby said:
i have a file which has lines of text with fields separated by a space.
some of the fields are prefixed with a number and a space, like the
line below...

bar1 bar2 XX 10 bar3tooten
foo1 foo2 XX 15 foo3uptofifteen

as you can see, the numbers (10 and 15) are the length of the field
after the number.
so i want to use these numbers as length specifier to match the field
after the number, with a regex like either of these:

/(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/
]---------------------^
/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/
]---------------------^

both regexs will make the program fall over when attempting to print
$4.

i've figured out a solution with a regex over two lines but am curious
why this doesn't work.

Doesn't work because of the syntax error. And because the contents of
the {...} construction have to be literal digits, or digits,digits.

To do it with a single substitution, try something like:

use strict;
use warnings;
my $v;
while (<DATA>) {
    chomp;
    s/^(.+)\s(.+)\sXX\s(\d+)\s(.*)/$v=substr($4,0,$3);"$1 $2 XX $3 $4";/e;
    print "line:$_:\nv:$v:\n";
}
__END__
bar1 bar2 XX 10 bar3tootenblahblahblah
foo1 foo2 XX 15 foo3uptofifteenyadayadayada

(Data was padded to illustrate that it works.) The second expression in
the replacement rebuilds the original text, so the substitution puts
back what it matched and the line is left unchanged. Also, I anchored
the start so it won't match starting partway through a line. Generates:

D:\junk>perl junk574.pl
line:bar1 bar2 XX 10 bar3tootenblahblahblah:
v:bar3tooten:
line:foo1 foo2 XX 15 foo3uptofifteenyadayadayada:
v:foo3uptofifteen:

D:\junk>
....
 
A. Sinan Unur

/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

Ahem ... Did you read the error message? Without any testing, I can see
that you should have [0-9]+ rather than the [0-9)+ you used above.

Repeat counts in curly brackets have to be constants.

I knew that, of course ;-)

Thanks for the correction. I focused on the most obvious error and missed
the other one.

Sinan
 
Dr.Ruud

poncenby wrote:
i have a file which has lines of text with fields separated by a
space. some of the fields are prefixed with a number and a space,
like the line below...

bar1 bar2 XX 10 bar3tooten
foo1 foo2 XX 15 foo3uptofifteen

as you can see, the numbers (10 and 15) are the length of the field
after the number.

Are these meant for fields with embedded blanks? If not, see split().

so i want to use these numbers as length specifier to match the field
after the number, with a regex like either of these:

/(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/

In addition to the other comments: the "(.+)\s" might first match up to
the last space, and backtrack from there. Change to "(\S+)\s", or to
"(.+?)\s".
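
To make the difference visible, here is a minimal sketch. The input line with an extra leading field is made up for illustration and is not from the original data; with the two-field lines in the original post all three patterns capture the same fields.

```perl
use strict;
use warnings;

# Hypothetical line with three leading fields instead of two.
my $line = "a b c XX 3 abc";

# Greedy: the first (.+) grabs as much as it can and only gives back
# what the rest of the pattern needs.
my @greedy = $line =~ /^(.+)\s(.+)\sXX/;

# Non-greedy: the first (.+?) stops at the first space that still lets
# the rest of the pattern match.
my @lazy = $line =~ /^(.+?)\s(.+?)\sXX/;

# \S+ cannot cross a space, so three leading fields do not fit a
# two-field pattern and the match fails (empty list).
my @strict = $line =~ /^(\S+)\s(\S+)\sXX/;

print "greedy: [$greedy[0]] [$greedy[1]]\n";   # greedy: [a b] [c]
print "lazy:   [$lazy[0]] [$lazy[1]]\n";       # lazy:   [a] [b c]
print "strict: ", (@strict ? "matched" : "no match"), "\n";   # strict: no match
```

Which behaviour is wanted depends on whether fields may contain blanks; \S+ is the one that rejects unexpected input instead of silently shifting the field boundaries.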
 
Brian McCauley

Repeat counts in curly
brackets have to be constants. Try this and see what I mean:

perl -Mre=debug -e "/(.+)\s(.{\1})/"

You can use (??{})

/ (.+) \s ( (??{ ".{$1}" }) )/x

But this is neither very readable nor very efficient.
 
Ala Qumsieh

Eric said:
/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

Ahem ... Did you read the error message? Without any testing, I can see
that you should have [0-9]+ rather than the [0-9)+ you used above.


Maybe this is version-dependent, but that won't do what the OP wants
even after fixing the syntax error with [0-9). Repeat counts in curly
brackets have to be constants.

No. They can also be variables:

% perl -le '$_ = "aaa"; $c = 2; print $& if /a{$c}/'
aa
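
Applied to the original data, that suggests a two-step match. The sketch below is only a guess at what such a two-regex solution might look like (the OP's own version was never posted); the padded sample lines and the names @lines, @fields, $len, $rest and $field are made up for illustration.

```perl
use strict;
use warnings;

# Sample lines, padded with trailing noise to show the length is honoured.
my @lines = (
    "bar1 bar2 XX 10 bar3tooten+some",
    "foo1 foo2 XX 15 foo3uptofifteen+more",
);

my @fields;
for my $line (@lines) {
    # Step 1: capture the length and everything after it.
    my ($len, $rest) = $line =~ /^\S+\s\S+\sXX\s([0-9]+)\s(.*)/
        or next;

    # Step 2: $len is an ordinary variable by now, so it is interpolated
    # into the quantifier before this second pattern is compiled.
    my ($field) = $rest =~ /^(.{$len})/
        or next;

    push @fields, $field;
    print "$len: $field\n";
}
```

With the two sample lines this prints "10: bar3tooten" and "15: foo3uptofifteen".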

--Ala
 
Ala Qumsieh

Bob said:
Doesn't work because of the syntax error. And because the contents of
the {...} construction have to be literal digits or digits,digits .

Not true. They can be variables. See my other post in this thread.

--Ala
 
Stan R.

poncenby said:
i have a file which has lines of text with fields separated by a
space. some of the fields are prefixed with a number and a space,
like the line below...

bar1 bar2 XX 10 bar3tooten
foo1 foo2 XX 15 foo3uptofifteen

as you can see, the numbers (10 and 15) are the length of the field
after the number.
so i want to use these numbers as length specifier to match the field
after the number, with a regex like either of these:

/(.+)\s(.+)\sXX\s([0-9)+)\s(.{$3})/
/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

both regexs will make the program fall over when attempting to print
$4.

i've figured out a solution with a regex over two lines but am curious
why this doesn't work.

thanks in advance

poncenby

This'll do the trick:

/(\S+)\s(\S+)\sXX\s([0-9]+)\s((??{".{$3}"}))/

__EXAMPLE__
#!/usr/local/bin/perl

use strict;

my $s =
    qq{bar1 bar2 XX 10 bar3tooten\n}.
    qq{foo1 foo2 XX 15 foo3uptofifteen\n};

while ($s =~ /(\S+)\s(\S+)\sXX\s([0-9]+)\s((??{".{$3}"}))/g) {
    print qq{1($1) 2($2) 3($3) 4($4)\n};
}

__OUTPUT__
1(bar1) 2(bar2) 3(10) 4(bar3tooten)
1(foo1) 2(foo2) 3(15) 4(foo3uptofifteen)
 
Stan R.

Ala said:
Eric said:
/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

Ahem ... Did you read the error message? Without any testing, I can
see that you should have [0-9]+ rather than the [0-9)+ you used
above.


Maybe this is version-dependent, but that won't do what the OP wants
even after fixing the syntax error with [0-9). Repeat counts in curly
brackets have to be constants.

No. They can also be variables:

% perl -le '$_ = "aaa"; $c = 2; print $& if /a{$c}/'
aa

Precisely, that's why this regex works:

/(\S+)\s(\S+)\sXX\s([0-9]+)\s((??{".{$3}"}))/

See my other post for working example.
 
John W. Krahn

Ala said:
Eric said:
/(.+)\s(.+)\sXX\s([0-9)+)\s(.{\3})/

Ahem ... Did you read the error message? Without any testing, I can
see that you should have [0-9]+ rather than the [0-9)+ you used above.


Maybe this is version-dependent, but that won't do what the OP wants
even after fixing the syntax error with [0-9). Repeat counts in curly
brackets have to be constants.

No. They can also be variables:

% perl -le '$_ = "aaa"; $c = 2; print $& if /a{$c}/'
aa

Variable interpolation happens first so it is a constant when the regular
expression engine sees it. :)
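
A small sketch of that timing, using a made-up count $n and test string. It also suggests why a bare $3 in the original pattern cannot refer to the group being captured: it is interpolated before the match starts, so it holds whatever an earlier successful match left behind.

```perl
use strict;
use warnings;

my $n  = 4;               # hypothetical repeat count
my $re = qr/(a{$n})/;     # $n is interpolated here; the engine compiles (a{4})

$n = 2;                   # changing $n afterwards does not touch $re

my ($first)  = "aaaaaa" =~ $re;          # still four a's
my ($second) = "aaaaaa" =~ /(a{$n})/;    # interpolated afresh: two a's

print length($first), " ", length($second), "\n";   # prints "4 2"
```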



John
 
Bob Walton

Ala said:
Not true. They can be variables. See my other post in this thread.

--Ala

Hmmm...yes, I see that this works -- thank you:

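use strict;
use warnings;
while (<DATA>) {
    chomp;
    # (??{ ... }) builds ".{10}" or ".{15}" at match time from the captured
    # length; skip lines that do not match so $4 is never stale
    next unless /^(.+)\s(.+)\sXX\s(\d+)\s((??{".{$3}"}))/;
    print "line:$_:\n\$4:$4:\n";
}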
__END__
bar1 bar2 XX 10 bar3tootenblahblahblah
foo1 foo2 XX 15 foo3uptofifteenyadayadayada
 
