I think what I need is a backreference but it doesn't ever match

S

Sara

I wonder if I can use $-vars in the LHS of a substitution? Camel says
"yes" I believe- I think they are called "backreferences" in the regex
section?

For example, I have

CAT1
DOG2
MOUSE
EEL
CAT1 is ALFIE
DOG2 is FRED

Say the result I want is

CAT1 is ALFIE
DOG2 is FRED
MOUSE
EEL

I tried something like:

s/^(\w+\d)(.+)\n\1 is ([^\n]+)\n/$1 is $3$2/gsm;

which runs with no errors, yet never matches anything. Can some
backref-experienced person point out my sins?


Cheers,
G
 
G

Greg Bacon

: I wonder if I can use $-vars in the LHS of a substitution? Camel says
: "yes" I believe- I think they are called "backreferences" in the regex
: section?
:
: [...]
: I tried something like:
:
: s/^(\w+\d)(.+)\n\1 is ([^\n]+)\n/$1 is $3$2/gsm;
:
: which runs with no errors, yet never matches anything. Can some
: backref-experienced person point out my sins?

It's a bit tricky:

% cat try
#! /usr/local/bin/perl

use warnings;
use strict;

my $want = <<EOExpectedResult;
CAT1 is ALFIE
DOG2 is FRED
MOUSE
EEL
EOExpectedResult

$_ = <<EOInput;
CAT1
DOG2
MOUSE
EEL
CAT1 is ALFIE
DOG2 is FRED
EOInput

1 while s/(^|\n)(\w+)\n(.+)\n(\2 is .+?)\n/$1$4\n$3\n/s;

if ($_ eq $want) {
    print "Match!\n";
}
else {
    print "want = [$want]\n",
          "\$_ = [$_]\n";
}
% ./try
Match!
%

Hope this helps,
Greg
 
T

Tad McClellan

Sara said:
: I tried something like:
:
: s/^(\w+\d)(.+)\n\1 is ([^\n]+)\n/$1 is $3$2/gsm;
:
: which runs with no errors, yet never matches anything.
:                                ^^^^^^^^^^^^^

Yes it does.

Why didn't you post a short and complete program that we can run
that illustrates your problem?

Something like:

-----------------------
#!/usr/bin/perl
use strict;
use warnings;

$_ = 'CAT1
DOG2
MOUSE
EEL
CAT1 is ALFIE
DOG2 is FRED
';
print "before: $_";

print "-----\n";

s/^(\w+\d)(.+)\n\1 is ([^\n]+)\n/$1 is $3$2/gsm;
print "after : $_";
-----------------------

Since it outputs a changed $_, the s/// *did* match.

So we aren't seeing what you are seeing, which makes it much
harder to fix what you are seeing...


1 while s/^(\w+\d)(.+\n)\1 is ([^\n]+)\n/$1 is $3$2/sm;
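
For anyone following along, here is a minimal sketch of why the loop form helps, run against the sample data from the original post. The names $text and $passes are made up for illustration. Each pass rescans from the start of the string, so the DOG2 pair that a single /g sweep has already moved past gets picked up on the next pass.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data from the original post.
my $text = <<'END';
CAT1
DOG2
MOUSE
EEL
CAT1 is ALFIE
DOG2 is FRED
END

# No /g: each substitution fixes one key, then the loop starts over
# from the beginning of the (already modified) string.
my $passes = 0;
$passes++ while $text =~ s/^(\w+\d)(.+\n)\1 is ([^\n]+)\n/$1 is $3$2/sm;

print "passes: $passes\n";
print $text;
```

On the sample data this should take two passes, one per key, and leave MOUSE and EEL untouched.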
 
J

Jeff 'japhy' Pinyan

[posted & mailed]

For example, I have

CAT1
DOG2
MOUSE
EEL
CAT1 is ALFIE
DOG2 is FRED

Say the result I want is

CAT1 is ALFIE
DOG2 is FRED
MOUSE
EEL

I tried something like:

s/^(\w+\d)(.+)\n\1 is ([^\n]+)\n/$1 is $3$2/gsm;

It will match, but it won't match ALL the cases you want. The reason is
because the regex will have matched PAST "DOG2" after it's finished
substituting "CAT1". One solution is to use look-aheads, which allow you
to match things in a string without actually consuming them in the string.

In case you don't know what I meant, let me give another example:

$s = "abc def ghi abc ghi";
$s =~ s/(\S+)(.*?)\1/$1($2)/g;

This turns $s into "abc( def ghi ) ghi". It doesn't do anything with the
ghi...ghi pair, because the first match consumes "abc def ghi abc", so the
next time the regex tries matching, it can only start in the remaining
" ghi", and nothing there matches.
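
A small sketch of the difference, using the same string. The second substitution is not from the post above; it is an illustrative variant (with made-up variable names) that moves the "is this word repeated later?" check into a look-ahead so that only the word itself is consumed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Consuming version: the first match swallows "abc def ghi abc",
# so /g resumes inside " ghi" and never sees the ghi...ghi pair.
my $consumed = "abc def ghi abc ghi";
$consumed =~ s/(\S+)(.*?)\1/$1($2)/g;
print "consuming:  $consumed\n";

# Look-ahead version (illustrative): only the word is consumed, the
# search for its repeat happens inside (?=...), so /g can carry on
# and still find the second pair.
my $peeked = "abc def ghi abc ghi";
$peeked =~ s/(\S+)(?=.*?\1)/[$1]/g;
print "look-ahead: $peeked\n";
```

Note that the spaces captured by (.*?) in the first version stay inside the parentheses, while the second version marks both "abc" and "ghi".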

Here's one way to do it with look-aheads in ONE regex:

s{
  ^ (\w+\d)
  (?:
    (?= \n (?: .* \n )* \1 \x20 is \x20 (.+) )
    |
    \x20 is \x20 .+ \n
  )
}{ $2 ? "$1 is $2" : "" }egmsx;

There's a bit of work in there. Basically, the regex matches one of two
things. If it matches "FOO1 is BLAH\n", it replaces that with nothing.
Otherwise, if it matches "FOO1" and sees that a newline, followed by some
lines, followed by "FOO1 is BLAH", it CAPTURES the "BLAH", and replaces
the original "FOO1" with "FOO1 is BLAH".

The /e is needed because the replacement is code to be executed, and the
/x is there to allow me to write the regex with abundant whitespace.
(That's why ACTUAL spaces have been replaced with \x20.)
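
Here is a runnable sketch of that single-regex approach against the sample data. Two things differ from the snippet above and are choices made for this sketch: the /s modifier is left off (as the follow-up later in this thread points out, with /s the dot also matches newlines, so (.+) would capture through the end of the string), and the replacement tests "defined $2" rather than "$2" so that a value such as "0" would not be dropped. The variable name $data is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $data = <<'END';
CAT1
DOG2
MOUSE
EEL
CAT1 is ALFIE
DOG2 is FRED
END

$data =~ s{
    ^ (\w+\d)                 # a key such as CAT1 at the start of a line
    (?:
        # Case 1: a bare key. Peek ahead for a later KEY is VALUE line
        # and capture VALUE without consuming anything.
        (?= \n (?: .* \n )* \1 \x20 is \x20 (.+) )
      |
        # Case 2: the KEY is VALUE line itself. Consume it so it is deleted.
        \x20 is \x20 .+ \n
    )
}{ defined $2 ? "$1 is $2" : "" }egmx;    # note: no /s

print $data;
```

Because the look-ahead consumes nothing, one /g sweep is enough here: every bare key is rewritten in place and every "KEY is VALUE" line further down is removed as the sweep reaches it.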
 
S

Sara

Tad McClellan said:
: Yes it does.
:
: Why didn't you post a short and complete program that we can run
: that illustrates your problem?
:
: [...]
:
: 1 while s/^(\w+\d)(.+\n)\1 is ([^\n]+)\n/$1 is $3$2/sm;

Yes, I misstated that: it does do ONE match. Sorry, I didn't post a short
program to illustrate this; it is actually a very tiny part of a HUGE
program, and the actual program operates on an infinitely more
complicated set of data, so I can't possibly send you what I'm seeing.
The others gave me some useful pointers, however, so no problems,
mate!

G
 
S

Sara

Jeff 'japhy' Pinyan said:
: It will match, but it won't match ALL the cases you want. The reason is
: because the regex will have matched PAST "DOG2" after it's finished
: substituting "CAT1". One solution is to use look-aheads, which allow you
: to match things in a string without actually consuming them in the string.
:
: [...]


Whoa, lookaheads and backrefs in one regex! Thanks so much for this
tip, and you're correct, I'm not familiar with them. I'm studying your
example, which indeed looks pertinent.

Thanks,
G
 
S

Sara

Jeff 'japhy' Pinyan said:
: [...]
: Here's one way to do it with look-aheads in ONE regex:
:
: [...]

Hmm curiously I changed

s/^(\w+\d)(.+)\n\1 ...

to

s/^(\w+\d)(?=.+)\n\1 ...

and now instead of one match I get NONE! Didn't see THAT coming :) How
could ?= have caused the string NOT TO MATCH? It shouldn't change the
matching functionality, should it?

G
 
J

Jeff 'japhy' Pinyan

[posted & mailed]

Jeff 'japhy' Pinyan said:
[posted & mailed]

Here's one way to do it with look-aheads in ONE regex:

s{
^ (\w+\d)
(?:
(?= \n (?: .* \n )* \1 \x20 is \x20 (.+) )
|
\x20 is \x20 .+ \n
)
}{ $2 ? "$1 is $2" : "" }egmsx;

Oops! Get rid of the /s modifier! That broke it. Taking the /s modifier
out makes it work.

: OK there is a lot of new syntax in here. I read in Camel that ?= is "a
: zero-width positive lookahead assertion" [1]. The meaning of a (ZWPLA)
: isn't intuitively obvious to me, but I think it's what you refer to as
: "the look-ahead"? So for those cases, anything in parens that is
: preceded by the ?= will NOT be consumed, but WILL be considered for
: the match.

Yeah... (?=...) basically matches, and if it matches, it backtracks to
where it started to match. It looks ahead for the pattern, but doesn't
CONSUME it.
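
A quick way to see "matches but doesn't consume" is to compare where the match ends with and without the look-ahead. This is only a sketch with made-up variable names; $+[0] is Perl's built-in end offset of the last successful match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "CAT1 is ALFIE";
my ($end_consumed, $end_peeked);

# Ordinary match: " is" is part of the match, so the match ends after it.
$end_consumed = $+[0] if $line =~ /^\w+\d is/;

# Look-ahead: " is" must be present, but the match itself ends right
# after "CAT1", and the next pattern element would start there.
$end_peeked = $+[0] if $line =~ /^\w+\d(?= is)/;

print "consuming match ends at offset  $end_consumed\n";
print "look-ahead match ends at offset $end_peeked\n";
```

The first offset is 7 (the length of "CAT1 is") and the second is 4 (the length of "CAT1").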

: (?: ) is "the Grouper"? [also 1]
:
: It's the same as ( ) except it creates no $n var for the result? Seems
: sort of like an oddball operator; why not lose the extra syntax
: and just ignore that particular $n? But ok..

Yes. (?:...) groups, but doesn't capture. It also lets you put in
modifiers (to turn them on or off), like (?i:...) and (?s-i:...).
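
A short sketch of both points, with made-up variable names: the first pair of matches shows that (?:...) leaves the capture numbering alone, and the second pair shows an inline modifier applying to just one part of the pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "CAT1 is ALFIE";

# With plain parens the " is " becomes $2 and the name is pushed to $3.
my @with_capture = $line =~ /^(\w+\d)( is )(\w+)$/;

# With (?:...) the " is " is grouped but not captured, so the name is $2.
my @with_group = $line =~ /^(\w+\d)(?: is )(\w+)$/;

print scalar(@with_capture), " captures with ( )\n";
print scalar(@with_group),   " captures with (?: )\n";

# (?i:...) turns on case-insensitivity for "is" only; the rest of the
# pattern stays case-sensitive.
my $mixed_case = "CAT1 IS ALFIE" =~ /^CAT1 (?i:is) ALFIE$/ ? "matches" : "fails";
my $wrong_key  = "cat1 IS ALFIE" =~ /^CAT1 (?i:is) ALFIE$/ ? "matches" : "fails";

print "CAT1 IS ALFIE: $mixed_case\n";
print "cat1 IS ALFIE: $wrong_key\n";
```

So the extra syntax mostly pays off when a pattern has several groups and renumbering $1, $2, ... every time a grouping is added would be error-prone.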
 
J

Jeff 'japhy' Pinyan

[posted & mailed]

Hmm curiously I changed

s/^(\w+\d)(.+)\n\1 ...

to

s/^(\w+\d)(?=.+)\n\1 ...

It breaks because the (?=.+) looks ahead to match one or more characters.
Once that succeeds, the regex goes back to where it was in the string. It
then looks for a newline, and what was matched into $1. So unless you
have

FOO1
FOO1 is BAR

then it won't work.
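
A minimal sketch of that behaviour, using only the visible part of the changed pattern (the elided tail is left out) and assuming the /s and /m modifiers from the original attempt are still in place. The variable names are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# After the look-ahead succeeds, the regex is back right after the key,
# so the very next thing has to be a newline followed by the same key.
my $pattern = qr/^(\w+\d)(?=.+)\n\1/sm;

my $spread   = "CAT1\nDOG2\nMOUSE\nEEL\nCAT1 is ALFIE\nDOG2 is FRED\n";
my $adjacent = "FOO1\nFOO1 is BAR\n";

my $spread_hit   = $spread   =~ $pattern ? 1 : 0;
my $adjacent_hit = $adjacent =~ $pattern ? 1 : 0;

print "spread-out data matches: $spread_hit\n";
print "adjacent data matches:   $adjacent_hit\n";
```

The spread-out sample data fails because DOG2, not CAT1, follows the first newline; only the adjacent FOO1 case matches.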
 
S

Sara

Jeff 'japhy' Pinyan said:
[posted & mailed]

Hmm curiously I changed

s/^(\w+\d)(.+)\n\1 ...

to

s/^(\w+\d)(?=.+)\n\1 ...

It breaks because the (?=.+) looks ahead to match one or more characters.
Once that succeeds, the regex goes back to where it was in the string. It
then looks for a newline, and what was matched into $1. So unless you
have

FOO1
FOO1 is BAR

then it won't work.

Wow.. OK I understand this nuance- I'll need to rethink this once again!

Cheers and thanks for the explanation...

G
 
