Setting backreference inside of a string

Jason C

I'm doing a replace, like this:

$text = "Yes dear!";
$pattern = "(D|d)ear";
$replace = "$1eer";

$text =~ s/$pattern/$replace/gi;

That's just an example, of course; the real $pattern and $replace come from a database list, and $text comes from form data.

The problem I'm having is that the replace is replacing with a literal "$1eer", instead of setting the $1 to (D|d). Meaning, instead of printing:

Yes deer!

I'm printing:

Yes $1eer!

Any suggestions on how to make $1 in $replace refer to the first group in $pattern?
 
Peter Makholm

Jason C said:
> $text = "Yes dear!";
> $pattern = "(D|d)ear";
> $replace = "$1eer";
>
> $text =~ s/$pattern/$replace/gi;

Using this code I get "Yes eer!" in $text...

> Any suggestions on how to make $1 in $replace refer to the first group
> in $pattern?

You need to look at the /e modifier to your substitution.

//Makholm
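
For readers following along, here is a minimal sketch of why the two results reported so far differ, assuming nothing beyond core Perl. A double-quoted "$1eer" is interpolated at assignment time, when $1 is still empty, while a single-quoted '$1eer' (which is effectively what a database row delivers) is inserted literally by a plain s///. The helper name plain_subst is made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: a plain s/// with pattern and replacement held in variables.
sub plain_subst {
    my ($text, $pattern, $replace) = @_;
    # $replace is interpolated once; its contents are not scanned again for $1.
    $text =~ s/$pattern/$replace/gi;
    return $text;
}

# Double quotes: $1 is interpolated right here, before any match has happened.
my $double = do { no warnings 'uninitialized'; "$1eer" };   # ends up as 'eer'

# Single quotes: the five characters $ 1 e e r are stored as-is.
my $single = '$1eer';

print plain_subst("Yes dear!", '(D|d)ear', $double), "\n";   # Yes eer!
print plain_subst("Yes dear!", '(D|d)ear', $single), "\n";   # Yes $1eer!
```

So the difference between the two reports comes from how $replace was filled, not from the substitution itself.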
 
Jason C

Peter Makholm said:
> Using this code I get "Yes eer!" in $text...

Could be a minor variation in what I posted vs. my actual code. I didn't post the whole thing because I thought it was unnecessarily complicated, but it's technically:

my $sth = $dbh->prepare("SELECT * FROM table");
$sth->execute();

while (($pattern, $replace) = $sth->fetchrow_array()) {
    $text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/$1$replace$+/gi;
}

> You need to look at the /e modifier to your substitution.

Thanks for the tip. I've read a bit on the 'e' modifier now, but I'm not quite understanding how to use it for this application.

In retrospect, what I think is happening is that the while() loop is treating $replace as if it is in a single quote instead of double. So instead of it reading like:

$pattern = "(D|d)ear";
$replace = "$1eer";

it's reading:

$pattern = '(D|d)ear';
$replace = '$1eer';

So the question may really be, how do I get it to read $replace as interpretive?
 
Wolf Behrenhoff

On 10.09.2012 11:36, Jason C wrote:
> Thanks for the tip. I've read a bit on the 'e' modifier now, but I'm not quite understanding how to use it for this application.

For example like this:

$ perl -E '$r=q("${1}eer");($_="hello")=~s/(ll)/$r/ee; say'
helleero

- Wolf
 
Peter Makholm

Wolf Behrenhoff said:
> For example like this:
>
> $ perl -E '$r=q("${1}eer");($_="hello")=~s/(ll)/$r/ee; say'
> helleero

So, after matching 'll' and assigning it to $1, the match is replaced by

eval( eval '$r' )

Computing the inner eval first, we get

eval( '"${1}eer"' )

Remembering that $1 was "ll", this evaluates to

"lleer"

//Makholm
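
A script-sized version of the same idea, as a hedged sketch rather than a recommendation. The helper name subst_ee is hypothetical, and the sketch assumes the stored replacement contains no double-quote characters. The first /e builds a double-quoted string literal around the stored replacement, and the second /e evaluates that literal as Perl code. The last call shows the drawback: anything Perl would interpolate inside a double-quoted string gets executed.

```perl
use strict;
use warnings;

# Hypothetical helper: expand $1 in a stored replacement by evaluating it as code.
sub subst_ee {
    my ($text, $pattern, $replace) = @_;
    # 1st eval: '"' . $replace . '"' yields the source text "$1eer" (with quotes).
    # 2nd eval: that source text is run as Perl, so $1 is interpolated.
    $text =~ s/$pattern/'"' . $replace . '"'/gee;
    return $text;
}

print subst_ee("Yes dear!", '(D|d)ear',      '$1eer'),       "\n";   # Yes deer!
print subst_ee("smart ass", 'smart(\s*)ass', 'smart$1butt'), "\n";   # smart butt

# The stored string is executed, not just expanded: here it computes 6 * 7.
print subst_ee("Yes dear!", 'dear', '@{[ 6 * 7 ]}'), "\n";           # Yes 42!
```

If the replacement strings can be edited by anyone who is not fully trusted, the last line is the reason to look for an approach that does not evaluate them.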
 
C.DeRykus

Jason C said:
> Could be a minor variation in what I posted vs. my actual code. I didn't post the whole thing because I thought it was unnecessarily complicated, but it's technically:
>
> my $sth = $dbh->prepare("SELECT * FROM table");
> $sth->execute();
>
> while (($pattern, $replace) = $sth->fetchrow_array()) {
>     $text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/$1$replace$+/gi;
> }
>
> Thanks for the tip. I've read a bit on the 'e' modifier now, but I'm not quite understanding how to use it for this application.
>
> In retrospect, what I think is happening is that the while() loop is treating $replace as if it is in a single quote instead of double. So instead of it reading like:
>
> $pattern = "(D|d)ear";
> $replace = "$1eer";
>
> it's reading:
>
> $pattern = '(D|d)ear';
> $replace = '$1eer';
>
> So the question may really be, how do I get it to read $replace as interpretive?

One way to avoid the drawbacks of an 'ee' solution
is just to pull the backref out of the stored replacement
and write it in the substitution itself:

my $pattern = '(D|d)ear';
my $replace = 'eer';

$text =~ s/$pattern/$1$replace/gi;
 
Jason C

C.DeRykus said:
> One way to avoid an 'ee' solution's drawbacks
> is just pull the backref out of the pattern:
>
> my $pattern = '(D|d)ear';
> my $replace = 'eer';
>
> $text =~ s/$pattern/$1$replace/gi;

That was my original thought, too, but I also have rows where the () isn't at the beginning. Eg:

$pattern = 'smart(\s)*ass';
$replace = 'smart$1butt';

I really would like to avoid using /ee, though, for the security reasons mentioned earlier.

Maybe something like:

$text = "Yes dear!";
$pattern = '(D|d)ear';
$replace = '$1eer';

# if $pattern doesn't contain a backreference
# create an empty one
if ($pattern !~ /\(.*?\)/g) {
    $pattern = "()*?" . $pattern;
}

$replace =~ s/\$1/<marker>/g;
# now, $replace = '<marker>eer';

while ($text =~ /$pattern/g) {
    $replace =~ s/<marker>/$1/g;
    $text =~ s/$pattern/$replace/gi;
}


I haven't tested that, I'm just spit-balling the logic. Thoughts?
 
C.DeRykus

Jason C said:
> That was my original thought, too, but I also have rows where the () isn't at the beginning. Eg:
>
> $pattern = 'smart(\s)*ass';
> $replace = 'smart$1butt';
>
> I really would like to avoid using /ee, though, for the security reasons mentioned earlier.
>
> Maybe something like:
>
> $text = "Yes dear!";
> $pattern = '(D|d)ear';
> $replace = '$1eer';
>
> # if $pattern doesn't contain a backreference
> # create an empty one
> if ($pattern !~ /\(.*?\)/g) {
>     $pattern = "()*?" . $pattern;
> }
>
> $replace =~ s/\$1/<marker>/g;
> # now, $replace = '<marker>eer';
>
> while ($text =~ /$pattern/g) {
>     $replace =~ s/<marker>/$1/g;
>     $text =~ s/$pattern/$replace/gi;
> }
>
> I haven't tested that, I'm just spit-balling the logic. Thoughts?

I'm not sure I follow entirely but, IMO, separate regexes would be much easier to write and maintain
than trying to do this in a single regex.

Only if there were a huge bottleneck would I bother
trying to refactor...
 
Jason C

C.DeRykus said:
> I'm not sure I follow entirely but, IMO, separate regexes would be much easier and more maintainable
> than trying to do this in a single regex.
>
> Only if there's a huge bottleneck, would I bother,
> trying to re-factor...

You might have missed it before, but on the live site, $pattern and $replace are coming from a database. Like so:

my $sth = $dbh->prepare("SELECT * FROM table");
$sth->execute();

while (($pattern, $replace) = $sth->fetchrow_array()) {
    $text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/$1$replace$+/gi;
}

The first group in $pattern can actually be anywhere in the string, so one row might be:

(D|d)ear

while the next might be:

smart(\s*)ass

The issue comes in where $1 is defined as non-interpretive in the database, and I'm not sure how to make it interpretive in the replacement.

The while() loop that I presented in the last post is an attempt to replace the non-interpretive '$1' with '<marker>', then replace '<marker>' back with the interpretive "$1".
 
Willem

Jason C wrote:
) On Tuesday, September 11, 2012 4:50:20 PM UTC-4, C.DeRykus wrote:
)
)> One way to avoid an 'ee' solution's drawbacks
)> is just pull the backref out of the pattern:
)>
)> my $pattern = '(D|d)ear';
)> my $replace = 'eer';
)>
)> $text =~ s/$pattern/$1$replace/gi;
)
) That was my original thought, too, but I also have rows where the () isn't at the beginning. Eg:
)
) $pattern = 'smart(\s)*ass';
) $replace = 'smart$1butt';
)
) I really would like to avoid using /ee, though, for the security reasons mentioned earlier.
)
) Maybe something like:
)
) $text = "Yes dear!";
) $pattern = '(D|d)ear';
) $replace = '$1eer';
)
) # if $pattern doesn't contain a backreference
) # create an empty one
) if ($pattern !~ /\(.*?\)/g) {
) $pattern = "()*?" . $pattern;
) }
)
) $replace =~ s/\$1/<marker>/g;
) # now, $replace = '<marker>eer';
)
) while ($text =~ /$pattern/g) {
) $replace =~ s/<marker>/$1/g;
) $text =~ s/$pattern/$replace/gi;
) }

It would be easier to do the whole thing in a /e expression:
not by interpreting the database string as code, but by adding your own code around it.

Like this:

$text =~ s/$pattern/my $s1 = $1; (my $t = $replace) =~ s|\$1|$s1|g; $t/ge;

That should work.

If you want more than just $1, you need a slightly more complicated
expression, probably involving @- and @+.

(I've always wondered why there is no regex-match array perlvar...)


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
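
For the "more than just $1" case, one possible shape of the @- and @+ approach is sketched below. All three sub names are hypothetical, and the sketch assumes the stored replacement refers to captures only as $N or ${N}. The capture values are cut out of an untouched copy of the input using the match offsets, and they are copied before any other regex runs, because the next successful match overwrites @- and @+. Nothing from the database is ever evaluated as code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole match and every capture group of the
# most recent successful match, cut out of $subject via the offsets in @- and @+.
sub current_captures {
    my ($subject) = @_;
    return map {
        defined $-[$_] ? substr($subject, $-[$_], $+[$_] - $-[$_]) : undef
    } 0 .. $#+;
}

# Hypothetical helper: replace $N or ${N} in a template with saved capture N.
# Captures that did not take part in the match expand to the empty string.
sub expand_template {
    my ($template, @captures) = @_;
    $template =~ s/\$(?:\{(\d+)\}|(\d+))/
        my $n = defined $1 ? $1 : $2;
        defined $captures[$n] ? $captures[$n] : '';
    /ge;
    return $template;
}

# The substitution runs on a copy, so $text stays intact for substr().
sub subst_with_captures {
    my ($text, $pattern, $template) = @_;
    (my $result = $text) =~
        s/$pattern/expand_template($template, current_captures($text))/gie;
    return $result;
}

print subst_with_captures("Yes dear!",  '(D|d)ear',          '$1eer'),    "\n";
print subst_with_captures("2012-09-11", '(\d+)-(\d+)-(\d+)', '$3.$2.$1'), "\n";
```

The template is treated purely as data here: only the $N markers are looked at, and everything else is copied through unchanged.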
 
Jason C

Ben said:
> That, IMHO, is basically the right approach, but you don't want to use a
> fixed string like "<marker>" because it might appear in the source text.
> Instead, you want something like this:
>
> sub dosubst {
>     my ($repl, $one) = @_;
>     $repl =~ s/\$(?:\{1\}|1)/$one/g;
>     $repl;
> }
>
> $text =~ s/$pattern/dosubst $replace, $1/gie;
>
> This assumes the replacement only uses $1. If you want to use arbitrary
> captures, it gets a little more difficult, since perl doesn't provide an
> array-of-all-the-captures variable. You would need to pass $text, \@-
> and \@+ into dosubst, and pull the captures out as required.

Thanks to all of you for the help! I did eventually get it working correctly; Ben's reply made it click for me :)

Here's the original regex I was using:

$text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/$1$replace$+/gi;

and here's the variation that is now working, using the /e modifier:

$text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/dosubst($replace, $1, $2, $3)/egi;

sub dosubst {
    my ($repl, $one, $two, $three) = @_;

    $repl =~ s/\$(?:\{2\}|2)/$two/g;
    $repl = $one . $repl . $three;

    return $repl;
}

Essentially, it's sending the uninterpreted '$2' in $replace to dosubst(), replacing it with the interpreted $two, then returning the whole updated variable. (It's $2 rather than $1 because the wrapping regex adds its own first group in front of $pattern.)

I hope this helps someone in the future with a similar problem.
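
For anyone who wants to try this without a database, here is a rough self-contained sketch of the same approach. The @rows array is a hypothetical stand-in for the database table, filter_text is a made-up wrapper name, and the leading group is written as a plain (\b) with 'ing' listed before 'in', which are assumptions of this sketch rather than the code as posted. Note that the stored replacements refer to $2, since the wrapper's own group takes the $1 slot.

```perl
use strict;
use warnings;

# Same logic as the dosubst() above: expand only $2 / ${2}, then re-attach
# whatever the wrapper captured before and after the stored pattern.
sub dosubst {
    my ($repl, $one, $two, $three) = @_;
    $repl =~ s/\$(?:\{2\}|2)/$two/g;
    return $one . $repl . $three;
}

# Hypothetical stand-in for the (pattern, replace) rows in the database.
my @rows = (
    [ '(D|d)ear',      '$2eer'       ],
    [ 'smart(\s*)ass', 'smart$2butt' ],
);

# Made-up wrapper that applies every row to the text in turn.
sub filter_text {
    my ($text) = @_;
    for my $row (@rows) {
        my ($pattern, $replace) = @$row;
        $text =~ s/(\b)$pattern(ing|in|er|ed|s|y|\b)/dosubst($replace, $1, $2, $3)/egi;
    }
    return $text;
}

print filter_text("Yes dear!"), "\n";                # Yes deer!
print filter_text("Dearing, you smart ass"), "\n";   # Deering, you smart butt
```

This only works for stored patterns with exactly one group of their own; a row with two groups would shift the suffix out of $3.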
 
John W. Krahn

Jason said:
> Thanks to all of you for the help! I did eventually get it working correctly; Ben's reply made it click for me :)
>
> Here's the original regex I was using:
>
> $text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/$1$replace$+/gi;

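You can't usefully apply a quantifier to a zero-width assertion. \b matches BETWEEN
characters, so there is no way it could be longer than zero; (\b*) is also
satisfied by zero repetitions, so it never actually requires a word boundary.

The alternative 'ing' will never match because the alternative 'in' appears
before it and nothing after the group forces Perl to backtrack.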



John
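
Both points can be checked with a few lines of core Perl. This is only a small illustration using made-up sample strings, not a rewrite of the regex under discussion.

```perl
use strict;
use warnings;

# Alternatives are tried left to right, and the first one that lets the
# overall match succeed wins, so 'in' shadows 'ing' when nothing follows.
my ($short) = "dearing" =~ /dear(in|ing)/;
my ($long)  = "dearing" =~ /dear(ing|in)/;
print "in|ing captures '$short'\n";   # in|ing captures 'in'
print "ing|in captures '$long'\n";    # ing|in captures 'ing'

# A plain \b (no quantifier) really does require a word boundary.
my $inside_word = "endear"  =~ /\bdear/ ? "match" : "no match";
my $after_space = "my dear" =~ /\bdear/ ? "match" : "no match";
print "endear:  $inside_word\n";      # endear:  no match
print "my dear: $after_space\n";      # my dear: match
```

Listing the longer alternative first, and dropping the * after \b, are the two corresponding fixes.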
 
Dr.Ruud

Jason C said:
> I'm doing a replace, like this:
>
> $text = "Yes dear!";
> $pattern = "(D|d)ear";
> $replace = "$1eer";
>
> $text =~ s/$pattern/$replace/gi;
>
> That's just an example, of course; the real $pattern and $replace come from a database list, and $text comes from form data.
>
> The problem I'm having is that the replace is replacing with a literal "$1eer", instead of setting the $1 to (D|d). Meaning, instead of printing:
>
> Yes deer!
>
> I'm printing:
>
> Yes $1eer!
>
> Any suggestions on how to make $1 in $replace refer to the first group in $pattern?

Check out Template::Toolkit. Etc.
 
