Setting backreference inside of a string

Discussion in 'Perl Misc' started by Jason C, Sep 10, 2012.

  1. Jason C

    Jason C Guest

    I'm doing a replace, like this:

    $text = "Yes dear!";
    $pattern = "(D|d)ear";
    $replace = "$1eer";

    $text =~ s/$pattern/$replace/gi;

    That's just an example, of course; the real $pattern and $replace come from a database list, and $text comes from form data.

    The problem I'm having is that the replace is replacing with a literal "$1eer", instead of setting the $1 to (D|d). Meaning, instead of printing:

    Yes deer!

    I'm printing:

    Yes $1eer!

    Any suggestions on how to make $1 in $replace refer to the first group in $pattern?
    Jason C, Sep 10, 2012
    #1

  2. Jason C writes:

    > $text = "Yes dear!"; $pattern = "(D|d)ear"; $replace = "$1eer";
    >
    > $text =~ s/$pattern/$replace/gi;


    Using this code I get "Yes eer!" in $text...
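
    A minimal sketch of why the code as posted gives "Yes eer!" while a value read from a database gives "Yes $1eer!". It assumes the database value behaves like a single-quoted string; the variable names $double_quoted, $single_quoted, $from_double and $from_single are made up for illustration.

    ```perl
    use strict;
    use warnings;

    my $text    = "Yes dear!";
    my $pattern = '(D|d)ear';

    # Double quotes interpolate $1 at assignment time. No match has run yet,
    # so $1 is undefined and contributes nothing: the string is just "eer".
    my $double_quoted = do { no warnings 'uninitialized'; "$1eer" };

    # Single quotes keep the characters '$', '1', 'e', 'e', 'r' as they are,
    # which is what a value fetched from a database looks like.
    my $single_quoted = '$1eer';

    # s/// interpolates the replacement variable once; it does not re-scan
    # the resulting text for further variables such as $1.
    (my $from_double = $text) =~ s/$pattern/$double_quoted/gi;
    (my $from_single = $text) =~ s/$pattern/$single_quoted/gi;

    print "$from_double\n";   # Yes eer!
    print "$from_single\n";   # Yes $1eer!
    ```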

    > Any suggestions on how to make $1 in $replace refer to the first group
    > in $pattern?


    You need to look at the /e modifier to your substitution.

    //Makholm
    Peter Makholm, Sep 10, 2012
    #2

  3. Jason C

    Jason C Guest

    On Monday, September 10, 2012 3:57:47 AM UTC-4, Peter Makholm wrote:
    > > $text = "Yes dear!"; $pattern = "(D|d)ear"; $replace = "$1eer";
    > > $text =~ s/$pattern/$replace/gi;
    >
    > Using this code I get "Yes eer!" in $text...


    Could be a minor variation in what I posted vs. my actual code. I didn't post the whole thing because I thought it was unnecessarily complicated, but it's technically:

    my $sth = $dbh->prepare("SELECT * FROM table");
    $sth->execute();

    while (($pattern, $replace) = $sth->fetchrow_array()) {
        $text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/$1$replace$+/gi;
    }


    > > Any suggestions on how to make $1 in $replace refer to the first group
    > > in $pattern?

    >
    > You need to look at the /e modifier to your substitution.


    Thanks for the tip. I've read a bit on the 'e' modifier now, but I'm not quite understanding how to use it for this application.

    In retrospect, what I think is happening is that the value of $replace fetched in the while() loop behaves as if it were single-quoted rather than double-quoted. So instead of it reading like:

    $pattern = "(D|d)ear";
    $replace = "$1eer";

    it's reading:

    $pattern = '(D|d)ear';
    $replace = '$1eer';

    So the question may really be: how do I get the $1 inside $replace to be interpolated?
    Jason C, Sep 10, 2012
    #3
  4. On 10.09.2012 11:36, Jason C wrote:
    >>> Any suggestions on how to make $1 in $replace refer to the first group
    >>> in $pattern?
    >>
    >> You need to look at the /e modifier to your substitution.
    >
    > Thanks for the tip. I've read a bit on the 'e' modifier now, but I'm not quite understanding how to use it for this application.


    For example like this:

    $ perl -E '$r=q("${1}eer");($_="hello")=~s/(ll)/$r/ee; say'
    helleero

    - Wolf
    Wolf Behrenhoff, Sep 10, 2012
    #4
  5. Wolf Behrenhoff writes:

    > For example like this:
    >
    > $ perl -E '$r=q("${1}eer");($_="hello")=~s/(ll)/$r/ee; say'
    > helleero


    So, after matching 'll' and assigning it to $1, the match is replaced by the result of

    eval( eval '$r' )

    Computing the inner eval first, we get

    eval( '"${1}eer"' )

    Remembering that $1 is "ll", this evaluates to

    "lleer"

    //Makholm
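
    The two evaluation steps described above can be sketched as a small script. The second half is an assumption added for illustration: $hostile and $side_effect are hypothetical names, used only to show that /ee runs whatever Perl code the replacement string contains, which is the usual security concern when that string comes from a database.

    ```perl
    use strict;
    use warnings;

    # The replacement template holds Perl source text: a double-quoted string.
    my $r = q("${1}eer");

    my $s = "hello";
    # The first /e evaluates the expression $r, giving the text "${1}eer"
    # (quotes included); the second /e evals that text as Perl code.
    $s =~ s/(ll)/$r/ee;
    print "$s\n";   # helleero

    # The same mechanism runs any code the template happens to contain.
    our $side_effect = 'nothing happened';
    my $hostile = q{$side_effect = 'template code ran'; 'x'};

    my $t = "hello";
    $t =~ s/(ll)/$hostile/ee;
    print "$t ($side_effect)\n";   # hexo (template code ran)
    ```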
    Peter Makholm, Sep 11, 2012
    #5
  6. C.DeRykus

    C.DeRykus Guest

    On Monday, September 10, 2012 2:36:22 AM UTC-7, Jason C wrote:
    > On Monday, September 10, 2012 3:57:47 AM UTC-4, Peter Makholm wrote:
    > > > $text = "Yes dear!"; $pattern = "(D|d)ear"; $replace = "$1eer";
    > > > $text =~ s/$pattern/$replace/gi;
    > >
    > > Using this code I get "Yes eer!" in $text...
    >
    > Could be a minor variation in what I posted vs. my actual code. I didn't post the whole thing because I thought it was unnecessarily complicated, but it's technically:
    >
    > my $sth = $dbh->prepare("SELECT * FROM table");
    > $sth->execute();
    >
    > while (($pattern, $replace) = $sth->fetchrow_array()) {
    > $text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/$1$replace$+/gi;
    > }
    >
    > > > Any suggestions on how to make $1 in $replace refer to the first group
    > > > in $pattern?
    > >
    > > You need to look at the /e modifier to your substitution.
    >
    > Thanks for the tip. I've read a bit on the 'e' modifier now, but I'm not quite understanding how to use it for this application.
    >
    > In retrospect, what I think is happening is that the while() loop is treating $replace as if it is in a single quote instead of double. So instead of it reading like:
    >
    > $pattern = "(D|d)ear";
    > $replace = "$1eer";
    >
    > it's reading:
    >
    > $pattern = '(D|d)ear';
    > $replace = '$1eer';
    >
    > So the question may really be, how do I get it to read $replace as interpretive?


    One way to avoid an 'ee' solution's drawbacks
    is just pull the backref out of the pattern:

    my $pattern = '(D|d)ear';
    my $replace = 'eer';

    $text =~ s/$pattern/$1$replace/gi;
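
    A runnable version of the same idea, as a sketch only; the sample text is made up. It works because $1 appears literally in the s/// replacement, so Perl interpolates it for each match, while $replace carries nothing but fixed text.

    ```perl
    use strict;
    use warnings;

    my $pattern = '(D|d)ear';   # as it would come from the database
    my $replace = 'eer';        # plain text, no capture reference inside

    my $text = "Yes dear! Dear me.";

    # $1 is written literally in the replacement, so it is filled in per
    # match; $replace only supplies the fixed tail.
    $text =~ s/$pattern/$1$replace/gi;

    print "$text\n";   # Yes deer! Deer me.
    ```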

    --
    Charles DeRykus
    C.DeRykus, Sep 11, 2012
    #6
  7. Jason C

    Jason C Guest

    On Tuesday, September 11, 2012 4:50:20 PM UTC-4, C.DeRykus wrote:

    > One way to avoid an 'ee' solution's drawbacks
    > is just pull the backref out of the pattern:
    >
    > my $pattern = '(D|d)ear';
    > my $replace = 'eer';
    >
    > $text =~ s/$pattern/$1$replace/gi;


    That was my original thought, too, but I also have rows where the () isn't at the beginning. Eg:

    $pattern = 'smart(\s)*ass';
    $replace = 'smart$1butt';

    I really would like to avoid using /ee, though, for the security reasons mentioned earlier.

    Maybe something like:

    $text = "Yes dear!";
    $pattern = '(D|d)ear';
    $replace = '$1eer';

    # if $pattern doesn't contain a backreference
    # create an empty one
    if ($pattern !~ /\(.*?\)/g) {
    $pattern = "()*?" . $pattern;
    }

    $replace =~ s/\$1/<marker>/g;
    # now, $replace = '<marker>eer';

    while ($text =~ /$pattern/g) {
    $replace =~ s/<marker>/$1/g;
    $text =~ s/$pattern/$replace/gi;
    }


    I haven't tested that, I'm just spit-balling the logic. Thoughts?
    Jason C, Sep 12, 2012
    #7
  8. C.DeRykus

    C.DeRykus Guest

    On Tuesday, September 11, 2012 7:55:15 PM UTC-7, Jason C wrote:
    > On Tuesday, September 11, 2012 4:50:20 PM UTC-4, C.DeRykus wrote:
    > > One way to avoid an 'ee' solution's drawbacks
    > > is just pull the backref out of the pattern:
    > >
    > > my $pattern = '(D|d)ear';
    > > my $replace = 'eer';
    > >
    > > $text =~ s/$pattern/$1$replace/gi;
    >
    > That was my original thought, too, but I also have rows where the () isn't at the beginning. Eg:
    >
    > $pattern = 'smart(\s)*ass';
    > $replace = 'smart$1butt';
    >
    > I really would like to avoid using /ee, though, for the security reasons mentioned earlier.
    >
    > Maybe something like:
    >
    > $text = "Yes dear!";
    > $pattern = '(D|d)ear';
    > $replace = '$1eer';
    >
    > # if $pattern doesn't contain a backreference
    > # create an empty one
    > if ($pattern !~ /\(.*?\)/g) {
    > $pattern = "()*?" . $pattern;
    > }
    >
    > $replace =~ s/\$1/<marker>/g;
    > # now, $replace = '<marker>eer';
    >
    > while ($text =~ /$pattern/g) {
    > $replace =~ s/<marker>/$1/g;
    > $text =~ s/$pattern/$replace/gi;
    > }
    >
    > I haven't tested that, I'm just spit-balling the logic. Thoughts?


    I'm not sure I follow entirely but, IMO, separate regexes would be much easier and more maintainable
    than trying to do this in a single regex.

    Only if there were a huge bottleneck would I bother
    trying to refactor...

    --
    Charles DeRykus
    C.DeRykus, Sep 12, 2012
    #8
  9. Jason C

    Jason C Guest

    On Wednesday, September 12, 2012 12:08:32 AM UTC-4, C.DeRykus wrote:
    > I'm not sure I follow entirely but, IMO, separate regexes would be much easier and more maintainable
    >
    > than trying to do this in a single regex.
    >
    > Only if there's a huge bottleneck, would I bother,
    > trying to re-factor...


    You might have missed it before, but on the live site, $pattern and $replace are coming from a database. Like so:

    my $sth = $dbh->prepare("SELECT * FROM table");
    $sth->execute();

    while (($pattern, $replace) = $sth->fetchrow_array()) {
        $text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/$1$replace$+/gi;
    }

    The first group in $pattern can actually be anywhere in the string, so one row might be:

    (D|d)ear

    while the next might be:

    smart(\s*)ass

    The issue is that the '$1' stored in the database is a literal string that never gets interpolated, and I'm not sure how to make it interpolate in the replacement.

    The while() loop that I presented in the last post is an attempt to replace the literal '$1' with '<marker>', then replace '<marker>' with the interpolated "$1".
    Jason C, Sep 12, 2012
    #9
  10. Willem

    Willem Guest

    Jason C wrote:
    ) On Tuesday, September 11, 2012 4:50:20 PM UTC-4, C.DeRykus wrote:
    )
    )> One way to avoid an 'ee' solution's drawbacks
    )> is just pull the backref out of the pattern:
    )>
    )> my $pattern = '(D|d)ear';
    )> my $replace = 'eer';
    )>
    )> $text =~ s/$pattern/$1$replace/gi;
    )
    ) That was my original thought, too, but I also have rows where the () isn't at the beginning. Eg:
    )
    ) $pattern = 'smart(\s)*ass';
    ) $replace = 'smart$1butt';
    )
    ) I really would like to avoid using /ee, though, for the security reasons mentioned earlier.
    )
    ) Maybe something like:
    )
    ) $text = "Yes dear!";
    ) $pattern = '(D|d)ear';
    ) $replace = '$1eer';
    )
    ) # if $pattern doesn't contain a backreference
    ) # create an empty one
    ) if ($pattern !~ /\(.*?\)/g) {
    ) $pattern = "()*?" . $pattern;
    ) }
    )
    ) $replace =~ s/\$1/<marker>/g;
    ) # now, $replace = '<marker>eer';
    )
    ) while ($text =~ /$pattern/g) {
    ) $replace =~ s/<marker>/$1/g;
    ) $text =~ s/$pattern/$replace/gi;
    ) }

    It would be easier to do the whole thing in a /e expression: don't
    evaluate the database string as code, just add your own code around it.

    Like this:

    $text =~ s/$pattern/my $s1 = $1; (my $t = $replace) =~ s|\$1|$s1|g; $t/ge;

    That should work.

    If you want more than just $1, you need a slightly more complicated
    expression, probably involving @- and @+.

    (I've always wondered why there is no regex-match array perlvar...)
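
    For more than just $1, one possible shape of the @- and @+ approach is sketched below. This is an illustration under assumptions, not code from the thread: expand_template and replace_all are hypothetical helper names, and the template is treated purely as data, so nothing from the database is ever eval'ed.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: fill $1, ${1}, $2, ... in a template from a list
    # of captured strings (index 0 is the whole match).
    sub expand_template {
        my ($template, $caps) = @_;
        $template =~ s/\$(?:\{(\d+)\}|(\d+))/
            my $n = defined $1 ? $1 : $2;
            defined $caps->[$n] ? $caps->[$n] : ''
        /ge;
        return $template;
    }

    # Hypothetical helper: replace every match of $pattern in $text,
    # building the result by hand instead of using s///ee.
    sub replace_all {
        my ($text, $pattern, $template) = @_;
        my $re = qr/$pattern/i;
        my ($out, $pos) = ('', 0);
        while ($text =~ /$re/g) {
            my ($start, $end) = ($-[0], $+[0]);
            # Rebuild the captures from the offset arrays @- and @+.
            my @caps = map {
                defined $-[$_] ? substr($text, $-[$_], $+[$_] - $-[$_]) : undef
            } 0 .. $#+;
            $out .= substr($text, $pos, $start - $pos)
                  . expand_template($template, \@caps);
            $pos = $end;
        }
        return $out . substr($text, $pos);
    }

    print replace_all("Yes dear!", '(D|d)ear', '${1}eer'), "\n";          # Yes deer!
    print replace_all("smart  ass", 'smart(\s*)ass', 'smart$1butt'), "\n"; # smart  butt
    ```

    The captures are copied out of @- and @+ before expand_template runs its own regex, so the inner match cannot disturb them.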


    SaSW, Willem
    --
    Disclaimer: I am in no way responsible for any of the statements
    made in the above text. For all I know I might be
    drugged or something..
    No I'm not paranoid. You all think I'm paranoid, don't you !
    #EOT
    Willem, Sep 12, 2012
    #10
  11. Jason C

    Jason C Guest

    On Wednesday, September 12, 2012 9:33:03 AM UTC-4, Ben Morrow wrote:
    > That, IMHO, is basically the right approach, but you don't want to use a
    > fixed string like "<marker>" because it might appear in the source text.
    > Instead, you want something like this:
    >
    > sub dosubst {
    > my ($repl, $one) = @_;
    > $repl =~ s/\$(?:\{1\}|1)/$one/g;
    > $repl;
    > }
    >
    > $text =~ s/$pattern/dosubst $replace, $1/gie;
    >
    > This assumes the replacement only uses $1. If you want to use arbitrary
    > captures, it gets a little more difficult, since perl doesn't provide an
    > array-of-all-the-captures variable. You would need to pass $text, \@-
    > and \@+ into dosubst, and pull the captures out as required.


    Thanks to all of you for the help! I did eventually get it working correctly; Ben's reply made it click for me :)

    Here's the original regex I was using:

    $text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/$1$replace$+/gi;

    and here's the variation that is now working, using the /e modifier:

    $text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/dosubst($replace, $1, $2, $3)/egi;

    sub dosubst {
        my ($repl, $one, $two, $three) = @_;

        $repl =~ s/\$(?:\{2\}|2)/$two/g;
        $repl = "$one" . $repl . "$three";

        return $repl;
    }

    Essentially, it's sending the uninterpolated '$2' in $replace to dosubst(), replacing it with the captured value in $two, then returning the whole updated string.

    I hope this helps someone in the future with a similar problem.
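
    For anyone who wants to try it without a database, here is a self-contained sketch of the same approach. The @rows list is a hypothetical stand-in for the fetched rows, the sample text is made up, and two details differ from the line above on purpose: the prefix group is written (\b) rather than (\b*), and 'ing' is listed before 'in'.

    ```perl
    use strict;
    use warnings;

    # Same idea as dosubst above: the template's '$2' is replaced as plain
    # text, and the outer prefix and suffix captures are glued back on.
    sub dosubst {
        my ($repl, $prefix, $cap, $suffix) = @_;
        $cap = '' unless defined $cap;
        $repl =~ s/\$(?:\{2\}|2)/$cap/g;
        return $prefix . $repl . $suffix;
    }

    # Hypothetical stand-in for the rows fetched from the database.
    my @rows = (
        [ '(D|d)ear',      '$2eer'       ],
        [ 'smart(\s*)ass', 'smart$2butt' ],
    );

    my $text = "Yes dear, two Dears, one smart  ass.";

    for my $row (@rows) {
        my ($pattern, $replace) = @$row;
        # The row's own group is $2 here because the outer (\b) is $1.
        $text =~ s/(\b)$pattern(ing|in|er|ed|s|y|\b)/dosubst($replace, $1, $2, $3)/egi;
    }

    print "$text\n";   # Yes deer, two Deers, one smart  butt.
    ```

    Note that $3 is only the suffix group when the row's pattern contains exactly one capture group of its own.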
    Jason C, Sep 17, 2012
    #11
  12. Jason C wrote:
    >
    > Thanks to all of you for the help! I did eventually get it working correctly; Ben's reply made it click for me :)
    >
    > Here's the original regex I was using:
    >
    > $text =~ s/(\b*)$pattern(er|in|ing|s|ed|y|\b)/$1$replace$+/gi;


    You can't usefully put a quantifier on a zero-width assertion. \b matches BETWEEN
    characters, so there is no way it could be longer than zero.

    The pattern 'ing' will never match because the pattern 'in' appears
    before it.
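
    A small sketch of the alternation-order point, using a made-up word and pattern: when nothing after the group can fail, the first alternative that matches is the one kept.

    ```perl
    use strict;
    use warnings;

    my $word = "testing";

    # 'in' is listed first and nothing after the group can fail, so the
    # regex engine never backtracks to try 'ing'.
    my ($first)  = $word =~ /test(in|ing|\b)/;

    # Listing the longer alternative first lets it win.
    my ($second) = $word =~ /test(ing|in|\b)/;

    print "$first\n";    # in
    print "$second\n";   # ing
    ```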



    John
    --
    Any intelligent fool can make things bigger and
    more complex... It takes a touch of genius -
    and a lot of courage to move in the opposite
    direction. -- Albert Einstein
    John W. Krahn, Sep 17, 2012
    #12
  13. Dr.Ruud

    Dr.Ruud Guest

    On 2012-09-10 08:41, Jason C wrote:

    > I'm doing a replace, like this:
    >
    > $text = "Yes dear!";
    > $pattern = "(D|d)ear";
    > $replace = "$1eer";
    >
    > $text =~ s/$pattern/$replace/gi;
    >
    > That's just an example, of course; the real $pattern and $replace come from a database list, and $text comes from form data.
    >
    > The problem I'm having is that the replace is replacing with a literal "$1eer", instead of setting the $1 to (D|d). Meaning, instead of printing:
    >
    > Yes deer!
    >
    > I'm printing:
    >
    > Yes $1eer!
    >
    > Any suggestions on how to make $1 in $replace refer to the first group in $pattern?


    Check out Template::Toolkit. Etc.

    --
    Ruud
    Dr.Ruud, Sep 19, 2012
    #13
