A couple of questions regarding runtime generation of REGEXPs

Discussion in 'Perl Misc' started by sln@netherlands.com, Nov 3, 2008.

  1. Guest

    I'm probably going to use some wrong terms here but I
    hope to give enough detail that I can get a definitive
    resolution to this, once and for all.

    Basically I'm writing a sub that wants to take a regular
    expression as a parameter. It then blindly operates on data,
    matching and, possibly, substitution.

    Apparently qr// will only function on the matching side, something like this:

    # works
    $rx = qr/\Q$sometext\E/s;
    $data =~ /$rx/;
    # or $data =~ $rx;

    But this:

    # does not work, no way no how
    $rx = qr{s/\Q$sometext\E/junk/g};
    $data =~ $rx;

    Even though qr{s/\Q$sometext\E/junk/g} compiles without warnings or errors,
    and even though the substitution is constant (i.e., no runtime $1, $2, etc.),
    it never matches.

    I mean, I could see a failure scenario when using $1, $2, ... on the substitution side,
    because they would not be defined yet, but if it's given a constant it should work IMO.
    And if it does compile, like the above does, it should work.

    The fallback is to use eval "", where something like this is possible:

    my $res;
    my $rx = "s/\\Q$sometext\\E(.*?)/junk\$1/g";
    my $expression = "\$res = \$data =~ $rx";
    eval $expression;
    die $@ if $@;
    if ($res) {
    ...
    }

    But eval is 2 to 4 times slower.

    The only thing "dynamic" about the regular expression above is the case of
    substituting $1, etc. Surely this could be taken into account when, say, using
    the qr// construct, couldn't it? Is it really breaking the rules, or would it
    boil down to an eval anyway in that case? But the constant substitution,
    I don't see why that can't work.

    Is there any way the substitution side can work?

    TIA,
    sln
     
    , Nov 3, 2008

  2. On Mon, 03 Nov 2008 00:24:30 GMT,
    <> wrote:
    > I'm probably going to use some wrong terms here but I
    > hope to give enough detail that I can get a definitive
    > resolution to this, once and for all.
    >
    > Basically I'm writing a sub that wants to take a regular
    > expression as a parameter. It then blindly operates on data,
    > matching and, possibly, substitution.
    >
    > Apparently qr// will only function on the matching side, something like this:
    >
    > # works
    > $rx = qr/\Q$sometext\E/s;
    > $data =~ /$rx/;
    > # or $data =~ $rx;


    The matching is done by the // operator, not because you happened to use
    qr// a bit earlier.

    > But this:
    >
    > # does not work, no way no how
    > $rx = qr{s/\Q$sometext\E/junk/g};
    > $data =~ $rx;


    A bare regex is simply not going to work on the right hand side of a =~
    operator. It's the operator on the right hand side that does the
    matching, not the =~ operator itself. That only binds an expression
    instead of $_ to that matching operator.

    More detail:

    From perlop:

    Binary "=~" binds a scalar expression to a pattern match. Certain
    operations search or modify the string $_ by default. This operator
    makes that kind of operation work on some other string. The right
    argument is a search pattern, substitution, or transliteration.

    Note that 'pattern' or 'regular expression' are not part of the allowed
    right arguments.

    Further down in the same document, under "Quote and Quote-like
    Operators":

    Customary  Generic     Meaning            Interpolates
        ''       q{}       Literal            no
        ""      qq{}       Literal            yes
        ``      qx{}       Command            yes*
                qw{}       Word list          no
        //       m{}       Pattern match      yes*
                qr{}       Pattern            yes*
                 s{}{}     Substitution       yes*
                tr{}{}     Transliteration    no (but see below)
        <<EOF              here-doc           yes*

    And a little further down again:

    Regexp Quote-Like Operators

    Here are the quote-like operators that apply to pattern matching and
    related activities.
    [snip]

    Martien
    --
    Martien Verbruggen | Computers in the future may weigh no more
                       | than 1.5 tons. -- Popular Mechanics, 1949
     
    Martien Verbruggen, Nov 3, 2008
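For reference, here is a minimal sketch of the three right-hand operands that perlop names for `=~`: a pattern match, a substitution and a transliteration. Each one is bound to a variable instead of `$_`. The variable names and the sample string are made up for illustration.

```perl
use strict;
use warnings;

my $data = "foo bar baz";

# Pattern match: m// (or plain //) searches $data instead of $_
my $matched = ($data =~ /bar/) ? 1 : 0;

# Substitution: s/// modifies the bound string in place (here a copy of $data)
(my $copy = $data) =~ s/ba/BA/g;

# Transliteration: tr/// with an empty replacement only counts, it does not modify
my $count_a = ($data =~ tr/a//);

print "$matched $copy $count_a\n";
```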

  3. <> wrote:

    > Basically I'm writing a sub that wants to take a regular
    > expression as a parameter. It then blindly operates on data,
    > matching and, possibly, substitution.
    >
    > Apparently qr// will only function on the matching side, something like this:



    "qr" stands for "quote regular expression" and the so called
    "matching side" of s/// is the part that is a regular expression.

    qr will work fine there.

    (the other "side" is the "replacement string", i.e., it is not
    a regular expression at all.)


    > # does not work, no way no how



    Of course not. You are trying to quote something that is not
    a regular expression.


    > $rx = qr{s/\Q$sometext\E/junk/g};


    That regular expression will match if the string contains:
    an "s" character followed by
    a "/" character followed by
    the literal contents of $sometext followed by
    a "/" character followed by
    a "j" character followed by
    a "u" character followed by
    ...

    So that will match if:

    my $data = "s/$sometext/junk/g";
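A small sketch of the point above, using a hypothetical `$sometext` of `a.b`. Inside `qr{}`, the `s/` and `/junk/g` parts are just literal pattern text. The object therefore matches the literal string `s/a.b/junk/g`, but not a string that merely contains `a.b`.

```perl
use strict;
use warnings;

my $sometext = 'a.b';    # hypothetical value; \Q makes the '.' literal
my $rx = qr{s/\Q$sometext\E/junk/g};

my $literal  = "s/$sometext/junk/g";  # the characters "s/a.b/junk/g"
my $ordinary = "xx a.b yy";

my $lit_ok = ($literal  =~ $rx) ? 1 : 0;   # whole "s/.../junk/g" text present
my $ord_ok = ($ordinary =~ $rx) ? 1 : 0;   # no "s/" or "/junk/g" here

print "literal: $lit_ok, ordinary: $ord_ok\n";
```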


    > $data =~ $rx;


    my $rx = qr/\Q$sometext\E/; # quote only the regex part
    $data =~ s/$rx/junk/g; # works fine


    > And if it does compile, like the above does, it should work.



    It does work (but only if $data actually contains the characters listed above).


    > Is there anyway possible the substitution side will work?



    Yes. See above.


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Nov 3, 2008
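Below is a runnable sketch of the fix suggested above: quote only the pattern with `qr//` and keep `s///g` at the call site. It assumes a made-up `$sometext` that contains a regex metacharacter, so that `\Q...\E` actually matters.

```perl
use strict;
use warnings;

my $sometext = 'a+b';                 # hypothetical search text with a metachar
my $data     = "a+b, a+b and aab";

my $rx = qr/\Q$sometext\E/;           # quote only the regex part
my $n  = ($data =~ s/$rx/junk/g);     # s///g returns the number of replacements

print "$n: $data\n";
```

Because `\Q` escapes the `+`, the unrelated `aab` is left alone.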
  4. Tim Greer Guest

    wrote:

    > # does not work, no way no how
    > $rx = qr{s/\Q$sometext\E/junk/g};
    > $data =~ $rx;


    Looks like you're unintentionally trying to run a regex within the
    regex, where the regex within is actually just trying to match a string
    (not a functional regex).
    --
    Tim Greer, CEO/Founder/CTO, BurlyHost.com, Inc.
    Shared Hosting, Reseller Hosting, Dedicated & Semi-Dedicated servers
    and Custom Hosting. 24/7 support, 30 day guarantee, secure servers.
    Industry's most experienced staff! -- Web Hosting With Muscle!
     
    Tim Greer, Nov 3, 2008
  5. On Sun, 02 Nov 2008 22:41:28 -0800, Tim Greer <>
    wrote:

    >> # does not work, no way no how
    >> $rx = qr{s/\Q$sometext\E/junk/g};
    >> $data =~ $rx;

    >
    >Looks like you're unintentionally trying to run a regex within the
    >regex, where the regex within is actually just trying to match a string
    >(not a functional regex).


    (S)he's just trying to "save" a substitution as a first-order object,
    and (s)he blindly tried some "random" syntax that, of course, is not
    going to work.


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Nov 3, 2008
  6. On Mon, 03 Nov 2008 00:24:30 GMT, wrote:

    >I'm probably going to use some wrong terms here but I
    >hope to give enough detail that I can get a definitive
    >resolution to this, once and for all.
    >
    >Basically I'm writing a sub that wants to take a regular
    >expression as a parameter. It then blindly operates on data,
    >matching and, possibly, substitution.

    [cut]
    ># does not work, no way no how
    >$rx = qr{s/\Q$sometext\E/junk/g};


    Actually, this comes up oh so often! Others have duly explained to you
    what's going on. Bottom line is, you *can't* "save" a substitution as
    a first-order object of the language. The replacement part of a
    substitution, though, is "simply" a string: well, either that or code,
    if the /e modifier is supplied. In both cases you can *think* of it,
    possibly at the expense of a tiny wrapper layer, as a sub. Thus a
    solution to your problem, albeit not quite as "slim" as you may have
    hoped for, may be given in terms of a pair consisting of a regex and
    a sub. Sounds reasonable?


    Michele
     
    Michele Dondi, Nov 3, 2008
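Here is one way to read the "regex plus sub" suggestion, sketched with a plain hash. The keys `regex` and `code` and the doubling rule are illustrative assumptions. The pattern is stored as a `qr//` object. The replacement is stored as a code ref, which `s///e` calls once per match; at that point `$1` refers to the current match.

```perl
use strict;
use warnings;

# A "saved substitution" as a pair: pattern plus replacement code
my %subst = (
    regex => qr/(\d+)/,
    code  => sub { $1 * 2 },   # $1 is read when s///e calls the sub
);

my $data = "3 apples, 10 pears";
my ($re, $code) = @subst{qw(regex code)};
$data =~ s/$re/$code->()/ge;
print "$data\n";
```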
  7. On Mon, 3 Nov 2008 13:37:21 +1100, Martien Verbruggen
    <> wrote:

    >> # does not work, no way no how
    >> $rx = qr{s/\Q$sometext\E/junk/g};
    >> $data =~ $rx;

    >
    >A bare regex is simply not going to work on the right hand side of a =~
    >operator. It's the operator on the right hand side that does the
    >matching, not the =~ operator itself. That only binds an expression
    >instead of $_ to that matching operator.


    This is simply not true:

    $ perl -E '$r=qr/\w+\s(\w+)\s\w+/;
    "foo bar baz" =~ $r and say $1'
    bar

    In fact...

    >More detail:
    >
    >From perlop:
    >
    > Binary "=~" binds a scalar expression to a pattern match. Certain
    > operations search or modify the string $_ by default. This operator
    > makes that kind of operation work on some other string. The right
    > argument is a search pattern, substitution, or transliteration.

    ^^^^^^^^^^^^^^
    ^^^^^^^^^^^^^^

    It's simply *ad hoc* in Perl 5.


    Michele
     
    Michele Dondi, Nov 3, 2008
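perlop also states that when the right argument of `=~` is an expression, rather than a search pattern, substitution or transliteration, it is interpreted as a search pattern at run time. The one-liner above relies on this. Below is a minimal sketch showing both a `qr//` object and a plain string used that way; the sample string is arbitrary.

```perl
use strict;
use warnings;

my $r = qr/\w+\s(\w+)\s\w+/;

# A qr// object on the right of =~ is used as a match pattern
my $middle = ("foo bar baz" =~ $r) ? $1 : undef;

# A plain string also works: it is compiled as a pattern at run time
my $pat = 'b.z';
my $hit = ("foo bar baz" =~ $pat) ? 1 : 0;

print "$middle $hit\n";
```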
  8. Guest

    On Sun, 2 Nov 2008 21:49:03 -0600, Tad J McClellan <> wrote:

    > <> wrote:
    >
    >> Basically I'm writing a sub that wants to take a regular
    >> expression as a parameter. It then blindly operates on data,
    >> matching and, possibly, substitution.
    >>
    >> Apparently qr// will only function on the matching side, something like this:

    >
    >
    >"qr" stands for "quote regular expression" and the so called
    >"matching side" of s/// is the part that is a regular expression.
    >
    >qr will work fine there.
    >
    >(the other "side" is the "replacement string", i.e., it is not
    >a regular expression at all.)
    >
    >
    >> # does not work, no way no how

    >
    >
    >Of course not. You are trying to quote something that is not
    >a regular expression.
    >
    >
    >> $rx = qr{s/\Q$sometext\E/junk/g};

    >
    >That regular expression will match if the string contains:
    > an "s" character followed by
    > a "/" character followed by
    > the literal contents of $sometext followed by
    > a "/" character followed by
    > a "j" character followed by
    > a "u" character followed by
    > ...
    >
    >So that will match if:
    >
    > my $data = "s/$sometext/junk/g";
    >
    >
    >> $data =~ $rx;

    >
    > my $rx = qr/\Q$sometext\E/; # quote only the regex part
    > $data =~ s/$rx/junk/g; # works fine
    >
    >
    >> And if it does compile, like the above does, it should work.

    >
    >
    >It does work (but only if $data actually contains the characters listed above).
    >
    >
    >> Is there anyway possible the substitution side will work?

    >
    >
    >Yes. See above.


    That's clear, no surprises then.
    Thanks!

    sln
     
    , Nov 3, 2008
  9. Guest

    On Mon, 03 Nov 2008 14:14:52 +0100, Michele Dondi <> wrote:

    >On Mon, 03 Nov 2008 00:24:30 GMT, wrote:
    >
    >>I'm probably going to use some wrong terms here but I
    >>hope to give enough detail that I can get a definitive
    >>resolution to this, once and for all.
    >>
    >>Basically I'm writing a sub that wants to take a regular
    >>expression as a parameter. It then blindly operates on data,
    >>matching and, possibly, substitution.

    >[cut]
    >># does not work, no way no how
    >>$rx = qr{s/\Q$sometext\E/junk/g};

    >
    >Actually, this comes up oh so often! Others have duly explained to you
    >what's going on. Bottom line is, you *can't* "save" a substitution as
    >a first-order object of the language. The replacement part of a
    >substitution, though, is "simply" a string: well, either that or code,
    >if the /e modifier is supplied. In both cases you can *think* of it,
    >possibly at the expense of a tiny wrapper layer, as a sub. Thus a
    >solution to your problem, albeit not quite as "slim" as you may have
    >hoped for, may be given in terms of a pair consisting of a regex and
    >a sub. Sounds reasonable?
    >
    >
    >Michele


    No matter how I look at it, the replacement is still a string
    constructed in the scope of the block that invokes the regexp engine.

    So s/.../$somereplacement$1$2$3/ can be valid.
    Or s/.../somesub($1,$2,$3)/e can be valid.

    And only qr// can be compiled ahead of =~, and only if it is constant, i.e. the regular expression itself.
    In that case s/// and //g have no meaning, nor does //e I take it, because they are not
    part of the regular expression. Some modifiers, like //i, are part of it, because they act
    on the regular expression.
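Here is a short sketch of the modifier split described above, using a sample string. `/i` is compiled into the `qr//` object and carries over wherever the object is interpolated. `/g` has to be written on the `s///` that uses it.

```perl
use strict;
use warnings;

# /i belongs to the pattern, so it travels with the qr// object...
my $rx = qr/this/i;

# ...while /g (and /e) belong to the s/// operator and are written there
my $once = "This and this and THIS";
my $all  = $once;
$once =~ s/$rx/that/;     # first match only
$all  =~ s/$rx/that/g;    # every match, still case-insensitive

print "$once\n$all\n";
```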

    To me, then, it is a misnomer to call 's/$regx/$txt/g' a regular expression, since
    it can't be known before the scope block that invokes it, whereas qr// can be.

    In my opinion, s///g should be allowed by qr{} using the scoping block it was created
    in, and later correctly used (s///g) within the context of a block that invokes the engine.

    This may violate the 'first-order object' rules of the language. But then why are code extensions
    allowed, qr/(?{ code })/, and what is the scoping for them? To me this looks like a parsing issue, and
    if allowed it would internally result in a dynamic code issue like eval.
    I don't that this 'code' extension isn't treated as a literal anyway.

    I don't know if invoking a 'sub' (/e) is going to be any better than having to
    parse through a passed-in argument list for the proper form. In all cases, it looks
    like the replacement text cannot include special variables unless an eval is used
    at runtime.

    Can you give an example of your regex and a sub solution?

    Thanks.

    sln
     
    , Nov 3, 2008
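The following is a minimal sketch of how a replacement that uses `$1` and `$2` can be passed around without a string `eval`. The replacement is wrapped in a closure and called from `s///e`, so the capture variables are read only when the substitution runs. `make_subst` and the `key=value` swap are hypothetical and serve only as an illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a reusable substitution from a pattern and
# a replacement closure; no string eval is involved.
sub make_subst {
    my ($rx, $code) = @_;
    return sub {
        # $_[0] aliases the caller's string, so it is changed in place;
        # s///ge returns the number of replacements made
        return $_[0] =~ s/$rx/$code->()/ge;
    };
}

my $swap = make_subst(qr/(\w+)=(\w+)/, sub { "$2=$1" });

my $data = "a=1, b=2";
my $n    = $swap->($data);
print "$n: $data\n";
```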
  10. On Mon, 03 Nov 2008 23:01:35 GMT, wrote:

    >In my opinion, s///g should be allowed by qr{} using the scoping block it was created
    >in, and later correctly used (s///g) within the context of a block that invokes the engine.
    >
    >This may violate 'first-order object' of the language. But then why are code extensions allowed?
    >qr/(?{ code })/ and what is the scoping for them? To me this looks like parsing issues and
    >if allowed would internally result in a dynamic code issue like eval.
    >I don't that this 'code' extension isn't treated as a literal anyway.


    Do not misunderstand me, I'm all with you: would you write a Perl
    extension that allows one to treat substitutions as first-order objects of
    the language? I would cherish that... Unfortunately I *for one*
    haven't the slightest idea of where one could begin!

    In the meanwhile we must be happy with a clumsier solution, like...

    >I don't know if invoking a 'sub' (/e) is going to be any better than having to
    >parse through a passed in argument list for the proper form. In all cases, it looks
    >like the replacement text cannot include special variables unless an eval is used
    >at runtime.
    >
    >Can you give an example of your regex and a sub solution?


    ... sure:

    my %subst = ( regex => qr/.../, code => sub { ... } );

    And then you use that to perform the substitution. You may even make
    that the core data of a class, thus allowing objects like $subst with
    a suitable ->apply($string) method.


    Michele
     
    Michele Dondi, Nov 4, 2008
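Below is a minimal sketch of the class-based variant suggested above. The package name `Subst`, its constructor arguments, and the choice to always substitute globally are assumptions made for illustration. `apply` modifies the caller's string in place because `$_[0]` aliases it.

```perl
use strict;
use warnings;

package Subst;    # hypothetical class name

sub new {
    my ($class, %args) = @_;
    return bless { regex => $args{regex}, code => $args{code} }, $class;
}

# Apply the substitution globally to the caller's string, in place;
# returns the number of replacements made.
sub apply {
    my $self = shift;
    my ($rx, $code) = @{$self}{qw(regex code)};
    return $_[0] =~ s/$rx/$code->()/ge;
}

package main;

my $subst = Subst->new(regex => qr/\bcolour\b/, code => sub { 'color' });
my $text  = "colour me, colourful colour";
my $n     = $subst->apply($text);
print "$n: $text\n";
```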
  11. Guest

    On Tue, 04 Nov 2008 12:23:07 +0100, Michele Dondi <> wrote:

    >On Mon, 03 Nov 2008 23:01:35 GMT, wrote:
    >
    >>In my opinion, s///g should be allowed by qr{} using the scoping block it was created
    >>in, and later correctly used (s///g) within the context of a block that invokes the engine.
    >>
    >>This may violate 'first-order object' of the language. But then why are code extensions allowed?
    >>qr/(?{ code })/ and what is the scoping for them? To me this looks like parsing issues and
    >>if allowed would internally result in a dynamic code issue like eval.
    >>I don't that this 'code' extension isn't treated as a literal anyway.

    >
    >Do not misunderstand me, I'm all with you: would you write a Perl
    >extension that allows one to treat substitutions as first-order objects of
    >the language? I would cherish that... Unfortunately I *for one*
    >haven't the slightest idea of where one could begin!
    >
    >In the meanwhile we must be happy with a clumsier solution, like...
    >
    >>I don't know if invoking a 'sub' (/e) is going to be any better than having to
    >>parse through a passed in argument list for the proper form. In all cases, it looks
    >>like the replacement text cannot include special variables unless an eval is used
    >>at runtime.
    >>
    >>Can you give an example of your regex and a sub solution?

    >
    >... sure:
    >
    > my %subst = ( regex => qr/.../, code => sub { ... } );
    >
    >And then you use that to perform the substitution. You may even make
    >that the core data of a class, thus allowing objects like $subst with
    >a suitable ->apply($string) method.
    >
    >
    >Michele


    I'm in your debt. There is virtually no overhead in calling that
    sub for the substitution, and it executes in context. There is no
    comparison with eval; this is the way to go for me.

    I have already resigned myself to the fact that it's the caller's responsibility
    to ensure proper regexp usage; I am just providing the rope.

    In my circumstances, it's all about performance. Any added indirection
    (calls, assignments, etc.) is a hazard in my usage. I won't get into
    the gory details unless you want to know.

    Below is raw, isolated test code (method 2 has no error checking).
    I already have an object method to which an array of regex/code subs could be passed,
    and which then operates on data tightly bound to the object.

    Introducing a new object (RegxProc in the simple case below) would alleviate parsing
    and internal processing, but an unknown object type might not be accessible.
    I could internalize RegxProc in the existing class and provide a wrapper method, I guess,
    but then the caller could not specify search/replace/global replace without additional
    parameter parsing.

    This is a relief for me though. Thanks a lot...

    sln

    -----------------

    use strict;
    use warnings;

    # method 1
    # ------------
    # my $data = "This is some data, this gets substituted";
    # my $subst = {
    #     'regex' => qr/(\whis)/i,
    #     'code'  => sub { print "$1\n"; return 'That'; }
    # };
    # $data =~ s/$subst->{'regex'}/ &{$subst->{'code'}}/ge;
    # print "$data\n";


    # method 2
    # -------------
    my $data = "This(1) is some data, this(2) gets substituted,
    and so does this(3).";

    print "\nData = $data\n\n";

    my $rxp = RegxProc->new(
        'regex' => qr/(\whis\(\d\))/si,
        'code'  => sub { print "\ncode: \$1 = $1\n"; return 'That'; }
    );
    if ($rxp->search($data)) {
        print "search worked\n";
    }
    if ($rxp->replace($data)) {
        print "replace worked, data = $data\n";
    }
    if ($rxp->replace_g($data)) {
        print "global replace worked, data = $data\n";
    }

    package RegxProc;
    use vars qw(@ISA);
    @ISA = qw();

    sub new
    {
        my ($class, @args) = @_;
        my $self = {};
        # accept 'regex' => qr//, 'code' => sub {} pairs (keys case-insensitive)
        while (my ($name, $val) = splice (@args, 0, 2)) {
            if ('regex' eq lc $name) {
                $self->{regex} = $val;
            }
            elsif ('code' eq lc $name) {
                $self->{code} = $val;
            }
        }
        return bless ($self, $class);
    }
    sub search
    {
        my $self = shift;
        return 0 unless (defined $_[0]);
        return $_[0] =~ /$self->{regex}/;
    }
    sub replace
    {
        my $self = shift;
        return 0 unless (defined $_[0]);
        # $_[0] aliases the caller's string, so it is modified in place
        return $_[0] =~ s/$self->{regex}/&{$self->{code}}/e;
    }
    sub replace_g
    {
        my $self = shift;
        return 0 unless (defined $_[0]);
        return $_[0] =~ s/$self->{regex}/&{$self->{code}}/ge;
    }

    __END__

    Data = This(1) is some data, this(2) gets substituted,
    and so does this(3).

    search worked

    code: $1 = This(1)
    replace worked, data = That is some data, this(2) gets substituted,
    and so does this(3).

    code: $1 = this(2)

    code: $1 = this(3)
    global replace worked, data = That is some data, That gets substituted,
    and so does That.
     
    , Nov 5, 2008
  12. Guest

    On Wed, 05 Nov 2008 00:31:44 GMT, wrote:

    [snip]

    I settled on this lightweight class that handles the substitution with a few
    selectable types ('s', 'r', 'g'). It still does only minimal error checking, to reduce overhead.
    I added a few methods to generalize access, and it benchmarks pretty well.

    See any potential problems or performance issues?

    sln

    ----------------------
    use strict;
    use warnings;

    my $data = "This(1) is some data, this(2) gets substituted,
    and so does this(3).";
    my $tempdata = $data;

    my $rxp = RxP->new (
        'regex' => qr/(\whis\(\d\))/si,
        'code'  => sub { print "code: \$1 = $1\n"; return 'That'; },
        'type'  => 'r'
    );

    # test apply, set/get_type methods
    if (1)
    {
        print "\n","-"x20,"\nData = $data\n\n";

        $rxp->set_type('s');
        if ($rxp->apply ($data)) {
            print "Apply '".$rxp->get_type."' worked, data = $data\n\n";
        }
        $rxp->set_type('r');
        if ($rxp->apply ($data)) {
            print "Apply '".$rxp->get_type."' worked, data = $data\n\n";
        }
        $rxp->set_type('g');
        if ($rxp->apply ($data)) {
            print "Apply '".$rxp->get_type."' worked, data = $data\n\n";
        }
    }

    # test direct call and search, replace, replace_g methods
    if (1)
    {
        $rxp->set_type('r');
        $data = $tempdata;
        print "\n","-"x20,"\nData = $data\n\n";

        if ($rxp->{'dflt_sub'}($rxp, $data)) {
            print "Direct {dflt_sub} worked, data = $data\n\n";
        }
        if ($rxp->search ($data)) {
            print "Search worked, data = $data\n\n";
        }
        if ($rxp->replace ($data)) {
            print "Replace worked, data = $data\n\n";
        }
        if ($rxp->replace_g ($data)) {
            print "Global replace worked, data = $data\n\n";
        }
    }


    package RxP;
    use vars qw(@ISA);
    @ISA = qw();

    sub new
    {
        my ($class, @args) = @_;
        my $self = {
            'dflt_sub' => \&search,
            'type'     => 's'
        };
        while (my ($name, $val) = splice (@args, 0, 2)) {
            if ('regex' eq lc $name) {
                $self->{'regex'} = $val;
            }
            elsif ('code' eq lc $name) {
                $self->{'code'} = $val;
            }
            elsif ('type' eq lc $name && $val =~ /(s|r|g)/i) {
                set_type ($self, $1);
            }
        }
        return bless ($self, $class);
    }
    sub get_type
    {
        return $_[0]->{'type'};
    }
    sub set_type
    {
        return 0 unless (defined $_[1]);
        if ($_[1] =~ /(s|r|g)/i) {
            # lc: the /i match may capture 'S', 'R' or 'G', but the keys are lowercase
            my $type = lc $1;
            $_[0]->{'dflt_sub'} = {
                's' => \&search,
                'r' => \&replace,
                'g' => \&replace_g
            }->{$type};
            $_[0]->{'type'} = $type;
            return 1;
        }
        return 0;
    }
    sub apply
    {
        return 0 unless (defined $_[1]);
        # &{...} without an argument list passes the current @_ (object, string) through
        return &{$_[0]->{'dflt_sub'}};
    }
    sub search
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ /$_[0]->{'regex'}/;
    }
    sub replace
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/e;
    }
    sub replace_g
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/ge;
    }

    __END__

    --------------------
    Data = This(1) is some data, this(2) gets substituted,
    and so does this(3).

    Apply 's' worked, data = This(1) is some data, this(2) gets substituted,
    and so does this(3).

    code: $1 = This(1)
    Apply 'r' worked, data = That is some data, this(2) gets substituted,
    and so does this(3).

    code: $1 = this(2)
    code: $1 = this(3)
    Apply 'g' worked, data = That is some data, That gets substituted,
    and so does That.


    --------------------
    Data = This(1) is some data, this(2) gets substituted,
    and so does this(3).

    code: $1 = This(1)
    Direct {dflt_sub} worked, data = That is some data, this(2) gets substituted,
    and so does this(3).

    Search worked, data = That is some data, this(2) gets substituted,
    and so does this(3).

    code: $1 = this(2)
    Replace worked, data = That is some data, That gets substituted,
    and so does this(3).

    code: $1 = this(3)
    Global replace worked, data = That is some data, That gets substituted,
    and so does That.
     
    , Nov 5, 2008
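`apply` above relies on a perlsub rule. If a sub is called as `&{...}` (or `&name`) with no argument list, the current `@_` is visible to the called sub, and its elements still alias the caller's variables. Below is a minimal sketch of that behaviour. The `shout` target and the `dispatch` wrapper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical target sub: upper-cases its second argument in place
sub shout { $_[1] = uc $_[1]; return 1 }

# Dispatcher in the style of RxP::apply: "&{...}" with no argument list
# reuses this sub's @_, whose elements alias the original arguments
sub dispatch {
    my $target = \&shout;
    return &{$target};
}

my $obj  = 'unused';   # stands in for the object passed as $_[0]
my $text = "hello";
dispatch($obj, $text);
print "$text\n";
```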
  13. Guest

    On Wed, 05 Nov 2008 19:54:14 GMT, wrote:

    >[snip]
    >
    >I settled on this lightweight class that handles the substution with some
    >variable type's. Still it is with minimal error checking to reduce overhead.
    >Added a few methods to generalize access, and it benchmarks pretty good.
    >
    >See any potential problems or performance issues ?
    >
    >sln
    >
    >----------------------
    >use strict;
    >use warnings;
    >
    >my $data = "This(1) is some data, this(2) gets substituted,
    >and so does this(3).";
    >my $tempdata = $data;
    >
    >my $rxp = RxP->new (
    > 'regex' => qr/(\whis\(\d\))/si,
    > 'code' => sub { print "code: \$1 = $1\n"; return 'That'; },
    > 'type' => 'r'
    >);
    >
    ># test apply, set/get_type methods
    >if (1)
    >{
    > print "\n","-"x20,"\nData = $data\n\n";
    >
    > $rxp->set_type('s');
    > if ($rxp->apply ($data)) {
    > print "Apply '".$rxp->get_type."' worked, data = $data\n\n";
    > }
    > $rxp->set_type('r');
    > if ($rxp->apply ($data)) {
    > print "Apply '".$rxp->get_type."' worked, data = $data\n\n";
    > }
    > $rxp->set_type('g');
    > if ($rxp->apply ($data)) {
    > print "Apply '".$rxp->get_type."' worked, data = $data\n\n";
    > }
    >}
    >
    ># test direct call and search, replace, replace_g methods
    >if (1)
    >{
    > $rxp->set_type('r');
    > $data = $tempdata;
    > print "\n","-"x20,"\nData = $data\n\n";
    >
    > if ($rxp->{'dflt_sub'}($rxp, $data)) {
    > print "Direct {dflt_sub} worked, data = $data\n\n";
    > }
    > if ($rxp->search ($data)) {
    > print "Search worked, data = $data\n\n";
    > }
    > if ($rxp->replace ($data)) {
    > print "Replace worked, data = $data\n\n";
    > }
    > if ($rxp->replace_g ($data)) {
    > print "Global replace worked, data = $data\n\n";
    > }
    >}
    >
    >
    >package RxP;
    >use vars qw(@ISA);
    >@ISA = qw();
    >

    [snip]
    It's better to have the regexp fail for some other reason than
    undefinedness.

    Performance benchmarks are very good. Thx...


    sub new
    {
        my ($class, @args) = @_;
        my $self = {
            # (?!) never matches; an empty '' pattern would reuse the
            # last successfully matched pattern instead of failing
            'regex'    => qr/(?!)/,
            'code'     => '',
            'type'     => 's',
            'dflt_sub' => \&search,
        };
        while (my ($name, $val) = splice (@args, 0, 2)) {
            next if (!defined $val);
            if ('regex' eq lc $name) {
                $self->{'regex'} = $val;
            }
            elsif ('code' eq lc $name) {
                $self->{'code'} = $val;
            }
            elsif ('type' eq lc $name && $val =~ /(s|r|g)/i) {
                set_type ($self, $1);
            }
        }
        return bless ($self, $class);
    }
    >sub get_type
    >{
    > return $_[0]->{'type'};
    >}
    >sub set_type
    >{
    > return 0 unless (defined $_[1]);
    > if ($_[1] =~ /(s|r|g)/i) {
    > $_[0]->{'dflt_sub'} = {
    > 's' => \&search,
    > 'r' => \&replace,
    > 'g' => \&replace_g
    > }->{$1};
    > $_[0]->{'type'} = $1;
    > return 1;
    > }
    > return 0;
    >}
    >sub apply
    >{
    > return 0 unless (defined $_[1]);
    > return &{$_[0]->{'dflt_sub'}};
    >}
    >sub search
    >{
    > return 0 unless (defined $_[1]);
    > return $_[1] =~ /$_[0]->{'regex'}/;
    >}
    >sub replace
    >{
    > return 0 unless (defined $_[1]);
    > return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/e;
    >}
    >sub replace_g
    >{
    > return 0 unless (defined $_[1]);
    > return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/ge;
    >}
    >
    >__END__
    >
    >--------------------
    >Data = This(1) is some data, this(2) gets substituted,
    >and so does this(3).
    >
    >Apply 's' worked, data = This(1) is some data, this(2) gets substituted,
    >and so does this(3).
    >
    >code: $1 = This(1)
    >Apply 'r' worked, data = That is some data, this(2) gets substituted,
    >and so does this(3).
    >
    >code: $1 = this(2)
    >code: $1 = this(3)
    >Apply 'g' worked, data = That is some data, That gets substituted,
    >and so does That.
    >
    >
    >--------------------
    >Data = This(1) is some data, this(2) gets substituted,
    >and so does this(3).
    >
    >code: $1 = This(1)
    >Direct {dflt_sub} worked, data = That is some data, this(2) gets substituted,
    >and so does this(3).
    >
    >Search worked, data = That is some data, this(2) gets substituted,
    >and so does this(3).
    >
    >code: $1 = this(2)
    >Replace worked, data = That is some data, That gets substituted,
    >and so does this(3).
    >
    >code: $1 = this(3)
    >Global replace worked, data = That is some data, That gets substituted,
    >and so does That.
    >
    >
    >
    >
     
    , Nov 7, 2008
    #13
  14. On Wed, 05 Nov 2008 19:54:14 GMT, wrote:

    >I settled on this lightweight class that handles the substution with some
    >variable type's. Still it is with minimal error checking to reduce overhead.
    >Added a few methods to generalize access, and it benchmarks pretty good.
    >
    >See any potential problems or performance issues ?


    I don't have enough time to dig through your implementation, but it
    seems to me that you have set up a fairly complete thingie. Now,
    performance is not generally a concern of mine. If it is for you, then
    just profile your app. For the rest, I can only suggest that you set up
    a test suite as well. As long as your implementation passes it, you may
    consider yourself reasonably safe, right?
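    The test-suite suggestion could start as small as the sketch below, which
    uses the core Test::More module. It is a hypothetical example, not code
    from the thread: apply_subst is an invented helper built on the
    regex => qr/.../, code => sub {...} hash idea suggested earlier in this
    thread, and the sample strings are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper (not from the thread): applies a substitution
# described as { regex => qr/.../, code => sub {...} }. $_[1] is an
# alias for the caller's string, so it is modified in place. Returns
# the number of replacements made (false if nothing matched).
sub apply_subst {
    my ($subst, undef, $global) = @_;
    my ($rx, $code) = @{$subst}{qw(regex code)};
    my $count = $global
        ? ($_[1] =~ s/$rx/$code->()/ge)
        : ($_[1] =~ s/$rx/$code->()/e);
    return $count;
}

my %subst = (
    regex => qr/this\(\d\)/i,
    code  => sub { 'That' },
);

my $data = 'This(1) is data, this(2) too.';
is( apply_subst(\%subst, $data), 1, 'single replace reports one change' );
is( $data, 'That is data, this(2) too.', 'only the first match is replaced' );

is( apply_subst(\%subst, $data, 1), 1, 'global replace handles the rest' );
is( $data, 'That is data, That too.', 'every match is replaced' );

my $none = 'nothing here';
ok( !apply_subst(\%subst, $none), 'no match returns false' );
is( $none, 'nothing here', 'string is untouched when nothing matches' );

done_testing();
```

    Each new bug found later becomes one more is() line, which is what makes
    fast, scary changes safer.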


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Nov 7, 2008
    #14
  15. Guest

    On Fri, 07 Nov 2008 10:38:19 +0100, Michele Dondi <> wrote:

    >On Wed, 05 Nov 2008 19:54:14 GMT, wrote:
    >
    >>I settled on this lightweight class that handles the substution with some
    >>variable type's. Still it is with minimal error checking to reduce overhead.
    >>Added a few methods to generalize access, and it benchmarks pretty good.
    >>
    >>See any potential problems or performance issues ?

    >
    >I don't have time enough to dig through your implementation, but it
    >seems to me that you set up a fairly complete thingie: now,
    >performance is not generally a concern of mine. If it is for you, then
    >just profile your app. For the rest, I can only suggest you to set up
    >a test suite as well. As far as your implementation complies, you may
    >consider yourself reasonalby safe, ain't it?
    >
    >
    >Michele


    Never heard of test suites/cases. On my really big app, I'm making changes
    so fast it scares me. I miss a compiler as opposed to a syntax checker.
    No, no. NUnit isn't for me. I live on the edge, die on the edge, one
    man, one piece of art...

    sln
     
    , Nov 7, 2008
    #15
  16. Guest

    On Fri, 07 Nov 2008 10:38:19 +0100, Michele Dondi <> wrote:

    >On Wed, 05 Nov 2008 19:54:14 GMT, wrote:
    >
    >>I settled on this lightweight class that handles the substution with some
    >>variable type's. Still it is with minimal error checking to reduce overhead.
    >>Added a few methods to generalize access, and it benchmarks pretty good.
    >>
    >>See any potential problems or performance issues ?

    >
    >I don't have time enough to dig through your implementation, but it
    >seems to me that you set up a fairly complete thingie: now,
    >performance is not generally a concern of mine. If it is for you, then
    >just profile your app. For the rest, I can only suggest you to set up
    >a test suite as well. As far as your implementation complies, you may
    >consider yourself reasonalby safe, ain't it?
    >
    >
    >Michele


    I've already integrated this package into my bigger package and exported
    a thin wrapper sub that instantiates objects the caller can use as
    drop-ins: the object (a ref returned by NewRxP) is passed as a parameter
    to the larger package's method. Almost like a macro.
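    The arrangement described above (a thin constructor whose objects are
    handed to a method in a larger package) can be sketched as follows.
    Everything here is hypothetical: Subst, Big::process_lines and new_subst
    are illustrative stand-ins for RxP, the unshown larger package, and
    NewRxP.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical minimal stand-in for RxP: a regex plus a code ref.
package Subst;

sub new {
    my ($class, %args) = @_;
    return bless({ regex => $args{regex}, code => $args{code} }, $class);
}

# Global replace on the caller's string; after shift, $_[0] is an
# alias for that string, so the change is made in place.
sub apply {
    my $self = shift;
    my ($rx, $code) = @{$self}{qw(regex code)};
    return $_[0] =~ s/$rx/$code->()/ge;
}

# Hypothetical "larger package": it only requires objects that have an
# apply() method, so callers can drop in any substitution they build.
package Big;

sub process_lines {
    my ($class, $lines, @substs) = @_;
    my $changed = 0;
    for my $line (@$lines) {          # $line aliases each array element
        for my $s (@substs) {
            $changed += $s->apply($line);
        }
    }
    return $changed;
}

package main;

# Hypothetical thin wrapper in the spirit of NewRxP.
sub new_subst { return Subst->new(regex => $_[0], code => $_[1]) }

my @lines = ('foo bar', 'bar baz', 'qux');
my $n = Big->process_lines(\@lines,
    new_subst(qr/bar/,      sub { 'BAR' }),
    new_subst(qr/\bq(\w+)/, sub { "Q$1" }),   # $1 comes from the s///ge in apply()
);
print "$n changes: @lines\n";
```

    Because process_lines only calls ->apply on each object, RxP objects
    would fit the same slot, with their type ('r', 'rg', ...) deciding what
    apply() does.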

    I'm learning the gory details of classes in Perl, something I didn't think
    I would need to know beyond casual knowledge. I'm a hard-core Windows
    MFC C++ programmer; it's how I make my living as a contractor.
    Periodically I'm laid off, like now. Perl is like candy to me, sweet to the
    tongue, especially regular expressions. It's almost addictive. Unemployment is
    running out, nobody is calling, and I'm sure I will have to give this up and work
    as a bricklayer, my long-past profession, again. So, if I disappear, it's
    been nice knowing you!

    sln
     
    , Nov 7, 2008
    #16
  17. Guest

    On Fri, 07 Nov 2008 01:29:34 GMT, wrote:

    >On Wed, 05 Nov 2008 19:54:14 GMT, wrote:
    >
    >>On Wed, 05 Nov 2008 00:31:44 GMT, wrote:
    >>
    >>>On Tue, 04 Nov 2008 12:23:07 +0100, Michele Dondi <> wrote:
    >>>
    >>>>On Mon, 03 Nov 2008 23:01:35 GMT, wrote:
    >>>>
    >>>>>In my opinion, s///g should be allowed by qr{} using the scoping block it was created
    >>>>>in, and later correctly used (s///g) within the context of a block that invokes the engine.
    >>>>>
    >>>>>This may violate 'first-order object' of the language. But then why are code extensions allowed?
    >>>>>qr/(?{ code })/ and what is the scoping for them? To me this looks like parsing issues and
    >>>>>if allowed would would internally result in a dynamic code issue like eval.
    >>>>>I don't that this 'code' extension isn't treated as a literal anyway.
    >>>>
    >>>>Do not misunderstand me, I'm all with you: would you write a Perl
    >>>>extension that allows to treat substitutions as first order objects of
    >>>>the language? I would cherish that... Unfortunately I *for one*
    >>>>haven't the slightest idea of where one could begin!
    >>>>
    >>>>In the meanwhile we must be happy with a clumsier solution, like...
    >>>>
    >>>>>I don't know if invoking a 'sub' (/e) is going to be any better than having to
    >>>>>parse through a passed in argument list for the proper form. In all cases, it looks
    >>>>>like the replacement text cannot include special var's unles an eval is used
    >>>>>at runtime.
    >>>>>
    >>>>>Can you give an example of your regex and a sub solution?
    >>>>
    >>>>... sure:
    >>>>
    >>>> my %subst = ( regex => qr/.../, code => sub { ... } );
    >>>>
    >>>>And then you use that to perform the substitution. You may even make
    >>>>that the core data of a class, thus allowing objects like $subst with
    >>>>a suitable ->apply($string) method.
    >>>>
    >>>>
    >>>>Michele
    >>>

    >>[snip]
    >>>
    >>>This is a relief for me though. Thanks alot...
    >>>

    >>[snip]
    >>>

    >>
    >>I settled on this lightweight class that handles the substution with some
    >>variable type's. Still it is with minimal error checking to reduce overhead.
    >>Added a few methods to generalize access, and it benchmarks pretty good.
    >>
    >>See any potential problems or performance issues ?
    >>
    >>sln
    >>
    >>----------------------
    >>use strict;
    >>use warnings;
    >>


    I ran into some issues, which are now fixed. I just want to close this out
    with the corrected default 'code' sub, the changed types, and the added
    search_g() method. Thanks.

    sln



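    # Convenience constructor: NewRxP($regex, $code, $type).
    # The code ref is optional, so if the second argument is not a
    # CODE ref it is taken to be the type and the two are swapped.
    sub NewRxP
    {
        my ($regex, $code, $type) = @_;
        if (defined $code && ref($code) ne 'CODE') {
            ($code, $type) = ($type, $code);
        }
        return RxP->new('regex' => $regex, 'code' => $code, 'type' => $type);
    }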


    # =================

    package RxP;
    use vars qw(@ISA);
    @ISA = qw();

    # Fields: 'regex' (pattern or qr//), 'code' (sub returning the
    # replacement text), 'type' (which method apply() dispatches to).
    sub new
    {
        my ($class, @args) = @_;
        my $self = {
            'regex'    => '',        # note: an empty pattern makes Perl reuse the last successful one
            'code'     => sub {''},
            'type'     => 's',
            'dflt_sub' => \&search
        };
        # Read name => value pairs; undef values and unknown names are skipped.
        while (my ($name, $val) = splice (@args, 0, 2)) {
            next if (!defined $val);
            if ('regex' eq lc $name) {
                $self->{'regex'} = $val;
            }
            elsif ('code' eq lc $name && ref($val) eq 'CODE') {
                $self->{'code'} = $val;
            }
            elsif ('type' eq lc $name && $val =~ /(sg|gs|rg|gr|s|r)/i) {
                set_type ($self, lc $1);
            }
        }
        return bless ($self, $class);
    }
    sub get_type
    {
        return $_[0]->{'type'};
    }
    # Map a type string to the method apply() will call:
    # s = search, sg/gs = search_g, r = replace, rg/gr = replace_g.
    sub set_type
    {
        return 0 unless (defined $_[1]);
        if ($_[1] =~ /(sg|gs|rg|gr|s|r)/i) {
            $_[0]->{'dflt_sub'} = {
                's'  => \&search,
                'sg' => \&search_g,
                'gs' => \&search_g,
                'r'  => \&replace,
                'rg' => \&replace_g,
                'gr' => \&replace_g
            }->{lc $1};
            $_[0]->{'type'} = lc $1;
            return 1;
        }
        return 0;
    }
    # The methods below take the data as $_[1], which is aliased to the
    # caller's variable, so the replace methods modify it in place.
    # &{...} without parentheses passes the current @_ through.
    sub apply
    {
        return 0 unless (defined $_[1]);
        return &{$_[0]->{'dflt_sub'}};
    }
    sub search
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ /$_[0]->{'regex'}/;
    }
    sub search_g
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ /$_[0]->{'regex'}/g;
    }
    sub replace
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/e;
    }
    sub replace_g
    {
        return 0 unless (defined $_[1]);
        return $_[1] =~ s/$_[0]->{'regex'}/&{$_[0]->{'code'}}/ge;
    }
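    One thing worth checking in the final class is search_g(). Because
    return passes the caller's context through to the match,
    $_[1] =~ /.../g is an iterative match when the method is called in
    scalar context (for example inside an if), and it moves pos() on the
    caller's variable; in list context it returns all the matches. The
    sketch below shows this with a hypothetical plain function, match_g,
    standing in for the method.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for RxP::search_g without the object. $_[0] is
# an alias for the caller's string, so pos() is kept on that variable.
sub match_g {
    my $rx = $_[1];
    return $_[0] =~ /$rx/g;
}

my $rx   = qr/this\((\d)\)/i;
my $data = 'This(1) is data, this(2) too.';

# Scalar (boolean) context: each call resumes at pos($data); the failed
# third call resets pos() so the next search starts from the beginning.
my @scalar;
push @scalar, (match_g($data, $rx) ? 'hit' : 'miss') for 1 .. 3;
print "scalar: @scalar\n";

# List context: one call returns every capture.
my @all = match_g($data, $rx);
print "list: @all\n";
```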
     
    , Nov 8, 2008
    #17