Help with dynamic regex

Discussion in 'Perl Misc' started by Dave Saville, Mar 7, 2012.

  1. Dave Saville

    Dave Saville Guest

    I have a general purpose CGI script to interogate various log files on
    my server. Because sometimes I want to limit what I see I have an
    optional field where I can type a regular expression.

    If the field is blank then obviously nothing special needs to be done,
    but if there is something then I need to build a regex. If the field
    is just for example abc then I need m/abc/ generated - or something
    that will do that. OTH if the field has /abc/i then I need m/abc/i and
    so on.

    Things are further complicated in that if there is a regular
    expression then I want that matching text highlighted in the output.
    What I have follows - but it does not always work. For example if I
    have /o/g it does not highlight all the o's.

    sub getlog
    {
    my $logfile = shift;
    open LOG, "<$logfile" or die "Can't open $logfile $!";

    local $regex = $regex;

    if ( $regex )
    {
    if ( $regex =~ m/^(.)(.*)\1([giomsx]+)$/ )
    {
    my $body = $2;
    my $mod = $3;

    if ( $mod )
    {
    $regex = qr/(?$mod)$body/;
    }
    else
    {
    $regex = qr/$body/;
    }
    }
    }

    while ( <LOG> )
    {
    s/\</&lt;/gm;
    s/\>/&gt;/gm;
    s/\xf8/&deg;/gm;

    if ( $regex )
    {
    push @log, $_ if s/($regex)/\<span
    class="searchlight"\>$1\<\/span\>/;
    }
    else
    {
    push @log, $_;
    }
    }

    close LOG;
    }

    TIA
    --
    Regards
    Dave Saville
     
    Dave Saville, Mar 7, 2012
    #1
    1. Advertising

  2. "Dave Saville" <> writes:
    > I have a general purpose CGI script to interogate various log files on
    > my server. Because sometimes I want to limit what I see I have an
    > optional field where I can type a regular expression.
    >
    > If the field is blank then obviously nothing special needs to be done,
    > but if there is something then I need to build a regex. If the field
    > is just for example abc then I need m/abc/ generated - or something
    > that will do that. OTH if the field has /abc/i then I need m/abc/i and
    > so on.
    >
    > Things are further complicated in that if there is a regular
    > expression then I want that matching text highlighted in the output.
    > What I have follows - but it does not always work. For example if I
    > have /o/g it does not highlight all the o's.
    >
    > sub getlog
    > {
    > my $logfile = shift;
    > open LOG, "<$logfile" or die "Can't open $logfile $!";
    >
    > local $regex = $regex;


    A general remark: Consider using my for local variables of
    subroutines. It is generally faster and these variables are then
    private to the subroutine which defines them while the local binding
    will be visible to all subroutines invoked from this one. Also, using
    a parameter to pass data into subroutines is usually sensible because
    the subroutine then becomes independent of its environment and it is
    easier to see what data flows into it (ie, without reading through all
    of the code). Here, this would mean using something like

    sub getlog
    {
    my ($logfile, $regex) = @_;

    >
    > if ( $regex )
    > {
    > if ( $regex =~ m/^(.)(.*)\1([giomsx]+)$/ )
    > {
    > my $body = $2;
    > my $mod = $3;
    >
    > if ( $mod )
    > {
    > $regex = qr/(?$mod)$body/;
    > }
    > else
    > {
    > $regex = qr/$body/;
    > }
    > }
    > }
    >
    > while ( <LOG> )
    > {
    > s/\</&lt;/gm;
    > s/\>/&gt;/gm;
    > s/\xf8/&deg;/gm;
    >
    > if ( $regex )
    > {
    > push @log, $_ if s/($regex)/\<span
    > class="searchlight"\>$1\<\/span\>/;
    > }


    According to perlre(1) qr/// doesn't support the g modifier and if you
    want 'global substituations' (in a particular line of input), you need
    to apply that to s///. A possible way to achieve that would be to
    create a subroutine which performs the desired modification on the input
    line and invoke that for each line. Example how this could be done:

    ------------
    my $r = '|g|g'; # set to regex
    my $s;

    if ($r =~ /(.)(.*)\1([giomsx]*)$/) {
    $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    $2, $3));
    } else {
    $s = sub { };
    }

    $a = 'gagbgcgd';
    $s->($a);

    print $a, "\n";
    ------------

    NB: The created substitution routine modifies its argument since this
    gets around the need to copy the complete log file.
     
    Rainer Weikusat, Mar 7, 2012
    #2
    1. Advertising

  3. Rainer Weikusat <> writes:

    [...]

    > if ($r =~ /(.)(.*)\1([giomsx]*)$/) {
    > $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    > $2, $3));


    Additional remark: Because this is based on interpolating a string
    into some code, it will be possible to use this to execute arbitrary
    code by providing suitable input to the script, eg,

    my $r = '|a)//; print "Gotcha!\n"; s/(b|';

    will print Gotcha! whenever the substituation routine is
    executed. OTOH, since regexes support executing code anyway, this
    might not be much of a concern.
     
    Rainer Weikusat, Mar 7, 2012
    #3
  4. Dave Saville

    Dave Saville Guest

    On Wed, 7 Mar 2012 17:37:58 UTC, Rainer Weikusat
    <> wrote:

    <snip>
    > A general remark: Consider using my for local variables of
    > subroutines. It is generally faster and these variables are then
    > private to the subroutine which defines them while the local binding
    > will be visible to all subroutines invoked from this one. Also, using
    > a parameter to pass data into subroutines is usually sensible because
    > the subroutine then becomes independent of its environment and it is
    > easier to see what data flows into it (ie, without reading through all
    > of the code). Here, this would mean using something like


    Noted.

    <snip>

    > According to perlre(1) qr/// doesn't support the g modifier and if you


    That explains a lot :)

    > want 'global substituations' (in a particular line of input), you need
    > to apply that to s///. A possible way to achieve that would be to
    > create a subroutine which performs the desired modification on the input
    > line and invoke that for each line. Example how this could be done:
    >
    > ------------
    > my $r = '|g|g'; # set to regex
    > my $s;
    >
    > if ($r =~ /(.)(.*)\1([giomsx]*)$/) {
    > $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    > $2, $3));
    > } else {
    > $s = sub { };
    > }
    >
    > $a = 'gagbgcgd';
    > $s->($a);
    >
    > print $a, "\n";
    > ------------


    Sorry you have lost me there. Apart from it not doing what I need if
    the regex is just supplied as "abc" I do not understand that eval -
    obviously it runs the sprintf but what is that sub all about or the $s
    = sub{} ?

    I am guessing that $s gets set to *something* that $s->($a) passes $a
    to - but how does that work?

    Thanks for the help.
    --
    Regards
    Dave Saville
     
    Dave Saville, Mar 7, 2012
    #4
  5. Dave Saville

    Dave Saville Guest

    On Wed, 7 Mar 2012 17:54:28 UTC, Rainer Weikusat
    <> wrote:

    > Rainer Weikusat <> writes:
    >
    > [...]
    >
    > > if ($r =~ /(.)(.*)\1([giomsx]*)$/) {
    > > $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    > > $2, $3));

    >
    > Additional remark: Because this is based on interpolating a string
    > into some code, it will be possible to use this to execute arbitrary
    > code by providing suitable input to the script, eg,
    >
    > my $r = '|a)//; print "Gotcha!\n"; s/(b|';
    >
    > will print Gotcha! whenever the substituation routine is
    > executed. OTOH, since regexes support executing code anyway, this
    > might not be much of a concern.


    No concern at all in this case as it's just me. :)
    --
    Regards
    Dave Saville
     
    Dave Saville, Mar 7, 2012
    #5
  6. "Dave Saville" <> writes:
    > On Wed, 7 Mar 2012 17:37:58 UTC, Rainer Weikusat


    [...]


    >> ------------
    >> my $r = '|g|g'; # set to regex
    >> my $s;
    >>
    >> if ($r =~ /(.)(.*)\1([giomsx]*)$/) {
    >> $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    >> $2, $3));
    >> } else {
    >> $s = sub { };
    >> }
    >>
    >> $a = 'gagbgcgd';
    >> $s->($a);
    >>
    >> print $a, "\n";
    >> ------------

    >
    > Sorry you have lost me there. Apart from it not doing what I need if
    > the regex is just supplied as "abc"


    It expects that all regexes start and end with a delimiter-character.
    It might make more sense to use that in the function template, eg

    sprintf('sub { $_[0] =~ s%s(%s)%s/XX\1XX%s%s; }',
    $1, $2, $1, $1, $3);

    since there has to be some kind of delimiter and it must not occur
    unescaped in the 're text' itself (so the burden of picking something
    suitable is on the user). Otherwise, some kind of escaping would need
    to be added.

    > I do not understand that eval -
    > obviously it runs the sprintf but what is that sub all about or the $s
    > = sub{} ?
    >


    sub { } just created an anonymous subroutine which does nothing and
    returns a coderef pointing to that. The 'eval' compiles and runs the
    code contained in the string passed to it and returns the result. In
    this case, that's a references to an anonymous subroutine which does
    the intended pattern substituation (subject to the limitations
    mentioned elswhere).
     
    Rainer Weikusat, Mar 7, 2012
    #6
  7. Dave Saville

    Dave Saville Guest

    On Wed, 7 Mar 2012 19:13:13 UTC, Rainer Weikusat
    <> wrote:

    > "Dave Saville" <> writes:
    > > On Wed, 7 Mar 2012 17:37:58 UTC, Rainer Weikusat

    >
    > [...]
    >
    >
    > >> ------------
    > >> my $r = '|g|g'; # set to regex
    > >> my $s;
    > >>
    > >> if ($r =~ /(.)(.*)\1([giomsx]*)$/) {
    > >> $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    > >> $2, $3));
    > >> } else {
    > >> $s = sub { };
    > >> }
    > >>
    > >> $a = 'gagbgcgd';
    > >> $s->($a);
    > >>
    > >> print $a, "\n";
    > >> ------------

    > >
    > > Sorry you have lost me there. Apart from it not doing what I need if
    > > the regex is just supplied as "abc"

    >
    > It expects that all regexes start and end with a delimiter-character.
    > It might make more sense to use that in the function template, eg
    >
    > sprintf('sub { $_[0] =~ s%s(%s)%s/XX\1XX%s%s; }',
    > $1, $2, $1, $1, $3);
    >
    > since there has to be some kind of delimiter and it must not occur
    > unescaped in the 're text' itself (so the burden of picking something
    > suitable is on the user). Otherwise, some kind of escaping would need
    > to be added.
    >
    > > I do not understand that eval -
    > > obviously it runs the sprintf but what is that sub all about or the $s
    > > = sub{} ?
    > >

    >
    > sub { } just created an anonymous subroutine which does nothing and
    > returns a coderef pointing to that. The 'eval' compiles and runs the
    > code contained in the string passed to it and returns the result. In
    > this case, that's a references to an anonymous subroutine which does
    > the intended pattern substituation (subject to the limitations
    > mentioned elswhere).


    Thanks again - my brain hurts. :)
    --
    Regards
    Dave Saville
     
    Dave Saville, Mar 7, 2012
    #7
  8. Ben Morrow <> writes:
    > Quoth "Dave Saville" <>:
    >> On Wed, 7 Mar 2012 17:54:28 UTC, Rainer Weikusat
    >> <> wrote:
    >> > Rainer Weikusat <> writes:
    >> >
    >> > > if ($r =~ /(.)(.*)\1([giomsx]*)$/) {
    >> > > $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    >> > > $2, $3));
    >> >
    >> > Additional remark: Because this is based on interpolating a string
    >> > into some code,

    >
    > ...it is a completely stupid thing to do in a CGI script.


    Depending on the purpose of the script, ie, is it only going to be
    used by people who control the machine, anyway, as was the case here,
    the input valid needs to be sanitized in some suitable way in order to
    prevent code execution. Since this might not be self-evident to
    everyone (aka 'SQL injection?'), I wrote the additional posting
    pointing this out. Consequently, you assertion that I wrote this
    because I would be 'too stupid' to understand the consequence, while
    presumably being your honest conviction, does not hold water.

    [...]

    >> > it will be possible to use this to execute arbitrary
    >> > code by providing suitable input to the script, eg,
    >> >
    >> > my $r = '|a)//; print "Gotcha!\n"; s/(b|';
    >> >
    >> > will print Gotcha! whenever the substituation routine is
    >> > executed. OTOH, since regexes support executing code anyway, this
    >> > might not be much of a concern.

    >
    > Regexes are protected against interpolated sections containing (?{}) and
    > the like. See the documentation for 'use re "eval"'.


    This is not applicable here, cf

    when "use re 'eval'" is in effect, a regex is allowed to
    contain "(?{ ... })" zero-width assertions even if regular
    expression contains variable

    interpolation. That is normally disallowed, since it is a
    potential security risk. Note that this pragma is ignored when
    the regular expression is obtained from tainted data, i.e.
    evaluation is always disallowed with tainted regular
    expressions. See "(?{ code })"
    in perlre.


    For the purpose of this pragma, interpolation of precompiled
    regular expressions (i.e., the result of "qr//") is not
    considered variable interpolation.

    >> No concern at all in this case as it's just me. :)

    >
    > That is the *wrong* attitude.


    Nope. That's the appropriate attitude: Security isn't about 'voodoo
    dancing for the sake of it' but about preventing some kind of
    privilege escalation from happening. And 'requiring privilege' in
    order to be able to perform the operation at all is a completely valid
    strategy to achieve that: It is conceptually identical to having a
    'privileged' system account of some kind.
     
    Rainer Weikusat, Mar 7, 2012
    #8
  9. Ben Morrow <> writes:
    > Quoth Rainer Weikusat <>:
    >>
    >> my $r = '/g/g';
    >> my ($s, $mod);
    >>
    >> if ($r) {
    >> if ($r =~ /(.)(.*)\1([giomsx]*)$/) {
    >> $r = $2;
    >> $mod = $3;
    >> }
    >>
    >> $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    >> qr/$r/, $mod));

    >
    > Clever. Of course, eval is lexically scoped within the surrounding
    > block, so there's no need to compile the regex twice:
    >
    > my $qr = qr/$r/;
    > $s = eval sprintf 'sub { $_[0] =~ s/$qr/XX\1XX/%s; }', $mod;
    >


    This is a regex with a variable dynamically interpolated into it, IOW,
    the $qr is evaluated every time the sub is executed, as opposed to
    once when it is compiled. Since this is not good for anything, it
    should rather be avoided.

    > I might even consider that safe enough to use, given that the
    > interpolated section is now entirely predictable, except that there's no
    > need for eval at all. The only modifers you are accounting for that
    > can't be put in a (?X) are /g and /o, and /o isn't useful anyway, so
    > there's really nothing wrong with
    >
    > my $g = $mod =~ s/g//g;
    > if ($g) {
    > $a =~ s/(?$mod)$r/XX$1XX/g;
    > }
    > else {
    > $a =~ s/(?$mod)$r/XX$1XX/;
    > }
    >
    > and it's a whole lot harder to get wrong.


    It is more code (consequently, it is easier to get wrong) and it
    requires a conditional test whose outcome is always either one way or
    the other way for each line of input processed. And log files can
    easily be huge. Not to mention that it is also prone to so-called
    'update anomalies' since there are now two replacement expressions
    which need to be kept in sync manually.

    [...]

    >> >> Regexes are protected against interpolated sections containing (?{}) and
    >> >> the like. See the documentation for 'use re "eval"'.
    >> >
    >> > This is not applicable here, cf

    >>
    >> [...]
    >>
    >> > For the purpose of this pragma, interpolation of precompiled
    >> > regular expressions (i.e., the result of "qr//") is not
    >> > considered variable interpolation.

    >>
    >> The input to qr// is considered variable interpolation, though,
    >> meaning, ordinarily, (?{...}) won't work.

    >
    > ...which was my point. Interpolating into a regex is safe (modulo DoS
    > attacks using everlasting patterns) in a way eval can never be.


    There's no problem with eval per se, just with the string which is
    actually evaluated. Consequently, there is some (possibly absent)
    input validation/ sanitization code and this code can either be
    correct, then, evaluating the input is 'safe;', or incorrect and then, it
    isn't provided 'evaluating user input' isn't intentionally done in
    order to enable users to execute whatever Perl code might be useful
    for them. I've done this at least once in the past, with a script
    which was not intended to be a security boundary, where users of the
    script could use an arbitrary Perl expression in order to provide 'a
    size argument'. The idea was that it might be convenient to use a Perl
    expression calculating the actual size instead of having to do that in
    some separate way before invoking the script.
     
    Rainer Weikusat, Mar 7, 2012
    #9
  10. Ben Morrow <> writes:
    > Quoth Rainer Weikusat <>:
    >> Ben Morrow <> writes:
    >> >
    >> > my $qr = qr/$r/;
    >> > $s = eval sprintf 'sub { $_[0] =~ s/$qr/XX\1XX/%s; }', $mod;
    >> >

    >>
    >> This is a regex with a variable dynamically interpolated into it, IOW,
    >> the $qr is evaluated every time the sub is executed, as opposed to
    >> once when it is compiled. Since this is not good for anything, it
    >> should rather be avoided.

    >
    > No it isn't. Run it under re "debug" to see. Since the LHS of the s///
    > is nothing but a qr//, perl doesn't recompile the regex, it just uses
    > the compiled version in the qr//. (That's the whole point of qr//.)
    >
    > In fact, perl is cleverer than that. Even if you were to write
    >
    > my $r = "f";
    > $s = eval 'sub { /XX $r/ }';
    >
    > perl would still only compile the pattern once. It tracks the variables
    > used by the pattern, and only recompiles it if they've changed. (This is
    > why /o is not useful any more.)


    The most-recent perl I'm using is 5.10.1 and it still documents this
    as

    PATTERN may contain variables, which will be interpolated (and
    the pattern recompiled) every time the pattern search is
    evaluated, except for when the delimiter is a single quote.
    (Note that $(, $), and $| are not interpolated because they
    look like end-of-string tests.) If you want such a pattern to
    be compiled only once, add a "/o" after the trailing
    delimiter.

    Even assuming the additional checks you mentioned above (since when to
    these exist), that still means execution-time overhead which can
    easily be avoided.

    > Your code,
    >
    > $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    > qr/$r/, $mod));
    >
    > OTOH, compiles the pattern twice. First it is compiled into a qr//, then
    > that qr// is stringified and interpolated into a larger string, then
    > that larger string is evaled. The eval will have to compile the pattern
    > again, since it's lost any connection to the first qr//.


    Well, yes. But since this is executed only once while the substitution
    routine is possibly called millions (if not billions) of times, that
    seems like a worthwhile tradeoff to me.

    [...]


    >> There's no problem with eval per se, just with the string which is
    >> actually evaluated. Consequently, there is some (possibly absent)
    >> input validation/ sanitization code and this code can either be
    >> correct, then, evaluating the input is 'safe;', or incorrect and then, it
    >> isn't provided 'evaluating user input' isn't intentionally done in
    >> order to enable users to execute whatever Perl code might be useful
    >> for them.

    >
    > Yes, for those of us who are perfect programmers (and who can be certain
    > everyone maintaining the code after them will be too), eval can be
    > terribly convenient. For those of us who live in the real world, it
    > isn't safe.


    "It isn't safe" means nothing except "Be afraid of it! You really
    ought to!". Nothing which can be misused is safe from being misused
    and 'in the real world' people who are anything but 'perfect
    programmers' have created a whole new class of exploits by
    interpolating userdata into SQL statement templates without proper
    quoting (so-called 'SQL injection'). And the solution to this problem
    is not 'trying to scare people away from using this technique' (that's
    not going to work for the people who created the problem, anyway,
    because their main objective is usually "get shit done fast") but to
    teach them how to avoid interpolation into strings when it isn't
    needed (eg, when a parametrized query can be used instead) and how to
    do it safely when it is needed.

    Granted, it would be nicer when Perl had some structured
    representation of its own code which could be used for runtime code
    generation but that gets us deeply into 'oatmeal with finger nail
    clippings' territory :) (and IMHO, the lisp guys went off the deep end
    with CLOS ...).

    >> I've done this at least once in the past, with a script
    >> which was not intended to be a security boundary,

    >
    > A CGI script (or anything else which talks to the network) should always
    > be considered a security boundary. That's part of my point.


    And this point is wrong: Eg, a CGI script which is supposed to be used
    by authenticated users to make configuration changes to some system
    they control is not a security boundary.
     
    Rainer Weikusat, Mar 8, 2012
    #10
  11. Dave Saville

    Dave Saville Guest

    On Wed, 7 Mar 2012 20:20:09 UTC, Ben Morrow <> wrote:

    >
    > Quoth "Dave Saville" <>:
    > > On Wed, 7 Mar 2012 17:54:28 UTC, Rainer Weikusat
    > > <> wrote:
    > > > Rainer Weikusat <> writes:
    > > >
    > > > > if ($r =~ /(.)(.*)\1([giomsx]*)$/) {
    > > > > $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    > > > > $2, $3));
    > > >
    > > > Additional remark: Because this is based on interpolating a string
    > > > into some code,

    >
    > ...it is a completely stupid thing to do in a CGI script. This is,
    > unforunately, a case which would have slipped by taint mode, but it's
    > still ridiculously dangerous.
    >


    So given the original problem - Search remote log files for some
    arbitary regex - how would you go about it? SSH to the box and grep is
    not an acceptable solution. Much easier in a browser where the log
    files are predefined ( one does not have to remember *where* they are
    - not all in /var/log) and additionally you have the browsers search
    and highlight to play with.

    > > > it will be possible to use this to execute arbitrary
    > > > code by providing suitable input to the script, eg,
    > > >
    > > > my $r = '|a)//; print "Gotcha!\n"; s/(b|';
    > > >
    > > > will print Gotcha! whenever the substituation routine is
    > > > executed. OTOH, since regexes support executing code anyway, this
    > > > might not be much of a concern.

    >
    > Regexes are protected against interpolated sections containing (?{}) and
    > the like. See the documentation for 'use re "eval"'.
    >
    > > No concern at all in this case as it's just me. :)

    >
    > That is the *wrong* attitude. If it's a CGI script you *must* assume Bad
    > People will get their hands on it, no matter how 'internal' you think
    > the webserver serving it is. You should be using taint mode and writing
    > with security in mind.
    >


    Given that it is my system and that particular script is on a non
    standard port that is shut on the firewall I think that is very
    internal :) My server *always* runs taint mode *and* I check the
    logs.

    As a general point, if you need to provide a search to end users,
    forgeting my requirement to type a regex, how would you sanitise the
    provided search? ie just a simple string search but that would be
    implemented by a regex.
    --
    Regards
    Dave Saville
     
    Dave Saville, Mar 8, 2012
    #11
  12. Ben Morrow wrote:
    >
    > Quoth Rainer Weikusat<>:
    >>
    >> my $r = '/g/g';
    >> my ($s, $mod);
    >>
    >> if ($r) {
    >> if ($r =~ /(.)(.*)\1([giomsx]*)$/) {
    >> $r = $2;
    >> $mod = $3;
    >> }
    >>
    >> $s = eval(sprintf('sub { $_[0] =~ s/(%s)/XX\1XX/%s; }',
    >> qr/$r/, $mod));

    >
    > Clever. Of course, eval is lexically scoped within the surrounding
    > block, so there's no need to compile the regex twice:
    >
    > my $qr = qr/$r/;
    > $s = eval sprintf 'sub { $_[0] =~ s/$qr/XX\1XX/%s; }', $mod;


    You are still compiling the regexp twice. Also there is no need to use
    sprintf and the use of \1 in the replacement string should be $1:

    $s = eval "sub { \$_[0] =~ s/$r/XX\${1}XX/$mod; }";



    John
    --
    Any intelligent fool can make things bigger and
    more complex... It takes a touch of genius -
    and a lot of courage to move in the opposite
    direction. -- Albert Einstein
     
    John W. Krahn, Mar 8, 2012
    #12
  13. "Dave Saville" <> writes:
    > On Wed, 7 Mar 2012 20:20:09 UTC, Ben Morrow <> wrote:


    [...]

    > As a general point, if you need to provide a search to end users,
    > forgeting my requirement to type a regex, how would you sanitise the
    > provided search? ie just a simple string search but that would be
    > implemented by a regex.


    Use qr//, cf

    This operator quotes (and possibly compiles) its STRING
    as a regular expression. STRING is interpolated the
    same way as PATTERN in "m/PATTERN/". If "'" is used as
    the delimiter, no interpolation is done.

    The qr actually means 'quote regex' and judging from the 'and possibly
    compiles' quoted above, it was originally meant to be used for
    quoting.
     
    Rainer Weikusat, Mar 8, 2012
    #13
  14. "John W. Krahn" <> writes:
    > Ben Morrow wrote:


    [...]

    >> $s = eval sprintf 'sub { $_[0] =~ s/$qr/XX\1XX/%s; }', $mod;

    >
    > You are still compiling the regexp twice. Also there is no need to
    > use sprintf and the use of \1 in the replacement string should be $1:
    >
    > $s = eval "sub { \$_[0] =~ s/$r/XX\${1}XX/$mod; }";


    The \1 on the RHS came from the originally posted code. According to
    perlre(1),

    This is grandfathered for the RHS of a substitute to avoid
    shocking the sed addicts, but it's a dirty habit to get into.

    So, if somebody wants to use it in his own code, why not? The sprintf
    came from me. I usually prefer that to doing variable interpolations
    because it clearly separates the template string and the argument
    interpolated into it.
     
    Rainer Weikusat, Mar 8, 2012
    #14
  15. Dave Saville

    Dave Saville Guest

    Thanks Ben and Rainer for your help and explanations. Working to my
    satisfaction now :)
    --
    Regards
    Dave Saville
     
    Dave Saville, Mar 8, 2012
    #15
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. =?Utf-8?B?SmViQnVzaGVsbA==?=

    Is ASP Validator Regex Engine Same As VS2003 Find Regex Engine?

    =?Utf-8?B?SmViQnVzaGVsbA==?=, Oct 22, 2005, in forum: ASP .Net
    Replies:
    2
    Views:
    713
    =?Utf-8?B?SmViQnVzaGVsbA==?=
    Oct 22, 2005
  2. Rick Venter

    perl regex to java regex

    Rick Venter, Oct 29, 2003, in forum: Java
    Replies:
    5
    Views:
    1,635
    Ant...
    Nov 6, 2003
  3. Replies:
    2
    Views:
    602
  4. Xah Lee
    Replies:
    1
    Views:
    945
    Ilias Lazaridis
    Sep 22, 2006
  5. Replies:
    3
    Views:
    772
    Reedick, Andrew
    Jul 1, 2008
Loading...

Share This Page