Stopping code execution in RE's

Discussion in 'Perl Misc' started by Matthew Braid, May 13, 2004.

  1. Hi all,

    OK, first off - the perldocs for RE's make my head hurt :)

    I'm writing a formatter package that basically has a little language of its own
    to allow joombie users to format their own stuff without bothering me....

    One of the formatter 'tags' allows the use of a regular expression (in a
    slightly stunted fashion), however I want to ensure no code execution is performed.

    RE's are specified like:

    find /RE/, REFLAGS

    REFLAGS can be a string containing 'ismx' - anything else causes an error.
    RE is a (duh) regular expression. This is split into RE strings (for instance
    '^[a-z]\Q$FOO\E') and variables (for instance $FOO, @BAR etc).
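
As a rough sketch of the flag check described above: the flags can be validated against 'ismx' and then folded into the pattern source as an inline (?flags:...) group, so they never have to pass through a string eval. The helper name wrap_with_flags is made up for illustration, and rejecting repeated flags is an extra assumption, not something the post specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: validate REFLAGS, then wrap the pattern source in an
# inline-modifier group such as (?is:...).
sub wrap_with_flags {
    my ($pattern, $flags) = @_;
    # Only i, s, m and x are accepted; anything else is an error.
    die "Bad RE flags: $flags\n" unless $flags =~ /\A[ismx]*\z/;
    # Assumption: a flag given twice is treated as an error as well.
    die "Repeated RE flag in: $flags\n" if $flags =~ /(.).*\1/;
    return "(?${flags}:${pattern})";
}

print wrap_with_flags('^[a-z]', 'is'), "\n";
```

The returned string is still only pattern source; it would be compiled later with qr// like any other interpolated pattern.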

    The variable handling is done - that was pretty easy.

    Technically the RE handling is pretty easy too - I could just rebuild it by
    evaluating the variables and plonking it all back together, eg

    store $BAR, 'BAR[]!'
    find /^[a-z](\Q$FOO\E$BAR)\z/, 'is', FIELD

    => RE is split into '^[a-z](\Q$FOO\E', $BAR, ')\z';
    => $BAR evaluates to 'BAR[]!' -> quote it with '\QBAR[]!\E';
    => $RE is built into '(?is:^[a-z](\Q$FOO\E\QBAR[]!\E)\z)';
    $FIELD =~ $re;
    etc etc... (obviously done so that it actually works....)

    The problem here is if the user does something like:

    find /(?{system('rm -rf /')})/, '', FIELD

    This is obviously not something I want to run. Is there a simple thing I can do
    to stop RE's from ever executing inline code, or do I have to add checks for
    this to my RE parser?

    MB
     
    Matthew Braid, May 13, 2004
    #1

  2. Matthew Braid wrote:

    <snip>
    > The problem here is if the user does something like:
    >
    > find /(?{system('rm -rf /')})/, '', FIELD
    >
    > This is obviously not something I want to run. Is there a simple thing I
    > can do to stop RE's from ever executing inline code, or do I have to add
    > checks for this to my RE parser?


    I'm replying to myself, but I've found one slightly non-ugly solution....

    If I insert an artificial variable into the re mix, execution is disallowed.

    For example, if I have a variable called $null with a value of '', and I build
    the RE like so:

    eval "\$re = qr/\$null$re/$reflags";
    $field =~ $re;

    then if the user specified RE has (?{ print "ARGH!" }) in it it fails with the
    error:

    Eval-group not allowed at runtime, use re 'eval' in regex m/....

    but the re is OK if no eval groups are included.

    Obviously I'm never gonna use re 'eval'.

    On a side note, I'm not happy using eval "..." but I have to do that for the \Q,
    \L, \U, \E tags.... Maybe I'll just parse for them earlier and convert things
    myself....
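
A minimal sketch of the behaviour described in this post, without the string eval: any pattern that reaches qr// through an interpolated variable is compiled at run time, and Perl refuses (?{...}) there unless use re 'eval' is in scope. The helper name compile_user_re is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: try to compile a user-supplied pattern.
# Returns the compiled regex (or undef) and the error text (or '').
sub compile_user_re {
    my ($pattern) = @_;
    # The interpolated variable forces run-time compilation, where
    # eval groups are rejected because "use re 'eval'" is not in effect.
    my $re = eval { qr/$pattern/ };
    return ($re, $@);
}

my ($ok)        = compile_user_re('^[a-z]+\z');
my ($bad, $err) = compile_user_re('(?{ print "ARGH!" })');

print "plain pattern compiled\n" if defined $ok;
print "eval group rejected: $err" unless defined $bad;
```

The block eval only traps the compile error; no user text is ever run as Perl code.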
     
    Matthew Braid, May 13, 2004
    #2

  3. Anno Siegel wrote:

    Matthew Braid <> wrote in comp.lang.perl.misc:
    > Matthew Braid wrote:
    >
    > <snip>
    > > The problem here is if the user does something like:
    > >
    > > find /(?{system('rm -rf /')})/, '', FIELD
    > >
    > > This is obviously not something I want to run. Is there a simple thing I
    > > can do to stop RE's from ever executing inline code, or do I have to add
    > > checks for this to my RE parser?

    >
    > I'm replying to myself, but I've found one slightly non-ugly solution....
    >
    > If I insert an artificial variable into the re mix, execution is disallowed.
    >
    > For example, if I have a variable called $null with a value of '', and I build
    > the RE like so:
    >
    > eval "\$re = qr/\$null$re/$reflags";


    No big deal, but you don't have to do the assignment inside eval.

    $re = eval "qr/\$null$re/$reflags";

    > $field =~ $re;
    >
    > then if the user specified RE has (?{ print "ARGH!" }) in it it fails with the
    > error:
    >
    > Eval-group not allowed at runtime, use re 'eval' in regex m/....
    >
    > but the re is OK if no eval groups are included.
    >
    > Obviously I'm never gonna use re 'eval'.
    >
    > On a side note, I'm not happy using eval "..." but I have to do that for
    > the \Q, \L, \U, \E tags.... Maybe I'll just parse for them earlier and
    > convert things myself....


    In full generality, "eval" will be hard to avoid, but you *can* reduce
    its use to the extraction of variable values. Something like this

    for my $varname ( $regex =~ /(\$[A-Za-z_]\w*)/g ) {
        my $val = eval "qq($varname)";
        $varname = '\\' . $varname;
        $regex =~ s/$varname/$val/g;
    }
    my $re = qr/$regex/;

    This expands the variable values right inside the regex string. It may
    need quite a bit of refinement, I'm being sketchy here.

    I'm wondering how the user knows what variables they can use. If they
    are user-defined variables, the user can do their own interpolation,
    so I assume the variables are defined in your program. But then
    the user needs some kind of catalog of what variables are available.

    In that case you could build a hash of all possible variable names
    and their (string) values and substitute according to that. No
    eval is needed.
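
The hash idea could look roughly like this. It is only a sketch: the %vars catalog, its contents and the name expand_vars are invented for illustration, only scalar-style $NAME variables are handled, and every value is passed through quotemeta on the assumption that variable contents should always match literally.

```perl
use strict;
use warnings;

# Hypothetical catalog of the variables the user may refer to.
my %vars = ( FOO => 'a.b', BAR => 'BAR[]!' );

# Replace each $NAME in the pattern source with its quoted value.
sub expand_vars {
    my ($pattern, $vars) = @_;
    $pattern =~ s{\$([A-Za-z_]\w*)}{
        exists $vars->{$1}
            ? quotemeta($vars->{$1})
            : die "Unknown variable: $1\n"
    }ge;
    return $pattern;
}

my $regex = expand_vars('^[a-z]($FOO$BAR)\z', \%vars);
my $re    = qr/$regex/i;    # run-time compile, so eval groups stay forbidden

print "expanded source: $regex\n";
```

Because the values are quoted before they reach the pattern, a variable holding something like '(?{...})' is matched as plain text.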

    Anno
     
    Anno Siegel, May 13, 2004
    #3
  4. Anno Siegel wrote:

    > In full generality, "eval" will be hard to avoid, but you *can* reduce
    > its use to the extraction of variable values. Something like this
    >
    > for my $varname ( $regex =~ /(\$[A-Za-z_]\w*)/g ) {
    >     my $val = eval "qq($varname)";
    >     $varname = '\\' . $varname;
    >     $regex =~ s/$varname/$val/g;
    > }
    > my $re = qr/$regex/;


    On parsing the file my RE object consists of a list of strings and variable
    names. The variable names are resolved before the perl-RE is built from the RE
    object and wrapped in \Q/\E so I don't have to worry about weird values....

    > I'm wondering how the user knows what variables they can use. If they
    > are user-defined variables, the user can do their own interpolation,
    > so I assume the variables are defined in your program. But then
    > the user needs some kind of catalog of what variables are available.
    >
    > In that case you could build a hash of all possible variable names
    > and their (string) values and substitute according to that. No
    > eval is needed.


    The 'language' allows specification of variables (scalars like $FOO, arrays like
    @FOO and array elements like @FOO[INDEX]) that are stored in a hash-structure
    inside a Storage package. Once again, the stumbling block is the special quote
    characters \Q, \L, \U, \l, \u and \E - I'm having a hard time building a valid
    RE out of, say:

    $part1 = '\QThis is quoted';
    $part2 = '\Uthis is uppercase';

    because:

    $re = qr/$part1$part2/;
    $re2 = qr/\QThis is quoted\Uthis is uppercase/;
    $re3 = eval "qr/$part1$part2/";
    print "RE1 IS $re\nRE2 IS $re2\nRE3 IS $re3\n";

    results in:

    RE1 IS (?-xism:\QThis is quoted\Uthis is uppercase)
    RE2 IS (?-xism:This\ is\ quotedTHIS\ IS\ UPPERCASE)
    RE3 IS (?-xism:This\ is\ quotedTHIS\ IS\ UPPERCASE)

    I'm currently reworking it so that when the parser finds an RE (or an
    interpolated string), instead of building a list of strings and variable names,
    it builds a list of strings, variable names and inline commands so that by the
    time the real RE is built the strings/variable values have been
    quoted/uppercased/lowercased etc so I don't need the eval.
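
One possible shape for that rework, as a hedged sketch rather than the actual parser: split the source on the inline commands and apply them by hand. The name apply_modifiers is hypothetical, \l and \u are left out, \E is assumed to end every pending command at once (real Perl only ends the most recent one), and an escaped backslash in front of Q, U, L or E is not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: apply \Q, \U, \L and \E to a pattern source by hand.
sub apply_modifiers {
    my ($src) = @_;
    my ($quote, $case, $out) = (0, '', '');
    # The capturing group makes split return the commands as tokens too.
    for my $tok (split /(\\[QULE])/, $src) {
        next if $tok eq '';
        if    ($tok eq '\Q') { $quote = 1 }
        elsif ($tok eq '\U') { $case = 'U' }
        elsif ($tok eq '\L') { $case = 'L' }
        elsif ($tok eq '\E') { ($quote, $case) = (0, '') }
        else {
            # Plain text: change case first, then quote metacharacters.
            $tok = uc $tok if $case eq 'U';
            $tok = lc $tok if $case eq 'L';
            $tok = quotemeta $tok if $quote;
            $out .= $tok;
        }
    }
    return $out;
}

my $source = apply_modifiers('\QThis is quoted\Uthis is uppercase');
my $re     = qr/$source/;

print "RE source is $source\n";
```

For the two example parts from this post joined together, the sketch produces the same pattern body that the eval version shows as RE3.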

    Who would've thought that building a language was so complex :)

    MB
     
    Matthew Braid, May 14, 2004
    #4
  5. Ben Morrow wrote:

    Quoth Matthew Braid <>:
    > The 'language' allows specification of variables (scalars like $FOO, arrays like
    > @FOO and array elements like @FOO[INDEX]) that are stored in a hash-structure
    > inside a Storage package. Once again, the stumbling block is the special quote
    > characters \Q, \L, \U, \l, \u and \E - I'm having a hard time building a valid
    > RE out of, say:
    >
    > $part1 = '\QThis is quoted';
    > $part2 = '\Uthis is uppercase';
    >
    > because:
    >
    > $re = qr/$part1$part2/;
    > $re2 = qr/\QThis is quoted\Uthis is uppercase/;
    > $re3 = eval "qr/$part1$part2/";
    > print "RE1 IS $re\nRE2 IS $re2\nRE3 IS $re3\n";
    >
    > results in:
    >
    > RE1 IS (?-xism:\QThis is quoted\Uthis is uppercase)
    > RE2 IS (?-xism:This\ is\ quotedTHIS\ IS\ UPPERCASE)
    > RE3 IS (?-xism:This\ is\ quotedTHIS\ IS\ UPPERCASE)


    How about

    $part1 = '\QThis is quoted';
    $part2 = '\UThis is uppercase';

    for ($part1, $part2) {
        s!/!\\/!g;
        $_ = eval "qr/$_/";
    }

    $re = qr/$part1$part2/;
    print $re;

    (?-xism:(?-xism:This\ is\ quoted)(?-xism:THIS IS UPPERCASE))

    Ben

    --
    perl -e'print map {/.(.)/s} sort unpack "a2"x26, pack "N"x13,
    qw/1632265075 1651865445 1685354798 1696626283 1752131169 1769237618
    1801808488 1830841936 1886550130 1914728293 1936225377 1969451372
    2047502190/' #
     
    Ben Morrow, May 14, 2004
    #5
