Stopping code execution in RE's

Matthew Braid

Hi all,

OK, first off - the perldocs for RE's make my head hurt :)

I'm writing a formatter package that basically has a little language of its own
to allow joombie users to format their own stuff without bothering me....

One of the formatter 'tags' allows the use of a regular expression (in a
slightly stunted fashion), however I want to ensure no code execution is performed.

RE's are specified like:

find /RE/, REFLAGS

REFLAGS can be a string containing 'ismx' - anything else causes an error.
RE is a (duh) regular expression. This is split into RE strings (for instance
'^[a-z]\Q$FOO\E') and variables (for instance $FOO, @BAR etc).

The variable handling is done - that was pretty easy.

Technically the RE handling is pretty easy too - I could just rebuild it by
evaluating the variables and plonking it all back together, eg

store $BAR, 'BAR[]!'
find /^[a-z](\Q$FOO\E$BAR)\z/, 'is', FIELD

=> RE is split into '^[a-z](\Q$FOO\E', $BAR, ')\z';
=> $BAR evaluates to 'BAR[]!' -> quote it with '\QBAR[]!\E';
=> $re is built into '(?is:^[a-z](\Q$FOO\E\QBAR[]!\E)\z)';
$FIELD =~ $re;
etc etc... (obviously done so that it actually works....)

The problem here is if the user does something like:

find /(?{system('rm -rf /')})/, '', FIELD

This is obviously not something I want to run. Is there a simple thing I can do
to stop RE's from ever executing inline code, or do I have to add checks for
this to my RE parser?

MB
 
Matthew Braid

Matthew Braid wrote:

The problem here is if the user does something like:

find /(?{system('rm -rf /')})/, '', FIELD

This is obviously not something I want to run. Is there a simple thing I
can do to stop RE's from ever executing inline code, or do I have to add
checks for this to my RE parser?

I'm replying to myself, but I've found one slightly non-ugly solution....

If I insert an artificial variable into the re mix, execution is disallowed.

For example, if I have a variable called $null with a value of '', and I build
the RE like so:

eval "\$re = qr/\$null$re/$reflags";
$field =~ $re;

then if the user specified RE has (?{ print "ARGH!" }) in it it fails with the
error:

Eval-group not allowed at runtime, use re 'eval' in regex m/....

but the re is OK if no eval groups are included.

Obviously I'm never gonna use re 'eval'.

On a side note, I'm not happy using eval "..." but I have to do that for the \Q,
\L, \U, \E tags.... Maybe I'll just parse for them earlier and convert things
myself....
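
A minimal sketch of the same protection without a string eval, for illustration only. It relies on the behaviour described above: a pattern that is interpolated from a variable at run time may not contain eval groups unless `use re 'eval'` is in scope. The helper name `compile_user_re` and its return convention are assumptions, not part of the original package. Two caveats on the string-eval form: a `/` in the user's text ends the pattern and the rest is run as Perl code, and, as far as I understand the perl 5.18 changes, a code block that appears literally in eval'd source is compiled with that source, so the `$null` trick may not protect you on newer perls. It is worth testing on the version you deploy.

```perl
use strict;
use warnings;

# Hypothetical helper: compile an untrusted pattern string.
# Returns a compiled regex, or undef with the reason left in $@.
sub compile_user_re {
    my ($pattern, $flags) = @_;
    $flags = '' unless defined $flags;

    # Only the flags the formatter language allows.
    die "bad flags '$flags'\n" unless $flags =~ /\A[ismx]*\z/;

    # The pattern text arrives through a variable, never as source code,
    # and there is no "use re 'eval'" here, so (?{...}) is refused.
    my $src = "(?${flags}:$pattern)";
    my $re  = eval { qr/$src/ };    # block eval only traps the error
    return $re;
}

for my $pat ('^[a-z]+\z', '(?{ print "ARGH!\n" })') {
    my $re = compile_user_re($pat, 'i');
    print defined $re ? "ok: $pat\n" : "rejected: $@";
}
```

The block eval does not compile any user text as Perl; it only catches the exception that `qr//` throws for an invalid or forbidden pattern.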
 
Anno Siegel

Matthew Braid said:

I'm replying to myself, but I've found one slightly non-ugly solution....

If I insert an artificial variable into the re mix, execution is disallowed.

For example, if I have a variable called $null with a value of '', and I build
the RE like so:

eval "\$re = qr/\$null$re/$reflags";

No big deal, but you don't have to do the assignment inside eval.

$re = eval "qr/\$null$re/$reflags";
$field =~ $re;

then if the user specified RE has (?{ print "ARGH!" }) in it it fails with the
error:

Eval-group not allowed at runtime, use re 'eval' in regex m/....

but the re is OK if no eval groups are included.

Obviously I'm never gonna use re 'eval'.

On a side note, I'm not happy using eval "..." but I have to do that for the \Q,
\L, \U, \E tags.... Maybe I'll just parse for them earlier and convert things
myself....

In full generality, "eval" will be hard to avoid, but you *can* reduce
its use to the extraction of variable values. Something like this

for my $varname ( $regex =~ /(\$[A-Za-z_]\w*)/g ) {
    my $val = eval "qq($varname)";
    $varname = '\\' . $varname;
    $regex =~ s/$varname/$val/g;
}
my $re = qr/$regex/;

This expands the variable values right inside the regex string. It may
need quite a bit of refinement, I'm being sketchy here.

I'm wondering how the user knows what variables they can use. If they
are user-defined variables, the user can do their own interpolation,
so I assume the variables are defined in your program. But then
the user needs some kind of catalog of what variables are available.

In that case you could build a hash of all possible variable names
and their (string) values and substitute according to that. No
eval is needed.

Anno
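
The hash-based substitution suggested above could look roughly like this. It is a sketch under assumptions: the catalog `%vars`, its contents, and the helper name `expand_vars` are made up for illustration, and values are passed through `quotemeta` so they match literally (the same effect as wrapping them in \Q...\E).

```perl
use strict;
use warnings;

# Hypothetical catalog of the variables a user may reference.
my %vars = ( FOO => 'a.b', BAR => 'BAR[]!' );

# Replace every $NAME in the pattern text with its quoted value.
# No eval is involved; an unknown name is a fatal error.
sub expand_vars {
    my ($regex, $vars) = @_;
    $regex =~ s{\$([A-Za-z_]\w*)}{
        exists $vars->{$1}
            ? quotemeta($vars->{$1})
            : die "unknown variable \$$1\n"
    }ge;
    return $regex;
}

my $src = expand_vars('^[a-z]($FOO$BAR)\z', \%vars);
my $re  = qr/$src/i;
print "matches\n" if 'xa.bBAR[]!' =~ $re;
```

Because each name is looked up as a whole token, `$FOO` inside `$FOOBAR` is not expanded by accident, which the loop with `s/$varname/$val/g` would have to guard against separately.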
 
Matthew Braid

Anno said:
In full generality, "eval" will be hard to avoid, but you *can* reduce
its use to the extraction of variable values. Something like this

for my $varname ( $regex =~ /(\$[A-Za-z_]\w*)/g ) {
    my $val = eval "qq($varname)";
    $varname = '\\' . $varname;
    $regex =~ s/$varname/$val/g;
}
my $re = qr/$regex/;

On parsing the file my RE object consists of a list of strings and variable
names. The variable names are resolved before the perl-RE is built from the RE
object and wrapped in \Q/\E so I don't have to worry about weird values....

I'm wondering how the user knows what variables they can use. If they
are user-defined variables, the user can do their own interpolation,
so I assume the variables are defined in your program. But then
the user needs some kind of catalog of what variables are available.

In that case you could build a hash of all possible variable names
and their (string) values and substitute according to that. No
eval is needed.

The 'language' allows specification of variables (scalars like $FOO, arrays like
@FOO and array elements like @FOO[INDEX]) that are stored in a hash-structure
inside a Storage package. Once again, the stumbling block is the special quote
characters \Q, \L, \U, \l, \u and \E - I'm having a hard time building a valid
RE out of, say:

$part1 = '\QThis is quoted';
$part2 = '\Uthis is uppercase';

because:

$re = qr/$part1$part2/;
$re2 = qr/\QThis is quoted\Uthis is uppercase/;
$re3 = eval "qr/$part1$part2/";
print "RE1 IS $re\nRE2 IS $re2\nRE3 IS $re3\n";

results in:

RE1 IS (?-xism:\QThis is quoted\Uthis is uppercase)
RE2 IS (?-xism:This\ is\ quotedTHIS\ IS\ UPPERCASE)
RE3 IS (?-xism:This\ is\ quotedTHIS\ IS\ UPPERCASE)

I'm currently reworking it so that when the parser finds an RE (or an
interpolated string), instead of building a list of strings and variable names,
it builds a list of strings, variable names and inline commands so that by the
time the real RE is built the strings/variable values have been
quoted/uppercased/lowercased etc so I don't need the eval.

Who would've thought that building a language was so complex :)

MB
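
The reworked approach described above (strings, variable values and inline commands applied before the real RE is built) might be sketched as follows. The token format and the name `build_pattern` are assumptions for illustration, and the semantics are deliberately simplified: \E here clears every active mode, which is not exactly what Perl's own stacking rules do.

```perl
use strict;
use warnings;

# Assumed token format (hypothetical): each element is either
#   [ text => $string ]   literal text or an already-resolved variable value
#   [ cmd  => $letter ]   one of Q, U, L, u, l, E
sub build_pattern {
    my @tokens = @_;
    my ($quote, $case, $next) = (0, '', '');
    my $out = '';

    for my $tok (@tokens) {
        my ($type, $val) = @$tok;
        if ($type eq 'cmd') {
            if    ($val eq 'Q')        { $quote = 1 }
            elsif ($val =~ /\A[UL]\z/) { $case  = $val }
            elsif ($val =~ /\A[ul]\z/) { $next  = $val }
            elsif ($val eq 'E')        { ($quote, $case) = (0, '') }
            else                       { die "unknown command \\$val\n" }
            next;
        }

        my $text = $val;
        $text = uc($text) if $case eq 'U';
        $text = lc($text) if $case eq 'L';
        if ($next ne '' && length $text) {
            # \u and \l affect only the first character of the next text.
            $text = $next eq 'u' ? ucfirst($text) : lcfirst($text);
            $next = '';
        }
        $text = quotemeta($text) if $quote;
        $out .= $text;
    }
    return $out;
}

my $src = build_pattern(
    [ cmd => 'Q' ], [ text => 'This is quoted' ],
    [ cmd => 'U' ], [ text => 'this is uppercase' ],
);
my $re = qr/$src/;
print "RE IS $re\n";
```

For the two parts from the example, `$src` comes out as `This\ is\ quotedTHIS\ IS\ UPPERCASE`, the same pattern text that the string-eval version produced. One thing to watch: case changes are applied to unquoted text exactly as written, so an escape such as \d inside a \U region would be uppercased as well.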
 
Ben Morrow

Quoth Matthew Braid said:
The 'language' allows specification of variables (scalars like $FOO, arrays like
@FOO and array elements like @FOO[INDEX]) that are stored in a hash-structure
inside a Storage package. Once again, the stumbling block is the special quote
characters \Q, \L, \U, \l, \u and \E - I'm having a hard time building a valid
RE out of, say:

$part1 = '\QThis is quoted';
$part2 = '\Uthis is uppercase';

because:

$re = qr/$part1$part2/;
$re2 = qr/\QThis is quoted\Uthis is uppercase/;
$re3 = eval "qr/$part1$part2/";
print "RE1 IS $re\nRE2 IS $re2\nRE3 IS $re3\n";

results in:

RE1 IS (?-xism:\QThis is quoted\Uthis is uppercase)
RE2 IS (?-xism:This\ is\ quotedTHIS\ IS\ UPPERCASE)
RE3 IS (?-xism:This\ is\ quotedTHIS\ IS\ UPPERCASE)

How about

$part1 = '\QThis is quoted';
$part2 = '\UThis is uppercase';

for ($part1, $part2) {
    s!/!\\/!g;
    $_ = eval "qr/$_/";
}

$re = qr/$part1$part2/;
print $re;

(?-xism:(?-xism:This\ is\ quoted)(?-xism:THIS IS UPPERCASE))

Ben
 
