Sanitizing untrusted code for eval()

Jim Washington

I'm still working on yet another parser for JSON (http://json.org). It's
called minjson, and it's tolerant on input, strict on output, and pretty
fast. The only problem is, it uses eval(). It's important to sanitize the
incoming untrusted code before sending it to eval(), because eval() is
apparently evil in every language
(http://blogs.msdn.com/ericlippert/archive/2003/11/01/53329.aspx).

A search for potential trouble with eval() in python turned up the
following.

1. Multiplication and exponentiation, particularly in concert with
strings, can do a DoS on the server, e.g., 'x'*9**99**999**9999

2. lambda can cause mischief, and therefore is right out.

3. Introspection can expose other methods to the untrusted code. e.g.,
{}.__class__.__bases__[0].__subclasses__... can climb around in the
object hierarchy and execute arbitrary methods.

4. List comprehensions might be troublesome, though it's not clear to me
how a DoS or exploit is possible with these. But presuming potential
trouble, 'for' is also right out. It's not in the JSON spec anyway.

So, the above seems to indicate disallowing "*", "__", "lambda", and "for"
anywhere outside a string in the untrusted code. Raise an error before
sending to eval().
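
A naive pre-check along those lines might look like the sketch below (the
function name and the fragment list are just illustrative; as the rest of
this thread argues, a blacklist like this is probably not sufficient, and it
wrongly rejects legal JSON strings that happen to contain "__" or "for"):

```python
def looks_unsafe(source):
    # Reject input containing any blacklisted fragment.  Deliberately
    # naive: it does not skip over string literals, so legitimate JSON
    # strings containing e.g. "__" are rejected as well.
    for bad in ('*', '__', 'lambda', 'for'):
        if bad in source:
            return True
    return False

# The DoS example from point 1 is caught...
assert looks_unsafe("'x'*9**99**999**9999")
# ...and plain JSON-ish input passes.
assert not looks_unsafe('{"null": null, "n": 1.5}')
```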

I'm using eval() with proper __builtins__ and locals, e.g.,

result = eval(aString,
    {"__builtins__": {'True': True, 'False': False, 'None': None}},
    {'null': None, 'true': True, 'false': False})
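
For what it's worth, emptying out __builtins__ this way does block direct
calls to built-in functions, but (per point 3 above) it does nothing to stop
attribute introspection. A quick check, written here in modern-Python terms:

```python
env = {"__builtins__": {"True": True, "False": False, "None": None}}
locs = {"null": None, "true": True, "false": False}

# JSON-style input evaluates as intended...
assert eval("{'a': [true, false, null]}", env, locs) == \
    {"a": [True, False, None]}

# ...but the object hierarchy is still fully reachable without any
# builtins at all: this climbs dict -> object -> all its subclasses.
klasses = eval("{}.__class__.__bases__[0].__subclasses__()", env, locs)
assert isinstance(klasses, list) and len(klasses) > 0
```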

I am familiar with this thread:
http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/cbcc21b95af0d9cc

Does anyone know of any other "gotchas" with eval() I have not found? Or
is eval() simply too evil?

-Jim Washington
 
Benji York

Jim said:
I'm still working on yet another parser for JSON (http://json.org).

Hi, Jim.
The only problem is, it uses eval(). It's important to sanitize the
incoming untrusted code before sending it to eval().
Does anyone know of any other "gotchas" with eval() I have not found? Or
is eval() simply too evil?

I'd say that eval is just too evil.

I do wonder if it would be possible to use eval by working from the
other direction. Instead of trying to filter out dangerous things, only
allow a *very* strict set of things in.

For example, since you're doing JSON, you don't even need to allow
multiplication. If you only allowed dictionaries with string keys and a
restricted set of types as values, you'd be pretty close. But once
you're at that point you might as well use your own parser and not use
eval at all. <shrug>
 
Diez B. Roggisch

Does anyone know of any other "gotchas" with eval() I have not found? Or
is eval() simply too evil?

Yes - and from what I can see on the JSON page, it should be _way_
easier to simply write a parser of your own - that ensures that only you
decide what Python code gets called.

Diez
 
Scott David Daniels

Diez said:
Yes - and from what I can see on the JSON page, it should be _way_
easier to simply write a parser of your own - that ensures that only you
decide what Python code gets called.

Diez
Another thing you can do is use the compile function and then only allow
certain bytecodes. Of course this approach means you need to implement
this in a major version-dependent fashion, but it saves you the work of
mapping source code to Python. Eventually there will be another form
available (the AST form), but that will show up no earlier than 2.5.
As a matter of pure practicality, it turns out you can probably use
almost the same code to look at 2.3 and 2.4 byte codes.
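
A sketch of that idea in today's terms (the opcode whitelist below is my own
guess, assembled against a current CPython 3.x; the exact names differ
between interpreter versions, which is exactly the version-dependence noted
above):

```python
import dis

# Opcode names that plausibly occur when evaluating a JSON-style
# literal.  Version-dependent by nature; this set is illustrative.
ALLOWED = {
    "RESUME", "RETURN_VALUE", "RETURN_CONST",
    "LOAD_CONST", "LOAD_NAME",
    "BUILD_LIST", "BUILD_MAP", "BUILD_CONST_KEY_MAP",
    "LIST_EXTEND", "UNARY_NEGATIVE",
}

def safe_bytecode(source):
    # Compile in 'eval' mode, then refuse to run anything whose
    # bytecode strays outside the whitelist.
    code = compile(source, "<json>", "eval")
    return all(ins.opname in ALLOWED
               for ins in dis.get_instructions(code))

assert safe_bytecode("{'a': [true, null]}")   # literal building only
assert not safe_bytecode("{}.__class__")      # LOAD_ATTR is not allowed
```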


--Scott David Daniels
 
Fredrik Lundh

Jim said:
4. List comprehensions might be troublesome, though it's not clear to me
how a DoS or exploit is possible with these.

see item 1.
Or is eval() simply too evil?

yes.

however, running a tokenizer over the source string and rejecting any string
that contains unknown tokens (i.e. anything that's not a literal, comma, colon,
or square or curly bracket) before evaluation might be good enough.

(you can use Python's standard tokenizer module, or rip out the relevant parts
from it and use the RE engine directly)

</F>
 
Diez B. Roggisch

Another thing you can do is use the compile message and then only allow
certain bytecodes. Of course this approach means you need to implement
this in a major version-dependent fashion, but it saves you the work of
mapping source code to python. Eventually there will be another form
available (the AST form), but that will show up no earlier than 2.5.
As a matter of pure practicality, it turns out you can probably use
almost the same code to look at 2.3 and 2.4 byte codes.

I don't know much about python byte code, but from the JSON homepage - which
features the grammar for JSON on the first page - I'm under the strong
impression that abusing the python parser by whatever means, including
the byte-code hack you propose, is way more complicated than writing a
small parser - I don't know pyparsing, but I know spark, and it would be
a matter of 30 lines of code. And 100% no loopholes...

Additionally, having a parser allows you to spit out meaningful errors -
whilst mapping byte code back to input lines is certainly not easy, if
feasible at all.

Regards,

Diez
 
Jim Washington

however, running a tokenizer over the source string and rejecting any string
that contains unknown tokens (i.e. anything that's not a literal, comma, colon,
or square or curly bracket) before evaluation might be good enough.

(you can use Python's standard tokenizer module, or rip out the relevant parts
from it and use the RE engine directly)

This seems like the right compromise, and not too difficult.
OOTB, tokenize burns a couple of additional milliseconds per read,
but maybe I can start there and optimize, as you say, and be a bit more
sure that python's parser is not abused into submission.
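
A first cut at that token filter might look like this sketch (the token and
name whitelists here are a guess at what JSON actually needs, written against
the current tokenize module API, not anything from the thread):

```python
import io
import token
import tokenize

def json_tokens_only(source):
    # Accept only tokens a JSON document could produce: string and
    # number literals, the names true/false/null, and punctuation.
    allowed_punct = {",", ":", "[", "]", "{", "}", "-"}
    allowed_names = {"true", "false", "null"}
    try:
        for tok in tokenize.generate_tokens(io.StringIO(source).readline):
            if tok.type in (token.STRING, token.NUMBER, token.NEWLINE,
                            token.INDENT, token.DEDENT, token.ENDMARKER,
                            tokenize.NL, tokenize.COMMENT):
                continue
            if tok.type == token.NAME and tok.string in allowed_names:
                continue
            if tok.type == token.OP and tok.string in allowed_punct:
                continue
            return False          # anything else is suspect
    except tokenize.TokenError:
        return False              # malformed input never reaches eval
    return True

assert json_tokens_only('{"a": [1, -2.5, true, null]}')
assert not json_tokens_only("'x'*9**99**999**9999")   # '*' rejected
assert not json_tokens_only("{}.__class__")           # '.' rejected
```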

BTW, this afternoon I spent a couple of hours sending random junk to eval()
just to see what would be accepted.

I did not know before that

5|3 = 7
6^3 = 5
~6 = -7
() and aslfsdf = ()

Amusing stuff.

Thanks!

-Jim Washington
 
Paul McGuire

Here's the pyparsing rendition - about 24 lines of code, and another 30
for testing.
For reference, here's the JSON "bnf":

object
    { members }
    {}
members
    string : value
    members , string : value
array
    [ elements ]
    []
elements
    value
    elements , value
value
    string
    number
    object
    array
    true
    false
    null

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul

from pyparsing import *

TRUE = Keyword("true")
FALSE = Keyword("false")
NULL = Keyword("null")

jsonString = dblQuotedString.setParseAction( removeQuotes )
jsonNumber = Combine( Optional('-') + ( '0' | Word('123456789',nums) ) +
                      Optional( '.' + Word(nums) ) +
                      Optional( Word('eE',exact=1) + Word(nums+'+-',nums) ) )

jsonObject = Forward()
jsonValue = Forward()
jsonElements = delimitedList( jsonValue )
jsonArray = Group( Suppress('[') + jsonElements + Suppress(']') )
jsonValue << ( jsonString | jsonNumber | jsonObject | jsonArray | TRUE
| FALSE | NULL )
memberDef = Group( jsonString + Suppress(':') + jsonValue )
jsonMembers = delimitedList( memberDef )
jsonObject << Dict( Suppress('{') + jsonMembers + Suppress('}') )

lineComment = '//' + restOfLine
jsonComment = FollowedBy('/') + ( cStyleComment | lineComment )
jsonObject.ignore( jsonComment )

testdata = """
{
  "glossary": {
    "title": "example glossary",
    "GlossDiv": {
      "title": "S",
      "GlossList": [{
        "ID": "SGML",
        "SortAs": "SGML",
        "GlossTerm": "Standard Generalized Markup Language",
        "Acronym": "SGML",
        "Abbrev": "ISO 8879:1986",
        "GlossDef": "A meta-markup language, used to create markup languages such as DocBook.",
        "GlossSeeAlso": ["GML", "XML", "markup"]
      }]
    }
  }
}
"""

results = jsonObject.parseString(testdata)

import pprint
pprint.pprint( results.asList() )
print results.glossary.title
print results.glossary.GlossDiv
print results.glossary.GlossDiv.GlossList.keys()

Prints out (I've inserted blank lines to separate the output from the
different print statements):
[['glossary',
  ['title', 'example glossary'],
  ['GlossDiv',
   ['title', 'S'],
   ['GlossList',
    [['ID', 'SGML'],
     ['SortAs', 'SGML'],
     ['GlossTerm', 'Standard Generalized Markup Language'],
     ['Acronym', 'SGML'],
     ['Abbrev', 'ISO 8879:1986'],
     ['GlossDef',
      'A meta-markup language, used to create markup languages such as DocBook.'],
     ['GlossSeeAlso', ['GML', 'XML', 'markup']]]]]]]

example glossary

[['title', 'S'], ['GlossList', [['ID', 'SGML'], ['SortAs', 'SGML'],
['GlossTerm', 'Standard Generalized Markup Language'], ['Acronym', 'SGML'],
['Abbrev', 'ISO 8879:1986'],
['GlossDef', 'A meta-markup language, used to create markup languages such as DocBook.'],
['GlossSeeAlso', ['GML', 'XML', 'markup']]]]]

['GlossSeeAlso', 'GlossDef', 'Acronym', 'GlossTerm', 'SortAs',
'Abbrev', 'ID']
 
Alan Kennedy

[Jim Washington]
I'm still working on yet another parser for JSON (http://json.org). It's
called minjson, and it's tolerant on input, strict on output, and pretty
fast. The only problem is, it uses eval(). It's important to sanitize the
incoming untrusted code before sending it to eval().

I think that you shouldn't need eval to parse JSON.

For a discussion of the use of eval in pyjsonrpc, between me and the
author, Jan-Klaas Kollhof, see the content of the following links. A
discussion of the relative time *in*efficiency of eval is also included:
it is much faster to use built-in functions such as str and float to
convert from JSON text/tokens to strings and numbers.

http://mail.python.org/pipermail/python-list/2005-February/265805.html
http://groups.yahoo.com/group/json-rpc/message/55

Pyjsonrpc uses the python tokeniser to split up JSON strings, which
means that you cannot be strict about things like double (") vs. single
(') quotes, etc.

JSON is so simple, I think it best to write a tokeniser and parser for
it, either using a parsing library, or just coding your own.
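
To make that concrete, here is one possible shape for the hand-coded route: a
small regex tokeniser plus a recursive-descent parser. Strictly a sketch (no
\u or other escape handling in strings, minimal error reporting, and all the
names and the regex below are mine, not from any existing library):

```python
import re

# One token per alternative; whitespace is skipped.
TOKEN_RE = re.compile(r'''
    (?P<string>  "(?:[^"\\]|\\.)*" )
  | (?P<number>  -?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)? )
  | (?P<word>    true|false|null )
  | (?P<punct>   [{}\[\],:] )
  | (?P<ws>      \s+ )
''', re.VERBOSE)

def _tokenize(text):
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise ValueError("bad character at position %d" % pos)
        pos = m.end()
        if m.lastgroup != "ws":
            yield m.lastgroup, m.group()
    yield "end", ""

def parse(text):
    toks = list(_tokenize(text))
    pos = 0

    def peek():
        return toks[pos]

    def take(expected=None):
        nonlocal pos
        kind, value = toks[pos]
        if expected is not None and value != expected:
            raise ValueError("expected %r, got %r" % (expected, value))
        pos += 1
        return kind, value

    def value_():
        kind, val = peek()
        if val == "{":                          # object
            take("{")
            obj = {}
            if peek()[1] != "}":
                while True:
                    kkind, key = take()
                    if kkind != "string":
                        raise ValueError("object keys must be strings")
                    take(":")
                    obj[key[1:-1]] = value_()   # strip quotes (no escapes)
                    if peek()[1] == ",":
                        take(",")
                    else:
                        break
            take("}")
            return obj
        if val == "[":                          # array
            take("[")
            arr = []
            if peek()[1] != "]":
                while True:
                    arr.append(value_())
                    if peek()[1] == ",":
                        take(",")
                    else:
                        break
            take("]")
            return arr
        kind, val = take()
        if kind == "string":
            return val[1:-1]
        if kind == "number":
            return float(val) if any(c in val for c in ".eE") else int(val)
        if kind == "word":
            return {"true": True, "false": False, "null": None}[val]
        raise ValueError("unexpected token %r" % val)

    result = value_()
    if peek()[0] != "end":
        raise ValueError("trailing data")
    return result
```

Nothing here ever reaches eval, so the multiplication DoS, lambda, and the
__class__ tricks are simply syntax errors as far as this parser is concerned.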
 
