Ideas to optimize this getitem/eval call?

Discussion in 'Python' started by mario, Jan 3, 2009.

  1. mario

    mario Guest

    Hi,

    below is the essence of a an expression evaluator, by means of a
    getitem lookup. The expression codes are compiled and cached -- the
    lookup is actually recursive, and the first time around it will always
    fail.

    import sys
    class GetItemEval(object):

    def __init__(self):
    self.globals = globals() # some dict (always the same)
    self.locals = {} # some other dict (may be different between
    evaluations)
    self.codes = {} # compiled code expressions cache

    def __getitem__(self, expr):
    try:
    return eval(self.codes[expr], self.globals, self.locals)
    except:
    # KeyError, NameError, AttributeError, SyntaxError,
    ValueError,
    # TypeError, IOError
    #
    # Special case if a KeyError is coming from the self.codes
    [name]
    # lookup (traceback should consist of a single frame
    only):
    if sys.exc_info()[2].tb_next is None:
    if sys.exc_info()[0] is KeyError:
    self.codes[expr] = compile(expr, '<string>',
    'eval')
    return self[expr]
    # otherwise handle eval error in some way...

    This class could be used in a way as follows:

    # define some expressions
    def f(s): return "["+s+"]"
    exprs = ["1+2+3", "2*3*5", "f(__name__)"]
    # instantiate one
    gie = GetItemEval()
    # use it to lookup/eval each expression
    for x in exprs:
    print x, "=", gie[x]

    And, fwiw, some sample timeit code:

    import timeit
    print timeit.Timer("for x in exprs: gie[x]",
    "from __main__ import gie, exprs").timeit(500000)

    I had done various poking to discover if it could be made to go
    faster, and in the end I settled on the version above.

    mario

    Incidentally this constitutes the lion's share of the evaluation
    runtime of evoque templating... http://evoque.gizmojo.org/
     
    mario, Jan 3, 2009
    #1
    1. Advertising

  2. mario

    vk Guest

    What do you mean by 'fail'?
    you have;

    :: self.codes = {}

    so

    :: try:
    :: return eval(self.codes[expr], self.globals, self.locals)

    will always return an exception the first time (if this is what you're
    referring to).
     
    vk, Jan 3, 2009
    #2
    1. Advertising

  3. On Fri, 02 Jan 2009 17:29:29 -0800, mario wrote:

    > Hi,
    >
    > below is the essence of a an expression evaluator, by means of a getitem
    > lookup. The expression codes are compiled and cached -- the lookup is
    > actually recursive, and the first time around it will always fail.
    >
    > import sys
    > class GetItemEval(object):
    >
    > def __init__(self):
    > self.globals = globals() # some dict (always the same)
    > self.locals = {} # some other dict (may be different between
    > evaluations)
    > self.codes = {} # compiled code expressions cache
    >
    > def __getitem__(self, expr):
    > try:
    > return eval(self.codes[expr], self.globals, self.locals)


    I was about to make a comment about this being a security hole, but I see
    from here

    http://evoque.gizmojo.org/usage/restricted/

    that you are aware of at least some of the issues.

    I must say though, your choice of builtins to prohibit seems rather
    arbitrary. What is dangerous about (e.g.) id() and isinstance()?



    > except:
    > # KeyError, NameError, AttributeError, SyntaxError,
    > # ValueError, TypeError, IOError


    If you want to capture seven exceptions, then capture seven exceptions,
    not any exception.

    You should write:

    except (KeyError, NameError, ..., IOError):

    instead of a bare except clause. That will capture exceptions that
    represent bugs in your code as well as exceptions that should propbably
    be allowed to propagate, such as KeyboardInterupt and SystemExit.


    > # Special case if a KeyError is coming from the self.codes[name]
    > # lookup (traceback should consist of a single frame only):
    > if sys.exc_info()[2].tb_next is None:
    > if sys.exc_info()[0] is KeyError:
    > self.codes[expr] = compile(expr, '<string>', 'eval')
    > return self[expr]
    > # otherwise handle eval error in some way...


    That seems awfully complicated for no good reason. If you want to handle
    KeyError differently, then capture KeyError:

    try:
    ...
    except KeyError:
    handle_keyerror()
    except: (NameError, ..., IOError):
    handle_everythingelse()



    It also seems significant slower:

    >>> import sys
    >>> def catch1():

    .... D = {}
    .... try:
    .... D['x']
    .... except KeyError:
    .... pass
    ....
    >>> def catch2():

    .... D = {}
    .... try:
    .... D['x']
    .... except:
    .... if sys.exc_info()[2].tb_next is None:
    .... if sys.exc_info()[0] is KeyError:
    .... pass
    ....
    >>> from timeit import Timer
    >>> t1 = Timer('catch1()', 'from __main__ import catch1')
    >>> t2 = Timer('catch2()', 'from __main__ import catch2, sys')
    >>> min(t1.repeat())

    4.8421750068664551
    >>> min(t2.repeat())

    6.2694659233093262



    --
    Steven
     
    Steven D'Aprano, Jan 3, 2009
    #3
  4. mario

    mario Guest

    On Jan 3, 7:16 am, Steven D'Aprano <st...@REMOVE-THIS-
    cybersource.com.au> wrote:

    > I was about to make a comment about this being a security hole,


    Strange that you say this, as you are also implying that *all* the
    widely-used templating systems for python are security holes... Well,
    you would be right to say that of course ;-) Infact, evoque is really
    one of the few (or even the only one?) that was conceived from the
    start to support restricted evaluation.

    > but I see from here
    >
    > http://evoque.gizmojo.org/usage/restricted/
    >
    > that you are aware of at least some of the issues.
    >
    > I must say though, your choice of builtins to prohibit seems rather
    > arbitrary. What is dangerous about (e.g.) id() and isinstance()?


    Preventive, probably. I also feel that temlates should have any
    business with info such as the memory addressed returnred by id(). For
    isinstance, becuase it is somewhat related to __subclasses__ that is
    known to be insecure.

    Incidentally, I updated the page you point to clarify what is going
    on. I also added a link to Brett Cannon's inspiration paper on the
    topic of securing the python interpreter...

    > >     except:
    > >         # KeyError, NameError, AttributeError, SyntaxError,
    > >         # ValueError, TypeError, IOError

    >
    > If you want to capture seven exceptions, then capture seven exceptions,
    > not any exception.


    Absolutely not. I want to catch ALL evaluation exceptions... it would
    actually *be* a secuity hole to allow any exception to bubble. hey
    will however be handled appropriately as per the application policy/
    config/deployment.

    > You should write:
    >
    >     except (KeyError, NameError, ..., IOError):
    >
    > instead of a bare except clause. That will capture exceptions that
    > represent bugs in your code as well as exceptions that should propbably
    > be allowed to propagate, such as KeyboardInterupt and SystemExit.


    Again, no. Template presentational logic has no business issuing
    SystemExits or so. And, of course, there are no bugs in my code ;-)

    > >         # Special case if a KeyError is coming from the self.codes[name]
    > >         # lookup (traceback should consist of a single frame only):
    > >         if sys.exc_info()[2].tb_next is None:
    > >             if sys.exc_info()[0] is KeyError:
    > >                 self.codes[expr] = compile(expr, '<string>', 'eval')
    > >                     return self[expr]


    > That seems awfully complicated for no good reason.


    Yes, you are probably right. I wanted to be precise in that if the
    KeyError originates strictly from the codes lookup and not from the
    actual eval of the expr itself -- in which case the expr should be
    compiled and added to codes (yes, this is the "first-time failure" I
    referred to in the first message).

    I tested the performance of your 2 variations in context, and there
    seems to be no noticeable performance gain (something like less than
    1% gain). But note the two variations as you code them do not quite do
    exactly the same test.

    I have adjusted to use this code now:

    def __getitem__(self, expr):
    try:
    return eval(self.codes[expr], self.globals, self.locals)
    except:
    # We want to catch **all** evaluation errors!
    # KeyError, NameError, AttributeError, SyntaxError, V
    # alueError, TypeError, IOError, ...
    #
    # Special case:
    # if KeyError is coming from self.codes[expr] lookup,
    # then we add the compiledentry and try again:
    if not name in self.codes:
    self.codes[expr] = compile(name, '<string>', 'eval')
    return self[expr]
    # handle any other error...

    This retains the correctness of the check, and has the same marginal
    perf improvement (that is basically negligible, but at least it is not
    slower) and has teh advantage that the code is clearer.

    Thanks for the remarks!

    mario

    > --
    > Steven
     
    mario, Jan 3, 2009
    #4
  5. mario

    mario Guest

    correction: the code posted in previous message should have been:


    def __getitem__(self, expr):
    try:
    return eval(self.codes[expr], self.globals, self.locals)
    except:
    # We want to catch **all** evaluation errors!
    # KeyError, NameError, AttributeError, SyntaxError, V
    # alueError, TypeError, IOError, ...
    #
    # Special case:
    # if KeyError is coming from self.codes[expr] lookup,
    # then we add the compiledentry and try again:
    if not expr in self.codes:
    self.codes[expr] = compile(expr, '<string>', 'eval')
    return self[expr]
    # handle any other error...
     
    mario, Jan 3, 2009
    #5
  6. On Sat, 03 Jan 2009 04:14:14 -0800, mario wrote:

    > On Jan 3, 7:16 am, Steven D'Aprano <st...@REMOVE-THIS-
    > cybersource.com.au> wrote:

    [...]
    >> I must say though, your choice of builtins to prohibit seems rather
    >> arbitrary. What is dangerous about (e.g.) id() and isinstance()?

    >
    > Preventive, probably. I also feel that temlates should have any business
    > with info such as the memory addressed returnred by id(). For
    > isinstance, becuase it is somewhat related to __subclasses__ that is
    > known to be insecure.


    It's that "probably" that worries me.

    In my earlier post, I was going to say "your choice of builtins to
    prohibit seems rather Cargo Cult-ish", but I decided that might be a bit
    rude, and I decided to give you the benefit of the doubt. But your answer
    here doesn't really give me any more confidence.

    What security issues do you think you are preventing by prohibiting id()?
    It is true that the id returned is a memory address in CPython, but
    that's a mere implementation detail, and may not be true in other
    implementations. Even if it is a memory address, so what? You can't do
    anything with it except treat it as an integer. I don't see how id()
    decreases security one iota.

    In the absence of even a theoretical threat, banning a function that
    returns an int because the int is derived from a memory address (but
    can't be used as one!) makes you appear to be engaging in Cargo Cult
    programming: following the ritual, without any understanding of what you
    are doing.

    http://catb.org/~esr/jargon/html/C/cargo-cult-programming.html

    Sorry to be so harsh, especially over such a tiny little thing as the
    id() function, but that's the impression it gives me, and I'm probably
    not the only one. It is true that a false positive (needlessly banning a
    harmless function) has less serious consequences than a false negative
    (failing to ban a harmful function), but it does reduce confidence that
    you know what you're doing.

    Likewise for isinstance() and issubclass(). They may be related to
    __subclasses__, but they don't give the caller access to __subclassess__.
    So what is the actual threat you are defending against?

    I'm not saying that there must be an actual in-the-wild demonstrated
    vulnerability before you prohibit a built-in, but there should be at
    least a plausible attack vector.



    > Incidentally, I updated the page you point to clarify what is going on.


    Speaking of that page, you have an obvious typo ("teh" instead of "the")
    that should be fixed:

    "In addition to teh above, all subclasses of BaseException..."

    http://evoque.gizmojo.org/usage/restricted/




    > I also added a link to Brett Cannon's inspiration paper on the topic of
    > securing the python interpreter...
    >
    >> >     except:
    >> >         # KeyError, NameError, AttributeError, SyntaxError, #
    >> >         ValueError, TypeError, IOError

    >>
    >> If you want to capture seven exceptions, then capture seven exceptions,
    >> not any exception.

    >
    > Absolutely not. I want to catch ALL evaluation exceptions...


    I see. It wasn't clear that this was deliberate, I read it as if there
    was a fixed, finite list you wanted to catch, and nothing else. You
    should make sure this is clearly documented in your code.



    --
    Steven.
     
    Steven D'Aprano, Jan 4, 2009
    #6
  7. mario

    mario g Guest

    On Sun, Jan 4, 2009 at 6:46 PM, Tino Wildenhain <> wrote:
    > mario wrote:
    >>
    >> On Jan 3, 7:16 am, Steven D'Aprano <st...@REMOVE-THIS-
    >> cybersource.com.au> wrote:
    >>
    >>> I was about to make a comment about this being a security hole,

    >>
    >> Strange that you say this, as you are also implying that *all* the
    >> widely-used templating systems for python are security holes... Well,
    >> you would be right to say that of course ;-) Infact, evoque is really
    >> one of the few (or even the only one?) that was conceived from the
    >> start to support restricted evaluation.

    >
    > Thats is definitively not the case. There are at least 2 quite old
    > template systems on top of a quite good restricted environment.


    Interesting, which ones?

    I should have probably emphasized *conceived* above... that feature
    was there in the initial design and not added afterwards.

    > Cheers
    > Tino
     
    mario g, Jan 22, 2009
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Eric Newton
    Replies:
    3
    Views:
    9,422
    Brock Allen
    Apr 4, 2005
  2. DataBinder.Eval and Eval.

    , Jun 16, 2006, in forum: ASP .Net
    Replies:
    1
    Views:
    550
    Karl Seguin [MVP]
    Jun 16, 2006
  3. Alex van der Spek

    eval('07') works, eval('08') fails, why?

    Alex van der Spek, Jan 8, 2009, in forum: Python
    Replies:
    6
    Views:
    1,461
    Bruno Desthuilliers
    Jan 8, 2009
  4. Patrick Li
    Replies:
    4
    Views:
    126
    Thomas B.
    Sep 4, 2008
  5. Liang Wang
    Replies:
    8
    Views:
    134
    Ben Morrow
    Feb 2, 2008
Loading...

Share This Page