'_[1]' in .co_names using builtin compile() in Python 2.6

magnus.lycka · Nov 27, 2013

When I run e.g. compile('sin(5) * cos(6)', '<string>', 'eval').co_names, I get ('sin', 'cos'), which is just what I expected.

But when I have a list comprehension in the expression, I get a little surprise:

>>> compile('[x*x for x in y]', '<string>', 'eval').co_names
('_[1]', 'y', 'x')

This happens in Python 2.6.6 on Red Hat Linux, but not when I run Python 2.7.3 on Windows. Unfortunately I'm stuck with 2.6.

* Are there more surprises similar to this one that I can expect from compile(...).co_names? Is this "behaviour" documented somewhere?

* Is there perhaps a better way to achieve what I'm trying to do?

What I'm really after is to check that Python expressions embedded in text files:
- are well behaved (no syntax errors etc.)
- don't accidentally access anything they shouldn't
- are served the values they need on execution

So, in the case of "a.b + x" I'm really just interested in a and x, not b. So the (almost) whole story is that I do:

import re

# Find names not starting with ".", i.e. a and b in "a.c + b"
abbr_expr = re.sub(r"\.\w+", "", expr)
names = compile(abbr_expr, '<string>', 'eval').co_names
# Python 2.6 returns '_[1]' in co_names for list comprehension. Bug?
names = [name for name in names if re.match(r'\w+$', name)]

for name in names:
    if name not in allowed_names:
        raise NameError('Name: %s not permitted in expression: %s' % (name, expr))

Ned Batchelder · Nov 27, 2013

> * Are there more surprises similar to this one that I can expect from compile(...).co_names? Is this "behaviour" documented somewhere?

That name is the name of the list being built by the comprehension,
which I found out by disassembling the code object to see the bytecodes:

>>> co = compile("[x*x for x in y]", "<s>", "eval")
>>> co.co_names
('_[1]', 'y', 'x')
>>> import dis
>>> dis.dis(co)

  1           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_NAME               0 (_[1])
              7 LOAD_NAME                1 (y)
             10 GET_ITER
        >>   11 FOR_ITER                17 (to 31)
             14 STORE_NAME               2 (x)
             17 LOAD_NAME                0 (_[1])
             20 LOAD_NAME                2 (x)
             23 LOAD_NAME                2 (x)
             26 BINARY_MULTIPLY
             27 LIST_APPEND
             28 JUMP_ABSOLUTE           11
        >>   31 DELETE_NAME              0 (_[1])
             34 RETURN_VALUE

The same list comprehension in 2.7 uses an unnamed list on the stack:

  1           0 BUILD_LIST               0
              3 LOAD_NAME                0 (y)
              6 GET_ITER
        >>    7 FOR_ITER                16 (to 26)
             10 STORE_NAME               1 (x)
             13 LOAD_NAME                1 (x)
             16 LOAD_NAME                1 (x)
             19 BINARY_MULTIPLY
             20 LIST_APPEND              2
             23 JUMP_ABSOLUTE            7
        >>   26 RETURN_VALUE

I don't know whether such facts are documented. They are deep
implementation details, and change from version to version, as you've seen.

> * Is there perhaps a better way to achieve what I'm trying to do?
>
> What I'm really after is to check that Python expressions embedded in text files:
> - are well behaved (no syntax errors etc.)
> - don't accidentally access anything they shouldn't
> - are served the values they need on execution

I hope you aren't trying to prevent malice this way: you cannot examine
a piece of Python code to prove that it's safe to execute. For an
extreme example, see: Eval Really Is Dangerous:
http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html

In your environment it looks like you have a whitelist of identifiers,
so you're probably ok.

I don't know of a better way to determine the real names in the
expression. I doubt Python will insert a valid name into the namespace,
since it doesn't want to step on real user names. The simplest way to
do that is to autogenerate invalid names, like "_[1]" (I wonder why it
isn't "_[0]"?)

--Ned.

Ian Kelly · Nov 27, 2013

> I don't know of a better way to determine the real names in the
> expression. I doubt Python will insert a valid name into the namespace,
> since it doesn't want to step on real user names. The simplest way to do
> that is to autogenerate invalid names, like "_[1]" (I wonder why it isn't
> "_[0]"?)

One possible alternative is to use the ast module to examine the parse tree
of the expression instead of the generated code object. Hard to say whether
that would be "better".
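
A minimal sketch of what the ast approach could look like, assuming only bare names matter and attribute names such as b in "a.b" should be ignored. The helpers names_used and check_names are hypothetical, not part of the ast module. Names bound inside the expression (for example a comprehension's loop variable) are dropped with a plain set difference, which is a simplification, and lambda parameters are not handled.

```python
import ast

def names_used(expr):
    """Return the bare names an expression reads but does not bind itself."""
    tree = ast.parse(expr, '<string>', 'eval')
    loaded, stored = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # ctx is ast.Store for loop variables such as x in "for x in y".
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            else:
                loaded.add(node.id)
    return loaded - stored

def check_names(expr, allowed_names):
    """Raise NameError for the first name that is not in allowed_names."""
    for name in sorted(names_used(expr)):
        if name not in allowed_names:
            raise NameError('Name: %s not permitted in expression: %s'
                            % (name, expr))
```

Because attribute access is an ast.Attribute node and not an ast.Name, no regular expression is needed to strip ".b" first, and no synthetic '_[1]' entry appears, since the tree is inspected before any bytecode is generated.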

Chris Kaynor · Nov 27, 2013

>> * Is there perhaps a better way to achieve what I'm trying to do?
>
> I hope you aren't trying to prevent malice this way: you cannot examine a
> piece of Python code to prove that it's safe to execute. For an extreme
> example, see: Eval Really Is Dangerous:
> http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html
>
> In your environment it looks like you have a whitelist of identifiers, so
> you're probably ok.

I just tested the crash example from that link in Python 2.7.5 win64 and
the co_names from the compiled code is empty. Therefore, a simple whitelist
would not catch that problematic code (and likely any other global access
done correctly). Even a simple test of making sure that at least one (or
any number of) valid identifiers exist would be insufficient, as you can
merely tack on a ",a" to add "a" to the co_names, and likewise for any other
variable.

Basically, even with a pure whitelist, there is likely no possible way to
make eval/exec safe, unless you also eliminate the ability to make literals.
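
One plausible reason co_names can come back empty (an assumption; the thread does not spell it out) is that names used inside a lambda are recorded on a nested code object stored in co_consts, not on the top-level code object. The sketch below illustrates this with a made-up expression; all_co_names is a hypothetical helper.

```python
import types

def all_co_names(code):
    """Collect co_names from a code object and every code object nested in it."""
    names = set(code.co_names)
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= all_co_names(const)
    return names

# 'secret' is a made-up name used only for illustration.
top = compile('(lambda: secret)()', '<string>', 'eval')
print(top.co_names)        # names recorded on the top-level code object only
print(all_co_names(top))   # also includes names from the nested lambda
```

Walking the nested code objects closes this particular gap, but it does not make a name whitelist a security boundary.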

Chris

Ned Batchelder · Nov 27, 2013

> I just tested the crash example from that link in Python 2.7.5 win64 and
> the co_names from the compiled code is empty. Therefore, a simple
> whitelist would not catch that problematic code (and likely any other
> global access done correctly). Even a simple test of making sure that at
> least one (or any number of) valid identifiers exist would be
> insufficient, as you can merely tack on a ",a" to add "a" to the
> co_names, and likewise for any other variable.

Ah, right you are! I neglected to go back and examine the dangerous
code. So eval really is dangerous!

--Ned.

Steven D'Aprano · Nov 27, 2013

> What I'm really after is to check that Python expressions embedded in
> text files:
> - are well behaved (no syntax errors etc.)
> - don't accidentally access anything they shouldn't
> - are served the values they need on execution

If you are trying to get safe execution of untrusted code in Python, you
should read this recent thread from the Python core developers:

https://mail.python.org/pipermail/python-dev/2013-November/130132.html

Probably the only way to securely sandbox untrusted Python code is to use
operating system level security (such as a chroot jail) or an
implementation such as PyPy which has been designed from the beginning to
be sandboxed -- and even that may simply mean that nobody has broken out
of PyPy's sandbox *yet*.

Looking back at your example:

compile('sin(5) * cos(6)', '<string>', 'eval').co_names

I'm not sure I understand why you inspect the co_names. What does that
give you? You can tell that there are no syntax errors just by compiling
it: if there are any, compile() will raise SyntaxError.
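
As a small illustration of that point, the syntax check can be wrapped so it reports a message instead of propagating the exception. The function name syntax_problem and the message format are made up for this sketch.

```python
def syntax_problem(expr):
    """Return None if expr compiles in 'eval' mode, else a short message."""
    try:
        compile(expr, '<string>', 'eval')
    except SyntaxError as err:
        return 'Syntax error in %r: %s' % (expr, err.msg)
    return None
```

Compiling in 'eval' mode also rejects statements such as "x = 1", since only a single expression is accepted.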

I would pre-process the string before compiling and disallow *anything*
containing "eval", "exec", or underscore. I'd also apply a limit to the
total length of the string. That doesn't necessarily rule out a hostile
user running arbitrary code, but it does make it harder.
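
A rough sketch of such a pre-check, assuming a plain substring test is acceptable. The name precheck and the length limit of 200 are arbitrary choices for illustration, not values from this thread.

```python
MAX_EXPR_LENGTH = 200  # arbitrary limit chosen for this sketch

def precheck(expr):
    """Reject over-long expressions and ones containing suspicious text."""
    if len(expr) > MAX_EXPR_LENGTH:
        raise ValueError('Expression too long: %d characters' % len(expr))
    for banned in ('eval', 'exec', '_'):
        if banned in expr:
            raise ValueError('%r not permitted in expression: %s'
                             % (banned, expr))
    return expr
```

Note that banning every underscore also rejects ordinary names such as my_value, so this only suits configurations whose allowed names avoid underscores.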

Also, when you execute the compiled code, don't do this:

eval(code) # No!

Instead, provide an explicit globals and locals namespace:

safe_ish_namespace = {'__builtins__': None}
eval(code, safe_ish_namespace)

Again, this increases the barrier to somebody hacking out of your sandbox
without ruling it out altogether.
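
Put together with the values the expression needs, this could look like the sketch below. The helper evaluate is hypothetical, and it uses an empty dict for '__builtins__' where the snippet above uses None; that substitution is an assumption of this sketch.

```python
import math

def evaluate(expr, values):
    """Evaluate expr with only the supplied values visible as names."""
    # An empty dict stands in for None here (an assumption of this sketch);
    # either way the usual builtins are not reachable by name.
    namespace = {'__builtins__': {}}
    namespace.update(values)
    code = compile(expr, '<string>', 'eval')
    return eval(code, namespace)

result = evaluate('sin(5) * cos(6)', {'sin': math.sin, 'cos': math.cos})
```

A name that was not supplied, such as open, then fails with NameError. As noted above, this raises the barrier but is not a sandbox.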

Good luck!

magnus.lycka · Nov 28, 2013

> I hope you aren't trying to prevent malice this way: you cannot examine
> a piece of Python code to prove that it's safe to execute.

No worry. Whoever has access to modifying those configuration files
can cause a mess in all sorts of other ways, such as writing and running
arbitrary programs.

I just want to give reasonably rapid feedback when people make mistakes.

As with all python code, it's very important to test properly, but the
top level names are often defined elsewhere in the configuration, so I
want to catch those errors ASAP.
