limited python virtual machine (WAS: Another scripting language implemented into Python itself?)

Steven Bethard

Fuzzyman said:
> Cameron Laird wrote:
> [snip..]
>
>>This is a serious issue.
>>
>>It's also one that brings Tcl, mentioned several
>>times in this thread, back into focus. Tcl presents
>>the notion of "safe interpreter", that is, a sub-
>>ordinate virtual machine which can interpret only
>>specific commands. It's a thrillingly powerful and
>>correct solution to the main problem Jeff and others
>>have described.
>
> A better (and of course *vastly* more powerful but unfortunately only
> a dream ;-) is a similarly limited python virtual machine.....

Yeah, I think there are a lot of people out there who would like
something like this, but it's not quite clear how to go about it. If
you search Google Groups, there are a lot of examples of how you can use
Python's object introspection to retrieve "unsafe" functions.

I wish there was a way to, say, exec something with no builtins and with
import disabled, so you would have to specify all the available
bindings, e.g.:

exec user_code in dict(ClassA=ClassA, ClassB=ClassB)

but I suspect that even this wouldn't really solve the problem, because
you can do things like:

py> class ClassA(object):
...     pass
...
py> object, = ClassA.__bases__
py> object
<type 'object'>
py> int = object.__subclasses__()[2]
py> int
<type 'int'>

so you can retrieve a lot of the builtins. I don't know how to retrieve
__import__ this way, but as soon as you figure that out, you can then
do pretty much anything you want to.
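
A minimal sketch of the scenario above, written for Python 3 (where exec is a function rather than a statement). The names restricted and user_code are illustrative only. It suggests that emptying __builtins__ blocks direct name lookups, but not the walk from a supplied class back to object and on to its subclasses:

```python
class ClassA(object):
    pass

# Namespace handed to the untrusted code: no builtins, one binding.
restricted = {"__builtins__": {}, "ClassA": ClassA}

# Direct use of a builtin name fails, as hoped.
try:
    exec("open", dict(restricted))
    open_blocked = False
except NameError:
    open_blocked = True

# Introspection still walks from ClassA to object, then to int.
user_code = """
base, = ClassA.__bases__
recovered = [c for c in base.__subclasses__() if c.__name__ == 'int'][0]
"""
exec(user_code, restricted)

print(open_blocked)
print(restricted["recovered"] is int)
```

Searching by __name__ rather than by a fixed index such as [2] is deliberate here, since the order of object.__subclasses__() differs between versions and builds.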

Steve
 
Michael Spencer

Steven said:
>
> I wish there was a way to, say, exec something with no builtins and
> with import disabled, so you would have to specify all the available
> bindings, e.g.:
>
> exec user_code in dict(ClassA=ClassA, ClassB=ClassB)
>
> but I suspect that even this wouldn't really solve the problem,
> because you can do things like:
>
> py> class ClassA(object):
> ...     pass
> ...
> py> object, = ClassA.__bases__
> py> object
> <type 'object'>
> py> int = object.__subclasses__()[2]
> py> int
> <type 'int'>
>
> so you can retrieve a lot of the builtins. I don't know how to
> retrieve __import__ this way, but as soon as you figure that out, you
> can then do pretty much anything you want to.
>
> Steve

Steve

Safe eval recipe posted to cookbook:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/364469

Couldn't safe exec be programmed similarly?

'import' and 'from' are syntax, so they are trivially avoided.

Likewise, function calls are easily intercepted.

As you say, attribute access to core functions appears to present the challenge.
It is easy to intercept attribute access, harder to know what's safe. If there
were a known set of 'dangerous' objects (e.g. sys, file, os), then any attribute
returned could be checked by identity against that set.

Of course, execution would be painfully slow, due to double interpretation.
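
As a rough illustration of the tree-walking idea (this is not the code from the recipe linked above, which is not reproduced in this thread), here is a minimal sketch using the standard ast module on Python 3.8 or later. The function name safe_eval and the set of allowed node types are assumptions chosen for the example; anything not explicitly allowed, including attribute access, names and calls, is rejected.

```python
import ast
import operator

# Whitelisted binary operators (an arbitrary choice for this sketch).
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_eval(source):
    """Evaluate numeric literals, lists and basic arithmetic only."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.List):
            return [walk(element) for element in node.elts]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Attribute, Name, Call, Subscript, ... all end up here.
        raise ValueError("node type not allowed: " + type(node).__name__)

    return walk(ast.parse(source, mode="eval"))


print(safe_eval("1 + 2 * 3"))
```

Because every node is interpreted in Python on top of the interpreter itself, this also shows where the double-interpretation cost comes from.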

Michael
 
Alexander Schremmer

> Yeah, I think there are a lot of people out there who would like
> something like this, but it's not quite clear how to go about it. If
> you search Google Groups, there are a lot of examples of how you can use
> Python's object introspection to retrieve "unsafe" functions.

IMHO a safe Python would consist of a special mode that disallows all
system calls that could spy on or harm data (IO etc.) and imports of
non-whitelisted modules. Additionally, a loop counter in the interpreter
loop would ensure that the code does not stall the process/machine.

sys.safecall(func, maxcycles=1000)
could enter the safe mode and call the func.

I am not sure how big the patch would be; it is mainly a C macro at the
beginning of every relevant function that checks the current "mode" and
raises an exception if it is not correct. The import handler would need to
check if the module is whitelisted (based on the path etc.).

Python is too dynamic to get this working while just using tricks that
manipulate some builtins/globals etc.
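
To make the cycle-budget idea concrete: no sys.safecall function exists in CPython, so the following is only a Python-level approximation built on the real sys.settrace hook. It counts executed source lines rather than interpreter cycles, and the names call_with_budget, BudgetExceeded and max_lines are invented for this sketch. It is exactly the kind of trick that cannot provide real security (traced code could simply call sys.settrace itself); it only illustrates the budget mechanism.

```python
import sys


class BudgetExceeded(Exception):
    """Raised inside the traced code once the line budget is used up."""


def call_with_budget(func, max_lines=1000):
    remaining = [max_lines]

    def tracer(frame, event, arg):
        # 'line' events fire once per executed source line.
        if event == "line":
            remaining[0] -= 1
            if remaining[0] < 0:
                raise BudgetExceeded("line budget exhausted")
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return func()
    finally:
        # Restore whatever trace function was active before.
        sys.settrace(previous)


def spin():
    n = 0
    while True:
        n += 1


try:
    call_with_budget(spin, max_lines=100)
except BudgetExceeded as exc:
    print("stopped:", exc)
```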

Kind regards,
Alexander
 
Alexander Schremmer

> sys.safecall(func, maxcycles=1000)
> could enter the safe mode and call the func.

This might be even enhanced like this:

import sys
sys.safecall(func, maxcycles=1000,
             allowed_domains=['file-IO', 'net-IO', 'devices', 'gui'],
             allowed_modules=['_sre'])

Every access to objects that are not in the specified domains is
restricted by the interpreter. Additionally, external modules (which are
not expected to be "decorated" by those security checks) have to be in the
modules whitelist to work flawlessly (i.e. not generate exceptions).

Any comments about this from someone who already hacked CPython?

Kind regards,
Alexander
 
Jack Diederich

> sys.safecall(func, maxcycles=1000)
> could enter the safe mode and call the func.
>
> This might be even enhanced like this:
>
> import sys
> sys.safecall(func, maxcycles=1000,
>              allowed_domains=['file-IO', 'net-IO', 'devices', 'gui'],
>              allowed_modules=['_sre'])
>
> Any comments about this from someone who already hacked CPython?

Yes, this comes up every couple months and there is only one answer:
This is the job of the OS.
Java largely succeeds at doing sandboxy things because it was written that
way from the ground up (to behave both like a program interpreter and an OS).
Python the language was not, and the CPython interpreter definitely was not.

Search groups.google.com for previous discussions of this on c.l.py

-Jack
 
Steven Bethard

Jack said:
> Yes, this comes up every couple months and there is only one answer:
> This is the job of the OS.
> Java largely succeeds at doing sandboxy things because it was written that
> way from the ground up (to behave both like a program interpreter and an OS).
> Python the language was not, and the CPython interpreter definitely was not.
>
> Search groups.google.com for previous discussions of this on c.l.py

Could you give some useful queries? Every time I do this search, I get
a few results, but never anything that really goes into the security
holes in any depth. (They're usually something like -- "look, given
object, I can get int" not "look, given object, I can get eval,
__import__, etc.")

Steve
 
aurora

Is it really necessary to build a VM from the ground up that includes OS
ability? What about JavaScript?

> sys.safecall(func, maxcycles=1000)
> could enter the safe mode and call the func.
>
> This might be even enhanced like this:
>
> import sys
> sys.safecall(func, maxcycles=1000,
>              allowed_domains=['file-IO', 'net-IO', 'devices', 'gui'],
>              allowed_modules=['_sre'])
>
> Any comments about this from someone who already hacked CPython?
>
> Yes, this comes up every couple months and there is only one answer:
> This is the job of the OS.
> Java largely succeeds at doing sandboxy things because it was written that
> way from the ground up (to behave both like a program interpreter and an OS).
> Python the language was not, and the CPython interpreter definitely was not.
>
> Search groups.google.com for previous discussions of this on c.l.py
>
> -Jack
 
Jack Diederich

> Could you give some useful queries? Every time I do this search, I get
> a few results, but never anything that really goes into the security
> holes in any depth. (They're usually something like -- "look, given
> object, I can get int" not "look, given object, I can get eval,
> __import__, etc.")

A search on "rexec bastion" will give you most of the threads;
search on "rexec bastion diederich" to see the other times I tried to
stop the threads by recommending reading the older ones *wink*.

Thread subjects:
Replacement for rexec/Bastion?
Creating a capabilities-based restricted execution system
Embedding Python in Python
killing thread ?

-Jack
 
Jack Diederich

On Tue, 25 Jan 2005 22:08:01 +0100, I wrote:

> sys.safecall(func, maxcycles=1000)
> could enter the safe mode and call the func.
>
> This might be even enhanced like this:
>
> import sys
> sys.safecall(func, maxcycles=1000,
>              allowed_domains=['file-IO', 'net-IO', 'devices', 'gui'],
>              allowed_modules=['_sre'])
>
> Any comments about this from someone who already hacked CPython?
>
> Yes, this comes up every couple months and there is only one answer:
> This is the job of the OS.
> Java largely succeeds at doing sandboxy things because it was written that
> way from the ground up (to behave both like a program interpreter and an OS).
> Python the language was not, and the CPython interpreter definitely was not.
>
> Search groups.google.com for previous discussions of this on c.l.py
>
> Is it really necessary to build a VM from the ground up that includes OS
> ability? What about JavaScript?

See the past threads I recommend in another just-posted reply.

Common browser implementations of JavaScript have almost no features, can't
import C-based libraries, and can easily enter endless loops or eat all
available memory. You could make a fork of Python that matches that feature
set, but I don't know why you would want to.

-Jack
 
Steven Bethard

Jack said:
> A search on "rexec bastion" will give you most of the threads,
> search on "rexec bastion diederich" to see the other times I tried to
> stop the threads by recommending reading the older ones *wink*.
>
> Thread subjects:
> Replacement for rexec/Bastion?
> Creating a capabilities-based restricted execution system
> Embedding Python in Python
> killing thread ?

Thanks for the keywords -- I hadn't tried anything like any of these.
Unfortunately, they leave me with the same feeling as before... The
closest example that I saw that actually showed a security hole made use
of __builtins__. As you'll note from the beginning of this thread, I
was considering the case where no builtins are provided and imports are
disabled.

I also read a number of messages that had the same problems I do -- too
many threads just say "look at google groups", without saying what to
search for. They also often spend most of their time talking about
abstract problems, without showing code that illustrates how to break
the "security". For example, I never found anything close to describing
how to retrieve, say, 'eval' or '__import__' given only 'object'.

What would be really nice is a wiki that had examples of how to derive
"unsafe" functions from 'object'. I'd be glad to put one together, but
so far, I can't find many examples... If you want to consider reading
and writing of files as "unsafe", then I guess this might be one:

file = object.__subclasses__()[16]

If I could see how to go from 'object' (or 'int', 'str', 'file', etc.)
to 'eval' or '__import__', that would help out a lot...

Steve
 
Dieter Maurer

Steven Bethard said:
Fuzzyman wrote:
...

I already wrote about the "RestrictedPython" which is part of Zope,
didn't I?

Please search the archive to find a description...


Dieter
 
Alex Martelli

Steven Bethard said:
> If I could see how to go from 'object' (or 'int', 'str', 'file', etc.)
> to 'eval' or '__import__', that would help out a lot...

>>> object.__subclasses__()
[<type 'type'>, <type 'weakref'>, <type 'int'>, <type 'basestring'>,
<type 'list'>, <type 'NoneType'>, <type 'NotImplementedType'>, <type
'module'>, <type 'zipimport.zipimporter'>, <type 'posix.stat_result'>,
<type 'posix.statvfs_result'>, <type 'dict'>, <type 'function'>, <class
'site._Printer'>, <class 'site._Helper'>, <type 'set'>, <type 'file'>]

Traipse through these, find one class that has an unbound method, get
that unbound method's func_globals, bingo.


Alex
 
Nick Coghlan

Alex said:
> If I could see how to go from 'object' (or 'int', 'str', 'file', etc.)
> to 'eval' or '__import__', that would help out a lot...
>
> [<type 'type'>, <type 'weakref'>, <type 'int'>, <type 'basestring'>,
> <type 'list'>, <type 'NoneType'>, <type 'NotImplementedType'>, <type
> 'module'>, <type 'zipimport.zipimporter'>, <type 'posix.stat_result'>,
> <type 'posix.statvfs_result'>, <type 'dict'>, <type 'function'>, <class
> 'site._Printer'>, <class 'site._Helper'>, <type 'set'>, <type 'file'>]
>
> Traipse through these, find one class that has an unbound method, get
> that unbound method's func_globals, bingo.

So long as any Python modules are imported using the same restricted environment
their func_globals won't contain eval() or __import__ either.

And C methods don't have func_globals at all.

However, we're talking about building a custom interpreter here, so there's no
reason not to simply find the dangerous functions at the C-level and replace
their bodies with:

PyErr_SetString(PyExc_Exception, "Access to this operation not allowed in restricted build");
return NULL;

Then it doesn't matter *how* you get hold of file(), it still won't work. (I can
hear the capabilities folks screaming already. . .)

Combine that with a pre-populated read-only sys.modules and a restricted custom
interpreter would be quite doable. Execute it in a separate process and things
should be fairly solid.

Cheers,
Nick.
 
Steven Bethard

Alex said:
> If I could see how to go from 'object' (or 'int', 'str', 'file', etc.)
> to 'eval' or '__import__', that would help out a lot...
>
> [<type 'type'>, <type 'weakref'>, <type 'int'>, <type 'basestring'>,
> <type 'list'>, <type 'NoneType'>, <type 'NotImplementedType'>, <type
> 'module'>, <type 'zipimport.zipimporter'>, <type 'posix.stat_result'>,
> <type 'posix.statvfs_result'>, <type 'dict'>, <type 'function'>, <class
> 'site._Printer'>, <class 'site._Helper'>, <type 'set'>, <type 'file'>]
>
> Traipse through these, find one class that has an unbound method, get
> that unbound method's func_globals, bingo.

Thanks for the help! I'd played around with object.__subclasses__ for
a while, but I hadn't realized that func_globals was what I should be
looking for.

Here's one route to __builtins__:

py> string_Template = object.__subclasses__()[17]
py> builtins = string_Template.substitute.func_globals['__builtins__']
py> builtins['eval']
<built-in function eval>
py> builtins['__import__']
<built-in function __import__>
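
For readers on Python 3, a hedged sketch of the same route: func_globals is spelled __globals__ there, and unbound methods are plain functions. The helper name find_builtins_via_object is made up for this example. Because indices such as [17] depend on the interpreter version and on what has been imported, the sketch searches the subclasses instead of indexing them.

```python
import types


def find_builtins_via_object():
    """Walk object's subclasses until a Python-coded method is found,
    then read __builtins__ out of that function's globals."""
    for cls in object.__subclasses__():
        for attr in vars(cls).values():
            if isinstance(attr, types.FunctionType):
                found = attr.__globals__.get("__builtins__")
                if found is not None:
                    # __builtins__ is a dict in imported modules and a
                    # module object in __main__; normalise to a dict.
                    return found if isinstance(found, dict) else vars(found)
    return None


builtins_dict = find_builtins_via_object()
print(builtins_dict["eval"])
print(builtins_dict["__import__"])
```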

Steve
 
Alex Martelli

Nick Coghlan said:
> So long as any Python modules are imported using the same restricted
> environment their func_globals won't contain eval() or __import__ either.

Sure, as long as you don't need any standard library module using eval
from Python (or can suitably restrict them or the eval they use), etc,
you can patch up this specific vulnerability.

> And C methods don't have func_globals at all.

Right, I used "unbound method" in the specific sense of "instance of
types.UnboundMethodType" (bound ones or any Python-coded function you
can get your paws on work just as well).

> However, we're talking about building a custom interpreter here, so there's no

It didn't seem to me that Steven's question was so restricted; and since
he thanked me for my answer (which of course is probably inapplicable to
some custom interpreter that's not written yet) it appears to me that my
interpretation of his question was correct, and my answer useful to him.

> reason not to simply find the dangerous functions at the C-level and replace
> their bodies with "PyErr_SetString(PyExc_Exception, "Access to this operation
> not allowed in restricted build"); return NULL;".
>
> Then it doesn't matter *how* you get hold of file(), it still won't work.
> (I can hear the capabilities folks screaming already. . .)

Completely removing Python-level access to anything dangerous might be a
safer approach than trying to patch one access route after another, yes.

> Combine that with a pre-populated read-only sys.modules and a restricted
> custom interpreter would be quite doable. Execute it in a separate process
> and things should be fairly solid.

If you _can_ execute (whatever) in a separate process, then an approach
based on BSD's "jail" or equivalent features of other OS's may be able
to give you all you need, without needing other restrictions to be coded
in the interpreter (or whatever else you run in that process).
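
A minimal sketch of the separate-process idea, assuming Python 3.7 or later. It is not a jail: it only shows the untrusted source running in its own interpreter process, with a wall-clock timeout and, where the Unix-only resource module is available, an OS-enforced CPU limit. Filesystem and network confinement would still have to come from the OS (jail, chroot, containers and so on). The names run_isolated, wall_timeout and cpu_seconds are invented for the example.

```python
import subprocess
import sys

try:
    import resource  # Unix only
except ImportError:
    resource = None


def _limit_cpu(seconds):
    def apply():
        # Runs in the child just before exec; the kernel enforces it.
        resource.setrlimit(resource.RLIMIT_CPU, (seconds, seconds))
    return apply


def run_isolated(source, wall_timeout=5.0, cpu_seconds=2):
    """Run source in a fresh interpreter process and capture its output."""
    kwargs = {}
    if resource is not None:
        kwargs["preexec_fn"] = _limit_cpu(cpu_seconds)
    return subprocess.run(
        [sys.executable, "-I", "-c", source],  # -I: isolated mode
        capture_output=True,
        text=True,
        timeout=wall_timeout,
        **kwargs
    )


result = run_isolated("print(6 * 7)")
print(result.stdout.strip())
```

A runaway child is killed when wall_timeout expires, at which point subprocess.run raises subprocess.TimeoutExpired in the parent.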


Alex
 
Aahz

Steven Bethard said:
> If I could see how to go from 'object' (or 'int', 'str', 'file', etc.)
> to 'eval' or '__import__', that would help out a lot...
>
> [<type 'type'>, <type 'weakref'>, <type 'int'>, <type 'basestring'>,
> <type 'list'>, <type 'NoneType'>, <type 'NotImplementedType'>, <type
> 'module'>, <type 'zipimport.zipimporter'>, <type 'posix.stat_result'>,
> <type 'posix.statvfs_result'>, <type 'dict'>, <type 'function'>, <class
> 'site._Printer'>, <class 'site._Helper'>, <type 'set'>, <type 'file'>]
>
> Traipse through these, find one class that has an unbound method, get
> that unbound method's func_globals, bingo.

One thing my company has done is write a ``safe_eval()`` that uses a
regex to disable double-underscore access.
 
Alex Martelli

Aahz said:
> ...
> One thing my company has done is written a ``safe_eval()`` that uses a
> regex to disable double-underscore access.

Will the regex catch getattr(object, 'subclasses'.join(['_'*2]*2)...?-)


Alex
 
Skip Montanaro

Alex> will the regex catch getattr(object,
Alex> 'subclasses'.join(['_'*2]*2)...?-)

Now he has two problems. ;-)

Skip
 
Stephen Thorne

Alex> will the regex catch getattr(object,
Alex> 'subclasses'.join(['_'*2]*2)...?-)

> Now he has two problems. ;-)

I nearly asked that question, then I realised that 'getattr' is quite
easy to remove from the global namespace for the code in question, and
assumed that they had already thought of that.
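
A hedged sketch of the exchange above. The actual safe_eval() Aahz mentions is not shown in this thread, so naive_safe_eval below is a hypothetical stand-in that simply rejects any source text containing a double underscore. It suggests that the filter is bypassed when getattr is reachable, and that the same expression fails with a NameError once getattr is left out of the namespace. That alone does not show the namespace is safe; other routes would need to be ruled out too.

```python
import re

_DUNDER = re.compile(r"__")


def naive_safe_eval(expr, namespace):
    """Reject any source text containing a double underscore."""
    if _DUNDER.search(expr):
        raise ValueError("double underscore not allowed")
    return eval(expr, namespace)


# The attribute name is assembled at run time, so the regex never sees it.
bypass = "getattr(object, 'subclasses'.join(['_' * 2] * 2))()"

with_getattr = {"__builtins__": {}, "object": object, "getattr": getattr}
leaked = naive_safe_eval(bypass, with_getattr)

without_getattr = {"__builtins__": {}, "object": object}
try:
    naive_safe_eval(bypass, without_getattr)
    getattr_needed = False
except NameError:
    getattr_needed = True

print(int in leaked, getattr_needed)
```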

Stephen.
 
