Creating a capabilities-based restricted execution system

Serge Orlov

John Roth said:
Much of this thread has focused on "capabilities" and the use of
proxies to implement capabilities. AFAIC, that's not only putting
attention on mechanism before policy, but it's putting attention on
mechanism in the wrong place.

I'm not sure why it should be discussed here, since Sean referred
to E in the first post (http://www.erights.org/), so I think he's
comfortable with the policy defined by E? I think he has
missed the part that the implementation should help as much as
it can to prevent leaking capabilities from one security domain to
another. I pointed that out already.
What I *haven't* seen in this thread is very much consideration of
what people want from a security implementation.

I think Sean is talking about his own implementation. I didn't
see anywhere that he said he's going to write a general implementation
for other people. He said what he wants from his implementation.
One problem I've been playing around with is: how would you
implement something functionally equivalent to the Unix/Linux
chroot() facility? The boundaries are that it should not require
coding changes to the application that is being restricted, and it
should allow any and all Python extensions (not C language
extensions) to operate as coded (at least as long as they don't
try to escape the jail!) Oh, yes. It has to work on Windows,
so it's not a legitimate response to say: "use chroot()."

I don't see any unsolvable problems. Could you be more specific
about what the problem is? (Besides time, money, the need to support
alternative Python implementations, etc.)

-- Serge.
 
Serge Orlov

Sean R. Lynch said:
Ok, I think you've pretty much convinced me here. My choices for
protected attributes were to either name them specially and only allow
those attribute accesses on the name "self" (which I treat specially),
or to make everything protected by default, pass all attribute access
through a checker function (which I was hoping to avoid), and check for
a special attribute to define which attributes are supposed to be
public. Do you think it's good enough to make all attributes protected
as opposed to private by default?

Are you talking about C++ like protected fields and methods? What if
untrusted code subclasses your proxy object?
I guess I'll just make attributes protected by default, and force the
programmer to go out of their way to make things public. Then I can use
the Zope/RestrictedPython technique of assuming everything is insecure
until proven otherwise, and only expose parts of the interface on
built-in types that have been audited.

Thinking about str.encode, I convinced myself that global state shouldn't
be shared by different security domains, which means codecs.py and
__builtins__ must be imported into each security domain separately.
That's pretty easy to do with codecs.py since it's Python code. But importing
__builtins__ more than once is pretty hard since it wasn't designed
for that.

-- Serge.
 
Sean R. Lynch

Serge said:
I'm not sure why it should be discussed here, since Sean referred
to E in the first post (http://www.erights.org/), so I think he's
comfortable with the policy defined by E? I think he has
missed the part that the implementation should help as much as
it can to prevent leaking capabilities from one security domain to
another. I pointed that out already.

I am comfortable (so far) with the policy defined by E. However, I've
been learning more about that policy as I go, including the necessity of
helping the programmer prevent leaks, which I've started to implement by
making objects completely opaque by default and requiring that classes
list attributes that they want to make public. I have kept my
name-mangling scheme for private attributes. I'm working on making
classes opaque while still allowing code to call methods defined on
superclasses but only on self, not on other objects that happen to
inherit from the same superclass.
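
A minimal sketch of the "opaque by default, list what is public" policy described above, written in ordinary Python rather than inside RestrictedPython. The names `OpaqueProxy`, `__public__`, and `Account` are hypothetical, chosen only for illustration, and this is not the implementation discussed in the thread. In plain CPython the target is still reachable (for example through `object.__getattribute__` or a bound method's `__self__`), so the sketch shows the policy only and assumes the restricted compiler blocks underscore-prefixed names.

```python
class OpaqueProxy:
    """Expose only attributes the target's class lists in __public__.

    __public__ is a hypothetical marker name, not part of Python or
    RestrictedPython.
    """

    __slots__ = ("_target",)

    def __init__(self, target):
        # Bypass our own __setattr__, which refuses all writes.
        object.__setattr__(self, "_target", target)

    def __getattribute__(self, name):
        # Every attribute read goes through this check, including _target.
        target = object.__getattribute__(self, "_target")
        if name not in getattr(type(target), "__public__", ()):
            raise AttributeError(f"{name!r} is not public")
        return getattr(target, name)

    def __setattr__(self, name, value):
        raise AttributeError("attributes cannot be set through a proxy")


class Account:
    __public__ = ("balance", "deposit")   # everything else stays hidden

    def __init__(self):
        self.total = 0
        self.pin = "1234"

    def balance(self):
        return self.total

    def deposit(self, amount):
        self.total += amount


account = OpaqueProxy(Account())
account.deposit(5)
current = account.balance()   # allowed: listed in __public__
# account.pin and account.total raise AttributeError
```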
I think Sean is talking about his own implementation. I didn't
see anywhere that he said he's going to write a general implementation
for other people. He said what he wants from his implementation.

I would like my implementation to be as general as possible, but I'm
writing it for my own projects. All this talk of "breaking existing
code" and the like is not particularly relevant to me because, while I'd
like code to look as much like regular Python as possible, it's simply
not possible not to break existing code while helping the programmer
prevent leaks. Making objects opaque by default is going to break a hell
of a lot more code than not having a type() builtin, so I think people
can see why I'm not too concerned about leaving various builtins out.

This is an interesting problem, but not one I'm trying to solve here.
I'm modifying RestrictedPython to make it possible to use a pure
capabilities-based security model in an application server. The
application server must scale to tens of thousands of security domains,
and I see no reason why the security model can't or shouldn't be
language-based instead of OS-based. There's E for Java, why can't we
make something similar for Python? There is nothing particularly special
about Java that makes it more suitable for E than Python is. Both have
unforgeable references. I've already added object encapsulation. I'm
working on eliminating any static mutable state.

Ultimately, I'd like to have user-level threads, too. I'm considering
either using Stackless for this or doing some mangling of ASTs to make
it easier to use generators as coroutines. Unfortunately, I can't think
of a way for the compiler to tell that you're calling a coroutine from
within a coroutine, and that it therefore needs to output "yield (locals,
resultvarname, func, args, kwargs)" instead of a regular function call
without using some special syntax. Actually, I don't even know if it's
possible to modify the locals dict of a running generator without
causing trouble.
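
A minimal sketch of the generators-as-coroutines idea, assuming the explicit `yield` is accepted as the "special syntax". It simplifies the tuple above to `(func, args, kwargs)` and hands the result back through `send()` instead of writing into the generator's locals, so the question about modifying locals does not arise. The names `run`, `double`, `add`, and `main` are hypothetical, and the sketch relies on Python 3 generators (`send()` and generator return values).

```python
import types


def run(gen):
    """Drive a generator-based coroutine on an explicit stack.

    Each yielded (func, args, kwargs) tuple is treated as a request to
    call func; if func returns a generator it is run as a sub-coroutine.
    """
    stack = [gen]
    value = None
    while stack:
        try:
            request = stack[-1].send(value)
        except StopIteration as stop:
            stack.pop()
            value = stop.value        # return value of the finished coroutine
            continue
        func, args, kwargs = request
        result = func(*args, **kwargs)
        if isinstance(result, types.GeneratorType):
            stack.append(result)      # run the sub-coroutine before resuming
            value = None              # a fresh generator must be sent None
        else:
            value = result            # plain function: pass result straight back
    return value


def add(a, b):                        # an ordinary function
    return a + b


def double(x):                        # a coroutine calling a plain function
    total = yield (add, (x, x), {})
    return total


def main():                           # a coroutine calling other coroutines
    first = yield (double, (3,), {})
    second = yield (double, (first,), {})
    return second


answer = run(main())
```

Because the scheduler owns the stack, it could also switch to another coroutine at any `yield`, which is what would make this usable for user-level threads.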
 
Sean R. Lynch

Serge said:
Are you talking about C++ like protected fields and methods? What if
untrusted code subclasses your proxy object?

Hmmm. I was thinking you'd trust those you were allowing to subclass
your classes a bit more than you'd trust people to whom you'd only give
instances, but now that you mention it, you're right. I should make all
attributes fully private by default, requiring the programmer to declare
both protected and public attributes, and I should make attributes only
writable by the class on which they're declared. I guess I also need to
make it impossible to override any attribute unless it's declared OK to
do so.

I wonder if each of these things can be done with capabilities? A
reference to a class is basically the capability to subclass it. I could
create a concept of "slots" as well. This would require a change in
syntax, however; you'd be calling setter(obj, value) and getter(obj),
and this isn't really something I could cover up in the compiler. I
think I'll forget about this for now because E just uses Java's own
object encapsulation, so I guess I should just stick with creating
Java-like object encapsulation in Python.

I need to implement a callsuper() function as well, because I don't want
to be giving programmers access to unbound methods.
Thinking about str.encode, I convinced myself that global state shouldn't
be shared by different security domains, which means codecs.py and
__builtins__ must be imported into each security domain separately.
That's pretty easy to do with codecs.py since it's Python code. But importing
__builtins__ more than once is pretty hard since it wasn't designed
for that.

Global *mutable* state shouldn't be shared, AFAICT. I believe making
sure no mutable state is reachable through __builtins__ and having a new
globals dict for each security domain should be enough. Any modules that
are imported would need to be imported separately for each domain, which
should be possible with a modified __import__ builtin. I don't have any
intention of allowing import of unaudited C modules.
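
A minimal sketch of "a new globals dict for each security domain" combined with a modified `__import__`, assuming modules are supplied as audited Python source text. The helper `make_domain` and the `counter` module are hypothetical; the sketch ignores packages, dotted names, and relative imports.

```python
import types


def make_domain(sources, safe_builtins):
    """Return a fresh globals dict for one security domain.

    sources maps module names to Python source text. Each domain executes
    its own copy, so module-level mutable state is never shared.
    """
    modules = {}                      # this domain's private module cache
    builtins = dict(safe_builtins)    # per-domain copy of the builtins dict

    def domain_import(name, globals=None, locals=None, fromlist=(), level=0):
        if name not in sources:
            raise ImportError(f"module {name!r} is not available in this domain")
        if name not in modules:
            module = types.ModuleType(name)
            module.__dict__["__builtins__"] = builtins
            modules[name] = module
            exec(sources[name], module.__dict__)
        return modules[name]

    builtins["__import__"] = domain_import
    return {"__builtins__": builtins}


SOURCES = {
    "counter": (
        "hits = []\n"
        "def bump():\n"
        "    hits.append(1)\n"
        "    return len(hits)\n"
    ),
}

domain_a = make_domain(SOURCES, {"len": len})
domain_b = make_domain(SOURCES, {"len": len})

exec("import counter\ncounter.bump()\nresult = counter.bump()", domain_a)
exec("import counter\nresult = counter.bump()", domain_b)
# domain_a["result"] is 2 while domain_b["result"] is 1: no shared state
```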
 
John Roth

Serge Orlov said:
I think Sean is talking about his own implementation. I didn't
see anywhere that he said he's going to write a general implementation
for other people. He said what he wants from his implementation.

I see that point, and now that it's been made explicit (I missed
it the first time around, sorry,) I'm ok with it.
I don't see any unsolvable problems. Could you be more specific
about what the problem is? (Besides time, money, the need to support
alternative Python implementations, etc.)

Well, I don't see any unsolvable problems either. The biggest
sticking point is that the Unices use hard links to create
a directory tree that has the necessary programs available.
Windows does not have this capability, so an implementation
would have to build a virtual directory structure, intercept all
paths and map them to the virtual structure backwards and
forwards.
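
A minimal sketch of the path-mapping half of that idea: translating jail-relative ("virtual") paths to real paths and back. The class name `VirtualRoot` and the example root are hypothetical. The sketch does pure path arithmetic only; it does not intercept any file functions and does not deal with symbolic links, both of which a real jail would need.

```python
import os
import posixpath


class VirtualRoot:
    """Map virtual POSIX-style paths onto a real directory and back."""

    def __init__(self, real_root):
        self.real_root = os.path.abspath(real_root)

    def to_real(self, virtual_path):
        # Normalizing against "/" means ".." can never climb above the root.
        norm = posixpath.normpath(posixpath.join("/", virtual_path))
        parts = [part for part in norm.split("/") if part]
        return os.path.join(self.real_root, *parts)

    def to_virtual(self, real_path):
        rel = os.path.relpath(os.path.abspath(real_path), self.real_root)
        if rel == os.curdir:
            return "/"
        if rel == os.pardir or rel.startswith(os.pardir + os.sep):
            raise ValueError("path is outside the virtual root")
        return "/" + rel.replace(os.sep, "/")


jail = VirtualRoot("/srv/jail")                 # hypothetical location
real = jail.to_real("/etc/../../etc/passwd")    # stays inside the jail
virtual = jail.to_virtual(real)                 # back to "/etc/passwd"
```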

The reason I find it an interesting problem is that I can't see
any way to do it with the kind of "generic" facility that was
in the Python Restricted execution facility, at least without a
complete redesign of the file and directory functions and
classes in the os module. Without that, it would
require code in the C language implementation modules.
Right now the file and directory management modules are a
real mess.

John Roth
 
Serge Orlov

Sean R. Lynch said:
Global *mutable* state shouldn't be shared, AFAICT.

Right, I missed this simple rule. My mind is still confined by my recent
attempt to add security by only translating bytecode without any changes
to the interpreter.
I believe making
sure no mutable state is reachable through __builtins__

Are you going to create multiple __builtins__, or are you just going
to get rid of any global objects in __builtins__? The first lets you
handle str.encode the right way.
and having a new
globals dict for each security domain should be enough. Any modules that
are imported would need to be imported separately for each domain,

Can C modules be imported more than once in CPython?
which
should be possible with a modified __import__ builtin. I don't have any
intention of allowing import of unaudited C modules.

Agreed.

-- Serge.
 
Serge Orlov

John Roth said:
Well, I don't see any unsolvable problems either. The biggest
sticking point is that the Unices use hard links to create
a directory tree that has the necessary programs available.
Windows does not have this capability, so an implementation
would have to build a virtual directory structure, intercept all
paths and map them to the virtual structure backwards and
forwards.

The reason I find it an interesting problem is that I can't see
any way to do it with the kind of "generic" facility that was
in the Python Restricted execution facility, at least without a
complete redesign of the file and directory functions and
classes in the os module. Without that, it would
require code in the C language implementation modules.
Right now the file and directory management modules are a
real mess.

Right, you can do it with a custom importer and wrapper
functions over all file and directory functions. But that's
a mess over a mess and any mess is *bad* for security.
The way out of the mess is probably a filepath object that
consolidates all access to files and directories.
If your point is that the standard library should
be designed with security in mind, I agree with you.
One step in that direction is to design everything OO.
OO design plays nice with capabilities.
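
A minimal sketch of how a single filepath-style object can double as a capability: holding the object is the authority to touch that directory and nothing above it. The class name `DirCapability` and its methods are hypothetical, not the design from any proposal in this thread. In plain Python `_root` remains reachable, so real enforcement would still depend on the restricted environment hiding underscore-prefixed attributes.

```python
import tempfile
from pathlib import Path


class DirCapability:
    """Authority to read and write files beneath one directory."""

    def __init__(self, root):
        self._root = Path(root).resolve()

    def _resolve(self, name):
        # Resolve "..", absolute names, and symlinks, then check containment.
        target = (self._root / name).resolve()
        if target != self._root and self._root not in target.parents:
            raise PermissionError(f"{name!r} is outside this directory")
        return target

    def read_text(self, name):
        return self._resolve(name).read_text()

    def write_text(self, name, data):
        self._resolve(name).write_text(data)

    def subdir(self, name):
        # Attenuation: hand out a narrower capability, never a wider one.
        return DirCapability(self._resolve(name))


with tempfile.TemporaryDirectory() as tmp:
    files = DirCapability(tmp)
    files.write_text("notes.txt", "hello")
    contents = files.read_text("notes.txt")
    try:
        files.read_text("../outside.txt")
        escaped = True
    except PermissionError:
        escaped = False
```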

-- Serge.
 
John Roth

Serge Orlov said:
Right, you can do it with a custom importer and wrapper
functions over all file and directory functions. But that's
a mess over a mess and any mess is *bad* for security.
The way out of the mess is probably a filepath object that
consolidates all access to files and directories.
If your point is that the standard library should
be designed with security in mind, I agree with you.
One step in that direction is to design everything OO.
OO design plays nice with capabilities.

-- Serge.

Sean Ross took a pass at this idea in the thread
"Finding File Size" starting on 1/1. That got renamed
to "Filename Type" somewhere fairly quickly.

There's now a pre-PEP for the notion at http://tinyurl.com/2578q,
thanks to Gerrit Holl.

John Roth
 
Sean R. Lynch

Serge said:
Right, I missed this simple rule. My mind is still confined by my recent
attempt to add security by only translating bytecode without any changes
to the interpreter.

You were translating bytecode rather than working with ASTs? That would
be hard to maintain, considering that Zope found it too difficult to
maintain even manipulating concrete syntax trees. Also, I don't really
consider that I'm modifying the interpreter, I'm just giving the
interpreter a different globals dict.
Are you going to create multiple __builtins__, or are you just going
to get rid of any global objects in __builtins__? The first lets you
handle str.encode the right way.

I'm not sure what you mean by this. I'm creating a dict for
__builtins__, but AFAIK it's not possible for code to modify the
__builtins__ dict other than through the name __builtins__, which starts
with an underscore and so is invalid. All of the objects I have in
__builtins__ right now are immutable within the restricted environment
because they're either functions or classes.
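
A minimal sketch of the two pieces mentioned above: a compile-time check that rejects underscore-prefixed names, and a dict supplied as `__builtins__`. The helpers `check_names` and `run_restricted` and the contents of `SAFE_BUILTINS` are hypothetical. The check is deliberately incomplete (it ignores import statements, function and argument names, and string-based lookups), so it illustrates the rule rather than providing a sandbox.

```python
import ast

SAFE_BUILTINS = {"len": len, "range": range, "sum": sum}  # illustrative choice


def check_names(source):
    """Parse source and reject names or attributes starting with "_"."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise SyntaxError(f"name {node.id!r} is not allowed")
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            raise SyntaxError(f"attribute {node.attr!r} is not allowed")
    return tree


def run_restricted(source):
    """Run checked code with its own globals and builtins dicts."""
    tree = check_names(source)
    env = {"__builtins__": dict(SAFE_BUILTINS)}
    exec(compile(tree, "<restricted>", "exec"), env)
    return env


env = run_restricted("total = sum(range(4))")
# run_restricted("__builtins__['open'] = None") raises SyntaxError
```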

Python modules that are imported in the restricted environment will be
read-only and each domain will get its own copy. This should prevent
leaks caused by two domains importing the same module and then
performing operations that affect the state of the module. Modules will
need to explicitly specify what names they want to export the same way
classes do in order to prevent inadvertent leaks.
Can C modules be imported more than once in CPython?

Not that I'm aware of, which is why they will need to be audited for
mutable state and other sources of leaks and excess privilege. C modules
that we need that have problems will get proxies the same way E has
proxies for Swing.
 
Sean R. Lynch

Ok, so how do I handle overriding methods?

Allowing code that inherits from a class to override any method it can
access could result in security problems if the parent doesn't expect a
given method to be overridden. Shadowing the overridden method could
result in unexpected behavior. And, of course, in some cases, we *want*
to allow the programmer to override methods, because this is what code
reuse is all about.

I already have a solution in mind for __init__: provide another method
that is always called on every superclass for any object that is created
from a class that inherits from that superclass. The method takes no
arguments and is called after __init__.
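
A minimal sketch of that `__init__` solution using a metaclass, assuming the hook is an ordinary method that takes no arguments besides self. The names `GuardedMeta` and `after_init` are hypothetical. Each class's own hook is looked up in that class's `__dict__`, so a subclass that defines the hook adds to the chain instead of replacing the parent's.

```python
class GuardedMeta(type):
    """Call every class's own after_init hook once __init__ has run."""

    def __call__(cls, *args, **kwargs):
        obj = super().__call__(*args, **kwargs)   # runs __new__ and __init__
        for klass in reversed(cls.__mro__):       # base classes first
            hook = klass.__dict__.get("after_init")
            if hook is not None:
                hook(obj)
        return obj


class Base(metaclass=GuardedMeta):
    def __init__(self):
        self.log = []

    def after_init(self):
        self.log.append("Base")


class Child(Base):
    def __init__(self):
        self.log = []          # deliberately does not call Base.__init__

    def after_init(self):      # does not suppress Base.after_init
        self.log.append("Child")


order = Child().log
```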

Another option is to simply not provide any protection from inheriting
classes. The more I think about it, the better this sounds. Classes
don't contain any data other than perhaps class data, and a programmer
shouldn't be allowing another programmer to subclass classes that
contain data they want to protect. I could put all attributes back to
being protected (rather than private) by default, and only have a single
extra declaration required in a class statement to declare what
attributes you want to make public.

Can anyone (Serge?) think of an example of a case where this might cause
leaks or privilege escalation? Remember that classes are opaque, so code
can't get at unbound methods. Subclassing a class doesn't give you any
special access to members of that class.
 
