Embedding Python in Python

Phil Frost

You probably want something like this:

globalDict = {}
exec(stringOfPythonCodeFromUser, globalDict)

globalDict is now the global namespace of whatever was in
stringOfPythonCodeFromUser, so you can grab values from that and
selectively import them into your namespace.
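A minimal sketch of the idea above, written for Python 3 (the snippets in this thread are Python 2). The names user_source, wanted and imported are made up for illustration and are not from the original post.

```python
# Hypothetical user-supplied source; in practice this string comes from the user.
user_source = """
greeting = 'hello'

def double(x):
    return 2 * x
"""

global_dict = {}
exec(user_source, global_dict)  # global_dict is now the code's global namespace

# exec() also inserts a '__builtins__' entry, so copy out only the names you expect.
wanted = ("greeting", "double")
imported = {name: global_dict[name] for name in wanted if name in global_dict}

print(imported["greeting"])
print(imported["double"](21))
```

Copying out a fixed list of names keeps the user's other globals from leaking into your namespace. It does nothing to limit what the code can do while it runs.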
 
Paul Rubin

Robey Holderith said:
Anyone know a good way to embed python within python?
No.

I'd like to allow user-defined scriptable objects. I'd
like to give them access to modify pieces of my classes.
I'd like to disallow access to pretty much the rest of
the modules.

There was a feature called rexec/Bastion for that purpose in older
versions of Python, but it was removed because it was insecure.
Any ideas/examples?

Run your sensitive stuff in a separate process (or separate computer)
and allow the hostile clients to communicate through sockets.
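A rough Python 3 sketch of the separate-process idea. It talks to the child over pipes rather than sockets, purely to keep the example short. The function name run_in_child and the 5 second default are assumptions. A child process started this way still has all the privileges of the parent's user, so this is isolation of state, not a sandbox.

```python
import subprocess
import sys

def run_in_child(source, timeout_seconds=5):
    """Run source in a fresh interpreter; return (exit_code, stdout, stderr)."""
    completed = subprocess.run(
        # -I is isolated mode: PYTHON* environment variables and the user
        # site-packages directory are ignored.
        [sys.executable, "-I", "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired if exceeded
    )
    return completed.returncode, completed.stdout, completed.stderr

code, out, err = run_in_child("print(2 + 2)")
print(code, out.strip())
```

The parent only ever sees text coming back, so nothing the child does to its own namespace can touch the parent's objects.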
 
JCM

Paul Rubin said:
There was a feature called rexec/Bastion for that purpose in older
versions of Python, but it was removed because it was insecure.
Run your sensitive stuff in a separate process (or separate computer)
and allow the hostile clients to communicate through sockets.

If you're concerned about security, another possibility is to parse
the user's code and look for anything potentially dangerous. You'll
need to be aggressive, but I believe it's possible. For example,
disallow exec statements, the identifier "eval", any identifier of
__this__ form, import statements, etc. This is overly restrictive,
but it will provide security.
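For illustration, here is one way the four rules listed above could be written with Python 3's ast module. This is a sketch, not JCM's checker. It assumes that attribute names count as identifiers, and it bans exec by name because exec is a function, not a statement, in Python 3. BANNED_NAMES, is_dunder and find_violations are made-up names.

```python
import ast

BANNED_NAMES = {"eval", "exec"}  # assumption: exec banned by name in Python 3

def is_dunder(name):
    """True for identifiers of the __this__ form."""
    return len(name) > 4 and name.startswith("__") and name.endswith("__")

def find_violations(source):
    """Return (line, reason) pairs; an empty list means no rule fired."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            violations.append((node.lineno, "import statement"))
            continue
        if isinstance(node, ast.Name):
            ident = node.id
        elif isinstance(node, ast.Attribute):
            ident = node.attr
        else:
            continue
        if ident in BANNED_NAMES:
            violations.append((node.lineno, "banned identifier " + ident))
        elif is_dunder(ident):
            violations.append((node.lineno, "dunder identifier " + ident))
    return violations

print(find_violations("x = 1 + 2"))
print(find_violations("import os"))
print(find_violations("e = vars()['__builtins__'].eval"))
```

The checker only looks at names and attributes. String constants, function definitions and keyword arguments are not inspected, so an empty result means "no rule fired", not "safe".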
 
Robey Holderith

Anyone know a good way to embed python within python?

Now before you tell me that's silly, let me explain
what I'd like to do.

I'd like to allow user-defined scriptable objects. I'd
like to give them access to modify pieces of my classes.
I'd like to disallow access to pretty much the rest of
the modules.

Any ideas/examples?

-Robey
 
Phil Frost

No. An easy way to escape that is to start one's code with
'del __builtins__', then python will add the default __builtins__ back
to the namespace. Restricting what arbitrary code can do has been
discussed many, many times, and it seems there is no way to do it short
of reimplementing a python interpreter.
 
Paul Rubin

JCM said:
If you're concerned about security, another possibility is to parse
the user's code and look for anything potentially dangerous. You'll
need to be aggressive, but I believe it's possible. For example,
disallow exec statements, the identifier "eval", any identifier of
__this__ form, import statements, etc. This is overly restrictive,
but it will provide security.

By the time you're done with all that, you may as well design a new
restricted language and interpret just that.

Hint:
e = vars()['__builtins__'].eval
print e('2+2')

Even Java keeps getting new holes found, and Python is not anywhere
near Java when it comes to this kind of thing.
 
JCM

By the time you're done with all that, you may as well design a new
restricted language and interpret just that.
Hint:
e = vars()['__builtins__'].eval
print e('2+2')
Even Java keeps getting new holes found, and Python is not anywhere
near Java when it comes to this kind of thing.

I don't think it's as difficult as you think. Your snippet of code
would be rejected by the rules I suggested. You'd also want to
prohibit other builtins like compile, execfile, input, reload, vars,
etc.
 
Robey Holderith

You probably want something like this:

globalDict = {}
exec(stringOfPythonCodeFromUser, globalDict)

globalDict is now the global namespace of whatever was in
stringOfPythonCodeFromUser, so you can grab values from that and
selectively import them into your namespace.

So using this (with a little additional reading) it looks like I
can do this:

globalDict = {'__builtins__': <my modules here>}
exec(<pythonCodeFromUser>, globalDict)

And that this will disallow both importing of new modules and direct
access to my namespace. It will however allow access to the

Would this be secure?

Paul, what's your take on this?

-Robey
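To make the mechanics of that question concrete, here is a small Python 3 sketch that passes a whitelist dict as '__builtins__'. The helper run_with_builtins and the contents of allowed are assumptions for illustration. The sketch only shows what the substitution blocks on its face. It does not show that the result is secure.

```python
def run_with_builtins(source, allowed):
    """Execute source with the dict `allowed` standing in for the builtins."""
    namespace = {"__builtins__": allowed}
    exec(source, namespace)
    return namespace

allowed = {"abs": abs, "len": len}  # hypothetical whitelist

ns = run_with_builtins("result = abs(-3) + len('ab')", allowed)
print(ns["result"])

# Neither attempt gets far: import needs __import__ from the builtins,
# and open is simply not a known name.
for attempt in ("import sys", "open('out.tmp', 'w')"):
    try:
        run_with_builtins(attempt, allowed)
    except (ImportError, NameError) as exc:
        print(attempt, "->", type(exc).__name__)
```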
 
Jack Diederich

By the time you're done with all that, you may as well design a new
restricted language and interpret just that.
Hint:
e = vars()['__builtins__'].eval
print e('2+2')
Even Java keeps getting new holes found, and Python is not anywhere
near Java when it comes to this kind of thing.

I don't think it's as difficult as you think. Your snippet of code
would be rejected by the rules I suggested. You'd also want to
prohibit other builtins like compile, execfile, input, reload, vars,
etc.
foo = "ev" + "al"
e = vars()['__builtins__'].__dict__[foo]
print e('2+2')

This is a job for the operating system and not python.
Google groups for rexec and Bastion if you want to read ten lengthy
discussions of why this is the OS's job.

-Jack
 
JCM

Jack Diederich said:
I don't think it's as difficult as you think. Your snippet of code
would be rejected by the rules I suggested. You'd also want to
prohibit other builtins like compile, execfile, input, reload, vars,
etc.
foo = "ev" + "al"
e = vars()['__builtins__'].__dict__[foo]
print e('2+2')

Also would be rejected by my original set of rules (can't use
__dict__). But I'd disallow vars too.
 
JCM

I'm going to have to agree with Paul on this one. I do not feel up to
the task of thinking of every possible variant of malicious code. There
are far too many ways of writing the exact same thing. I think it would
be much easier to write my own interpreter.

Well it certainly isn't easier to write your own interpreter if you're
talking about the effort you'd need to put into it. And I'm not
convinced it's that tricky to come up with a set of syntax rules to
decide whether a piece of code is simple/safe enough to run. It
basically comes down to disallowing certain statements and certain
identifiers. Of course you'll end up rejecting a lot of code that
isn't malicious.

If you're interested enough, I'll try to throw a safety-checker
together. You'd have to be pretty interested though (I'm lazy).
 
Robey Holderith

By the time you're done with all that, you may as well design a new
restricted language and interpret just that.
Hint:
e = vars()['__builtins__'].eval
print e('2+2')
Even Java keeps getting new holes found, and Python is not anywhere
near Java when it comes to this kind of thing.

I don't think it's as difficult as you think. Your snippet of code
would be rejected by the rules I suggested. You'd also want to
prohibit other builtins like compile, execfile, input, reload, vars,
etc.

I'm going to have to agree with Paul on this one. I do not feel up to
the task of thinking of every possible variant of malicious code. There
are far too many ways of writing the exact same thing. I think it would
be much easier to write my own interpreter.

-Robey
 
Jack Diederich

Jack Diederich said:
I don't think it's as difficult as you think. Your snippet of code
would be rejected by the rules I suggested. You'd also want to
prohibit other builtins like compile, execfile, input, reload, vars,
etc.
foo = "ev" + "al"
e = vars()['__builtins__'].__dict__[foo]
print e('2+2')

Also would be rejected by my original set of rules (can't use
__dict__). But I'd disallow vars too.

Google groups for this topic, it's been dead horse kicked.
You would have to eliminate getattr too and any C func that can
result in an infinite loop.

Not-python's-job-ly,

-Jack
 
JCM

Jack Diederich said:
On Wed, Aug 18, 2004 at 07:44:47PM +0000, JCM wrote: ...
I don't think it's as difficult as you think. Your snippet of code
would be rejected by the rules I suggested. You'd also want to
prohibit other builtins like compile, execfile, input, reload, vars,
etc.

foo = "ev" + "al"
e = vars()['__builtins__'].__dict__[foo]
print e('2+2')

Also would be rejected by my original set of rules (can't use
__dict__). But I'd disallow vars too.
Google groups for this topic, it's been dead horse kicked.
You would have to eliminate getattr too and any C func that can
result in an infinite loop.

Infinite loops (and other resource use) are a different story, not
addressed by source code inspection. I worked on a project which
needed to run untrusted code, and we dealt with the infinite-loop
situation by always running untrusted code on the main thread and
signalling it if it took too long to execute (this worked on unix--I
don't know what you'd do on Windows). I realize this could leave data
in a bad state. Infinite loops are harder to deal with.
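A sketch of the signal approach described above, in Python 3 and for Unix only. This is not the code from JCM's project. TimeLimitExceeded and run_with_time_limit are made-up names. It has to be called from the main thread, because that is where Python delivers signals.

```python
import signal

class TimeLimitExceeded(Exception):
    """Raised in the main thread when the alarm goes off."""

def _on_alarm(signum, frame):
    raise TimeLimitExceeded()

def run_with_time_limit(source, seconds, namespace=None):
    """exec() source, interrupting it with SIGALRM after `seconds` seconds."""
    namespace = {} if namespace is None else namespace
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # whole seconds only
    try:
        exec(source, namespace)
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
    return namespace

try:
    run_with_time_limit("while True:\n    pass", 1)
except TimeLimitExceeded:
    print("interrupted")
```

As noted above, the interrupted code may leave data half-updated. The exception is also raised inside the untrusted code, so a bare except in a loop there can swallow it.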
 
Robey Holderith

No. An easy way to escape that is to start one's code with
'del __builtins__', then python will add the default __builtins__ back
to the namespace. Restricting what arbitrary code can do has been
discussed many, many times, and it seems there is no way to do it short
of reimplementing a python interpreter.

Out of curiosity I tried the following in 2.3.4


#------Begin Code

import random

globalDict = {'__builtins__':random}
localDict = {}
execfile("test2.py", globalDict, localDict)

print globalDict
print localDict

localDict['move']()

#------- End Code


Where test2.py looked like this:


#---------Begin Code

print __builtins__

try:
    del __builtins__
    print 'del worked'
except:
    pass

try:
    exec('del __builtins__')
    print('exec del worked')
except:
    pass

try:
    import sys
    print 'Import Worked'
except:
    pass

try:
    f = file('out.tmp','w')
    f.write('asdfasdf')
    f.close()
    print 'File Access Worked'
except:
    pass

seed()

def move():
    print __builtins__

#------ End Code

I'm sure it has a crack in it somewhere, but it doesn't
seem to be del __builtins__ .

-Robey
 
Robey Holderith

I've found the crack in the armor. See additions below.

-Robey

Where test2.py looked like this:


#---------Begin Code

print __builtins__

try:
    del __builtins__
    print 'del worked'
except:
    pass

try:
    exec('del __builtins__')
    print('exec del worked')
except:
    pass

try:
    import sys
    print 'Import Worked'
except:
    pass

try:
    f = file('out.tmp','w')
    f.write('asdfasdf')
    f.close()
    print 'File Access Worked'
except:
    pass

seed()

def move():
    #Add the following for a nice security hole
    global __builtins__
    del __builtins__
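The same hole can be reproduced in Python 3 with a short sketch. One plausible reading of the earlier test, and it is only an assumption, is that the top-level del failed because the code ran with a separate locals dict, so the del looked in the locals and found nothing. Declaring the name global inside a function makes the del hit the globals dict. The names restricted and local_ns are made up.

```python
restricted = {"__builtins__": {}}  # empty stand-in for the builtins
local_ns = {}

# With a separate locals dict, a top-level del targets the locals and fails.
try:
    exec("del __builtins__", restricted, local_ns)
    top_level_del_worked = True
except NameError:
    top_level_del_worked = False

# Inside a function, `global` makes the del remove the key from the globals.
user_source = """
def move():
    global __builtins__
    del __builtins__
"""
exec(user_source, restricted, local_ns)
local_ns["move"]()
removed = "__builtins__" not in restricted

# The next exec() on the same dict finds no '__builtins__' key and inserts
# the real builtins, so import works again.
exec("import sys\nmajor = sys.version_info[0]", restricted)
print(top_level_del_worked, removed, restricted["major"])
```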
 
Paul Rubin

JCM said:
need to be aggressive, but I believe it's possible. For example,
disallow exec statements, the identifier "eval", any identifier of
__this__ form, import statements, etc. This is overly restrictive,
but it will provide security.
Hint:
e = vars()['__builtins__'].eval
print e('2+2')

I don't think it's as difficult as you think. Your snippet of code
would be rejected by the rules I suggested. You'd also want to
prohibit other builtins like compile, execfile, input, reload, vars, etc.

I don't see how. Your rules were to disallow:

1) exec statements. My example doesn't use it.

2) eval identifier. My example uses eval as an attribute and not an
identifier. You can eliminate the use of eval as an attribute with
e = getattr(vars()['__builtins__'], 'ev'+'al').
Now not even the string 'eval' appears in one piece.

3) identifiers like __this__. My example doesn't use any. It
uses a constant string of that form, not an identifier. The
string could be computed instead, like the eval example above.

4) import statements. My example doesn't use them.

Conclusion: my example gets past your suggested rules. I also didn't
use compile, execfile, input, or reload. I did use vars but there are
probably other ways to do the same thing. You can't take something
full of holes and start plugging holes until you think you found them
all. You have to start with something that has no holes. The Python
crowd has been through this many times already; do some searches for
rexec/Bastion security.
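The argument above can be checked mechanically. The Python 3 sketch below encodes the four rules with the ast module, counting attribute names as identifiers (the stricter reading), and then runs the getattr variant through it. passes_four_rules is a made-up name. Setting '__builtins__' to the builtins module is an assumption made so that the snippet behaves as it would at the top level of a Python 2 script.

```python
import ast
import builtins

def passes_four_rules(source):
    """True if there is no import, no eval/exec identifier, no __dunder__ identifier."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            return False
        if isinstance(node, ast.Name):
            ident = node.id
        elif isinstance(node, ast.Attribute):
            ident = node.attr
        else:
            continue  # string constants are not identifiers, so they are skipped
        if ident in ("eval", "exec"):
            return False
        if ident.startswith("__") and ident.endswith("__"):
            return False
    return True

snippet = (
    "e = getattr(vars()['__builtins__'], 'ev' + 'al')\n"
    "result = e('2+2')\n"
)

accepted = passes_four_rules(snippet)
namespace = {"__builtins__": builtins}  # assumption: builtins exposed as a module
exec(snippet, namespace)
print(accepted, namespace["result"])
```

The snippet is accepted and still reaches eval, because the rules inspect identifiers while the lookup happens through strings at run time.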
 
Robey Holderith

Well it certainly isn't easier to write your own interpreter if you're
talking about the effort you'd need to put into it. And I'm not
convinced it's that tricky to come up with a set of syntax rules to
decide whether a piece of code is simple/safe enough to run. It
basically comes down to disallowing certain statements and certain
identifiers. Of course you'll end up rejecting a lot of code that
isn't malicious.

If you're interested enough, I'll try to throw a safety-checker
together. You'd have to be pretty interested though (I'm lazy).


Don't do it on my behalf. I started far too many projects doing something
similar before I realized that the only effective way to do security was
from the bottom up. The problem looks something like this (assuming each
function has 10 places where it is implemented):

Level | Malicious Variation Count
-----------------------------------------
0 | 10^0
1 | 10^1
2 | 10^2
x | 10^x

Suffice it to say that in simple code... it is doable. In a
mature interpreter... nearly impossible.

-Robey
 
