Sandboxed Python: memory limits?

Chris Angelico

Is it possible, and if so is it easy, to limit the amount of memory an
embedded Python interpreter is allowed to allocate? I don't want to
ulimit/rlimit the process if I don't have to (or rather, I want the
process's limit to be high, and the Python limit much lower), but just
to force Python to throw MemoryError sooner than it otherwise would
(my code can then gracefully deal with the exception).

Google turned up this thread:
http://stackoverflow.com/questions/1760025/limit-python-vm-memory

The answers given include resource.setrlimit (which presumably goes
straight back to the API, which will affect the whole process), and a
simple counter (invasive to the code). But I want something that I can
impose from the outside.
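
For reference, resource.setrlimit is essentially a thin wrapper around
the C-level setrlimit(2) call, which is why the cap applies to the whole
process rather than just the interpreter. A minimal sketch of the
equivalent call from the embedding program - the helper name and the
256MB figure are purely illustrative:

#include <sys/resource.h>
#include <stdio.h>

/* Cap the address space of the *entire* process; once the limit is
   hit, every allocation fails, Python or not. */
static int cap_process_memory(rlim_t max_bytes)
{
    struct rlimit rl;
    rl.rlim_cur = max_bytes;   /* soft limit */
    rl.rlim_max = max_bytes;   /* hard limit */
    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}

/* e.g. cap_process_memory(256 * 1024 * 1024) before Py_Initialize() */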

I have a vague memory of reading somewhere that it's possible to
replace the Python memory allocator. This would be an option, if
there's no simple way to say "your maximum is now 16MB", but I now
can't find it again. Was I hallucinating?

Hoping not to reinvent any wheels today!

Thanks!

Chris Angelico
 
Martin v. Loewis

I have a vague memory of reading somewhere that it's possible to
replace the Python memory allocator. This would be an option, if
there's no simple way to say "your maximum is now 16MB", but I now
can't find it again. Was I hallucinating?

You can adjust the implementations of PyMem_Malloc and PyObject_Malloc.
This would catch many allocations, but not all of them. If you adjust
PyMem_MALLOC instead of PyMem_Malloc, you catch even more allocations -
but extension modules which directly call malloc() would still bypass
this accounting.
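
As a rough illustration of what such an instrumented allocator could
look like - the names, the fixed 16MB budget and the size-header trick
are mine rather than anything in CPython, and realloc, thread safety
and strict alignment are all glossed over:

#include <stdlib.h>
#include <string.h>

static size_t sandbox_budget = 16 * 1024 * 1024;  /* "your maximum is now 16MB" */
static size_t sandbox_used = 0;

/* Candidate body for a redefined PyMem_MALLOC: refuse the request once
   the budget is exhausted, so Python raises MemoryError early. */
void *sandbox_malloc(size_t n)
{
    unsigned char *p;
    if (n > sandbox_budget - sandbox_used)
        return NULL;                        /* over budget -> MemoryError */
    p = malloc(n + sizeof(size_t));
    if (p == NULL)
        return NULL;
    memcpy(p, &n, sizeof(size_t));          /* stash the size for the free side */
    sandbox_used += n;
    return p + sizeof(size_t);
}

/* Matching body for PyMem_FREE, so released memory is credited back. */
void sandbox_free(void *ptr)
{
    unsigned char *p;
    size_t n;
    if (ptr == NULL)
        return;
    p = (unsigned char *)ptr - sizeof(size_t);
    memcpy(&n, p, sizeof(size_t));
    sandbox_used -= n;
    free(p);
}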

Regards,
Martin
 
Chris Angelico

You can adjust the implementations of PyMem_Malloc and PyObject_Malloc.
This would catch many allocations, but not all of them. If you adjust
PyMem_MALLOC instead of PyMem_Malloc, you catch even more allocations -
but extension modules which directly call malloc() would still bypass
this accounting.

I'm not too concerned about extensions, here; in any case, I lock most
of them off. I just want to prevent stupid stuff like this:

a='a'
while True:
    a+=a

from bringing the entire node to its knees. Obviously that will
eventually bomb with MemoryError, but I'd rather it be some time
*before* the poor computer starts thrashing virtual memory.

(Hmm. I tried the above code in Python 2.6.6 on my scratch box, with
3GB of memory, and it actually died with "OverflowError: strings are
too large to concat" at 1GB. Must be the 32-bit Python on there, heh.
But repeating the exercise in the same Python with a second variable
produces the expected MemoryError.)

If it's too difficult, I'll probably just tell my boss that we need
8GB of physical memory in these things, and then disable virtual
memory. That'll ensure that MemoryError happens before the hard disk
starts grinding performance into dust :)

Chris Angelico
 
Chris Angelico

I'm not too concerned about extensions, here; in any case, I lock most
of them off. I just want to prevent stupid stuff like this:

a='a'
while True:
    a+=a

from bringing the entire node to its knees. Obviously that will
eventually bomb with MemoryError, but I'd rather it be some time
*before* the poor computer starts thrashing virtual memory.

To clarify: One node will be hosting multiple clients' code, and if it
runs out of physical memory, performance for everyone else will be
severely impacted. So I'm hoping to restrict the script's ability to
consume all of memory, without (preferably) ulimit/rlimiting the
entire process (which does other things as well). But if it can't be,
it can't be.

Chris Angelico
 
Martin v. Loewis

On 07.04.2011 02:06, Chris Angelico wrote:
I'm not too concerned about extensions, here; in any case, I lock most
of them off. I just want to prevent stupid stuff like this:

a='a'
while True:
    a+=a

That would certainly be caught by instrumenting PyObject_MALLOC. More
generally, I believe that if you instrument the functions I mentioned,
your use case is likely covered.

Regards,
Martin
 
David Bolen

Chris Angelico said:
So I'm hoping to restrict the script's ability to
consume all of memory, without (preferably) ulimit/rlimiting the
entire process (which does other things as well). But if it can't be,
it can't be.

Just wondering, but rather than spending the energy to cap Python's
allocations internally, could similar effort instead be directed at
separating the "other things" the same process is doing? How tightly
coupled is it? If you could split off just the piece you need to
limit into its own process, then you get all the OS tools at your
disposal to restrict the resources of that process.

Depending on what the "other" things are, it might not be too hard to
split apart, even if you have to use some IPC mechanism to coordinate
between the two pieces. It might well be of the same order of
magnitude as tweaking Python to limit memory internally.
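
For what it's worth, a bare-bones sketch of that split on a POSIX
system, assuming the Python piece really can be pushed out into a child
process - the function name, the limit plumbing and the error handling
are placeholders rather than anything from the actual project:

#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run only the Python work in a child process with its own rlimit;
   the parent keeps doing the "other things" with no cap. */
int run_python_job_limited(rlim_t max_bytes)
{
    pid_t pid = fork();
    if (pid == 0) {                     /* child: capped */
        struct rlimit rl;
        rl.rlim_cur = max_bytes;
        rl.rlim_max = max_bytes;
        setrlimit(RLIMIT_AS, &rl);
        /* ... Py_Initialize(); run the client script; ... */
        _exit(0);
    }
    if (pid > 0) {                      /* parent: uncapped */
        int status;
        waitpid(pid, &status, 0);
        return status;
    }
    return -1;                          /* fork() failed */
}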

-- David
 
Chris Angelico

Just wondering, but rather than spending the energy to cap Python's
allocations internally, could similar effort instead be directed at
separating the "other things" the same process is doing?  How tightly
coupled is it?  If you could split off just the piece you need to
limit into its own process, then you get all the OS tools at your
disposal to restrict the resources of that process.

Well, what happens is roughly this:

Process begins doing a lengthy operation.
Python is called upon to generate data to use in that.
C collects the data Python generated, reformats it, and stores it in
a database (on another machine).
C then proceeds to use the data, doing lots of further manipulation and
processing that culminates in another entry going into the database.

The obvious way to split it would be to send it to the database twice,
separately, as described above (the current code optimizes it down to
a single INSERT at the bottom, keeping it in RAM until then). This
would work, but it seems like a fair amount of extra effort (including
extra load on our central database server) to achieve what I'd have
thought would be fairly simple.

I think it's going to be simplest to use a hardware solution - throw
heaps of RAM at the boxes and then just let them do what they like. We
already have measures to ensure that one client's code can't "be evil"
repeatedly in a loop, so I'll just not worry too much about this
check. (The project's already well past its deadlines - mainly not my
fault! - and if I tell my boss "We'd have to tinker with Python's
internals to do this", he's going to put the kybosh on it in two
seconds flat.)

Thanks for the information, all!

Chris Angelico
 
