Python memory handling

frederic.pica

Greets,

I'm having some trouble getting my memory freed by Python; how can I force
it to release the memory?
I've tried del and gc.collect() with no success.
Here is a code sample, parsing an XML file under Linux with Python 2.4
(same problem with Windows 2.5, tried with the first example):
#Python interpreter memory usage : 1.1 Mb private, 1.4 Mb shared
#Using http://www.pixelbeat.org/scripts/ps_mem.py to get memory information
import cElementTree as ElementTree #meminfo: 2.3 Mb private, 1.6 Mb shared
import gc #no memory change

et=ElementTree.parse('primary.xml') #meminfo: 34.6 Mb private, 1.6 Mb shared
del et #no memory change
gc.collect() #no memory change

So how can I free the 32.3 Mb taken by ElementTree ??
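
A rough way to take similar readings from inside the process itself (a
sketch, Linux-only; it reads VmRSS from /proc/self/status, which is not the
same breakdown as ps_mem.py's private/shared figures):

def rss_kb():
    # Resident set size in kB, parsed from /proc/self/status (Linux only).
    f = open('/proc/self/status')
    try:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])  # second field is the size in kB
    finally:
        f.close()
    return None

print 'resident set size: %s kB' % rss_kb()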

The same problem here with a simple file.readlines()
#Python interpreter memory usage : 1.1 Mb private, 1.4 Mb shared
import gc #no memory change
f=open('primary.xml') #no memory change
data=f.readlines() #meminfo: 12 Mb private, 1.4 Mb shared
del data #meminfo: 11.5 Mb private, 1.4 Mb shared
gc.collect() # no memory change

But works great with file.read() :
#Python interpreter memory usage : 1.1 Mb private, 1.4 Mb shared
import gc #no memory change
f=open('primary.xml') #no memory change
data=f.read() #meminfo: 7.3Mb private, 1.4 Mb shared
del data #meminfo: 1.1 Mb private, 1.4 Mb shared
gc.collect() # no memory change

So as far as I can see, Python maintains a memory pool for lists.
In my first example, if I reparse the XML file, the memory doesn't
grow very much (0.1 Mb, precisely).
So I think I'm right about the memory pool.

But is there a way to force Python to release this memory?

Regards,
FP
 
Marc 'BlackJack' Rintsch

frederic.pica said:
So as far as I can see, Python maintains a memory pool for lists.
In my first example, if I reparse the XML file, the memory doesn't
grow very much (0.1 Mb, precisely).
So I think I'm right about the memory pool.

But is there a way to force Python to release this memory?

AFAIK not. But why is this important as long as the memory consumption
doesn't grow constantly? The virtual memory management of the operating
system usually ensures that only actually used memory is in physical
RAM.

Ciao,
Marc 'BlackJack' Rintsch
 
frederic.pica

AFAIK not. But why is this important as long as the memory consumption
doesn't grow constantly? The virtual memory management of the operating
system usually ensures that only actually used memory is in physical
RAM.

Ciao,
Marc 'BlackJack' Rintsch

Because I'm an adept of "small is beautiful"; of course the OS will swap
out the unused memory if needed.
If I daemonize this application I will have a constant 40 Mb used, not
free for other applications. If another application needs this
memory, the OS will have to swap and lose time for the other
application... And I'm not sure that the system will swap this
unused memory first; it could also swap another application first... AFAIK.
And these 40 Mb are only for a 7 Mb XML file; what about parsing a big
one, like 50 Mb?

I would have preferred to have the choice of manually freeing this
unused memory, or of manually setting the size of the memory pool.

Regards,
FP
 
Paul Melis

Hello,

I'm having some trouble getting my memory freed by Python; how can I force
it to release the memory?
I've tried del and gc.collect() with no success.
[...]

The same problem here with a simple file.readlines()
#Python interpreter memory usage : 1.1 Mb private, 1.4 Mb shared
import gc #no memory change
f=open('primary.xml') #no memory change
data=f.readlines() #meminfo: 12 Mb private, 1.4 Mb shared
del data #meminfo: 11.5 Mb private, 1.4 Mb shared
gc.collect() # no memory change

But works great with file.read() :
#Python interpreter memory usage : 1.1 Mb private, 1.4 Mb shared
import gc #no memory change
f=open('primary.xml') #no memory change
data=f.read() #meminfo: 7.3Mb private, 1.4 Mb shared
del data #meminfo: 1.1 Mb private, 1.4 Mb shared
gc.collect() # no memory change

So as far as I can see, Python maintains a memory pool for lists.
In my first example, if I reparse the XML file, the memory doesn't
grow very much (0.1 Mb, precisely).
So I think I'm right about the memory pool.

But is there a way to force Python to release this memory?

This is from the 2.5 series release notes
(http://www.python.org/download/releases/2.5.1/NEWS.txt):

"[...]

- Patch #1123430: Python's small-object allocator now returns an arena to
the system ``free()`` when all memory within an arena becomes unused
again. Prior to Python 2.5, arenas (256KB chunks of memory) were never
freed. Some applications will see a drop in virtual memory size now,
especially long-running applications that, from time to time, temporarily
use a large number of small objects. Note that when Python returns an
arena to the platform C's ``free()``, there's no guarantee that the
platform C library will in turn return that memory to the operating
system.
The effect of the patch is to stop making that impossible, and in tests it
appears to be effective at least on Microsoft C and gcc-based systems.
Thanks to Evan Jones for hard work and patience.

[...]"

So with 2.4 under linux (as you tested) you will indeed not always get
the used memory back, with respect to lots of small objects being
collected.

The difference therefore (I think) you see between doing an f.read() and
an f.readlines() is that the former reads in the whole file as one large
string object (i.e. not a small object), while the latter returns a list
of lines where each line is a python object.
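
A quick way to see that contrast without an XML file (a sketch; the sizes
are arbitrary and the behaviour described is that of Python 2.4's
allocator):

# One large object (like f.read()'s single big string): served by the
# platform malloc() and normally handed back to it when freed.
big = 'x' * (10 * 1024 * 1024)
del big

# Many small objects (like f.readlines()'s list of line strings): served by
# pymalloc arenas, which Python 2.4 never returns to the system.
small = [str(i) * 10 for i in xrange(200000)]
del small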

I wonder how 2.5 would work out on linux in this situation for you.

Paul
 
frederic.pica

Paul Melis said:
So with 2.4 under linux (as you tested) you will indeed not always get
the used memory back, with respect to lots of small objects being
collected.

The difference therefore (I think) you see between doing an f.read() and
an f.readlines() is that the former reads in the whole file as one large
string object (i.e. not a small object), while the latter returns a list
of lines where each line is a python object.

I wonder how 2.5 would work out on linux in this situation for you.


Hello,

I will try later with Python 2.5 under Linux, but as far as I can see,
it's the same problem with my Windows Python 2.5.
After reading this document :
http://evanjones.ca/memoryallocator/python-memory.pdf

I think it's because lists or dictionaries are used by the parser, and
Python uses an internal memory pool (not pymalloc) for them...

Regards,
FP
 
Josh Bloom

If the memory usage is that important to you, you could break this out
into 2 programs, one that starts the jobs when needed, the other that
does the processing and then quits.
As long as the python startup time isn't an issue for you.
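
A sketch of that two-program idea using the subprocess module (the worker
script name 'parse_job.py' is hypothetical):

# Run the heavy parsing in a throwaway child process; when it exits, the
# OS reclaims all of its memory and the parent process stays small.
import subprocess

ret = subprocess.call(['python', 'parse_job.py', 'primary.xml'])
if ret != 0:
    print 'parsing job failed with exit code', ret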
 
frederic.pica

If the memory usage is that important to you, you could break this out
into 2 programs, one that starts the jobs when needed, the other that
does the processing and then quits.
As long as the python startup time isn't an issue for you.


Yes, it's a solution, but I don't think it's a good way; I didn't want to
use bad hacks to work around a Python-specific problem.
And the problem is everywhere, for every Python program that has to manage
big files.
I've tried xml.dom.minidom with a 66 Mb XML file => 675 Mb of memory
that will never be freed. But that time I got many unreachable
objects when running gc.collect().
Using the same file with cElementTree took 217 Mb, with no
unreachable objects.
For me it's not good behavior; it's not a good way to let the system
swap this unused memory instead of freeing it.
I think a memory pool is a really good idea for performance
reasons, but why is there no 'free block' limit?
Python is a really, really good language that can do many things in a
clear, easy and performant way, I think. It has always fit all my
needs. But I can't imagine there is no good solution for that problem:
limiting the free block pool size or, better, letting the user specify
this limit, and even better, letting the user free it completely
(along with manually setting the limit).

Like:
import pool
pool.free()
pool.limit(size in megabytes)

Why not let the user choose that, why not give the user more
flexibility?
I will try later under Linux with the latest stable Python.

Regards,
FP
 
Chris Mellon

Like:
import pool
pool.free()
pool.limit(size in megabytes)

Why not let the user choose that, why not give the user more
flexibility?
I will try later under Linux with the latest stable Python.

Regards,
FP

The idea that memory allocated to a process but not being used is a
"cost" is really a fallacy, at least on modern virtual memory sytems.
It matters more for fully GCed languages, where the entire working set
needs to be scanned, but the Python GC is only for breaking refcounts
and doesn't need to scan the entire memory space.

There are some corner cases where it matters, and that's why it was
addressed for 2.5, but in general it's not something that you need to
worry about.
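
A small illustration of that point (a sketch): the cycle collector only
looks at container objects that might participate in reference cycles, it
does not sweep the whole heap.

import gc

class Node(object):
    pass

a = Node()
b = Node()
a.other = b
b.other = a   # a reference cycle that reference counting alone cannot free
del a, b
print gc.collect(), 'objects found unreachable by the cycle collector'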
 
Thorsten Kampe

* (31 May 2007 06:15:18 -0700)
And I'm not sure that the system will swap this unused memory first;
it could also swap another application first... AFAIK.

Definitely not, this is the principal function of virtual memory in
every Operating System.
 
Thorsten Kampe

* Chris Mellon (Thu, 31 May 2007 12:10:07 -0500)
The idea that memory allocated to a process but not being used is a
"cost" is really a fallacy, at least on modern virtual memory sytems.
It matters more for fully GCed languages, where the entire working set
needs to be scanned, but the Python GC is only for breaking refcounts
and doesn't need to scan the entire memory space.

There are some corner cases where it matters, and that's why it was
addressed for 2.5, but in general it's not something that you need to
worry about.

If it's swapped to disk then this is a big concern. If your Python app
allocates 600 MB of RAM and does not use 550 MB after one minute and
this unused memory gets into the page file then the Operating System
has to allocate and write 550 MB onto your hard disk. Big deal.

Thorsten
 
Klaas

If it's swapped to disk then this is a big concern. If your Python app
allocates 600 MB of RAM and does not use 550 MB after one minute and
this unused memory gets into the page file then the Operating System
has to allocate and write 550 MB onto your hard disk. Big deal.

You have a long-running python process that allocates 550Mb of _small_
objects and then never again uses more than a tenth of that space?

This is an abstract corner case, and points more to a multi-process
design rather than a flaw in python.

The unbounded sizes of Python's int/float freelists are slightly more
annoying problems, but nearly as trivial.

-Mike
 
Matthew Woodcraft

Josh Bloom said:
If the memory usage is that important to you, you could break this out
into 2 programs, one that starts the jobs when needed, the other that
does the processing and then quits.
As long as the python startup time isn't an issue for you.

And if python startup time is an issue, another possibility is to fork
before each job and do the work in the child.
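
A minimal sketch of that approach (assuming a Unix-like system; the
cElementTree parse is just a stand-in for the real job):

import os

def run_job(filename):
    pid = os.fork()
    if pid == 0:
        # Child: do the memory-hungry work, then exit without cleanup.
        import cElementTree as ElementTree
        et = ElementTree.parse(filename)
        # ... process the tree, write results somewhere ...
        os._exit(0)
    else:
        # Parent: just wait; the child's memory goes back to the OS.
        os.waitpid(pid, 0)

run_job('primary.xml')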

-M-
 
Chris Mellon

* Chris Mellon (Thu, 31 May 2007 12:10:07 -0500)

If it's swapped to disk then this is a big concern. If your Python app
allocates 600 MB of RAM and does not use 550 MB after one minute and
this unused memory gets into the page file then the Operating System
has to allocate and write 550 MB onto your hard disk. Big deal.

It happens once, and only in page-sized increments. You'd have to have
unusual circumstances to even notice this "big deal", totally aside
from the unusual and rare conditions that would trigger it.
 
Leo Kislov

Hello,

I will try later with Python 2.5 under Linux, but as far as I can see,
it's the same problem with my Windows Python 2.5.
After reading this document: http://evanjones.ca/memoryallocator/python-memory.pdf

I think it's because lists or dictionaries are used by the parser, and
Python uses an internal memory pool (not pymalloc) for them...

If I understand the document correctly, you should be able to free the
list and dict caches if you create more than 80 new lists and dicts:

[(list(), dict()) for i in range(88)]

If it doesn't help, that means 1) the list & dict caches don't really work
the way I think, or 2) pymalloc cannot return memory because of
fragmentation and that is not simple to "fix".

-- Leo
 
Andrew MacIntyre

Using the same file with cElementTree took 217 Mb, with no
unreachable objects.
For me it's not good behavior; it's not a good way to let the system
swap this unused memory instead of freeing it.
I think a memory pool is a really good idea for performance
reasons, but why is there no 'free block' limit?
Python is a really, really good language that can do many things in a
clear, easy and performant way, I think. It has always fit all my
needs. But I can't imagine there is no good solution for that problem:
limiting the free block pool size or, better, letting the user specify
this limit, and even better, letting the user free it completely
(along with manually setting the limit).

Like:
import pool
pool.free()
pool.limit(size in megabytes)

Why not let the user choose that, why not give the user more
flexibility?

Because it's not easy, and it's an unusual edge case that hasn't attracted
developer effort (the PyMalloc change for 2.5 was contributed by someone
who desperately needed it, not a core Python developer; it was also a
non-trivial effort to get right).

You should also appreciate something about PyMalloc: it only handles
allocation requests of 256 bytes or smaller, and this limitation is part
of PyMalloc's design.

If most of your allocations are >256 bytes, you're at the mercy of the
platform malloc and heap fragmentation can be a killer. This is probably
why the readlines() approach mentioned would appear to relinquish (most
of) the memory: the list probably consisted mostly of PyMalloc
allocations.

I haven't checked, but cElementTree may internally not be using PyMalloc
anyway, as the package is stated to be usable back to Python 1.5 - long
before the current allocation management came into effect. In which
case, you're at the mercy of the platform malloc... The pure Python
ElementTree might play more your way, at a performance cost.

--
 
Nick Craig-Wood

Andrew MacIntyre said:
You should also appreciate something about PyMalloc: it only handles
allocation requests of 256 bytes or smaller, and this limitation is part
of PyMalloc's design.

If most of your allocations are >256 bytes, you're at the mercy of the
platform malloc

You can tweak this, at least if you are using glibc (e.g. under Linux):

http://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html

Setting M_MMAP_THRESHOLD should result in blocks that are perfectly
free()able back to the OS if allocated with malloc(). By default this
is 128k I think so you can set it to 4k and it should help a lot.

Note that an mmap block is a minimum of 4k (under x86 - one OS page
anyway), so set this too small and your program will use a *lot* of
memory, but only temporarily ;-)
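
One way to try this from inside Python (a sketch, assuming glibc and Python
2.5's ctypes; -3 is the value of M_MMAP_THRESHOLD in glibc's <malloc.h>):

import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library('c'))
M_MMAP_THRESHOLD = -3  # from glibc's <malloc.h>

# Serve malloc() requests of 4 kB or more via mmap(), so that free() can
# hand them straight back to the OS; mallopt() returns 1 on success.
if libc.mallopt(M_MMAP_THRESHOLD, 4 * 1024) != 1:
    print 'mallopt() failed or is not available on this platform'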

If PyMalloc stretched up to 4k and M_MMAP_THRESHOLD was set to 4k then
you'd have the perfect memory allocator...
 
Frédéric PICA

Greets,

Sorry for my late answer, Google Groups lost my post...
First, thank you for your explanations about memory handling in the
OS and Python.
I've tried with Python 2.5 under Linux:
For the parsing of a 66 Mb xml file with cElementTree :
When starting python : 2.1 Mb private memory used
import xml.etree.cElementTree as ElementTree #3.4 Mb used
et=ElementTree.parse('otherdata.xml') #218.6 Mb used
del et #43.3 Mb used
et=ElementTree.parse('otherdata.xml') #218.6 Mb used
del et #60.6 Mb used
et=ElementTree.parse('otherdata.xml') #218.6 Mb used
del et #54.1 Mb used
et=ElementTree.parse('otherdata.xml') #218.6 Mb used
del et #54.1 Mb used
et=ElementTree.parse('otherdata.xml') #218.6 Mb used
del et #54.1 Mb used

Why do I have such erratic memory freeing?
I've tried the same test many times with a new interpreter and I get
43.3 Mb after the first free and 54.1 Mb after the others.
If there is a memory pool limit for lists and dicts, why can't I go
back to 43.3 or 54.1 Mb every time?

I've tried using readlines():
When starting python : 2.1 Mb private memory used
f=open('otherdata.xml') #2.2 Mb used
data=f.readlines() #113 Mb used
del data #2.7 Mb used
f.seek(0) #2.7 Mb used
data=f.readlines() #113 Mb used
del data #2.7 Mb used

That time I get good memory handling (by my definition of memory
handling).

So is there a problem with cElementTree ?

I've done a last test with ElementTree :
When starting python : 2.1 Mb private memory used
import xml.etree.ElementTree as ElementTree #3.2 Mb used
et=ElementTree.parse('otherdata.xml') #211.4 Mb used (but very slow :p)
del et #21.4 Mb used
et=ElementTree.parse('otherdata.xml') #211.4 Mb used
del et #29.8 Mb used

So why do I have such differences in memory freeing? Is it only due to
fragmentation?

Anyway, Python 2.5 has better memory handling than 2.4, but it's still
not perfect for me.
I think I haven't really understood the problem with the use of malloc
(fragmentation, ...).

Thanks for your help
Regards,
FP
 
