Python Memory Manager


Pie Squared

I've been looking at the Python source code recently, more
specifically trying to figure out how its garbage collector works.

I've gathered that it uses refcounting as well as some cycle-detection
algorithms, but I haven't been able to figure out some other things.

Does Python actually have a single 'heap' where all the data is
stored? Because PyObject_HEAD seemed to imply to me it was just a
linked list of objects, although perhaps I didn't understand this
correctly.

Also, if it does, how does it deal with memory segmentation? This
question bothers me because I've been trying to implement a moving
garbage collector, and am not sure how to deal with updating all
program pointers to objects on the heap, and thought perhaps an answer
to this question would give me some ideas. (Also, if you know any
resources for things like this, I'd be grateful for links/names)

Thanks in advance,

Pie Squared
 

Paul Rubin

Pie Squared said:
Also, if it does, how does it deal with memory segmentation? This
question bothers me because I've been trying to implement a moving
garbage collector, and am not sure how to deal with updating all
program pointers to objects on the heap, and thought perhaps an answer
to this question would give me some ideas.

As I understand it, Python primarily uses reference counting, with a
mark and sweep scheme for cycle breaking tacked on as an afterthought.
It doesn't move objects in memory during GC so you can get
fragmentation. It's probably not feasible to change this in CPython
without extensive rewriting of CPython and maybe a lot of external C
modules.

Pie Squared also said:
(Also, if you know any
resources for things like this, I'd be grateful for links/names)

If you mean about GC in general, the old book by Jones and Lins is
still standard, I think.
 

Pie Squared

Paul Rubin said:
As I understand it, Python primarily uses reference counting, with a
mark and sweep scheme for cycle breaking tacked on as an afterthought.
It doesn't move objects in memory during GC so you can get
fragmentation. It's probably not feasible to change this in CPython
without extensive rewriting of CPython and maybe a lot of external C
modules.


If you mean about GC in general, the old book by Jones and Lins is
still standard, I think.

Thanks for the quick reply!

That answered my question, and I'll check out the book you're
referring to - it's exactly what I need, I think. Again, thanks!
 

MartinRinehart

I researched this for some Java I wrote. Try to avoid shuffling
physical memory - you'll write a lot less code and it will be faster,
too.

Use an "allocated" list and an "available" list. Keep them in address
order. Inserting (moving list elements from insertion point to end)
and deleting (vice-versa) are near-zero cost operations on Intel
boxes. (Two millis to move a million ints at 1GHz 5 years ago when I
wrote http://www.martinrinehart.com/articles/repz.html - probably half
that today.)

The worst choice is the "best fit" allocation algorithm. (Grabbing
most of a free block leaves a probably useless small fragment; grab
from the first big piece you find instead.) Circular first-fit is
probably best.
(Testing was for compiler-type applications.)

When space is freed, insert a link into the ordered chain of free-space
blocks. Then combine with prev/next blocks if they are free.

And when you find the function in Python that matches Java's
System.arraycopy(), please tell me what it's called. I'm sure there
must be one.
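
For concreteness, here is a rough Python sketch of the address-ordered
free list described above. The names are invented, only the "available"
side is shown, and it uses plain first-fit rather than circular
first-fit:

    import bisect

    class FreeList:
        """Address-ordered list of free blocks, stored as (start, size) pairs."""
        def __init__(self, heap_size):
            self.free = [(0, heap_size)]          # kept sorted by start address

        def allocate(self, size):
            for i, (start, blk_size) in enumerate(self.free):
                if blk_size >= size:              # first fit: take the first big-enough block
                    if blk_size == size:
                        del self.free[i]
                    else:
                        self.free[i] = (start + size, blk_size - size)
                    return start
            raise MemoryError("no free block large enough")

        def release(self, start, size):
            # insert in address order, then coalesce with adjacent free blocks
            i = bisect.bisect(self.free, (start, size))
            self.free.insert(i, (start, size))
            merged = []
            for blk in self.free:
                if merged and merged[-1][0] + merged[-1][1] == blk[0]:
                    merged[-1] = (merged[-1][0], merged[-1][1] + blk[1])
                else:
                    merged.append(blk)
            self.free = merged

Circular first-fit differs only in that each allocation resumes scanning
where the previous one left off instead of always starting from the front.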
 

Steve Holden

Paul said:
As I understand it, Python primarily uses reference counting, with a
mark and sweep scheme for cycle breaking tacked on as an afterthought.
It doesn't move objects in memory during GC so you can get
fragmentation. It's probably not feasible to change this in CPython
without extensive rewriting of CPython and maybe a lot of external C
modules.


If you mean about GC in general, the old book by Jones and Lins is
still standard, I think.

You also need to be aware that there are certain allocation classes
which use arenas, areas of storage dedicated to specific object types.
When those objects are destroyed their space is returned to the arena,
but the arena is either never returned to the free pool or only returned
to it when the last allocated item is collected.

There seem to be more of those kinds of tricks coming up in 2.6 and 3.0.

regards
Steve
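
The arena behaviour described above is easy to glimpse from the
interpreter: space freed by a small object tends to be handed straight
back out for the next allocation of the same size class rather than
returned to the OS. This is CPython-specific and only "usually", not
guaranteed:

    a = [1, 2, 3]
    old_addr = id(a)           # in CPython, id() is the object's memory address
    del a                      # the list's space goes back to the allocator's pool
    b = [4, 5, 6]
    print(id(b) == old_addr)   # frequently True: the slot was recycled, not released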
 

Pie Squared

MartinRinehart said:
I researched this for some Java I wrote. Try to avoid shuffling
physical memory - you'll write a lot less code and it will be faster,
too.

Use an "allocated" list and an "available" list. Keep them in address
order. Inserting (moving list elements from insertion point to end)
and deleting (vice-versa) are near-zero cost operations on Intel
boxes. (Two millis to move a million ints at 1GHz 5 years ago when I
wrote http://www.martinrinehart.com/articles/repz.html - probably half
that today.)

It seems to me that another, perhaps better strategy, would be to
allocate a large heap space, then store a pointer to the base of the
heap, the current heap size, and the beginning of the free memory.
When you need to 'allocate' more room, just return a pointer to some
location in the heap and increment the start-of-free-memory pointer.
That way, allocation really IS free, more or less. Wouldn't that be
more efficient? Perhaps I'm missing something.

As a side note, I'm new to Usenet, so I'm not exactly sure... are
'tangents' like this - since this IS a Python newsgroup, after all -
okay?

Anyway, thanks for the suggestion.
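
That strategy, sketched in Python for the sake of discussion (all names
invented):

    class BumpHeap:
        """One big pre-allocated block; allocation just advances a pointer."""
        def __init__(self, size):
            self.heap = bytearray(size)
            self.free = 0                 # offset of the first unused byte

        def alloc(self, nbytes):
            if self.free + nbytes > len(self.heap):
                raise MemoryError("heap full - time to collect or compact")
            addr = self.free
            self.free += nbytes           # 'allocation' is just an increment
            return addr

    h = BumpHeap(1024)
    first = h.alloc(16)       # 0
    second = h.alloc(32)      # 16: objects are packed back to back

The catch, as the next reply points out, is deallocation: with nothing
but a bump pointer, the only way to reclaim space is to copy the live
objects elsewhere and reset the pointer - which is exactly where a
moving collector comes in.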
 

Christian Heimes

Pie said:
I've been looking at the Python source code recently, more
specifically trying to figure out how its garbage collector works.

I've gathered that it uses refcounting as well as some cycle-detection
algorithms, but I haven't been able to figure out some other things.

Python uses ref counting for all objects and an additional GC for
container objects like lists and dicts. The cyclic GC depends on ref
counting.
Does Python actually have a single 'heap' where all the data is
stored? Because PyObject_HEAD seemed to imply to me it was just a
linked list of objects, although perhaps I didn't understand this
correctly.

In release builds PyObject_HEAD only contains the ref count and a link
to the object type. In Py_DEBUG builds it also contains a doubly linked
list of all allocated objects, to help debug reference counting bugs.

Python uses its own optimized memory allocator. Be sure you have read
Objects/obmalloc.c!
Also, if it does, how does it deal with memory segmentation? This
question bothers me because I've been trying to implement a moving
garbage collector, and am not sure how to deal with updating all
program pointers to objects on the heap, and thought perhaps an answer
to this question would give me some ideas. (Also, if you know any
resources for things like this, I'd be grateful for links/names)

I don't think it's possible to implement a moving GC for CPython. You'd
have to change some fundamental parts of Python.

Christian
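
The ob_refcnt field in PyObject_HEAD is visible from Python via
sys.getrefcount; the number it reports is one higher than you might
expect, because the argument to the call itself holds a temporary
reference:

    import sys

    x = []
    print(sys.getrefcount(x))   # 2: the name x, plus the temporary argument reference
    y = x
    print(sys.getrefcount(x))   # 3: y now refers to the same list
    del y
    print(sys.getrefcount(x))   # back to 2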
 

Paul Rubin

Pie Squared said:
It seems to me that another, perhaps better strategy, would be to
allocate a large heap space, then store a pointer to the base of the
heap, the current heap size, and the beginning of the free memory.
When you need to 'allocate' more room, just return a pointer to some
location in the heap and increment the start-of-free-memory pointer.
That way, allocation really IS free, more or less. Wouldn't that be
more efficient? Perhaps I'm missing something.

The problem here is that with a high allocation rate you have to GC a
lot more often, which typically involves copying live data. So now you
have to figure out how to reduce the amount of copying using
generational schemes, come up with ways (these days) to make all this
work in the presence of parallel threads, etc. It sounds like you're
new to the subject, so for now I'll summarize the situation by saying
Python uses a fairly simple scheme that's a reasonable match for its
implementation as a small, medium-performance interpreter; however,
higher-performance GC'd language implementations end up doing much
more complicated things. See the Jones and Lins book that I mentioned;
there's also an even older one by Appel, and the Wikipedia article on
garbage collection may have some links you can look at. Also,
Structure and Interpretation of Computer Programs (full text online)
includes a simple implementation of a copying GC that might be more
tutorial.
As a side note, I'm new to Usenet, so I'm not exactly sure... are
'tangents' like this - since this IS a Python newsgroup, after all - okay?

It's reasonably on topic, compared with last week's long digression
about the dimensional analysis of the Kessel Run.
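
For anyone curious what the "copying live data" step mentioned above
looks like in miniature, here is a toy two-space (Cheney-style)
collector sketched in Python. It is purely illustrative - the object
layout and every name in it are invented - and has nothing to do with
how CPython actually manages memory:

    # Toy semispace copying collector.  "Objects" live in a flat list of cells
    # and are laid out as [field_count, field0, field1, ...]; each field is
    # either None or the index of another object in the same space.

    class Space:
        def __init__(self, size):
            self.cells = [None] * size
            self.free = 0                    # bump pointer, as discussed earlier

        def alloc(self, nfields):
            addr = self.free
            self.cells[addr] = nfields
            self.free += 1 + nfields
            return addr

    def collect(from_space, roots):
        to_space = Space(len(from_space.cells))
        forward = {}                         # old address -> new address

        def evacuate(addr):
            if addr is None:
                return None
            if addr in forward:              # already copied; reuse the forwarding entry
                return forward[addr]
            nfields = from_space.cells[addr]
            new = to_space.alloc(nfields)
            forward[addr] = new
            for i in range(nfields):         # fields still point into from-space for now
                to_space.cells[new + 1 + i] = from_space.cells[addr + 1 + i]
            return new

        new_roots = [evacuate(r) for r in roots]
        scan = 0
        while scan < to_space.free:          # Cheney scan: fix up each copied object's
            nfields = to_space.cells[scan]   # fields, evacuating whatever they reach
            for i in range(nfields):
                to_space.cells[scan + 1 + i] = evacuate(to_space.cells[scan + 1 + i])
            scan += 1 + nfields
        return to_space, new_roots

    # Everything reachable from the roots ends up packed at the bottom of
    # to_space; everything else is simply never copied.
    heap = Space(100)
    a = heap.alloc(2); b = heap.alloc(1); junk = heap.alloc(3)
    heap.cells[a + 1] = b                    # a refers to b ...
    heap.cells[b + 1] = a                    # ... and b refers back to a (a cycle)
    new_heap, new_roots = collect(heap, [a])
    print(new_heap.free)                     # 5: a (3 cells) and b (2 cells) survive, junk is gone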
 

thebjorn

Pie Squared said:
It seems to me that another, perhaps better strategy, would be to
allocate a large heap space, then store a pointer to the base of the
heap, the current heap size, and the beginning of the free memory.
When you need to 'allocate' more room, just return a pointer to some
location in the heap and increment the start-of-free-memory pointer.
That way, allocation really IS free, more or less. Wouldn't that be
more efficient? Perhaps I'm missing something.

Deallocation?

As a side note, I'm new to Usenet, so I'm not exactly sure... are
'tangents' like this - since this IS a Python newsgroup, after all -
okay?

It varies depending on the group, c.l.py is pretty tolerant as long as
it's interesting ;-)

To bring it back to Python, I was under the impression that the GC was
a generational collector and not a simple mark-sweep, but it's been a
while since I read about it...

-- bjorn
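
It is indeed generational, layered on top of the reference counting:
CPython keeps three generations of container objects, and the gc module
exposes the machinery directly:

    import gc

    print(gc.get_threshold())   # (700, 10, 10) by default: collection thresholds per generation
    print(gc.get_count())       # the current counts compared against those thresholds
    gc.collect(0)               # collect only the youngest generation
    gc.collect()                # full collection across all three generations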
 

Martin v. Löwis

Paul Rubin said:
As I understand it, Python primarily uses reference counting, with a
mark and sweep scheme for cycle breaking tacked on as an afterthought.

That's not exactly true, i.e. it isn't mark-and-sweep, but some similar
scheme that allows incremental collection without write barriers. This
particular scheme heavily relies on refcounting itself (specifically,
an object is garbage in a certain generation when all references to
it come from the same generation).

As for the consequences of the scheme (i.e. no compaction), you are
right.

Regards,
Martin
 

Paul Rubin

Martin v. Löwis said:
That's not exactly true, i.e. it isn't mark-and-sweep, but some similar
scheme that allows incremental collection without write barriers. This
particular scheme heavily relies on refcounting itself (specifically,
an object is garbage in a certain generation when all references to
it come from the same generation).

Ah, thanks. I made another post which I guess is also somewhat wrong.
 

greg

Paul said:
As I understand it, Python primarily uses reference counting, with a
mark and sweep scheme for cycle breaking tacked on as an afterthought.

It's not mark-and-sweep, it's a cycle detector. It goes through
all allocated objects of certain types, and all objects reachable
from those objects, trying to find sets of objects whose reference
counts are all accounted for by references within the set. Then
it picks an arbitrary object in each set and clears all the
references from that object. This breaks the cycle, and allows all
the objects in it to be reclaimed by their reference counts
dropping to zero.

(There's some kind of generational scheme in there as well,
I think, but I don't know the details.)

In a sense, this is the opposite of mark-and-sweep. A mark-and-
sweep collector assumes that everything is garbage unless it
can be proven that it's not. Python's cycle detector, on the other
hand, assumes that nothing is garbage unless it can be proven
that it is.

An advantage of this refcount/cycle detection scheme is that
there is no "root set" that must be maintained at all costs.
This makes it fairly easy to interface Python with external
libraries that have their own notions of how to manage memory.

Paul also said:
It doesn't move objects in memory during GC so you can get
fragmentation.

It never moves PyObject structs, but variable-sized objects
such as lists have their contents stored in a separate block
of memory, which can be moved when its size changes.
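
The cycle detector described above is easy to watch from the
interpreter: build a cycle, drop the last outside reference, and
reference counting alone can never free it; gc.collect() reports how
many unreachable objects the detector found:

    import gc

    class Node(object):
        pass

    a = Node()
    b = Node()
    a.partner = b
    b.partner = a        # a reference cycle: each keeps the other's refcount above zero
    del a, b             # no outside references remain, but the refcounts never reach zero
    print(gc.collect())  # the cycle detector finds and frees them; prints at least 2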
 

greg

Christian said:
In release builds PyObject_HEAD only contains the ref count and a link
to the object type. In Py_DEBUG builds it also contains a double linked
list of all allocated objects to debug reference counting bugs.

There's also a doubly-linked list used by the cycle detector,
but it doesn't hold all objects, only those that need to
participate in cyclic GC. Objects such as numbers and strings,
which don't contain references to other objects, don't need
to be on this list, since they can never be part of a cycle.

Some other immutable types such as tuples also don't need to
be on it, even though they contain references, because it's
impossible to create a cycle consisting entirely of such
objects. There has to be at least one mutable object in the
cycle, and the GC will be able to find the cycle via that
object.
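
In later Pythons (2.7 and 3.1 onwards) you can ask the collector
directly which objects it is tracking, via gc.is_tracked():

    import gc

    print(gc.is_tracked(42))         # False: an int can never be part of a cycle
    print(gc.is_tracked("spam"))     # False: likewise for strings
    print(gc.is_tracked([]))         # True: a list can refer to other objects
    print(gc.is_tracked({"a": []}))  # True: so can a dict once it holds a container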
 

MartinRinehart

Paul said:
The problem here is with a high allocation rate, you have to GC a lot
more often, which typically involves copying live data.

This is last century's issue. Copying data, RAM to RAM, is nearly free
using the Intel architecture.

This short article, http://www.martinrinehart.com/articles/repz.html
explains why.

I'd use one int per clock as a rule of thumb for the current copy rate
in a single-core CPU.
 

Steve Holden

MartinRinehart said:
This is last century's issue. Copying data, RAM to RAM, is nearly free
using the Intel architecture.

This short article, http://www.martinrinehart.com/articles/repz.html
explains why.

I'd use one int per clock as a rule of thumb for the current copy rate
in a single-core CPU.

You have a strange idea of "nearly free", and I find your analysis a
little simplistic. You may take advantage of direct memory access, but
cycles are still consumed. I presume also (though here my knowledge of
present-day Intel and AMD architectures gets a little sketchy) that many
data cache lines could still be invalidated by these moves, and that
also has a significant effect on performance.

Extending an integer array from 100 to 150 items is a pretty puny
operation when you compare it with the amount of data that might need to
be moved during a compactifying garbage collection of a 20MB Python
program image.

Not only that, but all pointers to an object have to be updated when it
is relocated.

regards
Steve
 

rbossy

Quoting Steve Holden:
[...]
Not only that, but all pointers to an object have to be updated when it
is relocated.

"Any problem in computer science can be solved by another level of indirection"
-- David John Wheeler

;-)

RB
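
In the GC setting, the extra level of indirection usually means
handles: code holds an index into a table rather than a raw address, so
when the collector relocates an object it only has to update one table
slot. A minimal sketch (names invented, nothing to do with CPython's
internals):

    class HandleTable:
        """References are handles (indices); only the table knows real locations."""
        def __init__(self):
            self.slots = []                 # handle -> wherever the object currently lives

        def new(self, storage):
            self.slots.append(storage)
            return len(self.slots) - 1      # the handle itself never changes

        def deref(self, handle):
            return self.slots[handle]

        def relocate(self, handle, new_storage):
            # the collector moved the object: one table update, no pointer chasing
            self.slots[handle] = new_storage

The price is an extra load on every access; CPython instead hands out
real addresses (id() is literally the address) and simply never moves
objects.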
 

Jeff Schwab

MartinRinehart said:
This is last century's issue. Copying data, RAM to RAM, is nearly free
using the Intel architecture.

What's "the Intel architecture?" Do you mean the x86_64 architecture
that was actually developed by AMD, or x86 for x > some number, or do
you actually mean IA64?
 

MartinRinehart

Jeff said:
What's "the Intel architecture?" Do you mean the x86_64 architecture
that was actually developed by AMD, or x86 for x > some number, or do
you actually mean IA64?

I mean chips of the family that goes back to the 8086 and 8088 chips,
chips that support the REPZ prefix to the MOVSW instruction.
 

Paul Rubin

MartinRinehart said:
I mean chips of the family that goes back to the 8086 and 8088 chips,
chips that support the REPZ prefix to the MOVSW instruction.

repz movsw is a pretty lame way to copy data on current x86's.
Use XMM instead.
 

MartinRinehart

Steve said:
You have a strange idea of "nearly free" ...

Extending an integer array from 100 to 150 items is a pretty puny
operation when you compare it with the amount of data that might need to
be moved during a compactifying garbage collection of a 20MB Python
program image.

20 MBs = 5 M 32-bit words = 1.25 millis to move half of them on a
2GHz machine. Don't know how much a milli costs where you live.
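
Rather than arguing from cycle counts, the raw copy is easy to time
from Python itself; the figure says nothing about the pointer-updating
and cache effects Steve raised, only about moving the bytes:

    import timeit

    buf = bytearray(20 * 1024 * 1024)                  # a 20 MB block, as in Steve's example
    secs = timeit.timeit(lambda: bytes(buf), number=20) / 20
    print("copying 20 MB took %.2f ms" % (secs * 1000.0))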
 
