Queue cleanup

Paul Rubin

Lawrence D'Oliveiro said:
That reinforces my point, about how easy it was to check the correctness of
the code. In this case one simple fix, like this ...
would render the code watertight. See how easy it is?

Well, no, it's irrelevant how easy it is to fix the issue after it's
pointed out. What matters is how easy it was to create it in the first
place. You posted a 30-line code sample as obviously free of memory
leaks, but even a good programmer like you didn't notice that it had the
potential for a nasty memory overwrite error after an unbalanced decref.
Keep in mind that a memory leak usually just means the program can
eventually bog down and stop working, but an overwrite error is often a
security hole that can lead to total compromise of your entire computer.
Now extrapolate that error rate from 30 lines to a program the size of
Firefox (something like 5 MLOC), and you should see how fraught with
danger that style of programming is. Even the most skilled and careful
programmers are going to slip up once in a while.

Part of the problem is C itself. No language can eliminate many
complicated bugs without creating severe usability problems, but good
languages (unlike C) can eliminate most "silly" bugs. I had a "dark
night of the soul" after reading some of the bug-finding papers at

http://www.stanford.edu/~engler/

and have been terrified of C ever since. I'm just always skeptical when
anyone says they're sure any piece of C code is obviously bug-free.
It's quite easy to get overconfident about it (as I've done myself more
than once). I spent about 5 minutes reviewing your patched code (and
the underlying implementations of the C API functions it calls) and
didn't see any other issues, and the code is probably ok now, but I'd
have to spend a lot more time tracing through the API layer before I
could be really sure.

Anyway, you should check your patch into GitHub if you haven't.
 
John Nagle

You'll notice that I was very careful to qualify my statement with "as
most people think of it". Also, because CPython has two different memory
management mechanisms, refcounting and cycle detection, and the module
that controls cycle detection is called "gc", I think it's simpler to
follow along with the Python docs -- and critically important to remind
people that there are in fact two different systems.

Personally, I'd like to have reference counting only, an enforced
prohibition on loops (backpointers must be weak pointers), RAII,
and reliably ordered finalization.
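
A minimal Python sketch of that discipline (the class and attribute
names are just illustrative): the backpointer is weak, so ownership
flows strictly downward and plain refcounting reclaims things promptly.

    import weakref

    class Node:
        def __init__(self, parent=None):
            self.children = []
            # Weak backpointer: a child never keeps its parent alive.
            self._parent = weakref.ref(parent) if parent is not None else None
            if parent is not None:
                parent.children.append(self)

        @property
        def parent(self):
            return self._parent() if self._parent is not None else None

    root = Node()
    child = Node(root)
    del root             # last strong reference to the parent is gone
    print(child.parent)  # None: refcounting reclaimed it immediately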

A big advantage of reference counting is that finalization happens
in the thread that releases the object, and in the correct order.
GC and finalization/destructors do not play well together at all.
Microsoft once tried to get the hard cases to work right. See
"managed C++". Not a happy story.

John Nagle
 
Lawrence D'Oliveiro

MRAB said:
I thought it was just that if it wasn't explicitly forbidden then
someone might try to use it and then sue if something went wrong, even
though common sense would have said that it was a bad idea in the first
place! :)

But you’ll notice that Free Software comes with no such restrictions. In
fact, it is contrary to commonly-accepted Free Software guidelines to impose
any sort of restrictions on areas of use.
 
Lawrence D'Oliveiro

John Nagle said:
Personally, I'd like to have reference counting only, an enforced
prohibition on loops (backpointers must be weak pointers), RAII,
and reliably ordered finalization.

Is there a cheap way of checking at runtime for circular structures?
A big advantage of reference counting is that finalization happens
in the thread that releases the object, and in the correct order.
GC and finalization/destructors do not play well together at all.
Microsoft once tried to get the hard cases to work right. See
"managed C++". Not a happy story.

Thank you for that. Another arrow for my anti-GC quiver. :)
 
Lawrence D'Oliveiro

Paul Rubin said:
A minimal naive implementation indeed doubles the memory requirements,
but from a Python perspective where every integer takes something like
24 bytes already, even that doesn't seem so terrible.

Doubling 24 is less terrible than doubling 4 or 8?? You’re kidding, right?
More sophisticated implementations use multiple small heaps or other
tricks.

More and more complications to patch up the idea. At some point, you have to
admit there is something fundamentally flawed about the whole concept.
The new heap is filled sequentially so accesses to it will have good
locality.

Unfortunately, that’s not how locality of reference works. It doesn’t matter
whether the objects being accessed are close together in memory or far apart
(not with modern fully-associative caches, anyway), what does matter is the
frequency distribution of references, namely that the vast majority of
references are to a tiny minority of objects.

Your generational garbage collector completely breaks this assumption, by
regularly forcing an access to every single object in the heap.
Cache-thrashing, anyone?
It's also the case that programs with very large memory consumption tend
to use most of the memory for large arrays that don't contain pointers
(think of a database server with a huge cache). That means the gc
doesn't really have to think about all that much of the memory.

But your generational garbage collector still has to copy those very large
objects to the new heap, with all the cache-hostile consequences therefrom.

By the way, isn’t this the opposite of the array-of-pointers example you
were using earlier to try to cast reference-counting in a bad light? It
seems to me a reference count would work very well for such a large, simple
object.
Same thing with manual allocation. That moves the costs off the
computer and onto the programmer. Not good, most of the time.

Unfortunately, your argument falls down. It is a truism that hardware costs
continue to come down, while programmers remain expensive. As I said before,
computing performance has improved by something like five orders of
magnitude over the last half-century. This has rendered all kinds of
techniques, like high-level languages, dynamic memory allocation,
stacks, hardware floating-point, memory protection and so on, which were
once considered controversial because of their expense, cheap enough to
become commonplace.

But not garbage collection. This is because of the asymmetric way in which
hardware has become faster: the biggest improvements have been in the parts
that were already the fastest to begin with (the CPU), while RAM speeds have
improved much less, and backing-store speeds least of all. Hence the need
for intermediate layers of cache to bridge the gap. But the effectiveness of
that caching depends crucially on certain assumptions about the runtime
behaviour of the programs: and garbage collection breaks those assumptions.
Really, I'm no gc expert, but the stuff you're saying about gc is quite
ill-informed. You might want to check out some current literature.

You may want to enlighten yourself by meditating on this seeming paradox of
modern computing hardware: memory is cheap, but accessing memory is
expensive.
 
Paul Rubin

Lawrence D'Oliveiro said:
Doubling 24 is less terrible than doubling 4 or 8?? You’re kidding, right?

No, it would be doubling 4 or 8 bytes. The extra overhead like the
reference count would not be there to bloat up the integer like in
Python.
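
That overhead is easy to check; a quick sketch (exact numbers vary by
CPython version and platform):

    import sys
    # Object header (refcount, type pointer) plus the payload:
    print(sys.getsizeof(1))   # ~24 bytes here; a bare machine int is 4 or 8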
More and more complications to patch up the idea. At some point, you have to
admit there is something fundamentally flawed about the whole concept.

Oh sheesh, that's silly. Every area of programming where performance
matters is full of optimization tricks. Look at any serious database
implementation for example. Or any compiler. Look at Python's
implementation of dictionaries. Yeah, the optimizations add complexity
to improve performance, sometimes even in heuristic ways that can fail.
That doesn't mean the concepts are fundamentally flawed. GC is no
different.
what does matter is the frequency distribution of references,

Sorry, I just meant during the gc operation itself. The gc's access
pattern in the new heap is completely sequential, as the gc just copies
stuff to it linearly from the old heap, bumping a pointer upwards.
The access pattern when the user application is running is of course not
predictable.
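
A toy illustration of that copying step (not a real collector; the heap
is just a list here, and liveness is handed in as a predicate):

    def evacuate(old_heap, is_live):
        new_heap = []
        forwarding = {}                        # old index -> new index
        for i, obj in enumerate(old_heap):
            if is_live(obj):
                forwarding[i] = len(new_heap)  # the "bump pointer"
                new_heap.append(obj)           # strictly sequential writes
        return new_heap, forwarding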
Your generational garbage collector completely breaks this assumption, by
regularly forcing an access to every single object in the heap.
Cache-thrashing, anyone?

In the minor collections, the whole minor heap fits in cache, so there's
no thrashing. The major collections can use a different strategy, or
you can simply rely on their relative infrequency. Why do you speculate
like this? If you run a GHC program with profiling active, it tells you
exactly how much time is spent in minor gc and how much time is in major
gc, and it's all generally pretty tolerable unless your program has
bugs. (Unfortunately Haskell programs are notoriously susceptible to a
certain type of bug that causes them to still give the right answers,
but use much more memory than they should. The usual sign of that
happening is high gc load).
But your generational garbage collector still has to copy those very large
objects to the new heap, with all the cache-hostile consequences therefrom.

Not necessarily; it depends on how you write the program and how the gc works.
By the way, isn’t this the opposite of the array-of-pointers example you
were using earlier to try to cast reference-counting in a bad light?

I wasn't trying to cast reference counting in a bad light, I was
pointing out that reference counting can experience pauses just like
traditional gc approaches. Most programs including "soft" real time
programs can tolerate an occasional pause. If your program is not one
of those, and you need guaranteed upper bounds on pauses so you can't
use traditional gc, switching from gc to reference counting won't save
you.
It seems to me a reference count would work very well for such a
large, simple object.

Mark/sweep would do it too. Some gc's use a hybrid approach, with
mark/sweep for older or larger objects.
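
For concreteness, a toy mark/sweep over an explicit object graph
("heap", "roots" and "refs" are illustrative stand-ins):

    def mark_sweep(heap, roots, refs):
        marked = set()
        stack = list(roots)
        while stack:                    # mark: trace everything reachable
            obj = stack.pop()
            if obj not in marked:
                marked.add(obj)
                stack.extend(refs(obj))
        return [obj for obj in heap if obj in marked]   # sweep: survivors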
But not garbage collection. This is because of the asymmetric way in
which hardware has become faster:... the effectiveness of that
caching depends crucially on certain assumptions about the runtime
behaviour of the programs: and garbage collection breaks those
assumptions. ...
You may want to enlighten yourself by meditating on this seeming paradox of
modern computing hardware: memory is cheap, but accessing memory is
expensive.

I'm interested in receiving enlightenment if you've got some pointers into
the research literature that back up your views. Right now it sounds
like you're going by some intuitions you have that aren't backed up by
evidence. Anyone who has performance-tuned a program knows that
intuition can give reasonable starting points for experiments and
measurements, but once the results start coming in, a lot of the
intuitions end up invalidated.

By now, there is enough experimental and theoretical literature about gc
that opinions like yours, that don't seem to be grounded in any
knowledge of that literature, are not very persuasive no matter how
superficially attractive the raw intuition might be. Especially in your
case, where you seem to have decided ahead of time what conclusion you
want to reach and are looking for ways to justify it.
 
John Nagle

Lawrence D'Oliveiro said:
Is there a cheap way of checking at runtime for circular structures?

It's an interesting technical problem to design a system where
circular references are detected immediately, at the moment
of creation. However, Python already detects loops during
garbage collection. If you set

gc.set_debug(gc.DEBUG_SAVEALL)

all the loops show up in "gc.garbage".
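
A runnable version of that check:

    import gc

    gc.set_debug(gc.DEBUG_SAVEALL)   # keep collected garbage for inspection

    a = []
    a.append(a)        # a trivial self-referential loop
    del a
    gc.collect()
    print(gc.garbage)  # the loop shows up here instead of being freed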
Thank you for that. Another arrow for my anti-GC quiver. :)

Unoptimized reference counting, which is what CPython does, isn't
all that great either. The four big bottlenecks in Python are boxed
numbers, attribute lookups, reference count updates, and the GIL.

John Nagle
 
Paul Rubin

John Nagle said:
Unoptimized reference counting, which is what CPython does, isn't
all that great either. The four big bottlenecks in Python are boxed
numbers, attribute lookups, reference count updates, and the GIL.

The performance hit of having to lock the refcounts before update has
been the historical reason for keeping the GIL. The LOCK prefix takes
something like 100 cycles on an x86. Is optimizing the refcount updates
going to come anywhere near making up for that?

Python's "with" statement as an approach to RAII has seemed ok to me. I
can't think of a time when I've really had to use a finalizer for
something with dynamic extent. They've always seemed like a code smell
to me.
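
For instance (a trivial example, but the same pattern covers locks,
connections and the like):

    # Deterministic cleanup: the file is closed when the block exits,
    # exception or not, with no finalizer involved.
    with open("out.txt", "w") as f:
        f.write("hello")
    # f is already closed here, via file.__exit__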
 
John Nagle

Paul Rubin said:
The performance hit of having to lock the refcounts before update has
been the historical reason for keeping the GIL. The LOCK prefix takes
something like 100 cycles on an x86. Is optimizing the refcount updates
going to come anywhere near making up for that?

I've argued for an approach in which only synchronized or immutable
objects can be shared between threads. Then, only synchronized objects
have refcounts. See
"http://www.animats.com/papers/languages/pythonconcurrency.html"

Guido doesn't like it. He doesn't like any "restrictions".
So we're stuck dragging around the boat anchor.

I'd hoped that the Unladen Swallow people might come up with some
really clever solution, but they seem to be stuck. It's been almost
a year since the last quarterly release. Maybe Google is putting their
effort into Go.

What's so striking is that Shed Skin can deliver 20x to 60x
speedups over CPython, while PyPy and Unladen Swallow have
trouble getting 1.5x. The question is how much one has to
restrict the language to get a serious performance improvement.
Python's "with" statement as an approach to RAII has seemed ok to me. I
can't think of a time when I've really had to use a finalizer for
something with dynamic extent. They've always seemed like a code smell
to me.

The problem appears when you have an object that owns something, like
a window or a database connection. "With" is single-level.
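
The usual workaround is to make the owning object a context manager
itself and cascade the release by hand; a sketch (the class name and the
owned resource are illustrative), which rather proves the point that
nothing does this for you:

    import sqlite3

    class Window:
        def __init__(self, db_path):
            self.conn = sqlite3.connect(db_path)  # resource this object owns
        def __enter__(self):
            return self
        def __exit__(self, exc_type, exc, tb):
            self.conn.close()                     # released with its owner
            return False

    with Window(":memory:") as w:
        pass  # w.conn is usable here; closed when the block exits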

John Nagle
 
Paul Rubin

Lawrence D'Oliveiro said:
But you’ll notice that Free Software comes with no such
restrictions. In fact, it is contrary to commonly-accepted Free
Software guidelines to impose any sort of restrictions on areas of use.

Free software comes with an explicit disclaimer of liability (you get
what you pay for). The Sun stuff ($$$$$) may have either an express or
implied warranty that could mean they get hit up for damages if the
software is wrong. IANAL YMMV etc.
 
Gregory Ewing

Paul said:
Now extrapolate that error rate from 30 lines to a program the size of
Firefox (something like 5 MLOC), and you should see how fraught with
danger that style of programming is.

But you don't write 5 MLOC of code using that programming style.
You use it to write a small core something along the lines of,
oh, I don't know, a Python interpreter, and then write the rest
of the code on top of that platform.
 
Gregory Ewing

Lawrence said:
But alone of all of these, garbage collection still remains just as costly
to implement as ever. That should tell you something about how poorly it
matches the characteristics of modern computing hardware.

So maybe we need to redesign the hardware.

Perhaps reference counts could be stored in their own
special area of memory, updated in parallel with main
memory accesses so that they don't slow anything down.
Make it multiported so that all your cores can access
it at once without locking. Maybe even build it out of
counters instead of registers, so there's no data bus,
only an address bus and inc/dec control lines.
 
Lawrence D'Oliveiro

Paul Rubin said:
Well, no, it's irrelevant how easy it is to fix the issue after it's
pointed out.

In that case, I can similarly discount your attempts to fix up issues with
garbage collectors after they’re pointed out, can I not?
Part of the problem is C itself.

And yet, what are these complicated garbage collectors, that you intend
relying on to work correctly with all their layers of tricks upon tricks,
written in? C.
 
Paul Rubin

Lawrence D'Oliveiro said:
In that case, I can similarly discount your attempts to fix up issues with
garbage collectors after they’re pointed out, can I not?

Adapting GC algorithms to improve their performance or better use the
capabilities of new hardware is quite different from saying they didn't
work in the first place. They've been around for 5 decades, they (like
Python programs) work fine if you don't mind the performance cost, and
for many applications that cost is acceptable even for quite simple and
naive GC's. The continued work is to get that cost down even further.
(And before criticizing GC performance, have you ever profiled CPython
to see how much time it's spending messing with reference counts? I
didn't think so).
And yet, what are these complicated garbage collectors, that you intend
relying on to work correctly with all their layers of tricks upon tricks,
written in? C.

What tricks on tricks? Even the fanciest GC's are orders of magnitude
less complicated than any serious database, optimizing compiler, OS
kernel, file system, etc. Real-world software is complicated. Get used
to that fact, and look for ways to manage the complexity and keep
complex programs safe. Choosing to program with unsafe methods because
you wish the complexity didn't exist is just deluded.


It actually seems pretty crazy to me to write a garbage collector in C
today, even though a GC needs unsafe pointer operations in a few
places. C made some sense in the 1980's when computers were smaller and
people didn't know better. A lot of the C code around today is legacy
code from that era. These days it makes more sense to use a safe
language with a few isolated "jailbreaks" (like an OS kernel that has a
few lines of assembly code) than to write the whole thing in a language
whose unsafety is everywhere.

Here's a good paper by R. Fateman (criticizing C) that I just came across:

http://www.franz.com/resources/educational_resources/white_papers/fault.prevention.pdf

He suggests using Lisp instead, but one can't have everything ;-).

FWIW, here are a couple pages about verifying GC's:

http://flint.cs.yale.edu/flint/publications/hgc.html
http://www.cs.technion.ac.il/~erez/Papers/verified-gc-popl09.pdf
etc.

I just don't understand that weird agenda you seem to have. But
whatever.
 
Steven D'Aprano

Lawrence D'Oliveiro said:
And yet, what are these complicated garbage collectors, that you intend
relying on to work correctly with all their layers of tricks upon
tricks, written in? C.

Not necessarily.

Pascal, despite the contempt it is held in by university computer science
departments, isn't quite dead, and some Pascal compilers use garbage
collectors written in Pascal. FreePascal, I believe, is one of them.

Likewise for other not-dead-yet low-level languages like Ada and Forth.
As surprising as it seems to many, C is not the only low-level language
around suitable for writing high-quality, efficient code. Just ask the
Lisp community, which is thriving. For some definition of thriving.

Admittedly C gets far more attention than the others, so [insert
weasel words here] the best C compilers tend to produce more efficient
code than the best of the others, but Pascal, Ada and similar give you
more security than C.

I believe that when computer scientists of the future look back at the
last few decades, they will judge that on balance C did more harm than
good. Not that C is the only language in which people can write buggy or
insecure code, but C does tend to give the bugs so much help... :)
 
John Nagle

Paul Rubin said:
What tricks on tricks? Even the fanciest GC's are orders of magnitude
less complicated than any serious database, optimizing compiler, OS
kernel, file system, etc. Real-world software is complicated. Get used
to that fact, and look for ways to manage the complexity and keep
complex programs safe. Choosing to program with unsafe methods because
you wish the complexity didn't exist is just deluded.

Garbage collectors are difficult from a theoretical standpoint,
and it's very difficult to make a correct concurrent garbage collector
without using formal methods. But garbage collectors are not large
pieces of code.

John Nagle
 
