Queue cleanup

Paul Rubin

Lawrence D'Oliveiro said:
Meanwhile, real-world programmers get on to writing real-world code that is
productive and efficient on real-world systems.

It's pretty well established by now that GC doesn't have any significant
speed penalty compared with manual allocation. It does consume more
memory, which is acceptable a lot of the time. It certainly leads to
more reliable programs.
 
Lawrence D'Oliveiro

Paul Rubin said:
It's pretty well established by now that GC doesn't have any significant
speed penalty compared with manual allocation. It does consume more
memory, which is acceptable a lot of the time. It certainly leads to
more reliable programs.

And yet Java code, for example, doesn’t have a reputation for greater
reliability compared to, say, code written in Ada or C++, or even C. What is
the Java runtime written in? C. Why not use Java, if there is no speed
penalty in doing so?
 
Lawrence D'Oliveiro

Reference counting has the same problem. If you drop the last reference
to a complex structure, it could take quite a long time to free all the
components.

One difference is the interaction with caching behaviour. When a reference-
counted object is freed, the odds are that happens fairly soon after the
last access, so the object will still be in the CPU cache, and access will
be fast.

Whereas garbage collection will happen at some indeterminate time long after
the last access to the object, when it very likely will no longer be in the
cache, and have to be brought back in just to be freed, quite likely bumping
out something else that the running program needs to access.

This is one reason why garbage collection is still considered an expensive
technique. Computing power has improved by something like five orders of
magnitude over the last half-century, making possible all kinds of
productivity-enhancing techniques that were once considered expensive to
become commonplace: high-level languages, dynamic memory allocation, stacks,
hardware floating-point, memory protection and so on.

But alone of all of these, garbage collection still remains just as costly
to implement as ever. That should tell you something about how poorly it
matches the characteristics of modern computing hardware.
 
Paul Rubin

Lawrence D'Oliveiro said:
And yet Java code, for example, doesn’t have a reputation for greater
reliability compared to, say, code written in Ada or C++, or even C. What is
the Java runtime written in? C. Why not use Java, if there is no speed
penalty in doing so?

The Java runtime (such as the garbage collector) needs untyped pointers
and can't be written in Java. It might be possible to write a type-safe
GC in something like ATS, which has extremely precise types, but that is
almost alien technology by comparison to writing in C.

Java has considerably greater reputation for reliability than C or C++.
Ada is a different story, but Ada programs (because of the application
area Ada is used in) tend not to use a lot of dynamic memory allocation
in the first place. A little googling shows there are GC extensions
available for Ada, though I don't know if they are used much.
 
Paul Rubin

Lawrence D'Oliveiro said:
Whereas garbage collection will happen at some indeterminate time long after
the last access to the object, when it very likely will no longer be in the
cache, and have to be brought back in just to be freed,

GC's for large systems generally don't free (or even examine) individual
garbage objects. They copy the live objects to a new contiguous heap
without ever touching the garbage, and then they release the old heap.
That has the effect of improving locality, since the new heap is
compacted and has no dead objects. The algorithms are generational
(they do frequent gc's on recently-created objects and less frequent
ones on older objects), so "minor" gc operations are on regions that fit
in cache, while "major" ones might have cache misses but are infrequent.

Non-compacting reference counting (or simple mark/sweep gc) has much
worse fragmentation and consequently worse cache locality than
copying-style gc.
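
For a small, concrete handle on the generational part of this: CPython's own
cycle detector is generational (though it neither copies nor compacts), and
the standard gc module lets you watch it. A minimal sketch; the exact numbers
vary by interpreter version and workload:

import gc

# CPython tracks container objects in three generations; objects that
# survive a collection of generation N are promoted to generation N+1.
print(gc.get_threshold())   # e.g. (700, 10, 10): per-generation thresholds
print(gc.get_count())       # current allocation/survivor counts per generation

gc.collect(0)   # "minor" collection: youngest generation only
gc.collect()    # "major" collection: all generations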
 
Paul Rubin

Lawrence D'Oliveiro said:
Doesn’t seem to happen in the real world, though.

def f(n):
    from time import time
    a = [1] * n
    t0 = time()
    del a
    t1 = time()
    return t1 - t0

for i in range(9):
    print i, f(10**i)


on my system prints:

0 2.86102294922e-06
1 2.14576721191e-06
2 3.09944152832e-06
3 1.00135803223e-05
4 0.000104904174805
5 0.00098991394043
6 0.00413608551025
7 0.037693977356
8 0.362598896027

Looks pretty linear as n gets large. 0.36 seconds (the last line) is a
noticeable pause.
 
Dennis Lee Bieber

Paul Rubin said:
GC's for large systems generally don't free (or even examine) individual
garbage objects. They copy the live objects to a new contiguous heap
without ever touching the garbage, and then they release the old heap.
That has the effect of improving locality, since the new heap is
compacted and has no dead objects. The algorithms are generational
(they do frequent gc's on recently-created objects and less frequent
ones on older objects), so "minor" gc operations are on regions that fit
in cache, while "major" ones might have cache misses but are infrequent.

That sounds suspiciously like the original Macintosh OS, with its
"handles"... IE, double-indirection. "user" pointers pointing into a
table of pointers to the actual memory... That way the GC could move the
actual data anywhere and only have to update ONE pointer, and could do
that transparently to the user code (as its pointers point to the
non-movable pointer).

If you don't have the double indirection of handles, it means
finding ALL references to the moved object, and updating them en masse
before allowing user code to resume execution.
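
A toy sketch of the handle idea (all names here are made up, nothing like the
real Mac Memory Manager): user code keeps an index into a fixed table instead
of a raw pointer, so relocating an object means updating one table slot.

# stand-ins for movable storage and the fixed handle table
heap = {}        # "address" -> object
handles = []     # handle number -> current address

def alloc(obj, addr):
    heap[addr] = obj
    handles.append(addr)
    return len(handles) - 1        # the handle the user code keeps

def deref(handle):
    return heap[handles[handle]]   # the double indirection

def move(handle, new_addr):
    # the collector relocates the object; only one table entry changes,
    # and every outstanding handle remains valid
    heap[new_addr] = heap.pop(handles[handle])
    handles[handle] = new_addr

h = alloc("payload", 0x1000)
move(h, 0x2000)
print(deref(h))                    # still reachable through the same handle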
 
John Nagle

I guess that is how the so-called smart pointers in the Boost C++
template library work. I haven't used them so I don't have personal
experience with how convenient or reliable they are, or what kinds of
constraints they imposed on programming style. I've always felt a bit
suspicious of them though, and I seem to remember Alex Martelli (I hope
he shows up here again someday) advising against using them.

"Smart pointers" in C++ have never quite worked right. They
almost work. But there always seems to be something that needs
access to a raw C pointer, which breaks the abstraction.
The mold keeps creeping through the wallpaper.

Also, since they are a bolt-on at the macro level in C++,
reference count updates aren't optimized and hoisted out of
loops. (They aren't in CPython either, but there have been reference
counted systems that optimize out most reference count updates.)
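
A quick CPython-specific illustration of that last point (sys.getrefcount
reports the count, including the temporary reference the call itself adds):

import sys

x = object()
print(sys.getrefcount(x))    # baseline count for x

def touch(obj):
    # merely binding obj as a parameter bumps the count for the call's duration
    return sys.getrefcount(obj)

for _ in range(3):
    # each iteration performs the same incref/decref pair on x; CPython
    # does not hoist or coalesce these updates out of the loop
    print(touch(x))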

John Nagle
 
Paul Rubin

Dennis Lee Bieber said:
That sounds suspiciously like the original Macintosh OS, with its
"handles"... IE, double-indirection.

Nah, a double indirection on every access would be a terrible
performance hit. The classic approach is when you move an object to the
new heap, you leave a tagged forwarding pointer at its former location in
the old heap, giving its location in the new heap. Then as you move
other objects, you dereference the pointers in them to see whether they
point to moved or unmoved objects, and relocate any unmoved ones. A
more complete explanation is here:

http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-33.html#%_sec_5.3.2
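
A toy version of the idea, with all the real machinery stripped away (objects
are fixed-size cells in a flat list, a "pointer" is just an index, and every
name is invented for the sketch):

# Stop-and-copy with forwarding pointers, Cheney-style.
FWD = object()    # sentinel tag marking a cell that has already been moved

def collect(old_heap, roots):
    new_heap = []

    def evacuate(ptr):
        cell = old_heap[ptr]
        if cell[0] is FWD:               # already copied: follow the forwarding pointer
            return cell[1]
        new_addr = len(new_heap)
        new_heap.append(list(cell))      # copy the live object into the new heap
        old_heap[ptr] = [FWD, new_addr]  # leave a forwarding pointer behind
        return new_addr

    new_roots = [evacuate(r) for r in roots]

    # Scan the new heap; the scan pointer chases the copies, relocating
    # whatever they point to.  Garbage is never touched.
    scan = 0
    while scan < len(new_heap):
        cell = new_heap[scan]
        for i, field in enumerate(cell):
            if isinstance(field, int):   # in this toy model, ints are pointers
                cell[i] = evacuate(field)
        scan += 1
    return new_heap, new_roots

heap = [["a", 1], ["b", None], ["dead", 2]]   # cell 2 is unreachable garbage
heap, roots = collect(heap, roots=[0])
print(roots)   # [0]
print(heap)    # [['a', 1], ['b', None]] -- compacted, garbage never examined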
 
Lawrence D'Oliveiro

FromArray is a parameter to the function, with no type check to make
sure it's really an array. In fact your code allows for the possibility
that it doesn't support the buffer_info operation (if I understand the
purpose of the null return check after the PyObject_CallMethod) which
means it's prepared for the argument to -not- be an array.

That reinforces my point, about how easy it was to check the correctness of
the code. In this case one simple fix, like this

diff --git a/spuhelper.c b/spuhelper.c
index 83fd4eb..2ba8197 100644
--- a/spuhelper.c
+++ b/spuhelper.c
@@ -151,10 +151,12 @@ static void GetBufferInfo
if (TheBufferInfo == 0)
break;
AddrObj = PyTuple_GetItem(TheBufferInfo, 0);
- LenObj = PyTuple_GetItem(TheBufferInfo, 1);
if (PyErr_Occurred())
break;
Py_INCREF(AddrObj);
+ LenObj = PyTuple_GetItem(TheBufferInfo, 1);
+ if (PyErr_Occurred())
+ break;
Py_INCREF(LenObj);
*addr = PyInt_AsUnsignedLongMask(AddrObj);
*len = PyInt_AsUnsignedLongMask(LenObj);

would render the code watertight. See how easy it is?
 
Lawrence D'Oliveiro

Lawrence D'Oliveiro said:
Doesn’t seem to happen in the real world, though.

def f(n):
    from time import time
    a = [1] * n
    t0 = time()
    del a
    t1 = time()
    return t1 - t0

for i in range(9):
    print i, f(10**i)


on my system prints:

0 2.86102294922e-06
1 2.14576721191e-06
2 3.09944152832e-06
3 1.00135803223e-05
4 0.000104904174805
5 0.00098991394043
6 0.00413608551025
7 0.037693977356
8 0.362598896027

Looks pretty linear as n gets large. 0.36 seconds (the last line) is a
noticeable pause.

Which just proves the point. You had to deliberately set up the situation to
make that happen. And it remains just as easy to pinpoint where it is
happening, so you can control it.

With a garbage collector, you don’t have that control. Even if you try to
avoid freeing a single large structure at once, it’s still liable to batch
up a lot of small objects to free at once, so the problem can still happen.
 
Lawrence D'Oliveiro

Paul Rubin said:
GC's for large systems generally don't free (or even examine) individual
garbage objects. They copy the live objects to a new contiguous heap
without ever touching the garbage, and then they release the old heap.

And suddenly you’ve doubled the memory requirements. And on top of that,
since you’re moving the valid objects into different memory, you’re forcing
cache misses on all of them as well.

This is the continuing problem with garbage collection: all the attempts to
make it cheaper just end up moving the costs somewhere else.
 
Lawrence D'Oliveiro

Paul Rubin said:
Java has considerably greater reputation for reliability than C or C++.

Wonder why Sun’s licence explicitly forbade its use in danger-critical areas
like nuclear power plants and the like, then?
Ada is a different story, but Ada programs (because of the application
area Ada is used in) tend not to use a lot of dynamic memory allocation
in the first place. A little googling shows there are GC extensions
available for Ada, though I don't know if they are used much.

Let’s put it this way: the life-support system on the International Space
Station is written in Ada. Would you trust your life to code written in
Java?
 
Paul Rubin

Lawrence D'Oliveiro said:
Wonder why Sun’s licence explicitly forbade its use in danger-critical
areas like nuclear power plants and the like, then?

Probably because Sun lawyers demanded it. Is there a Sun C or C++
compiler with a license that doesn't have that restriction? Even if
there is, it just means those languages are so unreliable that the
lawyers felt confident that any meltdown could be blamed on a bug in the
user's code rather than the compiler ;-).
Let’s put it this way: the life-support system on the International Space
Station is written in Ada. Would you trust your life to code written in
Java?

The scary thing is I don't know whether I'm already doing that. Life
support systems have hard real-time requirements (Ada's forte) but I'd
expect lots of military decision-support systems are written in Java.
Maybe one of them will raise a false alert and somebody will launch a
war.
 
MRAB

Paul Rubin said:
Probably because Sun lawyers demanded it. Is there a Sun C or C++
compiler with a license that doesn't have that restriction? Even if
there is, it just means those languages are so unreliable that the
lawyers felt confident that any meltdown could be blamed on a bug in the
user's code rather than the compiler ;-).

The scary thing is I don't know whether I'm already doing that. Life
support systems have hard real-time requirements (Ada's forte) but I'd
expect lots of military decision-support systems are written in Java.
Maybe one of them will raise a false alert and somebody will launch a
war.

I thought it was just that if it wasn't explicitly forbidden then
someone might try to use it and then sue if something went wrong, even
though common sense would have said that it was a bad idea in the first
place! :)
 
Paul Rubin

Lawrence D'Oliveiro said:
And suddenly you’ve doubled the memory requirements. And on top of that,
since you’re moving the valid objects into different memory, you’re forcing
cache misses on all of them as well.

A minimal naive implementation indeed doubles the memory requirements,
but from a Python perspective where every integer takes something like
24 bytes already, even that doesn't seem so terrible. More
sophisticated implementations use multiple small heaps or other tricks.
It still costs something in memory footprint, but less than the minimal
implementation's 2x cost.

The new heap is filled sequentially so accesses to it will have good
locality. You do have to jump around within the old heap, but again,
with generational schemes, in the more frequent collections, the old
heap fits entirely in cache. For example, GHC's minor heap size is
256kB. For major collections, GHC switches (or used to) from copying to
a mark/compact scheme once the amount of live data in the heap goes over
some amount, giving the best of both worlds.

It's also the case that programs with very large memory consumption tend
to use most of the memory for large arrays that don't contain pointers
(think of a database server with a huge cache). That means the gc
doesn't really have to think about all that much of the memory.
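
There's a CPython analogue of that, for what it's worth: the cycle collector
simply never tracks objects that can't hold references, however large they
are (gc.is_tracked needs Python 2.7 or later):

import gc

# A megabyte of flat data the collector never scans, versus a tiny list it does:
print(gc.is_tracked(bytearray(10 ** 6)))   # False
print(gc.is_tracked([1, 2, 3]))            # True
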
This is the continuing problem with garbage collection: all the attempts to
make it cheaper just end up moving the costs somewhere else.

Same thing with manual allocation. That moves the costs off the
computer and onto the programmer. Not good, most of the time.

Really, I'm no gc expert, but the stuff you're saying about gc is quite
ill-informed. You might want to check out some current literature.
 
Dennis Lee Bieber

Lawrence D'Oliveiro said:
And suddenly you’ve doubled the memory requirements. And on top of that,
since you’re moving the valid objects into different memory, you’re forcing
cache misses on all of them as well.

Not to mention having to ensure that one finds ALL the references to
the object so that they can be updated to the new address! Somehow I
don't see a C compiler being smart enough to find intermediary pointer
variables that had been operated upon (maybe by the addition of some
constant offset in a function treating the value as an integer, and then
passing that to a function that casts to a pointer again?)

Hence the old Macintosh double reference "handle" scheme where user
code gets pointers into a fixed/unmovable chunk of memory which only
contains addresses to the actual data objects. This way the GC can move
the objects, update a single pointer, and user code is unaffected.
 
Paul Rubin

Dennis Lee Bieber said:
Not to mention having to ensure that one finds ALL the references to
the object so that they can be updated to the new address! Somehow I
don't see a C compiler being smart enough to find intermediary pointer

We're not talking about C compilers, which can cast any value at all
into pointers. Languages designed for garbage collection are normally
type-safe and gc is a well-understood problem, though (like compilers)
the higher-performing ones are complicated. But, nothing in principle
stops anyone from implementing Python with such methods.
 
Aahz

[gc]

Paul Rubin said:
A minimal naive implementation indeed doubles the memory requirements,
but from a Python perspective where every integer takes something like
24 bytes already, even that doesn't seem so terrible.

Many people still use 32-bit Python -- an int is twelve bytes there.
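
Easy to check on whatever build you're running; the exact figure depends on
the Python version and whether it is a 32- or 64-bit interpreter:

import sys
print(sys.getsizeof(1))   # roughly 12 on 32-bit CPython 2.x, 24 on 64-bit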
 
