The future of "frozen" types as the number of CPU cores increases

John Nagle · Feb 16, 2010

In the beginning, Python had some types which were "frozen", and some which
weren't. Now, there's a trend towards having both "frozen" and "unfrozen"
versions of built-in types. We now have "frozen" and "unfrozen" sets,
dictionaries, and byte arrays. It's becoming clear that that the concept of
"frozen" is separate from the type involved.

Maybe "frozen" should be generalized, so that all "unfrozen" objects also
have "frozen" versions.

I'd suggest

p = freeze(q)

which would generate a "frozen" copy of q. (For lists, this would
return a tuple. For unfrozen sets, a frozen set. Etc.)

p = deepfreeze(q)

would generate "frozen" deep copy of q, for which each object it references
is also a "frozen" copy.

For objects,

class c(frozenobject) :
def __init__(self, someparam) :
self.p = someparam
...

would result in a class which returns frozen objects from the constructor.

In concurrent programs, frozen objects can be shared without locking.
This simplifies locking considerably. This may provide a way to get out
from the Global Interpreter Lock. One possible implementation would be
to have unfrozen objects managed by reference counting and locking as
in CPython. Frozen objects would live in a different memory space and be
garbage collected by a concurrent garbage collector.

If we add "synchronized" objects with built-in locking, like Java,
threading becomes much cleaner.

class d(synchronized) :
...

A typical "synchronized" object would be a queue. Anything with
state shared across threads has to be synchronized. Only one
thread at a time can be active within a synchronized object,
due to a built-in implicit lock.

Semantics of synchronized objects:

"Synchronized" objects cannot be frozen. Trying to freeze
them just returns the object.

If a thread blocks on an explicit "wait", the synchronized
object is temporarily unlocked during the "wait". This
allows threads to wait for, say, a queue to get another
entry from another thread.

If only synchronized objects and frozen objects can be shared across
threads, the GIL becomes unnecessary. Big performance win on multicore
CPUs.

"Synchronized" was a pain in Java because Java doesn't have "frozen"
objects. Too much effort went into synchronizing little stuff. But
in a language with frozen objects, the little stuff would be frozen,
and wouldn't need its own locks.

Threaded programs that already use queue objects to communicate with
each other are almost ready for this kind of architecture now. If
the queue class made a frozen copy of any unfrozen, unsynchronized
copy placed on a queue, conversion to this approach could be almost
automatic.

What would probably happen in practice is that potential race conditions in
existing programs would raise "unshareable object" exceptions,
indicating that something needed to be synchronized. That's a good thing.
This approach makes threading much easier for the typical programmer.
Instead of race conditions and random errors, you get error messages.

And we get rid of the GIL.

I look at this as Python's answer to multicore CPUs and "Go".

John Nagle

Terry Reedy · Feb 16, 2010

In the beginning, Python had some types which were "frozen",
> and some which weren't.

In the beginning, there was only one 'frozen' general purpose collection
type, the tuple. And Guido resisted the suggestion that lists and tuple
were mutable and frozen versions of each other.

However, things have changed, and lists and tuple *are* effectively
mutable and hashable versions of each other, and we have the three other
pairs you mentioned. The idea of freeze() may have been floated (and
punctured) during Py3 discussions, but I think it a fairly good one.

Terry Jan Reedy

John Nagle · Feb 17, 2010

Terry said:
In the beginning, there was only one 'frozen' general purpose collection
type, the tuple. And Guido resisted the suggestion that lists and tuple
were mutable and frozen versions of each other.

However, things have changed, and lists and tuple *are* effectively
mutable and hashable versions of each other, and we have the three other
pairs you mentioned. The idea of freeze() may have been floated (and
punctured) during Py3 discussions, but I think it a fairly good one.

Yes, we're now at the point where all the built-in mutable types
have "frozen" versions. But we don't have that for objects. It's
generally considered a good thing in language design to offer,
for user defined types, most of the functionality of built-in ones.

It's the concurrency aspect of this that interests me, though.
A language with immutable objects can potentially handle concurrency
more safely than one where everything is potentially mutable.
The language knows what can't change, which simplifies locking.

John Nagle

Paul Rubin · Feb 17, 2010

John Nagle said:
It's the concurrency aspect of this that interests me, though.
A language with immutable objects can potentially handle concurrency
more safely than one where everything is potentially mutable.
The language knows what can't change, which simplifies locking.

I wonder how well that applies to tuples containing mutable objects,
e.g. t = ([1,2], [3,4])

Alf P. Steinbach · Feb 17, 2010

* Paul Rubin:

John Nagle said:
John Nagle said:

It's the concurrency aspect of this that interests me, though.
A language with immutable objects can potentially handle concurrency
more safely than one where everything is potentially mutable.
The language knows what can't change, which simplifies locking.

Click to expand...

I wonder how well that applies to tuples containing mutable objects,
e.g. t = ([1,2], [3,4])

One would need a collection type that's immutable "all the way".

Current frozen types don't have that.

However, there's also the question of the immutability of what names refer to.
It seems to me that this means that the compiler can't really assume very much
without very deep analysis. And so, if it boils down to the programmer applying
assumptions about mutability/constantness, then types may not be The Answer.

Cheers,

- Alf

Steven D'Aprano · Feb 17, 2010

Yes, we're now at the point where all the built-in mutable types
have "frozen" versions. But we don't have that for objects. It's
generally considered a good thing in language design to offer, for user
defined types, most of the functionality of built-in ones.

It's not hard to build immutable user-defined types. Whether they're
immutable enough is another story

.... def __new__(meta, classname, bases, classDict):
.... def __setattr__(*args):
.... raise TypeError("can't change immutable class")
.... classDict['__setattr__'] = __setattr__
.... classDict['__delattr__'] = __setattr__
.... return type.__new__(meta, classname, bases, classDict)
........ __metaclass__ = FrozenMeta
.... def __init__(self, x):
.... self.__dict__['x'] = x
....Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in __setattr__
TypeError: can't change immutable class

It's a bit ad hoc, but it seems to work for me. Unfortunately there's no
way to change __dict__ to a "write once, read many" dict.

John Nagle · Feb 17, 2010

Daniel said:
A concurrent garbage collector for frozen object would still need to walk
unfrozen objects, to find references to frozen objects.

Right. But the reverse is not the case. Reference count updating
wouldn't need to look at frozen object space.

One way to implement locking is something like this:

Mutable objects have a reference count field, a lock field,
and an "owner" field. Initially, the owner of an object is its thread.
If an object's only reference is a field of a synchronized object, the
owner is the synchronized object. Mutable objects with an owner can
be manipulated by that owner without locking.

When an object reference is passed to another thread or
another synchronized object, the object becomes multi-owner and
the "owner" field is set to null. Thereafter, the object must
be locked during updates.

The headache with this is that when an object becomes multi-owner,
so does everything it points to. So there's a transient as the
system runs down the references, locking objects momentarily and
clearing owner fields.

John Nagle

sjdevnull · Feb 17, 2010

Yes, we're now at the point where all the built-in mutable types
have "frozen" versions. But we don't have that for objects. It's
generally considered a good thing in language design to offer, for user
defined types, most of the functionality of built-in ones.

Click to expand...

It's not hard to build immutable user-defined types. Whether they're
immutable enough is another story

... def __new__(meta, classname, bases, classDict):
... def __setattr__(*args):
... raise TypeError("can't change immutable class")
... classDict['__setattr__'] = __setattr__
... classDict['__delattr__'] = __setattr__
... return type.__new__(meta, classname, bases, classDict)
...

... __metaclass__ = FrozenMeta
... def __init__(self, x):
... self.__dict__['x'] = x
...

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in __setattr__
TypeError: can't change immutable class

It's a bit ad hoc, but it seems to work for me. Unfortunately there's no
way to change __dict__ to a "write once, read many" dict.

Which makes it not really immutable, as does the relative ease of
using a normal setattr:

.... t.__dict__['x'] = "foo"
.... print t.x
foo
.... object.__setattr__(t, "x", 42)
.... print t.x
42

John Nagle · Feb 18, 2010

John said:
I look at this as Python's answer to multicore CPUs and "Go".

On that note, I went to a talk at Stanford yesterday by one of the
designers of Intel's Nelahem core. The four-core, eight thread
version is out now. The six-core, twelve thread version is working;
the speaker has one in his lab. The eight-core, sixteen thread version
is some months away. This isn't an expensive CPU; this is Intel's
"converged" mainstream product. (Although there will be a whole range
of "economy" and "performance" versions, all with the same core but
with some stuff turned off.)

Python isn't ready for this. Not with the GIL.

Multiple processes are not the answer. That means loading multiple
copies of the same code into different areas of memory. The cache
miss rate goes up accordingly.

John Nagle

Ethan Furman · Feb 18, 2010

John said:
On that note, I went to a talk at Stanford yesterday by one of the
designers of Intel's Nelahem core. The four-core, eight thread
version is out now. The six-core, twelve thread version is working;
the speaker has one in his lab. The eight-core, sixteen thread version
is some months away. This isn't an expensive CPU; this is Intel's
"converged" mainstream product. (Although there will be a whole range
of "economy" and "performance" versions, all with the same core but
with some stuff turned off.)

Python isn't ready for this. Not with the GIL.

Multiple processes are not the answer. That means loading multiple
copies of the same code into different areas of memory. The cache
miss rate goes up accordingly.

John Nagle

Will the new GIL in 3.2 make this workable? It would still be one
thread at a time, though, wouldn't it.

~Ethan~

Chris Rebert · Feb 18, 2010

On Thu said:
Â On that note, I went to a talk at Stanford yesterday by one of the
designers of Intel's Nelahem core. Â The four-core, eight thread
version is out now. Â The six-core, twelve thread version is working;
the speaker has one in his lab. Â The eight-core, sixteen thread version
is some months away. Â This isn't an expensive CPU; this is Intel's
"converged" mainstream product. Â (Although there will be a whole range
of "economy" and "performance" versions, all with the same core but
with some stuff turned off.)

Â Python isn't ready for this. Â Not with the GIL.

Is any language, save perhaps Erlang, really ready for it?

Cheers,
Chris

Steven D'Aprano · Feb 18, 2010

John said:
John said:

I look at this as Python's answer to multicore CPUs and "Go".

Click to expand...

On that note, I went to a talk at Stanford yesterday by one of the
designers of Intel's Nelahem core. The four-core, eight thread version
is out now. [...]

Python isn't ready for this. Not with the GIL.

Pardon me, but Python is perfectly ready for this. Why aren't you using
Jython or IronPython, if the GIL is such a drag on your use-case?

Gregory Ewing · Feb 19, 2010

John said:
One way to implement locking is something like this:

Mutable objects have a reference count field, a lock field,
and an "owner" field. Initially, the owner of an object is its thread.
If an object's only reference is a field of a synchronized object, the
owner is the synchronized object.

The trouble with that is that there will hardly ever be
"only one reference" to any object that you do anything
with, even just looking at it, because temporary references
come and go as the interpreter goes about its business.

A simple demonstration of this:

Python 2.5 (r25:51908, Apr 8 2007, 22:22:18)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1809)] on darwin
Type "help", "copyright", "credits" or "license" for more information.2

Even though you might think that there is only
one reference to foo, the very act of passing it
as a parameter to getrefcount() causes another
reference to come into existence temporarily.

sjdevnull · Feb 21, 2010

Multiple processes are not the answer. That means loading multiple
copies of the same code into different areas of memory. The cache
miss rate goes up accordingly.

A decent OS will use copy-on-write with forked processes, which should
carry through to the cache for the code.

John Nagle · Feb 21, 2010

A decent OS will use copy-on-write with forked processes, which should
carry through to the cache for the code.

That doesn't help much if you're using the subprocess module. The
C code of the interpreter is shared, but all the code generated from
Python is not.

This will get even worse when Unladen Swallow starts generating vast
swaths of unshared x86 instructions.

John Nagle

Paul Rubin · Feb 21, 2010

John Nagle said:
That doesn't help much if you're using the subprocess module. The
C code of the interpreter is shared, but all the code generated from
Python is not.

Emacs in days of yore used a hack called "unexec" to bring its Lisp code
into the pure text segment. I think it's not used any more because
machines are now big enough that the benefits aren't that great.
Basically in the Emacs build process, you'd start up the C portion of
Emacs, then tell it to load all its Lisp code, then call "unexec".
Unexec basically performed a core dump of the process, then ran
something that manipulated the original Emacs image and the core dump
into a new executable that had the static Lisp code and data now marked
as part of the (immutable) program area. Unexec necessarily had
platform dependencies, but this was managed with a bunch of ifdefs in a
reasonably clean and maintainable way. I could imagine doing something
similar with a large Python program.

sjdevnull · Feb 21, 2010

That doesn't help much if you're using the subprocess module. The
C code of the interpreter is shared, but all the code generated from
Python is not.

Of course. Multithreading also fails miserably if the threads all try
to call exec() or the equivalent.

It works fine if you use os.fork().

Paul Boddie · Feb 21, 2010

A decent OS will use copy-on-write with forked processes, which should
carry through to the cache for the code.

True, but the principal issue with CPython and copy-on-write is the
reference counting. Consider a large list shared with the child
processes which is to be processed (either in part or in its entirety)
by each of them. As soon as the child processes start referencing each
of the elements, the reference count for each element object will need
to be changed and pages will start to be copied for each child
process. That's why John wants to avoid the reference counting of
shared data and why the Unladen Swallow project has apparently
considered storing reference counts in separate regions of memory.

Such shared data is really "owned" by the parent process, so it makes
little sense for the child processes to manage it with a view to
collecting it later. After all, such data should have no cost to each
of the child processes (except, perhaps, for the size of the address
space referenced by each process, and any resource issues arising from
managing that).

Paul

John Nagle · Feb 23, 2010

Of course. Multithreading also fails miserably if the threads all try
to call exec() or the equivalent.

It works fine if you use os.fork().

Forking in multithreaded programs is iffy. What happens depends
on the platform, and it's usually not what you wanted to happen.
Solaris before version 10 forks all threads, and both new processes
have all the threads running. POSIX semantics are to fork only the
thread making the request. The problem is that this leaves the other
Python threads in the new process with Python state, locks and reference
counts set, but no OS thread to run them. See

"http://bugs.python.org/issue1683"
"http://bugs.python.org/issue874900"
"http://bugs.python.org/issue6721"
"http://bugs.python.org/issue7242"

There are at least two open bug reports in this area.

If you fork and "exec", which discards the state of Python in the child
process, there's less trouble. But if you're actually forking a
Python environment and running it, you're going to have at least a
memory leak, and probably further difficulties.

John Nagle

sjdevnull · Feb 23, 2010

Forking in multithreaded programs is iffy. What happens depends
on the platform, and it's usually not what you wanted to happen.

Well, yeah. And threading in multiprocess apps is iffy. In the real
world, though, multiprocessing is much more likely to result in a
decent app than multithreading--and if you're not skilled at either,
starting with multiprocessing is by far the smarter way to begin.

Basically, multiprocessing is always hard--but it's less hard to start
without shared everything. Going with the special case (sharing
everything, aka threading) is by far the stupider and more complex way
to approach multiprocssing.

And really, for real-world apps, it's much, much more likely that
fork() will be sufficient than that you'll need to explore the
vagueries of a multithreaded solution. Protected memory rocks, and in
real life it's probably 95% of the time where threads are only even
considered if the OS can't fork() and otherwise use processes well.

The future of Python immutability	50	Sep 3, 2009
[ANN]: "newthreading" - an approach to simplified thread usage, anda path to getting rid of the GIL	2	Jun 25, 2010
How to keep the order of executing tasks? - Help needed.	1	Feb 21, 2023
I would like to use awk to calculate the total number of records processed	1	Aug 25, 2022
The Future of Python Threading	34	Aug 10, 2007
Symbols and frozen strings	12	Sep 6, 2007
tornado.web ioloop add_timeout eats CPU	3	Sep 3, 2012
What is the purpose of type() and the types module and what is a type?	1	Jun 27, 2013

The future of "frozen" types as the number of CPU cores increases

John Nagle

Terry Reedy

John Nagle

Paul Rubin

Alf P. Steinbach

Steven D'Aprano

John Nagle

sjdevnull

John Nagle

Ethan Furman

Chris Rebert

Steven D'Aprano

Gregory Ewing

sjdevnull

John Nagle

Paul Rubin

sjdevnull

Paul Boddie

John Nagle

sjdevnull

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads