The future of Python immutability

J

John Nagle

Python's concept of immutability is useful, but it could be more
general.

In the beginning, strings, tuples, and numbers were immutable, and
everything else was mutable. That was simple enough. But over time,
Python has acquired more immutable types - immutable sets and immutable
byte arrays. Each of these is a special case.

Python doesn't have immutable objects as a general concept, but
it may be headed in that direction. There was some fooling around
with an "immmutability API" associated with NumPy back in 2007, but
that was removed. As more immutable types are added, a more general
approach may be useful.

Suppose, for discussion purposes, we had general "immutable objects".
Objects inherited from "immutableobject" instead of "object" would be
unchangeable once "__init__" had returned. Where does this take us?

Immutability is interesting for threaded programs, because
immutable objects can be shared without risk. Consider a programming
model where objects shared between threads must be either immutable or
"synchronized" in the sense that Java uses the term. Such programs
are free of most race conditions, without much programmer effort to
make them so.

Java "synchronized" turned out to be a headache partly because trying to
figure out how to lock all the "little stuff" being passed around
a headache. But Java doesn't have immutable objects. Python does,
and that can be exploited to make thread-based programming cleaner.

The general idea is that synchronized objects would have built in
locks which would lock at entry to any function of the object and
unlock at exit. The lock would also unlock at explicit waits. A
"Queue" object would be a good example of a synchronized object.

With this mechanism, multi-thread programs with shared data
structures can be written with little or no explicit locking by
the programmer. If the restrictions are made a bit stricter,
strict enough that threads cannot share mutable unsynchronized data,
removal of the "global interpreter lock" is potentially possible.
This is a route to improved performance on modern multi-core CPUs.

John Nagle
 
N

Nigel Rantor

John said:
Python's concept of immutability is useful, but it could be more
general.

In the beginning, strings, tuples, and numbers were immutable, and
everything else was mutable. That was simple enough. But over time,
Python has acquired more immutable types - immutable sets and immutable
byte arrays. Each of these is a special case.

Python doesn't have immutable objects as a general concept, but
it may be headed in that direction. There was some fooling around
with an "immmutability API" associated with NumPy back in 2007, but
that was removed. As more immutable types are added, a more general
approach may be useful.

Suppose, for discussion purposes, we had general "immutable objects".
Objects inherited from "immutableobject" instead of "object" would be
unchangeable once "__init__" had returned. Where does this take us?

Immutability is interesting for threaded programs, because
immutable objects can be shared without risk. Consider a programming
model where objects shared between threads must be either immutable or
"synchronized" in the sense that Java uses the term.

Yes, this is one of the reasons I am currently learning Haskell, I am
not yet anywhwere near proficient but the reason I am looking into FP is
because of some of the claims of the FP community, particularly Erlang,
regarding the benefits of pure FP with respect to multi-threading.

It's a shame this post came right now since I'm not really up-to-speed
enough with Haskell to comment on it with repsect to multi-threading.

<context>
I program Perl, Java and C++ for my day job, I've spent a lot of time
making multithreaded programs work correctly and have even experienced
the POE on a large project. So my comments below are based on experience
of these languages.
> Such programs are free of most race conditions, without much
> programmer effort to make them so.

I disagree. They are not free of most race conditions, and it still
takes a lot of effort. Where did you get this idea from? Have you been
reading some Java primer that attempts to make it sound easy?
Java "synchronized" turned out to be a headache partly because
trying to
figure out how to lock all the "little stuff" being passed around
a headache. But Java doesn't have immutable objects. Python does,
and that can be exploited to make thread-based programming cleaner.

This is nothing to do with Java, any multithreaded language that has
mutable shared state has exactly the same problems. Can we talk about
threading rather than Java please? Additionally Java provides a lot more
than monitors (synchronized) for controlling multiple threads.

Java does have immutable objects. Strings in Java are immutable for
example. As are the object-based numeric types, Bytes, Characters etc.

There are lots and lots of immutable types in Java and you can make your
own by creating a class with no mutator methods and declaring it "final".
The general idea is that synchronized objects would have built in
locks which would lock at entry to any function of the object and
unlock at exit. The lock would also unlock at explicit waits. A
"Queue" object would be a good example of a synchronized object.

With this mechanism, multi-thread programs with shared data
structures can be written with little or no explicit locking by
the programmer. If the restrictions are made a bit stricter,
strict enough that threads cannot share mutable unsynchronized data,
removal of the "global interpreter lock" is potentially possible.
This is a route to improved performance on modern multi-core CPUs.

Right, this is where I would love to have had more experience with Haksell.

Yes, as soon as you get to a situation where no thread can access shared
state that is mutable your problems go away, you're also getting no work
done becasue the threads, whilst they may be performing lots of
interesting calculations have no way of allowing the rest of the
program, or operating system, know about it.

You can, today, in any language that provides threads, make any number
of threaded programs that do not contain any race conditions, it's just
that most of them are terribly dull and uninteresting.

I'd love for someone from the Haskell/Erlang/<other> pure FP community
provide some canonical example of how this is acheived in pure FP. I'll
get there soon but I'm not going to skip ahead in my reading, I'm still
trying to learn the basics.

So, in response to your point of trying to get an immutable API so that
Python can easily have multi-threaded programs that do not present race
conditions I would say the following:

That is not the challenge, that's the easy part. The challenge is
getting useful information out of a system that has only been fed
immutable objects.

Regards,

Nigel
 
S

Stefan Behnel

Nigel said:
I disagree. They are not free of most race conditions, and it still
takes a lot of effort. Where did you get this idea from? Have you been
reading some Java primer that attempts to make it sound easy?

Read again what he wrote. In a language with only immutable data types
(which doesn't mean that you can't efficiently create modified versions of
a data container), avoiding race conditions is trivial. The most well known
example is clearly Erlang. Adding "synchronised" data structures to that
will not make writing race conditions much easier.

Stefan
 
N

Nigel Rantor

Stefan said:
Read again what he wrote. In a language with only immutable data types
(which doesn't mean that you can't efficiently create modified versions of
a data container), avoiding race conditions is trivial. The most well known
example is clearly Erlang. Adding "synchronised" data structures to that
will not make writing race conditions much easier.

My comment you quoted was talking about Java and the use of
synchronized. I fthat was unclear I apologise.

Please feel free to read the entirety of my post before replying.

n
 
S

Stefan Behnel

Nigel said:
My comment you quoted was talking about Java and the use of
synchronized. I fthat was unclear I apologise.

Well, it was clear. But it was also unrelated to what the OP wrote. He was
talking about the semantics of "synchronized" in Java, not the use.

Stefan
 
S

Stefan Behnel

John said:
With this mechanism, multi-thread programs with shared data
structures can be written with little or no explicit locking by
the programmer. If the restrictions are made a bit stricter,
strict enough that threads cannot share mutable unsynchronized data,
removal of the "global interpreter lock" is potentially possible.
This is a route to improved performance on modern multi-core CPUs.

The main problem with this is that the existing data structures are, well,
there. You can't replace them without breaking all existing Python code,
i.e. without basically inventing a new language. If that's required for
removing the GIL, I doubt that it will ever be done.

Stefan
 
P

Paul Rubin

Stefan Behnel said:
Read again what he wrote. In a language with only immutable data types
(which doesn't mean that you can't efficiently create modified versions of
a data container), avoiding race conditions is trivial. The most well known
example is clearly Erlang. Adding "synchronised" data structures to that
will not make writing race conditions much easier.

Nonetheless, Erlang is subject to all kinds of traditional threading
problems such as deadlocks. Haskell's use of software transactional
memory may(?) avoid some of the problems but at a performance cost.
 
G

Gabriel Genellina

Python's concept of immutability is useful, but it could be more
general.

Immutability is interesting for threaded programs, because
immutable objects can be shared without risk. Consider a programming
model where objects shared between threads must be either immutable or
"synchronized" in the sense that Java uses the term. Such programs
are free of most race conditions, without much programmer effort to
make them so.

In the current CPython implementation, every object has a reference count,
even immutable ones. This must be a writable field - and here you have
your race condition, even for immutable objects.
 
J

John Nagle

Gabriel said:
In the current CPython implementation, every object has a reference
count, even immutable ones. This must be a writable field - and here you
have your race condition, even for immutable objects.

That's an implementation problem with CPython. I'm looking at this as
a language design issue. Shed Skin, which is garbage-collected, doesn't
have that problem. Look ahead to a new generation of Python implementations
that go fast and use multiprocessors effectively.

John Nagle
 
R

r

"""The future of Python immutability"""

Define future:
The future is a time period commonly understood to contain all
events that have yet to occur. It is the opposite of the past, and is
the time after the present

Define immutability:
Not subject or susceptible to change. In object-oriented and
functional programming, an immutable object is an object whose state
cannot be modified after it is created. This is in contrast to a
mutable object, which can be modified after it is created.



hmm, applying this logic i'd have to say about the same really.

;-)
 
F

Francesco Bochicchio

Right, this is where I would love to have had more experience with Haksell.

Yes, as soon as you get to a situation where no thread can access shared
state that is mutable your problems go away, you're also getting no work
done becasue the threads, whilst they may be performing lots of
interesting calculations have no way of allowing the rest of the
program, or operating system, know about it.

Threads could communicate only with channels, message queue, or
equivalent. Is what
I do that as much as I can, to avoid the headache of sharing data
between threads. It is
less efficient than the shared data model and adds latency, but ensure
that each thread
is self-contained, making for safer programming and opening the way to
better parallelization.
AFAIK erlang Processes and scala Actors implement a similar model at
language level.

In python, there is kamaelia that implements a similar paradigm,
although it is more concerned
with logical parallelization than with multitheading performance
issue.

I believe this kind of paradigms will bring us to the massive
multicore world easier than FP.
Consider that most FP languages have accepted a compromise and become
'unpure' (i.e. have
constructs to change variable in place). Even haskell, the only pure
FP
language I know (sort of), has things like mutable arrays.
All these features expose current FP languages at the same 'shared
resource' risks of imperative one,
although admittedly at a less level. And FP languages have their own
crop of problems - e.g how to deal
efficiently with deep recursion levels, parameters passed by copy,
huge list built in memory (if you use eager evaluation)
or build-up of thunks (if you use lazy evaluation).

But then, I'm just a programmer, so I could be easily wrong :)

Ciao
 
U

Ulrich Eckhardt

Nigel said:
Yes, this is one of the reasons I am currently learning Haskell, I am
not yet anywhwere near proficient but the reason I am looking into FP is
because of some of the claims of the FP community, particularly Erlang,
regarding the benefits of pure FP with respect to multi-threading.

Actually, I wouldn't say that FP itself changes anything there. However, FP
usually (correct me if I'm wrong) implies immutability of objects, i.e. no
variable assignment.

Such programs are free of most race conditions, without much
programmer effort to make them so.

I disagree. They are not free of most race conditions, and it still
takes a lot of effort. Where did you get this idea from? Have you been
reading some Java primer that attempts to make it sound easy? [...]
With this mechanism, multi-thread programs with shared data
structures can be written with little or no explicit locking by
the programmer. If the restrictions are made a bit stricter,
strict enough that threads cannot share mutable unsynchronized data,
removal of the "global interpreter lock" is potentially possible.
This is a route to improved performance on modern multi-core CPUs.

Right, this is where I would love to have had more experience with
Haksell.

Yes, as soon as you get to a situation where no thread can access shared
state that is mutable your problems go away, you're also getting no work
done becasue the threads, whilst they may be performing lots of
interesting calculations have no way of allowing the rest of the
program, or operating system, know about it.

I think it is the combination of immutable objects and message passing
(which he omitted mentioning) that is the point. I initially encountered
this principle in Erlang, but it also applies in other languages. For
example in C++, you can create MT-program using std::auto_ptr<T> for
mutable, exclusively owned objects and boost::shared_pointer<T const> for
shared, immutable objects.

So, in response to your point of trying to get an immutable API so that
Python can easily have multi-threaded programs that do not present race
conditions I would say the following:

That is not the challenge, that's the easy part. The challenge is
getting useful information out of a system that has only been fed
immutable objects.

You're overly pessimistic. Immutable objects are a tool, not the holy grail.
Having them available can help you solve some problems, not all of them.
You obviously still need mutable data, but if you can restrict access to
them to just one thread, there is no need for synchronisation for them.
Lastly, for the message passing, you also need shared, mutable structures
(queues), so you can't live completely without conventional locking.

Uli
 
P

Paul Rubin

Ulrich Eckhardt said:
Lastly, for the message passing, you also need shared, mutable structures
(queues), so you can't live completely without conventional locking.

But that can be completely behind the scenes in the language or
library implementation. The application programmer doesn't have to
think about those locks.
 
H

Hendrik van Rooyen

That is not the challenge, that's the easy part. The challenge is
getting useful information out of a system that has only been fed
immutable objects.

Is it really that difficult? (completely speculative):

class MyAnswer(object):
def __init__(self)
self.finished = False
self.Answer = None
self.collected = False

Then when you start a thread, you give it an instance:

ans = MyAnswer()
list_of_answers.append(c)

worker_thread = thread.start_new_thread(worker, (ans, immutable_stuff ))

def worker(ans):
ans.Answer = do_some_stuff()
ans.finished = True

Then to see if everybody is finished:

runbool = True
While runbool:
finished = False
runbool = False
for answer in list_of_answers:
if answer.finished and not answer.collected:
do_something(answer.Answer)
answer.collected = True
else:
runbool = True

This scheme gives only one thread the license to make a change to the answer
instance, so how can it go wrong?

You can also extend it for long running threads by playing ping pong with the
two booleans - the collector sets the collected boolean and clears the
finished, and the worker must clear the collected boolean before starting a
new cycle. ( the worker waits for not finished, then clears collected)

I do similar stuff in assembler and I have no problems - Am I missing
something subtle?

Of course - working with immutable objects only is a bit like working with
read only memory, or worse, write only memory - not easy.

- Hendrik
 
A

Adam Skutt

     Suppose, for discussion purposes, we had general "immutable objects".
Objects inherited from "immutableobject" instead of "object" would be
unchangeable once "__init__" had returned.  Where does this take us?

You can create this in various ways through metaclasses. I've done
it, mostly because I was curious to see how hard it would be and if it
actually gained me anything useful.
     With this mechanism, multi-thread programs with shared data
structures can be written with little or no explicit locking by
the programmer.  If the restrictions are made a bit stricter,
strict enough that threads cannot share mutable unsynchronized data,
removal of the "global interpreter lock" is potentially possible.
This is a route to improved performance on modern multi-core CPUs.
Nope, preventing mutation of the objects themselves is not enough.
You also have to forbid variables from being rebound (pointed at
another object). Consider this simple example:

---------- Thread 1 ---------- | ---------- Thread 2 ----------
a = "Foo"
spawn Thread 2
a = "Bar" print "thread 2: %s" % a
print "thread 1: %s" % a

You could see (ignoring the fact the output isn't ordered):
"thread 1: Bar"
"thread 2: Foo"
or:
"thread 1: Bar"
"thread 2: Bar"

so the fact "Foo" and "Bar" are immutable isn't enough to solve the
problem. The variables themselves, since they obey pointer semantics,
must also be forbidden from being reseated (i.e., they must be
references in the C++ sense or become 'T const * const' pointers).

Thanks,
Adam
 
S

sturlamolden

    That's an implementation problem with CPython.  I'm looking at this as
a language design issue.  Shed Skin, which is garbage-collected, doesn't
have that problem.  Look ahead to a new generation of Python implementations
that go fast and use multiprocessors effectively.

But in that case, the problem is reference counting, not mutable
objects. If you get rid of reference counts you get rid of the GIL. At
the same time you introduce far worse problems: Memory use will
increase (garbage pile up), long pauses while garbage are collected
(very bad for servers), deallocator methods of extension types not
called when you expect them to. Java's VM may be faster than CPython's
VM, but it runs less smooth.
 
S

sturlamolden

     Python doesn't have immutable objects as a general concept, but
it may be headed in that direction.  There was some fooling around
with an "immmutability API" associated with NumPy back in 2007, but
that was removed.  As more immutable types are added, a more general
approach may be useful.

I one did a test of NumPy's mutable arrays against Matlab's immutable
arrays on D4 wavelet transforms. On an array of 64 MB of double
precision floats, the Python/NumPy version was faster by an order of
magnitude. On the other hand, immutable arrays does make
multithreading easier. They are particularly interesting for GPGPUs
(OpenCL/CUDA) where multithreading is pervasive. Also they allow
removal of temporary arrays, which are created by NumPy's binary
operators.
 
S

Steven D'Aprano

I one did a test of NumPy's mutable arrays against Matlab's immutable
arrays on D4 wavelet transforms. On an array of 64 MB of double
precision floats, the Python/NumPy version was faster by an order of
magnitude.

Is the difference because of mutability versus immutability, or because
of C code in Numpy versus Matlab code? Are you comparing bananas and
pears?

Without knowing what the test consisted of, and what it actually
measures, this comparison is meaningless.
 
S

sturlamolden

Is the difference because of mutability versus immutability, or because
of C code in Numpy versus Matlab code? Are you comparing bananas and
pears?


It consisted of something like this


import numpy

def D4_Transform(x, s1=None, d1=None, d2=None):
C1 = 1.7320508075688772
C2 = 0.4330127018922193
C3 = -0.066987298107780702
C4 = 0.51763809020504137
C5 = 1.9318516525781364
if d1 == None:
d1 = numpy.zeros(x.size/2)
s1 = numpy.zeros(x.size/2)
d2 = numpy.zeros(x.size/2)
odd = x[1::2]
even = x[:-1:2]
d1[:] = odd[:] - C1*even[:]
s1[:-1] = even[:-1] + C2*d1[:-1] + C3*d1[1:]
s1[-1] = even[-1] + C2*d1[-1] + C3*d1[0]
d2[0] = d1[0] + s1[-1]
d2[1:] = d1[1:] + s1[:-1]
even[:] = C5 * s1[:]
odd[:] = C4 * d2[:]
if x.size > 2:
D4_Transform(even,s1[0:even.size/2],
d1[0:even.size/2],d2[0:even.size/2])


against Matlab:

function x = D4_Transform(x)
C1 = 1.7320508075688772;
C2 = 0.4330127018922193;
C3 = -0.066987298107780702;
C4 = 0.51763809020504137;
C5 = 1.9318516525781364;
s1 = zeros(ceil(size(x)/2));
d1 = zeros(ceil(size(x)/2));
d2 = zeros(ceil(size(x)/2));
odd = x(2:2:end);
even = x(1:2:end-1);
d1:)) = odd - C1*even;
s1(1:end-1) = even(1:end-1) + C2*d1(1:end-1) + C3*d1(2:end);
s1(end) = even(end) + C2*d1(end) + C3*d1(1);
d2(1) = d1(1) + s1(end);
d2(2:end) = d1(2:end) + s1(1:end-1);
x(1:2:end-1) = C5*s1;
x(2:2:end) = C4*d2;
if (length(x) > 2)
x(1:2:end-1) = D4_Transform(x(1:2:end-1));
end


I wasn't comparing bananas against pears. Mathworks informed me they
were using my code to investigate why Matlab was showing such slow-
downs. I am not sure what they found out, eventially. I have also
wondered if immutability vs. mutability was the reason, as NumPy
generates temporary arrays. But I have not found a better explanation
either. Anyhow, ~30 seconds for Matlab vs. ~3 seconds for Python is a
major difference. (After I did this test, Matlab has acquired a JIT
compiler which might change the timings. I haven't tested as I don't
have access to it.)


Sturla Molden
 
S

Steven D'Aprano

Nope, preventing mutation of the objects themselves is not enough. You
also have to forbid variables from being rebound (pointed at another
object). Consider this simple example:

---------- Thread 1 ---------- | ---------- Thread 2 ----------
a = "Foo"
spawn Thread 2
a = "Bar" print "thread 2: %s" % a
print "thread 1: %s" % a

You could see (ignoring the fact the output isn't ordered):
"thread 1: Bar"
"thread 2: Foo"
or:
"thread 1: Bar"
"thread 2: Bar"

so the fact "Foo" and "Bar" are immutable isn't enough to solve the
problem.

This is a side-effect of writing code that relies on global variables.
Global variables are generally a bad idea. Global constants are fine.


The variables themselves, since they obey pointer semantics,

What do you mean by "variables"? Do you mean names?

What are pointer semantics?
must also be forbidden from being reseated (i.e., they must be
references in the C++ sense or become 'T const * const' pointers).

Assuming you mean names must be forbidden from being rebound, no,
incorrect. It's only names which are shared between both threads which
must not be re-bound. If you have names local to the thread, you can
change them all you want without affecting any other thread.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,020
Latest member
GenesisGai

Latest Threads

Top