The future of Python immutability

S

Steven D'Aprano

It consisted of something like this


Your code does a lot of unnecessary work if you're just trying to
demonstrate immutability is faster or slower than mutability. A simple
test that just adds one to each element would have much less overhead.

In any case, there's no doubt that immutable objects require extra time
to create compared to re-using an existing mutable object, and that time
is probably O(N) (until you approach the limits of available contiguous
memory, in which case you could see O(N**2) or worse behaviour). But an
order of magnitude difference?

I wasn't comparing bananas against pears. Mathworks informed me they
were using my code to investigate why Matlab was showing such slow-
downs. I am not sure what they found out, eventially. I have also
wondered if immutability vs. mutability was the reason, as NumPy
generates temporary arrays. But I have not found a better explanation
either. Anyhow, ~30 seconds for Matlab vs. ~3 seconds for Python is a
major difference.

How does Matlab speed compare to Python in general? Ruby, for example, is
an order of magnitude slower than Python (at least it was last time I
looked), not because of immutable arrays, but just because of the speed
of the language.
 
S

sturlamolden

Your code does a lot of unnecessary work if you're just trying to
demonstrate immutability is faster or slower than mutability.

No ... I was trying to compute D4 wavelet transforms. I wanted to see
how NumPy compared with Matlab.

How does Matlab speed compare to Python in general?

Matlab is an example of a language that only has immutable types.
While it works well if you only play with small arrays, it is horrible
for complex data structures.

Also consider that appending to an array in Matlab always becomes O
(n**2), as the array cannot mutate like Python lists. Thus it is
impossible to amortize to linear complexity.
 
S

sturlamolden

How do you know?

After more than 10 years experience with scientific programming I just
do. When it comes to numerics I have a gut feeling for what is fast
and what is slow.

It's not difficult actually. You just have to imagine how often the
interpreter is invoked. Explicit loops are therefore evilness in
numerical scripts. Vectorized array expressions tend to be comparable
to C in performance, because they do a lot of work independent of the
interpreter. What do we see here? Vectorized array expressions
throughout the code, no significant amount of looping by the
interpreter (just the recursion at the end).
 
A

Adam Skutt

This is a side-effect of writing code that relies on global variables.
Global variables are generally a bad idea. Global constants are fine.

Nope, the variables don't have to be global to have this problem, they
just have to be shared:
7

Passing any lambda between threads will cause the problem I described
above, regardless of whether the variables captured are global or
not.
What do you mean by "variables"? Do you mean names?
In the case of python I mean the name and the value, since all
variables in Python are pointers. (Worrying about the difference
though, is semantics)
What are pointer semantics?
Assignment to the variable causes it to point to another object (as
opposed to assigning a new value to the current object, like a C++
reference) and copying the variable does not create a copy of the
referred object (which necessarily implies their lifecycles are
independent).
Assuming you mean names must be forbidden from being rebound, no,
incorrect. It's only names which are shared between both threads which
must not be re-bound. If you have names local to the thread, you can
change them all you want without affecting any other thread.
What does it matter, seeing as Python lacks the ability altogether?

Thanks,
Adam
 
T

Terry Reedy

Adam Skutt wrote:
\
Nope, the variables don't have to be global to have this problem, they
just have to be shared:

This is a pointless replacement for 'def b(x): return x+a'
7

Passing any lambda

Python does not have lambda objects. It has lambda expressions that
produce function objects identical except for .__name__ to the
equivalent def statement output.

tjr
 
A

Adam Skutt

This is a pointless replacement for 'def b(x): return x+a'

And? That has nothing to do with anything I was saying whatsoever.
Point is: any mutable shared state is bad, and making objects
immutable isn't enough to remove all shared state, or even reduce it
to be available only with TEH EBIL global variables.
Python does not have lambda objects. It has lambda expressions that
produce function objects identical except for .__name__ to the
equivalent def statement output.
Sure sounds like python has lambda objects to me then... the fact
they're a special case of some more general construct is mostly
semantics, /especially/ in the context of the point I was actually
making, no?
 
S

Steven D'Aprano

Sure sounds like python has lambda objects to me then... the fact
they're a special case of some more general construct is mostly
semantics, /especially/ in the context of the point I was actually
making, no?

No. Lambdas are a *syntactical construct*, not an object. You wouldn't
talk about "while objects" and "if objects" and "comment objects"
*because they're not objects*. Neither are lambdas -- they're syntax,
which creates ordinary functions:
.... return x
....True


Functions created with def and functions created with lambda are
*precisely* the same type of object. There is no such thing as a "lambda
object" which is a "special case" of ordinary functions, there are just
functions.
 
S

Steven D'Aprano

Nope, the variables don't have to be global to have this problem, they
just have to be shared:

Yes, you're right, my bad. Globals are shared, but not all shared
variables are global.

In the case of python I mean the name and the value, since all variables
in Python are pointers.

Let me see if I understand you...

You say that by "variable" you mean the name and the value.

Then you say that all variables are pointers.

In other words... "all names and values in Python are pointers".

Either you're not explaining yourself correctly, or you're badly confused
about Python. In Python, names are keys in namespaces, and values are
objects. Possibly some Python implementations use pointers in some way to
implement namespaces, or objects, but Python the language doesn't have
pointers, and the Python virtual machine that executes Python byte code
doesn't have pointers either.


(Worrying about the difference though, is semantics)

Semantics are important. If we can't agree on what things mean, how can
we possibly communicate?


Assignment to the variable causes it to point to another object (as
opposed to assigning a new value to the current object, like a C++
reference) and copying the variable does not create a copy of the
referred object (which necessarily implies their lifecycles are
independent).

I can *guess* what you mean by all that, but it's just a guess. You say
"assignment to the variable", but earlier you said that to you, variables
are names and values, so I'm left wondering if you mean assignment to the
name, assignment to the value, or both. Likewise for copying the variable.

Here is my guess... you're pointing out the similarities between Python
name binding and C pointers:

* when you bind a name to a new object, you cause the name be associated
with the new object rather than mutating the existing object to become
equal to the new object;

* assigning two names to a single object causes both names to be
associated with the same object, rather than each name being associated
with independent copies of the object;

while ignoring the differences between Python name binding and C pointers:

* Python names are keys, not numeric addresses, hence you can't perform
anything like pointer arithmetic on them;

* objects in Python have no idea what, if any, names are associated with
them, unlike C pointers, where you can always ask for the address of a
variable.

Okay, I accept that if you focus only on the similarities and completely
ignore the differences, Python name binding is just like pointer
semantics.


What does it matter, seeing as Python lacks the ability altogether?

I don't understand what you mean. Python lacks the ability to do what
altogether?

If you mean that Python threads lack the ability to have local variables,
that's not true at all.
 
T

Terry Reedy

Adam said:
And? That has nothing to do with anything I was saying whatsoever.

Agreed. However, posts are read by newbies.
Posts that promote bad habits are fair game for comment.
Sure sounds like python has lambda objects to me then...

The idea that Python has 'lambda objects' had caused no end of mischief
over the years.

tjr
 
J

John Nagle

Right. What's needed for safe concurrency without global locking looks
something like this:

Object categories:
Immutable
Mutable
Synchronized
Unsynchronized
Owned by a thread.
Owned by synchronized object

"Synchronized" objects would be created with something like

class foo(synchronized) :
pass

Only one thread can be active within a synchronized object, as in Java.
So there's implicit locking at entry, unlocking at exit, and temporary
unlocking when the thread is blocked on a lock.
External access to non-function members of synchronized objects has to be
prohibited, since that would allow race conditions.

Everything else can be handled implicitly, without declarations or
annotation.

Here's the big problem:

class foo(synchronized) :
def __init__(self) :
self.items = []
def putitem(self,item) :
self.items.append(item) # adds item to object's list
def getitem(self,item) :
return(self.items.pop()) # removes item

def test()
words = ["hello","there"] # a mutable object
sobj = foo() # a synchronized object
sobj.putitem(words) # add to object
words[0] = "goodbye" # ERROR - no longer can access


The concept here is that objects have an "owner", which is either
a thread or some synchronized object. Locking is at the "owner"
level. This is simple until "ownership" needs to be transferred.
Can this be made to work in a Pythonic way, without explicit
syntax?

What we want to happen, somehow, is to
transfer the ownership of "words" from the calling thread to the object
in "putitem", and transfer it to the calling thread in "getitem".
How can this be done?

If ownership by a synchronized object has "priority" over ownership
by a thread, it works. When "putitem" above does the "append",
the instance of "foo" becomes the owner of "item". In general,
when a reference to an object is created, and the reference
is from an object owned by a synchronized object, the object
being referenced has to undergo an ownership change.

Once "putitem" has returned, after the ownership change,
it is an error (a "sharing violation?") for the calling thread to
access the object. That seems weird, but the alternative is
some explicit way of swapping ownership.

What's wrong with this? It takes two reference counts and a pointer
for each mutable object, which is a memory cost. Worse, when a
collection of unsynchronized mutable objects is passed in this manner,
all the elements in the collection have to undergo an ownership change.
That's expensive. (If you're passing trees around, it's better if the
tree root is synchronized. Then the tree root owns the tree, and
the tree can be passed around or even shared, with locking controlled
by the root.)

A compiler smart enough to notice when a variable goes "dead" can
optimize out some of the checking.

None of this affects single-thread programs at all. This is purely
needed for safe, efficient concurrency.

It's kind of a pain, but if servers are going to have tens or hundreds
of CPUs in future, we're going to have to get serious about concurrency.

John Nagle
 
A

Adam Skutt

On Sep 5, 7:38 pm, Steven D'Aprano <st...@REMOVE-
No. Lambdas are a *syntactical construct*, not an object. You wouldn't
talk about "while objects" and "if objects" and "comment objects"
*because they're not objects*.
This rhetoric precludes functions objects as well and is entirely non-
compelling.
Functions created with def and functions created with lambda are
*precisely* the same type of object.
Which means you have lambda objects. The fact they're same as any
other function is irrelevant and not especially interesting.
There is no such thing as a "lambda
object" which is a "special case" of ordinary functions, there are just
functions.
Hey, I was just trying to resolve tjr's view, he seemed to think
that .__name__ is different is pretty important, and he's the one you
should take your objections up with, not me.
 
A

Adam Skutt

Agreed.  However, posts are read by newbies.
Posts that promote bad habits are fair game for comment.
There's nothing inappropriate about using a lambda for a function I
don't care to give a name. That's the entire reason they exist.
The idea that Python has 'lambda objects' had caused no end of mischief
over the years.
As near as I can tell, this is because you're insisting on creating a
semantic distinction where there just isn't one.

Adam
 
B

Bearophile

John Nagle:
The concept here is that objects have an "owner", which is either
a thread or some synchronized object.   Locking is at the "owner"
level.  This is simple until "ownership" needs to be transferred.
Can this be made to work in a Pythonic way, without explicit
syntax?

What we want to happen, somehow, is to
transfer the ownership of "words" from the calling thread to the object
in "putitem", and transfer it to the calling thread in "getitem".
How can this be done?

There are several people that have found ways to implement such owning
of variables, for example Bartosz Milewski:

http://bartoszmilewski.wordpress.com/2009/06/02/race-free-multithreading-ownership/

It requires some extra complexities, so statically typed languages
usually don't implement this idea, even if it avoids bugs in user
code.
Implementing it in Python in a simple enough way looks like a good
topic for advanced research :)

Bye,
bearophile
 
T

Terry Reedy

Adam said:
There's nothing inappropriate about using a lambda for a function I
don't care to give a name. That's the entire reason they exist.

But you did give a name -- 'b' -- and that is when a lambda expression
is inappropriate and when a def statement should be used instead
As near as I can tell, this is because you're insisting on creating a
semantic distinction where there just isn't one.

To the contrary, I am objecting to the spurious distinction 'lambda
object' as people often use it.

tjr
 
J

John Nagle

Bearophile said:
John Nagle:

There are several people that have found ways to implement such owning
of variables, for example Bartosz Milewski:

http://bartoszmilewski.wordpress.com/2009/06/02/race-free-multithreading-ownership/

It requires some extra complexities, so statically typed languages
usually don't implement this idea, even if it avoids bugs in user
code.
Implementing it in Python in a simple enough way looks like a good
topic for advanced research :)

I hadn't seen that implementation for D. That's an interesting idea.
I'm certainly not the first to go down this road. But nobody seems to have
thought hard about this for Python yet. I looked at this years ago for C++,
but you get bogged down in the same problems that keep C++ "smart pointer"
implementations from being airtight.

In a static language like D, it takes a lot of declarations to make this
go. In Python, more of the ownership decisions have to be implicit, to keep
to a declaration-free Pythonic style.

That article has a policy that "every object has one owner, for life".
That allows compile-time checking, but it's very limiting. You can't,
for example, pass an array into a member function of a synchronized object
as a parameter and have the object keep a reference to it. (That's what
"Queue" does. for example.) Milewski would have you make a copy in
that situation. I felt that was too restrictive, but if you make
that restriction, it cuts down on overhead and simplifies the implementation.
The toughest part of this approach is doing a handoff from one owner to another.

Python has the advantage that a sizable fraction of its objects, especially
the very common ones like numbers and strings, are immutable. Immutable objects
can be shared without problems (other than storage management, but there are
ways to handle that.) So the programmer doesn't have to obsess over "who owns
a string" when a string parameter is passed to a function. So at least we
don't have to sweat the ownership issue for little stuff.

It's worth looking at this for Python because CPython's current thread model
behaves terribly on multiprocessors. We really do need something better.

John Nagle
 
S

Steven D'Aprano

On Sep 5, 7:38 pm, Steven D'Aprano <st...@REMOVE-
This rhetoric precludes functions objects as well and is entirely non-
compelling.

Functions ARE objects in Python. They even inherit from object:
.... return None
....True


Just because there is syntax for creating functions (at least two
different syntax forms actually) doesn't "preclude function objects".
There is syntax for dicts, and dict objects; syntax for lists, and list
objects; syntax for strings, and string objects. But there's syntax for
while loops, and no such thing as a while object.

Lambda expressions are syntax for creating function objects. That's all
there is to it, end of story.

Which means you have lambda objects. The fact they're same as any other
function is irrelevant and not especially interesting.

They're *function* objects, not "lambda" objects:
<type 'function'>



Hey, I was just trying to resolve tjr's view, he seemed to think that
.__name__ is different is pretty important, and he's the one you should
take your objections up with, not me.

Nice try but no. It was YOU, not Terry, claiming that lambda's are a
special kind of object different from ordinary functions.
 
S

Steven D'Aprano

Adam Skutt wrote:

But you did give a name -- 'b' -- and that is when a lambda expression
is inappropriate and when a def statement should be used instead


I think that's too strong a claim. Functions are first class objects, and
there no reason why you can't do this:

def f():
return None

g = f

So what's wrong with doing this?

g = lambda: None


To the contrary, I am objecting to the spurious distinction 'lambda
object' as people often use it.


Agreed.
 
D

Dennis Lee Bieber

Python has the advantage that a sizable fraction of its objects, especially
the very common ones like numbers and strings, are immutable. Immutable objects

We must have different ideas of "sizable"

Numbers, strings, and tuples are immutable...
Lists, dictionaries, and pretty much anything else (functions, class
instances, etc.) are mutable in one way or another... I'd say the
mutables are in the majority <G>
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,055
Latest member
SlimSparkKetoACVReview

Latest Threads

Top