Joe said:
Thanks, I'll do the same.
That's good to hear. Your arguments are sometimes pretty good, and
usually well made, but there's been far too much insistence on all sides
about being right and not enough on reaching agreement about how
Python's well-defined semantics for assignment and function calling
should best be described.
In other words, it's a classic communication problem.
Um, no, I've admitted that it's a reference all along. Indeed, that's
pretty much the whole point: that variables in Python don't contain
objects, but merely contain references to objects that are actually
stored somewhere else (i.e. on the heap). This is explicitly stated
in the Python docs [1], yet many here seem to want to deny it.
You refer to docs about the *implementation* of Python in C. This is
irrelevant.
It's supportive. I don't understand how/why anybody would deny that
Python names are references -- it's all over the place, from any
discussion of "reference counting" (necessary to understand the life
cycle of Python objects) to understanding the basics of what "a = b"
does. It seems absurd to argue that Python does NOT use references. So
the official documentation calmly discussing Python references, with no
caveats about it being internal implementation detail, seemed relevant.
I must say I find it strange when people try to contradict my assertion
that Python names are references to objects, when the (no pun intended)
reference implementation of the language uses "reference counting" to
track how many assignments have been made.
Though there is an equally vociferous faction who will happily jump up
and down all day shouting "objects don't have names", a tendency I have
myself been known to indulge from time to time (but usually only when
some novitiate asks how they can find out "what the name of an object
is"). Being of the old school, I do tend to think of Python names as
being reference variables in the sense of Algol 68. Thus they are
fixed-size and frequently of limited lifetime. Since assignment (whether
by name binding or to a container element) copies the reference, and
since strong references keep objects alive, this is one way to explain
why Python doesn't suffer from C++'s dangling pointer issue.
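A minimal sketch of that point, using made-up names (data, alias, box) purely for illustration. It shows that name binding and container assignment both copy the reference, and that any remaining strong reference keeps the object alive:

```python
data = ["spam"]          # the name 'data' holds a reference to a list
alias = data             # name binding copies the reference, not the list
box = {"item": data}     # so does assignment to a container element

assert alias is data and box["item"] is data

del data, alias          # drop both names
# The dict still holds a strong reference, so the list is alive and nothing dangles.
box["item"].append("eggs")
print(box["item"])       # prints ['spam', 'eggs']
```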
You say "names for", I say "references to". We're saying the same thing
(though I'm saying it with terminology that is more standard, at least
in the wider OOP world).
Naughty, naughty, there's that little "I'm right, you're wrong" thing
sneaking in again. I don't want to have to get the clue stick out here ...
"Variables do not contain anything" seems to be a little extreme here.
They must store information of some sort, or no Python program could
ever produce a useful output. And while the concept of "object
reference" may not exist in the language, it is definitely valid for
implementers.
Interestingly, while "variable" isn't an indexed term in the (2.6)
documentation, "reference count" appears in the glossary and the
Language Reference Manual (again, no pun intended) explicitly states in
its discussion of Python's data model (vis a vis the exact meaning of
immutability) that container objects contain references to other
objects. It shortly thereafter mentions the reference-counting technique
of the CPython implementation, but does not claim it as part of the
language.
The same section also mentions "reference to 'external' resources such
as files or windows ..." and "references to other objects".
Interestingly it is also made explicit that "for immutable types,
operations that compute new values may return a reference to any
existing object with the same type and value, while for mutable objects
this is not allowed" (and if any reader disagrees that the reasons for
this are obvious their part in this thread was long since over).
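A small sketch of the distinction, with illustrative names only. The mutable case is guaranteed by the language; the immutable case is deliberately left unasserted because an implementation is free to go either way:

```python
a = []
b = []
print(a is b)    # prints False: two list displays must yield two distinct objects

s = "spam"
t = "spam"
print(s == t)    # prints True: same type and value
# Whether s and t are the very same object is up to the implementation,
# so this result should never be relied upon.
same_object = s is t
```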
There's even a built-in type called a "weak reference".
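A minimal sketch, assuming the standard library's weakref module is what is meant here. Node is a hypothetical class introduced only because plain object() instances cannot be weakly referenced:

```python
import gc
import weakref

class Node:
    """Hypothetical class used only as something to point at."""

n = Node()
r = weakref.ref(n)   # a weak reference does not keep the object alive
assert r() is n      # calling it yields the object while a strong reference exists

del n                # remove the only strong reference
gc.collect()         # CPython frees the object at once; other implementations may need this
print(r())           # prints None once the object has been collected
```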
So any argument that the language "doesn't have the concept of object
reference (in the sense of e.g. C++ reference)" is simply stating the
obvious: that Python has no way to declare reference variables. I would
argue myself that it has no need of such a mechanism precisely because
names are object references, and I'd like to hear counter-arguments.
Consider my memory short -- I have a large dose of crotchety to go with
that if you'd like.
Both are relevant to answering simple questions, like what happens to x
in this case:
def foo(spam):
    spam = 5
foo(x)
This is a basic and fundamental thing that a programmer of a language
should know. If it's call-by-reference, then x becomes 5. If it's
call-by-value, it does not.
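The question can be settled by running it. A minimal sketch, assuming x starts out bound to 1 (the snippet above does not say what x is):

```python
def foo(spam):
    # Rebinds only the local name 'spam'; the caller's name is untouched.
    spam = 5

x = 1      # assumed starting value, not given in the snippet above
foo(x)
print(x)   # prints 1
```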
Well that's not true either. If I remember all the way back to my
computational science degree I seem to remember being taught that there
was call by *simple reference*, which is what I understand you to mean.
Suppose I write the following in some not-quite-Python language:
lst = ['one', 'two', 'three']
index = 1
def foo(item, i):
    i = 2
    item = "ouch"
foo(lst[index], index)
With call by simple reference, after the call I would expect the
following conditions to be true:
index == 2
lst == ['one', 'ouch', 'three']
With full call by reference, however, arguably the change to the value
of index would induce the post-conditions
index == 2
lst == ['one', 'two', 'ouch']
because the reference made by the first argument depends on the value of
a variable mutated inside the function call.
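Real Python cannot express either convention directly, so the following is only an emulation under stated assumptions: each "assignment to a parameter" is routed through a setter callback supplied by the caller, and env, slot, set_item and set_i are hypothetical names invented for the sketch. The last function shows what actual Python does with the same code.

```python
def foo(set_item, set_i):
    # Stand-in for the not-quite-Python foo: parameter assignments go through setters.
    set_i(2)
    set_item("ouch")

def simple_reference():
    env = {"index": 1, "lst": ["one", "two", "three"]}
    slot = env["index"]                   # lst[index] is resolved once, before the call

    def set_item(value):
        env["lst"][slot] = value

    def set_i(value):
        env["index"] = value

    foo(set_item, set_i)
    return env["index"], env["lst"]

def full_reference():
    env = {"index": 1, "lst": ["one", "two", "three"]}

    def set_item(value):
        env["lst"][env["index"]] = value  # lst[index] is re-resolved at assignment time

    def set_i(value):
        env["index"] = value

    foo(set_item, set_i)
    return env["index"], env["lst"]

def plain_python():
    lst = ["one", "two", "three"]
    index = 1

    def real_foo(item, i):
        i = 2                             # rebinds local names only
        item = "ouch"

    real_foo(lst[index], index)
    return index, lst

print(simple_reference())   # prints (2, ['one', 'ouch', 'three'])
print(full_reference())     # prints (2, ['one', 'two', 'ouch'])
print(plain_python())       # prints (1, ['one', 'two', 'three'])
```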
Why the resistance to these simple and basic terms that apply to any OOP
language?
Ideally I'd like to see this discussion concluded without resorting to
democratic appeals. Otherwise, after all, we should all eat shit: sixty
billion flies can't possibly be wrong.
What does "give a new name to an object" mean? I submit that it means
exactly the same thing as "assigns the name to refer to the object".
I normally internalize "x = 3" as meaning "store a reference to the
object 3 in the slot named x", and when I see "x" in an expression I
understand it to be a reference to some object, and that the value will
be used after dereferencing has taken place.
I've seen various descriptions of Python's name binding behavior in
terms of attaching Post-it notes bearing names to the objects referenced
by the names, and I have never found them convincing. The reason for
this is that names live in namespaces, whereas values live in some other
universe altogether (that I normally describe as "object space" to
beginners, though this is not a term you will come across in the Python
literature). So I see the Post-it as being attached to a portion of some
namespace, and that little fixed-size piece of object space being
attached by a piece of string to a specific object. Of course any object
can have many pieces of string attached, and not all of them come from
names -- some of them come from container elements, for example.
There certainly is no difference in behavior that anyone has been able
to point out between what assignment does in Python, and what assignment
does in RB, VB.NET, Java, or C++ (in the context of object pointers, of
course). If the behavior is the same, why should we make up our own
unique and different terminology for it?
One reason would be that in the other languages you have other choices
as well, so you need to distinguish between them. Python is simpler, and
so I don't see us needing the terminological complexity required in the
other contexts you name, for a start. Java messed up the whole deal by
having different kinds of objects as a sacrifice to run-time speed,
thereby breeding a whole generation of programmers with little clue
about these matters, and the .NET environment also has to resort to
"boxing" and "unboxing" from time to time. I say away with comparisons
to such horrendously complex issues. One of the reasons for Python's
continued march towards world domination (allow me my fantasies) is its
consistent simplicity. Those last two words would be my candidate for
the definition of "Pythonicity".
A reference to an object, got it.
Assigning the reference to the object, yes.
Nope, storing the reference "against" the name (more exactly, in the
memory area associated with name, though I can hear hackles rising
throughout Pythonland as I type those words).
Agreed; they're not aliases of the call arguments.
They are actually names local to the function namespace, containing
references to the arguments. Some of those arguments were provided as
names, in which case the local name contains a copy of the reference
bound to the name provided as an argument. This is, however, merely a
degenerate case of the general instance, in which an expression is
provided as an argument and evaluated, yielding (a reference to) an
object which is then bound to the parameter name in the local namespace.
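A short sketch of that description, with rebind, mutate and args as illustrative names. The parameter is a local name holding a copy of the reference, so rebinding it is invisible to the caller while mutating the shared object is not:

```python
def rebind(param):
    # 'param' is a local name; rebinding it cannot affect the caller.
    param = ["replaced"]
    return param

def mutate(param):
    # The local name refers to the caller's object, so mutation is visible outside.
    param.append("added")
    return param

args = ["original"]
assert rebind(args) is not args and args == ["original"]
assert mutate(args) is args and args == ["original", "added"]

# An expression argument is evaluated first; the resulting object is then
# bound to the parameter name in the function's local namespace.
result = mutate(args + ["temp"])
print(result)   # prints ['original', 'added', 'temp', 'added']
print(args)     # prints ['original', 'added']
```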
Well, I'm not sure why that would be. What you've just described is
called "pass by value" in every other language.
Sigh. This surely can only be true if you insist that references are
themselves values. I hold that they are not. It seems so transparent to
me that the parameters are copies of the references passed as arguments
that I find it difficult to understand how, or why, anyone would
conceptualize it differently.
That seems to contradict the actual behavior, as well as what you said
yourself above. The only way I know how to interpret "an object is
passed" is "the data of that object is copied onto the stack". But of
course that's not what happens. What actually happens is what you said
above: a name (reference) is assigned to the object. The name is a
reference; it is made to refer to the same thing that the argument
(actual parameter) referred to. This is exactly what "the reference is
passed" means, nothing more or less.
OK, so above you argue quite cogently that Python uses a
reference-passing mechanism. This makes your insistence in the preceding
paragraph on calling it "pass by value" a little stubborn.
In other words, make 'x' refer to this new object.
So far so good.
Yes. But what does that mean? Does the parameter within 'foo' become
an alias of x, or a copy of it? That's what we DO need to decide.
I think you mean here that local 'x' is made to refer to the object
passed to foo. Agreed. It is NOT an alias of the actual parameter.
And that's what we need to know. So it's not call-by-reference, it's
call-by-value; the value of x (a reference to whatever object Bar()
returned) is copied from the value of the parameter (a reference to that
same object, of course).
Sigh again. You appear to want to have your cake and eat it. You are, in
effect, saying "there are no values in Python, only references",
completely ignoring the fact that it is semantically impossible to have
a reference without having something to *refer to* (which we in the
Python world, in our usual sloppy way, often call "a value").
In Algol 68 terms what you are saying is "they are refs, not ref refs".
I suspect this may be at the root of our equally stubborn insistence
that calling this mechanism "pass by value" is inviting
misunderstanding. If we didn't want to eliminate misunderstanding we
would all have stopped replying to you long ago.
Make 'x' refer to the new Foo() result, yes.
Obvious only once you've determined that Python is call-by-value. If it
were like FORTRAN, where everything is call-by-reference, then that
wouldn't be the case at all; the assignment within the function would
affect the variable (or name, if you prefer) passed in.
Sorry, I'd need to be a Fortran expert to decipher that or make a
judgment on its validity, so your appeal to the masses goes out of the
window.
I have to disagree. The model is simple, self-consistent, and
consistent with all other languages. It's making up terms and trying to
justify them and get out of the logical knots (such as your claim above
that the object itself is passed to a method) that is unnecessarily
complex.
I have to disagree. The model is clearly based on a wrong-headed
interpretation of a fairly exact understanding of Python's semantics.
I'm afraid I don't see that.
Well, if by C/C++-like languages, you mean also Java, VB.NET, and so on,
then maybe you're right -- perhaps my view is colored by my experience
with those. But alternatively, perhaps there are enough Python users
without experience in other OOP languages, that the standard terminology
was unfamiliar to them, and they made up their own, resulting in the
current linguistic mess.
Well, I started with Simula and Smalltalk back in 1973, so my experience
may be a bit light. Sorry about that. This terminology wasn't made up by
Python beginners, but by the people who invented Python. I believe they
did so on the grounds that it's easier for beginners to understand
Python's semantics without having to refer to too many other environments
that are similar in theory but confusingly different in practice.
I would even argue that your confusion supports this argument. Your
understanding of Python is perfectly adequate, so get with the program
for Pete's sake!
regards
Steve