Objects in Python

Steven D'Aprano · Aug 25, 2012

Yet Python's variables are extremely close to C's pointer variables.

Not really. Pointer variables are no different from any other variable:
you have a named memory location that contains some data. In this case,
the data happens to be a link to another chunk of memory. The pointer
variable itself is just a named location containing data, same as a char
variable, a float variable, etc. The data is a pointer rather than a char
or float, and the operations which you can do to pointers are different
to those you can do to chars or floats, but that's true of any data type.

In languages without pointers, like Fortran 77, you can more or less
simulate them with a fixed array of memory as the heap, with integer
indexes into that array as pointers. These "pointer variables" are no
different from other "integer variables" except in the meaning you, the
programmer, give them. This is no different from how C or Pascal treat
pointers, except that those languages have syntactical support for
pointer operations and Fortran 77 doesn't.
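
To make the simulated-pointer idea concrete, here is a minimal sketch, written in Python rather than Fortran 77. The FixedHeap class and its method names are hypothetical, invented only for illustration. The point it tries to show is that a "pointer" here is an ordinary integer whose meaning comes entirely from how the programmer uses it.

```python
class FixedHeap:
    """A fixed-size array used as a heap; integer indexes act as pointers."""

    def __init__(self, size):
        self.cells = [0] * size
        self.next_free = 0  # simple bump allocator; nothing is ever freed

    def alloc(self, value):
        """Store value in the next free cell and return its index."""
        if self.next_free >= len(self.cells):
            raise MemoryError("simulated heap is full")
        address = self.next_free
        self.cells[address] = value
        self.next_free += 1
        return address


heap = FixedHeap(8)
p = heap.alloc(42)    # p is just an int
q = p                 # copying the "pointer" copies the index, not the data
heap.cells[q] = 99    # writing through q ...
print(heap.cells[p])  # ... is visible through p
print(type(p))
```

Nothing in the language distinguishes p from any other integer; only the convention of using it as an index into heap.cells makes it a pointer.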

If you allocate all your "real data" on the heap and do everything with
pointers, you'll have semantics very similar to Python's

You're confusing two different levels of explanation here. On the one
hand, you're talking about C semantics, where you are explicitly
responsible for managing unnamed data via indirection (pointers).
Typically, the *pointers* get given names, the data does not.

On the other hand, you talk about Python, where you have no access at all
to the pointers and memory addresses. You manage the data you actually
care about by giving them names, and then leave it up to the Python
virtual machine to transparently manage whatever indirection is needed to
make it work.

The fact that the end result is the same is hardly surprising -- Python's
VM is built on top of C pointer indirection, so of course you can start
with pointers and end up with Python semantics. But the practices of
coding are very different:

* in C, I care about identifiers ("names") in order to explicitly manage
addresses and pointers as a means to reach the data I actually care about;

* in Python, I care about identifiers in order to reach the data I
actually care about.

Chris Angelico · Aug 25, 2012

You're confusing two different levels of explanation here. On the one
hand, you're talking about C semantics, where you are explicitly
responsible for managing unnamed data via indirection (pointers).
Typically, the *pointers* get given names, the data does not.

On the other hand, you talk about Python, where you have no access at all
to the pointers and memory addresses. You manage the data you actually
care about by giving them names, and then leave it up to the Python
virtual machine to transparently manage whatever indirection is needed to
make it work.
...
* in C, I care about identifiers ("names") in order to explicitly manage
addresses and pointers as a means to reach the data I actually care about;

* in Python, I care about identifiers in order to reach the data I
actually care about.

Yet the two are almost the same. Python objects don't have names, they
just have their own data. (Leaving aside functions, which have their
names as data for the benefit of tracebacks and such.) A C pointer has
a name; a Python identifier has (or is, if you like) a name. They're
very different in how you use them only because C doesn't naturally
work with everything on the heap and pointers everywhere. In fact,
when I was interfacing Python and C, there were a few places where I
actually handed objects to Python and kept manipulating them, simply
because the Python data model suited what I was trying to do; but what
I was doing was using PyObject *some_object as though it were a Python
variable. I even did up a trivial C++ class that encapsulated the
INCREF/DECREF work, so my LocalPyObject really could be treated as a
local variable, Python-style.

Where's the difference?

ChrisA

Mark Lawrence · Aug 25, 2012

I'm just wondering out aloud if the number of times this type of thread
has been debated here will fit into a Python long or float?

Chris Angelico · Aug 25, 2012

I'm just wondering out aloud if the number of times this type of thread has
been debated here will fit into a Python long or float?

Well, when I have to store currency information, I like to store it as
an integer, using the native currency's "small unit" (eg the cent in
dollar+cent currencies). In this instance, instead of trying to count
the threads (which would be fractional), just count the number of
posts. It then is an integer, and I've yet to find any integer that
can't be represented as a Python long (or, in 3.x, int).

ChrisA

Mark Lawrence · Aug 25, 2012

Well, when I have to store currency information, I like to store it as
an integer, using the native currency's "small unit" (eg the cent in
dollar+cent currencies). In this instance, instead of trying to count
the threads (which would be fractional), just count the number of
posts. It then is an integer, and I've yet to find any integer that
can't be represented as a Python long (or, in 3.x, int).

ChrisA

That could have been fun in the good old days of pounds, shillings and
pence. Why they had to complicate things by going decimal I shall never
know. Bring back simplistic imperial measures for everything, that's
what I say.

Using long just shows I've still got a Python 2 hat on. Still when
those fine people who develop Matplotlib deliver 1.2 with its Py3k
compliance, aided or hindered by me testing on Windows, Python 3.3 here
I come.

I suppose an alternative to long (or int) or float would have been the
Decimal class from the decimal module? Opinions on this anybody?

Dennis Lee Bieber · Aug 25, 2012

I'm just wondering out aloud if the number of times this type of thread
has been debated here will fit into a Python long or float?

Well, since I don't think one can have a fractional debate (maybe if
someone starts a thread and NOBODY ever follows up on it), then floats
don't gain us anything there.

Presuming a double-precision float, we would have 14-15 significant
digits for the mantissa -- so anything greater than
(9)99,999,999,999,999 will have lost accuracy. In contrast Python longs
have effectively unlimited significant digits.
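
The precision limit can be checked directly. This is a small sketch, assuming the usual IEEE 754 double-precision float that CPython uses on common platforms; the variable names are only illustrative. It shows where consecutive integers stop being distinguishable as floats, while Python integers stay exact.

```python
import sys

print(sys.float_info.mant_dig)  # bits in the float significand
print(sys.float_info.dig)       # decimal digits a float always preserves

# Above 2**53, not every integer has its own float representation.
limit = 2 ** 53
same_float = float(limit) == float(limit + 1)
print(same_float)

# Python integers are arbitrary precision, so adding 1 is never lost.
big = 10 ** 30
exact = (big + 1) - big
print(exact)
```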

Chris Angelico · Aug 26, 2012

Well, since I don't think one can have a fractional debate (maybe if
someone starts a thread and NOBODY ever follows up on it), then floats
don't gain us anything there.

Presuming a double-precision float, we would have 14-15 significant
digits for the mantissa -- so anything greater than
(9)99,999,999,999,999 will have lost accuracy. In contrast Python longs
have effectively unlimited significant digits.

I wonder if some people are applying an alternative form of duck
typing - if it quacks like a "should Python have variables" debate, it
gets silenced with that universal grey tape...

ChrisA

Dennis Lee Bieber · Aug 26, 2012

I wonder if some people are applying an alternative form of duck
typing - if it quacks like a "should Python have variables" debate, it
gets silenced with that universal grey tape...

Which all too often is mis-used...

Expose it to a season of direct sunlight and varying temperatures and
it rapidly decomposes, leaving a messy residue...*

* I used to use it to hold a dual-band rubber-duck antenna mount to the
window of my apartment... Every few months I'd end up peeling off a
cracking plastic layer, and having to clean up dried loose-weave gauze,
and scraping off the gum -- in order to re-apply a fresh layer.

Evan Driscoll · Aug 26, 2012

The fact that the end result is the same is hardly surprising -- Python's
VM is built on top of C pointer indirection, so of course you can start
with pointers and end up with Python semantics. But the practices of
coding are very different:

* in C, I care about identifiers ("names") in order to explicitly manage
addresses and pointers as a means to reach the data I actually care about;

* in Python, I care about identifiers in order to reach the data I
actually care about.

So I find this comment very interesting. It makes me wonder if the root
cause of our (pretty minor) disagreement is in some sense related to our
mental models of *C* variables. I'm actually not much of a C programmer
specifically, but I do a lot of C++ stuff. Of those two descriptions,
I'd actually say that the Python description sounds more like how I
think about variables in C++ most of the time.

Obviously there are differences between value and reference semantics
between the two languages, but thinking about some variable being
located at some address in memory is something that I actually do pretty
rarely; I basically think of variables as naming data, and addresses
mostly come into play when thinking about points-to and aliasing
information at a more abstract level, much the same as I do in Python.

Evan

Evan Driscoll · Aug 26, 2012

No. The compiler remembers the address of 'a' by keeping notes about it
somewhere in memory during the compilation process. When you run the
compiled program, there is no longer any reference to the name 'a'.

...

The mapping of name:address is part of the *compilation* process -- the
compiler knows that variable 'x' corresponds to location 12345678, but
the compiled code has no concept of anything called 'x'. It only knows
about locations. The source code 'x = 42' is compiled into something like
'store 42 into location 12345678'. (Locations may be absolute or
relative.)

In languages with name bindings, the compiler doesn't need to track
name:address pairs. The compiled application knows about names, but not
about addresses. The source code 'x = 42' is compiled into something like
'store 42 into the namespace using key "x"'.

What you describe is sorta correct, but it's also not... you're
describing implementations rather than the language. And while the
language semantics certainly impose restrictions on the implementation,
I think in this case the situation is closer than you acknowledge:

From the Python side, I suspect that for most functions, you'd be able
to create a Python implementation that behaves more like C, and
allocates locals in a more traditional fashion. I don't know much about
it, but I'd guess that PyPy already does something along this line;
someone also mentioned that Cython (admittedly not a full-blown Python
implementation, but close for the purpose of this question) tries to do
the same thing.

On the C side, imagine a function with locals x, y, and z which never
takes the address of any of them. (You said later that "Just because the
public interface of the language doesn't give you any way to view the
fixed locations of variables, doesn't mean that variables cease to have
fixed locations.")

First, C variables may not even have a memory address. They can
disappear completely during compilation, or live in a register for their
entire life.

Second, it's possible that those variables *don't* occupy a fixed
location. If you never explicitly take an address of a variable (&x),
then I can't think of any way that the address can be observed without
invoking undefined behavior -- and this means the C compiler is free to
transform it to anything that is equivalent under the C semantics. In
particular, it can split uses of a variable into multiple ones if there
are disjoint live ranges. For instance, in:
    x = 5
    print x
    x = 10
    print x
there are two live ranges of x, one consisting of lines 1 and 2, and one
consisting of lines 3 and 4. These live ranges could have been different
variables; I could just as easily have written
    x = 5
    print x
    y = 10
    print y
and these pieces of code are observationally equivalent, so the compiler
is allowed to generate the same code for both. In particular, it could
either compile the second example to share the same memory address for x
and y (meaning that a memory address isn't uniquely named by a single
variable) or it could compile the first to put the two live ranges of x
into different memory addresses (meaning that a variable doesn't
uniquely name a memory address). In fact, I'd *expect* an optimizing
compiler to share memory for x and y, and I'd also expect to be able to
concoct an example where different live ranges of one variable wind up
at different addresses. (The latter I'm less sure of though, and I also
expect it'd be a little hard, as you'd have to come up with an example
where even at the high optimization levels you'd need to see that, both
live ranges would wind up in memory.)

Third, and more wackily, you could technically create a C implementation
that works like Python, where it stores variables (whose addresses
aren't taken) in a dict keyed by name, and generates code that on a
variable access looks up the value by accessing that dict using the name
of the variable.
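
As a rough illustration of that third idea, here is a toy sketch in Python of variable storage keyed by name. The NameKeyedFrame class is hypothetical and says nothing about how any real C implementation works; it only shows that "store" and "load" can be expressed as dictionary operations on the variable's name, with no address involved.

```python
class NameKeyedFrame:
    """Toy variable store: every access is a dict lookup by name."""

    def __init__(self):
        self._slots = {}

    def store(self, name, value):
        # Roughly "x = value": bind the name to the value.
        self._slots[name] = value

    def load(self, name):
        # Roughly reading "x": look the name up at run time.
        try:
            return self._slots[name]
        except KeyError:
            raise NameError(f"variable {name!r} is not defined") from None


frame = NameKeyedFrame()
frame.store("x", 5)
frame.store("y", frame.load("x") + 10)
result = frame.load("y")
print(result)
```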

Evan

88888 Dihedral · Aug 26, 2012

On Friday, August 24, 2012 at 2:02:00 AM UTC+8, Jan Kuiken wrote:

Sometimes you don't want only six variables called 'q' but a hundred
of them

def fac(q):
    if q < 1:
        return 1
    else:
        return q * fac(q-1)

print(fac(100))

Jan Kuiken

Long integer arithmetic is built in.
This lets mathematicians and designers focus on the theory side.

Chris Angelico · Aug 26, 2012

Third, and more wackily, you could technically create a C implementation
that works like Python, where it stores variables (whose addresses aren't
taken) in a dict keyed by name, and generates code that on a variable access
looks up the value by accessing that dict using the name of the variable.

That would be a reasonable way to build a C interactive interpreter.

ChrisA

Steven D'Aprano · Aug 26, 2012

That would be a reasonable way to build a C interactive interpreter.

No it wouldn't. Without fixed addresses, the language wouldn't be able to
implement pointers. C without pointers isn't C, it is something else.
Possibly called Python.

I suppose you could get pointers in Namespace-C if you somehow mapped
names to addresses, and vice versa, but why would you do that? You end up
with a hybrid system that doesn't give you any advantage over C but has a
much more complicated implementation (and therefore many more new and
exciting bugs).

But if you want me to agree that you could implement C using name
binding, plus some weird scheme to track memory addresses, then yes, I
suppose you could. Then the parts of C that don't rely on fixed memory
addresses could use the name bindings (with the corresponding loss of
performance), and the parts of C which do require them could continue to
do so, and we'll have one more language with a confusing, unclear and
unclean execution model. Everybody wins! For some definition of win.

Chris Angelico · Aug 26, 2012

No it wouldn't. Without fixed addresses, the language wouldn't be able to
implement pointers. C without pointers isn't C, it is something else.
Possibly called Python.

The insertion of a single rule will do it. Let it stand that &x is the
string "x" and there you are, out of your difficulty at once!

Okay, that may be a bit of a fairy tale ending and completely illogical.

ChrisA

Roy Smith · Aug 26, 2012

Chris Angelico said:
That would be a reasonable way to build a C interactive interpreter.

Except that lots of C and C++ programs assume they know how data
structures are laid out and can index forward and backward over them in
ways which the language does not promise work (but are, none the less,
useful). Say, the sort of things you might use Python's struct module
for.

On the other hand, there is certainly a big subset of C that you could
implement that way. But it would only be useful as a simple
instructional tool.

Steven D'Aprano · Aug 26, 2012

What you describe is sorta correct, but it's also not... you're
describing implementations rather than the language. And while the
language semantics certainly impose restrictions on the implementation,

I accept that languages may choose to leave the variable-model
unspecified. I don't think they can define behaviour without implying one
model or the other. Or at least not easily - far too much language
behaviour is tied to the implementation to let us say "it's only
implementation".

For example, the reason that locals() is not writable inside Python
functions is because CPython moves away from the name binding model
inside functions as an optimization. This function prints 1 under both
CPython and Jython (but not IronPython):

def spam():
    x = 1
    locals()['x'] = 2
    print(x)

Binding to the local namespace does not work, because functions don't
*actually* use a namespace, they use something closer to the C model. So
the two models are not interchangeable and hence they aren't *just*
implementation details, they actually do affect the semantics of the
language.
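
The contrast between function scope and module scope can be seen side by side. This is a minimal sketch for CPython 3; the function name write_via_locals and the namespace dict are made up for illustration, and exec() with an explicit dict stands in for running code at module level.

```python
def write_via_locals():
    x = 1
    locals()['x'] = 2   # updates a dict, not the function's real local
    return x


in_function = write_via_locals()
print(in_function)

# Module-level code really does use a dict as its namespace, so writing
# to that dict rebinds the name.
namespace = {}
exec("x = 1\nglobals()['x'] = 2\nresult = x", namespace)
at_module_level = namespace["result"]
print(at_module_level)
```

The same assignment through a dictionary is ignored inside the function but takes effect at module level.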

I suppose you could arrange for locals() to return a proxy dictionary
which knew about the locations of variables. But what happens if you
returned that proxy to the caller, which then assigned to it later after
the function variables no longer existed?

Similarly, there are operations which are no longer allowed simply
because of the difference between name binding and locational variables:

py> def ham():
...     from math import *
...
  File "<stdin>", line 1
SyntaxError: import * only allowed at module level

(In some older versions of Python, wildcard imports are allowed, and the
function then falls back on a namespace instead of fixed locations. That
is no longer the case in Python 3.2 at least.)
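
The restriction can also be checked without an interactive session. The sketch below, assuming Python 3, compiles the offending source with the built-in compile() and catches the error; the variable names are only illustrative.

```python
source = "def ham():\n    from math import *\n"

try:
    compile(source, "<demo>", "exec")
except SyntaxError as exc:
    message = exc.msg      # rejected at compile time, before anything runs
else:
    message = None
print(message)

# At module level the wildcard import is allowed and fills the namespace.
module_namespace = {}
exec("from math import *", module_namespace)
print("sqrt" in module_namespace)
```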

I think in this case the situation is closer than you acknowledge:

From the Python side, I suspect that for most functions, you'd be able
to create a Python implementation that behaves more like C, and
allocates locals in a more traditional fashion.

As I discuss above, CPython and Jython actually do something like that
inside functions. And there are observable differences in behaviour (not
just performance) between function scope and global scope.

So an implementation of Python which used fixed memory addresses
everywhere, not just in functions, would be detectably different in
behaviour than CPython. Whether those differences would be enough to
disqualify it from being called "Python" is a matter of opinion.

(Probably Guido's opinion is the only one that matters.)

[...]

On the C side, imagine a function with locals x, y, and z which never
takes the address of any of them. (You said later that "Just because the
public interface of the language doesn't give you any way to view the
fixed locations of variables, doesn't mean that variables cease to have
fixed locations.")

First, C variables may not even have a memory address. They can
disappear completely during compilation, or live in a register for their
entire life.

Variables that don't exist at runtime don't have an address at all -- in
a way, they aren't even a variable any more. They have a name in the
source code, but that's all.

As for registers, they are memory addresses, of a sort. (I didn't mean to
imply that they must live in main motherboard memory.) I call any of
these an address:

- in the heap at address 12345678
- in the GPU's video memory at address 45678
- 12th entry from the top of the stack
- register 4

Second, it's possible that those variables *don't* occupy a fixed
location. If you never explicitly take an address of a variable (&x),
then I can't think of any way that the address can be observed without
invoking undefined behavior -- and this means the C compiler is free to
transform it to anything that is equivalent under the C semantics.

I may have been too emphatic about the "fixed" part. A sufficiently
clever compiler may implement its own memory manager (on top of the
operating system's memory manager?) and relocate variables during their
lifetime. But for my purposes, the important factor is that the compiler
knows the address at every moment, even if that address changes from time
to time.

In contrast, a name binding system *doesn't* know the address of a
variable. The analogy I like is making a delivery to a hotel room. C-like
languages say:

"Deliver this package to room 1234."

Pointer semantics are like:

"Go to room 1234 and collect an envelope; deliver this package to the
room number inside the envelope."

On the other hand, name binding languages say:

"Go to the concierge at the front desk and ask for Mr Smith's room, wait
until he looks it up in the register, then deliver this package to the
room number he tells you."

Typically, you don't even have any way to store the room number for later
use. In Python, name lookups involve calculating a hash and searching a
dict. Once you've looked up a name once, there is no way to access the
hash table index to bypass that process for future lookups.

It gets worse: Python has multiple namespaces that are searched.

"Go to the Excelsior Hotel and ask the concierge for Mr Smith. If Mr
Smith isn't staying there, go across the road to the Windsor Hotel and
ask there. If he's not there, try the Waldorf Astoria, and if he's not
there, try the Hyperion."

Considering just how much work Python has to do to simply access a named
variable, it's amazing how slow it isn't.
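
One lookup chain that is resolved at run time in CPython is module globals followed by builtins. The following is a small sketch of that fallback, using exec() with an explicit dict as a stand-in for a module; the names namespace, first and second are only illustrative.

```python
namespace = {}

# 'len' is not in the namespace, so the lookup falls through to builtins.
exec("first = len('abc')", namespace)

# Shadow the builtin by binding the same name in the "module" namespace.
namespace["len"] = lambda obj: 42
exec("second = len('abc')", namespace)

first, second = namespace["first"], namespace["second"]
print(first, second)
```

The source text len('abc') is identical in both calls; which object it reaches depends on what the namespaces contain when the code runs.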

Chris Angelico · Aug 26, 2012

It gets worse: Python has multiple namespaces that are searched.

"Go to the Excelsior Hotel and ask the concierge for Mr Smith. If Mr
Smith isn't staying there, go across the road to the Windsor Hotel and
ask there. If he's not there, try the Waldorf Astoria, and if he's not
there, try the Hyperion."

Does it? I thought the difference between function-scope and
module-scope was compiled in, and everything else boils down to one of
those. Explicit dot notation is different ("ask for Mr Smith, then ask
him where his packages box is, and put this in the box").

Hmm, okay, there's something slightly different with closures. But
it's still unambiguous at compile time.
        return lambda z: x+y+z

  2           0 LOAD_GLOBAL              0 (x)
              3 LOAD_DEREF               0 (y)
              6 BINARY_ADD
              7 LOAD_FAST                0 (z)
             10 BINARY_ADD
             11 RETURN_VALUE
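
A listing like the one above can be reproduced with the dis module. The enclosing function is not shown in the post, so make_adder below is a hypothetical reconstruction: it assumes y is a local of the enclosing function and x is never assigned there. Exact opcode names and offsets vary between CPython versions, so the sketch also prints the code object's name tables, which record the same compile-time classification.

```python
import dis


def make_adder():
    y = 2
    # x is never assigned in either function, so it compiles as a global.
    return lambda z: x + y + z


fn = make_adder()
code = fn.__code__

print(code.co_names)     # names looked up as globals (then builtins)
print(code.co_freevars)  # names captured from the enclosing function
print(code.co_varnames)  # the lambda's own locals

dis.dis(fn)              # full disassembly for the running interpreter
```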

What multiple namespaces are you talking about, where things have to
get looked up at run time?

ChrisA

Roy Smith · Aug 26, 2012

Just to pick a nit, the compiler probably doesn't know that, but the
linker does (or maybe even the run-time loader). However, we can think
of all of those as just part of the compilation tool chain, and then
we're good.

Mark Lawrence · Aug 26, 2012

Okay, that may be a bit of a fairy tale ending and completely illogical.

ChrisA

Then stick to the thread about flexible string representation, unicode
and typography

Chris Angelico · Aug 26, 2012

Then stick to the thread about flexible string representation, unicode and
typography

Hehe. Probably nobody on this list will recognize what I said, but
it's a near-quote from "Iolanthe", an opera about fairies. It's the
great denouement, the solution to everyone's problems. And in the same
way, redefining the "take-address-of" operator could be a perfect
solution... and, just like in Iolanthe, is a rather fundamental
change, and one that would break a lot of things.

ChrisA
