Class-level variables - a scoping issue

John Nagle

Here's an obscure bit of Python semantics which
is close to being a bug:

>>> class t(object) :
...     classvar = 1
...
...     def fn1(self) :
...         print("fn1: classvar = %d" % (self.classvar,))
...         self.classvar = 2
...         print("fn1: classvar = %d" % (self.classvar,))
...
...
>>> t1 = t()
>>> t2 = t()
>>> t1.fn1()
fn1: classvar = 1
fn1: classvar = 2
>>> t1.fn1()
fn1: classvar = 2
fn1: classvar = 2
>>> t2.fn1()
fn1: classvar = 1
fn1: classvar = 2

Notice what happened here. Within "fn1", the first
reference to "self.classvar" references the class-level
version of "classvar". The assignment overrides that
and creates an object-level instance of "self.classvar".
Further references to "self.classvar" in fn1 then reference the
object-level "classvar".

Creating another instance of t makes it clear that the
class-level variable never changes. To change it, it
has to be referenced as "t.classvar".

Python protects global variables from similar confusion
by making them read-only when referenced from an inner scope
without a "global" statement. But that protection isn't
applied to class-level variables referenced through 'self'.
Perhaps it should be.

John Nagle
 
Steven D'Aprano

Here's an obscure bit of Python semantics which is close to being a bug:

"Obscure"? It's possibly the most fundamental aspect of Python's object
model. Setting instance.attr assigns to the instance attribute, creating
it if it doesn't exist. Getting instance.attr retrieves the instance
attribute, or the class attribute, or a superclass attribute, whichever
is found first.
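
A minimal sketch of that lookup order, for anyone who wants to watch it happen. The names `Base`, `Derived` and `colour` are made up for illustration and do not come from the thread:

```python
class Base(object):
    colour = "red"          # stored in Base.__dict__


class Derived(Base):
    pass                    # defines nothing of its own


d = Derived()

# Reading: nothing on the instance or on Derived, so Base.colour is found.
found_before = d.colour
instance_dict_before = dict(vars(d))

# Writing: always binds in the instance's own namespace.
d.colour = "blue"
found_after = d.colour
instance_dict_after = dict(vars(d))

print(found_before, instance_dict_before)   # red {}
print(found_after, instance_dict_after)     # blue {'colour': 'blue'}
print(Base.colour)                          # red
```

`vars(d)` is the instance's own namespace, so it shows directly that the class attribute was shadowed rather than changed.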

As for it being a bug, or close to being a bug, can you tell me which
specified behaviour it fails to match?


[...]
Notice what happened here. Within "fn1", the first
reference to "self.classvar" references the class-level version of
"classvar". The assignment overrides that and creates an object-level
instance of "self.classvar". Further references to "self.classvar" in f1
then reference the object-level "classvar"

I'm sorry, I don't quite understand what you mean by "object-level".
They're *all* objects. The ints 1 and 2 are objects. The instances t1 and
t2 are objects. The class t is an object. The global namespace is an
object. Built-ins are objects. Which of these plethora of objects do you
mean by "object-level"?

I'm going to take a guess that you probably mean to distinguish between
class attributes and instance attributes. From the context, that seems
likely. Am I right?


Creating another instance of t makes it clear that the
class-level variable never changes. To change it, it has to be
referenced as "t.classvar".
Yes.


Python protects global variables from similar confusion
by making them read-only when referenced from an inner scope without a
"global" statement. But that protection isn't applied to class-level
variables referenced through 'self'. Perhaps it should be.

The protection to class attributes is applied. When writing to self.attr,
you can't accidentally change a class attribute ("global to the class").
To change it for all instances of the class, you have to specifically
assign to class.attr instead.

I maintain that the most common use for class attributes (other than
methods, of course) is for default values which are over-ridden at the
instance level:

class Page(object):
    unit = 'cm'
    width = 20.9
    height = 29.7
    direction = 'portrait'

p1 = Page()
p2 = Page()
p2.direction = 'landscape'


It would be disastrous to have every Page instance change to landscape
just because you changed one, and it would be inconvenient to need
special syntax to allow writing to the instance attribute just because a
class attribute exists. The existing behaviour is, in my opinion, ideal.
 
Jonathan Gardner

Here's an obscure bit of Python semantics which
is close to being a bug:

 >>> class t(object) :
...     classvar = 1
...
...     def fn1(self) :
...         print("fn1: classvar = %d" % (self.classvar,))
...         self.classvar = 2
...         print("fn1: classvar = %d" % (self.classvar,))
...
...

You don't quite understand the intricacies of Pythonese yet. It's
really simple, which isn't a common feature of most programming
languages.

class T(object):
    classvar = 1

means: Create a class which has an attribute "classvar" initialized to
1. Assign that class to "T".

"self.classvar" means: Look up the value for 'classvar' in 'self',
visiting its class and parent classes if you can't find it.

"self.classvar = 2" means: Create or replace the attribute 'classvar'
of 'self' with 2.

Notice how the two meanings are quite different from each other?

Fetching an attribute means something different than assigning to it.
Python doesn't remember where the attribute came from when assigning---
it simply assigns at the top level.

If you wanted to, instead, change T's classvar, then you need to be
explicit about it and write "T.classvar = 2".
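
A short sketch of what that explicit class-level assignment does to existing instances. It reuses the `T` and `classvar` names from above; the instance names `a` and `b` are placeholders:

```python
class T(object):
    classvar = 1


a = T()
b = T()
b.classvar = 99      # b now has its own instance attribute

T.classvar = 2       # rebinds the attribute on the class itself

print(a.classvar)    # 2  (a has nothing of its own, so it sees the class value)
print(b.classvar)    # 99 (the instance attribute still shadows the class one)

del b.classvar       # removing the instance attribute uncovers the class value
print(b.classvar)    # 2
```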
 
Lawrence D'Oliveiro

Within "fn1", the first reference to "self.classvar" references the class-
level version of "classvar". The assignment overrides that and creates an
object-level instance of "self.classvar". Further references to
"self.classvar" in f1 then reference the object-level "classvar"

I’d say there is definitely an inconsistency with the absolutely anal
restrictions on global variables, i.e. the well-known sequence

... reference to globalvar ...
globalvar = newvalue

which triggers the error “UnboundLocalError: local variable 'globalvar'
referenced before assignment”.

It seems to me the same principle, that of disallowing implicit overriding
of a name from an outer scope with that from an inner one after the former
has already been referenced, should be applied here as well.
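
For readers who have not run into it, the global-variable sequence mentioned above can be reproduced with a sketch like this. The function name `read_then_assign` is invented for the example, and the exact wording of the error message varies between Python versions:

```python
globalvar = 1


def read_then_assign():
    # The assignment below makes "globalvar" local to the whole function,
    # so this read fails before the assignment is ever reached.
    value = globalvar
    globalvar = value + 1
    return globalvar


try:
    read_then_assign()
except UnboundLocalError as exc:
    caught = exc
    print("UnboundLocalError:", exc)
else:
    caught = None
```

The module-level `globalvar` is never touched; the compiler decided the name was local when it saw the assignment.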
 
Chris Torek

Here's an obscure bit of Python semantics which
is close to being a bug:

[assigning to instance of class creates an attribute within
the instance, thus obscuring the class-level version of the
attribute]

This is sort of a feature, but one I have been reluctant to use:
you can define "default values" for instances within the class,
and only write instance-specific values into instances as needed.
This would save space in various cases, for instance.

Python protects global variables from similar confusion
by making them read-only when referenced from an inner scope
without a "global" statement. But that protection isn't
applied to class-level variables referenced through 'self'.
Perhaps it should be.

It's not really clear to me how one would distinguish between
"accidental" and "deliberate" creation of these variables,
syntactically speaking.

If you want direct, guaranteed access to the class-specific variable,
using __class__ is perhaps the Best Way right now:
...     x = 42
...     def __init__(self): pass
...
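
A minimal sketch of reading the class-level value through __class__ even after an instance attribute shadows it. The class name `K` and the assigned value are placeholders chosen for the example, not taken from the post:

```python
class K(object):
    x = 42

    def __init__(self):
        pass


inst = K()
inst.x = "hah"                 # creates an instance attribute that shadows K.x

print(inst.x)                  # hah
print(inst.__class__.x)        # 42
print(type(inst).x)            # 42, equivalent for new-style classes
```
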
One could borrow the "nonlocal" keyword to mean "I know that
there is potential confusion here between instance-specific
attribute and class-level attribute", but the implication seems
backwards:

nonlocal self.foo

implies that you want self.foo to be shorthand for self.__class__.foo,
not that you know that self.__class__.foo exists but you *don't*
want to use that.

If Python had explicit local variable declarations, then:

local self.foo

would be closer to the implied semantics here.

As it is, I think Python gets this pretty much right, and if you
think this is more a bug than a feature, you can always insert
assert statements in key locations, e.g.:

assert 'foo' not in inst.__class__.__dict__, \
'overwriting class var "foo"'

(you can even make that a function using introspection, although
it could get pretty hairy).
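
One possible shape for such a function, as a rough sketch only. The helper name `assert_no_class_attr` and the `Widget` class are hypothetical, and it assumes new-style classes (it walks `__mro__`), so it checks base classes too rather than only `inst.__class__.__dict__`:

```python
def assert_no_class_attr(inst, name):
    """Raise AssertionError if `name` is defined on inst's class or a base."""
    for klass in type(inst).__mro__:
        if name in vars(klass):
            raise AssertionError(
                "overwriting class var %r defined in %s" % (name, klass.__name__)
            )


class Widget(object):
    foo = 1


w = Widget()
assert_no_class_attr(w, "bar")      # passes silently: no class defines "bar"
w.bar = 10

try:
    assert_no_class_attr(w, "foo")  # Widget defines "foo", so this raises
    w.foo = 2
except AssertionError as exc:
    message = str(exc)
    print(message)                  # overwriting class var 'foo' defined in Widget
```

It uses an explicit raise rather than an assert statement so the check still runs under `python -O`; whether that is desirable is a matter of taste.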
 
John Nagle

I’d say there is definitely an inconsistency with the absolutely anal
restrictions on global variables, i.e. the well-known sequence

... reference to globalvar ...
globalvar = newvalue

which triggers the error “UnboundLocalError: local variable 'globalvar'
referenced before assignment”.

It seems to me the same principle, that of disallowing implicit overriding
of a name from an outer scope with that from an inner one after the former
has already been referenced, should be applied here as well.

Right. That's what I'm getting at.

I understand how the current semantics fall out of the obvious
implementation. But I don't see those semantics as particularly
desirable. The obvious semantics for globals are similar, but
that case is so error-prone that it was made an error.

(If you want default values for an instance, you define them
in __init__, not as class-level attributes.)

John Nagle
 
John Posner

I understand how the current semantics fall out of the obvious
implementation. But I don't see those semantics as particularly
desirable. The obvious semantics for globals are similar, but
that case is so error-prone that it was made an error.

Nicely stated.

(If you want default values for an instance, you define them
in __init__, not as class-level attributes.)

Since it's unlikely that the language will change, perhaps a naming
convention would help. I'm not sure I like this myself, but ...

Class attributes are often used as "class constants", so how about
naming them with UPPERCASE names, like other constants? When you choose
to override one of these constants, like this:

self.EGGS = 4

... the awkward looks of the statement serve as a hint that something
special is happening.

#------------
class SpamMeal:
    EGGS = 2

    def __init__(self, egg_count=None):
        if egg_count:
            self.EGGS = egg_count

    def Report(self):
        print("This meal includes %d eggs." % self.EGGS)

meal = SpamMeal()
meal.Report() # "This meal includes 2 eggs."

meal = SpamMeal(3)
meal.Report() # "This meal includes 3 eggs."

meal = SpamMeal()
meal.EGGS = 4
meal.Report() # "This meal includes 4 eggs."
#------------

-John Posner
 
Steven D'Aprano

Class attributes are often used as "class constants", so how about
naming them with UPPERCASE names, like other constants? When you choose
to override one of these constants, like this:

self.EGGS = 4

Er what?

If self.EGGS is meant as a constant, you shouldn't be re-assigning to it
*ever*.

... the awkward looks of the statement serve as a hint that something
special is happening.

I disagree that anything special is happening. The current behaviour is,
to my mind, the correct behaviour.

#------------
class SpamMeal:
    EGGS = 2

    def __init__(self, egg_count=None):
        if egg_count:
            self.EGGS = egg_count

    def Report(self):
        print("This meal includes %d eggs." % self.EGGS)

meal = SpamMeal()
meal.Report() # "This meal includes 2 eggs."

meal = SpamMeal(3)
meal.Report() # "This meal includes 3 eggs."

meal = SpamMeal()
meal.EGGS = 4
meal.Report() # "This meal includes 4 eggs."
#------------


Apart from the ugliness of the attribute name, this does not differ in
the slightest from the current behaviour. How could it? Merely changing
the attribute name "eggs" to "NAME" can't change the behaviour. Your
SpamMeal instances still default to 2 eggs, assigning to instance.EGGS
still creates an instance attribute while leaving SpamMeal.EGGS
untouched, and it is still necessary to assign to SpamMeal.EGGS in order
to globally change all instances (that don't already have their own EGGS
attribute).

Since nothing has changed, what's the point? What exactly is the
surprising behaviour that you are trying to flag?
 
Ethan Furman

As has been noted, this is the documented behavior, and completely
rational in my mind with the way instances should interact with their
classes.

I completely disagree. Inner and outer scopes are working in the
procedural (/functional?) paradigm, whereas classes and instances are
working in the OO paradigm.

You don't make every auto driver wear helmets because motorcycle drivers
have to*.

(If you want default values for an instance, you define them
in __init__, not as class-level attributes.)

If you want non-class mutable attributes, you assign them in __init__.
There is nothing incorrect about assigning default immutable attributes
at the class level.
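
The mutable case is where class-level defaults tend to bite. A small sketch of the difference, with made-up class names `SharedBasket` and `OwnBasket`:

```python
class SharedBasket(object):
    items = []                     # one list object, stored on the class

    def add(self, item):
        self.items.append(item)    # mutates the shared list; no assignment happens


class OwnBasket(object):
    def __init__(self):
        self.items = []            # a fresh list per instance

    def add(self, item):
        self.items.append(item)


s1, s2 = SharedBasket(), SharedBasket()
s1.add("egg")
print(s2.items)                    # ['egg']  (both instances see the same list)

o1, o2 = OwnBasket(), OwnBasket()
o1.add("egg")
print(o2.items)                    # []
```

With an immutable default such as an int or a string there is nothing to mutate in place, so the only way to "change" it through self is an assignment, which creates the instance attribute.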

~Ethan~
 
Gregory Ewing

Lawrence said:
It seems to me the same principle, that of disallowing implicit overriding
of a name from an outer scope with that from an inner one after the former
has already been referenced, should be applied here as well.

How would you intend to enforce such a restriction?
 
Jean-Michel Pichavant

John said:
Here's an obscure bit of Python semantics which
is close to being a bug:

>>> class t(object) :
...     classvar = 1
...
...     def fn1(self) :
...         print("fn1: classvar = %d" % (self.classvar,))
...         self.classvar = 2
...         print("fn1: classvar = %d" % (self.classvar,))

I don't think it's a bug.

No one would give a class attribute and an instance attribute the
same name. So I'm assuming that you want to assign 2 to the *class*
attribute (since it's named classvar :o) ).

You just did it wrong: self.classvar=2 creates the instance attribute
classvar. This is easily fixed by using some well-chosen coding rules:

when working with class attributes within an instance either :

always write t.classvar
or
always write self.__class__.classvar

Problem solved. You could ask for Python to reject self.classvar, but
that would break the OO lookup algorithm (i.e. an instance has access to
its class attributes).
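
The rule matters most with augmented assignment, where the read and the write can land in different places. A sketch with an invented `Counter` class:

```python
class Counter(object):
    count = 0

    def bump_instance(self):
        # Reads Counter.count (or the instance's own value), then binds the
        # result on the instance.
        self.count += 1

    def bump_class(self):
        # Reads and rebinds the attribute on the class.
        self.__class__.count += 1


a = Counter()
b = Counter()

a.bump_instance()
print(a.count, b.count, Counter.count)   # 1 0 0

b.bump_class()
print(a.count, b.count, Counter.count)   # 1 1 1
```

After the second call `a.count` is still the instance's own 1; it only happens to match the class value.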

JM
 
John Posner

Er what?

If self.EGGS is meant as a constant, you shouldn't be re-assigning to it
*ever*.

That's why I prefaced my suggestion with, "I'm not sure I like this myself".

Apart from the ugliness of the attribute name, this does not differ in
the slightest from the current behaviour. How could it? Merely changing
the attribute name "eggs" to "NAME" can't change the behaviour.

Right. As I said at the beginning of my message, I was proposing a "naming
convention" only, not new functionality. The purpose of my example was
to show how the naming convention would look in practice, not to
demonstrate how new functionality would work.

Since nothing has changed, what's the point? What exactly is the
surprising behaviour that you are trying to flag?

No surprising behavior, just a surprising look:

self.EGGS = ...

... which might remind the programmer what's going on -- the redefining
of a "constant". This was just a suggestion; I hoped it might be helpful
to the OP (or might suggest another, better approach to him). But
perhaps he'll react to this suggestion, as Steven did, with "you
shouldn't be re-assigning to it *ever*".

In my own (hobbyist, not professional) Python programs, I sometimes use
ALL-CAPS names for parameters related to the size of the display screen.
These parameters sort of "feel" constant, but they're not, given the
multiplicity of screens and users' ability to adjust screen resolution.

-John Posner
 
Dennis Lee Bieber

No surprising behavior, just a surprising look:

self.EGGS = ...

... which might remind the programmer what's going on -- the redefining
of a "constant". This was just a suggestion; I hoped it might be helpful

But if it is supposed to be a "constant" defined at the class level,
it would be better to just not use the instance (self.) when referencing
it. By stuffing the class name into the reference it is even more
explicit that this is a class level attribute and not an instance
attribute, and probably shouldn't be changed.
 
Chris Rebert

But if it is supposed to be a "constant" defined at the class level,
it would be better to just not use the instance (self.) when referencing
it. By stuffing the class name into the reference it is even more
explicit that this is a class level attribute and not an instance
attribute, and probably shouldn't be changed.

Yes, however that's
(1) likely slightly slower due to the global lookup for the class name
(2) brittle WRT inheritance; subclasses can't override the value
(3) brittle WRT class renaming/refactoring, unless you use
self.__class__.CONSTANT, which is uglier
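
Point (2) can be seen in a small sketch. The `Shape` and `Triangle` classes and their method names are hypothetical; nothing here measures the speed claim in (1):

```python
class Shape(object):
    SIDES = 0

    def sides_hardcoded(self):
        return Shape.SIDES            # always the base class value

    def sides_via_self(self):
        return self.SIDES             # honours subclass overrides

    def sides_via_class(self):
        return self.__class__.SIDES   # honours overrides, skips instance attrs


class Triangle(Shape):
    SIDES = 3


t = Triangle()
print(t.sides_hardcoded())   # 0
print(t.sides_via_self())    # 3
print(t.sides_via_class())   # 3
```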

Cheers,
Chris
 
Ixokai

Python protects global variables from similar confusion
by making them read-only when referenced from an inner scope
without a "global" statement.

No. It doesn't. Assignments simply always apply to local variables,
unless you explicitly indicate otherwise. There is no "protection" going
on here.

It's the exact same principle as what's going on with classes/instances;
assignments to an instance always apply to variables local to that
instance-- and not global to the class-- unless you explicitly indicate
otherwise.

There is no protection of the global going on: in all cases, you can
create a new local variable that shadows the global that was previously
there.

The syntax is just different. For regular, free variables, you
"explicitly indicate otherwise" by using the global statement. For
class/instance variables, you "explicitly indicate otherwise" by doing
class.var or self.__class__.var = whatever.

Yes, the behavior of "get" and "set" are distinctly different. "get" in
all cases -- both free variables in functions, and when getting
attributes from your instance -- looks up the tree, starting at your
most local namespace and walking up until it finds what you want. For
free variables, this is a short list: locals(), enclosing nested
function scopes, globals(), builtins.

For attributes, it starts on your instance, then goes to the class, then
up the superclass tree.

The "sets" in all cases don't walk up the tree at all. They only set in
the most local namespace, unless again, you explicitly tell it to do
something else.
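
The free-variable half of that comparison, as a minimal sketch. The names `setting`, `shadow`, `rebind` and `read_only` are invented for the example:

```python
setting = "global"


def shadow():
    setting = "local"        # binds a new local name; the global is untouched
    return setting


def rebind():
    global setting           # explicitly ask for the module-level name
    setting = "rebound"
    return setting


def read_only():
    return setting           # no assignment here, so the lookup walks outward


before = read_only()
shadowed = shadow()
after_shadow = read_only()
rebind()
after_rebind = read_only()
print(before, shadowed, after_shadow, after_rebind)
# global local global rebound
```

The attribute-side equivalents are `self.var = ...` for the shadowing case and `self.__class__.var = ...` for the explicit one.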

Sure, this means you can accidentally shadow a global and run into
problems. It also means you can forget that "explicitly tell it" and run
into other problems.

But it's still a feature, and basically part of the core, fundamental
object model of Python.

--

Stephen Hansen
... Also: Ixokai
... Mail: me+list/python (AT) ixokai (DOT) io
... Blog: http://meh.ixokai.io/


 
Gregory Ewing

Lawrence said:
The same way it’s already enforced.

I don't see how that's possible, except in a very limited and
unreliable way. The compiler would have to be able to determine,
statically, when there was some piece of code that could assign
to a given instance variable of some instance of a class, and
arrange for an "unbound attribute error" to be raised if you
try to reference it before it's been assigned.

When you consider that potential assignments to attributes
can occur in any module, not necessarily the one where the class
is defined, and that classes and instances can be dynamically
modified just about any way at all, this does not seem to be
feasible.

Name lookup and attribute lookup are really quite different
things. The former occurs in an extremely restricted context,
so much so that the compiler knows exactly where every name
is or could be defined. The latter is completely open, and
the compiler can hardly tell anything at all.

So, any enforcement of "unbound attribute" conditions would have
to be done at run time. But this is impossible, because at the
point where you reference an attribute that has no instance
binding, there's no way of telling whether something might give
it an instance binding in the future. At least not without
Guido releasing the time machine's kernel code so that it can
be incorporated into the Python VM...
 
Lawrence D'Oliveiro

I don't see how that's possible, except in a very limited and
unreliable way. The compiler would have to be able to determine,
statically, when there was some piece of code that could assign
to a given instance variable of some instance of a class, and
arrange for an "unbound attribute error" to be raised if you
try to reference it before it's been assigned.

If you can’t do it statically, do it dynamically.
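
For what it's worth, one dynamic check that is easy to write is blunter than the rule proposed above: it refuses every instance assignment that would shadow a class attribute, whether or not the name was read first. A sketch under that assumption, with a hypothetical mixin name `NoShadowing`:

```python
class NoShadowing(object):
    """Hypothetical mixin: refuse to create an instance attribute whose name
    is already defined on the class or one of its bases."""

    def __setattr__(self, name, value):
        for klass in type(self).__mro__:
            if name in vars(klass):
                raise AttributeError(
                    "%r would shadow class attribute %s.%s"
                    % (name, klass.__name__, name)
                )
        object.__setattr__(self, name, value)


class Strict(NoShadowing):
    classvar = 1


s = Strict()
s.other = 5                  # fine: no class in the MRO defines "other"

try:
    s.classvar = 2           # rejected at run time
except AttributeError as exc:
    error = str(exc)
    print(error)             # 'classvar' would shadow class attribute Strict.classvar

Strict.classvar = 2          # still allowed: this assigns on the class itself
print(s.classvar)            # 2
```

It does not address the read-before-assign case, and as written it would also block properties with setters, so it is an illustration of the cost rather than a recommendation.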
 
Jonathan Gardner

(If you want default values for an instance, you define them
in __init__, not as class-level attributes.)

I beg to differ. I've seen plenty of code where defaults are set at
the class level. It makes for some rather nice code.

I'm thinking of lxml.html.cleaner right now, which has a ton of
options that would be almost impossible to manage without class
variables.
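
The general pattern looks roughly like the sketch below. This is not lxml's code; the `TextCleaner` class and its option names are invented to show class-level defaults with per-instance overrides passed as keyword arguments:

```python
class TextCleaner(object):
    # Class-level defaults, used by every instance that does not override them.
    strip = True
    lowercase = False
    max_length = None

    def __init__(self, **options):
        for name, value in options.items():
            if not hasattr(type(self), name):
                raise TypeError("unknown option: %r" % name)
            setattr(self, name, value)   # per-instance override

    def clean(self, text):
        if self.strip:
            text = text.strip()
        if self.lowercase:
            text = text.lower()
        if self.max_length is not None:
            text = text[: self.max_length]
        return text


default = TextCleaner()
custom = TextCleaner(lowercase=True, max_length=5)

print(repr(default.clean("  Hello World  ")))   # 'Hello World'
print(repr(custom.clean("  Hello World  ")))    # 'hello'
```

Adding an option is one line in the class body, and a subclass can change a default without touching `__init__`.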
 
