data hiding/namespace pollution

Alex Hunsley · Oct 31, 2005

There are no really specific questions in this post, but I'm looking for
people's thoughts on the issues within...

The two main versions I've encountered for data pseudo-hiding
(encapsulation) in python are:

method 1:

_X - (single underscore) - just cosmetic, a convention to let someone
know that this data should be private.

method 2:

__X - (double underscore) - mangles the name (in a predictable way).
Avoids name pollution.

How often does either tend to get used? Personally, I'd be a little
worried about using method 1, because namespace clashes could happen. Is
this overly paranoid?

Also, I presume that rather than people writing their own manual getter
and setter methods, they tend to use either overloading on __getattr__
and __setattr__, or the Property class (which itself uses aforementioned
methods). Overloading __getattr__ etc. seems more attractive to me, as
then I can capture access to unknown names, and raise an exception!
(I really don't like the idea of random attribute name typos going
unnoticed when accessing attributes in a class!)

Note: I do know that the use of the above things is quite dependent on
what exactly you're coding, the size of the project etc., but what I'm
trying to find out about is the Python community's recognised good
practices.

thanks,
alex

bruno at modulix · Oct 31, 2005

Alex said:
There are no really specific questions in this post, but I'm looking for
people's thoughts on the issues within...

The two main versions I've encountered for data pseudo-hiding
(encapsulation)

<OT>
Hmmm... Are data-hiding and encapsulation really the same thing?
</OT>

in python are:

method 1:

_X - (single underscore) - just cosmetic, a convention to let someone
know that this data should be private.

method 2:

__X - (double underscore) - mangles the name (in a predictable way).
Avoids name pollution.

How often does either tend to get used? Personally, I'd be a little
worried about using method 1, because namespace clashes could happen. Is
this overly paranoid?

Probably.

Note that prefixing names with a single underscore has a 'protected'
semantic - which means that such names (well, the objects they are bound
to...) can be overridden/extended by child classes.

I personally use the double-underscore notation only for things
that are *really* implementation-specific *and* should *really not* be
overridden.

Also, I presume that rather than people writing their own manual getter
and setter methods, they tend to use either overloading on __getattr__
and __setattr__, or the Property class (which itself uses aforementioned
methods).

Yeps... This is the pythonic way.

Overloading __getattr__ etc. seems more attractive to me, as
then I can capture access to unknown names, and raise an exception!
(I really don't like the idea of random attribute name typos going
unnoticed when accessing attributes in a class!)

Err... Have you *really* tried to access a nonexistent attribute? This
is usually not 'unnoticed' (unless you consider the raising of an
AttributeError as being the same as 'unnoticed' !-)

I personally use 'magic' accessors only for delegation or the like, and
properties (or custom descriptors) for anything else (that requires
it...). This avoids the Big-Switch-Syndrome in __getattr__ and __setattr__,
and is much more explicit (API, documentation, introspection etc...).
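
A minimal sketch of the split described above, assuming a modern Python with the `@property` decorator syntax. The `Engine` and `Car` classes are hypothetical names invented for illustration, not anything from the thread: a property handles the one attribute that needs validation, and `__getattr__` is used only to forward unknown names to a wrapped object.

```python
class Engine(object):
    """Hypothetical helper object that the wrapper delegates to."""

    def start(self):
        return "engine started"


class Car(object):
    def __init__(self):
        self._engine = Engine()   # single underscore: private by convention
        self._speed = 0

    # Explicit accessor: visible in dir() and help(), one attribute at a time.
    @property
    def speed(self):
        return self._speed

    @speed.setter
    def speed(self, value):
        if value < 0:
            raise ValueError("speed must be non-negative")
        self._speed = value

    # 'Magic' accessor used only for delegation. It runs only when normal
    # lookup has failed, so it never shadows real attributes.
    def __getattr__(self, name):
        if name.startswith("_"):
            # Avoid recursing if _engine itself has not been set yet.
            raise AttributeError(name)
        return getattr(self._engine, name)


car = Car()
car.speed = 30
started = car.start()   # not defined on Car, so it is forwarded to Engine
```

Names that neither `Car` nor `Engine` define still raise AttributeError, so the delegation does not hide typos.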

Note: I do know that the use of the above things is quite dependent on
what exactly you're coding, the size of the project etc., but what I'm
trying to find out about is the Python community's recognised good
practices.

Then launch your python interactive shell and type "import this"

HTH

Alex Hunsley · Oct 31, 2005

bruno said:
<OT>
Hmmm... Are data-hiding and encapsulation really the same thing?
</OT>

No, they're not, I was just being careless there, please disregard any
apparent implication that they are.

Probably.

Note that prefixing names with a single underscore has a 'protected'
semantic - which means that such names (well, the objects they are bound
to...) can be overridden/extended by child classes.

Ah, my mistake, not merely cosmetic then! Thanks.

I personally use the double-underscore notation only for things
that are *really* implementation-specific *and* should *really not* be
overridden.
ok.

Yeps... This is the pythonic way.

Err... Have you *really* tried to access a nonexistent attribute? This
is usually not 'unnoticed' (unless you consider the raising of an
AttributeError as being the same as 'unnoticed' !-)

Sorry, I wasn't being clear. What I should have said is that I don't
like the idea of a typo in an assignment causing the assigning of the
wrong thing.
e.g. imagine a simple value-holding class:

class Values:
    pass

v = Values()

v.conductoin = 10

.... I meant to type 'conduction' in the source but spelt it wrong.
My value won't be there when elsewhere I refer to the correct attribute:
"conduction".

I personally use 'magic' accessors only for delegation or the like, and
properties (or custom descriptors) for anything else (that requires
it...). This avoids the Big-Switch-Syndrome in __getattr__ and __setattr__,
and is much more explicit (API, documentation, introspection etc...).

Right, good point.

Then launch your python interactive shell and type "import this"

Thanks for that, I didn't know about that!
alex

Jorge Godoy · Oct 31, 2005

Alex Hunsley said:
Sorry, I wasn't being clear. What I should have said is that I don't like the
idea of a typo in an assignment causing the assigning of the wrong thing.
e.g. imagine a simple value-holding class:

class Values:
    pass

v = Values()

v.conductoin = 10

... I meant to type 'conduction' in the source but spelt it wrong.
My value won't be there when elsewhere I refer to the correct attribute:
"conduction".

Recently there was a big thread where that was raised again (yep, you're not
the first, nor the second, nor the third...). You should write unittests, use
tools like pychecker, pylint, etc.

Alex Hunsley · Oct 31, 2005

Jorge said:
Recently there was a big thread where that was raised again (yep, you're not
the first, nor the second, nor the third...). You should write unittests, use
tools like pychecker, pylint, etc.

Yup, I'm planning on using pyunit. Didn't know about pychecker though,
thanks for that!

Alex Hunsley · Oct 31, 2005

Jorge said:
Recently there was a big thread where that was raised again (yep, you're not
the first, nor the second, nor the third...). You should write unittests, use
tools like pychecker, pylint, etc.

Btw, can you recall the subject line of the thread? I'd like to google
groups for it and have a read of that thread...
ta!
alex

Jorge Godoy · Oct 31, 2005

Alex Hunsley said:
Btw, can you recall the subject line of the thread? I'd like to google groups
for it and have a read of that thread...
ta!

Search for: "alex martelli pychecker" on comp.lang.python... I don't have the
thread's name anymore. You'll probably find more than one thread with
that.

Steven D'Aprano · Oct 31, 2005

There are no really specific questions in this post, but I'm looking for
people's thoughts on the issues within...

The two main versions I've encountered for data pseudo-hiding
(encapsulation) in python are:

method 1:

_X - (single underscore) - just cosmetic, a convention to let someone
know that this data should be private.

Not quite.

In modules, names starting with one or more underscore (_X, __X, etc.) are
not copied over when you import the module using "from module import *".

In classes, instance._X is just a convention "this is private, don't touch
unless you really have to".
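
The module-level behaviour can be checked with a small sketch. It builds a throwaway module in memory so the example is self-contained; the module name `demo_mod` and its attributes are made up for illustration.

```python
import sys
import types

# Build a throwaway module in memory; 'demo_mod' is a hypothetical name.
demo_mod = types.ModuleType("demo_mod")
demo_mod.public = 1
demo_mod._private = 2
sys.modules["demo_mod"] = demo_mod

# Run the star-import inside a fresh namespace so we can inspect the result.
namespace = {}
exec("from demo_mod import *", namespace)

# exec() adds __builtins__ to the namespace; ignore it.
imported = sorted(name for name in namespace if name != "__builtins__")
```

Here `imported` contains only `public`. The underscore name is skipped by the star-import, yet `demo_mod._private` is still reachable by anyone who asks for it explicitly.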

method 2:

__X - (double underscore) - mangles the name (in a predictable way).
Avoids name pollution.

Again, not quite: this only occurs for attributes, not names in modules.

How often does either tend to get used? Personally, I'd be a little
worried about using method 1, because namespace clashes could happen. Is
this overly paranoid?

You are no more likely to have instance._attribute clash than you are to
have instance.attribute clash.

In fact, since each class is its own namespace, it is only an issue if you
are subclassing. And that is an argument for better documentation: if you
tell people your class uses semi-private attribute _X, and they still
accidentally over-write it, that is their fault exactly as if they
accidentally over-wrote public methods like .append().

Also, I presume that rather than people writing their own manual getter
and setter methods, they tend to use either overloading on __getattr__
and __setattr__, or the Property class (which itself uses aforementioned
methods). Overloading __getattr__ etc. seems more attractive to me, as
then I can capture access to unknown names, and raise an exception!

You don't need to overload __getattr__ to raise an exception when you
access unknown names:

py> class Parrot:
...     canSpeak = True # note mixed case
...
py> p = Parrot()
py> p.canspeak
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: Parrot instance has no attribute 'canspeak'

In any case, that sounds like you are just making work for yourself.
What are you doing, manually keeping a list of "allowed" attributes which
you check before hand?

# warning: untested
class Spanish_Inquisition():
    ALLOWED = ['comfy_chair', 'shrubbery']

    def __getattr__(self, name):
        if name in self.ALLOWED:
            return self.__dict__[name]
        raise ValueError("No such attribute")

Yuck yuck yuck. Slow, unnecessary, and of course you might think you know
what attributes your class needs, but you can never predict when your
class's users will want to add attributes you never thought of.

There is, at least, an argument in favour of using that technique for
enforcing something like attribute declarations:

def __setattr__(self, name, value):
    if name in self.ALLOWED:
        self.__dict__[name] = value
    else:
        raise ValueError("That attribute hasn't been declared.")

although that just leads into the whole "bondage and domination language"
can of worms.

In any case, you already have a perfectly good list of attributes.
Actually, two lists, one for class attributes and one for instance
attributes:

instance.__class__.__dict__.keys()
instance.__dict__.keys()
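
As a rough illustration of those two lookups, here is a sketch with a hypothetical `Parrot` class. The filter on double-underscore names is an addition for readability, since a class `__dict__` also holds entries such as `__module__` and `__init__`.

```python
class Parrot(object):
    can_speak = True            # class attribute

    def __init__(self, name):
        self.name = name        # instance attribute


p = Parrot("Polly")

# Attributes stored on this particular instance.
instance_attrs = sorted(p.__dict__.keys())

# Attributes stored on the class, minus the double-underscore entries.
class_attrs = sorted(k for k in p.__class__.__dict__.keys()
                     if not k.startswith("__"))
```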

Keeping two lots of the same data around is usually a recipe for trouble.
Just wait until you delete an attribute, and then forget to remove it from
your ALLOWED list, and watch the fun and games when you start getting
unexpected errors.

(I really don't like the idea of random attribute name typos going
unnoticed when accessing attributes in a class!)

This is no more a problem than getting random name typos when accessing
any objects in Python. In many people's experience, it is mostly -- but
not always -- those who don't use Python very much who worry about the
lack of declarations. In practice, if you are testing your code
sufficiently, you won't miss the lack of declarations. Declarations are
only good for picking up a tiny subset of bugs, and proper testing will
pick those same bugs -- and many more -- without the need for declaring
variables and/or attributes.

No doubt there will be some who disagree. Let me postscript my comments
with YMMV, and remind folks that even if declarations are the best thing
since the transistor, Python currently doesn't have them and all the
arguing in the world won't change that.

Alex Hunsley · Oct 31, 2005

Jorge said:
Search for: "alex martelli pychecker" on comp.lang.python... I don't have the
thread's name anymore. You'll probably find more than one thread with
that.

thanks! :]
alex

bruno at modulix · Oct 31, 2005

Alex said:
Ah, my mistake, not merely cosmetic then! Thanks.

Well, to be more exact, the point is not that _names can be overridden,
but that, due to name mangling, the following won't work, or, at least,
not as expected:

class Base(object):
    def __dothis(self):
        # mangled to _Base__dothis
        print("Base dothis")

    def dothat(self):
        print("Base dothat")

    def run(self):
        self.__dothis()   # compiled as self._Base__dothis()
        self.dothat()

class Child(Base):
    def __dothis(self):
        # mangled to _Child__dothis, so it does not override Base's method
        print("__%s_dothis" % self.__class__.__name__)

    def dothat(self):
        print("%s dothat" % self.__class__.__name__)

c = Child()
c.run()
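
What "not as expected" means can be seen in a variant of the same two classes that returns strings instead of printing them, so the result is easy to inspect. This is a sketch, not the poster's code.

```python
class Base(object):
    def __dothis(self):            # stored as _Base__dothis
        return "Base dothis"

    def dothat(self):
        return "Base dothat"

    def run(self):
        # self.__dothis is compiled as self._Base__dothis, so a subclass
        # cannot replace it by defining its own __dothis.
        return [self.__dothis(), self.dothat()]


class Child(Base):
    def __dothis(self):            # stored as _Child__dothis; run() never calls it
        return "Child dothis"

    def dothat(self):              # ordinary name: overrides Base.dothat
        return "Child dothat"


calls = Child().run()
```

`calls` ends up with Base's `__dothis` result followed by Child's `dothat` result: the single-name method is overridden, the double-underscore one is not.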

(snip)

Sorry, I wasn't being clear. What I should have said is that I don't
like the idea of a typo in an assignment causing the assigning of the
wrong thing.
e.g. imagine a simple value-holding class:

class Values:
    pass

v = Values()

v.conductoin = 10

... I meant to type 'conduction' in the source but spelt it wrong.
My value won't be there when elsewhere I refer to the correct attribute:
"conduction".

This kind of mistake is usually not too hard to spot and not too hard
to correct. You'd have the same problem with a dict (and your Values
class is not much more than a dotted_syntax dict in disguise). BTW,
Pylint or Pychecker would catch this, and most unittests too.

Now if you have a case where this could really matter (like a user's
script), you can then define a more bondage_and_the_whip class with slots.
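
A minimal sketch of such a class, assuming "slots" here refers to the `__slots__` class attribute of new-style classes. The `Values` class mirrors the earlier example; the `typo_error` variable is only there to capture the outcome.

```python
class Values(object):
    # Only the names listed here may be assigned on instances.
    __slots__ = ("conduction",)


v = Values()
v.conduction = 10          # declared name: accepted

try:
    v.conductoin = 10      # misspelt name: rejected
except AttributeError as exc:
    typo_error = str(exc)
else:
    typo_error = None
```

The misspelt assignment raises AttributeError at once instead of silently creating a new attribute. Note that `__slots__` exists mainly as a memory optimisation, so using it as a typo guard is a side effect rather than its main purpose.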

My overall experience with programming and Python is that keeping it
stupid simple where you can is usually the best approach. Your
__setattr__as_a_typo_catcher solution looks to me like a case of
arbitrary complexification. Better to go with the language than to fight
against it.

My 2 cents...

Alex Martelli · Oct 31, 2005

Alex Hunsley said:
There are no really specific questions in this post, but I'm looking for
people's thoughts on the issues within...

The two main versions I've encountered for data pseudo-hiding
(encapsulation) in python are:

method 1:

_X - (single underscore) - just cosmetic, a convention to let someone
know that this data should be private.

method 2:

__X - (double underscore) - mangles the name (in a predictable way).
Avoids name pollution.

How often does either tend to get used? Personally, I'd be a little
worried about using method 1, because namespace clashes could happen. Is
this overly paranoid?

Experienced programmers who know little Python tend to start with (2),
and mostly migrate to (1) with time (once they've had to hand-mangle
names a few times to work around (2)'s limitations for testing or
overrides they had not foreseen). Accidental name clashes between
superclasses and subclasses tend to be caught very rapidly by unit tests
that are at all decent, anyway.

Also, I presume that rather than people writing their own manual getter
and setter methods, they tend to use either overloading on __getattr__
and __setattr__, or the Property class (which itself uses aforementioned
methods). Overloading __getattr__ etc. seems more attractive to me, as
then I can capture access to unknown names, and raise an exception!

If you AVOID overriding __getattr__, THEN you'll automatically get
exceptions; __getattr__ is called only when, were it absent, the
exception would get raised. property does NOT use __getattr__ at all,
but rather each instance thereof is a descriptor and thus uses its own
__get__. __setattr__ has very different semantics and is appropriate
only in very peculiar circumstances.

(I really don't like the idea of random attribute name typos going
unnoticed when accessing attributes in a class!)

A common but unjustified paranoia. I've been coding almost exclusively
(say over 90% of my work) in Python for over 5 years, and also teaching,
consulting, mentoring &c based on Python, and all the possible
"typo"-level bugs that so terrify so many new-to-Python programmers are
simply irrelevant -- they're not common in the first place, pychecker
and the likes make short work of them, and unit tests (which ARE
indispensable in any language anyway) catch them easily just as they
catch the really nasty typos possible in any language such as typing -=
where one meant +=, < where one should have coded <= (the one most
likely tiny-bug in any language, since it can be a thinko ever more
easily than a typo -- it once took me three days debugging a Fortran
program for extremely subtle corner-case errors that boiled down to a
miscoding of .LT. where .LE. should have been), and the like.

If you manage to do a significant poll of the community, don't forget to
correlate respondents' opinions with each respondent's depth and length of
experience with real-world Python use...

Alex

Alex Hunsley · Oct 31, 2005

Steven said:
Not quite.

In modules, names starting with one or more underscore (_X, __X, etc.) are
not copied over when you import the module using "from module import *".

And you can also control what gets exported by defining __all__ I
believe....
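
That is how `__all__` behaves: when a module defines it, a star-import copies exactly the names listed there. A small sketch with an in-memory module follows; the module name `demo_all` and its attributes are invented for the example.

```python
import sys
import types

# Throwaway in-memory module; 'demo_all' is a hypothetical name.
mod = types.ModuleType("demo_all")
mod.exported = 1
mod.helper = 2
mod._hidden = 3
mod.__all__ = ["exported", "_hidden"]   # explicit export list
sys.modules["demo_all"] = mod

namespace = {}
exec("from demo_all import *", namespace)

# exec() adds __builtins__ to the namespace; ignore it.
names = sorted(n for n in namespace if n != "__builtins__")
```

With `__all__` present, `helper` is left out even though it has no underscore, and `_hidden` is copied even though it has one.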

In classes, instance._X is just a convention "this is private, don't touch
unless you really have to".

ah, ok.

[snip some more details]

Thanks for your helpful response, it's clarifying a few things for me!

What are you doing, manually keeping a list of "allowed" attributes which
you check before hand?

Heheh. No, I'm not actually 'doing' anything yet, I'm finding out what
is good practice in python land.

There is, at least, an argument in favour of using that technique for
enforcing something like attribute declarations:

def __setattr__(self, name, value):
    if name in self.ALLOWED:
        self.__dict__[name] = value
    else:
        raise ValueError("That attribute hasn't been declared.")

although that just leads into the whole "bondage and domination language"
can of worms.

In any case, you already have a perfectly good list of attributes.
Actually, two lists, one for class attributes and one for instance
attributes:

instance.__class__.__dict__.keys()
instance.__dict__.keys()

[snip]

okeydoke.
thanks for the advice!
alex

Steven Bethard · Oct 31, 2005

Alex said:
The two main versions I've encountered for data pseudo-hiding
(encapsulation) in python are:

method 1:

_X - (single underscore) - just cosmetic, a convention to let someone
know that this data should be private.

method 2:

__X - (double underscore) - mangles the name (in a predictable way).
Avoids name pollution.

Method 2 is also (though to a lesser degree) just cosmetic -- it doesn't
prevent all name clashes even if you're reasonable enough not to name
anything in the _X__xxx pattern. I gave an example of this in an
earlier thread on this topic[1]. The basic problem is that
double-underscore mangling doesn't include the module name, so two
classes in different modules with the same class names can easily mess
with each others' "private" attributes.
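
A minimal sketch of that clash, not the example from the earlier thread. Two factory functions stand in for two separate modules that each happen to define a class called `Widget`; all names here are hypothetical.

```python
def make_library_class():
    # Stands in for a class defined in some hypothetical 'library' module.
    class Widget(object):
        def __init__(self):
            self.__state = "library state"       # stored as _Widget__state

        def library_state(self):
            return self.__state
    return Widget


def make_application_class(base):
    # Stands in for a same-named subclass defined in a different module.
    class Widget(base):
        def __init__(self):
            base.__init__(self)
            self.__state = "application state"   # also _Widget__state
    return Widget


LibraryWidget = make_library_class()
AppWidget = make_application_class(LibraryWidget)

w = AppWidget()
clobbered = w.library_state()
```

Because both classes are named `Widget`, both `__state` attributes mangle to `_Widget__state`, and the subclass silently overwrites the base class's "private" value.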

STeVe

[1]http://groups.google.com/group/comp.lang.python/msg/f03183a2c01c8ecf?hl=en&
