Why does python not have a mechanism for data hiding?


Paul Boddie

So much to respond to here, so let's get it over with in one post...


Well, there was no need to exclude try...except...finally in the first
place - Guido doubted the semantics of such a thing until he saw it in
Java, apparently. This is arguably one of the few features that has
incurred little cost with its addition, in contrast to many other
changes made to Python these days. Which brings us to the ternary
expression - something which will presumably see unprecedented usage
by those who jump on every new piece of syntax and "wear it out" in
short order.

Interestingly, Python did have an "access" keyword at some point
(before 1.4, I believe), so it isn't as if no-one had thought about
this before.

[...]

But is "enforced restriction" essential in OOP? Note that languages
like C++ provide such features in order to satisfy constraints which
are not directly relevant to Python.

[...]

I've always considered the underscore conventions in Python an ugly
hack in an otherwise elegant language. I often avoid them even when
they technically belong just because I don't like the way they make my
code look.

To be honest, I don't really like putting double underscores before
names just to make sure that instances of subclasses won't start
trashing attributes of a class I've written. (I actually don't care
about people accessing attributes directly from outside instances
because I've seen enough "if only they hadn't made that private/
protected, I could do my work" situations in languages like Java.)
Indeed, I experienced a situation only yesterday where I found that I
had been using an instance attribute defined in a superclass (which
was arguably designed for subclassing), but was the author really
expected to prefix everything with double underscores? Of course, by
not enforcing the private nature of attributes, Python does permit
things like mix-in classes very easily, however.
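
A minimal sketch of the collision described above, assuming Python 3; the class names `Base` and `Child` and the attribute name `count` are hypothetical and chosen only for illustration. It shows that a single underscore is shared between a class and its subclass, while a double underscore is rewritten per class.

```python
class Base:
    def __init__(self):
        self._count = 0       # single underscore: a convention, nothing more
        self.__count = 0      # double underscore: stored as _Base__count

    def bump(self):
        self._count += 1
        self.__count += 1     # always refers to _Base__count
        return self._count, self.__count


class Child(Base):
    def __init__(self):
        super().__init__()
        self._count = 100     # silently overwrites the attribute Base relies on
        self.__count = 100    # stored as _Child__count, so no collision


c = Child()
result = c.bump()
mangled = sorted(name for name in vars(c) if name.endswith("__count"))
```

Here `result` is `(101, 1)`: the single-underscore attribute was trashed by the subclass, the mangled one was not. Both mangled names remain reachable from outside, for example as `c._Base__count`, so nothing is enforced.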

If anything, Python lacks convenient facilities (besides informal
documentation) for describing the instance attributes provided by the
code of a class and of superclasses. Of course, the dynamic
possibilities in Python make trivial implementations of such
facilities difficult, but having access to such details would make
errors like those described above less likely. I imagine that pylint
probably helps out in this regard, too.

[...]

| I am also bothered a bit by the seeming inconsistency of the rules for
| the single underscore. When used at file scope, they make the variable
| or function invisible outside the module, but when used at class
| scope, the "underscored" variables or functions are still fully
| visible. For those who claim that the client should be left to decide
| what to use, why is the client prohibited from using underscored
| variables at file scope?

I don't remember why this is. I'll leave you to track down the
rationale for this particular behaviour. ;-)

| I may be full of beans here, but I would like to make a suggestion.
| Why not add a keyword such as "private," and let it be the equivalent
| of "protected" in C++? That would make member variables and functions
| visible in inherited classes. The encapsulation wouldn't be as
| airtight as "private" in C++, so clients could still get access if
| they really need it in a pinch, but they would know they are bending
| the rules slightly. I realize that single underscores do that already,
| but they are just unsightly.

If anything, I'd prefer having private attributes than some kind of
protected attributes which don't actually offer useful protection
(from accidental naming collisions).

Paul
 

Ville Vainio

| I may be full of beans here, but I would like to make a suggestion.
| Why not add a keyword such as "private," and let it be the equivalent
| of "protected" in C++? That would make member variables and functions

Better yet, have '@@foo' be translated to self.__foo. It would cut
some whining about 'self' as well, which would probably make it a more
worthwhile change (private attributes are a feature I don't really care
about much).

Obviously nobody should hold their breath and wait for this to happen.
This is a problem that needs to be solved at the API doc generation / IDE
autocompleter layer. If everybody agreed on how to tag published vs.
private methods in docstrings or wherever, we would be mostly set.
 

Bruno Desthuilliers

Joe P. Cool wrote:
| I saw this "don't need it" pattern in discussions about the ternary
| "if..else" expression and about "except/finally on the same block
| level". Now Python has both. Actually it is very useful to be able to
| distinguish between inside and outside.

Doing it the Python way makes the distinction pretty obvious:
implementation stuff names all start with an underscore. You just can
not miss it.

| This is obvious for real world things e.g. your TV. Nobody likes to
| open the rear cover to switch the channel.

Nope, but when the switch is broken and the TV is no longer under
warranty, I'm glad I can still open the rear cover and hack something by
myself, despite the usual (and hard to miss) "no user serviceable parts
inside" and "warranty void if unsealed" warning stickers.

| Similar arguments apply to software objects. "data hiding" is a harsh
| name, I would call it "telling what matters".

Which is exactly what we do using a naming convention.

| Please don't sell a missing feature as a philosophy. Say you don't
| need/want it. But don't call it philosophy.

Please understand that not having access restriction is a design choice,
not a technical inability (and FWIW the rationale behind that design
choice has been debated to hell and back). So yes, it is actually a
'philosophic' problem.

| It's *your* *decision* which uses will be available. Your explanation
| appears to me as a fear to decide.

Nope, just a matter of experience.

| Littering your class definition with dozens of underscores is exactly
| the line noise we love to criticize in Perl.

At least it makes what's 'inside' and what's 'outside' very obvious,
doesn't it?

| Nearly every introduction to OOP?

Nearly every introduction to OOP is crap. Nearly every introduction to
OOP also introduces classes and inheritance as "basic principles" of OO.
Almost no introduction to OOP actually talks about *objects*; instead,
they mostly introduce what Stroustrup and Gosling understood of OOP.

| Please don't tell me that encapsulation does not mean "enforced
| restriction".

What about: "encapsulation does not mean *language enforced*
restriction" then?

To me, encapsulation means that - as a client - you do not *need* to
care about implementation details, not that you can not get at them. And
while data hiding is indeed a possible means to enforce some kind of
encapsulation, it's not quite the same thing.
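
One way to picture encapsulation without enforced hiding is a property, sketched here for Python 3. The `Account` class, its `balance` property and the `_cents` attribute are hypothetical names invented for this illustration, not anything proposed in the thread.

```python
class Account:
    """Clients use `balance`; how it is stored is an implementation detail."""

    def __init__(self, balance):
        # Internal representation: integer cents, marked by the underscore.
        self._cents = int(round(balance * 100))

    @property
    def balance(self):
        return self._cents / 100

    def deposit(self, amount):
        self._cents += int(round(amount * 100))


acct = Account(10)
acct.deposit(2.5)
public_view = acct.balance      # what a client is meant to use
backdoor_view = acct._cents     # still reachable, but flagged as internal
```

The client never needs `_cents`, and the author could change the storage later without touching `balance`. Nothing stops a client from reading `_cents`; the underscore only says that doing so is at the client's own risk.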
 

Gabriel Genellina

| I don't remember why this is. I'll leave you to track down the
| rationale for this particular behaviour. ;-)

There is no rationale because this is not how it works...
You can:
- import a module and access *all* of its names, even names prefixed with an underscore
- import any name from a module, even if it starts with an underscore
- import * from a module, and it will import all names listed in __all__, even if they start with an underscore

Only in that last case, and when the module doesn't define __all__, the list of names to be imported is built from all the global names excluding the ones starting with an underscore. And it seems the most convenient default. If one wants to access any "private" module name, any of the first two alternatives will do.
 

Russ P.

| There is no rationale because this is not how it works...
| You can:
| - import a module and access *all* of its names, even names prefixed with an underscore
| - import any name from a module, even if it starts with an underscore
| - import * from a module, and it will import all names listed in __all__, even if they start with an underscore
|
| Only in that last case, and when the module doesn't define __all__, the list of names to be imported is built from all the global names excluding the ones starting with an underscore. And it seems the most convenient default. If one wants to access any "private" module name, any of the first two alternatives will do.


Well, that's interesting, but it's not particularly relevant to the
original point. By default, underscored variables at file scope are
not made visible by importing the module in which they appear. But
underscored member variables of a class *are* made visible to the
client by default. That seems at least slightly inconsistent to me.

The issue here, for me at least, is not whether the data or methods
should be absolutely hidden from the client. I'm perfectly willing to
say that the client should have a back door -- or even a side door --
to get access to "private" data or methods.

But I also believe that some standard way should be available in the
language to tell the client (and readers of the code) which methods
are *intended* for internal use only. And that method should be based
on more than a naming convention. Why? Because (1) I don't like
leading underscores in my identifiers, and (2) I think I should be
free to choose my identifiers independently from their properties.

Is this a major issue? No. Is it a significant issue? Yes, I think so.

Here's another suggestion. Why not use "priv" as shorthand for
"private"? Then,

priv height = 24

at file scope would make "height" invisible outside the module by
default. And the same line in a class definition would give "height"
the equivalent of "protected" status in C++.

I think "height" looks cleaner than "_height". And isn't clean code a
fundamental aspect of Python?
 

Arnaud Delobelle

Russ P. said:
| Well, that's interesting, but it's not particularly relevant to the
| original point. By default, underscored variables at file scope are
| not made visible by importing the module in which they appear. But
| underscored member variables of a class *are* made visible to the
| client by default. That seems at least slightly inconsistent to me.

Apart from the fact that modules and classes are very different ideas,
look at this code:

========================================
import random

class A(object):
    def __init__(self, val):
        self._val = val

class B(A):
    def f(self, other):
        return random.choice([self, other])._val

a = A(1)
b = B(2)

b.f(a)
========================================

So you want the last line to return 2 if b is chosen and raise a
ValueError if a is chosen. Good luck implementing that!

| The issue here, for me at least, is not whether the data or methods
| should be absolutely hidden from the client. I'm perfectly willing to
| say that the client should have a back door -- or even a side door --
| to get access to "private" data or methods.
|
| But I also believe that some standard way should be available in the
| language to tell the client (and readers of the code) which methods
| are *intended* for internal use only. And that method should be based
| on more than a naming convention. Why? Because (1) I don't like
| leading underscores in my identifiers, and (2) I think I should be
| free to choose my identifiers independently from their properties.

Python is a dynamic language, attributes can be added to objects at
any time in their life. Consider:

class A(object):
    private x # Suspend disbelief for a minute...
    def __init__(self, x):
        self.x = x

a = A(1)
a.x = 2 # What behaviour do you propose here?

| Is this a major issue? No. Is it a significant issue? Yes, I think so.
|
| Here's another suggestion. Why not use "priv" as shorthand for
| "private"? Then,
|
| priv height = 24
|
| at file scope would make "height" invisible outside the module by
| default. And the same line in a class definition would give "height"
| the equivalent of "protected" status in C++.
|
| I think "height" looks cleaner than "_height". And isn't clean code a
| fundamental aspect of Python?

I didn't use to be very fond of it either but I have to admit that it
conveys very useful information.
 

Joe P. Cool

| | Please don't sell a missing feature as a philosophy.
|
| I won't if you don't claim that a feature is missing because you don't like
| its implementation.  To most of us, replacing the nearly never used '__'
| with a keyword would be an awful change.

With "missing" I simply meant "not there".

| | Please don't tell me that encapsulation does not mean "enforced
| | restriction".
|
| There are obviously degrees of enforced restriction.  Take your TV example.
| Most users will respect a 'do not open' request by the manufacturer
| (Python's '_').

I program in Python quite often and I also use the '_' notation. But I
always found the inside/outside distinction very natural and
fundamental. The '_' notation is too informal and can arbitrarily be
overloaded with other meanings. I don't understand why it is so
important to avoid one keyword and pay for that by typing numerous
'_'. The meaning of the underscore is somewhat blurred - sometimes private,
sometimes protected. And it is line noise - like @$& in Perl.

| If you want something equivalent to epoxy embedding, design it, implement
| it, and share it (if you want).  Perhaps one could make an Opaque extension
| class, perhaps even usable as a mixin class rather than as a single-base
| class.

Perhaps I'll give it a try - as soon as I have found out what you
mean :)
 

George Sakkis

| Here's another suggestion. Why not use "priv" as shorthand for
| "private"? Then,
|
| priv height = 24
|
| at file scope would make "height" invisible outside the module by
| default. And the same line in a class definition would give "height"
| the equivalent of "protected" status in C++.

On a slightly different question: can a data hiding mechanism be
implemented as an add-on by a third party library without a change in
the core language, just like zope.interface does for interfaces? If
not, why not?

| I think "height" looks cleaner than "_height". And isn't clean code a
| fundamental aspect of Python?

Note that even in languages that do implement data hiding, people
often use a naming convention to denote hidden members, e.g. an "m_"
prefix (though I find this uglier than plain underscores).

George
 

Gabriel Genellina

| | | I am also bothered a bit by the seeming inconsistency of the rules for
| | | the single underscore. When used at file scope, they make the variable
| | | or function invisible outside the module, but when used at class
| | | scope, the "underscored" variables or functions are still fully
| | | visible. For those who claim that the client should be left to decide
| | | what to use, why is the client prohibited from using underscored
| | | variables at file scope?
| |
| | There is no rationale because this is not how it works... [snip
| | explanation of how import works]
|
| Well, that's interesting, but it's not particularly relevant to the
| original point. By default, underscored variables at file scope are
| not made visible by importing the module in which they appear. But
| underscored member variables of a class *are* made visible to the
| client by default. That seems at least slightly inconsistent to me.

To make things clear: _variables ARE visible when you import a module:

C:\TEMP>type module.py
_variable = 123

(invoke python)
py> import module
py> module._variable
123
py> dir(module)
['__builtins__', '__doc__', '__file__', '__name__', '_variable']
py>
py> from module import _variable
py> _variable
123

Only when you use "from module import *" _variable isn't imported.
(new python session):

py> from module import *
py> _variable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name '_variable' is not defined

That last form should not be used normally, except when playing with the
interactive interpreter.

| The issue here, for me at least, is not whether the data or methods
| should be absolutely hidden from the client. I'm perfectly willing to
| say that the client should have a back door -- or even a side door --
| to get access to "private" data or methods.
|
| But I also believe that some standard way should be available in the
| language to tell the client (and readers of the code) which methods
| are *intended* for internal use only. And that method should be based
| on more than a naming convention. Why? Because (1) I don't like
| leading underscores in my identifiers, and (2) I think I should be
| free to choose my identifiers independently from their properties.
|
| Is this a major issue? No. Is it a significant issue? Yes, I think so.

Ok, that's what you'd like Python to be. Unfortunately your view appears
not to be shared by the language developers.

| Here's another suggestion. Why not use "priv" as shorthand for
| "private"? Then,
|
| priv height = 24
|
| at file scope would make "height" invisible outside the module by
| default. And the same line in a class definition would give "height"
| the equivalent of "protected" status in C++.

Python's data model is centered on namespaces, and it's hard to tell whether
a certain attribute is being accessed from "inside the class" or from
"outside the class" in order to allow or deny access.
It's not like static languages where the whole class definition has a
certain lexical scope and it can't be modified afterwards; the set of
allowed attributes is fixed at compile time. In Python you can
add/remove/alter attributes (including methods) dynamically. You can
create classes without using the class statement. The same method may be
shared by many unrelated classes. If you can devise a practical and
efficient mechanism to determine access right in all those varying
circumstances, please post it to the python-ideas list for further
discussion. (In the meantime we'll continue to use a naming convention for
us consenting adults.)

| I think "height" looks cleaner than "_height". And isn't clean code a
| fundamental aspect of Python?

I like the fact that the mere attribute name conveys useful information
about its intended usage. And I don't care about the _, it doesn't look
ugly to me. But that's just my personal opinion.
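
To make the point about namespaces concrete, here is a short Python 3 sketch of the dynamic features mentioned in this post: classes built without the class statement, one function shared by unrelated classes, and attributes added and removed at run time. All names (`describe`, `Apple`, `Brick`, `Late`) are hypothetical and exist only for this illustration.

```python
def describe(self):
    """A plain function; it becomes a method of whatever class stores it."""
    return "%s(%r)" % (type(self).__name__, self.value)


# Classes created without the class statement.
Apple = type("Apple", (object,), {"describe": describe})
Brick = type("Brick", (object,), {"describe": describe})

apple = Apple()
apple.value = 1              # attribute added after construction
brick = Brick()
brick.value = "red"

# The very same function object serves two unrelated classes ...
shared = Apple.__dict__["describe"] is Brick.__dict__["describe"]


# ... and it can be attached to, or removed from, a class at any later time.
class Late:
    value = None


Late.describe = describe
late_text = Late().describe()
del Late.describe
still_there = hasattr(Late, "describe")
```

Since `describe` was written outside every class and belongs to three of them at various times, there is no fixed lexical scope that an "inside the class" check could rely on.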
 

alex23

| But I also believe that some standard way should be available in the
| language to tell the client (and readers of the code) which methods
| are *intended* for internal use only. And that method should be based
| on more than a naming convention. Why? Because (1) I don't like
| leading underscores in my identifiers, and (2) I think I should be
| free to choose my identifiers independently from their properties.

Have you considered using the Bridge pattern to separate your
interface from your implementation?

class Implementation:
    def private_method(self):
        raise NotImplementedError

class Interface:
    def __init__(self):
        self.imp = Implementation()
    def public_method(self):
        self.imp.private_method()

No offensive _methods on the interface, and the implementation is
still open for consenting adults to tinker with.
 

Russ P.

| To make things clear: _variables ARE visible when you import a module:
|
| C:\TEMP>type module.py
| _variable = 123
|
| (invoke python)
| py> import module
| py> module._variable
| 123
| py> dir(module)
| ['__builtins__', '__doc__', '__file__', '__name__', '_variable']
| py>
| py> from module import _variable
| py> _variable
| 123
|
| Only when you use "from module import *" _variable isn't imported.
| (new python session):
|
| py> from module import *
| py> _variable
| Traceback (most recent call last):
|   File "<stdin>", line 1, in <module>
| NameError: name '_variable' is not defined

Hmmm... that seems a bit strange. Why should "_variable" be visible
when you use "import module" but not when you use "from module import
*"?

| That last form should not be used normally, except when playing with the
| interactive interpreter.

OK, I have a confession to make. I use "from module import *" almost
exclusively. But then, I'm different from most folks. I can drink six
beers and still drive safely, for example. The key is to drive faster
so I can get past dangerous situations quicker.
 

Terry Reedy

| Hmmm... that seems a bit strange. Why should "_variable" be visible
| when you use "import module"

Because you only add one name to the current namespace -- bound to a
module -- and there is no reason to exclude attributes of that object.

| but not when you use "from module import *"?

Because you are already adding m names and adding n more names may cause
problems -- which is what led to the addition of the __all__ mechanism
some years back. Sorry, I forget the details because they don't concern
me.
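
A rough Python 3 sketch of the kind of problem meant here: two modules that happen to export the same name. The modules are built in memory so the example is self-contained; `make_module`, `demo_circle`, `demo_square`, `area` and `_cache` are all hypothetical names made up for this illustration.

```python
import sys
import types


def make_module(name, **names):
    """Build an in-memory module holding the given names."""
    module = types.ModuleType(name)
    module.__dict__.update(names)
    sys.modules[name] = module
    return module


make_module("demo_circle", area=lambda r: 3 * r * r, _cache={})
make_module("demo_square", area=lambda s: s * s, _cache={})

# One name per module: nothing can collide, and every attribute,
# underscored or not, stays reachable through that one name.
import demo_circle
import demo_square

# Many names at once: the second star import silently rebinds `area`.
namespace = {}
exec("from demo_circle import *\nfrom demo_square import *", namespace)
clobbered = namespace["area"] is demo_square.area
```

After the two star imports, `area` refers to the square version and the circle version is gone from that namespace without any warning. Skipping underscored names by default at least keeps helpers such as `_cache` out of the collision.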

| > That last form should not be used normally, except when playing with
| > the interactive interpreter.
|
| OK, I have a confession to make. I use "from module import *" almost
| exclusively. But then, I'm different from most folks. I can drink six
| beers and still drive safely, for example. The key is to drive faster
| so I can get past dangerous situations quicker.
 

Antoon Pardon

| From whom are you trying to hide your attributes?
|
| In Python, the philosophy "we're all consenting adults here" applies.
| You shouldn't pretend to know, at the time you write it, all the uses
| to which your code will be put. Barriers such as enforced "private"
| attributes will only cause resentment when people, despite your
| anticipations, *need* to access them and are then forced to hack their
| way around them.

I don't find this argument very compelling.

You can't anticipate all the functionality people would like your function
to have. Access to information in a (private) attribute is just one of
those possible functionalities. People will resent you if you don't
provide functionality they think fits logically in your package.

| If you want the users of your code to know that an attribute should
| not be used as a public API for the code, use the convention of naming
| the attribute with a single leading underscore. This is a strong
| signal that the attribute is part of the implementation, not the
| interface. The reader is then on notice that they should not rely on
| that attribute; but they are not *prohibited* from using it if
| necessary to their ends.

But they will resent you just as much if you decide to rewrite
your module in such a way that the attribute is no longer present
or is now used in a slightly different way, so that it breaks their code.
 

Antoon Pardon

| Hi,
|
| first, python is one of my fav languages, and i'll definitely keep
| developing with it. But, there's one thing that I -really- miss:
| data hiding. I know member vars are private when you prefix them with
| 2 underscores, but I hate prefixing my vars, I'd rather add a keyword
| before it.
|
| Python advertises itself as a full OOP language, but why does it miss
| one of the basic principles of OOP? Will it ever be added to python?
|
| Thanks in advance,
| Lucas

If you really need it, you can do data hiding in python. It just
requires a bit more work.

----------------------------- Hide.py ---------------------------------
class Rec(object):
    def __init__(__, **kwargs):
        for key,value in kwargs.items():
            setattr(__, key, value)

    def __getitem__(self, key):
        return getattr(self, key)

    def __setitem__ (self, key, val):
        setattr(self, key, val)

class Foo(object):

    def __init__(self):

        hidden = Rec(x=0, y=0)

        def SetX(val):
            hidden.x = val

        def SetY(val):
            hidden.y = val

        def GetX():
            return hidden.x

        def GetY():
            return hidden.y

        self.SetX = SetX
        self.SetY = SetY
        self.GetX = GetX
        self.GetY = GetY

--------------------------------------------------------------------------
$ python
Python 2.5.2 (r252:60911, Apr 17 2008, 13:15:05)
[GCC 4.2.3 (Debian 4.2.3-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from Hide import Foo
>>> var = Foo()
>>> var.GetX()
0
[...]
5
>>> var.x
Traceback (most recent call last):
[...]
>>> var.hidden.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Foo' object has no attribute 'hidden'
 

Carl Banks

| If you really need it, you can do data hiding in python. It just
| requires a bit more work.
|
| ----------------------------- Hide.py ---------------------------------
| class Rec(object):
|     def __init__(__, **kwargs):
|         for key,value in kwargs.items():
|             setattr(__, key, value)
|
|     def __getitem__(self, key):
|         return getattr(self, key)
|
|     def __setitem__ (self, key, val):
|         setattr(self, key, val)
|
| class Foo(object):
|
|     def __init__(self):
|
|         hidden = Rec(x=0, y=0)
|
|         def SetX(val):
|             hidden.x = val
|
|         def SetY(val):
|             hidden.y = val
|
|         def GetX():
|             return hidden.x
|
|         def GetY():
|             return hidden.y
|
|         self.SetX = SetX
|         self.SetY = SetY
|         self.GetX = GetX
|         self.GetY = GetY

Red Herring.

1. This doesn't hide the variables; it just changes their spelling.
2. This also "hides" the variables from its own class.

In other words, it's a useless no-op.

In fact, I'd say this is even worse than useless. Creating accessor
functions is a sort of blessing for external use. Knowing that there
are accessor functions is likely to cause a user to show even less
restraint.


Carl Banks
 

Antoon Pardon

| Antoon Pardon said:
| | If you really need it, you can do data hiding in python. It just
| | requires a bit more work.
| | ---
| | $ python
| | Python 2.5.2 (r252:60911, Apr 17 2008, 13:15:05)
| | [GCC 4.2.3 (Debian 4.2.3-3)] on linux2
| | Type "help", "copyright", "credits" or "license" for more information.
| | >>> from Hide import Foo
| | >>> var = Foo()
| | >>> var.GetX()
| | 0
| | [...]
| | 5
| | >>> var.x
| | Traceback (most recent call last):
| |   File "<stdin>", line 1, in <module>
| | [...]
| | >>> var.hidden.x
| | Traceback (most recent call last):
| |   File "<stdin>", line 1, in <module>
| | AttributeError: 'Foo' object has no attribute 'hidden'
|
| That sort of hiding isn't any more secure than the 'hiding' you get in C++.

So?

| >>> var.GetX.func_closure[0].cell_contents.x
| 5
|
| All you've done is force the user who wants to bypass it to use a longer
| expression, and if that's your criterion for 'hiding' then just use two
| leading underscores.

That you can find a lock pick to get at an object doesn't contradict
that the object is locked away.

I think the intention of not having these variables accessible to
the application programmer is expressed much more strongly than with
two leading underscores.

Even if the current implementation of the language makes it
relatively easy to get at the information if you really want
to.
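
For readers on Python 3, where `func_closure` is spelled `__closure__`, the closure trick and the "lock pick" discussed above might look roughly like this. It is a cut-down sketch, not the Hide.py listing itself: a plain dict stands in for `Rec`, and the lower-case names `set_x` and `get_x` are hypothetical.

```python
class Foo:
    def __init__(self):
        hidden = {"x": 0}          # lives only in this call's closure

        def set_x(val):
            hidden["x"] = val

        def get_x():
            return hidden["x"]

        # Only the accessors are stored on the instance.
        self.set_x = set_x
        self.get_x = get_x


var = Foo()
var.set_x(5)
has_attr = hasattr(var, "hidden")

# The lock pick: get_x closes over exactly one variable, `hidden`.
picked = var.get_x.__closure__[0].cell_contents["x"]
```

In this sketch `has_attr` is `False`, yet `picked` is `5`, so the data is out of the instance namespace but not out of reach.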
 

Antoon Pardon

| Red Herring.
|
| 1. This doesn't hide the variables; it just changes their spelling.
| 2. This also "hides" the variables from its own class.
|
| In other words, it's a useless no-op.
|
| In fact, I'd say this is even worse than useless. Creating accessor
| functions is a sort of blessing for external use. Knowing that there
| are accessor functions is likely to cause a user to show even less
| restraint.

I think you completely missed the point.

This is just a proof of concept thing. In a real example there would
of course be no Set and Get methods, just methods that in the course
of their execution would access or update the hidden attributes.
 

Carl Banks

| I think you completely missed the point.

I'm not sure I missed the point so much as I failed to read your mind.

| This is just a proof of concept thing. In a real example there would
| of course be no Set and Get methods, just methods that in the course
| of their execution would access or update the hidden attributes.

Fair enough, but I don't see anything in your example that suggests a
way to discriminate between access from within the class and access
from outside the class, which is the crucial aspect of data hiding.


Carl Banks
 

Carl Banks

| Fair enough, but I don't see anything in your example that suggests a
| way to discriminate between access from within the class and access
| from outside the class, which is the crucial aspect of data hiding.


And, if you want an example of something that does that, how about
this metaclass. It creates a class that checks the stack frame to see
if the caller was defined in the same class.

Issues:
- Classes are prevented from defining their own __setattr__ and
  __getattribute__.
- Classes and subclasses should not use the same names for their private
  variables.
- Private attribute access is pretty slow, but that's obvious.
- Pretty easy to thwart.


#----------------------------------
import sys
import itertools

class PrivateAccessError(Exception):
    pass

class PrivateDataMetaclass(type):
    def __new__(metacls,name,bases,dct):

        function = type(lambda x:x)

        privates = set(dct.get('__private__',()))

        codes = set()
        for val in dct.itervalues():
            if isinstance(val,function):
                codes.add(val.func_code)

        getframe = sys._getframe
        count = itertools.count

        def __getattribute__(self,attr):
            if attr in privates:
                for i in count(1):
                    code = getframe(i).f_code
                    if code in codes:
                        break
                    if code.co_name != '__getattribute__':
                        raise PrivateAccessError(
                            "attribute '%s' is private" % attr)
            return super(cls,self).__getattribute__(attr)

        def __setattr__(self,attr,val):
            if attr in privates:
                for i in count(1):
                    code = getframe(i).f_code
                    if code in codes:
                        break
                    if code.co_name != '__setattr__':
                        raise PrivateAccessError(
                            "attribute '%s' is private" % attr)
            return super(cls,self).__setattr__(attr,val)

        dct['__getattribute__'] = __getattribute__
        dct['__setattr__'] = __setattr__

        cls = type.__new__(metacls,name,bases,dct)

        return cls

#----------------------------------
import traceback

class A(object):
    __metaclass__ = PrivateDataMetaclass
    __private__ = ['internal']

    def __init__(self,n):
        self.internal = n

    def inc(self):
        self.internal += 1

    def res(self):
        return self.internal


class B(A):
    __private__ = ['internal2']

    def __init__(self,n,m):
        super(B,self).__init__(n)
        self.internal2 = m

    def inc(self):
        super(B,self).inc()
        self.internal2 += 2

    def res(self):
        return self.internal2 + super(B,self).res()

    def bad(self):
        return self.internal2 + self.internal


a = A(1)
a.inc()

print "Should print 2:"
print a.res()
print

print "Should raise PrivateAccessError:"
try:
    print a.internal
except PrivateAccessError:
    traceback.print_exc()
print

b = B(1,1)
b.inc()

print "Should print 5:"
print b.res()
print

print "Should raise PrivateAccessError:"
try:
    print b.internal2
except PrivateAccessError:
    traceback.print_exc()
print

print "Should raise PrivateAccessError:"
try:
    print b.bad()
except PrivateAccessError:
    traceback.print_exc()
print
#----------------------------------
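
The listing above is Python 2 (print statements, `__metaclass__`, `func_code`, `itervalues`). A hedged sketch of how the same frame-checking idea might be spelled in Python 3 follows. The `check` helper and the `is_blocked` function are hypothetical additions for brevity, `B.inc` is left out, and every issue listed above still applies.

```python
import itertools
import sys
import types


class PrivateAccessError(Exception):
    pass


class PrivateDataMetaclass(type):
    def __new__(metacls, name, bases, dct):
        privates = set(dct.get("__private__", ()))
        # Code objects of the functions written in this class body.
        codes = {val.__code__ for val in dct.values()
                 if isinstance(val, types.FunctionType)}

        def check(attr, hook_name):
            # Frame 0 is check, frame 1 is this class's hook; walk outward.
            for depth in itertools.count(2):
                code = sys._getframe(depth).f_code
                if code in codes:
                    return                    # caller was defined in this class
                if code.co_name != hook_name:  # skip hooks of subclasses
                    raise PrivateAccessError(
                        "attribute '%s' is private" % attr)

        def __getattribute__(self, attr):
            if attr in privates:
                check(attr, "__getattribute__")
            return super(cls, self).__getattribute__(attr)

        def __setattr__(self, attr, val):
            if attr in privates:
                check(attr, "__setattr__")
            return super(cls, self).__setattr__(attr, val)

        dct["__getattribute__"] = __getattribute__
        dct["__setattr__"] = __setattr__
        cls = type.__new__(metacls, name, bases, dct)
        return cls


class A(metaclass=PrivateDataMetaclass):
    __private__ = ["internal"]

    def __init__(self, n):
        self.internal = n

    def inc(self):
        self.internal += 1

    def res(self):
        return self.internal


class B(A):
    __private__ = ["internal2"]

    def __init__(self, n, m):
        super(B, self).__init__(n)
        self.internal2 = m

    def res(self):
        return self.internal2 + super(B, self).res()

    def bad(self):
        return self.internal2 + self.internal   # reaches into A's private data


def is_blocked(thunk):
    """Return True if calling thunk raises PrivateAccessError."""
    try:
        thunk()
    except PrivateAccessError:
        return True
    return False


a = A(1)
a.inc()
b = B(1, 1)
```

With these definitions `a.res()` and `b.res()` work, while `a.internal`, `b.internal2` and `b.bad()` raise `PrivateAccessError` when reached from outside the defining class. As for "pretty easy to thwart": `object.__getattribute__(a, "internal")` bypasses the hook entirely.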



Carl Banks
 
