Why does Python not have a mechanism for data hiding?

Antoon Pardon

As I said they are public themselves for someone.

Isn't that contradictory: "Public for someone" I always
thought "public" meant accessible to virtually anyone.
Not to only someone.
 
NickC

I saw this "don't need it" pattern in discussions about the ternary
"if..else" expression and about "except/finally on the same block
level".
Now Python has both.

if/else was added solely because people kept coming up with ways of
embedding a pseudo conditional inside expressions and writing buggy
code in the process. All it really saves you in practice is a bit of
vertical whitespace, so, no, you still don't need it - but if you
insist on doing it, at least there's now an easy way to do it
correctly.

except/finally on the same block level was trivial to implement once
the reference interpreter switched to an AST based compiler for 2.5.
If you look at the AST, you'll find that it still only has TryExcept
and TryFinally, so again, you still don't need except/finally on the
same block level - all the syntax allows you to do is omit the second
try: line and its associated indentation.
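As a rough illustration of that equivalence (the function names below are made up for the example, and this only shows behaviour, not what the compiler emits), the combined form and the nested form do the same thing:

```python
def combined(action, log):
    # except and finally on the same try block (allowed since Python 2.5)
    try:
        return action()
    except ValueError:
        log.append("handled")
        return None
    finally:
        log.append("cleanup")


def nested(action, log):
    # The older spelling: a try/except wrapped inside a try/finally
    try:
        try:
            return action()
        except ValueError:
            log.append("handled")
            return None
    finally:
        log.append("cleanup")


def fail():
    raise ValueError("boom")


log_a, log_b = [], []
combined(fail, log_a)
nested(fail, log_b)
```

Both logs end up holding the same two entries in the same order; the combined form just drops one level of indentation.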

Actually it is very useful to be able to distinguish between inside
and outside. This is obvious for real world things e.g. your TV.
Nobody likes to open the rear cover to switch the channel. Similar
arguments apply to software objects. "data hiding" is a harsh name, I
would call it "telling what matters". The need for this becomes
indispensable in really big software packages like the Eclipse
framework with approx. 100000 classes. If you cannot tell the
difference between inside and outside you are lost.


Please don't sell a missing feature as a philosophy. Say you don't
need/want it. But don't call it philosophy.

Gosh, and here I thought treating programmers as non-idiots was
actually one of the guiding philosophies in the discussion on
python-dev. Good thing we have you here to tell us we're only
imagining that.

It's *your* *decision* which uses will be available. Your explanation
appears to me as a fear to decide.

Are you writing application code or library code? For application
code, you have a much greater idea of the uses for your code, so you
can be confident in your decision as to what should and should not be
visible. For library code, however, it's fairly common for a library
to provide something which is almost, but not quite, what the user
needs. Letting users poke around at their own risk is a nice courtesy
that can save them a lot of work in the long run.

So the decision to hide something is still made (by using an
underscore prefix), but an easy mechanism is provided for the library
user to override that decision.
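A minimal sketch of that convention, using a hypothetical Parser class invented for this example:

```python
class Parser:
    """Hypothetical library class, used only for illustration."""

    def __init__(self, text):
        self.text = text
        self._tokens = text.split()   # leading underscore: internal detail

    def count(self):
        return len(self._tokens)


p = Parser("spam and eggs")
public_result = p.count()

# Nothing stops a user from reaching in; the underscore only signals
# that this attribute may change without notice.
peeked = p._tokens[0]
```

The library author has still marked _tokens as internal, but a user who needs it can get at it at their own risk.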

Littering your class definition with dozens of underscores is exactly
the line noise we love to criticize in Perl.

Using underscores in names (leading or otherwise) separated by
plaintext keywords is a far cry from multiple different symbols that
mean different things in different contexts and can be chained
together fairly arbitrarily.

Nearly every introduction to OOP? Please don't tell me that
encapsulation does not mean "enforced restriction". If the language
has no syntactic support for encapsulation then it does not have
encapsulation.

Module globals aren't visible outside the module without importing it.
Class attributes aren't visible outside the class without dereferencing
it.
Instance attributes aren't visible outside an instance without
dereferencing one.

*That* is the encapsulation/data hiding which OOP requires, and is the
kind which Python enforces. What you're asking for is encapsulation of
class and instance attributes based on the context in which the
dereferencing occurs (inside the class, inside a subclass of that
class, inside an instance of that class, inside an instance of a
subclass of that class, somewhere else entirely), and that has nothing
to do with the basics of OOP.

On the other hand, if you're so keen on this feature, perhaps you'd
like to make a concrete proposal regarding how you would like the
semantics to work in light of Python's dynamic typing model. What will
it do when a method is invoked via the class dict rather than via
attribute retrieval? Can unbound methods access protected or private
attributes? How about descriptor get, set and delete methods? What
happens when a function is added to a class definition after creation
as a new method?
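To make a couple of those questions concrete, here is a small sketch (the Account class and peek function are hypothetical) of the dynamic cases any access-control scheme would have to rule on:

```python
class Account:
    """Hypothetical class for illustration."""

    def __init__(self):
        self._balance = 100


# A plain function defined outside the class...
def peek(self):
    return self._balance


# ...attached as a method after the class was created.
Account.peek = peek

acct = Account()
via_attribute = acct.peek()

# The same function fetched from the class dict and called directly,
# bypassing normal attribute lookup on the instance.
via_class_dict = Account.__dict__["peek"](acct)
```

Both calls work today. A "private" rule would have to decide whether peek counts as being inside the class in either case.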

Cheers,
Nick.
 
NickC

I am also bothered a bit by the seeming inconsistency of the rules for
the single underscore. When used at file scope, they make the variable
or function invisible outside the module, but when used at class
scope, the "underscored" variables or functions are still fully
visible. For those who claim that the client should be left to decide
what to use, why is the client prohibited from using underscored
variables at file scope?

They aren't - the only thing that won't see the underscore prefixed
names is "from x import *". If you do "import x" instead, all the
underscored names will be accessible as attributes of the module.
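A quick way to see the difference, sketched with a throwaway in-memory module (the name demo_mod is made up for this example, and the module defines no __all__):

```python
import sys
import types

# Build a throwaway module in memory instead of writing a file.
demo_mod = types.ModuleType("demo_mod")
demo_mod.public_name = 1
demo_mod._private_name = 2
sys.modules["demo_mod"] = demo_mod

# "from demo_mod import *" skips the underscore-prefixed name.
star_namespace = {}
exec("from demo_mod import *", star_namespace)

# A plain import exposes everything as attributes of the module.
import demo_mod as imported

private_value = imported._private_name
```

After this runs, star_namespace contains public_name but not _private_name, while the plain import can read both.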
 
Paul Rubin

NickC said:
if/else was added solely because people kept coming up with ways of
embedding a pseudo conditional inside expressions and writing buggy
code in the process. All it really saves you in practice is a bit of
vertical whitespace, so, no, you still don't need it - but if you
insist on doing it, at least there's now an easy way to do it
correctly.

Come on, it's more than vertical whitespace, it's extraneous variables
and sometimes even extraneous functions and function call overhead.
And Python is supposed to be unbureaucratic. People kept looking for
ways to write conditional expressions instead of spewing the logic
across multiple statements for a reason: the code is often cleaner
that way.
 
NickC

What is it about leading underscores that bothers me? To me, they are
like a small pebble in your shoe while you are on a hike. Yes, you can
live with it, and it does no harm, but you still want to get rid of it.

With leading underscores, you can see *at the point of dereference*
that the code is accessing private data. With a "this is private"
keyword you have no idea whether you're accessing private or public
data, because the two namespaces get conflated together.

I'll keep my pebble, thanks.

Cheers,
Nick.
 
NickC

True. It's extremely suited to what we do though. Minor difficulties
like this are vastly outweighed by advantages. The difficulties are
real though.

It's interesting to take a look at some of the work Brett Cannon has
done trying to come up with a sandbox for executing Python code that
actually manages to block access to dangerous functions like file() or
urllib.urlopen(). Powerful introspection capabilities and restricted
access to methods and attributes don't really play well together.

http://svn.python.org/view/python/branches/bcannon-objcap/securing_python.txt?rev=55685&view=markup

(I believe that work is on hiatus while he's been busy with other
projects, such as a more flexible Python-based reimplementation of the
import mechanism that would make it possible to implement the
security restrictions needed to permit limited imports in a sandboxed
interpreter)
We need to *use* those names to display the spreadsheet once the
calculation has finished (and their code has run).


Splitting more of the functionality out is probably part of the best
solution.

Yeah, at this point your only hope is going to be making them go
through such wild contortions to get at the internal data they think
better of it. Actually blocking all access to something written in
Python is fairly tough (you generally need an extension class written
in non-Python code that hides access to certain attributes).

Cheers,
Nick.
 
NickC

Guido has been known to change his mind, which is an admirable quality,
but it does show that at some point he rejected a good idea or accepted
a bad idea.

And sometimes the person that talked him into accepting the bad idea
in the first place ends up agreeing with him when he eventually
rejects it ;)

Cheers,
Nick.

P.S. Read the list of references in PEP 343 if you want to know what
I'm talking about *cough*
 
NickC

Come on, it's more than vertical whitespace, it's extraneous variables
and sometimes even extraneous functions and function call overhead.
And Python is supposed to be unbureaucratic. People kept looking for
ways to write conditional expressions instead of spewing the logic
across multiple statements for a reason: the code is often cleaner
that way.

True, but it really was the multitude of buggy workarounds for the
lack of a ternary expression that sealed the deal, rather than the
benefits of ternary expressions in their own right :)

Given that I personally use ternary expressions solely as the right
hand side of an assignment statement, the reduction in vertical
whitespace usage really is the only thing they gain me. I guess if you
embedded them as an argument to a function call or other more
complicated expression then there may be additional savings. I prefer
not to do that though, since such things can get quite difficult to
parse mentally when reading them later.

Cheers,
Nick.
 
Antoon Pardon

if/else was added solely because people kept coming up with ways of
embedding a pseudo conditional inside expressions and writing buggy
code in the process. All it really saves you in practice is a bit of
vertical whitespace, so, no, you still don't need it - but if you
insist on doing it, at least there's now an easy way to do it
correctly.

If I remember correctly it was added because one of the python
developers was bitten by a bug in the standard library code
that was caused by the use of the and-or emulation, mentioned
in the FAQ.
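For anyone who hasn't run into it, a minimal sketch of how the and-or emulation goes wrong (the helper names are invented for this example):

```python
def pick_and_or(cond, a, b):
    # Pre-2.5 idiom: breaks when `a` is falsy (0, "", [], None, ...)
    return cond and a or b


def pick_ternary(cond, a, b):
    # Conditional expression, available since Python 2.5
    return a if cond else b


buggy = pick_and_or(True, 0, 99)
correct = pick_ternary(True, 0, 99)
```

With a true condition and a falsy first value, the and-or form falls through to the second value, whereas the conditional expression returns the first one as intended.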

And although one indeed doesn't need this, there are a lot
of things in Python one doesn't need. Python could be limited
to single operator expressions. You don't need:

x = a * b + c

You can write it just like this:

x = a * b
x = x + c


And if you want a list comprehension like the following:

ls = [ x * x + 4 for x in xrange(10)]

You can of course write it as follows:

def sqrplus4(a):
    rs = a * a
    return rs + 4

ls = [sqrplus4(x) for x in xrange(10)]


Now of course no one would defend such a limitation on the grounds
that one doesn't need the general case and that the general case
will only save you some vertical space.

But when it came to the ternary operator that was exactly the
argument used, to defend the lack of it.

Gosh, and here I thought treating programmers as non-idiots was
actually one of the guiding philosophies in the discussion on
python-dev.

I have heard the argument: "Such a feature will be abused too easily"
and similar too many times to find this credible.
 
NickC

Those unit tests should *not*, though, exercise anything but the
public API, otherwise they're breaking encapsulation. Their assertion
should continue to be just as true after a refactoring of the internal
components as before.

Python must have bad unit tests then - the CPython test suite
explicitly tests private methods all the time.

There's actually an extremely good reason for doing it that way: when
the implementation of an internal method gets broken, the unit tests
flag it explicitly, rather than having to derive the breakage from the
breakage of 'higher level' unit tests (after all, you wouldn't factor
something out into its own method or function if you weren't using it
in at least a couple of different places).

Black box testing (testing only the public API) is certainly
important, but grey box and white box testing that either exploits
knowledge of the implementation when crafting interesting test cases,
or explicitly tests internal APIs can be highly beneficial in
localising faults quickly when something does break (and as any
experienced maintenance programmer will tell you, figuring out what
you actually broke is usually harder than fixing it after you find it).
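A small sketch of the two styles side by side, using a hypothetical Slugger class (not taken from the CPython test suite):

```python
import unittest


class Slugger:
    """Hypothetical class: one public method built on a private helper."""

    def slug(self, title):
        return "-".join(self._words(title))

    def _words(self, title):
        # Internal helper: lowercase and split on whitespace
        return title.lower().split()


class SluggerTests(unittest.TestCase):
    def test_public_api(self):
        # Black box: only the public method
        self.assertEqual(Slugger().slug("Hello Big World"), "hello-big-world")

    def test_private_helper(self):
        # White box: if _words breaks, this test points straight at it
        self.assertEqual(Slugger()._words("Hello  World"), ["hello", "world"])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(SluggerTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If _words were broken, both tests would fail, but the second one names the culprit directly.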
 
Antoon Pardon

With leading underscores, you can see *at the point of dereference*
that the code is accessing private data.

But the leading underscore doesn't tell you whether it is your own
private data, which you can use as you see fit, or those of someone
else, which you have to be very careful with.
 
cokofreedom

But the leading underscore doesn't tell you whether it is your own
private data, which you can use as you see fit, or those of someone
else, which you have to be very careful with.

Well how is that different from public accessors and mutators of
private variables?
 
Antoon Pardon

Well how is that different from public accessors and mutators of
private variables?

Public accessors and mutators for private variables are a bad idea.
So I don't understand what point you are trying to make by suggesting
that the use of an underscore is just like it in this regard.
 
Roy Smith

Ben Finney said:
By definition, "private" functions are not part of the publicly
documented behaviour of the unit. Any behaviour exhibited by some
private component is seen externally as a behaviour of some public
component.

You know the difference between theory and reality? In theory, there is
none... Sometimes it's useful to test internal components. Imagine this
class:

import random

class ArmegeddonMachine:
    def pushTheButton(self):
        "Destroy a random city"
        city = self._pickCity()
        self._destroy(city)

    def _pickCity(self):
        cities = ['New York', 'Moscow', 'Tokyo', 'Beijing', 'Mumbai']
        thePoorSchmucks = random.choice(cities)
        return 'New York'

    def _destroy(self, city):
        # ICBM is assumed to be defined elsewhere
        missle = ICBM()
        missle.aim(city)
        missle.launch()

The only externally visible interface is pushTheButton(), yet you don't
really want to call that during testing. What you do want to do is test
that a random city really does get picked.

You can do one of two things at this point. You can say, "But, that's not
part of the externally visible interface" and refuse to test it, or you can
figure out a way to test it. Up to you.
 
Russ P.

With leading underscores, you can see *at the point of dereference*
that the code is accessing private data. With a "this is private"
keyword you have no idea whether you're accessing private or public
data, because the two namespaces get conflated together.

That is true. But with the "priv" keyword you'll discover quickly
enough that you are trying to access private data (as soon as you run
the program). And even if a "priv" keyword is added, you are still
free to use the leading underscore convention if you wish.

The idea of being able to discern properties of an object by its name
alone is something that is not normally done in programming in
general. Yes, of course you should choose identifiers to be
descriptive of what they represent in the real world, but you wouldn't
use names like "intCount," "floatWeight," or "MyClassMyObject," would
you? Why not? That would tell you the type of the object at the "point
of dereferencing," wouldn't it?
 
topher

That is true. But with the "priv" keyword you'll discover quickly
enough that you are trying to access private data (as soon as you run
the program). And even if a "priv" keyword is added, you are still
free to use the leading underscore convention if you wish.

The idea of being able to discern properties of an object by its name
alone is something that is not normally done in programming in
general. Yes, of course you should choose identifiers to be
descriptive of what they represent in the real world, but you wouldn't
use names like "intCount," "floatWeight," or "MyClassMyObject," would
you? Why not? That would tell you the type of the object at the "point
of dereferencing," wouldn't it?

Sounds familiar.
http://en.wikipedia.org/wiki/Hungarian_notation
 
Hans Nowak

Then don't document it, or separate internal documentation (which is
never to pass through the wall) and public documentation (which your
users use). Nobody would (apart from your dev team and anyone told by
your dev team, which means you may fire the person for "lack of
discipline") know that there is such a thing and in consequence
wouldn't use it.

Don't tell your user not to use something, just don't tell them that
it exists and they won't use it.

I am not familiar with the actual software, but judging from "we expose the
spreadsheet object model to our users", I assume that users can discover the
undocumented attributes, using Python's introspection features, like dir(obj),
obj.__dict__, the inspect module, etc. So in this case, not telling them that
the attributes exist, will not stop them from finding out.
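A short sketch of what that discovery looks like, with a hypothetical Sheet class standing in for the real object model (which I have not seen):

```python
import inspect


class Sheet:
    """Hypothetical stand-in for an exposed object model."""

    def __init__(self):
        self.cells = {}
        self._recalc_queue = []   # never mentioned in any documentation

    def _flush(self):
        pass


sheet = Sheet()

# Instance attributes, documented or not
instance_names = sorted(vars(sheet))

# Underscore-prefixed names (excluding dunders) found through dir()
hidden = [n for n in dir(sheet) if n.startswith("_") and not n.startswith("__")]

# The inspect module lists bound methods as well
methods = [name for name, _ in inspect.getmembers(sheet, inspect.ismethod)]
```

Here vars() turns up _recalc_queue, and both dir() and inspect.getmembers() turn up _flush, without any documentation being involved.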
 
Marc 'BlackJack' Rintsch

Isn't that contradictory: "Public for someone" I always
thought "public" meant accessible to virtually anyone.
Not to only someone.

For the programmer who writes or uses the private API it isn't really
"private", he must document it or know how it works. And he should IMHO
write tests for it and expect "private" functions written by others to be
tested.

Ciao,
Marc 'BlackJack' Rintsch
 
Ethan Furman

Ben said:
Then what you're really testing is the interactions of the "push the
button" function with its external interface: you're asserting that
the "push the red button" function actually uses the result from "pick
a random city" as its target.

Thus, the "pick a random city" function is being defined by you as
*interface* for the "push the button" function. Interfaces do need to
be unit tested.

This is done by having the unit test substitute a test double for the
"pick a random city" function, rigging that double so that its
behaviour is deterministic, and asserting that the "push the button"
function uses that deterministically-generated result.
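One way to sketch the test-double approach just described, using unittest.mock from the standard library (the Machine class and its method names are hypothetical stand-ins, not the code from the earlier post):

```python
from unittest import mock


class Machine:
    """Hypothetical stand-in for the class under discussion."""

    def push_the_button(self):
        city = self._pick_city()
        self._destroy(city)

    def _pick_city(self):
        raise NotImplementedError("real randomness is replaced in the test")

    def _destroy(self, city):
        raise NotImplementedError("never run for real in a test")


# Replace both private methods with doubles for the duration of the test.
with mock.patch.object(Machine, "_pick_city", return_value="Testville") as pick, \
     mock.patch.object(Machine, "_destroy") as destroy:
    Machine().push_the_button()
    pick_calls = pick.call_count
    destroy_args = destroy.call_args
```

This only checks the wiring: the button asks for a city once and passes that city on to be destroyed. It says nothing about how the real city picker behaves.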

It's at this point, of course, that the "pick a random city" function
has come rather close to being public API. The designer needs to have
a fairly good reason not to simply expose the "pick a random city"
function in the API.




Note that the only thing I'm saying one shouldn't do is unit test the
private function *directly*, since the design decision has been made
that it's not part of the API. The *behaviour* of the function, as
exposed via the "push the button" public API, should certainly be unit
tested.

Any behaviour of that function that's *not* exhibited through the
behaviour of some public API should *not* be unit tested, and should
in fact be removed during refactoring -- which will not break the unit
test suite since no unit tests depend on it.

Alternatively, as above, the design decision can be made that, in
fact, this function *is* part of the public API since external things
are depending on it directly. Then it needs full direct unit test
coverage.

I must be missing something in this discussion. Perhaps it's the
appropriate point of view. At any rate, it seems to me that any and
every function should be tested to ensure proper results. It's my
understanding that unit testing (a.k.a. PyUnit) is designed for just
such a purpose.

So is this argument simply over *who* should be (unit) testing the
internals? I.e. The fellow that wrote the code library vs. the other
fellow that wants to use the library? Or is it actually, as it seems,
over the internals being tested at all?
 
Roy Smith

Ben Finney said:
Then what you're really testing is the interactions of the "push the
button" function with its external interface: you're asserting that
the "push the red button" function actually uses the result from "pick
a random city" as its target.

No, that's not what I'm testing at all. I want to test that the cities
really do get picked randomly. Notice the implementation I gave:

def _pickCity(self):
    cities = ['New York', 'Moscow', 'Tokyo', 'Beijing', 'Mumbai']
    thePoorSchmucks = random.choice(cities)
    return 'New York'

There's a deliberate bug in there, i.e. it always returns 'New York', which
(as a resident of that city), I would find distressing in such an
application. If you plugged in some other function for _pickCity(), you'd
never discover that bug until it was too late.
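A crude sketch of the kind of direct test that would catch it (standalone functions with made-up names here, rather than methods, to keep the example short; this is a sanity check, not a real statistical test of randomness):

```python
import random

CITIES = ['New York', 'Moscow', 'Tokyo', 'Beijing', 'Mumbai']


def pick_city_buggy():
    # Same deliberate bug as above: the random choice is thrown away
    the_poor_schmucks = random.choice(CITIES)
    return 'New York'


def pick_city_fixed():
    return random.choice(CITIES)


def distinct_results(pick, trials=200):
    # How many different cities show up over many calls?
    return len({pick() for _ in range(trials)})


buggy_count = distinct_results(pick_city_buggy)
fixed_count = distinct_results(pick_city_fixed)
```

The buggy version only ever produces one city, so asserting that more than one distinct result appears would flag it immediately.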
 
