Why less emphasis on private data?


Paul Boddie

Paul said:
Right, the problem is if those methods start changing the "private"
variable. I should have been more explicit about that.

class A:
    def __init__(self):
        self.__x = 3
    def foo(self):
        return self.__x

class B(A): pass

class A(B):
    def bar(self):
        self.__x = 5 # clobbers private variable of earlier class named A

Has this ever been reported as a bug in Python? I could imagine more
sophisticated "name mangling": something to do with the identity of the
class might be sufficient, although that would make the tolerated
"subversive" access to private attributes rather difficult.

Paul
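As a rough, runnable sketch of the collision described above, here is the same hierarchy written for Python 3. The variable names `obj`, `before`, and `after` are additions for illustration only; the point is that both classes named `A` compile `self.__x` to the same mangled name, `_A__x`.

```python
class A:
    def __init__(self):
        self.__x = 3          # stored on the instance as _A__x

    def foo(self):
        return self.__x       # reads _A__x


class B(A):
    pass


class A(B):                   # a new class that happens to reuse the name "A"
    def bar(self):
        self.__x = 5          # also compiled to _A__x, so it overwrites the value above


obj = A()                     # instance of the second A
before = obj.foo()            # value set by the first A's __init__
obj.bar()
after = obj.foo()             # the first A's "private" value has been replaced

print(before, after, vars(obj))
```

The instance ends up with a single `_A__x` entry, which is why the two classes cannot keep their "private" data apart.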
 

Paul Rubin

Paul Boddie said:
Has this ever been reported as a bug in Python? I could imagine more
sophisticated "name mangling": something to do with the identity of the
class might be sufficient, although that would make the tolerated
"subversive" access to private attributes rather difficult.

If you mean the object id, I don't think you can use it for name
mangling, since the mangled names have to survive code marshalling
and you may end up with different object id's.

I've just never encountered any legitimate use for the "subversive"
access and if it's really necessary, it's better to do it through some
kind of well-designed reflection interface in the class, rather than
with a crock like name mangling.
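A small sketch, assuming Python 3, of why the mangled name has to be derivable from the source text alone: the compiler bakes it into the code object, and that code object can be marshalled and reloaded without any class object existing. The names `mangled_names` and `restored` are illustrative only.

```python
import marshal


class A:
    def foo(self):
        return self.__x


# The mangled attribute name is already part of the compiled code object.
mangled_names = A.foo.__code__.co_names

# Round-trip the code object through marshal, as happens when bytecode is cached.
restored = marshal.loads(marshal.dumps(A.foo.__code__))

print(mangled_names, restored.co_names)
```

Because the name is fixed at compile time from the class name, it survives the round trip unchanged; a scheme based on a run-time object id would have nothing stable to store here.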
 

Duncan Booth

Paul Boddie said:
Has this ever been reported as a bug in Python? I could imagine more
sophisticated "name mangling": something to do with the identity of the
class might be sufficient, although that would make the tolerated
"subversive" access to private attributes rather difficult.

Paul

If it worries you then you can always check for it and disallow any
hierarchies where it could be a problem. For that matter PyChecker ought to
be able to catch this situation (maybe it already does, I haven't looked).

class NoRepeatedNames(type):  # metaclass name is a guess; the original class line was lost
    def __new__(cls, name, bases, dct):
        print "new",name
        c = type.__new__(cls, name, bases, dct)
        assert not name in [b.__name__ for b in c.__mro__[1:]]
        return c


new A
new B
new A

Traceback (most recent call last):
  File "<pyshell#24>", line 1, in <module>
    class A(B): pass
  File "<pyshell#17>", line 5, in __new__
    assert not name in [b.__name__ for b in c.__mro__[1:]]
AssertionError
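For readers on Python 3, the same check might be sketched as follows. The metaclass name `UniqueNameMeta` is hypothetical, and raising `TypeError` instead of using `assert` is a substitution made here so the check still runs under `python -O`; neither comes from the post above.

```python
class UniqueNameMeta(type):
    """Hypothetical metaclass: reject a class whose name repeats an ancestor's."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Every class after the first entry in the MRO is an ancestor.
        ancestor_names = [base.__name__ for base in cls.__mro__[1:]]
        if name in ancestor_names:
            raise TypeError(
                f"class name {name!r} repeats the name of an ancestor"
            )
        return cls


class A(metaclass=UniqueNameMeta):
    pass


class B(A):
    pass


try:
    class A(B):   # same name as the grandparent, so mangled names would collide
        pass
except TypeError as exc:
    rejected = str(exc)
else:
    rejected = None

print(rejected)
```

The metaclass is inherited, so only the root of a hierarchy needs to opt in.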
 

Steven D'Aprano

I have no idea how often if ever.

You've established that there's a name conflict when you do so, which
leads to bugs. So how often do you get bitten by that particular type of
bug?

I inherit from library classes all
the time, without trying to examine what superclasses they use. If my
subclass happens to have the same name as a superclass of some library
class (say Tkinter) this could happen. Whether it ever DOES happen, I
don't know, I could only find out by examining the implementation
details of every library class I ever use, and I could only prevent it
by remembering those details.

class MySubClass(SomeSuperclass):
    try:
        __my_private_attribute
    except AttributeError:
        __my_private_attribute = some_value
    else:
        raise ValueError("Name conflict with private attribute!")

Problem solved.

*wink*

That is an abstraction leak and is
dangerous and unnecessary. The name mangling scheme is a crock. How
often does anyone ever have a good reason for using it,

Exactly. I never use it.

The truth of the matter is, MyClass.__private is not private at all. It is
still a public attribute with a slightly unexpected name. In other words,
if you want to code defensively, you should simply assume that Python has
no private attributes, and code accordingly.

Problem solved.
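To make the "public attribute with a slightly unexpected name" point concrete, here is a minimal Python 3 sketch. `MyClass` and `__private` follow the names used in the post; `short_name_visible` and `value` are illustrative additions.

```python
class MyClass:
    def __init__(self):
        self.__private = "secret"   # stored as _MyClass__private


obj = MyClass()

# Outside the class body the name is not mangled, so the short form is missing...
try:
    obj.__private
except AttributeError:
    short_name_visible = False
else:
    short_name_visible = True

# ...but the mangled form is an ordinary attribute that anyone can read or write.
value = obj._MyClass__private

print(short_name_visible, value)
```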
 

Hendrik van Rooyen

Paul Rubin said:
If you want to write bug-free code, pessimism is the name of the game.

A healthy touch of paranoia does not come amiss either...

And even then things foul up in strange ways because your head
is never quite literal enough.

When you hear a programmer use the word "probability" -
then it's time to fire him, as in programming even the lowest
probability is a certainty when you are doing millions of
things a second.

But this is off topic, really - I don't think that hiding things makes
much difference, especially as the Python hiding is not absolute.

- Hendrik
 

Paul Boddie

Steven said:
The truth of the matter is, MyClass.__private is not private at all. It is
still a public attribute with a slightly unexpected name. In other words,
if you want to code defensively, you should simply assume that Python has
no private attributes, and code accordingly.

Problem solved.

Well, it isn't really solved - it's more avoided than anything else.
;-)

Still, if one deconstructs the use of private data in various
programming languages, one can identify the following roles (amongst
others):

1. The prevention of access to data from program sections
not belonging to a particular component.
(The classic "keep out" mechanism.)
2. The enforcement of distinct namespaces within components.
(Making sure that subclass attributes and superclass attributes
can co-exist.)
3. To support stable storage layouts and binary compatibility.

Most Python adherents don't care too much about #1, and Python isn't
driven by the need for #3, mostly due to the way structures (modules,
classes, objects) are accessed by the virtual machine. However, one
thing which does worry some people is #2, and in a way it's the
forgotten but more significant benefit of private data.

Before I became completely aware of the significance of #2, I remember
using various standard library classes which are meant to be subclassed
and built upon, thinking that if I accidentally re-used an attribute
name then the operation of such classes would be likely to fail in
fairly bizarre ways. Of course, a quick browse of the source code for
sgmllib.SGMLParser informed me of the pitfalls, and I'm sure that
various tools could also be informative without the need to load
sgmllib.py into a text editor, but if I had been fully aware of the
benefits of private attributes and could have been sure that such
attributes had been used (again, a tool might have given such
assurances) then I wouldn't have needed to worry.
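A sketch of the worry described above, using a made-up `Parser` base class rather than the real sgmllib.SGMLParser (whose attribute names are not reproduced here). A plain attribute name is shared with every subclass, while a double-underscore name is kept apart by mangling.

```python
class Parser:
    """Hypothetical library class that is meant to be subclassed."""

    def __init__(self):
        self.stack = []        # plain name: shared with every subclass
        self.__depth = 0       # mangled to _Parser__depth

    def open_tag(self, tag):
        self.stack.append(tag)
        self.__depth += 1
        return self.__depth


class MyParser(Parser):
    def __init__(self):
        super().__init__()
        self.stack = "my own data"   # accidentally replaces Parser's list
        self.__depth = 99            # mangled to _MyParser__depth, so no clash


p = MyParser()
try:
    p.open_tag("html")
except AttributeError as exc:
    failure = str(exc)               # the str has no append(): a "bizarre" failure
else:
    failure = None

print(failure, p._Parser__depth, p._MyParser__depth)
```

The two `__depth` values coexist on the instance, whereas the reused `stack` name breaks the superclass method far from where the mistake was made.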

So I suppose that to "code accordingly" in the context of your advice
involves a manual inspection of the source code of superclasses or the
usage of additional tools. Yet I suppose that this isn't necessarily
unusual behaviour when working with large systems.

Paul
 

Paul Rubin

Steven D'Aprano said:
You've established that there's a name conflict when you do so, which
leads to bugs. So how often do you get bitten by that particular type of bug?

I don't know. Likely zero, possibly not. I'm sure I've written many
bugs that have never been detected by me or anyone else. I've
probably written bugs that crashed an application for some user but
they just cursed me out and never bothered to tell me about the crash.
Maybe I've even written bugs that leaked a user's private data without
the user noticing, but discovered by some attacker intercepting the
data who is cackling about the bug while keeping it secret. There's
no way for me to think I'll ever find out.

I'd much prefer to be able to say of any type of bug, "the number is
exactly zero as a known fact, because it's inherent in Python's design
that it's impossible to have that type of bug". Language designs
should aim to let programmers say things like that as often as possible.

class MySubClass(SomeSuperclass): ...
        raise ValueError("Name conflict with private attribute!")
Problem solved.

No good, Python allows creating classes and attributes on the fly.
The superclass could create its private variable after the subclass is created.

The truth of the matter is, MyClass.__private is not private at all. It is
still a public attribute with a slightly unexpected name. In other words,
if you want to code defensively, you should simply assume that Python has
no private attributes, and code accordingly.

Problem solved.

Well, "problem moved", not "problem solved". Now you have the problem
of having to know the names of every attribute any related class might
use when you write your own class. That is why other languages have
private variables and Python has name mangling--to solve a real problem.
Except Python's solution is a leaky kludge.
 

Neil Cerutti

Interesting. I just tried that. mod1.py contains:

class B:
    def foo(self): self.__x = 'mod1'

mod2.py contains:

class B:
    def bar(self): self.__x = 'mod2'

And the test is:

from mod1 import B as B1
from mod2 import B as B2

class A(B1, B2): pass

a = A()
a.foo()
print a._B__x
a.bar()
print a._B__x

Sure enough, mod2 messes up mod1's private variable.

When faced with this situation, is there any way to proceed
besides using composition instead?
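For comparison, the composition route mentioned in the question might look roughly like this in Python 3. To keep the sketch in one file, the two same-named classes are built by hypothetical factory functions (`make_mod1_b`, `make_mod2_b`) standing in for mod1.py and mod2.py; the attribute names `_first` and `_second` are also made up.

```python
def make_mod1_b():
    class B:                         # stands in for mod1.B
        def foo(self):
            self.__x = "mod1"        # mangled to _B__x
    return B


def make_mod2_b():
    class B:                         # stands in for mod2.B
        def bar(self):
            self.__x = "mod2"        # also mangled to _B__x
    return B


B1, B2 = make_mod1_b(), make_mod2_b()


class A:
    """Holds one instance of each B instead of inheriting from both."""

    def __init__(self):
        self._first = B1()
        self._second = B2()

    def foo(self):
        return self._first.foo()     # delegate to the mod1 object

    def bar(self):
        return self._second.bar()    # delegate to the mod2 object


a = A()
a.foo()
a.bar()
print(a._first._B__x, a._second._B__x)
```

Each `_B__x` now lives on its own object, so neither class can overwrite the other's value. The cost is that `A` is no longer an instance of either `B`.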
 

Neil Cerutti

void test(void)
{
    static int i;
}

Do you agree that i is "private" to test?

In C one uses the pointer to opaque struct idiom to hide data.
For example, the standard FILE pointer.
 

Jussi Salmela

Neil Cerutti wrote:
In C one uses the pointer to opaque struct idiom to hide data.
For example, the standard FILE pointer.

To sturlamolden: I don't know how you define private, but if one defines
in C an external static variable i.e. a variable outside any functions,
on the file level, the scope of the variable is that file only.

To hg: One does not need in C the static keyword to make a variable
defined inside a function i.e. a so called 'automatic variable' private
to that test. Automatic variables are private to their function by
definition. The static keyword makes the variable permanent i.e. it
keeps its value between calls but it is of course private also.

To Neil Cerutti: If a programmer in C has got a pointer to some piece of
memory, that piece is at the mercy of the programmer. There's no data
hiding at all in this case.

To whom it may concern: please stop comparing C and Python with regard
to privacy and safety. They are two different worlds altogether. Believe
me: I've been in this world for 2.5 years now after spending 19 years in
the C world.

Cheers,
Jussi
 

Chris Mellon

Private data in the C++ and Java OO worlds is taught so much and
emphasized so often that people have started thinking of it as being
desirable for its own sake. But the primary motivation for it grew out
of the need to maintain compatible interfaces. These languages rely on
a great deal of shared information between providers and clients of
interfaces in order to work correctly - public/private interfaces are
simply a reflection of that requirement (and the fact that your
clients still need to see the stuff you declare as private is an
example of a leak in that abstraction).

Python doesn't have these problems, so the only use for private
information is to warn your clients away from access to certain names.
There's no need for compiler enforcement of that, as a convention is
just as effective.

The remaining arguments are generally outgrowths of "but my code is
SECRET", which just isn't true in general, even less true of Python,
and not really a healthy attitude anyway.
 

Paul Boddie

Chris said:
Private data in the C++ and Java OO worlds is taught so much and
emphasized so often that people have started thinking of it as being
desirable for its own sake. But the primary motivation for it grew out
of the need to maintain compatible interfaces.

This is generally true, yes.

[...]
Python doesn't have these problems, so the only use for private
information is to warn your clients away from access to certain names.
There's no need for compiler enforcement of that, as a convention is
just as effective.

You'll have to be more clear, here. If you're writing a subclass of
some other class then any usage of private attributes in the superclass
potentially provides the benefit of a free choice in attribute names in
the subclass. If you wanted to warn people away from certain names, it
would be the public attributes that would require the warning, noting
that "your clients" in this context includes people extending classes
as well as those merely instantiating and using them.

The remaining arguments are generally outgrowths of "but my code is
SECRET", which just isn't true in general, even less true of Python,
and not really a healthy attitude anyway.

I don't care about secret attributes, and the namespace privacy aspect
doesn't bother me enough to use private attributes anyway, especially
since I'm the author of most of the superclasses I'm extending. But
either providing namespace privacy or convenient tools to mitigate
namespace sharing seems fairly important to me, at least.

Paul
 

Neil Cerutti

Jussi Salmela wrote:

To Neil Cerutti: If a programmer in C has got a pointer to some
piece of memory, that piece is at the mercy of the programmer.
There's no data hiding at all in this case.

That's somewhat disingenuous. You get just as much data hiding
with an opaque data type in C as you get in C++ or Java.
 

time.swift

Wow, I got a lot more feedback than I expected!

I can see both sides of the argument, both on technical merits, and
more philosophical merits. When I first learned C++ I felt
setters/getters were a waste of my time and extra code. When I moved
to C# I still felt that, and with their "Property" syntax I perhaps
felt it more. What changed my mind is when I started placing logic in
them to check for values and throw exceptions or (hopefully) correct
the data. That's probably reason one why I find it weird in Python.

Reason two is, as the user of a class or API I *don't care* what is
going on inside. All I want visible is the data that I can change. The
'_' convention is nice.. I do see that. I guess my old OOP classes are
hard to forget about. I feel that the state of an object should be
"stable" and "valid" at all times, and if it's going into an unstable
state - error then, not later. That's why I like being able to protect
parts of an instance's state. If, as a provider of a calculation engine,
I let the user change the internal state of the engine, I have no
assurances that my product (the engine) is doing its job...

<shrugs>

I appreciate all the feedback and enjoyed reading the discussion. It
helps me understand why the Python community has chosen the road they have.
- Thanks.
 

Bruno Desthuilliers

time.swift wrote:
Wow, I got a lot more feedback than I expected!

I can see both sides of the argument, both on technical merits, and
more philosophical merits. When I first learned C++ I felt
setters/getters were a waste of my time and extra code. When I moved
to C# I still felt that, and with their "Property" syntax I perhaps
felt it more. What changed my mind is when I started placing logic in
them to check for values and throw exceptions or (hopefully) correct
the data. That's probably reason one why I find it weird in Python.

Python does have properties too. The point is that you can as well start
with a plain attribute, then turn it into a computed one when (and if)
needed.

Reason two is, as the user of a class or API I *don't care* what is
going on inside.

Very true... until you have a legitimate reason to mess with
implementation because of a use case the author did not expect.

All I want visible is the data that I can change. The
'_' convention is nice.. I do see that. I guess my old OOP classes are
hard to forget about.

Access restriction is not a mandatory part of OO. Of course, objects are
supposed to be treated as "black boxes", but that's also true of a
binary executable, and nothing (technically) prevents you from opening it
with a hex editor and hacking it as you see fit... But then, you would not
complain about strange bugs, would you ?-)

I feel that the state of an object should be
"stable" and "valid" at all times,

That's fine. Just remember that, in Python, methods are attributes too,
and can be dynamically modified too. So when thinking about "object
state", don't assume it only implies "data" attributes. Heck, you can
even dynamically change the *class* of a Python object...

and if it's going into an unstable
state - error then, not later. That's why I like being able to protect
parts of an instance's state. If, as a provider of a calculation engine,
I let the user change the internal state of the engine, I have no
assurances that my product (the engine) is doing its job...

If you follow the convention, you are not responsible for what happens
to people messing with the implementation. Period. Just like you're not
responsible for what happens if someone hacks your binary executable with
a hex editor.

Welcome to Python, anyway.
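The remark above that methods are attributes too, and that even an object's class can be changed, can be sketched like this in Python 3. `Engine` and `PatchedEngine` are made-up names chosen to echo the "calculation engine" example.

```python
class Engine:
    def run(self):
        return "original"


class PatchedEngine(Engine):
    def run(self):
        return "patched subclass"


engine = Engine()
first = engine.run()

# A method is just a class attribute, so it can be rebound at run time;
# existing instances pick up the new behaviour immediately.
Engine.run = lambda self: "replaced method"
second = engine.run()

# The class of an existing instance can be reassigned as well.
engine.__class__ = PatchedEngine
third = engine.run()

print(first, second, third)
```

Hiding data attributes alone would therefore not be enough to guarantee that an object behaves as its author intended.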
 

Steven D'Aprano

When you hear a programmer use the word "probability" -
then it's time to fire him, as in programming even the lowest
probability is a certainty when you are doing millions of
things a second.

That is total and utter nonsense and displays the most appalling
misunderstanding of probability, not to mention a shocking lack of common
sense.
 

sturlamolden

Jussi said:
To sturlamolden: I don't know how you define private, but if one defines
in C an external static variable i.e. a variable outside any functions,
on the file level, the scope of the variable is that file only.

Sure, in C you can hide instances inside an object image by declaring
them static. But the real virtue of static declarations is to assist
the compiler.

My definition of 'private' for this thread is the private attribute
provided by C++, Java and C#. When I program C I use another idiom,

/* THIS IS MINE, KEEP YOUR PAWS OFF */

and it works just as well. The same idiom works for Python as well.
 

Andrea Griffini

Steven said:
That is total and utter nonsense and displays the most appalling
misunderstanding of probability, not to mention a shocking lack of common
sense.

While I agree that the programming job itself is not
a program and hence the "consider any possibility"
simply doesn't make any sense I can find a bit of
truth in the general idea that *in programs* it is
dangerous to be deceived by probability.

When talking about correctness (that should be the
main concern) for a programmer "almost never" means
"yes" and "almost always" means "no" (probability
does of course kick in for things like efficiency).

Like I said however this reasoning doesn't work
well applied to the programming process itself
(that is not a program... as programmers are not
CPUs; no matter what bigots of software engineering
approaches are hoping for).
Private variables are about the programming process,
not the program itself; and in my experience the
added value of C++ private machinery is very low
(and the added cost not invisible).
When working in C++ I like much more using
all-public abstract interfaces and module-level
all-public concrete class definitions (the so
called "compiler firewall" idiom).

Another thing on the same "line of thought" of
private members (that should "help programmers")
but for which I never ever saw *anything but costs*
is the broken idea of "const correctness" of C++.
Unfortunately that is not something that can be
avoided completely in C++, as it roots in the core
of the language.

Andrea
 

Hendrik van Rooyen

That is total and utter nonsense and displays the most appalling
misunderstanding of probability, not to mention a shocking lack of common
sense.

Really?

Strong words.

If you don't understand you need merely ask, so let me elucidate:

If there is some small chance of something occurring at run time that can
cause code to fail - a "low probability" in all the accepted senses of the
word - and a programmer declaims - "There is such a low probability of
that occurring and it's so difficult to cater for that I won't bother"
- then am I supposed to congratulate him on his wisdom and outstanding
common sense?

Hardly. - If anything can go wrong, it will. - to paraphrase Murphy's law.

To illustrate:
If there is one place in any piece of code that is critical and not protected,
even if it's in a relatively rarely called routine, then because of the high
speed of operations, and the fact that time is essentially infinite, it WILL
fail, sooner or later, no matter how minuscule the apparent probability
of it occurring on any one iteration is.

How is this a misunderstanding of probability? - probability applies to any one
trial, so in a series of trials, when the number of trials is large enough - in
the order of the inverse of the probability - then one's expectation must be
that the rare occurrence should occur...
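The arithmetic behind this can be sketched as follows, under the assumption (not stated in the post) that the trials are independent and each fails with the same probability p. The chance of at least one failure in n trials is then 1 - (1 - p)**n. The function name and the sample value of p are illustrative only.

```python
def chance_of_at_least_one_failure(p, trials):
    """Probability of one or more failures in `trials` independent attempts."""
    return 1.0 - (1.0 - p) ** trials


p = 1e-6                                  # hypothetical per-trial failure rate
at_inverse = chance_of_at_least_one_failure(p, 10**6)      # n = 1/p
at_ten_times = chance_of_at_least_one_failure(p, 10**7)    # n = 10/p

print(at_inverse, at_ten_times)
```

At n = 1/p the expected number of failures is one and the chance of seeing at least one is roughly 63%; at ten times that many trials it is already above 99.9%.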

There is a very low probability that any one gas molecule will collide with any
other one in a container - and "Surprise! Surprise! " there is nevertheless
something like the mean free path...

That kind of covers the math, albeit in a non algebraic way, so as not to
confuse what Newton used to call "Little Smatterers"...

Now how does all this show a shocking lack of common sense?

- Hendrik
 

Dennis Lee Bieber

Reason two is, as the user of a class or API I *don't care* what is
going on inside. All I want visible is the data that I can change. The
'_' convention is nice.. I do see that. I guess my old OOP classes are

Which is where Python is the cleanest of the batch...

The implementation "today" might be that instance.attribute is a
purely public data item. But "tomorrow" that same instance.attribute
might have been converted to a property with getter/setter methods
affecting a pseudo-private "_attribute" data item -- WITH NO CHANGE TO
THE EXISTING CLIENTS OF THE MODULE!

Try that with Java or C++ <G>
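A minimal Python 3 sketch of that conversion, with a made-up `Engine` class and `speed` attribute: the attribute starts out as plain data in an earlier version of the class, and is later turned into a property that validates writes, while client code keeps using ordinary attribute syntax.

```python
class Engine:
    """Hypothetical class whose `speed` began life as a plain attribute."""

    def __init__(self, speed):
        self.speed = speed              # goes through the setter below

    @property
    def speed(self):
        return self._speed              # pseudo-private storage

    @speed.setter
    def speed(self, value):
        if value < 0:
            raise ValueError("speed must be non-negative")
        self._speed = value


engine = Engine(10)
engine.speed = 25                       # client code is unchanged: plain assignment

try:
    engine.speed = -1                   # invalid state is rejected at once
except ValueError:
    rejected = True
else:
    rejected = False

print(engine.speed, rejected)
```

This also covers the earlier wish to "error then, not later": the check runs at assignment time without forcing getter and setter calls on callers.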
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
