Does Python really follow its philosophy of "Readability counts"?

Luis Zarrabeitia

Steven D'Aprano said:
Makes *no* sense? There's *no* good reason *at all* for the original
author to hide or protect internals?

My bad, sorry.
It makes sense... if the original author is an egotist who believes he must
control how I use that library. Or, if external forces make him do it (maybe
like, 'oh, if I change python, then I'm not using python anymore').
Let's be specific here. The list implementation in CPython is an array
with a hidden field storing the current length. If this hidden field was
exposed to Python code, you could set it to a value much larger than the
actual size of the array and cause buffer overflows, and random Python
code could cause core dumps (and possibly even security exploits).

In which case, my code would be broken. (Wait, let me be clear: in which case,
our team's code may be broken - but it was _our_ team's decision, knowing the risk).

If a variable is marked as... I don't like 'private', I'll call it
'implementation detail', I would not use it without good reason. Not even by
subclassing it. Why do you assume that I'd change list._length if I could? I
wouldn't.

Anyway, did you notice that your "counter-example" was a radical
change-the-way-python-works scenario? I also don't want to change the
interpreter's code on the fly. Now, if you take that as a confession that I
really, really, want enforced data hiding and that everything I've said is plain
wrong, so be it. After all, I treat python's interpreter as a black box, don't I?
So what you're saying is that the fundamental design of Python -- to be a
high-level language that manages memory for you while avoiding common
programming errors such as buffer overflows -- makes "no sense". Is that
what you intended?

Yes, that's what I intended, obviously. I'd like to have buffer overflows in python.
In case you don't understand irony: don't go putting words in my mouth. I'm not
putting words in yours.
As I see it, you have two coherent positions. On the one hand, you could
be like Mark Wooding, and say that Yes you want to risk buffer overflows
by messing with the internals -- in which case I'm not sure what you see
in Python, which protects so many internals from you. Or you can say that
you made a mistake, that there are *some* good reasons to protect/hide
internals from external access.

Or, I could have a third option: assume that I am a grownup who knows what he is
doing. After all, even with all those "protections" in list, I could just create
an extension module to shoot me in the foot anyway, if I really wanted to.
In the second case, the next question is, why should it only be code
written in C that is allowed that protection?

Bug? Not worth the effort of exposing those variables? I don't know...

[Btw, do you realize that C++'s private doesn't provide that protection either? I
have almost no experience with C++ and found it trivial to circumvent]

I don't think this is going anywhere. Now you are trying to push me to the
extremes, changing what I _said_ for your exaggerated interpretation of it just
so you could shoot it down, or force me to say that I want buffer overflows in
python. I believe that's called "strawman".

I stand by my words - but not by your "interpretation" of them:

Do you _really_ read from that sentence that I should dislike python because it
makes it a bit harder to get a buffer overflow with their native types?
 
Paul Rubin

Rhodri James said:
My experience with medium-sized organisations (50-100 people) is that
either you talk to Fred directly, or it doesn't happen. In particular
the more people (especially PHBs) that get involved, the slower the
change will come and the less like your original requirement it will look.

Usually there would be enough communication with Fred that Fred is
aware of the problem and the amount of work needed to fix it (maybe
you've even submitted a patch that Fred can commit after review and
testing), but Fred has ten thousand other things that also need
getting done. The job of the PHBs is to stay on top of what issues
are important for the overall project and juggle the priorities of
individual tasks. They figure out whether developing some feature
pushes something else out of the way for the upcoming release, or gets
slid off to the next one, or whatever. When they do a good job, that
takes a big load off of the programmers. It is, to some extent, also
part of the PHB's job to "filter the traffic" and protect both Fred
and you from making too many interruptions for each other. This is
especially important if you're the type of programmer who tends to get
their hands in a lot of different areas of a project.
 
Rhodri James

It is, to some extent, also
part of the PHB's job to "filter the traffic" and protect both Fred
and you from making too many interruptions for each other. This is
especially important if you're the type of programmer who tends to get
their hands in a lot of different areas of a project.

In a perfect environment this is true. In an environment where the
PHBs are overstretched because their PHBs aren't up to much, that
filter function tends to become a full roadblock. Under those
circumstances you have three choices: 1) defeat the data hiding
by talking to Fred directly; 2) defeat the data hiding by hacking
away yourself and getting Fred's forgiveness later; 3) give up.

See, we're back on topic!
 
Steven D'Aprano

Please, point out where I said that!

I'm pretty sure that the only time I commented on this particular point
(in message <[email protected]>), I said:

[snip]

Yes, that was the quote I was thinking of.

While I realise I didn't spell it out, the semantics I had in mind were

foo.len = n

means

if n < 0:
    raise ValueError("don't be stupid")
elif len(foo) < n:
    foo += [None] * (n - len(foo))
else:
    foo[n:] = []

But that's not "messing with the internals". That's the conceptual
equivalent of a Python getter/setter:


# Pseudo-code, untested and incomplete
class MyList(list):
    def __init__(self):
        self._length = 0
    def _getlength(self):
        return self._length
    def _setlength(self, n):
        if n < 0:
            raise ValueError("don't be stupid")
        elif len(self) < n:
            self += [None] * (n - len(self))
        else:
            self[n:] = []
        self._length = n
    # Named "length" so that the assignment to alist.length goes through the setter
    length = property(_getlength, _setlength)


alist = MyList()
alist.length = 1000 # safe

This is hardly what "messing with the internals" is! If your idea of
modifying hidden, implementation-specific details is "use a safe getter/
setter implementation that holds your hand and protects you from doing
anything stupid", then no wonder you object to data hiding. I'd object to
it too, if that's what I understood by it.

What I'm talking about is unsafe, direct access to the underlying C slots
with no hand-holding. You know: messing with the internals with no nice
safe interface between you and disaster:

alist._length = 2**128 # unsafe!
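
For contrast, here is a minimal sketch of what reaching the hidden field through the ctypes escape hatch might look like. It assumes CPython, where id() returns the object's memory address and the length field (ob_size) sits directly after the fixed object header; peek_length is a hypothetical helper name. It only reads the field. Writing through the same pointer would be the "unsafe!" case above and could crash the interpreter.

```python
import ctypes
import sys


def peek_length(lst):
    """Read a list's internal length field directly (CPython only, read-only).

    Assumption: CPython lays out variable-size objects as the fixed object
    header followed immediately by ob_size, and id() is the memory address.
    """
    if sys.implementation.name != "cpython":
        raise RuntimeError("this sketch relies on CPython's object layout")
    header_size = object.__basicsize__  # size of the fixed object header
    field = ctypes.c_ssize_t.from_address(id(lst) + header_size)
    return field.value


items = [10, 20, 30]
print(peek_length(items) == len(items))
```

Nothing in plain Python code reaches this field without going through ctypes or an extension module, which is the point under discussion.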


Safety is good. Escape hatches are good, too.

Something we can agree on.


In the second case, the next question is, why should it only be code
written in C that is allowed that protection?

Because Python code can't cause those sorts of problems without
resorting to the escape hatches (e.g., ctypes). And, very
significantly, because C code /needs/ that protection and Python
basically doesn't.

The basic difference is that C code is fundamentally brittle: if you
mess up its invariants, it can crash horribly and possibly allow its
brain to be taken over by evil people. Python code is fundamentally
robust. The worst that can happen[1] is that the interpreter raises an
exception. This makes it ideally suited to having a more relaxed
attitude to life. And that, in turn, makes it approachable, hackable
interactively, fun!

No, it's not the worst that can happen.

"I find it amusing when novice programmers believe their main job is
preventing programs from crashing. ... More experienced programmers
realize that correct code is great, code that crashes could use
improvement, but incorrect code that doesn't crash is a horrible
nightmare."

http://www.pphsg.org/cdsmith/types.html
 
Russ P.

My bad, sorry.
It makes sense... if the original author is an egotist who believes he must
control how I use that library.

If the original author provides you with the source code and the right
to modify it, he cannot possibly control how you use the library. You
can trivially disable any access controls. But for some reason that's
not enough for you.

Has it occurred to you that some users might actually *want* access
controls? Maybe some users want to actually use the library as the
author intended it to be used. What a bizarre concept!

Oh, but only a paranoid fool could possibly want access controls, eh?
Who's the egotist here?
 
Steven D'Aprano

My bad, sorry.
It makes sense... if the original author is an egotist who believes he
must control how I use that library.

Then I guess Guido must be such an egotist, because there's plenty of
internals in Python that you can't (easily) mess with.

Or, if external forces make him do
it (maybe like, 'oh, if I change python, then I'm not using python
anymore').

That parenthesised comment makes no sense to me. Python has changed
significantly since it was first released. Recently, print became a
function instead of a statement, and one of the motivations for this was
to allow people to change the behaviour of Python's print simply by
defining a new function. "Shadowing built-ins", as they call it, is a
feature, not a bug. I can't see any good reason for thinking that if you
change (say) the way Python prints, you don't have Python any more.
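
As a rough illustration of shadowing the built-in print in Python 3, the sketch below defines a replacement that prefixes every line with a tag. The names make_tagged_print, tag and stream are made up for this example; only builtins.print and io.StringIO come from the standard library.

```python
import builtins
import io


def make_tagged_print(tag, stream):
    """Return a print replacement that prefixes each call's output with a tag."""
    def print(*args, **kwargs):  # deliberately shadows the built-in name
        kwargs.setdefault("file", stream)
        builtins.print(tag, *args, **kwargs)
    return print


out = io.StringIO()
print = make_tagged_print("[debug]", out)  # module-level shadowing
print("hello", "world")
del print  # remove the shadow; the built-in print is visible again
```

The language is still Python afterwards: the original function remains reachable as builtins.print, and deleting the module-level name restores the default behaviour.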

Even more fundamental changes have occurred, e.g. new style classes,
ABCs, nested scopes.


[...]
If a variable is marked as... I don't like 'private', I'll call it
'implementation detail', I would not use it without good reason. Not
even by subclassing it. Why do you assume that I'd change list._length
if I could? I wouldn't.

I didn't say you would change it on a whim. I said that *if* it were
exposed to Python code, you *could* change it. You might change it
because you thought you had a good reason to. You might change it by
accident. You might not realise the consequences of changing it. Who
knows? It doesn't matter what your motives are.

My point is that you claimed that there is no good reason at all for
hiding implementation details. Python is full of implementation details
which are quite effectively hidden from Python programmers. So there are
two possibilities:

(1) you are right that it "makes no sense" (your words) for the original
author (in this case, Guido) to hide those implementation details from
Python programmers; or

(2) you are wrong that it "makes no sense", because there is at least one
case where the original author (Guido again) did a sensible thing by
hiding implementation details.

In an effort to avoid going round and round in circles, let me explicitly
say that option (2) does not imply that it always makes sense to hide
implementation details.
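
The claim that built-in types keep their internals out of reach of pure Python code can be checked with a small experiment. This is only a sketch: try_set_attribute is a hypothetical helper, and _length is the made-up attribute name used earlier in the thread, not a real CPython field name.

```python
def try_set_attribute(obj, name, value):
    """Return True if the attribute could be set, False if Python refused."""
    try:
        setattr(obj, name, value)
    except AttributeError:
        return False
    return True


class MyList(list):
    """A plain subclass: instances gain a __dict__, so new attributes stick."""


builtin_list = [1, 2, 3]
subclass_list = MyList([1, 2, 3])

# A built-in list has no per-instance __dict__, so the assignment is refused.
refused = not try_set_attribute(builtin_list, "_length", 2**128)

# The subclass accepts the attribute, but it is just an ordinary Python
# attribute; the real length kept by the C implementation is unaffected.
accepted = try_set_attribute(subclass_list, "_length", 2**128)
```

In both cases len() keeps reporting 3, because the value it reads is not reachable through ordinary attribute access.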


Anyway, did you notice that your "counter-example" was a radical
change-the-way-python-works scenario?

No, my scenario is merely extending what you can already do with pure-
Python classes to built-in classes written in C. It would have a radical
effect (pure Python code could core dump easily) but it wouldn't be a
radical change. It might take as little as one new function.


[...]
Yes, that's what I intended, obviously. I'd like to have buffer
overflows in python. In case you don't understand irony: don't go
putting words in my mouth. I'm not putting words in yours.

And neither am I. I'm pointing out the logical implications of your
position. If you find those implications unpleasant, then perhaps you
should modify your position to be less extreme and more realistic.

Or, I could have a third option: assume that I am a grownup who knows
what he is doing.

This is totally orthogonal to what we're discussing. Whether you are a
grownup or a child, whether you have good reasons or bad reasons, you can
still make either of the two choices.

After all, even with all those "protections" in list,
I could just create an extension module to shoot me in the foot anyway,
if I really wanted to.

Yes you could, and you could hack the OS to manipulate data behind the
scenes, and you could build a hardware device to inject whatever data you
want directly into the memory. You can do any of those things. So what?

Data hiding isn't about some sort of mythical 100% certainty against any
imaginable failure mode. Data hiding is a tool like any other, and like
all tools, it has uses and misuses, and it works under some circumstances
and not others. Wrenches are excellent for tightening bolts even though
they don't work in weightlessness (the astronaut spins around instead),
and hammers are good for hammering nails even though they won't work on
the surface of Pluto (the metal will become brittle and shatter). Data
hiding is no different.

If you don't get 100% certainty that there will never be a failure no
matter what, what do you get? Just off the top of my head, it:

* makes it easier for an optimising compiler to give fast code if it
doesn't have to assume internal details can be changed;

* makes it easier to separate interface from implementation when you can
trust that the implementation actually isn't being used;

* gives the developer more freedom to change the implementation;

* makes it possible for meaningful correctness proofs;

* reduces the amount of interconnections between different parts of your
program by ensuring that all interaction goes through the interface
instead of the implementation;

* which in turn reduces the amount of testing you need to do;

and possibly others.


[...]
I don't think this is going anywhere. Now you are trying to push me to
the extremes, changing what I _said_ for your exaggerated interpretation
of it just so you could shoot it down, or force me to say that I want
buffer overflows in python. I believe that's called "strawman".

No, I'm not changing anything you said. I'm pointing out the implications
of what you said. Don't blame me for seeing what logical consequences
follow from your statement.

I stand by my words - but not by your "interpretation" of them:


Do you _really_ read from that sentence that I should dislike python
because it makes it a bit harder to get a buffer overflow with their
native types?

Well, you tell me: does it make sense for Guido to have decided to make
it hard for pure Python developers to cause buffer overflows?

If your answer is Yes, it makes sense, then obviously your earlier
statement that it makes no sense is *wrong*, at least under some
circumstances. Then we can make progress: data hiding isn't *always* evil
and anti-freedom and useless, it's okay when Guido does it. Then we can
act like grownups and discuss under what other circumstances it is or
isn't good to use data hiding, instead of making sweeping generalisations
that it is never good and always useless.

If your answer is No, it makes no sense, Guido was wrong to hide
implementation details from Python developers, then I can't imagine what
you get out of the stifling, unpleasant B&D language Python. Perhaps you
like the syntax?
 
Luis Zarrabeitia

Russ P. said:
If the original author provides you with the source code and the right
to modify it, he cannot possibly control how you use the library. You
can trivially disable any access controls. But for some reason that's
not enough for you.

No, I'm not satisfied with forking python just to use sys._getframe.
Has it occurred to you that some users might actually *want* access
controls? Maybe some users want to actually use the library as the
author intended it to be used. What a bizarre concept!

Huh?
Then... use it as the author intended. I am _not_ forcing you to use the
obj._protected attributes!

I even run pylint against third party libraries just to assess if the risk of
them messing with someone else's internals is worth taking (as in the case of
inspect.currentframe, which is exactly the same as sys._getframe) or not (random
library downloaded from the net).
Oh, but only a paranoid fool could possibly want access controls, eh?
Who's the egotist here?

See? You too changed what I said. Somehow you managed to delete the _other_
situation I gave. Not worth correcting it.
 
Luis Zarrabeitia

Steven D'Aprano said:
Then I guess Guido must be such an egotist, because there's plenty of
internals in Python that you can't (easy) mess with.

Yeap, ignore the second part, and claim that I only said this.
That parenthesised comment makes no sense to me.

It was directly countering your 'list' example. _I_ don't want to change
_python_, nor python's assumptions and assurances. A standard python that can
segfault would be no python. Again, if you think that means that deep down I
like enforced data hiding, so be it.
[...]
If a variable is marked as... I don't like 'private', I'll call it
'implementation detail', I would not use it without good reason. Not
even by subclassing it. Why do you assume that I'd change list._length
if I could? I wouldn't.

I didn't say you would change it on a whim. I said that *if* it were
exposed to Python code, you *could* change it. You might change it
because you thought you had a good reason to. You might change it by
accident. You might not realise the consequences of changing it. Who
knows? It doesn't matter what your motives are.

Exactly, they don't matter to you, unless you happen to be running my code.
My point is that you claimed that there is no good reason at all for
hiding implementation details. Python is full of implementation details
which are quite effectively hidden from Python programmers. So there are
two possibilities:

I didn't say "at all". Those were your words, not mine.
I said that it makes no sense that the power lies with _you_ instead of with _my
team_. And, when I said that, I recall we were talking about the python
language, not C.
(1) you are right that it "makes no sense" (your words) for the original
author (in this case, Guido) to hide those implementation details from
Python programmers; or

Just to be clear: I think the opposite.
He made a language and interpreter, and it ensures that it will not segfault
because of incorrect pure python code. That is my blackbox. In doing that, he
made a language where I don't have to worry that much about enforcing access
restrictions. Again, if you think that means that I want enforced data hiding in
python, so be it.
(2) you are wrong that it "makes no sense", because there is at least one
case where the original author (Guido again) did a sensible thing by
hiding implementation details.

hiding the implementation details of a C implementation... not python.
In an effort to avoid going round and round in circles, let me explicitly
say that option (2) does not imply that it always makes sense to hide
implementation details.

Huh?
It makes sense to hide implementation details. I'd say it always makes sense.
What doesn't make sense is that someone fights so vehemently to stop me from
getting at them, on my code, on my systems.
[...]
Yes, that's what I intended, obviously. I'd like to have buffer
overflows in python. In case you don't understand irony: don't go
putting words in my mouth. I'm not putting words in yours.

And neither am I. I'm pointing out the logical implications of your
position. If you find those implications unpleasant, then perhaps you
should modify your position to be less extreme and more realistic.

But it is realistic. You put the words "at all", and you shifted the discussion
from Python to C, and from programs in python to python's implementation.

[snip the comments about the advantages of data hiding. We are not talking about
data hiding, we are talking about having data hiding enforced against me]
Well, you tell me: does it make sense for Guido to have decided to make
it hard for pure Python developers to cause buffer overflows?

Yes it does.
And this answers my question... You do consider the fact that I like python,
that I like that python is not C, and that I use python as a blackbox, as a
confirmation that I want enforced data hiding.

I was truthful when I said that: if you think so, then so be it, feel free to
think that I want it. We are obviously not on the same page here, we are not
even talking about the same language. I guess (just a guess) that in your view,
if I really didn't want enforced data hiding, I'd be programming directly in
machine code or maybe making my own CPUs. If your idea of enforced data hiding
includes that (it obviously includes the interpreter), then what I said was
wrong. I assumed we were talking about python and that I didn't need to
explicitly quantify my expressions.

And, FYI, when programming in java, C++ or C#, I do use "private" and
"protected" variables, not because I want to forbid others from using it, but
because it is [rightly?] assumed that everything marked as public is safe to use
- and I consider that a strong enough "external" reason to do it.
 
Russ P.

No, I'm not satisfied with forking python just to use sys._getframe.

Calling a one-word change a "fork" is quite a stretch, I'd say.
Huh?
Then... use it as the author intended. I am _not_ forcing you to use the
obj._protected attributes!

But what if I want an automatic check to verify that I am using it as
the author intended? Is that unreasonable? Think of enforced access
restriction as an automatic "assert" every time an attribute is
accessed that it is not a private attribute.
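
The "automatic assert" idea can be approximated in today's Python with a wrapper object. The sketch below is one possible reading of it, not anything the language provides: PublicOnly and Account are hypothetical names, and the check is easy to bypass with object.__getattribute__, so it is a reminder rather than real protection.

```python
class PublicOnly:
    """Hypothetical proxy: forwards public attributes, rejects _names."""

    __slots__ = ("_target",)

    def __init__(self, target):
        object.__setattr__(self, "_target", target)

    def __getattribute__(self, name):
        # The "automatic assert": every lookup is checked before forwarding.
        if name.startswith("_"):
            raise AttributeError(f"{name!r} is private")
        return getattr(object.__getattribute__(self, "_target"), name)

    def __setattr__(self, name, value):
        if name.startswith("_"):
            raise AttributeError(f"{name!r} is private")
        setattr(object.__getattribute__(self, "_target"), name, value)


class Account:
    def __init__(self):
        self.balance = 10
        self._audit_log = []  # implementation detail by convention


acct = PublicOnly(Account())
acct.balance = 25  # public access passes through

try:
    acct._audit_log
    blocked = False
except AttributeError:
    blocked = True
```

Because the wrapper is opt-in, a project that wants the check can apply it without the library author having to enforce anything.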

I may want this automatic verification in my own code just for peace
of mind. More importantly, a project manager may want it to verify
that no one on the development team is accessing private attributes.
Sure, he could do that with code reviews, but code reviews are far
more expensive (and less reliable in some ways) than a simple check
enforced by the language itself.

Without enforced access protection, depending on code reviews to
detect the use of private attributes is a bit like depending on
security guards to keep doors closed without putting locks on the
doors. You don't need a lock on a door if you can afford to post a
security guard there full time, but doesn't it make more sense to put
a lock on the door and have a security guard check it only
occasionally?
 
Russ P.

I didn't say "at all". Those were your words, not mine.
I said that it makes no sense that the power lies with _you_ instead of with _my
team_. And, when I said that, I recall we were talking about the python
language, not C.

Once again, if you have the source code for the library (and the right
to modify it), how does the "power" lie with the library implementer
rather than you the user?

You say you don't want to "fork" the library. Let's stipulate for the
sake of argument that a one-line change is indeed a "fork." Think
about what you are saying. You are saying that you should dictate how
the producer of the library should implement it because you don't want
to be bothered to "fork" it. If you don't like his design decisions,
shouldn't the onus be on *you* to make the trivial change necessary to
get access to what you want?

Imagine a person who repairs computers. He is really annoyed that he
constantly has to remove the cover to get at the guts of the computer.
So he insists that computer cases should be made without covers.
After all, manufacturers put covers on computers only because they
don't trust us and think we're too "stupid" to safely handle an
uncovered computer box.

That is logically equivalent to your position on enforced access
restrictions in software.
And, FYI, when programming in java, C++ or C#, I do use "private" and
"protected" variables, not because I want to forbid others from using it, but
because it is [rightly?] assumed that everything marked as public is safe to use
- and I consider that a strong enough "external" reason to do it.

You could just use leading underscores and note their meaning in the
documentation. If that's such a great approach, why not do it? Yes, I
know, it's not a widely used convention in those other languages. Fair
enough. But you could still do it if it's such a good idea.
 
Luis Zarrabeitia

Russ P. said:
Once again, if you have the source code for the library (and the right
to modify it), how does the "power" lie with the library implementer
rather than you the user?

You say you don't want to "fork" the library. Let's stipulate for the
sake of argument that a one-line change is indeed a "fork."

It is. For starters, I'd lose the information of "this attribute was intended to
be internal and I'm accessing it anyway".
Think
about what you are saying. You are saying that you should dictate how
the producer of the library should implement it because you don't want
to be bothered to "fork" it.

No. I am not dictating _anything_. The beauty of it, you don't have to do
_anything_ for this to happen.

Now, you may say that I'm trying to force you to relax and do nothing instead of
complaining because the language I use doesn't put enough restrictions on me.
If you don't like his design decisions,
shouldn't the onus be on *you* to make the trivial change necessary to
get access to what you want?

Or contacting him about it and maybe send him a patch, sure, why not. But this
has nothing to do with enforced data hiding. Having obj._public_var is just as
badly designed as having "private public_var".
Imagine a person who repairs computers. He is really annoyed that he
constantly has to remove the cover to get at the guts of the computer.
So he insists that computer cases should be made without covers.
After all, manufacturers put covers on computers only because they
don't trust us and think we're too "stupid" to safely handle an
uncovered computer box.

That is logically equivalent to your position on enforced access
restrictions in software.

Do you realize that most computer cases are trivially easy to open? (Never mind
that there are other reasons... dust, protection against physical damage, etc.
My PC is locked enough to protect them, but opened enough so I can "play" with
it whenever I need)
And, FYI, when programming in java, C++ or C#, I do use "private" and
"protected" variables, not because I want to forbid others from using it, but
because it is [rightly?] assumed that everything marked as public is safe to use
- and I consider that a strong enough "external" reason to do it.

You could just use leading underscores and note their meaning in the
documentation. If that's such a great approach, why not do it? Yes, I
know, it's not a widely used convention in those other languages. Fair
enough.

It is not a widely used convention, and that is reason enough for me. It's quite
a contradiction to say in the code "this thing is safe to use" and then document
it as "highly unsafe - do not touch". With Java and C# I'm more lenient (and
work more with explicit interfaces rather than just the public/protected/private
thing).

BTW, the actual 'private' case for most languages is a different beast: it is
used to prevent namespace pollution/name clashes. I can't easily simulate those
with public attributes in C#/Java/C++ (but I concede that their 'privates' do a
better job at this than python's self.__local)
But you could still do it if it's such a good idea.

I think someone commented in this thread about a case where he had to do exactly
that.

[copying from your other reply]
But what if I want an automatic check to verify that I am using it as
the author intended? Is that unreasonable? Think of enforced access
restriction as an automatic "assert" every time an attribute is
accessed that it is not a private attribute.

I think that was a reply to a message where I said that I used pylint to run those
checks on third party libraries. And I obviously can do the same with my own
code. I don't have threading now, so I can't check if I really said that. If I
didn't, well, I'm saying it now.

Now, as Paul Rubin pointed out, those static checks done by pylint can't catch
a runtime workaround using eval, exec or getattr/setattr. But neither can C++.
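
A small sketch of the kind of runtime workaround meant here: the attribute name is assembled at run time, so the text obj._secret never appears in the source for a static check to flag. Widget and read_hidden are hypothetical names used only for this example.

```python
class Widget:
    def __init__(self):
        self._secret = 42  # marked as an implementation detail by convention


def read_hidden(obj, suffix):
    """Access a protected attribute without writing obj._name in the source."""
    return getattr(obj, "_" + suffix)


value = read_hidden(Widget(), "secret")
```

A checker that looks only at attribute syntax has no attribute access to inspect in read_hidden, only a string concatenation passed to getattr.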

By the way, I urge you to try to write code that pylint doesn't complain about.
It's easy to not be satisfied with the checks it provides if you haven't used it.
 
Russ P.

It is. For starters, I'd lose the information of "this attribute was intended to
be internal and I'm accessing it anyway".

Not really. When you get a new version of the library and try to use
it, you will quickly get a reminder about the change (assuming your
tests provide sufficient coverage, and also assuming that the
attribute is not made public in the new version). So you don't really
even need to keep track of the change.
No. I am not dictating _anything_. The beauty of it, you don't have to do
_anything_ for this to happen.

You are trying to dictate that the library implementer not be allowed
to use enforced access restriction. And, in the larger sense, you are
trying to dictate that access restrictions not be enforced in Python.
Now, you may say that I'm trying to force you to relax and do nothing instead of
complaining because the language I use doesn't put enough restrictions on me.

And you are trying to put restrictions on anyone who might prefer to
enforce access restrictions. If you don't allow them to do that, you
are restricting them.
Or contacting him about it and maybe send him a patch, sure, why not. But this
has nothing to do with enforced data hiding. Having obj._public_var is just as
badly designed as having "private public_var".

Sure, go ahead and contact him. If he agrees that a private attribute
should be public, then the problem is solved. But if he does not
agree, he should not be forced to bend to your desire.
Do you realize that most computer cases are trivially easy to open? (Never mind

That was exactly my point. Deleting the word "private" (or whatever)
is also trivially easy if you have access to the source code.
 
Luis Zarrabeitia

Russ P. said:
Not really. When you get a new version of the library and try to use
it, you will quickly get a reminder about the change (assuming your
tests provide sufficient coverage, and also assuming that the
attribute is not made public in the new version). So you don't really
even need to keep track of the change.

See? With every new version that doesn't change the behaviour, I have to modify
the source just to see if the tests run. That _is_ a fork. And that's assuming
the bright case where I have the source.
You are trying to dictate that the library implementer not be allowed
to use enforced access restriction. And, in the larger sense, you are
trying to dictate that access restrictions not be enforced in Python.

Now, please, explain to me, why are you so interested in preventing me from
using the internals on my computer? If you want controls in the code that runs
on your system, you can.
Sure, go ahead and contact him. If he agrees that a private attribute
should be public, then the problem is solved. But if he does not
agree, he should not be forced to bend to your desire.

Wait, if I change my project to ignore the data hiding (enforced or not), am I
forcing the author to bend to my desire? Please explain your reasoning.

Or better yet... don't. I will just give up, right now. This is no longer about
"security", "good practices", "software engineering", "bug catching" or "formal
proofs" as you've tried to paint it before. This is about you wanting to control
how others use your code. And while it may be your legal right, that isn't the
discussion I thought I was getting into.
 
Mark Wooding

Russ P. said:
Calling a one-word change a "fork" is quite a stretch, I'd say.

I wouldn't. I've forked a project P if I've made a different version of
it which isn't going to be reflected upstream. Now I've got to maintain
my fork, merging in changes from upstream as they happen, and upgrading
all the things which use my new version; if I want to distribute my
program M to other people, they'll also need my forked version of
whatever. Now suppose that two programs A and B both require one-word
changes in P: there's a combinatorial explosion of little patches which
need to be managed.

A fork is a fork, regardless of how big the change is. The problem with
a fork is the maintenance problem it entails.

Besides, if I want to do some hacky debugging in ipython, should I
really have to recompile and reinstall piles of libraries?
But what if I want an automatic check to verify that I am using it as
the author intended? Is that unreasonable?

You mean that you can't /tell/ whether you typed mumble._seekrit?
You're very strange. It's kind of hard to do by accident. I'd have
thought that you could do that with grep, err...

git grep '\._' | sed 's/self\._//g' | grep '\._'

ought to do as a rough start.
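
The same check can be made a little less rough by walking the syntax tree instead of matching text. What follows is only a sketch under stated assumptions: the helper name `find_private_accesses` and the sample source are made up for illustration, and the rule "anything reached through `self` or `cls` is fine" is a simplification that will miss some cases and flag others.

```python
import ast


def find_private_accesses(source, filename="<string>"):
    """Return sorted (line, attribute) pairs for `obj._name` where obj is not self/cls.

    Hypothetical helper for illustration; not part of any library.
    """
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Attribute):
            continue
        name = node.attr
        # Ignore public names and dunder names such as __class__.
        if not name.startswith("_"):
            continue
        if name.startswith("__") and name.endswith("__"):
            continue
        owner = node.value
        # Access through self or cls is treated as the class using its own internals.
        if isinstance(owner, ast.Name) and owner.id in ("self", "cls"):
            continue
        hits.append((node.lineno, name))
    return sorted(hits)


sample = """
class Thing:
    def __init__(self):
        self._seekrit = 1

def peek(mumble):
    return mumble._seekrit
"""

hits = find_private_accesses(sample)
print(hits)
```

Run over a whole tree of files, something like this could be wired into a test suite or a commit hook, which gets at the "automatic check" being asked for without any change to the language.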

If you can't trust your programmers to make it clear when they're doing
something dubious, I think you have bigger problems.

-- [mdw]
 

Mark Wooding

Russ P. said:
Imagine a person who repairs computers. He is really annoyed that he
constantly has to remove the cover to get at the guts of the computer.
So he insists that computer cases should be made without covers.

Poor analogy. He gets fed up that the computers he's meant to be
servicing are arriving in sealed containers which require specialist
tools to open.
After all, manufacturers put covers on computers only because they
don't trust us and think we're too "stupid" to safely handle an
uncovered computer box.

It's more to do with keeping dust out, keeping air circulating, and
keeping fingers away from sharp edges. Fortunately most computers are
actually shipped in cases one can remove easily, using household
tools -- or even no tools at all. Why, anyone would think that you were
supposed to be able to grub about in there!
That is logically equivalent to your position on enforced access
restrictions in software.

It is now that I've fixed it.

-- [mdw]
 

Russ P.

I wouldn't.  I've forked a project P if I've made a different version of
it which isn't going to be reflected upstream.  Now I've got to maintain
my fork, merging in changes from upstream as they happen, and upgrading
all the things which use my new version; if I want to distribute my
program M to other people, they'll also need my forked version of
whatever.  Now suppose that two programs A and B both require one-word
changes in P: there's a combinatorial explosion of little patches which
need to be managed.

A fork is a fork, regardless of how big the change is.  The problem with
a fork is the maintenance problem it entails.

Not really. A "fork" is something that *diverges* from the original.
That means the differences *grow* over time. In this case, the
differences will not grow over time (unless you access more private
attributes).

As I pointed out before, you don't even need to keep track of the
changes you made. You will be automatically reminded as soon as you
get a new version of the library and try to use it (again, assuming
that your tests provide sufficient coverage and the attribute is not
changed to public).
You mean that you can't /tell/ whether you typed mumble._seekrit?
You're very strange.  It's kind of hard to do by accident.  I'd have

If I have a team of 200 programmers, I can't easily tell if one of
them did that somewhere. Why do people like you have such a hard time
understanding that I'm not talking here about smallish programs with
one or a few developers?

And even with only one programmer, he might access "mumble._seekrit"
for a debugging test, then forget to take it out.
thought that you could do that with grep, err...

        git grep '\._' | sed 's/self\._//g' | grep '\._'

ought to do as a rough start.

If you can't trust your programmers to make it clear when they're doing
something dubious, I think you have bigger problems.

Yes, I think I have bigger problems. But I like the challenge. I don't
think I'd be happy working on small problems, but to each his own.
 

Mark Wooding

Steven D'Aprano said:
Then I guess Guido must be such an egotist, because there's plenty of
internals in Python that you can't (easily) mess with.

Time for some reflection. (Apposite word, as it turns out.)

For the avoidance of doubt, I shall grant (and not grudgingly):

* Abstraction is a useful tool in building complex systems.

* Separating an interface from its implementation reduces the
cognitive burden on people trying to reason about the system
(including when doing design, developing clients, or trying to do
more formal kinds of reasoning).

* It also makes maintenance of the implementation easier: in the cases
where it's possible to improve the implementation without
changing the interface, clients can benefit without having to be
changed.

I think that one of the reasons this conversation is going on for so
long is that we haven't really talked much about what kinds of `messing'
we're talking about.

I think that, most of the time when I'm inconvenienced by some
abstraction, it's because it's hiding something that I wanted to see --
in a read-only fashion. The implementation knows some fact that, for
whatever reason, it's unwilling to reveal to me. I understand that, in
some future version, the implementation might change and this fact might
not be available then, or that it's an artifact of the way the
implementation works in some environment -- but for whatever reason
(debugging is a typical one as was pointed out upthread) it turns out
that I'm actually interested in this fact. Revealing it to me can't
actually hurt the invariants of the system, though I need to be somewhat
careful about how long I assume it's correct. Of course, that should be
entirely my responsibility.

It's this common problem of wanting to dig out some piece of information
which I'm really worried about. And `enforced data hiding' just slams
the door in my face. I'm not best pleased by the idea.

Anyway, in this regard, the CPython implementation is pretty much a
paragon of virtue. It lets one get at almost everything one could want
and a whole lot else besides.
Yes you could, and you could hack the OS to manipulate data behind the
scenes, and you could build a hardware device to inject whatever data
you want directly into the memory. You can do any of those things. So
what?

Data hiding isn't about some sort of mythical 100% certainty against
any imaginable failure mode. Data hiding is a tool like any other, and
like all tools, it has uses and misuses, and it works under some
circumstances and not others.

If you don't get 100% certainty that there will never be a failure no
matter what, what do you get? Just off the top of my head, it:

How much of these do you /lose/ by having a somewhat more porous
interface?
* makes it easier for an optimising compiler to give fast code if it
doesn't have to assume internal details can be changed;

Irrelevant for read-only inspection. For making modifications, this
might be a valid point, though (a) I'm unaware of any compilers
sufficiently aggressive to make very effective use of this, and (b) I'm
probably willing to accommodate the compiler by being sufficiently
careful about my hacking. That is: go ahead, use a fancy compiler, and
I'll cope as best I can.
* makes it easier to separate interface from implementation when you
can trust that the implementation actually isn't being used;

Irrelevant for read-only inspection. For making modifications: you
carry on assuming that the interface is being used as you expect, and
I'll take on the job of reasoning about invariants and making sure that
everything continues to work.
* gives the developer more freedom to change the implementation;

For read-only inspection, I might lose if you stop providing the
information I want; I'll need to change my code, but you don't need to
care. Probably if your implementation has changed that much, the
information isn't relevant any more anyway.

Besides, if your implementation changes break my code, I get to keep
both pieces, and you get to laugh. What's the big deal?
* makes it possible for meaningful correctness proofs;

Irrelevant for read-only inspection. For making modifications, I'll
take on the responsibility for amending the proofs as necessary.
* reduces the amount of interconnections between different parts of your
program by ensuring that all interaction goes through the interface
instead of the implementation;

For read-only inspection, I'm not sure this matters much -- if your
implementation knows a fact that I want, then either I'll get it through
your interface or dredge it out of your implementation's guts, but the
module coupling's there either way. (If there was a better way to
obtain that fact, then I should just have used the better way instead --
but in the case where it's a fact about your implementation's state
there probably isn't a better way.) Similarly for modifications,
actually: if I have a need to change your implementation's state
somehow, I can do that through the interface or under the table, but
there's a coupling either way.
* which in turn reduces the amount of testing you need to do;

See above.
Well, you tell me: does it make sense for Guido to have decided to
make it hard for pure Python developers to cause buffer overflows?

Yes.

That said, I'm glad that it's /possible/ to write unsafe programs in
Python. It means that the right escape-hatches are present.

What I'm really complaining about are the kinds of interfaces -- which I
see all too often in languages where people have embraced this kind of
mandatory `information hiding' overenthusiastically -- where (a) the
right features aren't there to begin with, and (b) the escape hatches
are either missing or /extremely/ inconvenient. Java programs often
seem to be like this.

But CPython bends over backwards to provide useful information about its
state: all those wacky attributes on functions and code objects and so
on. Without this kind of care, I'm pretty sure that mandatory hiding is
far worse as a cure than people diddling about inside other modules'
implementation details is as a disease.
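
For a concrete idea of what those attributes give you, here is a small read-only poke at a function and a closure. It is a sketch only: the functions `greet` and `make_counter` are invented for the example, the attribute names are the Python 3 spellings, and other Python implementations are not obliged to expose the same details.

```python
def greet(name, punctuation="!"):
    greeting = "Hello, " + name
    return greeting + punctuation


code = greet.__code__
# Positional argument names come first in co_varnames, followed by other locals.
arg_names = code.co_varnames[:code.co_argcount]
local_names = code.co_varnames
defaults = greet.__defaults__


def make_counter(start):
    def step():
        return start + 1
    return step


step = make_counter(10)
# Names the inner function captured, and the values currently in those cells.
free_names = step.__code__.co_freevars
cell_values = [cell.cell_contents for cell in step.__closure__]

print(arg_names, local_names, defaults)
print(free_names, cell_values)
```

Nothing here modifies the objects being inspected, so none of the invariants of `greet` or `step` are at risk; the only cost is that the inspecting code depends on details that may change.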

I'm expecting you to argue that programmers would be too sensible to
hide interesting information behind their mandatory-data-hiding, and I
should just give them some credit. Maybe: but the situation is
different. Firstly, we wouldn't be asking for this feature if we were
willing to give programmers some credit for acting responsibly when they
dig about in another module's innards. Secondly, while it's certainly
possible to mess up when poking about, the damage is fairly localized;
if I'm overprotective about my mandatory hiding, I can screw other
people.


I guess that if overriding the controls was as easy as

with naughty_hacking:
    ## stuff ...

I wouldn't complain. (But I think the effect ought to be scoped
/lexically/ rather than dynamically, so that grep works properly.)

-- [mdw]
 

Russ P.

You mean that you can't /tell/ whether you typed mumble._seekrit?
You're very strange.  It's kind of hard to do by accident.

But what if you type "mumble._seekrit" in several places, then the
library implementer decides to give in to your nagging and makes it
"public" by changing it to "mumble.seekrit". Now suppose you forget to
make the corresponding change somewhere in your code, such as

mumble._seekrit = zzz

You will get no warning at all. You will just be inadvertently
creating a new "private" attribute -- and the assignment that you
really want will not get done.

For that matter, the library implementer himself could make the same
mistake and get no warning.
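
The failure mode is easy to reproduce. The sketch below uses made-up classes `Mumble` and `SlottedMumble`; the second half shows one existing way to get an error instead of silence, namely `__slots__`. That feature is primarily a memory optimisation, so treating it as a declaration mechanism is a side effect and an assumption of this example, not a recommendation from the thread.

```python
class Mumble:
    def __init__(self):
        self.seekrit = 1  # the attribute after it has been made public


mumble = Mumble()
mumble._seekrit = 99  # stale name: silently creates a second attribute
stale_created = "_seekrit" in vars(mumble)
public_value = mumble.seekrit  # the value we meant to change is untouched


class SlottedMumble:
    __slots__ = ("seekrit",)  # instances may only have this attribute

    def __init__(self):
        self.seekrit = 1


slotted = SlottedMumble()
try:
    slotted._seekrit = 99
    caught = False
except AttributeError:
    caught = True

print(stale_created, public_value, caught)
```

Note that the same silent behaviour occurs for any misspelled attribute name on an ordinary instance, with or without a leading underscore.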

When you think about it, you soon realize that the leading underscore
convention violates the spirit if not the letter of one of the first
principles of programming 101: if you have a constant parameter that
appears in several places, assign the literal value in one place
rather than repeating it everywhere. Then if you need to change the
value, you only need to change it in one place. That reduces effort,
but more importantly it reduces the potential for error.

The same principle applies to "declaring" an attribute private. If
that "declaration" is encoded in every occurrence of its identifier,
then if you decide to change it to public, you need to change the
identifier at each and every location. But if a "private" or "priv"
keyword were available, you would only need to make the change in one
location.
 

Steven D'Aprano

But what if you type "mumble._seekrit" in several places, then the
library implementer decides to give in to your nagging and makes it
"public" by changing it to "mumble.seekrit". Now suppose you forget to
make the corresponding change somewhere in your code, such as

mumble._seekrit = zzz

You will get no warning at all. You will just be inadvertently creating
a new "private" attribute -- and the assignment that you really want
will not get done.

For that matter, the library implementer himself could make the same
mistake and get no warning.

When you think about it, you soon realize that the leading underscore
convention violates the spirit if not the letter of one of the first
principles of programming 101: if you have a constant parameter that
appears in several places, assign the literal value in one place rather
than repeating it everywhere. Then if you need to change the value, you
only need to change it in one place. That reduces effort, but more
importantly it reduces the potential for error.

How is this scenario different from an API change where public_method()
gets changed to method()? Surely this is just a downside to Python's lack
of declarations, rather than specific to Python's lack of data hiding?
 

Paul Rubin

Steven D'Aprano said:
How is this scenario different from an API change where
self.some_attribute gets changed to self.attribute?

That would be a backward incompatible change to a published interface,
something that should not be done without a good reason, and which was
mostly avoided through the whole Python 2.x series (incompatible
changes were saved for Python 3.0). Changing an undocumented and
supposedly private interface is something different entirely.
 
