Why does python not have a mechanism for data hiding?

BJörn Lindqvist

I think you're missing the point.

As I see it, the primary value of data hiding is that it provides
useful information on which data and methods are intended for the
client and which are intended for internal use. It's like putting a
front panel on a TV set with the main controls intended for the
viewer.

Here's my two cents. First of all, a TV is a bad analogy compared to
reusable software libraries. Really bad analogy. A TV is a horribly
complicated device which has to be dumbed down because otherwise it
would be too hard to use for ordinary people.

A software developer's relation to a third-party library is more
similar to a TV repair man trying to repair a TV than to a random
person watching TV. For a repair man, the front panel is just useless
and in the way.

Oh, and to continue on the TV analogy, one of the reasons why a TV is
complicated is because its interface is totally different from its
implementation. Channels are just a bad abstraction for tuning the
receiver to different frequencies and for switching inputs. Merely
using a TV doesn't teach you anything about how it actually works.

KISS: Keep It Simple Stupid. And it is always simpler to not implement
the gunk needed for data hiding than to do it. By keeping things
simple you keep your code easy to implement, easy to understand and
easy to reuse.

Data hiding sacrifices implementation simplicity supposedly to make
the interface simpler and to keep backwards compatibility. It allows
you to change implementation details without affecting the
interface. But do you really want to do that? Consider this silly Java
example:

    class Foo {
        private int bar;
        public int getBar() {
            return bar;
        }
    };

Then for some reason you decide that hm, "bar" is not a good attribute
name so you change it to "babar". And you can do that without changing
the public interface! Woho! So now you have a public getter named
"getBar" that returns an attribute named "babar". That's in reality
just bad and whoever is maintaining the implementation is going to be
annoyed that the getter's name doesn't match the attribute name.

What would have happened without data hiding? Renaming the public
attribute "bar" to "babar" would probably cause some grief for someone
reusing your library, but you would keep your implementation pure.

What about semantic changes? Data hiding doesn't protect you against
that, so you'll have to change your interface anyway. The interface
for a car hasn't changed much in the last 100 years, but the
implementation has. How easy is it to repair a car nowadays compared
to 30 years ago?

And data hiding as a documentation aid is just a sham. "These methods
are public so you can call them, these aren't so hands off!" A reuser
of your library *will* want to know what happens on the inside, by
trying to make stuff impossible to reach you are just making that kind
of information much harder to come by.

The better method is to just write proper docstrings that tell the
user what the methods do and when they can be called.

Another good way to see how useless data hiding is, is to try and unit
test a very encapsulated library. You'll see that it is almost
impossible to write good unit tests unless you publicly export
almost everything in the code. At which point you come to realize that
all the data hiding was for naught.
 
Lie

Actually, 'data hiding', although vastly overused by the static crowd,
can be a reasonable thing to want.

For example, at Resolver Systems we expose the spreadsheet object
model to our users. It has a public, documented API, plus a host of
undocumented internally used methods.

We would really *much* rather hide these, because anything our
customers start using (whether documented or not) we will probably
have to continue supporting and maintaining.

The 'we told you not to use that' approach, when applied to paying
customers doesn't really work... all they see is that you broke their
spreadsheet code by changing your API.

You can make members truly private by proxying, but it is a bit
ungainly.

Then don't document it, or separate the internal documentation (which
never passes through the wall) from the public documentation (which
your users use). Nobody apart from your dev team, and anyone told by
your dev team (which means you may fire that person for "lack of
discipline"), would know that such a thing exists, and in consequence
nobody would use it.

Don't tell your users not to use something; just don't tell them that
it exists and they won't use it.
 
George Sakkis

Well, the designers of C++, Java, and Ada, to name just three very
popular languages (well, two) seem to think it makes sense. But maybe
you know more than they know.

And even more (well, almost all) languages use explicit delimiters for
defining blocks instead of indentation, so what's your point ?
 
Antoon Pardon

And even more (well, almost all) languages use explicit delimiters for
defining blocks instead of indentation, so what's your point ?

Hmm, difficult to react to this. On the one hand I have had people
argue that block delimiting in python is explicit too. So in that
case python doesn't differ from those other languages.

On the other hand, if we accept that blocks are delimited implicitly
in python then it seems python doesn't follow its own zen:

Explicit is better than implicit
 
Lie

Here's my two cents. First of all, a TV is a bad analogy compared to
reusable software libraries. Really bad analogy. A TV is a horribly
complicated device which has to be dumbed down because otherwise it
would be too hard to use for ordinary people.

I think it's actually quite a good analogy: a class may get quite
complicated, and it may do completely different things than the public
interface seems to imply. Anyway, an analogy is an analogy; don't
expect it to be exactly the same as the case itself, expect it to
illustrate the point well enough, and ignore the differences it does
not illustrate.

The TV is a good analogy since it illustrates the point quite well:
there are some things users may freely interact with, some that users
should not mess with, and some that are strictly not for them.
Nevertheless, with the proper knowledge and proper tools, any user
could get the special screwdriver and open the TV's case; if all else
fails, he could always take a hammer, break the casing, and get in
that way.

Python does not enforce data hiding because it expects the people who
ignore the warning and get the special screwdriver to be
knowledgeable enough to mess with the insides. C/C++ expects people to
use a hammer to break through the casing, and in the end, since the
casing has already been broken, the device may never look as
beautiful as before. In Python, the device may still look just as
beautiful.

A software developer's relation to a third-party library is more
similar to a TV repair man trying to repair a TV than to a random
person watching TV. For a repair man, the front panel is just useless
and in the way.

No, a knowledgeable man (a TV repairman) would first try to fix the
TV without opening the case (such as checking whether the power cable
is actually plugged in), and only if those attempts fail (or if he
already knows from the beginning where the damage is) would he undo
the screws. The public interface isn't "useless and in the way".

Oh, and to continue on the TV analogy, one of the reasons why a TV is
complicated is because its interface is totally different from its
implementation. Channels are just a bad abstraction for tuning the
receiver to different frequencies and for switching inputs. Merely
using a TV doesn't teach you anything about how it actually works.

Why couldn't a class have an interface that's a completely different
thing from its implementation?

KISS: Keep It Simple Stupid. And it is always simpler to not implement
the gunk needed for data hiding than to do it. By keeping things
simple you keep your code easy to implement, easy to understand and
easy to reuse.

Data hiding sacrifices implementation simplicity supposedly to make
the interface simpler and to keep backwards compatibility. It allows
you to change implementation details without affecting the
interface. But do you really want to do that? Consider this silly Java
example:

    class Foo {
        private int bar;
        public int getBar() {
            return bar;
        }
    };

Then for some reason you decide that hm, "bar" is not a good attribute
name so you change it to "babar". And you can do that without changing
the public interface! Woho! So now you have a public getter named
"getBar" that returns an attribute named "babar". That's in reality
just bad and whoever is maintaining the implementation is going to be
annoyed that the getter's name doesn't match the attribute name.

What would have happened without data hiding? Renaming the public
attribute "bar" to "babar" would probably cause some grief for someone
reusing your library, but you would keep your implementation pure.

What about semantic changes? Data hiding doesn't protect you against
that, so you'll have to change your interface anyway. The interface
for a car hasn't changed much in the last 100 years, but the
implementation has. How easy is it to repair a car nowadays compared
to 30 years ago?

And data hiding as a documentation aid is just a sham. "These methods
are public so you can call them, these aren't so hands off!" A reuser
of your library *will* want to know what happens on the inside, by
trying to make stuff impossible to reach you are just making that kind
of information much harder to come by.

And we expect those people to be prepared for the car blowing up right
in their faces, since they have crossed the line. If they crossed the
line and still think that we're to blame for their burnt faces, that's
their problem.
 
Giuseppe Ottaviano

Hmm, difficult to react to this. On the one hand I have had people
argue that block delimiting in python is explicit too. So in that
case python doesn't differ from those other languages.

On the other hand, if we accept that blocks are delimited implicitly
in python then it seems python doesn't follow its own zen:

Explicit is better than implicit

So is duck typing also against Python's philosophy? :)
 
Lie

Actually, 'data hiding', although vastly overused by the static crowd,
can be a reasonable thing to want.

For example, at Resolver Systems we expose the spreadsheet object
model to our users. It has a public, documented API, plus a host of
undocumented internally used methods.

We would really *much* rather hide these, because anything our
customers start using (whether documented or not) we will probably
have to continue supporting and maintaining.

The 'we told you not to use that' approach, when applied to paying
customers doesn't really work... all they see is that you broke their
spreadsheet code by changing your API.

The problem is that you're not being firm enough; you let yourself be
enslaved by your customers. If they have a problem because they used a
private interface, that's their problem: they have to fix it on their
side or go away and use a competing product[1]. Even if they're paying
customers, they're not your master or your God, even if they're a
larger company than yours.

Python has an extremely good design because the BDFL doesn't just
listen to everyone and create a product that tries to please
everybody. No, he listens to those who have good ideas, tells the
stupid ideas to go away, and makes a subjective decision which
more often than not leads to a better Python.

[1] In most cases, they would go quiet at this point and fix
their code, because they know there is nothing they can do to change
your decision. It's often more expensive to move to a competing
product, so they'd either use old versions or fix those places where
they've used a private interface, and avoid using private interfaces
in the future.

You can make members truly private by proxying, but it is a bit
ungainly.

Michael Foord
http://www.ironpythoninaction.com/
(snip)
 
greg

Others have already answered this directly, but I'd like to mention
that languages I know of which have this feature also have a feature
for getting around it. (e.g. C++ and friend classes) I don't know
about you, but I don't want features in the language that make me want
to circumvent them. Do you?

I'm curious as to how 'private' fits with the Open Source philosophy.
Sure, I can (and do) hide stuff with the double underscore technique,
but anyone using my code can open it up and add an "accessor" method
anytime they want, so nothing is really hidden. I think the
"consenting adults" approach is the best one could hope for with FOSS.
 
Bruno Desthuilliers

Russ P. wrote:
I think you're missing the point.

As I see it, the primary value of data hiding is that it provides
useful information on which data and methods are intended for the
client and which are intended for internal use. It's like putting a
front panel on a TV set with the main controls intended for the
viewer.

People seem to be preoccupied with whether or not the back panel of
the TV is locked, but that is not the main issue. Sure, you probably
want to make the back panel removable, but you don't want the viewer
opening it up to change the channel, and you certainly don't want to
put all the internal adjustments for factory technicians together with
the controls for the end user.

As far as I am concerned, the current Python method of using
underscores to distinguish between internal and external methods and
data is an ugly hack that goes completely against the elegance of the
language in other areas.

As far as I'm concerned, it's JustFine(tm). I don't have to ask myself
if an attribute is part of the API or not, I know it immediately.

It is like a TV set with no back cover and
the volume and channel controls intermingled with the factory
controls. The underscores are just an afterthought like a red dot or
something used to tell the TV viewer what to fiddle with.

Your opinion. But beware of leaky TV-Set-metaphor abstractions.

Python is a very nice language overall, but as far as I am concerned
the underscore convention is a blemish. I just wish people wouldn't
get so infatuated with the language that they cannot see the obvious
staring them in the face.

I definitely don't have a problem with this naming convention, which
I'd find useful even with a language having enforced access
restrictions. If that's the only - or worst - wart you find in Python,
then it must surely be a pretty good language !-)
 
sturlamolden

first, python is one of my fav languages, and i'll definitely keep
developing with it. But, there's one thing that I -really- miss:
data hiding. I know member vars are private when you prefix them with
2 underscores, but I hate prefixing my vars, I'd rather add a keyword
before it.

Python has no data hiding because C++ has (void *).

Python's underscores do some name mangling, but do not attempt any
data hiding.
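
To make the name-mangling point concrete, here is a minimal sketch. The
`Account` class and its `__balance` attribute are made up for
illustration. Inside a class body, a name with two leading underscores
is rewritten to `_ClassName__name`, which avoids accidental clashes in
subclasses but hides nothing:

```python
class Account:
    def __init__(self, balance):
        # Inside the class body, __balance is stored as _Account__balance.
        self.__balance = balance

    def balance(self):
        return self.__balance


acct = Account(100)

# The unmangled name does not exist on the instance...
try:
    acct.__balance
    plain_name_visible = True
except AttributeError:
    plain_name_visible = False

# ...but the mangled name is an ordinary attribute anyone can read or set.
mangled_value = acct._Account__balance
acct._Account__balance = 5
```

After this runs, `acct.balance()` reflects the value written from
outside, so the double underscore is a convention with a speed bump,
not an access restriction.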

Python and C have about the same approach to data hiding. It is well
tried, and works equally well in both languages:

# this is mine, keep your filthy paws off!!!

Irresponsible programmers should not be allowed near a computer
anyway. If you use data hiding to protect your code from yourself,
what you really need is some time off to reconsider your career.
 
Richard Levasseur

Here's my two cents. First of all, a TV is a bad analogy compared to
reusable software libraries. Really bad analogy. A TV is a horribly
complicated device which has to be dumbed down because otherwise it
would be too hard to use for ordinary people.

A software developer's relation to a third-party library is more
similar to a TV repair man trying to repair a TV than to a random
person watching TV. For a repair man, the front panel is just useless
and in the way.

Oh, and to continue on the TV analogy, one of the reasons why a TV is
complicated is because its interface is totally different from its
implementation. Channels are just a bad abstraction for tuning the
receiver to different frequencies and for switching inputs. Merely
using a TV doesn't teach you anything about how it actually works.

KISS: Keep It Simple Stupid. And it is always simpler to not implement
the gunk needed for data hiding than to do it. By keeping things
simple you keep your code easy to implement, easy to understand and
easy to reuse.

Data hiding sacrifices implementation simplicity supposedly to make
the interface simpler and to keep backwards compatibility. It allows
you to change implementation details without affecting the
interface. But do you really want to do that? Consider this silly Java
example:

    class Foo {
        private int bar;
        public int getBar() {
            return bar;
        }
    };

Then for some reason you decide that hm, "bar" is not a good attribute
name so you change it to "babar". And you can do that without changing
the public interface! Woho! So now you have a public getter named
"getBar" that returns an attribute named "babar". That's in reality
just bad and whoever is maintaining the implementation is going to be
annoyed that the getter's name doesn't match the attribute name.

What would have happened without data hiding? Renaming the public
attribute "bar" to "babar" would probably cause some grief for someone
reusing your library, but you would keep your implementation pure.

What about semantic changes? Data hiding doesn't protect you against
that, so you'll have to change your interface anyway. The interface
for a car hasn't changed much in the last 100 years, but the
implementation has. How easy is it to repair a car nowadays compared
to 30 years ago?

And data hiding as a documentation aid is just a sham. "These methods
are public so you can call them, these aren't so hands off!" A reuser
of your library *will* want to know what happens on the inside, by
trying to make stuff impossible to reach you are just making that kind
of information much harder to come by.

The better method is to just write proper docstrings that tell the
user what the methods do and when they can be called.

Another good way to see how useless data hiding is, is to try and unit
test a very encapsulated library. You'll see that it is almost
impossible to write good unit tests unless you publicly export
almost everything in the code. At which point you come to realize that
all the data hiding was for naught.

I really like this message and find it very true. Writing unit tests
for private data is nigh impossible. You end up either creating
accessors, or passing in parameters via the constructor (resulting in
a huge constructor). Personally, I'd rather have better test coverage
than data hiding.

Second, private vars with third party libs suck, and are nothing but
an infuriating frustration. I'm currently dealing with about 3 or 4
different libs; one of them uses private variables and it's a huge
headache. I have to access some of those private vars occasionally to
make my thing work. The other libs I'm using don't have any private
vars (__) (only a couple protected ones, _), and it's a breeze. The
docs say "this does x" or there's a comment that says "don't use this
unless you really know what you're doing," and I respect their
warnings.

When I was fooling around with sqlalchemy, it made heavy use of
protected vars but had a straightforward public api. Unfortunately,
writing plugins for it required access to some of those protected
vars. It wouldn't be possible if they were strictly controlled and
restricted by the language itself. Whenever I'd use those protected
vars, I expected an odd behavior or two. When using private vars, I
don't expect it to work at all, and really, refrain from using them
unless I've grokked the source.

My point is that I currently like the private/protected/public scheme
python has going on. It lets me fix or alter things if I have to, but
also provides a warning that I shouldn't be doing this.

As for customers using the internals and worrying about an upgrade
breaking them, it seems like a silly issue, at least in python. If
there are internals that the customer would be playing with, then they
should be exposed publicly, since they want it that way to begin
with. If they're using defunct variables or methods, you use
properties and __getattr__ to maintain backwards compatibility for a
version or two.
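
A minimal sketch of that last idea, with a hypothetical `Widget` class
(the names `bar`, `babar` and `old_size` are invented here, not taken
from any real library). A property keeps one retired name alive, and
`__getattr__`, which Python only calls when normal lookup fails, covers
any other renamed attributes:

```python
import warnings


class Widget:
    # Hypothetical map of retired attribute names to their replacements.
    _RENAMED = {"old_size": "babar"}

    def __init__(self):
        self.babar = 42  # the new attribute name

    @property
    def bar(self):
        """Deprecated alias for 'babar', kept for a version or two."""
        warnings.warn("'bar' is deprecated, use 'babar'",
                      DeprecationWarning, stacklevel=2)
        return self.babar

    def __getattr__(self, name):
        # Only reached when regular attribute lookup has already failed.
        new_name = type(self)._RENAMED.get(name)
        if new_name is None:
            raise AttributeError(name)
        warnings.warn(f"{name!r} is deprecated, use {new_name!r}",
                      DeprecationWarning, stacklevel=2)
        return getattr(self, new_name)
```

Old client code that reads `widget.bar` or `widget.old_size` keeps
working and gets a `DeprecationWarning`, while unknown names still
raise `AttributeError` as usual.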
 
sturlamolden

I think you completely missed the point.

This is just a proof of concept thing. In a real example there would
of course be no Set and Get methods but just methods that in the course
of their execution would access or update the hidden attributes.

I have to agree with Banks here, you have not provided an example of
data hiding. It does not discriminate between attribute access from
within and from outside the class. You just assume that the attribute
named 'hidden' will be left alone. Also naming it hidden is stupid as
it is visible.

What you need is a mechanism that will throw an exception whenever an
attribute is accessed from outside the class, but not from inside.

The mechanism must also be impossible to override with additional
code.

If Ada is what you want, Ada is what you should use.
 
Russ P.

And even more (well, almost all) languages use explicit delimiters for
defining blocks instead of indentation, so what's your point ?

You are comparing a syntactic convention with a more fundamental
aspect of the language. But beyond that, I dislike braces as
delimiters for the same reason I dislike leading underscores: both are
unnecessary syntactic noise. And the whole idea of encoding properties
of an object in its name just seems tacky to me.

What is it about leading underscores that bothers me? To me, they are
like a small pebble in your shoe while you are on a hike. Yes, you can
live with it, and it does no harm, but you still want to get rid of it.
 
Russ P.

I really like this message and find it very true. Writing unit tests
for private data is nigh impossible. You end up either creating
accessors, or passing in parameters via the constructor (resulting in
a huge constructor). Personally, I'd rather have better test coverage
than data hiding.

Second, private vars with third party libs suck, and are nothing but
an infuriating frustration. I'm currently dealing with about 3 or 4
different libs; one of them uses private variables and it's a huge
headache. I have to access some of those private vars occasionally to
make my thing work. The other libs I'm using don't have any private
vars (__) (only a couple protected ones, _), and it's a breeze. The
docs say "this does x" or there's a comment that says "don't use this
unless you really know what you're doing," and I respect their
warnings.

When I was fooling around with sqlalchemy, it made heavy use of
protected vars but had a straightforward public api. Unfortunately,
writing plugins for it required access to some of those protected
vars. It wouldn't be possible if they were strictly controlled and
restricted by the language itself. Whenever I'd use those protected
vars, I expected an odd behavior or two. When using private vars, I
don't expect it to work at all, and really, refrain from using them
unless I've grokked the source.

My point is that I currently like the private/protected/public scheme
python has going on. It lets me fix or alter things if I have to, but
also provides a warning that I shouldn't be doing this.

As for customers using the internals and worrying about an upgrade
breaking them, it seems like a silly issue, at least in python. If
there are internals that the customer would be playing with, then they
should be exposed publicly, since they want it that way to begin
with. If they're using defunct variables or methods, you use
properties and __getattr__ to maintain backwards compatibility for a
version or two.

If you think that private data and methods should not be allowed
because they complicate unit testing, then I suggest you take a look
at how unit testing is done in C++, Java, and Ada. They seem to do
just fine. Also, I have stated several times now that "back door"
access should be allowed. That should satisfy any need for access to
"private" data in unit testing.

But I think there is a more fundamental issue here. You complain about
problems with software that uses data encapsulation. So two
possibilities exist here: either the designers of the code were not
smart enough to understand what data or methods the client would need,
or the client is not smart enough to understand what they need. Maybe
the solution is smarter programmers and clients rather than a dumber
language.
 
alex23

If you think that private data and methods should not be allowed
because they complicate unit testing, then I suggest you take a look
at how unit testing is done in C++, Java, and Ada. They seem to do
just fine.

Nice to put the burden of evidence back onto everyone else, but doing
a bit of searching I found the following "answers" to the question of
unit-testing private functions & methods:

I suggest that tests should be written only for the public methods.

You can use a debugger, probably Carbide. That way you can see
all the variables. Otherwise, write the values to a log or EMCT.
You can make the logging only happen for debug builds if you don't
want the logging in the production code. If you really need to
see the private variables from your code, declare them public in
debug builds.

Problem is testing private functions. Some can be fixed by
promoting private to protected, inheriting the class adding
testing in the class. Others get refactored out the classes
they reside in and get put into their own functor classes[...]

So the basic answers I'm seeing that "do just fine" are:

1. Don't test private functions.
2. Add functionality _to_ the private functions for testing.
3. Change the interface for the purpose of testing.

All of which seem exceptionally inefficient and run counter to the
whole purpose of unit testing.

But I think there is a more fundamental issue here. You complain about
problems with software that uses data encapsulation. So two
possibilities exist here: either the designers of the code were not
smart enough to understand what data or methods the client would need,
or the client is not smart enough to understand what they need. Maybe
the solution is smarter programmers and clients rather than a dumber
language.

This is the most ludicrous argument I've ever heard. Of _course_ we
can't predict every possible usage of our code that others might want
it for. If someone can easily extend code that I've written to improve
or increase its functionality, why would I want to prevent them from
doing so?

Then again, I tend to think of other programmers as "peers" rather
than clients. YMMV.
 
Russ P.

It seems you have a different idea of what unit testing is for from
me.

Isn't the entire point of encapsulation to separate internal
components from the external interface?

Why would a unit test, the whole purpose of which is to assert some
aspect of the external behaviour of the unit of code, care about how
that code unit is implemented internally?

If changing the internal, encapsulated components of a unit causes its
external behaviour to change, that's a bug; either in the change made
(it shouldn't have altered the external behaviour), or in the unit
test asserting the wrong thing (it shouldn't be asserting anything
about internal state of the code).

--
\ “Try to become not a man of success, but try rather to become |
`\ a man of value.” —Albert Einstein |
_o__) |
Ben Finney

Thank you. Let me just add that, as I said before, I think "private"
data (if it were added to Python) should be accessible through some
sort of "indirect" mechanism akin to the double-leading-underscore
rule. Then, even if it *is* needed for unit testing, it can be
accessed.

As for unit testing in C++, Java, and Ada, I confess I know nothing
about it, but I assume it gets done. Considering that Ada is used to
manage and control fighter jets, cruise missiles, and nuclear
arsenals, let's hope it gets done right.
 
Marc 'BlackJack' Rintsch

It seems you have a different idea of what unit testing is for from
me.

For me it's about finding bugs where documentation and implementation
disagree. And if you document private functions it makes sense to me to
also test if they work as documented. Because the official API relies on
the correct implementation of the private parts it uses under the hood.

Isn't the entire point of encapsulation to separate internal
components from the external interface?

Why would a unit test, the whole purpose of which is to assert some
aspect of the external behaviour of the unit of code, care about how
that code unit is implemented internally?

One part of writing unit tests is invoking functions with arguments that
you think are "corner cases". For example test if a function that takes a
list doesn't bomb out when you feed the empty list into it. Or if it
handles all errors correctly.

If a function `f()` calls internally `_g()` and that function might even
call other private functions, then you have to know how `f()` works
internally to create input that checks if error handling in `_g()` works
correctly. So it goes against your understanding of unit tests.

What do you do in such a situation? Build something from untested private
parts and just test the assembled piece? I prefer to test the private
functions too. After all, the private functions are not private to
everybody; there *are* functions that rely on them working correctly.

Ciao,
Marc 'BlackJack' Rintsch
 
Antoon Pardon

Python has an extremely good design because the BDFL doesn't just
listen to everyone and create a product that tries to please
everybody. No, he listens to those who have good ideas, tells the
stupid ideas to go away, and makes a subjective decision which
more often than not leads to a better Python.

I agree that Guido van Rossum has done an excellent job. That doesn't
mean he has to be painted as infallible, as if the ideas he accepts
are good ideas and those he rejects are bad ideas almost by definition.

Guido has been known to change his mind, which is an admirable quality,
but it does show that at some point he rejected a good idea or accepted
a bad idea.
 
Antoon Pardon

I have to agree with Banks here, you have not provided an example of
data hiding. It does not discriminate between attribute access from
within and from outside the class. You just assume that the attribute
named 'hidden' will be left alone. Also naming it hidden is stupid as
it is visible.

No, I don't assume that hidden will be left alone. hidden is a free
variable in a closure and thus simply can't be accessed except by
local functions that were made accessible (and some mechanism
dependent on the CPython implementation).
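
For readers who have not seen the closure trick being discussed, a
rough sketch follows. It is not the example from earlier in the thread
(that code is not quoted here); `make_counter` and `Counter` are
hypothetical names. The state lives in a local variable of the factory
function, so it never becomes an attribute of the returned object:

```python
def make_counter():
    hidden = 0  # free variable: lives in the closure, not on the instance

    class Counter:
        def increment(self):
            nonlocal hidden
            hidden += 1
            return hidden

    return Counter()


counter = make_counter()
first = counter.increment()

# There is no attribute to reach for from outside.
try:
    counter.hidden
    reachable_as_attribute = True
except AttributeError:
    reachable_as_attribute = False

# Function introspection can still get at the closure cell, which is
# presumably the kind of back door the parenthesis above refers to.
peeked = counter.increment.__func__.__closure__[0].cell_contents
```

So ordinary attribute access fails with `AttributeError`, while the
value remains reachable for anyone determined enough to inspect the
function object.
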
What you need is a mechanism that will throw an exception whenever an
attribute is accessed from outside the class, but not from inside.

And my example does this. It threw an AttributeError.

The mechanism must also be impossible to override with additional
code.

Which as far as I know it is.
 
Marc 'BlackJack' Rintsch

Marc 'BlackJack' Rintsch said:
It seems you [alex23] have a different idea of what unit testing
is for from me.

For me it's about finding bugs where documentation and
implementation disagree.

Where "documentation" is "specified externally-visible behaviour of
the unit", I agree with this.

And if you document private functions

By definition, "private" functions are not part of the publicly
documented behaviour of the unit. Any behaviour exhibited by some
private component is seen externally as a behaviour of some public
component.

But only indirectly, and it's often harder to predict the corner cases
that might trigger bugs or to test error handling in dependent private
functions. Private functions offer an API that's public to someone, so
they ought to be documented and tested.

If they affect the behaviour of some public component, that's where
the documentation should be.

As I said they are public themselves for someone.

Only to the extent that the documented behaviour of the API is
affected. The main benefit of marking something as "not public" is
that one *is* free to change its behaviour, so long as the public API
is preserved.

One more reason to test the individual private functions, because a change
of such a function shouldn't make it necessary to change the unit tests of
the public API.

This sounds like part of the externally-visible behaviour of the code
unit; i.e. something that belongs in the public API. I agree that this
is the domain of a unit test.


No, you don't need to know how it works internally; you need only know
what guarantees it must keep for its external behaviour.

How do you know the "corner cases" then? Often it is interesting how a
function that takes integers copes with zero, so that might be a test.
It's easy if you test a function directly but you need to know the
internals if you must find arguments that lead to a dependent function
called with zero. Contrived example:

    def _g(i):
        return (42 / i) if i else 0

    def f(x):
        return _g(x + 23)

Here ``f(-23)`` is a special corner case that should be tested somehow.
And I think it is better tested as explicit test of `_g()` than with a
test of `f()`. Testing for "corner cases" needs some knowledge about the
implementation, but that shouldn't be "transitive". The tests for `f()`
should assume that `_g()` itself has promised in its documentation was
already covered by a unit test.
If someone wants to alter the `_g()` function, or remove it entirely
while preserving the correct behaviour of `f()`, that should have no
effect on the external behaviour of `f()`.

That is to say, the knowledge of the "internals" of `f()` in your
example is actually knowledge of something that should be documented
as part of the external behaviour of `f()` — that, or it's not
relevant to the behaviour of `f()` and shouldn't be unit tested in
order that encapsulation is preserved.

Should `f()`'s documentation mention that "it works for all integers
including -23", with -23 explicitly mentioned?

Assert the corner-case behaviour of `f()`, through unit tests that
operate on `f()` without caring about its internals.

And this way missing many potential bugs?

Then for *those* interfaces, unit tests can be devised that make
assertions about those interfaces.

Now you lost me. So essentially you don't test private functions unless
they are used somewhere, then they should be tested too. As private
functions that are not used, shouldn't be there in the first place, every
function private or public should be tested, right!?

Ciao,
Marc 'BlackJack' Rintsch
 
