Python style: to check or not to check args and data members

Discussion in 'Python' started by Joel Hedlund, Sep 1, 2006.

  1. Joel Hedlund

    Hi!

    The question of type checking/enforcing has bothered me for a while, and
    since this newsgroup has a wealth of competence subscribed to it, I
    figured this would be a great way of learning from the experts. I feel
    there's a tradeoff between clear, easily readable and extensible code
    on one side, and safe code providing early errors and useful tracebacks
    on the other. I want both! How do you guys do it? What's the pythonic
    way? Are there any docs that I should read? All pointers and opinions
    are appreciated!

    I've also whipped up some examples in order to put the above questions
    in context and for your amusement. :)

    Briefly:

    class MyClass(object):
        def __init__(self, int_member=0):
            self.int_member = int_member
        def process_data(self, data):
            self.int_member += data

    The attached files are elaborations on this theme, with increasing
    security and, alas, rigidity and bloat. Even though
    maximum_security_module.py probably will be the safest to use, the
    coding style will bloat the code something awful and will probably make
    maintenance harder (please prove me wrong!). Where should I draw the line?

    These are the attached modules:

    * nocheck_module.py:
    As the above example, but with docs. No type checking.

    * property_module.py
    Type checking of data members using properties.

    * methodcheck_module.py
    Type checking of args within methods.

    * decorator_module.py
    Type checking of args using method decorators.

    * maximum_security_module.py
    Decorator and property type checking.

    Let's pretend I'm writing a script, I import one of the above modules
    and then execute the following code

    ....
    my_object = MyClass(data1)
    my_object.process_data(data2)

    and then let's pretend dataX is of a bad type, say for example str.

    nocheck_module.py
    =================
    Now, if data2 is bad, we get a suboptimal traceback (possibly to
    somewhere deep within the code, and probably with an unrelated error
    message). However, the first point of failure will in fact be included
    in the traceback, so this error should be possible to find with little
    effort. On the other hand, if data1 is bad, the exception will be raised
    somewhere past the point of first failure. The traceback will be
    completely off, and the error message will still be bad. Even worse: if
    both are bad, we won't even get an exception. We will trundle on with
    corrupted data and take no notice. Very clear code, though. Easily
    extensible.

    property_module.py
    ==================
    Here we catch that data1 failure. Tracebacks may still be inconcise with
    uninformative error messages, however they will not be as bad as in
    nocheck_module.py. Bloat. +7 or more lines of boilerplate code for each
    additional data member. Quite clear code. Readily extensible.

    methodcheck_module.py
    =====================
    Good, concise tracebacks with exact error messages. Lots of bloat and
    obscured code. Misses errors where data members are changed directly.
    Very hard to read and extend.

    decorator_module.py
    ===================
    Good, concise tracebacks with good error messages. Some bloat. Misses
    errors where data members are changed directly. Clear, but somewhat hard
    to extend. Decorators for *all* methods?! This cannot be the purpose of
    python!?

    maximum_security_module.py
    ==========================
    Good, concise tracebacks with good error messages. No errors missed (I
    think? :) . Bloat. Lots of decorators and boilerplate property code all
    over the place (thankfully not within functional code, though). Is this
    how it's supposed to be done?


    And if you've read all the way down here I thank you so very much for
    your patience and perseverance. Now I'd like to hear your thoughts on
    this! Where should the line be drawn? Should I just typecheck data from
    unreliable sources (users/other applications) and stick with the
    barebone strategy, or should I go all the way? Did I miss something
    obvious? Should I read some docs? (Which?) Are there performance issues
    to consider?

    Thanks again for taking the time.

    Cheers!
    /Joel Hedlund

    """Example module with method argument type checking within methods.

    Pros:
    Pinpointed tracebacks with very exact error messages.

    Cons:
    Lots of boilerplate typechecking code littered all over the place,
    obscuring functionality at the start of every function.
    Bloat will accumulate rapidly. +2 lines of boilerplate code per method and
    argument.
    If I at some point decide that floats are also ok, I'll need to crawl all
    over the code with a magnifying glass and a pair of tweezers.
    We don't catch errors of the type
        a = MyClass()
        a.int_member = 'moo!'
        a.process_data(1)

    """

    class MyClass(object):
        """My example class."""
        def __init__(self, int_member=0):
            """Instantiate a new MyClass object.

            IN:
            int_member = 0: <int>
                Set the value for the data member. Must be int.

            """
            # Boilerplate typechecking code.
            if not isinstance(int_member, int):
                raise TypeError("int_member must be int")
            # Initialization starts here. May for example contain assignment:
            self.int_member = int_member

        def process_data(self, data):
            """Do some data processing.

            IN:
            data: <int>
                New information that should be incorporated. Must be int.

            """
            # Boilerplate typechecking code.
            if not isinstance(data, int):
                raise TypeError("data must be int")
            # Data processing starts here. May for example contain addition:
            self.int_member += data

    # Test code. Uncomment to play. :)

    #a = MyClass('moo')
    #a = MyClass(9)
    #a.int_member = 'moo'
    #a.process_data('moo')
    #a.process_data(9)

    """Example module without type checking.

    Pros:
    Clean, easily readable and extensible code that gets down to business
    fast. If I at some point decide that floats are also ok, I only need to
    update the docs and all is well.
    No bloat.

    Cons:
    Type restrictions are not enforced. This means that if type errors occur,
    the exception may be raised far from the point of first failure, and
    possibly with long, inconcise tracebacks with uninformative error messages.

    """

    class MyClass(object):
        """My example class."""
        def __init__(self, int_member=0):
            """Instantiate a new MyClass object.

            IN:
            int_member = 0: <int>
                Set the value for the data member. Must be int.

            """
            # Initialization starts here. May for example contain assignment:
            self.int_member = int_member

        def process_data(self, data):
            """Do some data processing.

            IN:
            data: <int>
                New information that should be incorporated. Must be int.

            """
            # Data processing starts here. May for example contain addition:
            self.int_member += data

    # Test code. Uncomment to play. :)

    #a = MyClass('moo')
    #a = MyClass(9)
    #a.int_member = 'moo'
    #a.process_data('moo')
    #a.process_data(9)

    """Example module using properties for data member type checking.

    Pros:
    Quite clean, readable and extensible code that gets down to business fast.
    Data member type restrictions are enforced. If I at some point decide that
    floats are also ok, I only need to update the docs and a few more lines.

    Cons:
    Method argument types are not enforced, which means that tracebacks may
    still be inconcise with uninformative error messages. Not as bad as in
    nocheck_module.py though.
    Bloat. +7 or more lines of boilerplate code for each added data member (can
    this be done neater?). But at least the bloat is outside functional code.

    """

    class MyClass(object):
        """My example class."""
        def __init__(self, int_member=0):
            """Instantiate a new MyClass object.

            IN:
            int_member = 0: <int>
                Set the value for the data member. Must be int.

            """
            # Initialization starts here. May for example contain assignment:
            self.int_member = int_member

        def _get_int_member(self):
            return self.__int_member
        def _set_int_member(self, value):
            if not isinstance(value, int):
                raise TypeError("int_member must be type int")
            self.__int_member = value
        int_member = property(_get_int_member, _set_int_member)
        del _get_int_member, _set_int_member

        def process_data(self, data):
            """Do some data processing.

            IN:
            data: <int>
                New information that should be incorporated. Must be int.

            """
            # Data processing starts here. May for example contain addition:
            self.int_member += data

    # Test code. Uncomment to play. :)

    #a = MyClass('moo')
    #a = MyClass(9)
    #a.int_member = 'moo'
    #a.process_data('moo')
    #a.process_data(9)

    """Example module with method argument type checking using decorators.

    Pros:
    Clean, easily readable and extensible code that gets down to business
    fast.
    Pinpointed tracebacks with good error messages.
    If I at some point decide that floats are also ok, I only need to
    update the docs and change the decorators to
    @method_argtypes((int, float)).

    Cons:
    With many args and allowed types, the type definitions on the decorator
    lines will be hard to correlate to the args that they refer to (probably
    not impossible to work around though...?).
    We still don't catch errors of the type
        a = MyClass()
        a.int_member = 'moo!'
        a.process_data(1)
    A decorator for each method everywhere? That can't be the purpose of
    python!? There has to be a better way?!

    """

    def method_argtypes(*typedefs):
        """Rudimentary typechecker decorator generator.

        If you're really interested in this stuff, go check out Michele
        Simionato's decorator module instead. It rocks. Google is your friend.

        IN:
        *typedefs: <type> or <tuple <type>>
            The allowed types for each arg to the method, self excluded.
            Will be used with isinstance(), so valid typedefs include
            int or (int, float).

        """
        def argchecker(fcn):
            import inspect
            names = inspect.getargspec(fcn)[0][1:]
            def check_args(*args):
                for arg, value, allowed_types in zip(names, args[1:], typedefs):
                    if not isinstance(value, allowed_types):
                        one_of = ''
                        if hasattr(allowed_types, '__len__'):
                            one_of = "one of "
                        msg = ".%s() argument %r must be %s%s"
                        msg %= fcn.__name__, arg, one_of, allowed_types
                        raise TypeError(msg)
                return fcn(*args)
            return check_args
        return argchecker

    class MyClass(object):
        """My example class."""
        @method_argtypes(int)
        def __init__(self, int_member=0):
            """Instantiate a new MyClass object.

            IN:
            int_member = 0: <int>
                Set the value for the data member. Must be int.

            """
            # Initialization starts here. May for example contain assignment:
            self.int_member = int_member

        @method_argtypes(int)
        def process_data(self, data):
            """Do some data processing.

            IN:
            data: <int>
                New information that should be incorporated. Must be int.

            """
            # Data processing starts here. May for example contain addition:
            self.int_member += data

    # Test code. Uncomment to play. :)

    #a = MyClass('moo')
    #a = MyClass(9)
    #a.int_member = 'moo'
    #a.process_data('moo')
    #a.process_data(9)

    """Example module with decorator and property type checking.

    Pros:
    Clean, easily readable and extensible code that gets down to business
    fast.
    Pinpointed tracebacks with good error messages.
    Now we catch errors of the type
        a = MyClass()
        a.int_member = 'moo!'
        a.process_data(1)

    Cons:
    With many args and allowed types, the type definitions on the decorator
    lines will be hard to correlate to the args that they refer to (probably
    not impossible to work around though...?).
    A decorator for each method everywhere? That can't be the purpose of
    python!? There has to be a better way?!
    Property bloat. +7 or more lines of boilerplate code for each added data
    member (can this be done neater?).
    If I at some point decide that floats are also ok, I only need to
    update the docs, decorators and properties... hmm...

    """

    def method_argtypes(*typedefs):
        """Rudimentary typechecker decorator generator.

        If you're really interested in this stuff, go check out Michele
        Simionato's decorator module instead. It rocks. Google is your friend.

        IN:
        *typedefs: <type> or <tuple <type>>
            The allowed types for each arg to the method, self excluded.
            Will be used with isinstance(), so valid typedefs include
            int or (int, float).

        """
        def argchecker(fcn):
            import inspect
            names = inspect.getargspec(fcn)[0][1:]
            def check_args(*args):
                for arg, value, allowed_types in zip(names, args[1:], typedefs):
                    if not isinstance(value, allowed_types):
                        one_of = ''
                        if hasattr(allowed_types, '__len__'):
                            one_of = "one of "
                        msg = ".%s() argument %r must be %s%s"
                        msg %= fcn.__name__, arg, one_of, allowed_types
                        raise TypeError(msg)
                return fcn(*args)
            return check_args
        return argchecker

    class MyClass(object):
        """My example class."""
        @method_argtypes(int)
        def __init__(self, int_member=0):
            """Instantiate a new MyClass object.

            IN:
            int_member = 0: <int>
                Set the value for the data member. Must be int.

            """
            # Initialization starts here. May for example contain assignment:
            self.int_member = int_member

        def _get_int_member(self):
            return self.__int_member
        def _set_int_member(self, value):
            if not isinstance(value, int):
                raise TypeError("int_member must be type int")
            self.__int_member = value
        int_member = property(_get_int_member, _set_int_member)
        del _get_int_member, _set_int_member

        @method_argtypes(int)
        def process_data(self, data):
            """Do some data processing.

            IN:
            data: <int>
                New information that should be incorporated. Must be int.

            """
            # Data processing starts here. May for example contain addition:
            self.int_member += data

    # Test code. Uncomment to play. :)

    #a = MyClass('moo')
    #a = MyClass(9)
    #a.int_member = 'moo'
    #a.process_data('moo')
    #a.process_data(9)
     
    Joel Hedlund, Sep 1, 2006
    #1

  2. Robert Kern

    Joel Hedlund wrote:
    > Hi!
    >
    > The question of type checking/enforcing has bothered me for a while, and
    > since this newsgroup has a wealth of competence subscribed to it, I
    > figured this would be a great way of learning from the experts. I feel
    > there's a tradeoff between clear, easily readdable and extensible code
    > on one side, and safe code providing early errors and useful tracebacks
    > on the other. I want both! How do you guys do it? What's the pythonic
    > way? Are there any docs that I should read? All pointers and opinions
    > are appreciated!


    Short answer: Use Traits. Don't invent your own mini-Traits.

    (Disclosure: I work for Enthought.)

    http://code.enthought.com/traits/

    Unfortunately, I think the standalone tarball on that page, uh, doesn't stand
    alone right now. We're cleaning up the interdependencies over the next two
    weeks. Right now, your best bet is to get the whole enthought package:

    http://code.enthought.com/ets/

    Talk to us on enthought-dev if you need any help.

    https://mail.enthought.com/mailman/listinfo/enthought-dev


    Now back to Traits itself:

    Traits does quite a bit more than "type-checking," and I think that is the
    least useful feature it provides for Python users. Types are very
    frequently exactly the wrong thing you want to check for. They allow inputs that
    you would like to be invalid and disallow inputs that would have worked just
    fine if you had relied on duck-typing. In general terms, Traits does
    value-checking; it's just that some of the traits definitions check values by
    validating their types.

    You have to be careful with type-checking, because it can introduce fragility
    without enhancing safety. But sometimes you are working with other code that
    necessarily has type requirements (like extension code), and moving the
    requirements forward a bit helps build usable interfaces.

    Your examples would look like this with Traits:


    from enthought.traits.api import HasTraits, Int, method

    class MyClass(HasTraits):
        """My example class.
        """

        int_member = Int(0, desc="I am an integer")

        method(None, Int)
        def process_data(self, data):
            """Do some data processing.
            """

            self.int_member += data


    a = MyClass(int_member=9)
    a = MyClass(int_member='moo')
    """
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/Users/kern/svn/enthought-lib/enthought/traits/trait_handlers.py", line 172, in error
        raise TraitError, ( object, name, self.info(), value )
    enthought.traits.trait_errors.TraitError: The 'int_member' trait of a MyClass
    instance must be a value of type 'int', but a value of moo was specified.
    """

    # and similar errors for
    # a.int_member = 'moo'
    # a.process_data('moo')


    The method() function predates 2.4 and has not yet been converted to a
    decorator. We don't actually use it much.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Sep 1, 2006
    #2

  3. Bruno Desthuilliers

    Joel Hedlund wrote:
    > Hi!
    >
    > The question of type checking/enforcing has bothered me for a while,

    (snip)
    >
    > I've also whipped up some examples in order to put the above questions
    > in context and for your amusement. :)

    (snip)
    > These are the attached modules:
    >
    > * nocheck_module.py:
    > As the above example, but with docs. No type checking.
    >
    > * property_module.py
    > Type checking of data members using properties.
    >
    > * methodcheck_module.py
    > Type checking of args within methods.
    >
    > * decorator_module.py
    > Type checking of args using method decorators.
    >
    > * maximum_security_module.py
    > Decorator and property type checking.


    You forgot two other possible solutions (that can be mixed):
    - using custom descriptors
    - using FormEncode
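    For context, Bruno's first suggestion could be sketched roughly like this
    (my own illustration, not from the thread; the class name `Typed` and its
    storage scheme are invented). One reusable data descriptor replaces the +7
    lines of property boilerplate per checked member:

```python
class Typed(object):
    """Reusable data descriptor that typechecks a single attribute."""
    def __init__(self, name, allowed_types, default=None):
        self.name = name                  # key used in the instance __dict__
        self.allowed_types = allowed_types
        self.default = default
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)
    def __set__(self, obj, value):
        if not isinstance(value, self.allowed_types):
            raise TypeError("%s must be %s, got %r"
                            % (self.name, self.allowed_types, value))
        obj.__dict__[self.name] = value

class MyClass(object):
    """The thread's example class, one line of checking per data member."""
    int_member = Typed('int_member', int, 0)

    def __init__(self, int_member=0):
        self.int_member = int_member      # goes through Typed.__set__
    def process_data(self, data):
        self.int_member += data
```

    Both `MyClass('moo')` and `a.int_member = 'moo'` now raise TypeError at the
    point of first failure, and deciding later that floats are also ok means
    touching one line per member.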
     
    Bruno Desthuilliers, Sep 1, 2006
    #3
  4. Joel Hedlund

    > Short answer: Use Traits. Don't invent your own mini-Traits.

    Thanks for a quick and informative answer! I'll be sure to read up on the
    subject. (And also: thanks Bruno for your contributions!)

    > Types are very frequently exactly the wrong thing you want to check for.


    I see what you mean. Allowing several data types may generate unwanted side
    effects (integer division when expecting real division, for example).
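    Concretely (my own sketch, not from the thread): in the Python 2 of this
    era, `/` on two ints truncated, which Python 3 spells `//`. Passing ints
    where floats were expected silently changes the result without ever
    raising:

```python
def mean(values):
    # Truncating division: this is what Python 2's `/` did when both
    # operands happened to be ints -- no exception, just a different answer.
    return sum(values) // len(values)

print(mean([1, 2]))          # truncating: 1
print(sum([1.0, 2.0]) / 2)   # true division: 1.5
```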

    I understand that Traits can do value checking which is superior to what I
    presented, and that they can help me move validation away from functional
    code, which is always desirable. But there is still the problem of setting
    an appropriate level of validation.

    Should I validate data members only? This is quite easily done using Traits
    or some other technique and keeps validation bloat localized in the code.
    This is in line with the DRY principle and makes for smooth extensibility,
    but the tracebacks will be less useful.

    Or should I go the whole way and validate at every turn (all data members,
    every arg in every method, ...)? This makes for very secure code and very
    useful tracebacks, but does not feel very DRY to me... Are the benefits
    worth the costs? Do I build myself a fortress of unmaintainability this way?
    Will people laugh at my modules?

    Or taken to the other extreme: Should I simply duck-type everything, and
    only focus my validation efforts to external data (from users, external
    applications and other forces of evil). This solution makes for extremely
    clean code, but the thought of potential silent data corruption makes me
    more than a little queasy.

    What level do you go for?

    Thanks!
    /Joel

     
    Joel Hedlund, Sep 1, 2006
    #4
  5. Joel Hedlund

    Bruno >> Your email address seems to be wrong. I tried to reply to you
    directly in order to avoid thread bloat but my mail bounced.

    Thanks for the quick reply though. I've skimmed through some docs on your
    suggestions and I'll be sure to read up on them properly later. But as I
    said to Robert Kern in this thread, this does not really seem to resolve
    the problem of setting an appropriate level of validation.

    How do you do it? Please reply to the group if you can find the time.

    Cheers!
    /Joel Hedlund

     
    Joel Hedlund, Sep 1, 2006
    #5
  6. Bruno Desthuilliers

    Joel Hedlund wrote:
    >> Short answer: Use Traits. Don't invent your own mini-Traits.

    >
    > Thanks for a quick and informative answer! I'll be sure to read up on
    > the subject. (And also: thanks Bruno for your contributions!)
    >
    >> Types are very frequently exactly the wrong thing you want to check for.

    >
    > I see what you mean. Allowing several data types may generate unwanted
    > side effects (integer division when expecting real division, for example).
    >
    > I understand that Traits can do value checking which is superior to what
    > I presented, and that they can help me move validation away from
    > functional code, which is always desirable. But there is still the
    > problem of setting an appropriate level of validation.
    >
    > Should I validate data members only? This is quite easily done using
    > Traits or some other technique and keeps validation bloat localized in
    > the code. This is in line with the DRY principle and makes for smooth
    > extensibility, but the tracebacks will be less useful.
    >
    > Or should I go the whole way and validate at every turn (all data
    > members, every arg in every method, ...)? This makes for very secure


    ...and inflexible...

    > code and very useful tracebacks, but does not feel very DRY to me... Are
    > the benefits worth the costs? Do I build myself a fortress of
    > unmaintainability this way? Will people laugh at my modules?


    I'm not sure that trying to fight against the language is a sound
    approach, whatever the language. If dynamic typing gives you the creeps,
    then use a statically typed language - possibly with type-inference to
    keep as much genericity as possible.

    > Or taken to the other extreme: Should I simply duck-type everything, and
    > only focus my validation efforts to external data (from users, external
    > applications and other forces of evil).


    IMHO and according to my experience: 99% yes (there are a few corner
    cases where it makes sense to ensure args correctness - which may or may
    not imply type-checking). Packages like FormEncode are great for data
    conversion/validation. Once you have trusted data, the only possible
    problem is within your code.

    > This solution makes for
    > extremely clean code, but the thought of potential silent data
    > corruption makes me more than a little queasy.


    I've rarely encountered "silent" data corruption with Python - FWIW, I
    once had such a problem, but with a lower-level statically typed
    language (integer overflow), and I was a newbie programmer at that
    time. Usually, one *very quickly* notices when something goes wrong. Now
    if you're really serious, unit tests are the way to go - they can check
    for much more than just types.
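    Bruno's unit-test suggestion might look something like this for the
    thread's running example (a sketch using the standard `unittest` module;
    the test names are mine). The tests pin down behaviour, including that bad
    input fails loudly, without a single `isinstance()` call in the class
    itself:

```python
import unittest

class MyClass(object):
    """The thread's example class, duck-typed with no inline checks."""
    def __init__(self, int_member=0):
        self.int_member = int_member
    def process_data(self, data):
        self.int_member += data

class TestMyClass(unittest.TestCase):
    def test_process_data_accumulates(self):
        a = MyClass(9)
        a.process_data(1)
        self.assertEqual(a.int_member, 10)
    def test_bad_input_fails_loudly(self):
        # int + str raises TypeError, so corruption cannot pass silently.
        a = MyClass(9)
        self.assertRaises(TypeError, a.process_data, 'moo')

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMyClass)
assert unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```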

    My 2 cents.
    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    Bruno Desthuilliers, Sep 1, 2006
    #6
  7. Bruno Desthuilliers

    Joel Hedlund wrote:

    <OT>
    > Bruno >> Your email address seem to be wrong.


    let's say "disguised" !-)

    > I tried to reply to you
    > directly in order to avoid thread bloat but my mail bounced.


    I don't think it's a good idea anyway - this thread is on topic here and
    may be of interest to others too IMHO.

    And while we're at it : please avoid top-posting.
    </OT>


    > Thanks for the quick reply though. I've skimmed through some docs on
    > your suggestions and I'll be sure to read up on them properly later. But
    > as I said to Robert Kern in this thread, this does not really seem to
    > resolve the problem of setting an appropriate level of validation.


    The "appropriate" level of validation depends on the context. There's
    just no one-size-fits-all solution here. The only guideline I could come
    up with is to be paranoid about what comes from the outside world and
    mostly confident about what comes from other parts of the application.
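    That guideline can be sketched as a boundary parser (an illustration of
    mine, not from the thread; the function names and error messages are
    invented): be paranoid exactly once, where untrusted data enters, and let
    everything inside the trust boundary duck-type:

```python
def parse_count(raw):
    """Paranoid boundary: turn untrusted external input (user input, a
    config file, another application) into trusted data, failing early
    with a clear message."""
    try:
        count = int(raw)
    except (TypeError, ValueError):
        raise ValueError("count must be an integer, got %r" % (raw,))
    if count < 0:
        raise ValueError("count must be >= 0, got %d" % count)
    return count

def process(count):
    # Inside the trust boundary: no checks, plain duck typing.
    return count * 2

print(process(parse_count("21")))   # -> 42
```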

    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    Bruno Desthuilliers, Sep 1, 2006
    #7
  8. Joel Hedlund

    > And while we're at it : please avoid top-posting.

    Yes, that was sloppy. Sorry.

    /Joel
     
    Joel Hedlund, Sep 1, 2006
    #8
  9. Joel Hedlund

    > I'm not sure that trying to fight against the language is a sound
    > approach, whatever the language.


    That's the very reason I posted in the first place. I feel like I'm fighting
    the language, and since python at least to me seems to be so well thought
    out in all other aspects, the most obvious conclusion must be that I'm
    thinking about this the wrong way. And that's why I need your input!

    >> Or taken to the other extreme: Should I simply duck-type everything, and
    >> only focus my validation efforts to external data (from users, external
    >> applications and other forces of evil).

    >
    > IMHO and according to my experience : 99% yes (there are few corner
    > cases where it makes sens to ensure args correctness - which may or not
    > imply type-checking). Packages like FormEncode are great for data
    > conversion/validation. Once you have trusted data, the only possible
    > problem is within your code.


    That approach is quite in line with the "blame yourself" methodology, which
    seems to work in most other circumstances. Sort of like, developers who feed
    bad data into my code have only themselves to blame! I can dig that. :)

    Hmmm... So. I should build grimly paranoid parsers for external data, use
    duck-typed interfaces everywhere on the inside, and simply callously
    disregard developers who are disinclined to read documentation? I could do that.

    > if you're really serious, unit tests is the way to go - they can check
    > for much more than just types.


    Yes, I'm very much serious indeed. But I haven't done any unit testing. I'll
    have to check into that. Thanks!

    > My 2 cents.


    Thankfully received and collecting interest as we speak.

    Cheers!
    /Joel
     
    Joel Hedlund, Sep 1, 2006
    #9
  10. Bruno Desthuilliers

    Joel Hedlund wrote:
    >> I'm not sure that trying to fight against the language is a sound
    >> approach, whatever the language.

    >
    > That's the very reason I posted in the first place. I feel like I'm
    > fighting the language, and since python at least to me seems to be so
    > well thought out in all other aspects, the most obvious conclusion must
    > be that I'm thinking about this the wrong way. And that's why I need
    > your input!


    The first thing I tried to do when I discovered Python (coming from
    statically typed languages) was to try to force-fit it into static typing.
    Then I realized that there was a whole lot of non-trivial Python apps
    and libs that just worked, which made me think about the real
    usefulness of static typing - which is mainly to provide optimisation
    hints for the machine. As you probably noticed, declarative static typing imposes much
    boilerplate and somewhat arbitrary restrictions, and I still wait for a
    proof that it leads to more robust programs - FWIW, MVHO is that it
    usually leads to more complex - hence potentially less robust - code.

    >>> > Or taken to the other extreme: Should I simply duck-type
    >>> everything, and
    >>> > only focus my validation efforts to external data (from users,
    >>> external
    >>> > applications and other forces of evil).

    >
    >> IMHO and according to my experience: 99% yes (there are a few corner
    >> cases where it makes sense to ensure args correctness - which may or
    >> may not imply type-checking). Packages like FormEncode are great for data
    >> conversion/validation. Once you have trusted data, the only possible
    >> problem is within your code.

    >
    > That approach is quite in line with the "blame yourself" methodology,
    > which seems to work in most other circumstances. Sort of like,
    > developers who feed bad data into my code have only themselves to blame!


    As long as your code is correctly documented, yes. All attempts to write
    idiot-proof library code have failed so far AFAICT, so just let idiots
    suffer from their idiocy and focus on providing good tools to normal
    programmers. My own philosophy, of course...

    > I can dig that. :)
    >
    > Hmmm... So. I should build grimly paranoid parsers for external data,


    Most of the time, you'll find they already exist. FormEncode is not
    just for html forms - it's a general, powerful and flexible (but alas
    very badly documented) bidirectional data converter/validator.

    > use duck-typed interfaces everywhere on the inside,


    Talking about interfaces, you may want to have a look at PyProtocols
    (PEAK) and Zope3 Interfaces.

    > and simply callously
    > disregard developers who are disinclined to read documentation?


    As long as you provide a usable documentation, misuse of your code is
    not your problem anymore (unless of course you're the one misusing it !-).

    > I could
    > do that.
    >
    >> if you're really serious, unit tests is the way to go - they can check
    >> for much more than just types.

    >
    > Yes, I'm very much serious indeed. But I haven't done any unit testing.


    Then you probably want to read the relevant chapter in DiveIntoPython.
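    To make that concrete, a minimal unittest sketch against the `MyClass`
    example that opened this thread might look like this (the test names and
    structure here are illustrative, not from the thread):

    ```python
    import unittest


    class MyClass:
        """The running example from this thread: unchecked, duck-typed."""

        def __init__(self, int_member=0):
            self.int_member = int_member

        def process_data(self, data):
            self.int_member += data


    class MyClassTests(unittest.TestCase):
        def test_initial_value(self):
            self.assertEqual(MyClass(5).int_member, 5)

        def test_process_data_accumulates(self):
            obj = MyClass()
            obj.process_data(3)
            obj.process_data(4)
            self.assertEqual(obj.int_member, 7)

        def test_bad_type_fails_loudly(self):
            # Even without explicit checks, duck typing raises a clear
            # TypeError at the point of misuse: 0 + "oops" is illegal.
            self.assertRaises(TypeError, MyClass().process_data, "oops")


    if __name__ == "__main__":
        unittest.main(argv=["mytests"], exit=False, verbosity=2)
    ```

    Note that the tests check behaviour (including failure behaviour), not
    just types - which is the point Bruno makes above.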

    HTH
    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    Bruno Desthuilliers, Sep 1, 2006
    #10
  11. Joel Hedlund

    Joel Hedlund Guest

    > I still wait for a
    > proof that it leads to more robust programs - FWIW, MVHO is that it
    > usually leads to more complex - hence potentially less robust - code.


    MVHO? I assume you are not talking about Miami Valley Housing Opportunities
    here, but bloat probably leads to bugs, yes.

    > Talking about interfaces, you may want to have a look at PyProtocols
    > (PEAK) and Zope3 Interfaces.


    Ooh. Neat.

    > As long as you provide a usable documentation, misuse of your code is
    > not your problem anymore (unless of course you're the one misusing it !-).


    But hey, then I'm still just letting idiots suffer from their idiocy, and
    since that's part of our greater plan anyway I guess that's ok :-D

    > Then you probably want to read the relevant chapter in DiveIntoPython.


    You are completely correct. Thanks for the tip.

    Thanks for your help! It's been real useful. Now I'll sleep better at night.

    Cheers!
    /Joel
     
    Joel Hedlund, Sep 1, 2006
    #11
  12. Joel Hedlund wrote:
    >> I still wait for a
    >> proof that it leads to more robust programs - FWIW, MVHO is that it
    >> usually leads to more complex - hence potentially less robust - code.

    >
    > MVHO? I assume you are not talking about Miami Valley Housing
    > Opportunities here,


    Nope --> My Very Humble Opinion

    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    Bruno Desthuilliers, Sep 1, 2006
    #12
  13. Joel Hedlund

    Paddy Guest

    Joel Hedlund wrote:
    >
    > Hmmm... So. I should build grimly paranoid parsers for external data, use
    > duck-typed interfaces everywhere on the inside, and simply callously
    > disregard developers who are disinclined to read documentation? I could do that.
    >
    > > if you're really serious, unit tests is the way to go - they can check
    > > for much more than just types.

    >
    > Yes, I'm very much serious indeed. But I haven't done any unit testing. I'll
    > have to check into that. Thanks!
    >


    You might try doctests, they can be easier to write and fit into the
    unit test framework if needed.
    http://en.wikipedia.org/wiki/Doctest
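    A minimal doctest sketch (the function `add_counts` is illustrative, not
    from the thread): the interactive examples in the docstring double as
    tests, so the docs and the checks cannot drift apart.

    ```python
    import doctest


    def add_counts(a, b):
        """Add two counts together.

        The examples below are executed by doctest and their output is
        compared against this transcript.

        >>> add_counts(2, 3)
        5
        >>> add_counts(0, 0)
        0
        """
        return a + b


    if __name__ == "__main__":
        # Silent on success; reports any example whose output differs.
        doctest.testmod()
    ```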

    - Paddy.
     
    Paddy, Sep 1, 2006
    #13
  14. Joel Hedlund

    Joel Hedlund Guest

    > You might try doctests, they can be easier to write and fit into the
    > unit test framework if needed.


    While I firmly believe in keeping docs up to date, I don't think that
    doctests alone can solve the problem of maintaining data integrity in
    projects with more complex interfaces (which is what I really meant to
    talk about. Sorry if my simplified examples led you to believe
    otherwise). For simple, deterministic functions like math.pow I think
    it's great, but for something like BaseHTTPServer... probably not. The
    __doc__'s required would be truly fascinating to behold. And probably
    voluminous and mostly unreadable for humans. Or is there something that
    I've misunderstood?

    /Joel
     
    Joel Hedlund, Sep 1, 2006
    #14
  15. Joel Hedlund

    Paddy Guest

    Joel Hedlund wrote:
    > > You might try doctests, they can be easier to write and fit into the
    > > unit test framework if needed.

    >
    > While I firmly believe in keeping docs up to date, I don't think that
    > doctests alone can solve the problem of maintaining data integrity in
    > projects with more complex interfaces (which is what I really meant to
    > talk about. Sorry if my simplified examples led you to believe
    > otherwise). For simple, deterministic functions like math.pow I think
    > it's great, but for something like BaseHTTPServer... probably not. The
    > __doc__'s required would be truly fascinating to behold. And probably
    > voluminous and mostly unreadable for humans. Or is there something that
    > I've misunderstood?
    >
    > /Joel


    Oh, I was just addressing your bit about not knowing unit tests.
    Doctests can be quicker to put together and have only a small learning
    curve.
    On the larger scale, I too advocate extensive checking of 'tainted'
    data from 'external' sources, then assuming 'clean' data is as expected
    and doing no further explicit data checks; after all, you've got to
    trust your development team/yourself.

    - Pad.
     
    Paddy, Sep 1, 2006
    #15
  16. Joel Hedlund

    Joel Hedlund Guest

    > Oh, I was just addressing your bit about not knowing unit tests.
    > Doctests can be quicker to put together and have only a small learning
    > curve.


    OK, I see what you mean. And you're right. I'm struggling mightily right
    now with trying to come up with sane unit tests for a bunch of
    generalized parser classes that I'm about to implement, and which are
    supposed to play nice with each other... Gah! But I'll get there
    eventually... :)

    > On the larger scale, I too advocate extensive checking of 'tainted'
    > data from 'external' sources, then assuming 'clean' data is as expected
    > and doing no explicit further data checks, after all, you've got to
    > trust your development team/yourself.


    Right.

    Thanks for helpful tips and insights, and for taking the time!

    Cheers!
    /Joel
     
    Joel Hedlund, Sep 2, 2006
    #16
  17. Joel Hedlund

    Paul Rubin Guest

    Bruno Desthuilliers <> writes:
    > I've rarely encountered "silent" data corruption with Python - FWIW, I
    > once had such a problem, but with a lower-level statically typed
    > language (integer overflow), and I was a very newbie programmer by that
    > time. Usually, one *very quickly* notices when something goes wrong.


    The same thing can happen in Python, and the resulting bugs can be
    pretty subtle. I noticed the following example as the result of
    another thread, which was about how to sort an 85 gigabyte file.
    Try to put a slice interface on a file-based object and you can
    hit strange integer-overflow bugs once the file gets larger than 2GB:

    Python 2.3.4 (#1, Feb 2 2005, 12:11:53)
    [GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print slice(0, 3**33)

    slice(0, 5559060566555523L, None) # OK ...

    So we expect slicing with large args to work properly. But then:

    >>> class A:

    ...     def __getitem__(self, s):
    ...         print s
    ...
    >>> a = A()
    >>> a[0:3**33]

    slice(0, 2147483647, None) # oops!!!!
    >>>
     
    Paul Rubin, Sep 3, 2006
    #17
  18. Paul Rubin a écrit :
    > Bruno Desthuilliers <> writes:
    >
    >>I've rarely encountered "silent" data corruption with Python - FWIW, I
    >>once had such a problem, but with a lower-level statically typed
    >>language (integer overflow), and I was a very newbie programmer by that
    >>time. Usually, one *very quickly* notices when something goes wrong.

    >
    >
    > The same thing can happen in Python, and the resulting bugs can be
    > pretty subtle. I noticed the following example as the result of
    > another thread, which was about how to sort an 85 gigabyte file.
    > Try to put a slice interface on a file-based object and you can
    > hit strange integer-overflow bugs once the file gets larger than 2GB:
    >
    > Python 2.3.4 (#1, Feb 2 2005, 12:11:53)
    > [GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> print slice(0, 3**33)

    > slice(0, 5559060566555523L, None) # OK ...
    >
    > So we expect slicing with large args to work properly. But then:
    >
    > >>> class A:

    > ...     def __getitem__(self, s):
    > ...         print s
    > ...
    > >>> a = A()
    > >>> a[0:3**33]

    > slice(0, 2147483647, None) # oops!!!!
    > >>>


    Looks like a Python bug, not a programmer error. And BTW, it doesn't
    happen with >= 2.4.1:

    Python 2.4.1 (#1, Jul 23 2005, 00:37:37)
    [GCC 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)] on
    linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print slice(0, 3**33)

    slice(0, 5559060566555523L, None)
    >>> class A(object):

    ...     def __getitem__(self, s):
    ...         print s
    ...
    >>> A()[0:3**33]

    slice(0, 5559060566555523L, None)
    >>>
     
    Bruno Desthuilliers, Sep 3, 2006
    #18
  19. Jean-Paul Calderone <> wrote:
    ...
    > > >>> class A(object):


    note that A is new-style...

    > >>> class x:


    ....while x is old-style.

    Here's a small script to explore the problem...:

    import sys

    class oldstyle:
    def __getitem__(self, index): print index,

    class newstyle(object, oldstyle): pass

    s = slice(0, 3**33)

    print sys.version[:5]
    print 'slice:', s
    print 'old:',
    oldstyle()[s]
    oldstyle()[:3**33]
    oldstyle()[:3**33:1]
    print
    print 'new:',
    newstyle()[s]
    newstyle()[:3**33]
    newstyle()[:3**33:1]
    print

    Running this on 2.3.5, 2.4.3, 2.5c1, 2.6a0, the results are ALWAYS:

    2.5c1
    slice: slice(0, 5559060566555523L, None)
    old: slice(0, 5559060566555523L, None) slice(0, 2147483647, None)
    slice(None, 5559060566555523L, 1)
    new: slice(0, 5559060566555523L, None) slice(None, 5559060566555523L,
    None) slice(None, 5559060566555523L, 1)

    [[except for the version ID, of course, which changes across runs;-)]]

    So: no difference across Python releases -- bug systematically there
    when slicing oldstyle classes, but only when slicing them with
    NON-extended slice syntax (all is fine when slicing with extended syntax
    OR when passing a slice object directly; indeed, dis.dis shows that
    using extended syntax builds the slice then passes it, while slicing
    without a step uses the SLICE+2 opcode instead).

    If you add a (deprecated, I believe) __getslice__ method, you'll see the
    same bug appear in newstyle classes too (again, for non-extended slicing
    syntax only).

    A look at ceval.c shows that apply_slice (called by SLICE+2 &c) uses
    _PyEval_SliceIndex and PySequence_GetSlice if the LHO has sq_slice in
    tp_as_sequence, otherwise PySlice_New and PyObject_GetItem. And the
    relevant signature is...:

    _PyEval_SliceIndex(PyObject *v, Py_ssize_t *pi)

    (int instead of Py_ssize_t in older versions of Python), so of course
    the "detour" through this function MUST truncate the value (to 32 or 64
    bits depending on the platform).

    The reason the bug shows up in classic classes even without an explicit
    __getslice__ is of course that a classic class ``has all the slots''
    (from the C-level viewpoint;-) -- only way to allow the per-instance
    behavior of classic-instances...


    My inclination here would be to let the bug persist, just adding an
    explanation of it in the documentation about why one should NOT use
    classic classes and should NOT define __getslice__. Any fix might
    perhaps provoke wrong behavior in old programs that define and use
    __getslice__ and/or classic classes and "count" on the truncation; the
    workaround is easy (only use fully-supported features of the language,
    i.e. newstyle classes and __getitem__ for slicing). But I guess we can
    (and probably should) move this debate to python-dev;-).
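    A historical footnote for modern readers: old-style classes and
    `__getslice__` were removed in Python 3, and its integers are unbounded,
    so the truncation discussed above cannot occur there - `__getitem__`
    always receives the full slice object. A quick sketch (the class name
    `A` is illustrative, echoing Paul's example):

    ```python
    class A:
        """Every Python 3 class is new-style; __getslice__ no longer exists,
        so slicing always routes through __getitem__ with an exact slice."""

        def __getitem__(self, s):
            return s

    big = 3 ** 33  # 5559060566555523, well past the old 2**31 - 1 limit
    assert A()[0:big] == slice(0, big)        # no truncation
    assert A()[:big:1] == slice(None, big, 1)  # extended syntax, also exact
    ```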


    Alex
     
    Alex Martelli, Sep 3, 2006
    #19