Python style: to check or not to check args and data members

Discussion in 'Python' started by Joel Hedlund, Sep 1, 2006.

  1. Joel Hedlund

    Hi!

    The question of type checking/enforcing has bothered me for a while, and
    since this newsgroup has a wealth of competence subscribed to it, I
    figured this would be a great way of learning from the experts. I feel
    there's a tradeoff between clear, easily readable and extensible code
    on one side, and safe code providing early errors and useful tracebacks
    on the other. I want both! How do you guys do it? What's the pythonic
    way? Are there any docs that I should read? All pointers and opinions
    are appreciated!

    I've also whipped up some examples in order to put the above questions
    in context and for your amusement. :)

    Briefly:

    class MyClass(object):
        def __init__(self, int_member=0):
            self.int_member = int_member
        def process_data(self, data):
            self.int_member += data

    The attached files are elaborations on this theme, with increasing
    security and, alas, rigidity and bloat. Even though
    maximum_security_module.py probably will be the safest to use, the
    coding style will bloat the code something awful and will probably make
    maintenance harder (please prove me wrong!). Where should I draw the line?

    These are the attached modules:

    * nocheck_module.py:
    As the above example, but with docs. No type checking.

    * property_module.py
    Type checking of data members using properties.

    * methodcheck_module.py
    Type checking of args within methods.

    * decorator_module.py
    Type checking of args using method decorators.

    * maximum_security_module.py
    Decorator and property type checking.

    Let's pretend I'm writing a script, I import one of the above modules
    and then execute the following code

    ....
    my_object = MyClass(data1)
    my_object.process_data(data2)

    and then let's pretend dataX is of a bad type, say for example str.

    nocheck_module.py
    =================
    Now, if data2 is bad, we get a suboptimal traceback (possibly to
    somewhere deep within the code, and probably with an unrelated error
    message). However, the first point of failure will in fact be included
    in the traceback, so this error should be possible to find with little
    effort. On the other hand, if data1 is bad, the exception will be raised
    somewhere past the point of first failure. The traceback will be
    completely off, and the error message will still be bad. Even worse: if
    both are bad, we won't even get an exception. We will trundle on with
    corrupted data and take no notice. Very clear code, though. Easily
    extensible.

    property_module.py
    ==================
    Here we catch that data1 failure. Tracebacks may still be inconcise with
    uninformative error messages, however they will not be as bad as in
    nocheck_module.py. Bloat. +7 or more lines of boilerplate code for each
    additional data member. Quite clear code. Readily extensible.

    methodcheck_module.py
    =====================
    Good, concise tracebacks with exact error messages. Lots of bloat and
    obscured code. Misses errors where data members are changed directly.
    Very hard to read and extend.

    decorator_module.py
    ===================
    Good, concise tracebacks with good error messages. Some bloat. Misses
    errors where data members are changed directly. Clear, but somewhat hard
    to extend. Decorators for *all* methods?! This cannot be the purpose of
    python!?

    maximum_security_module.py
    ==========================
    Good, concise tracebacks with good error messages. No errors missed (I
    think? :) . Bloat. Lots of decorators and boilerplate property code all
    over the place (thankfully not within functional code, though). Is this
    how it's supposed to be done?


    And if you've read all the way down here I thank you so very much for
    your patience and perseverance. Now I'd like to hear your thoughts on
    this! Where should the line be drawn? Should I just typecheck data from
    unreliable sources (users/other applications) and stick with the
    barebone strategy, or should I go all the way? Did I miss something
    obvious? Should I read some docs? (Which?) Are there performance issues
    to consider?

    Thanks again for taking the time.

    Cheers!
    /Joel Hedlund

    """Example module with method argument type checking within methods.

    Pros:
    Pinpointed tracebacks with very exact error messages.

    Cons:
    Lots of boilerplate typechecking code littered all over the place,
    obscuring functionality at the start of every function.
    Bloat will accumulate rapidly. +2 lines of boilerplate code per method and
    argument.
    If I at some point decide that floats are also ok, I'll need to crawl all
    over the code with a magnifying glass and a pair of tweezers.
    We don't catch errors of the type
        a = MyClass()
        a.int_member = 'moo!'
        a.process_data(1)

    """

    class MyClass(object):
        """My example class."""
        def __init__(self, int_member=0):
            """Instantiate a new MyClass object.

            IN:
            int_member = 0: <int>
                Set the value for the data member. Must be int.

            """
            # Boilerplate typechecking code.
            if not isinstance(int_member, int):
                raise TypeError("int_member must be int")
            # Initialization starts here. May for example contain assignment:
            self.int_member = int_member

        def process_data(self, data):
            """Do some data processing.

            IN:
            data: <int>
                New information that should be incorporated. Must be int.

            """
            # Boilerplate typechecking code.
            if not isinstance(data, int):
                raise TypeError("data must be int")
            # Data processing starts here. May for example contain addition:
            self.int_member += data

    # Test code. Uncomment to play. :)

    #a = MyClass('moo')
    #a = MyClass(9)
    #a.int_member = 'moo'
    #a.process_data('moo')
    #a.process_data(9)

    """Example module without type checking.

    Pros:
    Clean, easily readable and extensible code that gets down to business
    fast. If I at some point decide that floats are also ok, I only need to
    update the docs and all is well.
    No bloat.

    Cons:
    Type restrictions are not enforced. This means that if type errors occur,
    the exception may be raised far from the point of first failure, and
    possibly with long, inconcise tracebacks with uninformative error messages.

    """

    class MyClass(object):
        """My example class."""
        def __init__(self, int_member=0):
            """Instantiate a new MyClass object.

            IN:
            int_member = 0: <int>
                Set the value for the data member. Must be int.

            """
            # Initialization starts here. May for example contain assignment:
            self.int_member = int_member

        def process_data(self, data):
            """Do some data processing.

            IN:
            data: <int>
                New information that should be incorporated. Must be int.

            """
            # Data processing starts here. May for example contain addition:
            self.int_member += data

    # Test code. Uncomment to play. :)

    #a = MyClass('moo')
    #a = MyClass(9)
    #a.int_member = 'moo'
    #a.process_data('moo')
    #a.process_data(9)

    """Example module using properties for data member type checking.

    Pros:
    Quite clean, readable and extensible code that gets down to business fast.
    Data member type restrictions are enforced. If I at some point decide that
    floats are also ok, I only need to update the docs and a few more lines.

    Cons:
    Method argument types are not enforced, which means that tracebacks may
    still be inconcise with uninformative error messages. Not as bad as in
    nocheck_module.py though.
    Bloat. +7 or more lines of boilerplate code for each added data member (can
    this be done neater?). But at least the bloat is outside functional code.

    """

    class MyClass(object):
        """My example class."""
        def __init__(self, int_member=0):
            """Instantiate a new MyClass object.

            IN:
            int_member = 0: <int>
                Set the value for the data member. Must be int.

            """
            # Initialization starts here. May for example contain assignment:
            self.int_member = int_member

        def _get_int_member(self):
            return self.__int_member
        def _set_int_member(self, value):
            if not isinstance(value, int):
                raise TypeError("int_member must be type int")
            self.__int_member = value
        int_member = property(_get_int_member, _set_int_member)
        del _get_int_member, _set_int_member

        def process_data(self, data):
            """Do some data processing.

            IN:
            data: <int>
                New information that should be incorporated. Must be int.

            """
            # Data processing starts here. May for example contain addition:
            self.int_member += data

    # Test code. Uncomment to play. :)

    #a = MyClass('moo')
    #a = MyClass(9)
    #a.int_member = 'moo'
    #a.process_data('moo')
    #a.process_data(9)

    """Example module with method argument type checking using decorators.

    Pros:
    Clean, easily readable and extensible code that gets down to business
    fast.
    Pinpointed tracebacks with good error messages.
    If I at some point decide that floats are also ok, I only need to
    update the docs and change the decorators to
    @method_argtypes((int, float)).

    Cons:
    With many args and allowed types, the type definitions on the decorator
    lines will be hard to correlate to the args that they refer to (probably
    not impossible to work around though...?).
    We still don't catch errors of the type
        a = MyClass()
        a.int_member = 'moo!'
        a.process_data(1)
    A decorator for each method everywhere? That can't be the purpose of
    python!? There has to be a better way?!

    """

    def method_argtypes(*typedefs):
        """Rudimentary typechecker decorator generator.

        If you're really interested in this stuff, go check out Michele
        Simionato's decorator module instead. It rocks. Google is your friend.

        IN:
        *typedefs: <type> or <tuple <type>>
            The allowed types for each arg to the method, self excluded.
            Will be used with isinstance(), so valid typedefs include
            int or (int, float).

        """
        def argchecker(fcn):
            import inspect
            names = inspect.getargspec(fcn)[0][1:]
            def check_args(*args):
                for arg, value, allowed_types in zip(names, args[1:], typedefs):
                    if not isinstance(value, allowed_types):
                        one_of = ''
                        if hasattr(allowed_types, '__len__'):
                            one_of = "one of "
                        msg = ".%s() argument %r must be %s%s"
                        msg %= fcn.__name__, arg, one_of, allowed_types
                        raise TypeError(msg)
                return fcn(*args)
            return check_args
        return argchecker

    class MyClass(object):
        """My example class."""
        @method_argtypes(int)
        def __init__(self, int_member=0):
            """Instantiate a new MyClass object.

            IN:
            int_member = 0: <int>
                Set the value for the data member. Must be int.

            """
            # Initialization starts here. May for example contain assignment:
            self.int_member = int_member

        @method_argtypes(int)
        def process_data(self, data):
            """Do some data processing.

            IN:
            data: <int>
                New information that should be incorporated. Must be int.

            """
            # Data processing starts here. May for example contain addition:
            self.int_member += data

    # Test code. Uncomment to play. :)

    #a = MyClass('moo')
    #a = MyClass(9)
    #a.int_member = 'moo'
    #a.process_data('moo')
    #a.process_data(9)

    """Example module with decorator and property type checking.

    Pros:
    Clean, easily readable and extensible code that gets down to business
    fast.
    Pinpointed tracebacks with good error messages.
    Now we catch errors of the type
        a = MyClass()
        a.int_member = 'moo!'
        a.process_data(1)

    Cons:
    With many args and allowed types, the type definitions on the decorator
    lines will be hard to correlate to the args that they refer to (probably
    not impossible to work around though...?).
    A decorator for each method everywhere? That can't be the purpose of
    python!? There has to be a better way?!
    Property bloat. +7 or more lines of boilerplate code for each added data
    member (can this be done neater?).
    If I at some point decide that floats are also ok, I only need to
    update the docs, decorators and properties... hmm...

    """

    def method_argtypes(*typedefs):
        """Rudimentary typechecker decorator generator.

        If you're really interested in this stuff, go check out Michele
        Simionato's decorator module instead. It rocks. Google is your friend.

        IN:
        *typedefs: <type> or <tuple <type>>
            The allowed types for each arg to the method, self excluded.
            Will be used with isinstance(), so valid typedefs include
            int or (int, float).

        """
        def argchecker(fcn):
            import inspect
            names = inspect.getargspec(fcn)[0][1:]
            def check_args(*args):
                for arg, value, allowed_types in zip(names, args[1:], typedefs):
                    if not isinstance(value, allowed_types):
                        one_of = ''
                        if hasattr(allowed_types, '__len__'):
                            one_of = "one of "
                        msg = ".%s() argument %r must be %s%s"
                        msg %= fcn.__name__, arg, one_of, allowed_types
                        raise TypeError(msg)
                return fcn(*args)
            return check_args
        return argchecker

    class MyClass(object):
        """My example class."""
        @method_argtypes(int)
        def __init__(self, int_member=0):
            """Instantiate a new MyClass object.

            IN:
            int_member = 0: <int>
                Set the value for the data member. Must be int.

            """
            # Initialization starts here. May for example contain assignment:
            self.int_member = int_member

        def _get_int_member(self):
            return self.__int_member
        def _set_int_member(self, value):
            if not isinstance(value, int):
                raise TypeError("int_member must be type int")
            self.__int_member = value
        int_member = property(_get_int_member, _set_int_member)
        del _get_int_member, _set_int_member

        @method_argtypes(int)
        def process_data(self, data):
            """Do some data processing.

            IN:
            data: <int>
                New information that should be incorporated. Must be int.

            """
            # Data processing starts here. May for example contain addition:
            self.int_member += data

    # Test code. Uncomment to play. :)

    #a = MyClass('moo')
    #a = MyClass(9)
    #a.int_member = 'moo'
    #a.process_data('moo')
    #a.process_data(9)
     
    Joel Hedlund, Sep 1, 2006
    #1

  2. Robert Kern

    Joel Hedlund wrote:
    > Hi!
    >
    > The question of type checking/enforcing has bothered me for a while, and
    > since this newsgroup has a wealth of competence subscribed to it, I
    > figured this would be a great way of learning from the experts. I feel
    > there's a tradeoff between clear, easily readdable and extensible code
    > on one side, and safe code providing early errors and useful tracebacks
    > on the other. I want both! How do you guys do it? What's the pythonic
    > way? Are there any docs that I should read? All pointers and opinions
    > are appreciated!


    Short answer: Use Traits. Don't invent your own mini-Traits.

    (Disclosure: I work for Enthought.)

    http://code.enthought.com/traits/

    Unfortunately, I think the standalone tarball on that page, uh, doesn't stand
    alone right now. We're cleaning up the interdependencies over the next two
    weeks. Right now, your best bet is to get the whole enthought package:

    http://code.enthought.com/ets/

    Talk to us on enthought-dev if you need any help.

    https://mail.enthought.com/mailman/listinfo/enthought-dev


    Now back to Traits itself:

    Traits does quite a bit more than "type-checking," and I think that is the
    least useful feature it provides for Python users. Types are very
    frequently exactly the wrong thing you want to check for. They allow inputs that
    you would like to be invalid and disallow inputs that would have worked just
    fine if you had relied on duck-typing. In general terms, Traits does
    value-checking; it's just that some of the traits definitions check values by
    validating their types.

    You have to be careful with type-checking, because it can introduce fragility
    without enhancing safety. But sometimes you are working with other code that
    necessarily has type requirements (like extension code), and moving the
    requirements forward a bit helps build usable interfaces.

    Your examples would look like this with Traits:


    from enthought.traits.api import HasTraits, Int, method

    class MyClass(HasTraits):
        """My example class.
        """

        int_member = Int(0, desc="I am an integer")

        method(None, Int)
        def process_data(self, data):
            """Do some data processing.
            """

            self.int_member += data


    a = MyClass(int_member=9)
    a = MyClass(int_member='moo')
    """
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/Users/kern/svn/enthought-lib/enthought/traits/trait_handlers.py", line 172, in error
        raise TraitError, ( object, name, self.info(), value )
    enthought.traits.trait_errors.TraitError: The 'int_member' trait of a MyClass
    instance must be a value of type 'int', but a value of moo was specified.
    """

    # and similar errors for
    # a.int_member = 'moo'
    # a.process_data('moo')


    The method() function predates 2.4 and has not yet been converted to a
    decorator. We don't actually use it much.

    --
    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
     
    Robert Kern, Sep 1, 2006
    #2

  3. Bruno Desthuilliers

    Joel Hedlund wrote:
    > Hi!
    >
    > The question of type checking/enforcing has bothered me for a while,

    (snip)
    >
    > I've also whipped up some examples in order to put the above questions
    > in context and for your amusement. :)

    (snip)
    > These are the attached modules:
    >
    > * nocheck_module.py:
    > As the above example, but with docs. No type checking.
    >
    > * property_module.py
    > Type checking of data members using properties.
    >
    > * methodcheck_module.py
    > Type checking of args within methods.
    >
    > * decorator_module.py
    > Type checking of args using method decorators.
    >
    > * maximum_security_module.py
    > Decorator and property type checking.


    You forgot two other possible solutions (that can be mixed):
    - using custom descriptors
    - using FormEncode
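    For context, Bruno's first suggestion could be sketched roughly like this
    (my own illustration, not from the thread; the class name `Typed` and its
    storage scheme are invented). One reusable data descriptor replaces the +7
    lines of property boilerplate per checked member:

```python
class Typed(object):
    """Reusable data descriptor that typechecks a single attribute."""
    def __init__(self, name, allowed_types, default=None):
        self.name = name                  # key used in the instance __dict__
        self.allowed_types = allowed_types
        self.default = default
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)
    def __set__(self, obj, value):
        if not isinstance(value, self.allowed_types):
            raise TypeError("%s must be %s, got %r"
                            % (self.name, self.allowed_types, value))
        obj.__dict__[self.name] = value

class MyClass(object):
    """The thread's example class, one line of checking per data member."""
    int_member = Typed('int_member', int, 0)

    def __init__(self, int_member=0):
        self.int_member = int_member      # goes through Typed.__set__
    def process_data(self, data):
        self.int_member += data
```

    Both `MyClass('moo')` and `a.int_member = 'moo'` now raise TypeError at the
    point of first failure, and deciding later that floats are also ok means
    touching one line per member.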
     
    Bruno Desthuilliers, Sep 1, 2006
    #3
  4. Joel Hedlund

    > Short answer: Use Traits. Don't invent your own mini-Traits.

    Thanks for a quick and informative answer! I'll be sure to read up on the
    subject. (And also: thanks Bruno for your contributions!)

    > Types are very frequently exactly the wrong thing you want to check for.


    I see what you mean. Allowing several data types may generate unwanted side
    effects (integer division when expecting real division, for example).
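    Concretely (my own sketch, not from the thread): in the Python 2 of this
    era, `/` on two ints truncated, which Python 3 spells `//`. Passing ints
    where floats were expected silently changes the result without ever
    raising:

```python
def mean(values):
    # Truncating division: this is what Python 2's `/` did when both
    # operands happened to be ints -- no exception, just a different answer.
    return sum(values) // len(values)

print(mean([1, 2]))          # truncating: 1
print(sum([1.0, 2.0]) / 2)   # true division: 1.5
```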

    I understand that Traits can do value checking which is superior to what I
    presented, and that they can help me move validation away from functional
    code, which is always desirable. But there is still the problem of setting
    an appropriate level of validation.

    Should I validate data members only? This is quite easily done using Traits
    or some other technique and keeps validation bloat localized in the code.
    This is in line with the DRY principle and makes for smooth extensibility,
    but the tracebacks will be less useful.

    Or should I go the whole way and validate at every turn (all data members,
    every arg in every method, ...)? This makes for very secure code and very
    useful tracebacks, but does not feel very DRY to me... Are the benefits
    worth the costs? Do I build myself a fortress of unmaintainability this way?
    Will people laugh at my modules?

    Or taken to the other extreme: Should I simply duck-type everything, and
    only focus my validation efforts to external data (from users, external
    applications and other forces of evil). This solution makes for extremely
    clean code, but the thought of potential silent data corruption makes me
    more than a little queasy.

    What level do you go for?

    Thanks!
    /Joel

     
    Joel Hedlund, Sep 1, 2006
    #4
  5. Joel Hedlund

    Bruno >> Your email address seems to be wrong. I tried to reply to you
    directly in order to avoid thread bloat but my mail bounced.

    Thanks for the quick reply though. I've skimmed through some docs on your
    suggestions and I'll be sure to read up on them properly later. But as I
    said to Robert Kern in this thread, this does not really seem to resolve
    the problem of setting an appropriate level of validation.

    How do you do it? Please reply to the group if you can find the time.

    Cheers!
    /Joel Hedlund

     
    Joel Hedlund, Sep 1, 2006
    #5
  6. Bruno Desthuilliers

    Joel Hedlund wrote:
    >> Short answer: Use Traits. Don't invent your own mini-Traits.

    >
    > Thanks for a quick and informative answer! I'll be sure to read up on
    > the subject. (And also: thanks Bruno for your contributions!)
    >
    >> Types are very frequently exactly the wrong thing you want to check for.

    >
    > I see what you mean. Allowing several data types may generate unwanted
    > side effects (integer division when expecting real division, for example).
    >
    > I understand that Traits can do value checking which is superior to what
    > I presented, and that they can help me move validation away from
    > functional code, which is always desirable. But there is still the
    > problem of setting an appropriate level of validation.
    >
    > Should I validate data members only? This is quite easily done using
    > Traits or some other technique and keeps validation bloat localized in
    > the code. This is in line with the DRY principle and makes for smooth
    > extensibility, but the tracebacks will be less useful.
    >
    > Or should I go the whole way and validate at every turn (all data
    > members, every arg in every method, ...)? This makes for very secure


    ...and inflexible...

    > code and very useful tracebacks, but does not feel very DRY to me... Are
    > the benefits worth the costs? Do I build myself a fortress of
    > unmaintainability this way? Will people laugh at my modules?


    I'm not sure that trying to fight against the language is a sound
    approach, whatever the language. If dynamic typing gives you the creeps,
    then use a statically typed language - possibly with type-inference to
    keep as much genericity as possible.

    > Or taken to the other extreme: Should I simply duck-type everything, and
    > only focus my validation efforts to external data (from users, external
    > applications and other forces of evil).


    IMHO and according to my experience: 99% yes (there are a few corner
    cases where it makes sense to ensure args correctness - which may or may
    not imply type-checking). Packages like FormEncode are great for data
    conversion/validation. Once you have trusted data, the only possible
    problem is within your code.

    > This solution makes for
    > extremely clean code, but the thought of potential silent data
    > corruption makes me more than a little queasy.


    I've rarely encountered "silent" data corruption with Python - FWIW, I
    once had such a problem, but with a lower-level statically typed
    language (integer overflow), and I was a newbie programmer at that
    time. Usually, one *very quickly* notices when something goes wrong. Now
    if you're really serious, unit tests are the way to go - they can check
    for much more than just types.
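    Bruno's unit-test suggestion might look something like this for the
    thread's running example (a sketch using the standard `unittest` module;
    the test names are mine). The tests pin down behaviour, including that bad
    input fails loudly, without a single `isinstance()` call in the class
    itself:

```python
import unittest

class MyClass(object):
    """The thread's example class, duck-typed with no inline checks."""
    def __init__(self, int_member=0):
        self.int_member = int_member
    def process_data(self, data):
        self.int_member += data

class TestMyClass(unittest.TestCase):
    def test_process_data_accumulates(self):
        a = MyClass(9)
        a.process_data(1)
        self.assertEqual(a.int_member, 10)
    def test_bad_input_fails_loudly(self):
        # int + str raises TypeError, so corruption cannot pass silently.
        a = MyClass(9)
        self.assertRaises(TypeError, a.process_data, 'moo')

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMyClass)
assert unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()
```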

    My 2 cents.
    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    Bruno Desthuilliers, Sep 1, 2006
    #6
  7. Bruno Desthuilliers

    Joel Hedlund wrote:

    <OT>
    > Bruno >> Your email address seem to be wrong.


    let's say "disguised" !-)

    > I tried to reply to you
    > directly in order to avoid thread bloat but my mail bounced.


    I don't think it's a good idea anyway - this thread is on topic here and
    may be of interest to others too IMHO.

    And while we're at it : please avoid top-posting.
    </OT>


    > Thanks for the quick reply though. I've skimmed through some docs on
    > your suggestions and I'll be sure to read up on them properly later. But
    > as I said to Robert Kern in this thread, this does not really seem to
    > resolve the problem of setting an appropriate level of validation.


    The "appropriate" level of validation depends on the context. There's
    just no one-size-fits-all solution here. The only guideline I could come
    up with is to be paranoid about what comes from the outside world and
    mostly confident about what comes from other parts of the application.
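    That guideline can be sketched as a boundary parser (an illustration of
    mine, not from the thread; the function names and error messages are
    invented): be paranoid exactly once, where untrusted data enters, and let
    everything inside the trust boundary duck-type:

```python
def parse_count(raw):
    """Paranoid boundary: turn untrusted external input (user input, a
    config file, another application) into trusted data, failing early
    with a clear message."""
    try:
        count = int(raw)
    except (TypeError, ValueError):
        raise ValueError("count must be an integer, got %r" % (raw,))
    if count < 0:
        raise ValueError("count must be >= 0, got %d" % count)
    return count

def process(count):
    # Inside the trust boundary: no checks, plain duck typing.
    return count * 2

print(process(parse_count("21")))   # -> 42
```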

    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    Bruno Desthuilliers, Sep 1, 2006
    #7
  8. Joel Hedlund

    > And while we're at it : please avoid top-posting.

    Yes, that was sloppy. Sorry.

    /Joel
     
    Joel Hedlund, Sep 1, 2006
    #8
  9. Joel Hedlund

    > I'm not sure that trying to fight against the language is a sound
    > approach, whatever the language.


    That's the very reason I posted in the first place. I feel like I'm fighting
    the language, and since python at least to me seems to be so well thought
    out in all other aspects, the most obvious conclusion must be that I'm
    thinking about this the wrong way. And that's why I need your input!

    >> Or taken to the other extreme: Should I simply duck-type everything, and
    >> only focus my validation efforts to external data (from users, external
    >> applications and other forces of evil).

    >
    > IMHO and according to my experience : 99% yes (there are few corner
    > cases where it makes sens to ensure args correctness - which may or not
    > imply type-checking). Packages like FormEncode are great for data
    > conversion/validation. Once you have trusted data, the only possible
    > problem is within your code.


    That approach is quite in line with the "blame yourself" methodology, which
    seems to work in most other circumstances. Sort of like, developers who feed
    bad data into my code have only themselves to blame! I can dig that. :)

    Hmmm... So. I should build grimly paranoid parsers for external data, use
    duck-typed interfaces everywhere on the inside, and simply callously
    disregard developers who are disinclined to read documentation? I could do that.

    > if you're really serious, unit tests is the way to go - they can check
    > for much more than just types.


    Yes, I'm very much serious indeed. But I haven't done any unit testing. I'll
    have to check into that. Thanks!

    > My 2 cents.


    Thankfully received and collecting interest as we speak.

    Cheers!
    /Joel
     
    Joel Hedlund, Sep 1, 2006
    #9
  10. Bruno Desthuilliers

    Joel Hedlund wrote:
    >> I'm not sure that trying to fight against the language is a sound
    >> approach, whatever the language.

    >
    > That's the very reason I posted in the first place. I feel like I'm
    > fighting the language, and since python at least to me seems to be so
    > well thought out in all other aspects, the most obvious conclusion must
    > be that I'm thinking about this the wrong way. And that's why I need
    > your input!


    The first thing I tried to do when I discovered Python (coming from
    statically typed languages) was to try to force-fit it into static typing.
    Then I realized that there was a whole lot of non-trivial Python apps
    and libs that just worked, which made me think about the real
    usefulness of static typing - which is mainly to provide optimisation
    hints for the machine. As you probably noticed, declarative static typing imposes much
    boilerplate and somewhat arbitrary restrictions, and I still wait for a
    proof that it leads to more robust programs - FWIW, MVHO is that it
    usually leads to more complex - hence potentially less robust - code.

    >>> > Or taken to the other extreme: Should I simply duck-type
    >>> everything, and
    >>> > only focus my validation efforts to external data (from users,
    >>> external
    >>> > applications and other forces of evil).

    >
    >> IMHO and according to my experience: 99% yes (there are a few corner
    >> cases where it makes sense to ensure args correctness - which may or
    >> may not imply type-checking). Packages like FormEncode are great for data
    >> conversion/validation. Once you have trusted data, the only possible
    >> problem is within your code.

    >
    > That approach is quite in line with the "blame yourself" methodology,
    > which seems to work in most other circumstances. Sort of like,
    > developers who feed bad data into my code have only themselves to blame!


    As long as your code is correctly documented, yes. All attempts to write
    idiot-proof library code have failed so far AFAICT, so just let idiots
    suffer from their idiocy and focus on providing good tools to normal
    programmers. My own philosophy, of course...

    > I can dig that. :)
    >
    > Hmmm... So. I should build grimly paranoid parsers for external data,


    Most of the time, you'll find they already exist. FormEncode is not
    just for html forms - it's a general, powerful and flexible (but alas
    very badly documented) bidirectional data converter/validator.

    > use duck-typed interfaces everywhere on the inside,


    Talking about interfaces, you may want to have a look at PyProtocols
    (PEAK) and Zope3 Interfaces.

    > and simply callously
    > disregard developers who are disinclined to read documentation?


    As long as you provide a usable documentation, misuse of your code is
    not your problem anymore (unless of course you're the one misusing it !-).

    > I could
    > do that.
    >
    >> if you're really serious, unit tests is the way to go - they can check
    >> for much more than just types.

    >
    > Yes, I'm very much serious indeed. But I haven't done any unit testing.


    Then you probably want to read the relevant chapter in DiveIntoPython.
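    To make that concrete, a minimal unittest sketch against the `MyClass`
    example that opened this thread might look like this (the test names and
    structure here are illustrative, not from the thread):

    ```python
    import unittest


    class MyClass:
        """The running example from this thread: unchecked, duck-typed."""

        def __init__(self, int_member=0):
            self.int_member = int_member

        def process_data(self, data):
            self.int_member += data


    class MyClassTests(unittest.TestCase):
        def test_initial_value(self):
            self.assertEqual(MyClass(5).int_member, 5)

        def test_process_data_accumulates(self):
            obj = MyClass()
            obj.process_data(3)
            obj.process_data(4)
            self.assertEqual(obj.int_member, 7)

        def test_bad_type_fails_loudly(self):
            # Even without explicit checks, duck typing raises a clear
            # TypeError at the point of misuse: 0 + "oops" is illegal.
            self.assertRaises(TypeError, MyClass().process_data, "oops")


    if __name__ == "__main__":
        unittest.main(argv=["mytests"], exit=False, verbosity=2)
    ```

    Note that the tests check behaviour (including failure behaviour), not
    just types - which is the point Bruno makes above.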

    HTH
    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    Bruno Desthuilliers, Sep 1, 2006
    #10
  11. Joel Hedlund

    Joel Hedlund Guest

    > I still wait for a
    > proof that it leads to more robust programs - FWIW, MVHO is that it
    > usually leads to more complex - hence potentially less robust - code.


    MVHO? I assume you are not talking about Miami Valley Housing Opportunities
    here, but bloat probably leads to bugs, yes.

    > Talking about interfaces, you may want to have a look at PyProtocols
    > (PEAK) and Zope3 Interfaces.


    Ooh. Neat.

    > As long as you provide a usable documentation, misuse of your code is
    > not your problem anymore (unless of course you're the one misusing it !-).


    But hey, then I'm still just letting idiots suffer from their idiocy, and
    since that's part of our greater plan anyway I guess that's ok :-D

    > Then you probably want to read the relevant chapter in DiveIntoPython.


    You are completely correct. Thanks for the tip.

    Thanks for your help! It's been real useful. Now I'll sleep better at night.

    Cheers!
    /Joel
     
    Joel Hedlund, Sep 1, 2006
    #11
  12. Joel Hedlund wrote:
    >> I still wait for a
    >> proof that it leads to more robust programs - FWIW, MVHO is that it
    >> usually leads to more complex - hence potentially less robust - code.

    >
    > MVHO? I assume you are not talking about Miami Valley Housing
    > Opportunities here,


    Nope --> My Very Humble Opinion

    --
    bruno desthuilliers
    python -c "print '@'.join(['.'.join([w[::-1] for w in p.split('.')]) for
    p in ''.split('@')])"
     
    Bruno Desthuilliers, Sep 1, 2006
    #12
  13. Joel Hedlund

    Paddy Guest

    Joel Hedlund wrote:
    >
    > Hmmm... So. I should build grimly paranoid parsers for external data, use
    > duck-typed interfaces everywhere on the inside, and simply callously
    > disregard developers who are disinclined to read documentation? I could do that.
    >
    > > if you're really serious, unit tests is the way to go - they can check
    > > for much more than just types.

    >
    > Yes, I'm very much serious indeed. But I haven't done any unit testing. I'll
    > have to check into that. Thanks!
    >


    You might try doctests, they can be easier to write and fit into the
    unit test framework if needed.
    http://en.wikipedia.org/wiki/Doctest
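    A minimal doctest sketch (the function `add_counts` is illustrative, not
    from the thread): the interactive examples in the docstring double as
    tests, so the docs and the checks cannot drift apart.

    ```python
    import doctest


    def add_counts(a, b):
        """Add two counts together.

        The examples below are executed by doctest and their output is
        compared against this transcript.

        >>> add_counts(2, 3)
        5
        >>> add_counts(0, 0)
        0
        """
        return a + b


    if __name__ == "__main__":
        # Silent on success; reports any example whose output differs.
        doctest.testmod()
    ```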

    - Paddy.
     
    Paddy, Sep 1, 2006
    #13
  14. Joel Hedlund

    Joel Hedlund Guest

    > You might try doctests, they can be easier to write and fit into the
    > unit test framework if needed.


    While I firmly believe in keeping docs up to date, I don't think that
    doctests alone can solve the problem of maintaining data integrity in
    projects with more complex interfaces (which is what I really meant to
    talk about. Sorry if my simplified examples led you to believe
    otherwise). For simple, deterministic functions like math.pow I think
    it's great, but for something like BaseHTTPServer... probably not. The
    __doc__'s required would be truly fascinating to behold. And probably
    voluminous and mostly unreadable for humans. Or is there something that
    I've misunderstood?

    /Joel
     
    Joel Hedlund, Sep 1, 2006
    #14
  15. Joel Hedlund

    Paddy Guest

    Joel Hedlund wrote:
    > > You might try doctests, they can be easier to write and fit into the
    > > unit test framework if needed.

    >
    > While I firmly believe in keeping docs up to date, I don't think that
    > doctests alone can solve the problem of maintaining data integrity in
    > projects with more complex interfaces (which is what I really meant to
    > talk about. Sorry if my simplified examples led you to believe
    > otherwise). For simple, deterministic functions like math.pow I think
    > it's great, but for something like BaseHTTPServer... probably not. The
    > __doc__'s required would be truly fascinating to behold. And probably
    > voluminous and mostly unreadable for humans. Or is there something that
    > I've misunderstood?
    >
    > /Joel


    Oh, I was just addressing your bit about not knowing unit tests.
    Doctests can be quicker to put together and have only a small learning
    curve.
    On the larger scale, I too advocate extensive checking of 'tainted'
    data from 'external' sources, then assuming 'clean' data is as expected
    and doing no further explicit data checks; after all, you've got to
    trust your development team/yourself.

    - Pad.
     
    Paddy, Sep 1, 2006
    #15
  16. Joel Hedlund

    Joel Hedlund Guest

    > Oh, I was just addressing your bit about not knowing unit tests.
    > Doctests can be quicker to put together and have only a small learning
    > curve.


    OK, I see what you mean. And you're right. I'm struggling mightily right
    now with trying to come up with sane unit tests for a bunch of
    generalized parser classes that I'm about to implement, and which are
    supposed to play nice with each other... Gah! But I'll get there
    eventually... :)

    > On the larger scale, I too advocate extensive checking of 'tainted'
    > data from 'external' sources, then assuming 'clean' data is as expected
    > and doing no explicit further data checks, after all, you've got to
    > trust your development team/yourself.


    Right.

    Thanks for helpful tips and insights, and for taking the time!

    Cheers!
    /Joel
     
    Joel Hedlund, Sep 2, 2006
    #16
  17. Joel Hedlund

    Paul Rubin Guest

    Bruno Desthuilliers <> writes:
    > I've rarely encountered "silent" data corruption with Python - FWIW, I
    > once had such a problem, but with a lower-level statically typed
    > language (integer overflow), and I was a very newbie programmer by that
    > time. Usually, one *very quickly* notices when something goes wrong.


    The same thing can happen in Python, and the resulting bugs can be
    pretty subtle. I noticed the following example as the result of
    another thread, which was about how to sort an 85 gigabyte file.
    Try to put a slice interface on a file-based object and you can
    hit strange integer-overflow bugs once the file gets larger than 2GB:

    Python 2.3.4 (#1, Feb 2 2005, 12:11:53)
    [GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print slice(0, 3**33)

    slice(0, 5559060566555523L, None) # OK ...

    So we expect slicing with large args to work properly. But then:

    >>> class A:

    ...     def __getitem__(self, s):
    ...         print s
    ...
    >>> a = A()
    >>> a[0:3**33]

    slice(0, 2147483647, None) # oops!!!!
    >>>
     
    Paul Rubin, Sep 3, 2006
    #17
  18. Paul Rubin a écrit :
    > Bruno Desthuilliers <> writes:
    >
    >>I've rarely encountered "silent" data corruption with Python - FWIW, I
    >>once had such a problem, but with a lower-level statically typed
    >>language (integer overflow), and I was a very newbie programmer by that
    >>time. Usually, one *very quickly* notices when something goes wrong.

    >
    >
    > The same thing can happen in Python, and the resulting bugs can be
    > pretty subtle. I noticed the following example as the result of
    > another thread, which was about how to sort an 85 gigabyte file.
    > Try to put a slice interface on a file-based object and you can
    > hit strange integer-overflow bugs once the file gets larger than 2GB:
    >
    > Python 2.3.4 (#1, Feb 2 2005, 12:11:53)
    > [GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
    > Type "help", "copyright", "credits" or "license" for more information.
    > >>> print slice(0, 3**33)

    > slice(0, 5559060566555523L, None) # OK ...
    >
    > So we expect slicing with large args to work properly. But then:
    >
    > >>> class A:

    > ...     def __getitem__(self, s):
    > ...         print s
    > ...
    > >>> a = A()
    > >>> a[0:3**33]

    > slice(0, 2147483647, None) # oops!!!!
    > >>>


    Looks like a Python bug, not a programmer error. And BTW, it doesn't
    happen with >= 2.4.1:

    Python 2.4.1 (#1, Jul 23 2005, 00:37:37)
    [GCC 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)] on
    linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print slice(0, 3**33)

    slice(0, 5559060566555523L, None)
    >>> class A(object):

    ...     def __getitem__(self, s):
    ...         print s
    ...
    >>> A()[0:3**33]

    slice(0, 5559060566555523L, None)
    >>>
     
    Bruno Desthuilliers, Sep 3, 2006
    #18
  19. Jean-Paul Calderone <> wrote:
    ...
    > > >>> class A(object):


    note that A is new-style...

    > >>> class x:


    ....while x is old-style.

    Here's a small script to explore the problem...:

    import sys

    class oldstyle:
    def __getitem__(self, index): print index,

    class newstyle(object, oldstyle): pass

    s = slice(0, 3**33)

    print sys.version[:5]
    print 'slice:', s
    print 'old:',
    oldstyle()[s]
    oldstyle()[:3**33]
    oldstyle()[:3**33:1]
    print
    print 'new:',
    newstyle()[s]
    newstyle()[:3**33]
    newstyle()[:3**33:1]
    print

    Running this on 2.3.5, 2.4.3, 2.5c1, 2.6a0, the results are ALWAYS:

    2.5c1
    slice: slice(0, 5559060566555523L, None)
    old: slice(0, 5559060566555523L, None) slice(0, 2147483647, None)
    slice(None, 5559060566555523L, 1)
    new: slice(0, 5559060566555523L, None) slice(None, 5559060566555523L,
    None) slice(None, 5559060566555523L, 1)

    [[except for the version ID, of course, which changes across runs;-)]]

    So: no difference across Python releases -- bug systematically there
    when slicing oldstyle classes, but only when slicing them with
    NON-extended slice syntax (all is fine when slicing with extended syntax
    OR when passing a slice object directly; indeed, dis.dis shows that
    using extended syntax builds the slice then passes it, while slicing
    without a step uses the SLICE+2 opcode instead).

    If you add a (deprecated, I believe) __getslice__ method, you'll see the
    same bug appear in newstyle classes too (again, for non-extended slicing
    syntax only).

    A look at ceval.c shows that apply_slice (called by SLICE+2 &c) uses
    _PyEval_SliceIndex and PySequence_GetSlice if the LHO has sq_slice in
    tp_as_sequence, otherwise PySlice_New and PyObject_GetItem. And the
    relevant signature is...:

    _PyEval_SliceIndex(PyObject *v, Py_ssize_t *pi)

    (int instead of Py_ssize_t in older versions of Python), so of course
    the "detour" through this function MUST truncate the value (to 32 or 64
    bits depending on the platform).

    The reason the bug shows up in classic classes even without an explicit
    __getslice__ is of course that a classic class ``has all the slots''
    (from the C-level viewpoint;-) -- only way to allow the per-instance
    behavior of classic-instances...


    My inclination here would be to let the bug persist, just adding an
    explanation of it in the documentation about why one should NOT use
    classic classes and should NOT define __getslice__. Any fix might
    perhaps provoke wrong behavior in old programs that define and use
    __getslice__ and/or classic classes and "count" on the truncation; the
    workaround is easy (only use fully-supported features of the language,
    i.e. newstyle classes and __getitem__ for slicing). But I guess we can
    (and probably should) move this debate to python-dev;-).
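    A historical footnote for modern readers: old-style classes and
    `__getslice__` were removed in Python 3, and its integers are unbounded,
    so the truncation discussed above cannot occur there - `__getitem__`
    always receives the full slice object. A quick sketch (the class name
    `A` is illustrative, echoing Paul's example):

    ```python
    class A:
        """Every Python 3 class is new-style; __getslice__ no longer exists,
        so slicing always routes through __getitem__ with an exact slice."""

        def __getitem__(self, s):
            return s

    big = 3 ** 33  # 5559060566555523, well past the old 2**31 - 1 limit
    assert A()[0:big] == slice(0, big)        # no truncation
    assert A()[:big:1] == slice(None, big, 1)  # extended syntax, also exact
    ```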


    Alex
     
    Alex Martelli, Sep 3, 2006
    #19