Comparisons of incompatible types

Discussion in 'Python' started by TomF, Dec 6, 2010.

  1. TomF

    TomF Guest

    I'm aggravated by this behavior in Python:

    x = "4"
    print x < 7 # prints False

    The issue, of course, is comparisons of incompatible types. In most
    languages this throws an error (in Perl the types are converted
    silently). In Python this comparison fails silently. The
    documentation says: "objects of different types *always* compare
    unequal, and are ordered consistently but arbitrarily."

    I can't imagine why this design decision was made. I've been bitten by
    this several times (reading data from a file and not converting the
    numbers before comparison). Can I get this to throw an error instead
    of failing silently?
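
    For code that cannot rely on the interpreter to complain, one workaround is to route comparisons through an explicit helper. The sketch below is only an illustration: `strict_less_than` is a hypothetical name, not a standard-library function, and it assumes that any two real numbers may be compared with each other while every other pairing must have exactly matching types.

```python
import numbers


def strict_less_than(left, right):
    """Return left < right, refusing to compare mismatched types.

    Hypothetical helper, not part of the standard library.
    """
    both_real = isinstance(left, numbers.Real) and isinstance(right, numbers.Real)
    if not both_real and type(left) is not type(right):
        raise TypeError(
            "refusing to compare %s with %s"
            % (type(left).__name__, type(right).__name__)
        )
    return left < right


x = "4"  # e.g. a field read from a file and never converted
try:
    strict_less_than(x, 7)
    outcome = "compared"
except TypeError as exc:
    outcome = "rejected: %s" % exc

converted_result = strict_less_than(int(x), 7)

print(outcome)            # rejected: refusing to compare str with int
print(converted_result)   # True
```

    Converting the value once, at the point where the data is read, is usually the simpler fix; the helper only makes a forgotten conversion fail loudly.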

    TomF, Dec 6, 2010

  2. TomF

    Peter Otten Guest

    This change would break a lot of code, so it could not be made within the
    2.x series. However:

    Python 3.1.1+ (r311:74480, Nov 2 2009, 15:45:00)
    [GCC 4.4.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> "4" < 7
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: str() < int()

    Peter Otten, Dec 6, 2010

  3. TomF

    Tim Golden Guest

    Yes: switch to python 3 where this does throw an exception:

    Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52
    Type "help", "copyright", "credits" or "license" for more information.
    Traceback (most recent call last):

    Tim Golden, Dec 6, 2010
  4. TomF

    TomF Guest

    Thanks. I was hoping there was something I could do for 2.x but I
    suppose this will have to do.

    But I'm mystified by your statement, "this change would break a lot of
    code". Given that the semantics are virtually random, how could code
    depend on this?

    TomF, Dec 6, 2010
  5. TomF

    Robert Kern Guest

    There are cases where you don't particularly care *what* order is given as long
    as it is consistent. Let's say you want to make sure that two lists have the
    same contents (which may mix types), but you don't care about the order. You
    could just sort each list and then compare the sorted lists. Before sets were
    added to the language, this was a fairly common approach.
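
    In Python 3 the sort-then-compare idiom raises TypeError as soon as the lists mix unorderable types. A minimal sketch of one alternative, assuming every item is hashable, is to count the items instead of sorting them. `same_contents` is a hypothetical helper name.

```python
from collections import Counter


def same_contents(first, second):
    """True if both lists hold the same items the same number of times."""
    return Counter(first) == Counter(second)


matching = same_contents(["Fred", 2, None, 2], [2, None, 2, "Fred"])
different = same_contents(["Fred", 2], ["Fred", 2, 2])

print(matching)    # True
print(different)   # False
```

    Note that items which compare equal, such as 1, 1.0 and True, are counted as the same item.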

    Robert Kern

    "I have come to believe that the whole world is an enigma, a harmless enigma
    that is made terrible by our own mad attempt to interpret it as though it had
    an underlying truth."
    -- Umberto Eco
    Robert Kern, Dec 6, 2010
  6. TomF

    Terry Reedy Guest

    And indeed, code like this that has not been updated does break in 3.x,
    to some people's annoyance. We really really cannot please everyone ;-).
    Terry Reedy, Dec 6, 2010
  7. TomF

    Mark Wooding Guest

    The problem is that there are too many useful properties that one might
    expect from comparison operators. For example, it's frequently nice to
    have a total ordering on all objects. For real numbers, it's nice that
    the ordering obey the usual ordered-field axioms; but the complex
    numbers don't have an ordering compatible with the field operators, and
    imposing a default ordering (e.g., degree-lexicographic) is probably
    asking for trouble.
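
    Python itself takes this position for complex numbers: ordering comparisons between them raise TypeError rather than fall back to some default order. A small sketch, where `is_orderable` is a hypothetical helper used only to show the behaviour:

```python
def is_orderable(left, right):
    """Report whether left < right is defined for these two values."""
    try:
        left < right
    except TypeError:
        return False
    return True


reals_ok = is_orderable(1.5, 2)
complex_ok = is_orderable(1j, 2j)

print(reals_ok)     # True
print(complex_ok)   # False
```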

    I agree that the Python 3 behaviour is an improvement, by the way.

    -- [mdw]
    Mark Wooding, Dec 6, 2010
  8. You've never needed to deal with a heterogeneous list?

    data = ["Fred", "Barney", 2, 1, None]

    Nevertheless, I agree that in hindsight, the ability to sort such lists
    is not as important as the consistency of comparisons.
    Steven D'Aprano, Dec 7, 2010
  9. TomF

    John Nagle Guest

    If you're thinking hard about this, I recommend viewing Alexander
    Stepanov's talk at Stanford last month:

    He makes the point that, for generic programs to work right, the
    basic operations must have certain well-defined semantics. Then
    the same algorithms will work right across a wide variety of objects.

    This is consistent with Python's "duck typing", but inconsistent
    with the current semantics of some operators.

    For example, "+" as concatenation makes "+" non-commutative.
    In other words,

    a + b

    is not always equal to

    b + a

    which is not good.

    Important properties to have across all types:

    a + b == b + a

    Exactly one of

    a > b
    a == b
    a < b

    is true, or a type exception must be raised.

    The basic Boolean identities

    (a or b) == (b or a)
    (not (a or b)) == ((not a) and (not b))
    (not (not a)) == a

    should all hold, or a type exception should be raised.
    With Python accepting both "True" and "1" as sort of
    equivalent, there are cases where those don't hold.
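
    A short sketch of where these identities fail. It relies only on the fact that `or` returns one of its operands rather than a boolean; the values 1 and 2 are arbitrary examples.

```python
a, b = 1, 2

or_commutes = (a or b) == (b or a)      # 1 or 2 is 1, but 2 or 1 is 2
double_not_b = (not (not b)) == b       # not (not 2) is True, and True != 2
double_not_a = (not (not a)) == a       # not (not 1) is True, and True == 1

print(or_commutes)     # False
print(double_not_b)    # False
print(double_not_a)    # True
```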

    John Nagle
    John Nagle, Dec 7, 2010
  10. TomF

    Carl Banks Guest

    Not once, ever.

    I think that feeling the need to sort non-homogenous lists is
    indicative of bad design.

    If the order of the items doesn't matter, then there must be some
    small bit of homogeneity to exploit to use as a sort criterion. In
    that case you should use key= parameter or DSU.
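
    As a rough illustration of DSU (decorate-sort-undecorate), the sketch below sorts a mixed list by length, assuming that length is the shared property to exploit. `dsu_sort` is a hypothetical helper; the index in each decorated tuple breaks ties so the items themselves are never compared.

```python
def dsu_sort(items, key):
    """Sort items by key(item) without ever comparing the items themselves."""
    # Decorate: the index breaks ties, so two items with equal keys
    # are never compared directly.
    decorated = [(key(item), index, item) for index, item in enumerate(items)]
    decorated.sort()
    # Undecorate: drop the key and index again.
    return [item for _, _, item in decorated]


mixed = ["pear", [1, 2], (3,), {"a": 1, "b": 2, "c": 3}]
by_length = dsu_sort(mixed, len)

print(by_length)
print(by_length == sorted(mixed, key=len))   # True
```

    The key= parameter does the same job with less code, which is why DSU is mostly of historical interest.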

    Carl Banks
    Carl Banks, Dec 7, 2010
  11. TomF

    Carl Banks Guest

    For better or worse (and I say worse, but YMMV) "and" and "or" are not
    boolean operators in Python but special-form expressions that resemble
    boolean semantics in some instances, but not (as you mention above) in
    others.

    Likewise, the comparison operators <, >, >=, and <= aren't well-
    ordered; sets use these operators to indicate topological ordering.

    IMO having the operators adhere to defined properties would be a good
    thing. It would improve code reusability since the operators could be
    expected to act in consistent ways, but Python isn't that language.
    So you might as well use the operators for whatever seems like it
    works, + for concatenation, > for superset, and so on.

    Carl Banks
    Carl Banks, Dec 7, 2010
  12. TomF

    Mark Wooding Guest

    This isn't a disaster. You should check that the arguments define the
    necessary operations and obey the necessary axioms. Python is already
    dynamically typed: this kind of proof-obligation is already endemic in
    Python programming, so you've not lost anything significant.

    I think I probably agree with this. Concatenation yields a nonabelian
    monoid (usually with identity); `+' is pretty much universally an
    abelian group operator (exception: natural numbers, where it's used in
    an abelian monoid which extends to a group in a relatively obvious way).
    But then we'd need another operator symbol for concatenation.
    Nonnegative integers act on strings properly, but the action doesn't
    distribute over concatenation, which is also a shame. i.e.,

    n*(a + b) != n*a + n*b

    But it's a familiar notation, by no means peculiar to Python, and
    changing it would be difficult.
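
    Both properties are easy to see with strings. The values below are arbitrary examples:

```python
a, b, n = "ab", "cd", 2

commutes = (a + b == b + a)     # "abcd" vs "cdab"
left = n * (a + b)
right = n * a + n * b

print(commutes)   # False
print(left)       # abcdabcd
print(right)      # ababcdcd
```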
    This will get the numerical people screaming. Non-signalling NaNs are
    useful, and they don't obey these axioms.
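
    For reference, this is how a quiet NaN behaves under Python's float comparisons: none of the three relations holds, and it is not even equal to itself.

```python
nan = float("nan")

relations = (nan > 1.0, nan == 1.0, nan < 1.0)
self_equal = (nan == nan)

print(relations)    # (False, False, False)
print(self_equal)   # False
```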

    I think, more generally, that requiring a full total order (rather than
    either a preorder or a partial order) is unnecessarily proscriptive.
    Sorting only requires a preorder, for example, i.e., { (a, b) | a <= b
    <= a } is an equivalence relation, and the preorder naturally induces a
    total order on the equivalence classes. Topological sorting requires
    only a partial order, and makes good use of the notation. As an
    example, sets use `<=' to denote subsetness, which is well known to be a
    partial order.
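
    A small sketch of the subset ordering on sets, showing two sets that are simply incomparable (the particular sets are arbitrary):

```python
a, b = {1, 2}, {2, 3}

incomparable = (a <= b, b <= a, a == b)
contained = ({2} <= a, {2} < a)

print(incomparable)   # (False, False, False)
print(contained)      # (True, True)
```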

    (I presume you weren't going to deny

    a <= b iff a < b or a == b


    a < b iff b > a

    because that really would be bad.)

    The first of these contradicts the axiom

    x => x or _|_ == x

    which is probably more useful. The last can't usefully be true since
    `not' is lossy. But I think that, for all values a, b,

    (not (a or b)) == (not (b or a)) == ((not a) and (not b))
    (not (not (not a))) == (not a)

    which is probably good enough. (The application of `not' applies a
    boolean coercion, which canonifies adequately.)
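
    These two identities can be spot-checked over a handful of values. This is only a sanity check on an arbitrary sample, not a proof; the explicit parentheses are needed because `not` binds more loosely than `==` in Python.

```python
from itertools import product

# An arbitrary mix of truthy and falsy values of different types.
samples = [0, 1, 2, "", "x", None, [], [0], True, False]

de_morgan = all(
    (not (a or b)) == (not (b or a)) == ((not a) and (not b))
    for a, b in product(samples, repeat=2)
)
triple_not = all((not (not (not a))) == (not a) for a in samples)

print(de_morgan, triple_not)   # True True
```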

    -- [mdw]
    Mark Wooding, Dec 7, 2010
  13. TomF

    Mark Wooding Guest

    Here's a reason you might want to.

    You're given an object, and you want to compute a hash of it. (Maybe
    you want to see whether someone else's object is the same as yours, but
    don't want to disclose the actual object, say.) To hash it, you'll need
    to serialize it somehow. But here's a problem: objects like
    dictionaries and sets don't impose an ordering on their elements. For
    example, the set { 1, 'two' } is the same as the set { 'two', 1 } -- but
    iterating the two might well yield the elements in a different order.
    (The internal details of a hash table tend to reflect the history of
    operations on the hash table as well as its current contents.)

    The obvious answer is to apply a canonical ordering to unordered objects
    like sets and dictionaries. A set can be serialized with its elements
    in ascending order; a dictionary can be serialized as key/value pairs
    with the keys in ascending order. But to do this, you need an
    (arbitrary, total) order on all objects which might be set elements or
    dictionary keys. The order also needs to be dependent only on the
    objects' serializable values, and not on any incidental facts such as
    memory addresses or whatever.
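
    One way to get such an order without relying on the comparison operators at all is to order elements by their serialized bytes. The sketch below is an assumption-laden illustration, not an established format: the tags, the length-prefixed framing, and the names `frame`, `canonical` and `digest` are all made up for the example, and only a few built-in types are handled.

```python
import hashlib


def frame(tag, payload):
    """Prefix payload with a type tag and its length so encodings cannot collide."""
    return tag + str(len(payload)).encode("ascii") + b":" + payload


def canonical(obj):
    """Serialize obj to bytes, independent of set or dict iteration order."""
    if obj is None:
        return frame(b"n", b"")
    if isinstance(obj, bool):  # must come before int: bool is a subclass of int
        return frame(b"b", b"1" if obj else b"0")
    if isinstance(obj, int):
        return frame(b"i", str(obj).encode("ascii"))
    if isinstance(obj, str):
        return frame(b"s", obj.encode("utf-8"))
    if isinstance(obj, (list, tuple)):
        return frame(b"l", b"".join(canonical(item) for item in obj))
    if isinstance(obj, (set, frozenset)):
        # Order the elements by their serialized bytes, not by comparing the objects.
        return frame(b"S", b"".join(sorted(canonical(item) for item in obj)))
    if isinstance(obj, dict):
        pairs = sorted((canonical(k), canonical(v)) for k, v in obj.items())
        return frame(b"D", b"".join(k + v for k, v in pairs))
    raise TypeError("no canonical form for %s" % type(obj).__name__)


def digest(obj):
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical(obj)).hexdigest()


same_set = digest({1, "two"}) == digest({"two", 1})
same_dict = digest({"a": 1, 2: None}) == digest({2: None, "a": 1})
other_set = digest({1, "two"}) == digest({1, "three"})

print(same_set, same_dict, other_set)   # True True False
```

    The order on serialized bytes is arbitrary but total, and it depends only on the values being serialized, which is all the hashing use case needs.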

    -- [mdw]
    Mark Wooding, Dec 8, 2010
  14. TomF

    Paul Rubin Guest

    what about dictionaries?
    Paul Rubin, Dec 8, 2010
  15. TomF

    BartC Guest

    Using a simple "<" comparison, perhaps. But can't a list be sorted by other
    criteria? For example, by comparing the string representations of each
    element?

    So some sorts will make sense, and others (such as "<" or ">") won't.
    BartC, Dec 8, 2010
  16. TomF

    TomF Guest

    I have no argument that there might be an extra-logical use for such an
    ordering which you might find convenient. This is the point you're
    making. sort() and sorted() both take a cmp argument for this sort of
    thing. My complaint is with Python adopting nonsensical semantics
    ("shoe" < 7) to accommodate it.

    By analogy, I often find it convenient to have division by zero return
    0 to the caller for use in calculations. But if Python defined 0/0==0
    I'd consider it broken.

    TomF, Dec 8, 2010
  17. Or a list that contains unhashable objects.

    Or a list that needs to be presented to a human reader in some arbitrary
    but consistent order.

    Or a doctest that needs to show the keys in a dict:
    ['ham', 'spam', 42, None]

    (although that case is probably the weakest of the three).

    Agreed, but in hindsight I think it would be better if there was a
    separate lexicographic sort function, that guaranteed to sort anything
    (including such unorderable values as complex numbers!), without relying
    on the vagaries of the standard comparison operators.

    Or at least anything printable, in which case sorted() with a key
    function of lambda obj: (repr(type(obj)), repr(obj)) might work, I

    Then at least we could limit our arguments to how this hypothetical
    lexicographic sort function was broken, instead of how all comparison
    operators are broken :)
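
    A quick sketch of that key function in Python 3, using a made-up list that includes a complex number. `type_then_repr` is a hypothetical name for the lambda mentioned above.

```python
def type_then_repr(obj):
    """Sort key: group by type, then order by printed representation."""
    return (repr(type(obj)), repr(obj))


data = ["Fred", "Barney", 2, 1, None, 3j]
ordered = sorted(data, key=type_then_repr)

print(ordered)   # [None, 3j, 1, 2, 'Barney', 'Fred']
```

    The order within a type is textual, so 10 sorts before 2. That is fine for a consistent display order and wrong for anything numeric.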
    Steven D'Aprano, Dec 8, 2010
  18. But they already work right across a wide variety of objects, so long as
    you limit yourself to the subset of objects where the basic operations
    have the same semantics.

    I think that insisting that all operators must always have the same
    semantics is as impractical and unnecessary as insisting that all
    functions and methods with the same name must always have the same
    semantics. We wouldn't expect


    to all have the same semantics, or


    despite the inconvenience it makes to duck-typing. Why should we expect
    more from operators than we expect from named functions, when there are
    so many more named functions and so few useful symbols for operators?

    To my mind, it is foolish for us to expect x*y to always have the same
    semantics when even mathematicians don't expect that. In pure
    mathematics, x*y != y*x for any of the following:


    and probably many others I don't know about.

    No, it only makes + non-commutative for those types where + is non-commutative.

    I don't see why. It seems to me that it's only a bad thing if you hope to
    reason about the meaning of a+b without knowing what a and b actually are.

    Personally, I don't consider that a particularly useful trait.

    As Mark Wooding has already pointed out, that would make numeric
    programmers mad, as it eliminates NANs, which are far more important to
    them. And me.

    It also would make it impossible to use > and < to talk about rankings in
    natural hierarchies, such as (say) pecking orders. Using > to mean "out-
    ranks", you might have a pecking order among five hens like this:

    A > B > C > D > E


    D > B

    Not all comparisons are equivalence relations, and it would be a crying
    shame to lose the ability to use > and < to discuss (e.g.) non-transitive
    Steven D'Aprano, Dec 8, 2010
  19. TomF

    Tim Chase Guest

    wouldn't that be something like

    sorted(mixedstuff, key=str)

    or if all you need is a stable order regardless of what that
    order is, one could even get away with:

    sorted(mixedstuff, key=id)

    Tim Chase, Dec 8, 2010
  20. TomF

    John Nagle Guest

    As a sometime numerical person, I've been screaming at this from
    the other side. The problem with comparing non-signalling NaNs is that
    eventually, the program has to make a control flow decision, and it
    may not make it correctly.

    I used to do dynamic simulation engines for animation. I was
    probably the first person to get ragdoll physics to work right,
    back in 1996-1997. In hard collisions, the program would get
    floating point overflows, and I had to abort the iteration, back
    up, cut the time step down, and go forward again, until the time
    step was small enough to allow stable integration. This was
    under Windows on x86, where it's possible, in a Windows-dependent
    way, to catch signalling NaNs and turn the hardware exception into
    a C++ exception. If the computation just plowed ahead with
    non-signalling NaNs, with a check at the end, it could go wrong
    and produce bad results, because incorrect branches would be taken
    and the final bogus results might not contain NaNs.

    I personally think that comparing NaN with numbers or other
    NaNs should raise an exception. There's no valid result for
    such comparisons.
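
    Python's float comparisons do not behave this way, but the idea can be sketched with a wrapper. `checked_less_than` is a hypothetical helper, and raising ValueError is just one possible choice of exception; `velocity` and `limit` are made-up names standing in for a simulation step.

```python
import math


def checked_less_than(left, right):
    """Return left < right, raising instead of answering when either is NaN."""
    if math.isnan(left) or math.isnan(right):
        raise ValueError("comparison involving NaN")
    return left < right


limit = 100.0
velocity = float("inf") - float("inf")   # an overflowed computation yielding NaN

# With plain operators, both tests are False, so either branch can be wrong.
plain = (velocity < limit, velocity >= limit)

try:
    checked_less_than(velocity, limit)
    caught = False
except ValueError:
    caught = True

print(plain)    # (False, False)
print(caught)   # True
```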

    John Nagle
    John Nagle, Dec 8, 2010
