Comparisons of incompatible types

TomF

I'm aggravated by this behavior in python:

x = "4"
print x < 7 # prints False

The issue, of course, is comparisons of incompatible types. In most
languages this throws an error (in Perl the types are converted
silently). In Python this comparison fails silently. The
documentation says: "objects of different types *always* compare
unequal, and are ordered consistently but arbitrarily."

I can't imagine why this design decision was made. I've been bitten by
this several times (reading data from a file and not converting the
numbers before comparison). Can I get this to throw an error instead
of failing silently?

Thanks,
-Tom
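Whatever the language does with mixed comparisons, the dependable fix for the file-reading case is to convert fields as soon as they are read. Here is a minimal sketch. It assumes whitespace-separated numeric fields, and `parse_numbers` is a hypothetical helper, not part of any library:

```python
def parse_numbers(lines):
    """Hypothetical helper: convert every whitespace-separated field to float.

    float() raises ValueError on text that is not a number, so bad data is
    reported when it is read instead of turning into a str-vs-int comparison.
    """
    values = []
    for line in lines:
        for field in line.split():
            values.append(float(field))
    return values


# Lines as they might come from a file: every field starts out as a str.
raw_lines = ["4 10", "7"]
numbers = parse_numbers(raw_lines)
print(numbers)         # [4.0, 10.0, 7.0]
print(numbers[0] < 7)  # True: now a numeric comparison
```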
 
Peter Otten

TomF said:
Can I get this to throw an error instead of failing silently?

This change would break a lot of code, so it could not be made within the
2.x series. However:

Python 3.1.1+ (r311:74480, Nov 2 2009, 15:45:00)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() < int()

Peter
 
Tim Golden

Can I get this to throw an error instead of failing silently?

Yes: switch to Python 3, where this does throw an exception:

Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52)
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):

TJG
 
TomF

Peter Otten said:
This change would break a lot of code, so it could not be made within the
2.x series.

Thanks. I was hoping there was something I could do for 2.x but I
suppose this will have to do.

But I'm mystified by your statement, "this change would break a lot of
code". Given that the semantics are virtually random, how could code
depend on this?

-Tom
 
Robert Kern

On 2010-12-06 09:04:00 -0800, Peter Otten said:
This change would break a lot of code, so it could not be made within the
2.x series. However:

Python 3.1.1+ (r311:74480, Nov 2 2009, 15:45:00)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() < int()

Thanks. I was hoping there was something I could do for 2.x but I suppose this
will have to do.

But I'm mystified by your statement, "this change would break a lot of code".
Given that the semantics are virtually random, how could code depend on this?

There are cases where you don't particularly care *what* order is given as long
as it is consistent. Let's say you want to make sure that two lists have the
same contents (which may mix types), but you don't care about the order. You
could just sort each list and then compare the sorted lists. Before sets were
added to the language, this was a fairly common approach.
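Under Python 3 that idiom stops with a `TypeError` as soon as the lists mix, say, `str` and `int`. One alternative is `collections.Counter`, which compares the lists as multisets without ever ordering the elements. The sketch assumes the elements are hashable, and `same_contents` is an illustrative name:

```python
from collections import Counter


def same_contents(xs, ys):
    """True if xs and ys contain the same elements, counting repeats.

    Counter needs the elements to be hashable, but never needs to order
    them, so mixing str and int is fine.
    """
    return Counter(xs) == Counter(ys)


a = [1, "two", 3, "two"]
b = ["two", 3, "two", 1]

try:
    sorted(a) == sorted(b)  # the Python 2 idiom
except TypeError as exc:
    print("sorting failed:", exc)

print(same_contents(a, b))           # True
print(same_contents(a, b + [None]))  # False
```

Unlike converting both lists to sets, this keeps track of duplicates. Unhashable elements, such as nested lists, would still need another approach.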

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
Terry Reedy

On 12/6/10 11:16 AM, TomF wrote:

There are cases where you don't particularly care *what* order is given
as long as it is consistent. Let's say you want to make sure that two
lists have the same contents (which may mix types), but you don't care
about the order. You could just sort each list and then compare the
sorted lists. Before sets were added to the language, this was a fairly
common approach.

And indeed, code like this that has not been updated does break in 3.x,
to some people's annoyance. We really really cannot please everyone ;-).
 
Mark Wooding

Terry Reedy said:
And indeed, code like this that has not been updated does break in
3.x, to some people's annoyance. We really really cannot please
everyone ;-).

The problem is that there are too many useful properties that one might
expect from comparison operators. For example, it's frequently nice to
have a total ordering on all objects. For real numbers, it's nice that
the ordering obey the usual ordered-field axioms; but the complex
numbers don't have an ordering compatible with the field operators, and
imposing a default ordering (e.g., degree-lexicographic) is probably
asking for trouble.
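Python 3 takes exactly that line for complex numbers: `==` works, but `<` raises `TypeError`. If you want an order anyway, you have to choose it explicitly. The sketch below uses plain (real, imag) lexicographic order purely as an illustration. It is not the degree-lexicographic order mentioned above:

```python
values = [2 + 1j, 1 + 5j, 2 - 3j, 1 + 0j]

try:
    sorted(values)
except TypeError as exc:
    print("no default order:", exc)

# An explicit, arbitrary choice: compare real parts first, then imaginary parts.
by_parts = sorted(values, key=lambda z: (z.real, z.imag))
print(by_parts)  # [(1+0j), (1+5j), (2-3j), (2+1j)]
```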

I agree that the Python 3 behaviour is an improvement, by the way.

-- [mdw]
 
Steven D'Aprano

I'm aggravated by this behavior in python:

x = "4"
print x < 7 # prints False
I can't imagine why this design decision was made.

You've never needed to deal with an heterogeneous list?

data = ["Fred", "Barney", 2, 1, None]
data.sort()

Nevertheless, I agree that in hindsight, the ability to sort such lists
is not as important as the consistency of comparisons.
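For comparison, under Python 3 this example stops with a `TypeError` instead of producing an arbitrary order. Each homogeneous part still sorts normally:

```python
data = ["Fred", "Barney", 2, 1, None]

try:
    sorted(data)
except TypeError as exc:
    # Python 3 refuses to order str against int, or anything against None.
    print("TypeError:", exc)

# Each homogeneous part still sorts normally.
print(sorted(x for x in data if isinstance(x, str)))  # ['Barney', 'Fred']
print(sorted(x for x in data if isinstance(x, int)))  # [1, 2]
```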
 
John Nagle

Nevertheless, I agree that in hindsight, the ability to sort such lists
is not as important as the consistency of comparisons.

If you're thinking hard about this, I recommend viewing Alexander
Stepanov's talk at Stanford last month:

http://www.stanford.edu/class/ee380/Abstracts/101103.html

He makes the point that, for generic programs to work right, the
basic operations must have certain well-defined semantics. Then
the same algorithms will work right across a wide variety of
objects.

This is consistent with Python's "duck typing", but inconsistent
with the current semantics of some operators.

For example, "+" as concatenation makes "+" non-commutative.
In other words,

a + b

is not always equal to

b + a

which is not good.
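At the interpreter the difference is easy to see: `+` commutes for numbers but not for strings or lists.

```python
# Numbers: addition commutes.
print(2 + 3 == 3 + 2)            # True

# Sequences: + means concatenation, and operand order matters.
print("ab" + "cd", "cd" + "ab")  # abcd cdab
print([1] + [2] == [2] + [1])    # False
```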

Important properties to have across all types:

a + b == b + a

Exactly one of

a > b
a == b
a < b

is true, or a type exception must be raised.

The basic Boolean identities

(a or b) == (b or a)
not (a or b) == (not a) and (not b)
not (not a) == a

should all hold, or a type exception should be raised.
With Python accepting both "True" and "1" as sort of
equivalent, there are cases where those don't hold.
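Two Python facts lie behind this. First, `or` returns one of its operands rather than a `bool`, so `(a or b) == (b or a)` fails for two different truthy values. Second, `bool` is a subclass of `int`, so `True == 1`. A small sketch:

```python
a, b = 1, 2

# `or` returns the first truthy operand, not True/False.
print(a or b, b or a)        # 1 2
print((a or b) == (b or a))  # False

# bool is a subclass of int, so True behaves like 1 in arithmetic.
print(True == 1, isinstance(True, int))  # True True
print(True + True)                       # 2
```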

John Nagle
 
Carl Banks

I'm aggravated by this behavior in python:
x = "4"
print x < 7    # prints False
I can't imagine why this design decision was made.

You've never needed to deal with an heterogeneous list?

data = ["Fred", "Barney", 2, 1, None]
data.sort()

Not once, ever.

Nevertheless, I agree that in hindsight, the ability to sort such lists
is not as important as the consistency of comparisons.

I think that feeling the need to sort non-homogenous lists is
indicative of bad design.

If the order of the items doesn't matter, then there must be some
small bit of homogeneity to exploit to use as a sort criterion. In
that case you should use key= parameter or DSU.
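Here is a sketch of both approaches. It assumes, hypothetically, records that share a `name` field while their other fields differ in type. `key=` is the modern spelling, and decorate-sort-undecorate (DSU) is the older idiom. In DSU, the index in each decorated tuple keeps ties from ever comparing the records themselves:

```python
records = [
    {"name": "wilma", "value": 3.5},
    {"name": "fred", "value": "n/a"},
    {"name": "barney", "value": None},
]

# key=: sort on the one field every record has in common.
by_key = sorted(records, key=lambda r: r["name"])

# DSU: decorate with (key, index, item), sort, then strip the decoration.
decorated = [(r["name"], i, r) for i, r in enumerate(records)]
decorated.sort()
by_dsu = [r for _, _, r in decorated]

print([r["name"] for r in by_key])  # ['barney', 'fred', 'wilma']
print(by_key == by_dsu)             # True
```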


Carl Banks
 
Carl Banks

The basic Boolean identities

        (a or b) == (b or a)
        not (a or b) == (not a) and (not b)
        not (not a) == a

should all hold, or a type exception should be raised.
With Python accepting both "True" and "1" as sort of
equivalent, there are cases where those don't hold.

For better or worse (and I say worse, but YMMV) "and" and "or" are not
boolean operators in Python but special-form expressions that resemble
boolean semantics in some instances, but not (as you mention above) in
others.

Likewise, the comparison operators <, >, >=, and <= aren't well-
ordered; sets use these operators to indicate topological ordering.

IMO having the operators adhere to defined properties would be a good
thing. It would improve code reusability since the operators could be
expected to act in consistent ways, but Python isn't that language.
So you might as well use the operators for whatever seems like it
works, + for concatenation, > for superset, and so on.
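Concretely, Python's set comparisons implement the subset partial order. Two sets can be neither smaller than, equal to, nor larger than each other, contrary to the "exactly one of" rule proposed earlier in the thread:

```python
a = {1, 2}
b = {2, 3}

print(a < {1, 2, 3})         # True: a is a proper subset
print({1, 2, 3} > b)         # True: a proper superset of b
print(a < b, a == b, a > b)  # False False False: a and b are incomparable
```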


Carl Banks
 
Mark Wooding

[Stepanov]
makes the point that, for generic programs to work right, the basic
operations must have certain well-defined semantics. Then the same
algorithms will work right across a wide variety of objects.

This is consistent with Python's "duck typing", but inconsistent with
the current semantics of some operators.

This isn't a disaster. You should check that the arguments define the
necessary operations and obey the necessary axioms. Python is already
dynamically typed: this kind of proof-obligation is already endemic in
Python programming, so you've not lost anything significant.

For example, "+" as concatenation makes "+" non-commutative. In other
words,

a + b

is not always equal to

b + a

which is not good.

I think I probably agree with this. Concatenation yields a nonabelian
monoid (usually with identity); `+' is pretty much universally an
abelian group operator (exception: natural numbers, where it's used in
an abelian monoid which extends to a group in a relatively obvious way).
But then we'd need another operator symbol for concatenation.
Nonnegative integers act on strings properly, but the action doesn't
distribute over concatenation, which is also a shame. i.e.,

n*(a + b) != n*a + n*b
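With strings the failure is concrete: both sides contain the same characters, but in a different order. A small sketch with arbitrary sample values:

```python
n, a, b = 2, "x", "y"

left = n * (a + b)     # repeat the concatenation
right = n * a + n * b  # concatenate the repetitions

print(left, right)                    # xyxy xxyy
print(left == right)                  # False
print(sorted(left) == sorted(right))  # True: same characters, different order
```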

But it's a familiar notation, by no means peculiar to Python, and
changing it would be difficult.

Exactly one of

a > b
a == b
a < b

is true, or a type exception must be raised.

This will get the numerical people screaming. Non-signalling NaNs are
useful, and they don't obey these axioms.
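Here is a quick check of how IEEE quiet NaNs break the "exactly one of" rule in Python. Every ordering and equality comparison involving a NaN is false, and `!=` is true:

```python
import math

nan = float("nan")

print(nan < 1.0, nan == 1.0, nan > 1.0)  # False False False
print(nan == nan, nan != nan)            # False True
print(math.isnan(nan))                   # True: the reliable way to test for NaN
```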

I think, more generally, that requiring a full total order (rather than
either a preorder or a partial order) is unnecessarily proscriptive.
Sorting only requires a preorder, for example, i.e., { (a, b) | a <= b
<= a } is an equivalence relation, and the preorder naturally induces a
total order on the equivalence classes. Topological sorting requires
only a partial order, and makes good use of the notation. As an
example, sets use `<=' to denote subsetness, which is well known to be a
partial order.
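Python's own sort illustrates the first point. It only ever asks whether one key is less than another, and it is guaranteed stable, so a key that ties many items (a preorder on the items) is enough. The tied items form the equivalence classes and keep their original order. A small sketch with word length as the key:

```python
words = ["pear", "fig", "apple", "kiwi", "date", "plum"]

# len() induces a preorder on the words: "pear" and "kiwi" are tied, for example.
by_length = sorted(words, key=len)
print(by_length)  # ['fig', 'pear', 'kiwi', 'date', 'plum', 'apple']
```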

(I presume you weren't going to deny

a <= b iff a < b or a == b

or

a < b iff b > a

because that really would be bad.)

The basic Boolean identities

(a or b) == (b or a)
not (a or b) == (not a) and (not b)
not (not a) == a

should all hold, or a type exception should be raised.

The first of these contradicts the axiom

x => x or _|_ == x

which is probably more useful. The last can't usefully be true since
`not' is lossy. But I think that, for all values a, b,

not (a or b) == not (b or a) == (not a) and (not b)
not (not (not a)) == not a

which is probably good enough. (The application of `not' applies a
boolean coercion, which canonifies adequately.)
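Mark's weaker identities are easy to spot-check over a few values of mixed type. The same sketch also shows the short-circuit behaviour behind the `x or _|_` axiom. The sample values are an arbitrary choice:

```python
from itertools import product

# Arbitrary sample of truthy and falsy values of several types.
samples = [0, 1, 2, "", "a", None, [], [0], 0.0]

for a, b in product(samples, repeat=2):
    # `not` always yields a bool, so these agree even though `or`
    # returns one of its operands.
    assert (not (a or b)) == (not (b or a)) == ((not a) and (not b))

for a in samples:
    assert (not (not (not a))) == (not a)

# Short-circuiting: when x is truthy the right operand is never evaluated,
# so an expression that would raise ZeroDivisionError is harmless here.
x = "truthy"
print(x or 1 / 0)  # truthy
```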

-- [mdw]
 
Mark Wooding

Carl Banks said:
I think that feeling the need to sort non-homogenous lists is
indicative of bad design.

Here's a reason you might want to.

You're given an object, and you want to compute a hash of it. (Maybe
you want to see whether someone else's object is the same as yours, but
don't want to disclose the actual object, say.) To hash it, you'll need
to serialize it somehow. But here's a problem: objects like
dictionaries and sets don't impose an ordering on their elements. For
example, the set { 1, 'two' } is the same as the set { 'two', 1 } -- but
iterating the two might well yield the elements in a different order.
(The internal details of a hash table tend to reflect the history of
operations on the hash table as well as its current contents.)

The obvious answer is to apply a canonical ordering to unordered objects
like sets and dictionaries. A set can be serialized with its elements
in ascending order; a dictionary can be serialized as key/value pairs
with the keys in ascending order. But to do this, you need an
(arbitrary, total) order on all objects which might be set elements or
dictionary keys. The order also needs to be dependent only on the
objects' serializable values, and not on any incidental facts such as
memory addresses or whatever.
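Here is a sketch of one way to build such a canonical form. It swaps in a different technique from the one described. Rather than defining an order between, say, `int` and `str`, it serializes each element first and sorts the resulting byte strings, which are always comparable. It assumes the data contains only `None`, `int` (including `bool`), `str`, `list`, `tuple`, `set`/`frozenset` and `dict`. The names `canonical` and `digest` and the tagged encoding are illustrative choices, not a standard format:

```python
import hashlib


def _frame(tag, body):
    # Tag byte + 8-byte length + body, so every encoding is self-delimiting.
    return tag + len(body).to_bytes(8, "big") + body


def canonical(obj):
    """Encode obj as bytes such that equal values give equal bytes.

    Unordered containers are sorted by their elements' encodings, which are
    plain bytes and therefore always comparable, whatever the element types.
    """
    if obj is None:
        return _frame(b"N", b"")
    if isinstance(obj, int):  # also covers bool, so True encodes like 1
        return _frame(b"I", str(int(obj)).encode("ascii"))
    if isinstance(obj, str):
        return _frame(b"S", obj.encode("utf-8"))
    if isinstance(obj, list):
        return _frame(b"L", b"".join(canonical(x) for x in obj))
    if isinstance(obj, tuple):
        return _frame(b"U", b"".join(canonical(x) for x in obj))
    if isinstance(obj, (set, frozenset)):
        return _frame(b"E", b"".join(sorted(canonical(x) for x in obj)))
    if isinstance(obj, dict):
        pairs = sorted(canonical(k) + canonical(v) for k, v in obj.items())
        return _frame(b"D", b"".join(pairs))
    raise TypeError(f"no canonical encoding for {type(obj).__name__}")


def digest(obj):
    """SHA-256 of the canonical encoding, as a hex string."""
    return hashlib.sha256(canonical(obj)).hexdigest()


print(digest({1, "two"}) == digest({"two", 1}))  # True
print(digest([1, "two"]) == digest(["two", 1]))  # False: lists keep order
```

Sorting the encodings gives an order that depends only on the values, not on memory addresses or hash-table history, which is the property asked for above.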

-- [mdw]
 
BartC

Carl Banks said:
I think that feeling the need to sort non-homogenous lists is
indicative of bad design.

Using a simple "<" comparison, perhaps. But can't a list be sorted by other
criteria? For example, by comparing the string representations of each
element.

So some sorts will make sense, and others (such as "<" or ">") won't.
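Here is BartC's suggestion as a sketch. `key=str` sorts any list whose elements have a string form. The order is textual, though, so `10` lands before `9`, and `1` and `'1'` get the same key. Sorting is stable, so tied items simply keep their input order:

```python
data = ["Fred", "Barney", 10, 9, None, "1", 1]

# Every object has a str(), so the keys are all strings and always comparable.
print(sorted(data, key=str))  # ['1', 1, 10, 9, 'Barney', 'Fred', None]
```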
 
TomF

The obvious answer is to apply a canonical ordering to unordered objects
like sets and dictionaries. A set can be serialized with its elements
in ascending order; a dictionary can be serialized as key/value pairs
with the keys in ascending order. But to do this, you need an
(arbitrary, total) order on all objects which might be set elements or
dictionary keys. The order also needs to be dependent only on the
objects' serializable values, and not on any incidental facts such as
memory addresses or whatever.

I have no argument that there might be an extra-logical use for such an
ordering which you might find convenient. This is the point you're
making. sort() and sorted() both take a cmp argument for this sort of
thing. My complaint is with Python adopting nonsensical semantics
("shoe" < 7) to accommodate it.

By analogy, I often find it convenient to have division by zero return
0 to the caller for use in calculations. But if Python defined 0/0==0
I'd consider it broken.
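A note for Python 3 readers: the `cmp` argument mentioned here was removed from `sort()` and `sorted()` in Python 3. An old-style comparison function can still be used through `functools.cmp_to_key`. Here is a minimal sketch with a hypothetical `compare` function. In the spirit of the complaint above, it refuses mixed types outright instead of inventing an order:

```python
from functools import cmp_to_key


def compare(a, b):
    """Hypothetical cmp-style function: negative, zero or positive, like Python 2 cmp."""
    if type(a) is not type(b):
        raise TypeError(f"refusing to compare {type(a).__name__} with {type(b).__name__}")
    return (a > b) - (a < b)


print(sorted([3, 1, 2], key=cmp_to_key(compare)))  # [1, 2, 3]

try:
    sorted(["shoe", 7], key=cmp_to_key(compare))
except TypeError as exc:
    print(exc)
```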

-Tom
 
Steven D'Aprano

It can also be indicative of code written for a Python that doesn't have
sets.

Or a list that contains unhashable objects.

Or a list that needs to be presented to a human reader in some arbitrary
but consistent order.

Or a doctest that needs to show the keys in a dict:
['ham', 'spam', 42, None]

(although that case is probably the weakest of the three).


So there's no design error in wanting heterogeneous sequences to sort;
it can be quite Pythonic (until the advent of the ‘set’ type).

Agreed, but in hindsight I think it would be better if there was a
separate lexicographic sort function guaranteed to sort anything
(including such unorderable values as complex numbers!), without relying
on the vagaries of the standard comparison operators.

Or at least anything printable, in which case sorted() with a key
function of lambda obj: (repr(type(obj)), repr(obj)) might work, I
suppose...
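Here is Steven's key written out. It is a sketch only, and `lexi_key` is an illustrative name. Grouping by the `repr` of the type first keeps `1` and `'1'` apart. Within a type, the `repr` strings are compared as text, so the result is a consistent order rather than a meaningful one: `42` still sorts before `9`.

```python
def lexi_key(obj):
    # Group by type first, then by the textual form of the value.
    return (repr(type(obj)), repr(obj))


data = ["ham", "spam", 42, None, 3 + 4j, 9, "1", 1]
print(sorted(data, key=lexi_key))
# [None, (3+4j), 1, 42, 9, '1', 'ham', 'spam']
```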

Then at least we could limit our arguments to how this hypothetical
lexicographic sort function was broken, instead of how all comparison
operators are broken :)
 
Steven D'Aprano

If you're thinking hard about this, I recommend viewing Alexander
Stepanov's talk at Stanford last month:

http://www.stanford.edu/class/ee380/Abstracts/101103.html

He makes the point that, for generic programs to work right, the basic
operations must have certain well-defined semantics. Then the same
algorithms will work right across a wide variety of objects.

But they already work right across a wide variety of objects, so long as
you limit yourself to the subset of objects where the basic operations
have the same semantics.

I think that insisting that all operators must always have the same
semantics is as impractical and unnecessary as insisting that all
functions and methods with the same name must always have the same
semantics. We wouldn't expect

pencil.draw
six_shooter.draw
game.draw

to all have the same semantics, or

math.sin
priest.sin

despite the inconvenience it makes to duck-typing. Why should we expect
more from operators than we expect from named functions, when there are
so many more named functions and so few useful symbols for operators?

To my mind, it is foolish for us to expect x*y to always have the same
semantics when even mathematicians don't expect that. In pure
mathematics, x*y is not in general equal to y*x for any of the following:

matrices
quaternions
octonions

and probably many others I don't know about.
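The quaternion case takes only a few lines to check. Here is a minimal sketch that represents w + xi + yj + zk as a plain `(w, x, y, z)` tuple and uses the Hamilton product. `qmul` is a hypothetical helper:

```python
def qmul(p, q):
    """Hamilton product of quaternions given as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


i = (0, 1, 0, 0)
j = (0, 0, 1, 0)

print(qmul(i, j))  # (0, 0, 0, 1)   i*j == k
print(qmul(j, i))  # (0, 0, 0, -1)  j*i == -k
```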

This is consistent with Python's "duck typing", but inconsistent
with the current semantics of some operators.

For example, "+" as concatenation makes "+" non-commutative.

No, it only makes + non-commutative for those types where + is non-
commutative.

In other words,

a + b

is not always equal to

b + a

which is not good.

I don't see why. It seems to me that it's only a bad thing if you hope to
reason about the meaning of a+b without knowing what a and b actually are.

Personally, I don't consider that a particularly useful trait.

Important properties to have across all types:

a + b == b + a

Exactly one of

a > b
a == b
a < b

is true, or a type exception must be raised.

As Mark Wooding has already pointed out, that would make numeric
programmers mad, as it eliminates NaNs, which are far more important to
them. And me.

It also would make it impossible to use > and < to talk about rankings in
natural hierarchies, such as (say) pecking orders. Using > to mean "out-
ranks", you might have a pecking order among five hens like this:

A > B > C > D > E

but

D > B

Not all comparisons are equivalence relations, and it would be a crying
shame to lose the ability to use > and < to discuss (e.g.) non-transitive
comparisons.
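One practical consequence is that a relation with a cycle, like that pecking order, has no consistent linear ranking at all. Here is a sketch using the standard library's `graphlib` (Python 3.9 and later), which raises `CycleError` when asked to linearize such a relation. Encoding the graph as each hen mapped to the hens that outrank her is an illustrative choice:

```python
from graphlib import CycleError, TopologicalSorter

# Each hen maps to the set of hens that outrank her: A > B > C > D > E.
outranked_by = {"B": {"A"}, "C": {"B"}, "D": {"C"}, "E": {"D"}}
print(list(TopologicalSorter(outranked_by).static_order()))  # ['A', 'B', 'C', 'D', 'E']

# Add "D > B" and the relation is no longer transitive: B > C > D > B.
outranked_by["B"] = {"A", "D"}
try:
    list(TopologicalSorter(outranked_by).static_order())
except CycleError as exc:
    print("no ranking possible, cycle:", exc.args[1])
```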
 
Tim Chase

Or a list that needs to be presented to a human reader in some arbitrary
but consistent order.

Or a doctest that needs to show the keys in a dict:
['ham', 'spam', 42, None]
[snip]

Agreed, but in hindsight I think it would be better if there was a
separate lexicographic sort function guaranteed to sort anything
(including such unorderable values as complex numbers!), without relying
on the vagaries of the standard comparison operators.

wouldn't that be something like

sorted(mixedstuff, key=str)

or if all you need is a stable order regardless of what that
order is, one could even get away with:

sorted(mixedstuff, key=id)

-tkc
 
John Nagle

This will get the numerical people screaming. Non-signalling NaNs are
useful, and they don't obey these axioms.

As a sometime numerical person, I've been screaming at this from
the other side. The problem with comparing non-signalling NaNs is that
eventually, the program has to make a control flow decision, and it
may not make it correctly.

I used to do dynamic simulation engines for animation. I was
probably the first person to get ragdoll physics to work right,
back in 1996-1997. In hard collisions, the program would get
floating point overflows, and I had to abort the iteration, back
up, cut the time step down, and go forward again, until the time
step was small enough to allow stable integration. This was
under Windows on x86, where it's possible, in a Windows-dependent
way, to catch signalling NaNs and turn the hardware exception into
a C++ exception. If the computation just plowed ahead with
non-signalling NaNs, with a check at the end, it could go wrong
and produce bad results, because incorrect branches would be taken
and the final bogus results might not contain NaNs.

I personally think that comparing NaN with numbers or other
NaNs should raise an exception. There's no valid result for
such comparisons.
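Python's standard library already contains one type that behaves roughly this way. `decimal.Decimal` signals `InvalidOperation` when a NaN is used with `<`, `<=`, `>` or `>=`, and the default context turns that signal into an exception. With `==`, a NaN simply returns `False`. For floats, a strict comparison has to be written by hand; `strict_less` below is a hypothetical helper, not a library function:

```python
import math
from decimal import Decimal, InvalidOperation


def strict_less(a, b):
    """Hypothetical helper: like a < b for floats, but refuses to order NaNs."""
    if math.isnan(a) or math.isnan(b):
        raise ValueError("cannot order NaN")
    return a < b


print(strict_less(1.0, 2.0))  # True

try:
    strict_less(float("nan"), 1.0)
except ValueError as exc:
    print("float:", exc)  # float: cannot order NaN

try:
    Decimal("NaN") < 1  # ordering a Decimal NaN signals InvalidOperation
except InvalidOperation:
    print("Decimal: InvalidOperation")

print(Decimal("NaN") == 1)  # False: equality does not signal
```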

John Nagle
 
