Python less error-prone than Java


Christoph Zwerschke

You will often hear that for reasons of fault minimization, you should
use a programming language with strict typing:
http://turing.une.edu.au/~comp284/Lectures/Lecture_18/lecture/node1.html

I just came across a funny example in which the opposite is the case.

The following is a binary search algorithm in Java. It searches for a value
in a sorted array a of ints:

public static int binarySearch(int[] a, int key) {
    int low = 0;
    int high = a.length - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        int midVal = a[mid];
        if (midVal < key)
            low = mid + 1;
        else if (midVal > key)
            high = mid - 1;
        else
            return mid; // key found
    }
    return -(low + 1);  // key not found.
}

Now the same thing, directly converted to Python:

def binarySearch(a, key):
    low = 0
    high = len(a) - 1
    while low <= high:
        mid = (low + high) / 2
        midVal = a[mid]
        if midVal < key:
            low = mid + 1
        elif midVal > key:
            high = mid - 1
        else:
            return mid  # key found
    return -(low + 1)  # key not found.

What's better about the Python version? First, it will operate on *any*
sorted array, no matter which type the values have.

But second, there is a hidden error in the Java version that the Python
version does not have.

See the following web page if you don't find it ;-)
http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html

-- Christoph
 

Cameron Laird

Christoph Zwerschke said:
> You will often hear that for reasons of fault minimization, you should
> use a programming language with strict typing:
> http://turing.une.edu.au/~comp284/Lectures/Lecture_18/lecture/node1.html
>
> I just came across a funny example in which the opposite is the case.
> ...
> What's better about the Python version? First, it will operate on *any*
> sorted array, no matter which type the values have.
>
> But second, there is a hidden error in the Java version that the Python
> version does not have.
>
> See the following web page if you don't find it ;-)
> http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
> ...
This is all worth saying, that is, I agree with the conclusions.

The premises are arguable, though. For me, this example illustrates
the difficulty faced by people who hear, "strict typing", and think
of Java.

At another level, Python's superiority here is epiphenomenal. Python
probably has a better model for arithmetic than Java, but BDFL knows
that Python is not without its own flaws, particularly in arithmetic.

So, here's my summary: Python's a nice language--a very nice one.
It's safer to use than Java in many ways. Python's typing is
STRICTER than Java's, but it's also dynamic, so people get to argue
for decades about which is a better model. Anyone who thinks typing
is a first-order determinant of code quality is making a big mistake
though, anyway.
 

Simon Percivall

Actually, you're wrong on all levels.

First: It's perfectly simple in Java to create a binary search that
works on all arrays that contain objects; so wrong there.

Secondly: The bug has nothing to do with static typing (I'm guessing
that's what you meant. Both Python and Java are strongly typed). The
problem is that ints are bounded in Java. They could easily have been
ints and then automatically coerced to (equivalent to) longs when they
got bigger; that they aren't is more a design fault than anything to do
with static typing. The equivalent in Python would have been if an
overflow exception was raised when the int got too big. It might have
been that way, typing or no typing.
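
To see concretely what that boundedness does to the midpoint computation
in the posted search, here is a small sketch that emulates Java's 32-bit
wraparound in Python (the to32 helper and the sample indices are mine,
purely for illustration):

def to32(x):
    # Illustrative helper: emulate Java's 32-bit signed int wraparound.
    x &= 0xFFFFFFFF
    if x >= 0x80000000:
        x -= 0x100000000
    return x

low = 1 << 30                  # indices this large need a huge array,
high = (1 << 30) + (1 << 29)   # but the arithmetic is what matters here

print (low + high) / 2         # Python: 1342177280, the correct midpoint
print to32(low + high) / 2     # Java-style: -805306368, a negative "index"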
 

Alex Martelli

Simon Percivall said:
> with static typing. The equivalent in Python would have been if an
> overflow exception was raised when the int got too big. It might have
> been that way, typing or no typing.

Indeed, it _used_ to be that way --
<http://docs.python.org/lib/module-exceptions.html> STILL says...:

exception OverflowError

Raised when the result of an arithmetic operation is too large to be
represented. This cannot occur for long integers (which would rather
raise MemoryError than give up). Because of the lack of standardization
of floating point exception handling in C, most floating point
operations also aren't checked. For plain integers, all operations that
can overflow are checked except left shift, where typical applications
prefer to drop bits than raise an exception.


Actually, the docs are obsolete on this point: an int now becomes a long
when that's necessary, so on a 32-bit build, for example, 2147483647 + 1
simply evaluates to 2147483648L.

That operation _would_ have raised OverflowError in old-enough
versions of Python (not sure exactly when the switch happened...).
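
A quick interactive check of the current behaviour (assuming Python 2.4
or later on a 32-bit build; on a 64-bit build the same promotion happens,
just at a much larger boundary):

import sys

n = sys.maxint            # 2147483647 on a 32-bit build
print type(n)             # <type 'int'>
print type(n + 1)         # <type 'long'> -- silently promoted, no OverflowError
print repr(n + 1)         # 2147483648L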


Alex
 

Christoph Zwerschke

Simon said:
> First: It's perfectly simple in Java to create a binary search that
> works on all arrays that contain objects; so wrong there.

My point was that the *same* Java source example, directly converted to
Python would *automatically* accept all kinds of arrays. No need to make
any extra efforts. By the way, how would you do it in Java? With
function overloading? I would not call that perfectly simple.
> Secondly: The bug has nothing to do with static typing (I'm guessing
> that's what you meant. Both Python and Java are strongly typed). The
> problem is that ints are bounded in Java. They could easily have been
> ints and then automatically coerced to (equivalent to) longs when they
> got bigger; that they aren't is more a design fault than anything to
> do with static typing. The equivalent in Python would have been if an
> overflow exception was raised when the int got too big. It might have
> been that way, typing or no typing.

Yes, sorry, I meant static typing, not strict typing. But I still do
think that the bug has to do with static typing. You're right, the
direct cause is that ints are bounded in Java, and not bounded in
Python, and that it could well be the other way round. However, doing it
the other way round would be less appropriate for each language,
precisely because of the difference in static typing.

Java could coerce the result to long, but then it would still fail when
the result is stored back into the statically typed int variable. So that
would not be very clever.

And Python could produce an overflow error (and did in the past), but
taking advantage of the possibilities of dynamic typing and
automatically producing longs is a cleverer solution for Python, and
that's why it was proposed and accepted in PEP 237.

So the difference in static typing is actually the deeper reason why
ints were made to behave differently in the two languages.

-- Christoph
 

Christoph Zwerschke

Cameron said:
> So, here's my summary: Python's a nice language--a very nice one.
> It's safer to use than Java in many ways. Python's typing is
> STRICTER than Java's, but it's also dynamic, so people get to argue
> for decades about which is a better model. Anyone who thinks typing
> is a first-order determinant of code quality is making a big mistake
> though, anyway.

Yes, sorry. It has nothing to do with strict, but with static typing.
And I should not have chosen such a general subject line (I just meant
to be funny, but sounded more like a troll). I had just noticed that the
direct translation of that Java program to Python would not have that
subtle bug and found that this was worth mentioning.

-- Christoph
 

Alan Morgan

Christoph Zwerschke said:
> My point was that the *same* Java source example, directly converted to
> Python would *automatically* accept all kinds of arrays.

And the same code converted to SML would automatically work on all
kinds of arrays and SML is statically typed. It's a language issue,
not a typing issue.

> No need to make any extra efforts. By the way, how would you do it in
> Java? With function overloading? I would not call that perfectly simple.

Since Java doesn't allow function overloading that clearly can't be
the way. J2SE 5.0 allows generic classes and functions that operate
on generic containers. There are some gotchas, but it's not drastically
more complex than the original int-only java code.

Alan
 

Neil Hodgson

Alan said:
> Since Java doesn't allow function overloading that clearly can't be
> the way. J2SE 5.0 allows generic classes and functions that operate
> on generic containers. There are some gotchas, but it's not drastically
> more complex than the original int-only java code.

Doesn't Java restrict generics to only operate on reference types so
you can't produce a generic binary search that operates on arrays where
the item type may be int?

Neil
 

Alan Morgan

Neil Hodgson said:
> Doesn't Java restrict generics to only operate on reference types so
> you can't produce a generic binary search that operates on arrays where
> the item type may be int?

Yup, you have to wrap int (and double and float and...). Blame type
erasure.

Alan
 

Ilpo Nyyssönen

Christoph Zwerschke said:
> What's better about the Python version? First, it will operate on
> *any* sorted array, no matter which type the values have.
>
> But second, there is a hidden error in the Java version that the
> Python version does not have.

While I can see your point, I'd say you are arguing at the wrong level
here.

With Java generics you can sort a list while still keeping the type of
the contents defined. This makes the code less error-prone. But why
would you implement binary search yourself when the standard library
already has it for both arrays and lists? That is one big thing that
makes code less error-prone: using existing, well-made libraries. You
can find binary search in the Python standard library too (although the
API in Java is a bit better; see the return values).
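
For reference, the standard-library route in Python goes through the
bisect module; a minimal sketch (index_of is my own wrapper, not a
stdlib function -- bisect only reports an insertion point, which is the
API difference alluded to above):

from bisect import bisect_left

def index_of(a, key):
    # Return the index of key in the sorted sequence a, or -1 if absent.
    i = bisect_left(a, key)          # leftmost position where key could go
    if i < len(a) and a[i] == key:
        return i
    return -1

print index_of([1, 3, 5, 7], 5)      # 2
print index_of([1, 3, 5, 7], 4)      # -1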

Well, you can say that binary search is only meant as an example, and
that in real code you would use the library version. I'd say it is not
a good example: how often will you write such algorithms? Very rarely.

Integer overflows are generally not the errors you run into in
programs. The errors happening most often are, from my point of view:

1. null pointer errors
2. wrong type (class cast in Java, some weird missing attribute in Python)
3. array/list index out of bounds

The first and third ones are about the same in every language. The second
one is one where the typing can make a difference. If at the code
level you know the type all the way through, there is much less chance of
it being wrong. (The sad thing about Java generics is that they are a
compile-time-only feature, which causes some really weird effects, but
that is too off-topic here.)

In Python, passing sequences into and out of a function is very easy.
You can easily pass a sequence as an argument list, and you can just as
easily return a sequence from a function and even unpack it into
variables directly. This is a very powerful tool, but it has a problem
too: how can you change what you return without breaking the callers?
There are many cases where passing an object instead of a sequence
makes the code much easier to develop further.

What's the point? The point is that with neither Java nor Python do you
want to be working at such a low level. You really want to be working
with objects and using existing libraries as much as possible. And at
that level Java might be less error-prone, as it restricts more of the
ways in which you can shoot yourself in the foot.
 

Kaz Kylheku

Christoph said:
> You will often hear that for reasons of fault minimization, you should
> use a programming language with strict typing:
> http://turing.une.edu.au/~comp284/Lectures/Lecture_18/lecture/node1.html

Quoting from that web page:

"A programming language with strict typing and run-time checking should
be used."

This doesn't prescribe latent or manifest typing, only that there be
type checking.

There is no question that for reliability, it is necessary to have type
checking, whether at run time or earlier.

You can have statically typed languages with inadequate type safety,
and you can have dynamically typed languages with inadequate type
safety.
> Now the same thing, directly converted to Python:
>
> def binarySearch(a, key):
>     low = 0
>     high = len(a) - 1
>     while low <= high:
>         mid = (low + high) / 2
>         midVal = a[mid]
>         if midVal < key:
>             low = mid + 1
>         elif midVal > key:
>             high = mid - 1
>         else:
>             return mid  # key found
>     return -(low + 1)  # key not found.
>
> What's better about the Python version? First, it will operate on *any*
> sorted array, no matter which type the values have.

Uh huh! With hard-coded < and = operators, how stupid. What if you want
to use it on strings?

Would that be a case-sensitive lexicographic comparison, or a
case-insensitive one? How do you specify what kind of less-than and equal
you want to do?

-1 to indicate not found? Why copy Java braindamage induced by an
antiquated form of static typing? The Java version has to do that
because the return value is necessarily declared to be of type integer.


;; Common Lisp
;; Binary search any sorted sequence SEQ for ITEM, returning
;; the position (starting from zero) if the item is found,
;; otherwise returns NIL.
;;
;; :REF specifies positional accessing function, default is ELT
;; :LEN specifies function for retrieving sequence length
;; :LESS specifies function for less-than item comparison
;; :SAME specifies function for equality comparison

(defun binary-search (seq item
                      &key (ref #'elt) (len #'length)
                           (less #'<) (same #'=))
  (loop with low = 0
        and high = (1- (funcall len seq)) ; highest valid index
        while (<= low high)
        do (let* ((mid (truncate (+ low high) 2))
                  (mid-val (funcall ref seq mid)))
             (cond ((funcall less mid-val item)
                    (setf low (1+ mid)))
                   ((funcall same mid-val item)
                    (return mid))
                   (t (setf high (1- mid)))))))

Common Lisp integers are "mathematical", so the overflow problem
described in your referenced article doesn't exist here.
 

Peter Otten

Kaz said:
> Would that be a case-sensitive lexicographic comparison, or a
> case-insensitive one? How do you specify what kind of less-than and
> equal you want to do?

class Key(object):
    def __init__(self, value, key):
        self.keyval = key(value)
        self.key = key
    def __lt__(self, other):
        return self.keyval < self.key(other)
    def __gt__(self, other):
        return self.keyval > self.key(other)

items = ["Alpha", "Beta", "Delta", "Gamma"]
print binarySearch(items, Key("DELTA", str.lower))  # 2

You /can/ teach an old duck new tricks :)

Peter
 

Kaz Kylheku

Ilpo said:
> That is one big thing that makes code less error-prone: using
> existing, well-made libraries. You can find binary search in the
> Python standard library too (although the API in Java is a bit better;
> see the return values). Well, you can say that binary search is only
> meant as an example, and that in real code you would use the library
> version.

The trouble with your point is that Christoph's original posting refers
to an article, which, in turn, at the bottom, refers to a bug database
which shows that the very same defect had been found in Sun's Java
library!

Buggy library code is what prompted that article.
> I'd say it is not a good example: how often will you write such
> algorithms? Very rarely.
>
> Integer overflows are generally not the errors you run into in
> programs.

Except when you feed those programs inputs which are converted to
integers, which are then fed as domain values into some operation whose
result doesn't fit into the range type.

Other than that, you are okay!

Like when would that happen, right?
> The errors happening most often are, from my point of view:
>
> 1. null pointer errors
> 2. wrong type (class cast in Java, some weird missing attribute in Python)
> 3. array/list index out of bounds
>
> The first and third ones are about the same in every language.

.... other than C and C++, where their equivalents just crash or stomp
over memory, but never mind; who uses those? ;)
> The second one is one where the typing can make a difference.

Actually, the first one is also where typing can make a difference.
Instead of this stupid idea of pointers or references having a null
value, you can make a null value which has its own type, and banish
null pointers.

So null pointer errors are transformed into type errors: the special
value NIL was fed into an operation where some other type was expected.
And by means of type polymorphism, an operation can be extended to
handle the case of NIL.
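
A rough Python analogue of that idea (my sketch, not Kaz's): Python's
None is already an ordinary object of its own type, so misusing it
surfaces as a type-level error rather than a null-pointer dereference,
and an operation can be extended to handle it explicitly:

print type(None)              # <type 'NoneType'> -- a type of its own

def describe(obj):
    # Extend the operation to handle the None case explicitly,
    # analogous to specializing an operation on NIL.
    if obj is None:
        return "nothing"
    return obj.upper()

print describe("text")        # TEXT
print describe(None)          # nothing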
 

Christoph Zwerschke

Alan said:
> And the same code converted to SML would automatically work on all
> kinds of arrays and SML is statically typed. It's a language issue,
> not a typing issue.

Ok, here the point was that Java has *explicit* static typing. SML is
not a procedural language and uses *implicit* static typing. Therefore
it shares some of the benefits of dynamically typed languages such as
Python. However, an SML version of the program would probably still have
the same bug as the Java version, right?
> Since Java doesn't allow function overloading that clearly can't be
> the way. J2SE 5.0 allows generic classes and functions that operate
> on generic containers. There are some gotchas, but it's not drastically
> more complex than the original int-only java code.

Java doesn't allow function overloading? That would be new to me. Or did
you just want to nitpick that it should be more properly called
"method overloading" in Java? And as you already said, there are some
gotchas and you would have to wrap int and long etc. I still would not
call that perfectly simple, as it is in Python.

-- Christoph
 

Fredrik Lundh

Kaz said:
> The trouble with your point is that Christoph's original posting refers
> to an article, which, in turn, at the bottom, refers to a bug database
> which shows that the very same defect had been found in Sun's Java
> library!

and as he points out at the top, it was the article author himself who
wrote that library code:

/.../ let me tell you how I discovered the bug: The version
of binary search that I wrote for the JDK contained the same
bug. It was reported to Sun recently when it broke someone's
program, after lying in wait for nine years or so.

</F>
 

Christoph Zwerschke

Kaz said:
> You can have statically typed languages with inadequate type safety,
> and you can have dynamically typed languages with inadequate type
> safety.

But the point in this example was that the Java program ironically had
the bug *because* Java handles ints in a type-safe way, while Python
does not.
>
> Uh huh! With hard-coded < and = operators, how stupid. What if you
> want to use it on strings?
> Would that be a case-sensitive lexicographic comparison, or a
> case-insensitive one? How do you specify what kind of less-than and equal
> you want to do?

Where's the problem? The function uses the standard ordering of the
values you feed to it, i.e. case-sensitive lexicographical order if
you feed it a list of ordinary tuples of strings. You can also feed it
objects with a different ordering, such as a case-insensitive one.

Anyway, that was completely not the point. The point was that you could
take that Java program, convert it directly to Python, and have
automatically eliminated a bug. I did not claim that the resulting
Python program was automatically a real good and Pythonic one.
> -1 to indicate not found? Why copy Java braindamage induced by an
> antiquated form of static typing? The Java version has to do that

So you would call Python's str.find() method braindamaged as well?
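
For the record, a quick illustration of that convention (my own example
strings):

print "abcdef".find("z")   # -1: Python's own "not found" sentinel
print "abcdef".find("c")   # 2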

But as I said, that was not the point here anyway.

-- Christoph
 

nikie

Let's look at two different examples: Consider the following C# code:

static decimal test() {
    decimal x = 10001;
    x /= 100;
    x -= 100;
    return x;
}

It returns "0.01", as you would expect it. Now, consider the python
equivalent:

def test():
    x = 10001
    x /= 100
    x -= 100
    return x

It returns "0". Clearly an error!
Even if you used "from __future__ import division", it would actually
return "0.010000000000005116", which, depending on the context, may
still be an intolerable error.

Moral: the problem isn't whether the types are chosen at
compile-time or at run-time, it's simply _what_ type is chosen, and
whether it's appropriate or not.

I can even think of an example where C's (and Java's) bounded ints are
the right choice, while Python's arbitrary-precision math isn't: Assume
you get two 32-bit integers containing two time values (or values from
an incremental encoder, or counter values). How do you find out how
many timer ticks (or increments, or counts) have occurred between those
two values, and which one was earlier? In C, you can just write:

long Distance(long t1, long t0) { return t1-t0; }

And all the wraparound cases will be handled correctly (assuming there
have been less than 2^31 timer ticks between these two time values).
"Distance" will return a positive value if t1 was measured after t0, a
negative value otherwise, even if there's been a wraparound in between.
Try the same in Python and tell me which version is simpler!
 

D H

Christoph said:

The point of that is that it did fail: it threw an
ArrayIndexOutOfBoundsException. But it was just luck that that
happened. Unfortunately, I don't think Java and C# have integer overflow
checking turned on by default.

Take this longArithmetic benchmark here:
http://www.cowell-shah.com/research/benchmark/code
and a story about it here:
http://www.osnews.com/story.php?news_id=5602&page=3

The Java and C# versions are fast (15 seconds for me), BUT they
give the incorrect result because of an overflow error.
The Python version gives the correct result because it transparently
changes the underlying types to handle the larger numbers, BUT this
causes it to run over 20x slower than Java or C#. It takes 10 minutes
to complete in Python, not 15 seconds. With Psyco, it takes 5 minutes.

So to say that the story you pointed out shows that Python is superior
is a matter of perspective. Yes, Python gave the correct result
by silently changing the underlying types to longs, and that is
what I would expect of a scripting language. But the price is
speed. In both these cases, I would rather be made aware of the
error in the code and fix it so I didn't have to suffer slowdowns.

That is why in boo ( http://boo.codehaus.org/ ) overflow checking
is luckily enabled by default, and it throws an overflow exception at
runtime to tell you something is wrong with your code. When you
then fix that, you get the same 15-second time, just like Java
and C#.
 

Christoph Zwerschke

nikie said:
> Let's look at two different examples: Consider the following C# code:
>
> static decimal test() {
>     decimal x = 10001;
>     x /= 100;
>     x -= 100;
>     return x;
> }
>
> It returns "0.01", as you would expect.

Yes, I would expect that, because x has been defined as decimal, not int.
> Now, consider the Python equivalent:
>
> def test():
>     x = 10001
>     x /= 100
>     x -= 100
>     return x

No, that's not the Python equivalent. The equivalent of the line

decimal x = 10001

in Python would be

x = 10001.0

or even:

from decimal import Decimal
x = Decimal(10001)

Setting x = 10001 would be equivalent to the C# code

int x = 10001
> It returns "0". Clearly an error!

That's not clearly an error. If you set int x = 10001 in C#, then you
also get a "0". By setting x to be an integer, you are implicitly
telling Python that you are not interested in fractions, and Python does
what you want. Granted, this is arguable and will be changed in the
__future__, but I would not call that an error.

By the way, the equivalent Python code to your C# program gives on my
machine the very same result:
0.01
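
Presumably that equivalent was the float version (my reconstruction;
Christoph did not post his exact code), where print happens to show
0.01 because str() rounds the underlying float:

def test():
    x = 10001.0       # the float variant suggested above
    x /= 100
    x -= 100
    return x

print test()          # prints 0.01 -- str() rounds the float for display
print repr(test())    # shows the full value, about 0.010000000000005116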

> Even if you used "from __future__ import division", it would actually
> return "0.010000000000005116", which, depending on the context, may
> still be an intolerable error.

With from __future__ import division, I also get 0.01 printed. Anyway,
if there are small discrepancies, they have nothing to do with Python
but rather with the underlying floating-point hardware and C library,
the way you print the value, and the fact that 0.01 cannot in principle
be stored exactly as a float (nor as a C# decimal), only as a Python
Decimal.
> I can even think of an example where C's (and Java's) bounded ints are
> the right choice, while Python's arbitrary-precision math isn't:
> Assume you get two 32-bit integers containing two time values (or
> values from an incremental encoder, or counter values). How do you
> find out how many timer ticks (or increments, or counts) have occurred
> between those two values, and which one was earlier? In C, you can
> just write:
>
> long Distance(long t1, long t0) { return t1-t0; }
>
> And all the wraparound cases will be handled correctly (assuming there
> have been less than 2^31 timer ticks between these two time values).
> "Distance" will return a positive value if t1 was measured after t0, a
> negative value otherwise, even if there's been a wraparound in
> between. Try the same in Python and tell me which version is simpler!

First of all, the whole problem only arises because you are using a
statically typed counter ;-) And it is only easy in C when your counter
has 32 bits. But what about a 24-bit counter?

Anyway, in Python, you would first define:

def wrap(x, at=1<<31):
if x < -at:
x += at*2
elif x >= at:
x -= at*2
return x

Then, the Python program would be just as simple:

Distance = lambda t1, t0: wrap(t1 - t0)
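
For instance (my example values), a counter read just before and just
after the wraparound still gives the right signed distance:

t0 = 2**31 - 10          # counter value just before it wraps around
t1 = -2**31 + 10         # counter value just after it wraps around

print Distance(t1, t0)   # 20: t1 is 20 ticks after t0, despite the wrap
print Distance(t0, t1)   # -20: t0 was measured before t1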

-- Christoph
 

nikie

Christoph said:
> Yes, I would expect that, because x has been defined as decimal, not int.
>
> No, that's not the Python equivalent. The equivalent of the line
>
> decimal x = 10001
>
> in Python would be
>
> x = 10001.0
>
> or even:
>
> from decimal import Decimal
> x = Decimal(10001)

Hm, then I probably didn't get your original point: I thought your
argument was that a dynamically typed language was "safer" because it
would choose the "right" type (in your example, an arbitrary-precision
integer) automatically. As you can see from the above sample, it
sometimes picks the "wrong" type, too. Now you tell me that this
doesn't count, because I should have told Python what type to use. But
shouldn't that apply to the Java binary-search example, too? I mean,
you could have told Java to use a 64-bit or arbitrary-length integer
type instead of a 32-bit integer (which would actually be equivalent to
the Python code), so it would do the same thing as the Python binary
search implementation.
...
> By the way, the equivalent Python code to your C# program gives on my
> machine the very same result:
> 0.01

Try entering "x" in the interpreter, and read up about the difference
between str() and repr().
> With from __future__ import division, I also get 0.01 printed. Anyway,
> if there are small discrepancies, they have nothing to do with Python
> but rather with the underlying floating-point hardware and C library,
> the way you print the value, and the fact that 0.01 cannot in principle
> be stored exactly as a float (nor as a C# decimal), only as a Python
> Decimal.

This is OT, but what makes you think a C# decimal can't store 0.01?
> First of all, the whole problem only arises because you are using a
> statically typed counter ;-) And it is only easy in C when your counter
> has 32 bits. But what about a 24-bit counter?

Easy, multiply it by 256 and it's a 32-bit counter ;-)
Fortunately, 24-bit counters are quite rare. 16-bit or 32-bit counters,
on the other hand, are quite common, especially when you're working
close to the hardware (where C is at home). All I wanted to point out
is that bounded integers do have their advantages, because some people
in this thread apparently have never stumbled over them.
 
