int/long unification hides bugs


Rocco Moretti

kartik said:
less than 2**31 most of the time & hardly ever greater than 2**63 - no
matter if my machine is 32-bit, 64-bit or 1024-bit. the required range
depends on the data u want 2 store in the variable & not on the
hardware.

Yes. My point exactly. Very rarely will the platform limit reflect the
algorithmic limit. If you want to limit the range of your numbers, you
need to have knowledge of your particular use case - something that
can't be done with a predefined language limit.
PEP 237 says, "It will give new Python programmers [...] one less
thing to learn [...]". i feel this is not so important as the quality
of code a programmer writes once he does learn the language.

The thing is, the int/long cutoff is arbitrary, determined solely by
implementation detail.

agreed, but it need not be that way. ints can be defined to be 32-bit
(or 64-bit) on all architectures.

But again, even though consistent, the limit is still arbitrary. Which
one will it be? How do we decide? If we're platform independent, why
bother with hardware based sizes anyway? Why not use a base 10 limit
like 10**10? As mentioned above, the choice of limit depends on the
particular algorithm, which can't be known by the language designers a
priori.
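
For reference, the cutoff under discussion is just sys.maxint, and since PEP 237
arithmetic that crosses it quietly returns a long instead of raising
OverflowError. A quick illustration (Python 2.3-era; the output shown assumes a
build where the C long is 32 bits):

import sys

print sys.maxint      # 2147483647 where the C long is 32 bits; 2**63 - 1 where it is 64
n = sys.maxint + 1
print type(n), n      # <type 'long'> 2147483648 -- promoted, no OverflowError
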
such an assertion must be placed before every assignment to the
variable - & that's tedious. moreover, it can give u a false sense of
security when u think u have it wherever needed but u've forgotten it
somewhere.

I was thinking of judicious use for local variables inside of a loop.
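
A minimal sketch of that kind of judicious, loop-local check (the variable
names and the 2**31 bound are placeholders, not anything from the thread):

values = [10, 20, 30]          # whatever the loop is actually accumulating
total = 0
for v in values:
    total += v
    # an explicit, algorithm-specific bound, checked only where it matters
    assert total < 2**31, "running total unexpectedly large: %r" % total
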
But if you want general, just subclass int (with 2.3.4):
class limint(long):
    def __add__(self, other):
        value = long.__add__(self, other)
        if value > 100:
            raise OverflowError
        return limint(value)
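
The statements that produced the traceback below aren't shown; something like
this (the starting value of 100 is an assumption) would reproduce it:

b = limint(100)   # any value at the cap will do
c = b + 1         # limint.__add__ sees 101 > 100 and raises
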
Traceback (most recent call last):
  File "<pyshell#24>", line 1, in -toplevel-
    c = b+1
  File "<pyshell#18>", line 5, in __add__
    raise OverflowError
OverflowError
A bit crude, but it will get you started. If it's too slow, there is
nothing stopping you from making a C extension module with the
appropriate types.

I think that one of the problems we're having in this conversation is
that we are talking across each other. Nobody is denying that finding
bugs is a good thing. It's just that, for the bugs which the overflow
catches, there are much better ways of discovering them. (I'm surprised
no one has mentioned unit testing yet.)

Any decision always involves a cost/benefit analysis. For long/int
unification, the benefits have been pointed out by others, and your
proposed costs are minor, and can be ameliorated by other practices,
which most here would argue are the better way of going about it in the
first place.
 

Peter Hansen

Rocco said:
(I'm surprised no one has mentioned unit testing yet.)

The topic (without the word "unit" in it, mind you) was raised
fifteen minutes after kartik first posted. Istvan and Steve
have both mentioned it as well.

-Peter
 

kartik

Terry Reedy said:
It is a fundamental characteristic of counts and integers that adding 1 is
always valid. Given that, raising an overflow exception is itself a bug,
one that Python had and has now eliminated.

If one wishes to work with residue classes mod n, +1 is also still always
valid. It is just that (n-1) + 1 is 0 instead of n. So again, raising an
overflow error is a bug.

i don't care what mathematical properties are satisfied; what matters
is to what extent the type system helps me in writing bug-free code

[...] However, the limit n could be
anything, so fixing it at, say, 2**31 - 1 is almost always useless.

i dont think so. if it catches bugs that cause numbers to increase
beyond 2**31, that's valuable.

The use of fixed range ints is a space-time machine performance hack that
has been costly in human thought time.

on what basis do u say that

-kartik
 

kartik

Steve Holden said:
That seems to me to be a bit like saying you don't need to do any
engineering calculations for your bridge because you'll find out if it's
not strong enough when it falls down.

i was inaccurate. what i meant was that overflow errors provide a
certain amount of sanity checking in the absence of explicit testing -
& do u check every assignment for bounds?
2)no test (or test suite) can catch all errors, so language support 4
error detection is welcome.

Yes, but you appear to feel that an arbitrary limit on the size of
integers will be helpful [...] Relying on hardware overflows as error
detection is pretty poor, really.

i'm not relying on overflow errors to ensure correctness. it's only a
mechanism that sometimes catches bugs - & that's valuable.

But writing such tests would help much more.

agreed, but do u test your code so thoroughly that u can guarantee
your code is bug-free. till then, overflow errors help.

-kartik
 

kartik

Cliff Wells said:
You did say it is. And then you said it again right there.

i think you are getting confused between a mechanism that catches some
bugs & one that can catch all (a validation method)

Again that is using the integer limit to catch bugs. Repeated self-
contradiction does little to bolster your argument.


Because, strangely enough, most people want limitations *removed* from
the language, not added to it.

limits that catch bugs are good. without *any* limitations, i should
be able to redefine the symbol "4" to mean "8". would you program in
such a language? i wouldn't

If you are looking for a language with
arbitrary limits then I think Python isn't quite right for you.

not arbitrary limits, but ones that catch bugs.
 

kartik

Cliff Wells said:
I'm going to rewrite that last line in English so that perhaps you'll
catch on to what you are saying:

thank u so much 4 your help, but i know what i'm saying without
assistance from clowns like u. & i dont give a damn about your rules 4
proper communciation, as long as i'm understood.

'''
the required range depends on the data you want to store in the variable
and not on the hardware.
'''

The pivotal word here is "you". The data *you* want to store. One more
time YOU. I'm not correcting your mangling of English at this point,
rather I'm pointing out that it's *you*, not Python, that knows what
sort of data *you* want to store. If *you* want to limit your integers
to some arbitrary amount then *you* are going to have to write code to
do that.
What *you* need for *your* application isn't necessarily what
anyone else needs for theirs.

the required range, while being different for different variables, is
generally less than 2**31 - & *that* can be checked by the
language.
 

kartik

Cliff Wells <[email protected]> wrote:
optional constraint checking [...] can be a handy feature for many kinds of
applications [...] Of course, this has nothing to do with silly and arbitrary
bounds such as 2**31-1.

bounds such as 2**31 are a crude form of constraint checking that you
get by default. if you feel your data is going to be larger, you can
use a long type

-kartik
 

Sam Holden

not arbitrary limits, but ones that catch bugs.

Please give an example of some code containing such a bug which
would be caught by integer limits but not caught by the unit tests
someone who has been programming for only a week would write.
 

kartik

Rocco Moretti said:
Very rarely will the platform limit reflect the
algorithmic limit. If you want to limit the range of your numbers, you
need to have knowledge of your particular use case - something that
can't be done with a predefined language limit.

i'm saying that most of the time, the algorithmic limit will be less
than 2**31 or 2**63 - & that can be checked by the language.

the limit is still arbitrary. Which
one will it be? How do we decide? If we're platform independent, why
bother with hardware based sizes anyway? Why not use a base 10 limit
like 10**10?

it doesn't really matter what the limit is, as long as it's large
enough that it's not crossed often. (it's only that a limit of 2**31
or 2**63 can be efficiently checked.)

I think that one of the problems we're having in this conversation is
that we are talking across each other. Nobody is denying that finding
bugs is a good thing. It's just that, for the bugs which the overflow
catches, there are much better ways of discovering them. (I'm surprised
no one has mentioned unit testing yet.)

Any decision always involves a cost/benefit analysis. For long/int
unification, the benefits have been pointed out by others, and your
proposed costs are minor, and can be ameliorated by other practices,
which most here would argue are the better way of going about it in the
first place.

agreed, but what about when you don't use these "better practices"? do
you use them for every variable? overflow catches sometimes help you
then.
 

Cliff Wells

thank u so much 4 your help, but i know what i'm saying without
assistance from clowns like u. & i dont give a damn about your rules 4
proper communciation, as long as i'm understood.

It's understood that you are completely dense, if that's what you meant.

the required range, while being different for different variables, is
generally less than 2**31 - & *that* can be checked by the
language.

Do you even know what "generally" means? Generally for *you* perhaps,
and the 3 toy programs you've written, certainly not for me.
 

kartik

Try doing some accounting in Turkish liras, one of these days. Today,
each Euro is 189782957 cents of Turkish liras. If an Italian firm
selling (say) woodcutting equipment bids on a pretty modest contract in
Turkey, offering machinery worth 2375220 Euros, they need to easily
compute that their bid is 450776275125540 cents of Turkish Liras. And
that's a _pretty modest_ contract, again -- if you're doing some
computation about truly substantial sums (e.g. ones connected to
government budgets) the numbers get way larger.
[...]Even just for accounting, unlimited-size integers are simply much more
practical.
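
For what it's worth, the arithmetic quoted above checks out, and the product is
already far beyond 2**31 - 1; a quick verification using the figures from the
quote:

rate = 189782957               # cents of Turkish lira per Euro, from the quote
bid = 2375220                  # Euros
print(bid * rate)              # 450776275125540
print(bid * rate > 2**31 - 1)  # True -- a 32-bit int could not hold this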

as another example, using too long a string as an index into a
dictionary is not a problem (true, the dictionary may not have a
mapping, but i have the same issue with a short string). but too long
an index into a list rewards me with an exception.

But the same index, used as a dictionary key, works just fine. Specious
argument, therefore.

I don't think so. I didn't say that large numbers always cause
trouble, so you can't claim to have refuted my argument by giving a
single counter-example.

As common and everyday a
computation (in some fields) as the factorial of 1000 (number of
permutations of 1000 objects) is 2**8530 -- and combinatorial arithmetic
is anything but an "ivory tower" pursuit these days, and factorial is
the simplest building block in combinatorial arithmetic.

It's nice to get some facts, rather than an attempt to prove your
position by analogy between ints & strings ("Proof by analogy is
fraud" - Bjarne Stroustrup)
 

Tim Leslie

the required range, while being different for different variables, is
generally less than 2**31 - & *that* can be checked by the
language.

I was trying to write a program which counted the number of times
people made stupid arguments on this list so I wrote a python program
to do it. Unfortunately you came along and caused my variable to
overflow. How was I to know I'd go above that magic number? Life sure
would be easier if ints which got too large were seamlessly converted
to longs!

-- Tim
 

Peter Hansen

kartik said:
thank u so much 4 your help, but i know what i'm saying without
assistance from clowns like u. & i dont give a damn about your rules 4
proper communciation, as long as i'm understood.

I feel the need to point out in the above the parallel (and equally
mistaken) logic with your comments in the rest of the thread.

In the thread you basically are saying "I want high quality
code, but I refuse to do the thing that will give it to me
(writing good tests) as long as a tiny subset of possible bugs
are caught by causing overflow errors at an arbitrary limit".

Above you are basically saying "I want to be understood,
but I refuse to do the thing that will make it easy for me
to be understood (using proper grammar and spelling) as
long as it's possible for people to laboriously decipher
what I'm trying to say".

Or something like that... I'm with Cliff (which is to say,
I'm outta here).

-Peter
 

Bengt Richter

less than 2**31 most of the time & hardly ever greater than 2**63 - no
matter if my machine is 32-bit, 64-bit or 1024-bit. the required range
depends on the data u want 2 store in the variable & not on the
hardware.
r u pstg fm a cel fone?
Anyway, you might like Ada. Googling for ada reference manual gets

http://www.adahome.com/rm95/
----
Examples (buried in lots of language lawyer syntax stuff, maybe there's
a lighter weight manual ;-)

(33)
Examples of integer types and subtypes:

(34)
type Page_Num is range 1 .. 2_000;
type Line_Size is range 1 .. Max_Line_Size;
(35)
subtype Small_Int is Integer range -10 .. 10;
subtype Column_Ptr is Line_Size range 1 .. 10;
subtype Buffer_Size is Integer range 0 .. Max;
(36)
type Byte is mod 256; -- an unsigned byte
type Hash_Index is mod 97; -- modulus is prime

----
PEP 237 says, "It will give new Python programmers [...] one less
thing to learn [...]". i feel this is not so important as the quality
of code a programmer writes once he does learn the language.

The thing is, the int/long cutoff is arbitrary, determined solely by
implementation detail.

agreed, but it need not be that way. ints can be defined to be 32-bit
(or 64-bit) on all architectures.
So what's your point? That you're used to 32 and 64 bit registers?
Is signed two's (decided that was possessive, but maybe it's plural?;-)
complement the specific flavor you like? And python should offer these
constrained integer types? Why don't you scratch your own itch and see
if it's just a passing brain mite ;-)
such an assertion must be placed before every assignment to the
variable - & that's tedious. moreover, it can give u a false sense of
security when u think u have it wherever needed but u've forgotten it
somewhere.

a 32-bit limit is a crude kind of assertion that u get for free, and
one u expect should hold for most variables. for those few variables
it doesn't, u can use a long.

If you are willing to make your variables exist in an object's attribute
name space, you can define almost any behavior you want. E.g., here's
a class that will make objects that will only allow integral values within
the limits you specify to be bound to names in the attribute space. Since it's
guaranteed on binding, the retrieval needs no test.
...     def __init__(self, lo=-sys.maxint-1, hi=sys.maxint):
...         self.__dict__[''] = (lo, hi)
...     def __setattr__(self, vname, value):
...         if not isinstance(value, (int, long)):
...             raise TypeError, 'Only integral values allowed'
...         lo, hi = self.__dict__['']
...         if value < lo: raise ValueError, '%r < %r (lower limit)' % (value, lo)
...         if value > hi: raise ValueError, '%r > %r (high limit)' % (value, hi)
...         self.__dict__[vname] = value
...
...     try: i3_10.x = i; print (i, i3_10.x),
...     except ValueError, e: print e
...
0 < 3 (lower limit)
1 < 3 (lower limit)
2 < 3 (lower limit)
(3, 3) (4, 4) (5, 5) (6, 6) (7, 7) (8, 8) (9, 9) (10, 10) 11 > 10 (high limit)
12 > 10 (high limit)
13 > 10 (high limit)
14 > 10 (high limit)
15 > 10 (high limit)


It defaults to your favored 32-bit range ;-)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 8, in __setattr__
ValueError: -2147483649L < -2147483648 (lower limit)
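
The class statement and prompts for the session above aren't shown; a
self-contained sketch of the idea Bengt describes, with the class name
BoundedAttrs and the range(16) loop assumed so as to match the output, could
look like this (Python 2, as in the original):

import sys

class BoundedAttrs(object):
    # Only integers within [lo, hi] may be bound to attributes.
    def __init__(self, lo=-sys.maxint-1, hi=sys.maxint):
        self.__dict__[''] = (lo, hi)      # stash the limits under an unusable name
    def __setattr__(self, vname, value):
        if not isinstance(value, (int, long)):
            raise TypeError, 'Only integral values allowed'
        lo, hi = self.__dict__['']
        if value < lo: raise ValueError, '%r < %r (lower limit)' % (value, lo)
        if value > hi: raise ValueError, '%r > %r (high limit)' % (value, hi)
        self.__dict__[vname] = value

i3_10 = BoundedAttrs(3, 10)               # accepts only 3 <= value <= 10
for i in range(16):
    try:
        i3_10.x = i
        print (i, i3_10.x),               # prints accepted pairs, as in the output above
    except ValueError, e:
        print e

A default-constructed instance falls back to the platform int range, which is
what produces the -2147483649L lower-limit error shown above.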


Regards,
Bengt Richter
 

kartik

Try doing some accounting in Turkish liras, one of these days. Today,
each Euro is 189782957 cents of Turkish liras. If an Italian firm
selling (say) woodcutting equipment bids on a pretty modest contract in
Turkey, offering machinery worth 2375220 Euros, they need to easily
compute that their bid is 450776275125540 cents of Turkish Liras. And
that's a _pretty modest_ contract, again -- if you're doing some
computation about truly substantial sums (e.g. ones connected to
government budgets) the numbers get way larger.

Even just for
accounting, unlimited-size integers are simply much more practical.

Thank you for the information. I appreciate it.
-kartik
 

Andrew Dalke

Bengt said:
If you are willing to make your variables exist in an object's attribute
name space, you can define almost any behavior you want.

Or, here's another solution -- make a number-like object which
handles range checking for every operation.


The big problem here is that unlike Ada, C++, or other languages
that let you declare variable types, I have to figure out how
to merge the allowed ranges of the two values during a binary op.
I decided to use the intersection. Unlike Python's ranges, I
chose low <= val <= high (as compared to low <= val < high).

import sys

class RangedNumber:
    def __init__(self, val, low = -sys.maxint-1, high = sys.maxint):
        if not (low <= high):
            raise ValueError("low(= %r) > high(= %r)" % (low, high))
        if not (low <= val <= high):
            raise ValueError("value %r not in range %r to %r" %
                             (val, low, high))
        self.val = val
        self.low = low
        self.high = high

    def __str__(self):
        return str(self.val)
    def __repr__(self):
        return "RangedNumber(%r, %r, %r)" % (self.val, self.low, self.high)

    def __int__(self):
        return self.val
    def __float__(self):
        # must return an actual float, not the stored int
        return float(self.val)

    def _get_range(self, other):
        if isinstance(other, RangedNumber):
            low = max(self.low, other.low)
            high = min(self.high, other.high)
            other_val = other.val
        else:
            low = self.low
            high = self.high
            other_val = other

        return other_val, low, high

    def __add__(self, other):
        other_val, low, high = self._get_range(other)
        x = self.val + other_val
        return RangedNumber(x, low, high)

    def __radd__(self, other):
        other_val, low, high = self._get_range(other)
        x = other_val + self.val
        return RangedNumber(x, low, high)

    def __sub__(self, other):
        other_val, low, high = self._get_range(other)
        x = self.val - other_val
        return RangedNumber(x, low, high)

    def __rsub__(self, other):
        other_val, low, high = self._get_range(other)
        x = other_val - self.val
        return RangedNumber(x, low, high)

    def __abs__(self):
        return RangedNumber(abs(self.val), self.low, self.high)

    def __mul__(self, other):
        other_val, low, high = self._get_range(other)
        x = self.val * other_val
        return RangedNumber(x, low, high)

    def __rmul__(self, other):
        other_val, low, high = self._get_range(other)
        x = other_val * self.val
        return RangedNumber(x, low, high)

    # ... and many, many more ...

Here's some code using it

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "spam.py", line 39, in __add__
    return RangedNumber(x, low, high)
  File "spam.py", line 8, in __init__
    raise ValueError("value %r not in range %r to %r" %
ValueError: value 101 not in range 0 to 100

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "spam.py", line 67, in __rmul__
    return RangedNumber(x, low, high)
  File "spam.py", line 8, in __init__
    raise ValueError("value %r not in range %r to %r" %
ValueError: value 110 not in range 0 to 100

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "spam.py", line 54, in __rsub__
    return RangedNumber(x, low, high)
  File "spam.py", line 8, in __init__
    raise ValueError("value %r not in range %r to %r" %
ValueError: value -10 not in range 0 to 100
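
The statements that produced these tracebacks aren't shown; usage along these
lines (the starting value of 10 and the bounds 0 to 100 are assumptions, chosen
to reproduce the same numbers) would exercise the class, with each of the last
three expressions raising the corresponding ValueError:

a = RangedNumber(10, 0, 100)
print a + 5         # 15, still inside the bounds

a + 91              # 10 + 91 = 101 -> ValueError via __add__ (first traceback)
11 * a              # 110 -> ValueError via __rmul__ (second traceback)
0 - a               # -10 -> ValueError via __rsub__ (third traceback)
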
Andrew
 

Alex Martelli

kartik said:
(e-mail address removed) (Alex Martelli) wrote in message
Cliff Wells <[email protected]> wrote:
optional constraint checking [...] can be a handy feature for many kinds
of applications [...] Of course, this has nothing to do with silly and
arbitrary bounds such as 2**31-1.

bounds such as 2**31 are a crude form of constraint checking that you
get by default. if you feel your data is going to be larger, you can
use a long type

Too crude to be any real use, and wrong-headed _as a default_.
Constraints should apply _when explicitly requested_, with the default
always being "unconstrained". (It's a wart in Python that we don't get
that with recursion, for example; it forces people to look for
non-recursive solutions, where recursive ones are simpler and smoother,
because it robs recursive approaches of some generality).
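
The recursion wart Alex mentions is the same kind of default,
algorithm-independent cap; a small illustration (the limit of 1000 is CPython's
default, and the function name is just a placeholder):

import sys

print(sys.getrecursionlimit())   # 1000 by default in CPython

def depth(n):
    # a perfectly natural recursive formulation
    if n == 0:
        return 0
    return 1 + depth(n - 1)

print(depth(10))      # fine
# depth(100000)       # correct code, but it would hit the default recursion limit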


Alex
 

Alex Martelli

kartik said:
Try doing some accounting in Turkish liras, one of these days. Today, ...
[...]Even just for accounting, unlimited-size integers are simply much
more practical.

I see you quote this but can't refute it...

I don't think so. I didn't say that large numbers always cause
trouble, so you can't claim to have refuted my argument by giving a
single counter-example.

I rewrote your post, using long strings instead of large numbers, to
show the arguments are exactly identical, and equally bereft of
substance, against getting either unlimited strings or numbers as the
default. You tried to show asymmetry by comparing strings used as keys
into a dictionary vs ints used as indices into a list, and I refute that
silly attempt: if you have dictionaries you can use long strings or
large numbers as keys into them just as well.

Your mention of lists, in fact, shows exactly how specious your
arguments for a default integer limit of 2**31-1 are. That totally
arbitrary limit has nothing to do with the size of any given list; the
size of number that would give problems when used as list index varies,
but it's more likely to be a few millions than billions. _Moreover_,
as soon as you try to use a too-large index for the specific list, you
get an IndexError. It's therefore totally useless to try and get an
OverflowError instead if the index, besides being too big for that
specific list, is also over 2**31-1 (or other arbitrary boundary).
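
Concretely, the list's own bounds check fires long before any integer-width
limit could matter (a quick illustration):

items = ['a', 'b', 'c']
print(items[2])    # prints c
items[10**9]       # IndexError: list index out of range -- nowhere near 2**31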

It's nice to get some facts, rather than an attempt to prove your
position by analogy between ints & strings ("Proof by analogy is
fraud" - Bjarne Stroustrup)

Go use C++, then, and stop wasting our time. If we preferred the way
Stroustrup designs programming languages, to that in which van Rossum
designs them, we'd be over in comp.lang.c++, not here in
comp.lang.python -- ever thought of that?

The analogy I posed, and still defend, merely shows your arguments were
badly thought out, weak, and totally useless in the first place. It
does not need to 'prove' anything, because we're neither in a court of
law, nor in mathematics: it just shows up your arguments for the
worthless froth they are. The facts (that should be obvious to anybody,
of course) that the factorial function easily makes very big numbers,
that some countries have very devalued currencies, etc, further show
that big numbers (just like big strings) _are_ useful to practical
problems, thus your totally wrong-headed request to put default limits
on numbers would also cause practical damage -- for example, it would
make an accounting-arithmetic package designed with strong currencies in
mind (Euros, dollars, pounds, ...) unusable for weak currencies, because
of the _default_ nature of the limits.

Fortunately there is no chance whatsoever that Python will get into
reverse gear and put back the arbitrary default limits you hanker for.
If the many analogies, arguments, and practical examples that have been
offered to help you see why, help you accept the fact, good. If not,
good riddance -- you have not offered _one_ sound and useful line of
reasoning throughout this big thread, after all, so it's not as if
losing your input will sadly impoverish discussions here.


Alex
 

kartik

Andrew Dalke said:
Real code? Here's one used for generating the canonical
SMILES representation of a chemical compound. It comes
from the FROWNS package.

try:
    val = 1
    for offset, bondtype in offsets[index]:
        val *= symclasses[offset] * bondtype
except OverflowError:
    # Hmm, how often does this occur?
    val = 1L
    for offset, bondtype in offsets[index]:
        val *= symclasses[offset] * bondtype


The algorithm uses the fundamental theorem of arithmetic
as part of computing a unique characteristic value for
every atom in the molecule, up to symmetry.

It's an iterative algorithm, and the new value for
a given atom is the product of the old values of its
neighbor atoms in the graph:

V'(atom1) = V(atom1.neighbor[0]) * V(atom1.neighbor[1]) * ...

In very rare cases this can overflow 32 bits. Rare
enough that it's faster to do everything using 32 bit
numbers and just redo the full calculation if there's
an overflow.

Because Python now no longer gives this overflow error,
we have the advantage of both performance and simplified
code.
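
With unified ints the fallback branch simply disappears; the loop above reduces
to the following (the symclasses and offsets values here are invented purely to
make the sketch runnable):

symclasses = [2, 3, 5, 7]                 # invented sample values
offsets = {0: [(1, 1), (2, 2), (3, 1)]}   # invented sample adjacency data
index = 0

val = 1
for offset, bondtype in offsets[index]:
    val *= symclasses[offset] * bondtype  # silently promotes past 2**31 when needed
print(val)                                # 3 * 10 * 7 = 210 for this toy data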

Relatively speaking, 2**31 is tiny. My little laptop
can count that high in Python in about 7 minutes, and
my hard drive has about 2**35 bits of space. I deal
with single files bigger than 2**32 bits.

Why then should I have to put in all sorts of workarounds
into *my* code because *you* don't know how to write
good code, useful test cases, and appropriate internal
sanity checks?

Thank you for the info.
-kartik
 
