int/long unification hides bugs

Cliff Wells · Oct 26, 2004

less than 2**31 most of the time & hardly ever greater than 2**63 - no
matter if my machine is 32-bit, 64-bit or 1024-bit. the required range
depends on the data u want 2 store in the variable & not on the
hardware.

I'm going to rewrite that last line in English so that perhaps you'll
catch on to what you are saying:

'''
the required range depends on the data you want to store in the variable
and not on the hardware.
'''

The pivotal word here is "you". The data *you* want to store. One more
time YOU. I'm not correcting your mangling of English at this point,
rather I'm pointing out that it's *you*, not Python, that knows what
sort of data *you* want to store. If *you* want to limit your integers
to some arbitrary amount then *you* are going to have to write code to
do that. What *you* need for *your* application isn't necessarily what
anyone else needs for theirs.

If you want such a domain-specific language then you should take a look
at http://kartik.sourceforge.net as I think that's the only language
that's going to suit you.

Sam Holden · Oct 26, 2004

integers are used in different ways from strings. i may expect file
paths to be around 100 characters, and if i get a 500-character path,
i have no problem just because of the length. but if a person's age is
500 where i expect it to be less than 100, then **definitely**
something's wrong.

What if Noah is using your program?

Cliff Wells · Oct 26, 2004

integers are used in different ways from strings. i may expect file
paths to be around 100 characters, and if i get a 500-character path,
i have no problem just because of the length. but if a person's age is
500 where i expect it to be less than 100, then **definitely**
something's wrong.

So Python should raise an exception if it notices that your variable is
named "age" and you've put a number greater than 100 in it? I'm amazed
there isn't already a PEP on this. Have you checked to make sure?

as another example, using too long a string as an index into a
dictionary is not a problem (true, the dictionary may not have a
mapping, but i have the same issue with a short string). but too long
an index into a list rewards me with an exception.

I'd rather be rewarded with a cookie.

as i look at my code, i rarely have an issue with string sizes, but if
an integer variable gets very large (say > 2**31 or 2**63), it
generally reflects a bug in my code.

Well, of course. After all, there are only 2**63 numbers in the
universe, so any number higher than that is clearly an error. In fact,
given that so many people use numbers on a regular basis, I suspect
there are actually far fewer numbers left now than there were
originally.

i suggest u base your comments on real code, rather than reasoning in
an abstract manner from your ivory tower.

I can see why you would want to avoid that.

Cliff Wells · Oct 26, 2004

What if Noah is using your program?

Then Python clearly needs a special mode so that when a person's age is
entered large numbers are allowed, but when counting animals it throws
an exception if the number is greater than two.

Duh.

Josiah Carlson · Oct 26, 2004

integers are used in different ways from strings. i may expect file
paths to be around 100 characters, and if i get a 500-character path,
i have no problem just because of the length. but if a person's age is
500 where i expect it to be less than 100, then **definitely**
something's wrong.

So *you* need to do bounds checking. Last time I checked, Python didn't
raise an overflow error on getting to 500, so you're going to need to do
the check anyways.

as i look at my code, i rarely have an issue with string sizes, but if
an integer variable gets very large (say > 2**31 or 2**63), it
generally reflects a bug in my code.

And you should check that, and not rely on a misfeature and a mistake.
It is going in whether you like it or not, and has been planned since
Python 2.3.

i suggest u base your comments on real code, rather than reasoning in
an abstract manner from your ivory tower.

Ahh, real code. Ok *rummages through some code*. Yeah, so I checked,
and I don't rely on overflow errors. I use explicit bounds checking in
all cases where values outside my expected range can come. Then again,
I prefer to check my variables when I rely on them.
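
As a minimal sketch of the explicit check described above (Python 3 syntax; the function name `check_age` and the 0..150 range are made up for illustration, not taken from anyone's real code):

```python
def check_age(age, low=0, high=150):
    """Return age unchanged if low <= age <= high, else raise ValueError."""
    if not low <= age <= high:
        raise ValueError(f"age {age!r} outside expected range {low}..{high}")
    return age


print(check_age(42))  # in range, passes through unchanged

try:
    check_age(500)
except ValueError as exc:
    # The application's own limit triggers, not a machine-word overflow.
    print("rejected:", exc)
```

The limit lives in the application, where the expected range is actually known, so it works the same whether the interpreter's integers are 32-bit, 64-bit or unbounded.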

- Josiah

Josiah Carlson · Oct 26, 2004

maybe, why not use an automated test built-in 2 the language? i get it
4 free.

Overflow errors are not automated tests. Overflow errors were a
(mis)feature of Python (non-long) integers. Had the unification occurred
prior to your using Python, this conversation wouldn't be happening.

- Josiah

Andrew Dalke · Oct 26, 2004

kartik said:
i suggest u base your comments on real code, rather than reasoning in
an abstract manner from your ivory tower.

My experience, based on 10+ years of making a living as a
professional programmer, says that you are wrong and the
comments made by others have been spot on.

Real code? Here's one used for generating the canonical
SMILES representation of a chemical compound. It comes
from the FROWNS package.

try:
    val = 1
    for offset, bondtype in offsets[index]:
        val *= symclasses[offset] * bondtype
except OverflowError:
    # Hmm, how often does this occur?
    val = 1L
    for offset, bondtype in offsets[index]:
        val *= symclasses[offset] * bondtype

The algorithm uses the fundamental theorem of arithmetic
as part of computing a unique characteristic value for
every atom in the molecule, up to symmetry.

It's an iterative algorithm, and the new value for
a given atom is the product of the old values of its
neighbor atoms in the graph:

V'(atom1) = V(atom1.neighbor[0]) * V(atom1.neighbor[1]) * ...

In very rare cases this can overflow 32 bits. Rare
enough that it's faster to do everything using 32 bit
numbers and just redo the full calculation if there's
an overflow.

Because Python now no longer gives this overflow error,
we have the advantage of both performance and simplified
code.
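
A rough sketch of what the simplified version might look like once integers promote automatically (Python 3 syntax). This is not the actual FROWNS code: the function name `neighbor_product` and the sample `symclasses` / `offsets_for_atom` values are invented for illustration.

```python
def neighbor_product(offsets_for_atom, symclasses):
    """Multiply the symmetry classes of an atom's neighbours.

    No try/except OverflowError is needed: the running product simply
    grows past the machine word size when it has to.
    """
    val = 1
    for offset, bondtype in offsets_for_atom:
        val *= symclasses[offset] * bondtype
    return val


# Hypothetical data: three neighbours with large prime symmetry classes.
symclasses = [2_000_003, 1_000_003, 999_983]
offsets_for_atom = [(0, 1), (1, 2), (2, 1)]

val = neighbor_product(offsets_for_atom, symclasses)
print(val > 2**31 - 1)  # True
```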

Relatively speaking, 2**31 is tiny. My little laptop
can count that high in Python in about 7 minutes, and
my hard drive has about 2**35 bits of space. I deal
with single files bigger than 2**32 bits.

Why then should I have to put in all sorts of workarounds
into *my* code because *you* don't know how to write
good code, useful test cases, and appropriate internal
sanity checks?

Your examples, btw, are hypothetical. Having an
OverflowError at 2**31 doesn't fix your 500-year-old
person problem and you'll have an IndexError / KeyError
well before you reach that limit, assuming your string /
dictionary doesn't have that much data.

Why not give some example of real code that shows
1) that giving an OverflowError is the right behaviour
(excluding talking to hardware or other system that
requires a fixed size number), 2) that there is a
fixed number N that is always appropriate, and 3)
that value is < sys.maxint.

For bonus points, use proper spelling and capitalization.

Andrew

Andrew Dalke · Oct 26, 2004

Cliff said:
Then Python clearly needs a special mode so that when a person's age is
entered large numbers are allowed, but when counting animals it throws
an exception if the number is greater than two.

Unless of course the count is of a clean animal, in which case
the exception limit is greater than seven.

Andrew

Alex Martelli · Oct 26, 2004

kartik said:
integers are used in different ways from strings. i may expect file

Integers are used in a huge variety of ways, and so are strings.

paths to be around 100 characters, and if i get a 500-character path,
i have no problem just because of the length. but if a person's age is
500 where i expect it to be less than 100, then **definitely**
something's wrong.

Try doing some accounting in Turkish liras, one of these days. Today,
each Euro is 189782957 cents of Turkish liras. If an Italian firm
selling (say) woodcutting equipment bids on a pretty modest contract in
Turkey, offering machinery worth 2375220 Euros, they need to easily
compute that their bid is 450776275125540 cents of Turkish liras. And
that's a _pretty modest_ contract, again -- if you're doing some
computation about truly substantial sums (e.g. ones connected to
government budgets) the numbers get way larger.

[[and yes, integer numbers of some fraction of a currency, typically
cents, are a good and practical way to do accounting -- nix floating
point, be it binary or decimal]].
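
The arithmetic above can be checked in a few lines (Python 3 syntax; the variable names are illustrative, the two figures are the ones quoted in the post):

```python
# Exchange rate and bid as quoted above, both kept as plain integers.
CENTS_OF_LIRA_PER_EURO = 189_782_957
bid_in_euros = 2_375_220

bid_in_lira_cents = bid_in_euros * CENTS_OF_LIRA_PER_EURO
print(bid_in_lira_cents)              # 450776275125540
print(bid_in_lira_cents > 2**31 - 1)  # True: far beyond a 32-bit int
```

The result is exact because no floating point is involved, and it would have overflowed a 32-bit integer long before reaching this modest contract size.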

Sure, Turkey will rebase its currency in 2005 -- but who can predict
when some other country's currency will similarly debase? Even just for
accounting, unlimited-size integers are simply much more practical.

as another example, using too long a string as an index into a
dictionary is not a problem (true, the dictionary may not have a
mapping, but i have the same issue with a short string). but too long
an index into a list rewards me with an exception.

But the same index, used as a dictionary key, works just fine. Specious
argument, therefore.
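
A small illustration of that point (Python 3 syntax; the names `items`, `table` and the value of `big` are arbitrary):

```python
big = 10**6  # far larger than the list below

items = ["a", "b", "c"]
table = {}

# As a dictionary key, the large integer is perfectly acceptable.
table[big] = "fine"
print(table[big])

# As a list index, the same value raises, because the list is short,
# not because the integer itself is "too big".
try:
    items[big]
except IndexError as exc:
    print("IndexError:", exc)
```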

as i look at my code, i rarely have an issue with string sizes, but if
an integer variable gets very large (say > 2**31 or 2**63), it
generally reflects a bug in my code.

This may be peculiar to the kind of code you write -- if you hit more
bugs whose symptom is a large integer than ones whose symptom is a large
string, you're probably generating fewer strings than integers. But
many other people's code has the opposite character, and it's quite
presumptuous of you to suggest changing Python without considering that.

i suggest u base your comments on real code, rather than reasoning in
an abstract manner from your ivory tower.

I suggest you take your blinkers off, and look at all ways integers are
commonly used in all sorts of computations, rather than imagining your
personal code is the end of the world. As common and everyday a
computation (in some fields) as the factorial of 1000 (number of
permutations of 1000 objects) is about 2**8530 -- and combinatorial
arithmetic is anything but an "ivory tower" pursuit these days, and
factorial is the simplest building block in combinatorial arithmetic.
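
The size of that factorial is easy to confirm with the standard library (Python 3 syntax; `perms` is just an illustrative name):

```python
import math

# Number of permutations of 1000 objects, computed exactly.
perms = math.factorial(1000)

# bit_length() gives the number of bits needed to represent the value,
# so this shows how far past 31 or 63 bits the result is.
print(perms.bit_length())  # 8530
```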

If you need objects with constraints, build them as custom types.
Subclass int, str, whatever, to accept bounds or other checkers in their
constructors (or as compiletime constants, whatever), convert the result
of each operation to the same type, and raise as soon as an instance is
constructed that's out of bounds. It's not a difficult exercise, and if
you design and apply your constraints well it may help you catch some
categories of bugs sooner than without such constraints.
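
One possible shape for such a type, as a minimal sketch in Python 3 syntax. The class name `BoundedInt`, its `low` / `high` attributes and the choice of `ValueError` are assumptions made for illustration; only three arithmetic operators are wrapped here, and a fuller version would cover the rest.

```python
class BoundedInt(int):
    """int subclass that only accepts values with low <= value < high."""

    low, high = 0, 500  # illustrative bounds; override in a subclass

    def __new__(cls, value=0):
        self = super().__new__(cls, value)
        if not cls.low <= self < cls.high:
            raise ValueError(f"{int(self)} not in [{cls.low}, {cls.high})")
        return self

    def _checked(self, result):
        # Convert the plain-int result back to our type, which re-runs
        # the bounds check in __new__.
        if result is NotImplemented:
            return result
        return type(self)(result)

    def __add__(self, other):
        return self._checked(super().__add__(other))

    def __sub__(self, other):
        return self._checked(super().__sub__(other))

    def __mul__(self, other):
        return self._checked(super().__mul__(other))


y = BoundedInt(0)
print(type(y + 1).__name__, int(y + 1))  # result stays a BoundedInt

try:
    y - 1  # would be -1, below the lower bound
except ValueError as exc:
    print("out of bounds:", exc)
```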

Once you have convincing use cases and experience to show that this kind
of thing is more useful than previously noticed, _then_ you may stand a
chance to have some kind of optional limit-checking subtypes rolled into
Python's core/standard library. So far, you have nothing of the kind.

Alex

Alex Martelli · Oct 26, 2004

Cliff Wells said:
The pivotal word here is "you". The data *you* want to store. One more
time YOU. I'm not correcting your mangling of English at this point,
rather I'm pointing out that it's *you*, not Python, that knows what
sort of data *you* want to store. If *you* want to limit your integers
to some arbitrary amount then *you* are going to have to write code to
do that. What *you* need for *your* application isn't necessarily what
anyone else needs for theirs.

This is correct, but optional constraint checking (on all kinds of data
types) can be a handy feature for many kinds of applications. It might
be handy to have a 'type decorator' for that, one which would wrap all
operations returning a result (and all mutations, for mutables) into
suitable checks. If I have a list that's supposed to always be between
5 and 8 items, it might be handy to write:

x = constrain(list, LenConstraint(5, 9))([0]*7)

and if I have an integer number that's supposed to always be (and return
when operated upon other such ints) nonnegative and less than 500,

y = constrain(int, SizeConstraint(0, 500))()

Now, y-1 would raise a ConstraintViolation, as would, say,
x.extend('hello'). Sure, nothing momentous, but this would catch some
typos (where I meant y+1) or thinkos (where I meant x.append('hello'))
faster than the unittests would.

There is some previous art for that, of course. For integers only,
Pascal let you declare a range of admissible values; in debug mode only,
many compilers would helpfully catch range-mistakes, much like (say) in
Python, debug or not, list indexing catches out-of-bounds errors. In
SQL (and other data modeling situations), it's normal and helpful to
express optional constraints on the range of allowed values.

Of course, this has nothing to do with silly and arbitrary bounds such
as 2**31-1. But constraint checking should not necessarily be ruled out
as a generally helpful technique.

Maybe it's better to attach constraints to attributes rather than to
types -- use c.y rather than bare y, so that assignments to c.y that do
not meet the constraints will raise ConstraintViolation, say. That
would surely be easier to program, and speedier, for non-mutables.
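
A sketch of the attribute-based variant using a descriptor (Python 3 syntax). `ConstraintViolation` is the exception name used in the post; the `Bounded` descriptor and the `Counter` class are hypothetical names chosen for this example.

```python
class ConstraintViolation(ValueError):
    """Raised when an assignment breaks a declared constraint."""


class Bounded:
    """Descriptor that only accepts values with low <= value < high."""

    def __init__(self, low, high):
        self.low, self.high = low, high

    def __set_name__(self, owner, name):
        # Store the real value under a private attribute on the instance.
        self.private_name = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private_name)

    def __set__(self, obj, value):
        if not self.low <= value < self.high:
            raise ConstraintViolation(
                f"{value!r} not in [{self.low}, {self.high})"
            )
        setattr(obj, self.private_name, value)


class Counter:
    y = Bounded(0, 500)

    def __init__(self, y=0):
        self.y = y  # goes through Bounded.__set__, so it is checked too


c = Counter()
c.y = c.y + 1      # fine: 1 is within bounds
try:
    c.y = c.y - 2  # -1 violates the lower bound
except ConstraintViolation as exc:
    print("rejected:", exc)
print(c.y)         # the failed assignment left the old value in place
```

Because the check runs only on assignment, intermediate results such as `c.y - 2` are ordinary ints; the constraint applies to what gets stored, not to every arithmetic step.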

Exploring these design-space variations and gathering real-world use
cases is best done by designing and implementing a Python extension for
the purpose, of course. Rolling some such functionality into the Python
core / standard library would be silly without such specific previous
experience.

Alex

Michael Hoffman · Oct 26, 2004

kartik said:
i suggest u base your comments on real code, rather than reasoning in
an abstract manner from your ivory tower.

Could someone please help me with a related question? Why do I have the
unmistakable feeling that this is a troll? Is it the misspellings? The
gratuitous sniping? Or the OP's suggestions to provide "real code" when
he has not done the same?

Alex Martelli · Oct 26, 2004

Michael Hoffman said:
Could someone please help me with a related question? Why do I have the
unmistakable feeling that this is a troll? Is it the misspellings? The
gratuitous sniping? Or the OP's suggestions to provide "real code" when
he has not done the same?

It's probably just the usual way he interacts, no big deal. He didn't
ask for real code to be provided, but used to base one's comments on,
please note; and I gave two real-life examples (a woodworking equipment
company in Italy bidding to supply some equipment to a Turkish customer
who prefers to get billed in their local currency; a factorial) without
any trouble and without needing to post the code (it's quite obvious).

Alex

Andrew Dalke · Oct 26, 2004

Michael said:
Could someone please help me with a related question? Why do I have the
unmistakable feeling that this is a troll? Is it the misspellings? The
gratuitous sniping? Or the OP's suggestions to provide "real code" when
he has not done the same?

If you check the archives you'll see the OP's nym come up in two
other hits. One for a proposal to gcc and another as a proposal to
OpenOffice. Both of the sort "I think XYZ would be cool", both using
the poor syntax.

Neither show much troll-like behavior.

My guess is the not unusual case of someone who works mostly alone
and doesn't have much experience in diverse projects nor working
with more experienced people.

I've seen some similar symptoms working with, for example,
undergraduate students who are hotshot programmers ... when
compared to other students in their non-CS department but not
when compared to, say, a CS student, much less an experienced
developer.

Andrew

Andrew Dalke · Oct 26, 2004

Me:

I've seen some similar symptoms ...

But of course correlation does not imply causation and I use
that solely as an example to show how it could be caused by
non-troll behavior.

There could also be cultural reasons, and/or personal ones,
and/or the keyboard could have a broken shift state... Though
that would require '*' to not be on a shifted position.

Andrew

Michael Hoffman · Oct 26, 2004

Andrew said:
Neither show much troll-like behavior.

My guess is the not unusual case of someone who works mostly alone
and doesn't have much experience in diverse projects nor working
with more experienced people.

Perhaps. Maybe I have just been spending too much time in troll-infested
forums, which sets one's troll-o-meter off a lot earlier.

Cliff Wells · Oct 26, 2004

This is correct, but optional constraint checking (on all kinds of data
types) can be a handy feature for many kinds of applications. It might
be handy to have a 'type decorator' for that, one which would wrap all
operations returning a result (and all mutations, for mutables) into
suitable checks. If I have a list that's supposed to always be between
5 and 8 items, it might be handy to write:

Of course, this has nothing to do with silly and arbitrary bounds such
as 2**31-1. But constraint checking should not necessarily be ruled out
as a generally helpful technique.

Not at all. I do quite a bit of database programming and use
constraints (foreign keys, unique indices, etc) extensively. The
concept is also in widespread use in GUI programming libraries for
controls that deal with user input (e.g. masked input controls). In
fact, most controls present in a GUI implicitly constrain user input
(menus, buttons, etc).

Of course what you describe above can be done now using functions and
derived classes, but it would certainly be interesting to have a general
(and concise) way of describing constraints within the language itself.

Regards,
Cliff

John Machin · Oct 26, 2004

Cliff Wells said:
Here's one:

# count how many ferrets I have
ferrets = 0
while 1:
    try:
        ferrets += 1
    except:
        break
print ferrets

As you can clearly see, the answer should have been 3, but due to Python
silently allowing numbers larger than 3 the program gets stuck in an
apparently interminable loop, requiring me to reboot Microsoft Bob.

There always were legends that sys.maxferretpopulation is
implementation-dependent, not readable, and not writable. More
recently the whisper is that at least under Windows it is set by the
installer at installation time using an algorithm known only to the
timbot.

Alex Martelli · Oct 26, 2004

Cliff Wells said:
Not at all. I do quite a bit of database programming and use
constraints (foreign keys, unique indices, etc) extensively. The

Yep, and besides such 'structural' constraints even the simple kind of
check such as "this number is always between X and Y" may to a lesser
extent come in handy.

concept is also in widespread use in GUI programming libraries for
controls that deal with user input (e.g. masked input controls). In
fact, most controls present in a GUI implicitly constrain user input
(menus, buttons, etc).

A different case, IMHO. An input/'edit' box with inherent checks is a
bit closer.

Of course what you describe above can be done now using functions and
derived classes, but it would certainly be interesting to have a general
(and concise) way of describing constraints within the language itself.

Functions are in general the proper way to check, but having to
explicitly call the checking function each time a change may have
occurred gets old fast.

What I was musing about (well, half of it) is easy to implement if
you're using qualified names rather than barenames -- each assignment to
a.x may easily be made to go through a setter-function that calls an
appropriate checker function. Attaching similar setter-functions to
barenames is a very alien concept to Python today, but in a sense the
difference could be seen as mere syntax sugar.

The other half of the problem has to do with mutables, and ensuring a
checker function (on object invariants, if you will) runs after each
mutation -- not all that new a notion, just a part of design by contract
(people always focus on preconditions and postconditions and appear to
forget invariants, which are just as crucial;-).
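
A minimal sketch of running an invariant check after each mutation (Python 3 syntax). `ConstraintViolation` is the name used earlier in the thread; `checks_invariant`, `SizedList` and its 5..8 length limits are hypothetical, and only three mutators are wrapped. For simplicity the sketch does not roll back a mutation that breaks the invariant.

```python
import functools


class ConstraintViolation(ValueError):
    """Raised when an object's invariant no longer holds."""


def checks_invariant(method):
    """Wrap a mutating method so the invariant is verified afterwards."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        self.check_invariant()  # raises if the mutation broke the invariant
        return result

    return wrapper


class SizedList(list):
    """List whose length must stay between min_len and max_len inclusive."""

    min_len, max_len = 5, 8

    def __init__(self, iterable=()):
        super().__init__(iterable)
        self.check_invariant()

    def check_invariant(self):
        if not self.min_len <= len(self) <= self.max_len:
            raise ConstraintViolation(
                f"length {len(self)} not in {self.min_len}..{self.max_len}"
            )

    # Only a few mutators are wrapped in this sketch.
    append = checks_invariant(list.append)
    extend = checks_invariant(list.extend)
    pop = checks_invariant(list.pop)


x = SizedList([0] * 7)
x.append("hello")          # 8 items: still within the invariant
try:
    x.extend("hello")      # would add 5 more items, one per character
except ConstraintViolation as exc:
    print("caught:", exc)
```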

I'm not sure there's a "grand unification" between these halves just
waiting to happen. Surely, though, a little add-on package making it
easy to add checker functions and providing a few typical such checkers
might be of some help. If the checkers could be easily disabled at the
flip of a switch (say the debug flag;-) they might even help by being
potentially-executable specifications of programmer intention (much like
DbC helps in part exactly because it _can_ be disabled that way;-).

Guess I'm mostly musing on the general issue and the 2**31-1 silliness
was just a spark that lit some waiting tinder in my mind;-).

Alex

Peter Hansen · Oct 26, 2004

Cliff said:
Then Python clearly needs a special mode so that when a person's age is
entered large numbers are allowed, but when counting animals it throws
an exception if the number is greater than two.

Duh.

And what about Methuselah? He's not going to be receiving his
social security cheques if he can't enter his age as higher than,
say, 2**7 (as our friend kartik would arbitrarily say...). We
actually need 2**10 for his age, but now that's a little too
high so maybe we'd better just kill the old bugger off at 2**9
(hey, he's lived so long that shaving 208 years off his age
won't bother him, right?).

-Peter

Cliff Wells · Oct 26, 2004

Peter Hansen said:

And what about Methuselah? He's not going to be receiving his
social security cheques if he can't enter his age as higher than,
say, 2**7 (as our friend kartik would arbitrarily say...). We
actually need 2**10 for his age, but now that's a little too
high so maybe we'd better just kill the old bugger off at 2**9
(hey, he's lived so long that shaving 208 years off his age
won't bother him, right?).

Ah, the infamous Methuselah quandary. I think actually killing him with
a program would involve specialized hardware (and possibly a gun
permit), which is why this problem has yet to be solved satisfactorily.
Without corporate interest this problem will probably remain unresolved.

Of course the Social Security Administration is one of those places
where constraints define reality, so my guess is he's already received a
letter informing him he's dead and so is no longer eligible anyway. Yet
another example of practicality-beats-purity.

Regards,
Cliff
