enumerate overflow

crwe

Hello all,

in python2.4, i read lines from a file with

for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?

Cheers.
 
Diez B. Roggisch

> Hello all,
>
> in python2.4, i read lines from a file with
>
> for lineNum, line in enumerate(f): ...
>
> However, lineNum soon overflows and starts counting backwards. How do
> i force enumerate to return long integer?

Most probably you can't, because it is a C-written function I presume.

But as python 2.4 has generators, it's easy to create an enumerate yourself:


def lenumerate(f):
    i = 0
    for line in f:
        yield i, line
        i += 1
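
A minimal sketch of how the generator above behaves near the 32-bit limit. The `start` parameter and the sample data are assumptions added for illustration (they are not part of the original suggestion); they only make it possible to reach the boundary without iterating two billion times. Because the counter is a plain Python integer, it keeps growing past 2**31 - 1 instead of wrapping.

```python
def lenumerate(iterable, start=0):
    # 'start' is a hypothetical extra argument, used here only to begin
    # counting close to the 32-bit boundary.
    i = start
    for item in iterable:
        yield i, item
        i += 1


# Begin two steps below the largest signed 32-bit value.
near_limit = 2**31 - 2
indices = [index for index, _ in lenumerate(["a", "b", "c", "d"], start=near_limit)]
print(indices)
```

The indices cross 2147483647 and keep increasing, which is the behaviour the original poster wanted from enumerate().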

Diez
 
Tim Chase

> > in python2.4, i read lines from a file with
> Most probably you can't, because it is a C-written function I presume.
>
> But as python 2.4 has generators, it's easy to create an enumerate yourself:
>
> def lenumerate(f):
>     i = 0
>     for line in f:
>         yield i, line
>         i += 1


I'd consider this a bug: either in the implementation of
enumerate(), or in the documentation

http://docs.python.org/lib/built-in-funcs.html#l2h-24

which fails to mention such arbitrary limitations. The behaviour the
documentation describes is what your lenumerate() function provides.

Most likely, if one doesn't want to change the implementation,
one should update the documentation for enumerate() to include a
caveat like xrange() has

http://docs.python.org/lib/built-in-funcs.html#l2h-80

"""
Note: xrange() is intended to be simple and fast. Implementations
may impose restrictions to achieve this. The C implementation of
Python restricts all arguments to native C longs ("short" Python
integers), and also requires that the number of elements fit in a
native C long.
"""

While yes, it's easy enough to create the above lenumerate
generator (just as it's only slightly more work to create an
lxrange function), it would be good if the docs let you know that
you might need to create such a function

-tkc
 
Steve Holden

> Hello all,
>
> in python2.4, i read lines from a file with
>
> for lineNum, line in enumerate(f): ...
>
> However, lineNum soon overflows and starts counting backwards. How do
> i force enumerate to return long integer?

Just how "soon" exactly do you read sys.maxint lines from a file? I
should have thought that it would take a significant amount of time to
read 2,147,483,647 lines ...

But it is true that Python 2.5 uses an enumobject representation that
limits the index to a (C) long:

typedef struct {
    PyObject_HEAD
    long en_index;        /* current index of enumeration */
    PyObject* en_sit;     /* secondary iterator of enumeration */
    PyObject* en_result;  /* result tuple */
} enumobject;
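
A rough Python model of what happens to a field like `en_index` when it is incremented past its maximum. It assumes a 32-bit C long and two's-complement wraparound; strictly speaking, signed overflow is undefined behaviour in C, so this only illustrates the symptom reported above. The helper name `wrap_signed` is made up for this sketch.

```python
def wrap_signed(value, bits=32):
    # Reduce an arbitrary integer to the range of a signed integer with
    # the given width, the way two's-complement hardware would.
    half = 2 ** (bits - 1)
    return (value + half) % (2 ** bits) - half


last_good = wrap_signed(2**31 - 1)   # still representable
wrapped = wrap_signed(2**31)         # one step too far
after = wrap_signed(2**31 + 1)
print(last_good, wrapped, after)
```

After the wrap the index is a large negative number whose magnitude shrinks with each step, which matches the "counting backwards" description.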

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline
 
Tim Chase

> > for lineNum, line in enumerate(f): ...
> Just how "soon" exactly do you read sys.maxint lines from a file? I
> should have thought that it would take a significant amount of time to
> read 2,147,483,647 lines ...

A modestly (but not overwhelmingly) long time:

(defining our own xrange-ish generator that can handle things
larger than longs)
...     i = 0
...     while i < x:
...         yield i
...         i += 1
...
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AssertionError


It took me about 60-90 minutes to hit the assertion on a
dual-core 2.8 GHz machine under otherwise light load. If
batch-processing lengthy log files or other large data such as
genetic data, it's entirely possible to hit this limit as the OP
discovered.
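
As a back-of-the-envelope check on those timings, the sketch below converts them into iterations per second. It assumes the loop had to reach 2**31 iterations before the index wrapped (a 32-bit C long); the function name `iterations_per_second` is invented for this example.

```python
WRAP_POINT = 2**31  # assumed number of iterations before a 32-bit index wraps


def iterations_per_second(minutes):
    # Average rate needed to reach the wrap point in the given time.
    return WRAP_POINT / (minutes * 60.0)


fast = iterations_per_second(60)
slow = iterations_per_second(90)
print(int(slow), int(fast))
```

That works out to roughly 400,000 to 600,000 iterations per second for a pure-Python loop, so a job that processes lines at a similar rate would hit the limit in the same hour or so.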

-tkc
 
Steve Holden

Tim said:
> A modestly (but not overwhelmingly) long time:
>
> (defining our own xrange-ish generator that can handle things larger
> than longs)
>
> ...     i = 0
> ...     while i < x:
> ...         yield i
> ...         i += 1
> ...
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AssertionError
>
> It took me about 60-90 minutes to hit the assertion on a dual-core
> 2.8 GHz machine under otherwise light load. If batch-processing lengthy
> log files or other large data such as genetic data, it's entirely
> possible to hit this limit as the OP discovered.

I wouldn't dream of suggesting it's impossible. I just regard "soon" as
less than an hour in commuter's terms, I suppose.

regards
Steve
 
Tim Golden

Steve said:
> I wouldn't dream of suggesting it's impossible.
> I just regard "soon" as less than an hour in
> commuter's terms, I suppose.

Sadly, speaking as a Londoner, an hour is indeed
"soon" in commuter terms.

TJG
 
Paul Rubin

Tim Chase said:
> I'd consider this a bug: either in the implementation of enumerate(),
> or in the documentation
>
> http://docs.python.org/lib/built-in-funcs.html#l2h-24

2.5 has a patch that causes enumerate() and count() to raise
OverflowError if the count wraps around, which is still bad but at
least beats having the number suddenly go negative. See:

http://bugs.python.org/issue1512504 and
http://mail.python.org/pipermail/python-checkins/2007-February/058486.html

also:

http://bugs.python.org/issue1326277

I hope in 3.0 there's a real fix, i.e. the count should promote to
long. The rationale for leaving the bug in the library is just silly.
2**32 is not that big a number if we're talking about a language and
runtime system supposedly good for writing servers that stay up
continuously for years.
 
Raymond Hettinger

[Paul Rubin]
> I hope in 3.0 there's a real fix, i.e. the count should promote to
> long.

In Py2.6, I will most likely put in an automatic promotion to long
for both enumerate() and count(). It took a while to figure out how
to do this without killing the performance for normal cases (ones used
in real programs, not examples contrived to say, "omg, see what
*could* happen").


Raymond
 
Paul Rubin

Raymond Hettinger said:
> In Py2.6, I will most likely put in an automatic promotion to long
> for both enumerate() and count(). It took a while to figure out how
> to do this without killing the performance for normal cases (ones used
> in real programs, not examples contrived to say, "omg, see what
> *could* happen").

Great, this is good to hear. I think it's ok if the enumeration slows
down after fixnum overflow is reached. So it's just a matter of
replacing the overflow signal with consing up a long. The fixnum case
would be the same as it is now. To be fancy, the count could be
stored in two C ints (or a gcc long long) so it would go up to 64 bits
but I don't think it's worth it, especially for itertools.count which
should be able to take arbitrary (i.e. larger than 64 bits) initializers.
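
The promotion idea in the paragraph above can be modelled in Python roughly as follows. This is only a sketch under stated assumptions: the class `PromotingCounter`, its attribute names, and the 32-bit limit are all hypothetical, and it is not the code that went into CPython. The `fast` attribute stands in for a C long, and `big` stands in for a Python long that is created only once the limit is reached.

```python
C_LONG_MAX = 2**31 - 1  # assumption: 32-bit C long


class PromotingCounter(object):
    """Hypothetical model of a counter with a fast path and a fallback."""

    def __init__(self, start=0):
        self.fast = start   # stands in for a C long field
        self.big = None     # stands in for a Python long, unused at first

    def advance(self):
        # Slow path: the counter has already been promoted.
        if self.big is not None:
            value = self.big
            self.big += 1
            return value
        # Fast path: same work as before, plus one comparison.
        value = self.fast
        if self.fast == C_LONG_MAX:
            self.big = C_LONG_MAX + 1   # promote instead of overflowing
        else:
            self.fast += 1
        return value


counter = PromotingCounter(start=C_LONG_MAX - 1)
values = [counter.advance() for _ in range(4)]
print(values)
```

The common case pays only for the extra comparison; the arbitrary-precision arithmetic is touched only after the limit has been reached.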

As for real programs, well, the Y2038 bug is slowly creeping up on us.
That's when Unix timestamps overflow a signed 32-bit counter. It's
already caused an actual system failure, in 2006:

http://worsethanfailure.com/Articles/The_Harbinger_of_the_Epoch_.aspx

Really, the whole idea of int/long unification is so we can stop
worrying about "omg, that could happen". We want to write programs
without special consideration or "omg" about those possibilities, and
still have them keep working smoothly if that DOES happen. Just about
all of us these days have hundreds of GB or more of disk space on our
systems, and files with over 2**32 bytes or lines are not even
slightly unreasonable. We shouldn't have to write special generators
to deal with them; the library should instead just do the right thing.
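
For reference, the Y2038 limit mentioned above can be worked out with the standard datetime module. This sketch assumes the usual Unix epoch of 1970-01-01 00:00:00 UTC and a signed 32-bit seconds counter; the variable names are chosen for this example.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)        # Unix epoch, treated as UTC
MAX_INT32 = 2**31 - 1               # largest signed 32-bit value

# Last moment a signed 32-bit timestamp can represent.
last_moment = EPOCH + timedelta(seconds=MAX_INT32)
print(last_moment)
```

One second after that moment, a signed 32-bit counter has no representable value left, which is the same class of failure as the enumerate() index wrapping.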
 
Hrvoje Niksic

Raymond Hettinger said:
> [Paul Rubin]
> > I hope in 3.0 there's a real fix, i.e. the count should promote to
> > long.
>
> In Py2.6, I will most likely put in an automatic promotion to long
> for both enumerate() and count(). It took a while to figure out how
> to do this without killing the performance for normal cases (ones
> used in real programs, not examples contrived to say, "omg, see what
> *could* happen").

Using PY_LONG_LONG for the counter, and PyLong_FromLongLong to create
the Python number should work well for huge sequences without
(visibly) slowing down the normal case.
 
koara

> In Py2.6, I will most likely put in an automatic promotion to long
> for both enumerate() and count(). It took a while to figure out how
> to do this without killing the performance for normal cases (ones used
> in real programs, not examples contrived to say, "omg, see what
> *could* happen").
>
> Raymond


Thanks everybody for the replies and suggestions. I'm glad to see the
issue's already been discovered/discussed/almost resolved.

By the way, I do not consider my programs in any way 'unreal'.
 
Raymond Hettinger

> Thanks everybody for the replies and suggestions. I'm glad to see the
> issue's already been discovered/discussed/almost resolved.

The new code is checked-in. In Py2.6, enumerate() will no longer
raise an OverflowError and it will automatically shift from ints to
longs. Will check in something similar for itertools.count() when I
get a chance.


Raymond
 
Gabriel Genellina

> in python2.4, i read lines from a file with
>
> for lineNum, line in enumerate(f): ...
>
> However, lineNum soon overflows and starts counting backwards. How do
> i force enumerate to return long integer?

(What kind of files are you using? enumerate overflows after more than two
billion lines... is that "soon" for you?)

I'm afraid neither enumerate nor itertools.count will generate a long
integer; upgrading to Python 2.5 won't help. I think the only way is to
roll your own counter:

lineNum = 0
for line in f:
    ...
    lineNum += 1
 
