enumerate overflow

crwe · Oct 3, 2007

Hello all,

in python2.4, i read lines from a file with

for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?

Cheers.

Diez B. Roggisch · Oct 3, 2007

crwe said:
Hello all,

in python2.4, i read lines from a file with

for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?

Most probably you can't, because it is a C-written function I presume.

But as python 2.4 has generators, it's easy to create an enumerate yourself:

def lenumerate(f):
    i = 0
    for line in f:
        yield i, line
        i += 1
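
A minimal runnable sketch of the generator above. The io.StringIO object and its sample lines are assumptions standing in for a real file; on Python 3, plain integers never overflow, so this mainly matters for the 2.4/2.5 interpreters discussed in this thread.

```python
import io

def lenumerate(iterable):
    """Yield (index, item) pairs using an ordinary Python integer counter."""
    i = 0
    for item in iterable:
        yield i, item
        i += 1

# Hypothetical stand-in for a real file object.
f = io.StringIO("first\nsecond\nthird\n")
pairs = list(lenumerate(f))
```

Because the counter is an ordinary Python integer rather than a C long, it is promoted automatically instead of wrapping around.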

Diez

Tim Chase · Oct 3, 2007

Diez B. Roggisch said:
Most probably you can't, because it is a C-written function I presume.

But as python 2.4 has generators, it's easy to create an enumerate yourself:

def lenumerate(f):
    i = 0
    for line in f:
        yield i, line
        i += 1

I'd consider this a bug: either in the implementation of
enumerate(), or in the documentation

http://docs.python.org/lib/built-in-funcs.html#l2h-24

which fails to mention such arbitrary limitations. The
documentation describes what you create as an lenumerate() function.

Most likely, if one doesn't want to change the implementation,
one should update the documentation for enumerate() to include a
caveat like xrange() has

http://docs.python.org/lib/built-in-funcs.html#l2h-80

"""
Note: xrange() is intended to be simple and fast. Implementations
may impose restrictions to achieve this. The C implementation of
Python restricts all arguments to native C longs ("short" Python
integers), and also requires that the number of elements fit in a
native C long.
"""

While yes, it's easy enough to create the above lenumerate
generator (just as it's only slightly more work to create an
lxrange function), it would be good if the docs let you know that
you might need to create such a function

-tkc

Steve Holden · Oct 3, 2007

crwe said:
Hello all,

in python2.4, i read lines from a file with

for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?

Just how "soon" exactly do you read sys.maxint lines from a file? I
should have thought that it would take a significant amount of time to
read 2,147,483,647 lines ...

But it is true that Python 2.5 uses an enumobject representation that
limits the index to a (C) long:

typedef struct {
    PyObject_HEAD
    long en_index;        /* current index of enumeration */
    PyObject* en_sit;     /* secondary iterator of enumeration */
    PyObject* en_result;  /* result tuple */
} enumobject;
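
To illustrate why an index held in a C long can go negative, here is a small simulation of signed 32-bit wraparound. It is a sketch only: it assumes a platform where long is 32 bits, and wrap_int32 is a hypothetical helper, not part of CPython. Signed overflow is formally undefined in C; two's-complement wrapping is just what is commonly observed.

```python
def wrap_int32(n):
    """Reduce n to the range of a signed 32-bit two's-complement integer."""
    return (n + 2**31) % 2**32 - 2**31

INT32_MAX = 2**31 - 1   # 2,147,483,647, i.e. sys.maxint on a 32-bit build

# One step past the maximum lands on the most negative value.
after_overflow = wrap_int32(INT32_MAX + 1)
```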

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline

Tim Chase · Oct 3, 2007

Steve Holden said:
Just how "soon" exactly do you read sys.maxint lines from a file? I
should have thought that it would take a significant amount of time to
read 2,147,483,647 lines ...

A modestly (but not overwhelmingly) long time:

(defining our own xrange-ish generator that can handle things
larger than longs)
...     i = 0
...     while i < x:
...         yield i
...         i += 1
...
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AssertionError

It took me about 60-90 minutes to hit the assertion on a
dual-core 2.8 GHz machine under otherwise light load. If
batch-processing lengthy log files or other large data such as
genetic data, it's entirely possible to hit this limit as the OP
discovered.
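
As a rough back-of-the-envelope check, assuming the assertion fired after about 2**31 iterations (the point at which a signed 32-bit index wraps), the quoted 60-90 minutes implies these average loop rates:

```python
ITERATIONS = 2**31  # assumed: the index at which a signed 32-bit counter wraps

def rate_per_second(minutes):
    """Average iterations per second needed to reach ITERATIONS in `minutes`."""
    return ITERATIONS / (minutes * 60.0)

fast = rate_per_second(60)   # roughly 600,000 iterations per second
slow = rate_per_second(90)   # roughly 400,000 iterations per second
```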

-tkc

Steve Holden · Oct 3, 2007

Tim said:
A modestly (but not overwhelmingly) long time:

(defining our own xrange-ish generator that can handle things larger
than longs)

...     i = 0
...     while i < x:
...         yield i
...         i += 1
...
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AssertionError

It took me about 60-90 minutes to hit the assertion on a dual-core
2.8 GHz machine under otherwise light load. If batch-processing lengthy
log files or other large data such as genetic data, it's entirely
possible to hit this limit as the OP discovered.

I wouldn't dream of suggesting it's impossible. I just regard "soon" as
less than an hour in commuter's terms, I suppose.

regards
Steve

Tim Golden · Oct 3, 2007

Steve said:
I wouldn't dream of suggesting it's impossible.
I just regard "soon" as less than an hour in
commuter's terms, I suppose.

Sadly, speaking as a Londoner, an hour is indeed
"soon" in commuter terms.

TJG

Paul Rubin · Oct 3, 2007

Tim Chase said:
I'd consider this a bug: either in the implementation of enumerate(),
or in the documentation

http://docs.python.org/lib/built-in-funcs.html#l2h-24

2.5 has a patch that causes enumerate() and count() to raise overflow
if the count wraps around, which is still bad but at least beats
having the number suddenly go negative. See:

http://bugs.python.org/issue1512504 and
http://mail.python.org/pipermail/python-checkins/2007-February/058486.html

also:

http://bugs.python.org/issue1326277

I hope in 3.0 there's a real fix, i.e. the count should promote to
long. The rationale for leaving the bug in the library is just silly.
2**32 is not that big a number if we're talking about a language and
runtime system supposedly good for writing servers that stay up
continuously for years.
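
To put the size of these limits in perspective, the sketch below estimates how long a long-running process takes to count that many events. The rate of 1,000 events per second is a purely hypothetical figure chosen for illustration.

```python
SECONDS_PER_DAY = 86400

def days_to_reach(count, events_per_second):
    """Days needed to count `count` events at a steady rate."""
    return count / float(events_per_second) / SECONDS_PER_DAY

RATE = 1000                              # hypothetical events per second
days_signed = days_to_reach(2**31, RATE)    # signed 32-bit limit: under a month
days_unsigned = days_to_reach(2**32, RATE)  # 2**32: under two months
```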

Raymond Hettinger · Oct 3, 2007

[Paul Rubin]

I hope in 3.0 there's a real fix, i.e. the count should promote to
long.

In Py2.6, I will most likely put in an automatic promotion to long
for both enumerate() and count(). It took a while to figure out how
to do this without killing the performance for normal cases (ones used
in real programs, not examples contrived to say, "omg, see what
*could* happen").

Raymond

Paul Rubin · Oct 3, 2007

Raymond Hettinger said:
In Py2.6, I will most likely put in an automatic promotion to long
for both enumerate() and count(). It took a while to figure out how
to do this without killing the performance for normal cases (ones used
in real programs, not examples contrived to say, "omg, see what
*could* happen").

Great, this is good to hear. I think it's ok if the enumeration slows
down after fixnum overflow is reached. So it's just a matter of
replacing the overflow signal with consing up a long. The fixnum case
would be the same as it is now. To be fancy, the count could be
stored in two C ints (or a gcc long long) so it would go up to 64 bits
but I don't think it's worth it, especially for itertools.count which
should be able to take arbitrary (i.e. larger than 64 bits) initializers.

As for real programs, well, the Y2038 bug is slowly creeping up on us.
That's when Unix timestamps overflow a signed 32-bit counter. It's
already caused an actual system failure, in 2006:

http://worsethanfailure.com/Articles/The_Harbinger_of_the_Epoch_.aspx
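
The Y2038 moment can be computed directly: it is the last second representable by a signed 32-bit count of seconds since the Unix epoch. A short sketch using only the standard library:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Largest value a signed 32-bit timestamp can hold, as a UTC date and time.
last_second = EPOCH + timedelta(seconds=2**31 - 1)
```

Here last_second is 2038-01-19 03:14:07 UTC; one second later a signed 32-bit counter can no longer represent the time.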

Really, the whole idea of int/long unification is so we can stop
worrying about "omg, that could happen". We want to write programs
without special consideration or "omg" about those possibilities, and
still have them keep working smoothly if that DOES happen. Just about
all of us these days have hundreds of GB or more of disk space on our
systems, and files with over 2**32 bytes or lines are not even
slightly unreasonable. We shouldn't have to write special generators
to deal with them, the library should instead just do the right thing.

Hrvoje Niksic · Oct 3, 2007

Raymond Hettinger said:
[Paul Rubin]

I hope in 3.0 there's a real fix, i.e. the count should promote to
long.

In Py2.6, I will most likely put in an automatic promotion to long
for both enumerate() and count(). It took a while to figure out how
to do this without killing the performance for normal cases (ones
used in real programs, not examples contrived to say, "omg, see what
*could* happen").

Using PY_LONG_LONG for the counter, and PyLong_FromLongLong to create
the Python number should work well for huge sequences without
(visibly) slowing down the normal case.
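
A quick estimate of the headroom a signed 64-bit counter would give. The rate of one billion increments per second is an assumed, deliberately generous figure, not a measurement.

```python
SECONDS_PER_YEAR = 365.25 * 86400

def years_to_overflow(bits, increments_per_second):
    """Years until a signed counter of `bits` bits reaches its maximum."""
    return 2 ** (bits - 1) / float(increments_per_second) / SECONDS_PER_YEAR

# Assumed rate: 1e9 increments per second.
years_64 = years_to_overflow(64, 1e9)   # on the order of 290 years
```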

koara · Oct 3, 2007

Raymond Hettinger said:
In Py2.6, I will most likely put in an automatic promotion to long
for both enumerate() and count(). It took a while to figure out how
to do this without killing the performance for normal cases (ones used
in real programs, not examples contrived to say, "omg, see what
*could* happen").

Raymond

Thanks everybody for the replies and suggestions. I'm glad to see the
issue has already been discovered/discussed/almost resolved.

By the way, I do not consider my programs in any way 'unreal'.

Raymond Hettinger · Oct 3, 2007

koara said:
Thanks everybody for the replies and suggestions. I'm glad to see the
issue has already been discovered/discussed/almost resolved.

The new code is checked-in. In Py2.6, enumerate() will no longer
raise an OverflowError and it will automatically shift from ints to
longs. Will check in something similar for itertools.count() when I
get a chance.
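
For comparison, a sketch of the promoted behaviour as it can be observed on a current Python 3 interpreter, where int is unbounded and sys.maxsize plays the role of the old sys.maxint. It relies on the optional start argument of enumerate(), added in 2.6, to begin right at the native limit instead of iterating billions of times.

```python
import itertools
import sys

start = sys.maxsize   # largest native index; the next value exceeds it

# The index simply keeps growing past the native limit.
indices = [i for i, _ in enumerate("abc", start)]

# itertools.count() behaves the same way.
counter = itertools.count(start)
counted = [next(counter) for _ in range(3)]
```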

Raymond

Gabriel Genellina · Oct 5, 2007

crwe said:
in python2.4, i read lines from a file with

for lineNum, line in enumerate(f): ...

However, lineNum soon overflows and starts counting backwards. How do
i force enumerate to return long integer?

(What kind of files are you using? enumerate overflows after more than two
billion lines... is that "soon" for you?)

I'm afraid neither enumerate nor itertools.count will generate a long
integer; upgrading to Python 2.5 won't help. I think the only way is to
roll your own counter:

lineNum = 0
for line in f:
    ...
    lineNum += 1
