enumerate overflow

Discussion in 'Python' started by crwe@post.cz, Oct 3, 2007.

  1. Guest

    Hello all,

    in python2.4, i read lines from a file with

    for lineNum, line in enumerate(f): ...

    However, lineNum soon overflows and starts counting backwards. How do
    i force enumerate to return long integer?

    Cheers.
    , Oct 3, 2007
    #1

  2. wrote:
    > Hello all,
    >
    > in python2.4, i read lines from a file with
    >
    > for lineNum, line in enumerate(f): ...
    >
    > However, lineNum soon overflows and starts counting backwards. How do
    > i force enumerate to return long integer?


    Most probably you can't, because it is a C-written function I presume.

    But as Python 2.4 has generators, it's easy to create an enumerate yourself:


    def lenumerate(f):
        i = 0
        for line in f:
            yield i, line
            i += 1
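
For anyone who wants to try this, here is a slightly extended sketch of the same generator. The `start` parameter and the `io.StringIO` stand-in for a file are additions made purely for illustration and are not part of the post above. Starting just below 2**31 shows that a plain Python counter keeps growing instead of wrapping.

```python
import io


def lenumerate(iterable, start=0):
    """Yield (index, item) pairs using an ordinary Python integer counter."""
    i = start  # 'start' is an illustrative addition, not in the original post
    for item in iterable:
        yield i, item
        i += 1


# Stand-in for a real file object; any iterable of lines works.
f = io.StringIO("first\nsecond\nthird\n")

# Start just below the 32-bit signed limit to show the counter passing it.
pairs = list(lenumerate(f, start=2**31 - 2))
for line_num, line in pairs:
    print(line_num, line.rstrip())
```

The counter is a normal Python integer, so it is promoted as needed and never goes negative.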

    Diez
    Diez B. Roggisch, Oct 3, 2007
    #2

  3. Tim Chase Guest

    >> in python2.4, i read lines from a file with
    >>
    >> for lineNum, line in enumerate(f): ...
    >>
    >> However, lineNum soon overflows and starts counting backwards. How do
    >> i force enumerate to return long integer?

    >
    > Most probably you can't, because it is a C-written function I presume.
    >
    > But as Python 2.4 has generators, it's easy to create an enumerate yourself:
    >
    >
    > def lenumerate(f):
    >     i = 0
    >     for line in f:
    >         yield i, line
    >         i += 1



    I'd consider this a bug: either in the implementation of
    enumerate(), or in the documentation

    http://docs.python.org/lib/built-in-funcs.html#l2h-24

    which fails to mention such arbitrary limitations. The
    documentation describes what you create as an lenumerate() function.

    Most likely, if one doesn't want to change the implementation,
    one should update the documentation for enumerate() to include a
    caveat like xrange() has

    http://docs.python.org/lib/built-in-funcs.html#l2h-80

    """
    Note: xrange() is intended to be simple and fast. Implementations
    may impose restrictions to achieve this. The C implementation of
    Python restricts all arguments to native C longs ("short" Python
    integers), and also requires that the number of elements fit in a
    native C long.
    """

    While yes, it's easy enough to create the above lenumerate
    generator (just as it's only slightly more work to create an
    lxrange function), it would be good if the docs let you know that
    you might need to create such a function.

    -tkc
    Tim Chase, Oct 3, 2007
    #3
  4. Steve Holden Guest

    wrote:
    > Hello all,
    >
    > in python2.4, i read lines from a file with
    >
    > for lineNum, line in enumerate(f): ...
    >
    > However, lineNum soon overflows and starts counting backwards. How do
    > i force enumerate to return long integer?
    >

    Just how "soon" exactly do you read sys.maxint lines from a file? I
    should have thought that it would take a significant amount of time to
    read 2,147,483,647 lines ...

    But it is true that Python 2.5 uses an enumobject representation that
    limits the index to a (C) long:

    typedef struct {
        PyObject_HEAD
        long en_index;       /* current index of enumeration */
        PyObject* en_sit;    /* secondary iterator of enumeration */
        PyObject* en_result; /* result tuple */
    } enumobject;
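
To see why an index stored this way suddenly goes negative, the wraparound can be imitated from Python. This is only a sketch: it assumes a 32-bit C long (which matches the 2,147,483,647 figure above) and uses `ctypes.c_int32` as a stand-in for `en_index`. The helper name `next_index` is made up for illustration and does not appear in the CPython source.

```python
import ctypes


def next_index(index):
    """Increment 'index' the way a 32-bit signed field would, wrapping on overflow."""
    # ctypes integer types do no overflow checking: the value is truncated
    # to 32 bits, which imitates the wraparound reported in this thread.
    return ctypes.c_int32(index + 1).value


last_good = 2**31 - 1            # 2,147,483,647, sys.maxint on a 32-bit build
wrapped = next_index(last_good)  # the step after the largest representable index
after_wrap = next_index(wrapped)

print(last_good, wrapped, after_wrap)
```

After the wrap the index is a large negative number that climbs back toward zero, which is what the original poster saw.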

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden

    Sorry, the dog ate my .sigline
    Steve Holden, Oct 3, 2007
    #4
  5. Tim Chase Guest

    >> for lineNum, line in enumerate(f): ...
    >>
    >> However, lineNum soon overflows and starts counting backwards. How do
    >> i force enumerate to return long integer?
    >>

    > Just how "soon" exactly do you read sys.maxint lines from a file? I
    > should have thought that it would take a significant amount of time to
    > read 2,147,483,647 lines ...


    A modestly (but not overwhelmingly) long time:

    (defining our own xrange-ish generator that can handle things
    larger than longs)

    >>> def xxrange(x):
    ...     i = 0
    ...     while i < x:
    ...         yield i
    ...         i += 1
    ...
    >>> for i,j in enumerate(xxrange(2**33)): assert i==j
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AssertionError


    It took me about 60-90 minutes to hit the assertion on a
    dual-core 2.8 GHz machine under an otherwise light load. If
    batch-processing lengthy log files or other large data such as
    genetic data, it's entirely possible to hit this limit as the OP
    discovered.
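
As a rough cross-check of that timing, the arithmetic can be spelled out. This sketch assumes the assertion fired when the index passed 2**31 - 1 (the 32-bit signed limit mentioned earlier in the thread) and simply divides by the reported 60 and 90 minutes. The function name is illustrative.

```python
def iterations_per_second(limit, minutes):
    """Average loop rate needed to reach 'limit' iterations in 'minutes'."""
    return limit / (minutes * 60.0)


LIMIT = 2**31  # items enumerated before a 32-bit signed index overflows

fast = iterations_per_second(LIMIT, 60)  # if it took 60 minutes
slow = iterations_per_second(LIMIT, 90)  # if it took 90 minutes

print("%.0f to %.0f iterations per second" % (slow, fast))
```

That works out to roughly 400,000 to 600,000 iterations per second for a bare generator loop. A job that reads and processes real lines would likely run slower, so the overflow would take correspondingly longer to appear.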

    -tkc
    Tim Chase, Oct 3, 2007
    #5
  6. Steve Holden Guest

    Tim Chase wrote:
    >>> for lineNum, line in enumerate(f): ...
    >>>
    >>> However, lineNum soon overflows and starts counting backwards. How do
    >>> i force enumerate to return long integer?
    >>>

    >> Just how "soon" exactly do you read sys.maxint lines from a file? I
    >> should have thought that it would take a significant amount of time to
    >> read 2,147,483,647 lines ...

    >
    > A modestly (but not overwhelmingly) long time:
    >
    > (defining our own xrange-ish generator that can handle things larger
    > than longs)
    >
    > >>> def xxrange(x):
    > ...     i = 0
    > ...     while i < x:
    > ...         yield i
    > ...         i += 1
    > ...
    > >>> for i,j in enumerate(xxrange(2**33)): assert i==j
    > ...
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in ?
    > AssertionError
    >
    >
    > It took me about 60-90 minutes to hit the assertion on a dual-core
    > 2.8 GHz machine under an otherwise light load. If batch-processing lengthy
    > log files or other large data such as genetic data, it's entirely
    > possible to hit this limit as the OP discovered.
    >

    I wouldn't dream of suggesting it's impossible. I just regard "soon" as
    less than an hour in commuter's terms, I suppose.

    regards
    Steve
    --
    Steve Holden +1 571 484 6266 +1 800 494 3119
    Holden Web LLC/Ltd http://www.holdenweb.com
    Skype: holdenweb http://del.icio.us/steve.holden

    Sorry, the dog ate my .sigline
    Steve Holden, Oct 3, 2007
    #6
  7. Tim Golden Guest

    Steve Holden wrote:
    > I wouldn't dream of suggesting it's impossible.
    > I just regard "soon" as less than an hour in
    > commuter's terms, I suppose.


    Sadly, speaking as a Londoner, an hour is indeed
    "soon" in commuter terms.

    TJG
    Tim Golden, Oct 3, 2007
    #7
  8. Paul Rubin Guest

    Tim Chase <> writes:
    > I'd consider this a bug: either in the implementation of enumerate(),
    > or in the documentation
    >
    > http://docs.python.org/lib/built-in-funcs.html#l2h-24


    2.5 has a patch that causes enumerate() and count() to raise an
    OverflowError if the count wraps around, which is still bad but at
    least beats having the number suddenly go negative. See:

    http://bugs.python.org/issue1512504 and
    http://mail.python.org/pipermail/python-checkins/2007-February/058486.html

    also:

    http://bugs.python.org/issue1326277

    I hope in 3.0 there's a real fix, i.e. the count should promote to
    long. The rationale for leaving the bug in the library is just silly.
    2**32 is not that big a number if we're talking about a language and
    runtime system supposedly good for writing servers that stay up
    continuously for years.
    Paul Rubin, Oct 3, 2007
    #8
  9. [Paul Rubin]
    > I hope in 3.0 there's a real fix, i.e. the count should promote to
    > long.


    In Py2.6, I will most likely put in an automatic promotion to long
    for both enumerate() and count(). It took a while to figure out how
    to do this without killing the performance for normal cases (ones used
    in real programs, not examples contrived to say, "omg, see what
    *could* happen").


    Raymond
    Raymond Hettinger, Oct 3, 2007
    #9
  10. Paul Rubin Guest

    Raymond Hettinger <> writes:
    > In Py2.6, I will most likely put in an automatic promotion to long
    > for both enumerate() and count(). It took a while to figure out how
    > to do this without killing the performance for normal cases (ones used
    > in real programs, not examples contrived to say, "omg, see what
    > *could* happen").


    Great, this is good to hear. I think it's ok if the enumeration slows
    down after fixnum overflow is reached. So it's just a matter of
    replacing the overflow signal with consing up a long. The fixnum case
    would be the same as it is now. To be fancy, the count could be
    stored in two C ints (or a gcc long long) so it would go up to 64 bits
    but I don't think it's worth it, especially for itertools.count which
    should be able to take arbitrary (i.e. larger than 64 bits) initializers.

    As for real programs, well, the Y2038 bug is slowly creeping up on us.
    That's when Unix timestamps overflow a signed 32-bit counter. It's
    already caused an actual system failure, in 2006:

    http://worsethanfailure.com/Articles/The_Harbinger_of_the_Epoch_.aspx

    Really, the whole idea of int/long unification is so we can stop
    worrying about "omg, that could happen". We want to write programs
    without special consideration or "omg" about those possibilities, and
    still have them keep working smoothly if that DOES happen. Just about
    all of us these days have hundreds of GB or more of disk space on our
    systems, and files with over 2**32 bytes or lines are not even
    slightly unreasonable. We shouldn't have to write special generators
    to deal with them, the library should instead just do the right thing.
    Paul Rubin, Oct 3, 2007
    #10
  11. Raymond Hettinger <> writes:

    > [Paul Rubin]
    >> I hope in 3.0 there's a real fix, i.e. the count should promote to
    >> long.

    >
    > In Py2.6, I will most likely put in an automatic promotion to long
    > for both enumerate() and count(). It took a while to figure out how
    > to do this without killing the performance for normal cases (ones
    > used in real programs, not examples contrived to say, "omg, see what
    > *could* happen").


    Using PY_LONG_LONG for the counter, and PyLong_FromLongLong to create
    the Python number should work well for huge sequences without
    (visibly) slowing down the normal case.
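
To put the 64-bit suggestion in perspective, here is a back-of-the-envelope calculation. It assumes PY_LONG_LONG is a 64-bit signed integer and an illustrative rate of one million items per second. Neither figure comes from the post above.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_overflow(bits, items_per_second):
    """Years until a signed counter of the given width overflows at a steady rate."""
    max_value = 2 ** (bits - 1) - 1
    return max_value / float(items_per_second) / SECONDS_PER_YEAR


RATE = 1000000  # assumed throughput, items per second

years_32 = years_to_overflow(32, RATE)
years_64 = years_to_overflow(64, RATE)

print("32-bit: %.1f minutes" % (years_32 * SECONDS_PER_YEAR / 60))
print("64-bit: %.0f years" % years_64)
```

At that assumed rate a 32-bit counter lasts about 36 minutes, while a 64-bit one lasts on the order of 290,000 years. That fits the claim that a wider counter covers huge sequences in practice, although it is still not the arbitrary-precision promotion discussed elsewhere in the thread.
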
    Hrvoje Niksic, Oct 3, 2007
    #11
  12. koara Guest

    On Oct 3, 7:22 pm, Raymond Hettinger <> wrote:
    > In Py2.6, I will most likely put in an automatic promotion to long
    > for both enumerate() and count(). It took a while to figure out how
    > to do this without killing the performance for normal cases (ones used
    > in real programs, not examples contrived to say, "omg, see what
    > *could* happen").
    >
    > Raymond



    Thanks everybody for the replies and suggestions, I'm glad to see the
    issue's already been discovered/discussed/almost resolved.

    By the way, I do not consider my programs in any way 'unreal'.
    koara, Oct 3, 2007
    #12
  13. On Oct 3, 12:52 pm, koara <> wrote:
    > Thanks everybody for the replies and suggestions, I'm glad to see the
    > issue's already been discovered/discussed/almost resolved.


    The new code is checked-in. In Py2.6, enumerate() will no longer
    raise an OverflowError and it will automatically shift from ints to
    longs. Will check in something similar for itertools.count() when I
    get a chance.
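
The promoted behaviour is easy to check on a current interpreter. The sketch below assumes Python 3, where int and long are already unified, and uses the `start` argument of `enumerate()` (added in 2.6) to begin just below the 64-bit signed limit rather than iterating for hours.

```python
import itertools

# Begin one step below the largest 64-bit signed value.
START = 2**63 - 1

# enumerate() keeps counting past the native integer limit.
pairs = list(enumerate(["a", "b", "c"], start=START))

# itertools.count() accepts and passes the same boundary as well.
counter = itertools.count(START)
counts = [next(counter) for _ in range(3)]

print(pairs)
print(counts)
```

Both sequences continue with 2**63 and 2**63 + 1, with no OverflowError and no negative values.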


    Raymond
    Raymond Hettinger, Oct 3, 2007
    #13
  14. On Wed, 03 Oct 2007 08:46:31 -0300, <> wrote:

    > in python2.4, i read lines from a file with
    >
    > for lineNum, line in enumerate(f): ...
    >
    > However, lineNum soon overflows and starts counting backwards. How do
    > i force enumerate to return long integer?


    (what kind of files are you using? enumerate overflows after more than two
    billion lines... is that "soon" for you?)

    I'm afraid neither enumerate nor itertools.count will generate a long
    integer; upgrading to Python 2.5 won't help. I think the only way is to
    roll your own counter:

    lineNum = 0
    for line in f:
        ...
        lineNum += 1

    --
    Gabriel Genellina
    Gabriel Genellina, Oct 5, 2007
    #14