Creating Long Lists

Discussion in 'Python' started by Kelson Zawack, Feb 22, 2011.

  1. I have a large (10 GB) data file for which I want to parse each line into
    an object and then append this object to a list for sorting and further
    processing. I have noticed, however, that as the length of the list
    increases, the rate at which objects are added to it decreases
    dramatically. My first thought was that I was nearing the memory
    capacity of the machine and the decrease in performance was due to the
    OS swapping things in and out of memory. When I looked at the memory
    usage, this was not the case: my process was the only job running and
    was consuming 40 GB of the total 130 GB, and no swapping processes were
    running. To make sure there was not some problem with the rest of my
    code, or the server's file system, I ran my program again as it was but
    without the line that appends items to the list, and it completed
    without problem. This indicates that the decrease in performance is the
    result of some part of the process of appending to the list. Since
    other people have observed this problem as well
    (http://tek-tips.com/viewthread.cfm?qid=1096178&page=13,
    http://stackoverflow.com/questions/...n-list-append-becoming-progressively-slower-i),
    I did not bother to further analyze or benchmark it. Since the answers
    in the above forums do not seem very definitive, I thought I would
    inquire here about what the reason for this decrease in performance is,
    and whether there is a way, or another data structure, that would avoid
    this problem.
    Kelson Zawack, Feb 22, 2011
    #1

  2. alex23 Guest

    On Feb 22, 12:57 pm, Kelson Zawack <-star.edu.sg>
    wrote:
    > I did not bother to further analyze or benchmark it.  Since the answers
    > in the above forums do not seem very definitive  I thought  I would
    > inquire here about what the reason for this decrease in performance is,
    > and if there is a way, or another data structure, that would avoid this
    > problem.


    The first link is 6 years old and refers to Python 2.4. Unless you're
    using 2.4 you should probably ignore it.

    The first answer on the stackoverflow link was accepted by the poster
    as resolving his issue. Try disabling garbage collection.
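
    A minimal sketch of what disabling collection around the load might
    look like. The gc calls are from the standard library; gc_paused,
    parse_line, and load_records are hypothetical names used only for
    illustration, and the tab-separated input format is an assumption,
    not something stated in this thread.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic garbage collector."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Restore the previous state even if parsing raises.
        if was_enabled:
            gc.enable()


def parse_line(line):
    # Hypothetical parser: split a tab-separated line into a tuple of fields.
    return tuple(line.rstrip("\n").split("\t"))


def load_records(lines):
    """Append one parsed object per line while the collector is paused."""
    records = []
    with gc_paused():
        for line in lines:
            records.append(parse_line(line))
    return records
```

    Reference counting still frees objects while the collector is
    disabled; only the detection of reference cycles is paused.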
    alex23, Feb 22, 2011
    #2

  3. John Bokma Guest

    alex23 writes:

    > On Feb 22, 12:57 pm, Kelson Zawack <-star.edu.sg>
    > wrote:
    >> I did not bother to further analyze or benchmark it.  Since the answers
    >> in the above forums do not seem very definitive  I thought  I would
    >> inquire here about what the reason for this decrease in performance is,
    >> and if there is a way, or another data structure, that would avoid this
    >> problem.

    >
    > The first link is 6 years old and refers to Python 2.4. Unless you're
    > using 2.4 you should probably ignore it.
    >
    > The first answer on the stackoverflow link was accepted by the poster
    > as resolving his issue. Try disabling garbage collection.


    I just read http://bugs.python.org/issue4074, which discusses a patch
    that was included 2 years ago. So a recent Python 2.x doesn't have
    this issue either?

    --
    John Bokma j3b

    Blog: http://johnbokma.com/ Facebook: http://www.facebook.com/j.j.j.bokma
    Freelance Perl & Python Development: http://castleamber.com/
    John Bokma, Feb 22, 2011
    #3
  4. The answer, it turns out, is the garbage collector. When I disable the
    garbage collector before the loop that loads the data into the list
    and then enable it after the loop, the program runs without issue.
    This raises a question, though: can the logic of the garbage collector
    be changed so that it is not triggered in instances like this, where you
    really do want to put lots and lots of stuff in memory? Turning the
    garbage collector on and off is not a big deal, but it would
    obviously be nicer not to have to.
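
    One knob that does exist is gc.set_threshold, which changes how many
    net allocations trigger a collection. The sketch below raises the
    first threshold during the load and restores the old values
    afterwards. The names make_record and load_with_relaxed_gc and the
    value 1000000 are illustrative assumptions, and whether this helps
    as much as disabling the collector outright would need measuring.

```python
import gc


def make_record(item):
    # Hypothetical stand-in for parsing one line into an object.
    return [item]


def load_with_relaxed_gc(items, threshold0=1000000):
    """Build the list while collections are triggered far less often.

    threshold0 is an arbitrary illustrative value, not a recommendation.
    """
    old = gc.get_threshold()
    # Keep the other two thresholds as they were; only raise the first one.
    gc.set_threshold(threshold0, old[1], old[2])
    try:
        return [make_record(item) for item in items]
    finally:
        # Put the original thresholds back, even if loading fails.
        gc.set_threshold(*old)
```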
    Kelson Zawack, Feb 22, 2011
    #4
  5. I am using Python 2.6.2, so it may no longer be a problem.

    I am open to using another data type, but the way I read the
    documentation, array.array only supports numeric types, not arbitrary
    objects. I also tried playing around with numpy arrays, albeit for
    only a short time, and although they do support arbitrary objects,
    they seem to be geared toward numbers as well, and I found it
    cumbersome to manipulate objects with them. It could be, though,
    that if I understood them better they would work fine. Also, do
    numpy arrays support sorting arbitrary objects? I only saw a method
    that sorts numbers.
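
    For comparison, a plain list can sort arbitrary objects directly
    when given a key function. This is only a sketch: the Record class,
    its field names, and the whitespace-separated input are made up for
    illustration. Declaring __slots__ avoids a per-instance __dict__,
    which can matter when many objects are held in memory.

```python
from operator import attrgetter


class Record(object):
    """Hypothetical parsed line; the field names are invented."""
    __slots__ = ("name", "score")

    def __init__(self, name, score):
        self.name = name
        self.score = score


def parse_line(line):
    # Assumed format: "<name> <integer score>".
    name, score = line.split()
    return Record(name, int(score))


records = [parse_line(line) for line in ["carol 7", "alice 12", "bob 7"]]

# Sort in place by score, breaking ties by name.
records.sort(key=attrgetter("score", "name"))
```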
    Kelson Zawack, Feb 22, 2011
    #5
  6. Terry Reedy Guest

    On 2/22/2011 4:40 AM, Kelson Zawack wrote:
    > The answer it turns out is the garbage collector. When I disable the
    > garbage collector before the loop that loads the data into the list
    > and then enable it after the loop the program runs without issue.
    > This raises a question though, can the logic of the garbage collector
    > be changed so that it is not triggered in instances like this where you
    > really do want to put lots and lots of stuff in memory. Turning on
    > and off the garbage collector is not a big deal, but it would
    > obviously be nicer not to have to.


    Heuristics, by their very nature, are not correct in all situations.

    --
    Terry Jan Reedy
    Terry Reedy, Feb 22, 2011
    #6
  7. Jorgen Grahn Guest

    On Tue, 2011-02-22, Ben Finney wrote:
    > Kelson Zawack <-star.edu.sg> writes:
    >
    >> I have a large (10gb) data file for which I want to parse each line
    >> into an object and then append this object to a list for sorting and
    >> further processing.

    >
    > What is the nature of the further processing?
    >
    > Does that further processing access the items sequentially? If so, they
    > don't all need to be in memory at once, and you can produce them with a
    > generator <URL:http://docs.python.org/glossary.html#term-generator>.


    He mentioned sorting them -- you need all of them for that.

    If that's the *only* such use, I'd experiment with writing them as
    sortable text to a file, and running GNU sort (the Unix utility) on
    the file. It seems to have a clever file-backed sort algorithm.
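
    To illustrate the general idea of a file-backed sort, here is a
    small pure-Python sketch that sorts chunks in memory, spills each to
    a temporary file, and merges them lazily with heapq.merge. It is not
    a description of how GNU sort is implemented. The function names and
    chunk_size default are assumptions, and the input lines are assumed
    to be newline-terminated text that sorts correctly as plain strings.

```python
import heapq
import itertools
import tempfile


def _write_run(sorted_lines):
    # Spill one sorted chunk to a temporary file and rewind it for reading.
    run = tempfile.TemporaryFile(mode="w+")
    run.writelines(sorted_lines)
    run.seek(0)
    return run


def external_sort(lines, chunk_size=100000):
    """Yield newline-terminated lines in sorted order.

    During the first pass at most chunk_size lines are held in memory.
    """
    lines = iter(lines)
    runs = []
    while True:
        chunk = sorted(itertools.islice(lines, chunk_size))
        if not chunk:
            break
        runs.append(_write_run(chunk))
    try:
        # Each run is already sorted, so heapq.merge can merge them lazily.
        for line in heapq.merge(*runs):
            yield line
    finally:
        for run in runs:
            run.close()
```

    Every run stays open during the merge, so a very large input with a
    small chunk_size could hit the limit on open files.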

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
    Jorgen Grahn, Feb 23, 2011
    #7
  8. Tim Wintle Guest

    On Wed, 2011-02-23 at 13:57 +0000, Jorgen Grahn wrote:
    > If that's the *only* such use, I'd experiment with writing them as
    > sortable text to file, and run GNU sort (the Unix utility) on the file.
    > It seems to have a clever file-backed sort algorithm.


    +1 - and experiment with the different flags to sort (compression of
    intermediate results, intermediate batch size, etc.)

    Tim
    Tim Wintle, Feb 23, 2011
    #8
