memory problem with list creation

Discussion in 'Python' started by Allard Warrink, Jan 13, 2010.

  1. Within a Python script I'm using a couple of different lists
    containing a large number of floats (8M+). The execution of this
    script fails because of a memory error (insufficient memory).
    I thought this was strange because I delete all lists as soon as
    they are no longer needed, and my workstation theoretically has more
    than enough memory to run the script.

    So I did some investigation into the memory use of the script. I found
    that when I populate the lists with floats using a for ... in
    range() loop, a lot of overhead memory is used, and that this memory is
    not freed after populating the list, nor after
    deleting the list.

    This way the memory keeps filling up after each newly populated list
    until the script crashes.


    I did a couple of tests and found that populating lists with range or
    xrange is responsible for the memory overhead.
    Does anybody know why this happens and if there's a way to avoid this
    memory problem?

    Each test below shows the line(s) of Python code I executed,
    followed by three numbers:
    Mem usage of the process after creation/populating of big_list
    sys.getsizeof(big_list)
    Mem usage of the process after deletion of big_list

    big_list = [0.0] * 2700*3250
    40
    35
    6

    big_list = [0.0 for i in xrange(2700*3250)]
    40
    36
    6

    big_list = [0.0 for i in range(2700*3250)]
    145
    36
    110

    big_list = [float(i) for i in xrange(2700*3250)]
    180
    36
    145

    big_list = [float(i) for i in range(2700*3250)]
    285
    36
    250

    big_list = [i for i in xrange(2700*3250)]
    145
    36
    110

    big_list = [i for i in range(2700*3250)]
    145
    36
    110

    big_list = []
    for i in range(2700*3250):
        big_list.append(float(i))
    285
    36
    250

    big_list = []
    for i in xrange(2700*3250):
        big_list.append(float(i))
    180
    36
    145
    Allard Warrink, Jan 13, 2010
    #1

  2. Allard Warrink, 13.01.2010 15:24:
    > I found that when I populate the lists with floats using a for ... in
    > range() loop, a lot of overhead memory is used


    Note that range() returns a list in Python 2.x. For iteration, use
    xrange(), or switch to Python 3 where range() returns an iterable.
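    A minimal sketch of that difference, assuming Python 3 (where xrange
    no longer exists and range() is lazy). Wrapping it in list() stands in
    for what Python 2's range() did; the size n is an arbitrary choice,
    smaller than the thread's 2700*3250 so the demo stays quick.

```python
import sys

n = 1_000_000  # assumed demo size, not the thread's 2700*3250

lazy = range(n)         # Python 3: a small range object, no list is built
eager = list(range(n))  # roughly what Python 2's range() returned

lazy_size = sys.getsizeof(lazy)    # size of the range object itself
eager_size = sys.getsizeof(eager)  # size of the list (pointers only)

print(lazy_size, eager_size)
```

    The range object stays tiny no matter how large n is, while the list
    grows with n (and that figure still excludes the integer objects the
    list points to).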

    Stefan
    Stefan Behnel, Jan 13, 2010
    #2

  3. Allard Warrink, 13.01.2010 15:24:
    > So I did some investigation into the memory use of the script. I found
    > that when I populate the lists with floats using a for ... in
    > range() loop, a lot of overhead memory is used, and that this memory is
    > not freed after populating the list, nor after
    > deleting the list.


    You didn't say how you "investigated" the memory usage. Note that the
    Python interpreter does not necessarily free heap memory that it has
    allocated, even if it is not used anymore. Newly created objects will still
    end up in that memory area, so nothing is lost.

    Stefan
    Stefan Behnel, Jan 13, 2010
    #3
  4. On Wed, 13 Jan 2010 11:24:04 -0300, Allard Warrink wrote:

    > Within a Python script I'm using a couple of different lists
    > containing a large number of floats (8M+). The execution of this
    > script fails because of a memory error (insufficient memory).
    > I thought this was strange because I delete all lists as soon as
    > they are no longer needed, and my workstation theoretically has more
    > than enough memory to run the script.
    >
    > So I did some investigation into the memory use of the script. I found
    > that when I populate the lists with floats using a for ... in
    > range() loop, a lot of overhead memory is used, and that this memory is
    > not freed after populating the list, nor after
    > deleting the list.
    >
    > This way the memory keeps filling up after each newly populated list
    > until the script crashes.


    After reading my comments below, please revise your testing and this
    conclusion.
    If you build the *same* list several times and the memory usage keeps
    growing, this may indicate a memory leak. But a peak in memory
    consumption caused by temporary objects is not enough evidence.
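    One way to run that check, sketched with the standard tracemalloc
    module (Python 3.4+, so not available on the Python 2 used in this
    thread). The helper names and sizes are made up for illustration.
    Note that tracemalloc reports memory currently allocated by Python,
    which is not the same figure the operating system shows for the
    process.

```python
import tracemalloc


def build(n):
    """Build a list of n distinct float objects."""
    return [float(i) for i in range(n)]


def usage_after_each_round(rounds=3, n=200_000):
    """Build and delete the same list several times.

    Returns the traced memory (bytes) still allocated after each deletion.
    """
    tracemalloc.start()
    readings = []
    for _ in range(rounds):
        big_list = build(n)
        del big_list
        current, _peak = tracemalloc.get_traced_memory()
        readings.append(current)
    tracemalloc.stop()
    return readings


readings = usage_after_each_round()
print(readings)
```

    If the readings stay small and flat from round to round, the list
    memory is being released inside Python; a steady climb would be the
    evidence of a leak.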

    > I did a couple of tests and found that populating lists with range or
    > xrange is responsible for the memory overhead.
    > Does anybody know why this happens and if there's a way to avoid this
    > memory problem?
    >
    > First the line(s) python code I executed.
    > Then the memory usage of the process:
    > Mem usage after creation/populating of big_list
    > sys.getsizeof(big_list)
    > Mem usage after deletion of big_list


    Note that sys.getsizeof(big_list) will always be about the same: the
    space taken by the list object itself depends only on the number of
    items it holds (and, to a lesser degree, on its history). You didn't take
    into account the memory taken by the contained items themselves.
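    A small sketch of the point about contained items. The deep_size
    helper is hypothetical (it is not part of the standard library) and
    only handles flat lists of simple objects; n is an arbitrary demo size.

```python
import sys

n = 100_000  # assumed demo size

shared = [0.0] * n                        # n references to ONE float object
distinct = [float(i) for i in range(n)]   # n separate float objects


def deep_size(lst):
    """Size of the list plus every distinct object it refers to."""
    unique_items = {id(x): x for x in lst}.values()
    return sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in unique_items)


print(sys.getsizeof(shared), deep_size(shared))
print(sys.getsizeof(distinct), deep_size(distinct))
```

    sys.getsizeof alone reports similar figures for both lists, while the
    helper shows that the second one costs far more once its float objects
    are counted.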

    > 1) big_list = [0.0] * 2700*3250


    This involves the object 0.0, an intermediate list of size 2700, a couple
    of integers, and nothing more.

    > 2) big_list = [0.0 for i in xrange(2700*3250)]


    This involves creating an integer object representing every integer in the
    range, but most of them are quickly discarded.

    > 3) big_list = [0.0 for i in range(2700*3250)]


    This involves building a temporary list containing every integer in the
    range. All of them must be available simultaneously (to exist in the list).
    In all these three scenarios, the only "permanent" objects are a big list
    which holds several million references to the single float object 0.0; on
    my Windows build, 32 bits, this takes 35MB.

    > 4) big_list = [float(i) for i in xrange(2700*3250)]

    Like 2) above, but now the final list contains several million different
    objects. 175MB would be required on my PC: getsizeof(big_list) +
    len(big_list)*getsizeof(0.0)
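
    The arithmetic behind that figure, as a sketch. The per-item sizes are
    assumptions for a 32-bit CPython build (4-byte pointers, 16-byte float
    objects, as mentioned elsewhere in this thread); they differ on 64-bit
    builds, and the small fixed list header is ignored.

```python
n = 2700 * 3250      # number of items: 8,775,000

pointer_bytes = 4    # assumed: one list slot on a 32-bit build
float_bytes = 16     # assumed: one float object on 32-bit CPython

list_bytes = n * pointer_bytes   # the list's array of references
items_bytes = n * float_bytes    # the float objects themselves
total = list_bytes + items_bytes

print(list_bytes, items_bytes, total)
```

    That gives about 35 MB for the list and about 140 MB for the floats,
    roughly 175 MB in total.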

    > 5) big_list = [float(i) for i in range(2700*3250)]

    As in 4), the final list requires more memory, and as in 3), a
    temporary integer list is also required.

    > 6) big_list = [i for i in xrange(2700*3250)]

    Same as 4). float objects are slightly bigger than integers so this one
    takes less memory.

    > 7) big_list = [i for i in range(2700*3250)]

    Compared with 6) this requires building a temporary list with all those
    integers, like 3) and 5)

    > 8)
    > big_list = []
    > for i in range(2700*3250):
    >     big_list.append(float(i))
    > 285
    > 36
    > 250
    >
    > 9) same as 8) but using xrange.


    As above, range() requires building an intermediate list.

    In Python (CPython specifically) many types (like int and float) maintain
    a pool of unused, freed objects. And the memory manager maintains a pool
    of allocated memory blocks. If your program has a peak memory load and
    later frees most of the involved objects, memory may not always be
    returned to the OS - it may be kept available for Python to use it again.

    --
    Gabriel Genellina
    Gabriel Genellina, Jan 13, 2010
    #4
  5. On Wed, 13 Jan 2010 06:24:04 -0800, Allard Warrink wrote:

    > Within a Python script I'm using a couple of different lists containing
    > a large number of floats (8M+). The execution of this script fails
    > because of a memory error (insufficient memory). I thought this was
    > strange because I delete all lists as soon as they are no longer
    > needed, and my workstation theoretically has more than enough memory to
    > run the script.


    Keep in mind that Python floats are rich objects, not C doubles, and so
    take up more space: 16 bytes on a 32 bit system compared to typically 8
    bytes for a C double. (Both of these may vary on other hardware or
    operating systems.)
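
    If the per-object overhead matters, one possible alternative (not
    something proposed in this thread) is the standard array module, which
    stores raw 8-byte C values instead of references to float objects.
    A minimal sketch, with an arbitrary demo size:

```python
import sys
from array import array

n = 100_000  # assumed demo size

as_list = [float(i) for i in range(n)]
as_array = array('d', (float(i) for i in range(n)))  # 'd' = C double

per_item_list = sys.getsizeof(0.0)   # one float object (plus a list slot)
per_item_array = as_array.itemsize   # one raw value stored inline

print(per_item_list, per_item_array)
print(sys.getsizeof(as_list), sys.getsizeof(as_array))
```

    The array holds the same values and supports indexing and iteration,
    but it can only contain numbers of the one declared type.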

    Also keep in mind that your Python process may not have access to all
    your machine's memory -- some OSes default to relatively small per-
    process memory limits. If you are using a Unix or Linux, you may need to
    look at ulimit.
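
    The same limit can be read from inside Python, sketched here with the
    standard resource module. That module is Unix-only, and RLIMIT_AS (the
    address-space limit) is an assumption about which limit is relevant; the
    function name is made up for illustration.

```python
try:
    import resource  # Unix-only; not available on Windows
except ImportError:
    resource = None


def address_space_limit():
    """Return (soft, hard) address-space limits in bytes, or None.

    A value equal to resource.RLIM_INFINITY means "no limit".
    """
    if resource is None or not hasattr(resource, "RLIMIT_AS"):
        return None
    return resource.getrlimit(resource.RLIMIT_AS)


limits = address_space_limit()
print(limits)
```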



    > So I did some investigation into the memory use of the script. I found
    > that when I populate the lists with floats using a for ... in range()
    > loop, a lot of overhead memory is used, and that this memory is not freed
    > after populating the list, nor after deleting the list.


    I would be very, very, very surprised if the memory truly wasn't freed
    after deleting the lists. A memory leak of that magnitude is unlikely to
    have remained undetected until now. More likely you're either
    misdiagnosing the problem, or you have some sort of reference cycle.
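
    For readers unfamiliar with reference cycles, a minimal sketch of one.
    The Node class is hypothetical and exists only for this demonstration;
    automatic collection is switched off temporarily so the effect is
    visible.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only for this demonstration."""

    def __init__(self):
        self.data = [0.0] * 1000
        self.partner = None


def make_cycle():
    """Create two objects that refer to each other, then drop them."""
    a, b = Node(), Node()
    a.partner, b.partner = b, a   # a -> b -> a
    return weakref.ref(a)         # a probe that does not keep 'a' alive


gc.disable()                      # keep the cycle collector from running
try:
    probe = make_cycle()          # both nodes are unreachable, but not freed
    alive_before = probe() is not None
    gc.collect()                  # the cycle collector reclaims them
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```

    Objects in a cycle are not released by reference counting alone; they
    linger until the cycle collector runs, which can make memory look
    "stuck" after a del.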




    > This way the memory keeps filling up after each newly populated list
    > until the script crashes.


    Can you post us the smallest extract of your script that crashes?



    > I did a couple of tests and found that populating lists with range or
    > xrange is responsible for the memory overhead.


    I doubt it. Even using range with 8 million floats only wastes 35 MB or
    so. That's wasteful, but not excessively so.



    > Does anybody know why
    > this happens and if there's a way to avoid this memory problem?
    >
    > First the line(s) python code I executed. Then the memory usage of the
    > process: Mem usage after creation/populating of big_list
    > sys.getsizeof(big_list)
    > Mem usage after deletion of big_list
    >
    > big_list = [0.0] * 2700*3250
    > 40
    > 35
    > 6



    You don't specify what those three numbers are (the middle one is
    getsizeof of the list, but the other two are unknown). How do you
    calculate memory usage? I don't believe that your memory usage is 6
    bytes! Nor do I believe that getsizeof(big_list) returns 35 bytes!

    On my system:

    >>> import sys
    >>> x = [0.0] * 2700*3250
    >>> sys.getsizeof(x)
    35100032



    > big_list = [0.0 for i in xrange(2700*3250)]
    > 40
    > 36
    > 6


    This produces a lightweight xrange object, then wastefully iterates over
    it to produce a list made up of eight million references to the float 0.0.
    The xrange object is then garbage collected automatically.


    > big_list = [0.0 for i in range(2700*3250)]
    > 145
    > 36
    > 110


    This produces a list containing the integers 0 through 8+ million, then
    wastefully iterates over it to produce a second list made up of eight
    million references to the float 0.0, before garbage collecting the first
    list. So at its peak, you require 35100032 bytes for a pointless
    intermediate list, doubling the memory needed to generate the
    list you actually want.
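
    The peak cost of that intermediate list can be measured, sketched here
    with tracemalloc on Python 3. Since Python 3's range() is lazy,
    list(range(n)) stands in for Python 2's range(); the helper name and
    the size n are assumptions for the demo.

```python
import tracemalloc


def peak_bytes(build):
    """Return the peak traced memory (bytes) while build() runs."""
    tracemalloc.start()
    result = build()
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del result
    return peak


n = 200_000  # assumed demo size, much smaller than 2700*3250

# Lazy iteration: integers are produced and discarded one at a time.
peak_lazy = peak_bytes(lambda: [0.0 for i in range(n)])

# Eager iteration: every integer exists at once in a temporary list.
peak_eager = peak_bytes(lambda: [0.0 for i in list(range(n))])

print(peak_lazy, peak_eager)
```

    Both calls end up with the same final list; only the peak reached
    while building it differs.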



    > big_list = [float(i) for i in xrange(2700*3250)]
    > 180
    > 36
    > 145


    Again, the technique you are using does a pointless amount of extra work.
    The values in the xrange object are already floats, calling float on them
    just wastes time. And again, the memory usage you claim is utterly
    implausible.


    To really solve this problem, we need to see actual code that raises
    MemoryError. Otherwise we're just wasting time.



    --
    Steven
    Steven D'Aprano, Jan 14, 2010
    #5
  6. On Thu, 14 Jan 2010 02:03:52 +0000, Steven D'Aprano wrote:

    > Again, the technique you are using does a pointless amount of extra
    > work. The values in the xrange object are already floats, calling float
    > on them just wastes time.


    Er what?

    Sorry, please ignore that. This is completely untrue -- xrange produces
    ints, not floats.

    Sorry for the noise, I don't know what I was thinking!




    --
    Steven
    Steven D'Aprano, Jan 14, 2010
    #6
