Looking for info on Python's memory allocation

Steven D'Aprano

Can somebody help me please? I've spent a fruitless
hour googling with no luck.

I'm discussing memory allocation techniques with
somebody, and I'm trying to find a quote from -- I
think -- Tim Peters where he discusses the way Python
allocates memory when you append to lists. In basic
terms, he says that every time you try to append to a
list that is already full, Python doubles the size of
the list. This wastes no more than 50% of the memory
needed for that list, but has various advantages -- and
I'm damned if I can remember exactly what those
advantages were.

Can anyone point me in the right direction?

Thanks,
 
R Toop

http://mail.python.org/pipermail/python-list/2003-December/198141.html

wherein Tim gently corrects my brash guess that Python lists are
pointer-linked.
The example's linearly-constructed list is allocated by doubling storage,
copying & freeing (cf realloc).
The result is that the process virtual memory ends up roughly twice the size of
the list, with the freed predecessor chunks left on the process heap but not
released to the operating system.
This only SEEMS to use as much memory as pointer-linked elements would do.
Hope this URL helps.

-Bob
 
jepler

While there may be a discussion somewhere in the archives,
the comments in listobject.c are enlightening too.

/* Ensure ob_item has room for at least newsize elements, and set
* ob_size to newsize. If newsize > ob_size on entry, the content
* of the new slots at exit is undefined heap trash; it's the caller's
* responsibility to overwrite them with sane values.
* The number of allocated elements may grow, shrink, or stay the same.
* Failure is impossible if newsize <= self.allocated on entry, although
* that partly relies on an assumption that the system realloc() never
* fails when passed a number of bytes <= the number of bytes last
* allocated (the C standard doesn't guarantee this, but it's hard to
* imagine a realloc implementation where it wouldn't be true).
* Note that self->ob_item may change, and even if newsize is less
* than ob_size on entry.
*/
[...]
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
*/
"linear-time amortized behavior" (i.e., that list.append() is O(1) on average)
is the important characteristic you're probably looking for.

CVS source for listobject.c can be seen here:
http://cvs.sourceforge.net/viewcvs....rc/Objects/listobject.c?rev=2.227&view=markup
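A small Python sketch that reproduces that growth pattern; the arithmetic is
reconstructed to match the numbers quoted above, not copied from listobject.c:

def grow(newsize):
    # Reconstructed to match the quoted pattern 0, 4, 8, 16, 25, 35, 46, 58,
    # 72, 88, ...; an approximation of list_resize()'s over-allocation,
    # not the authoritative source.
    if newsize == 0:
        return 0
    return (newsize >> 3) + (3 if newsize < 9 else 6) + newsize

allocated = 0
pattern = [allocated]
for n in range(1, 80):        # n = list length after each simulated append
    if n > allocated:         # list is full: over-allocate for the new size
        allocated = grow(n)
        pattern.append(allocated)

print(pattern)                # [0, 4, 8, 16, 25, 35, 46, 58, 72, 88]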

Jeff

 
Sybren Stuvel

Steven D'Aprano enlightened us with:
he says that every time you try to append to a list that is already
full, Python doubles the size of the list. This wastes no more than
50% of the memory needed for that list, but has various advantages
-- and I'm damned if I can remember exactly what those advantages
were.

Allocating extra memory usually requires the following steps:

- Allocate more memory
- Copy all data from old to new memory
- Deallocate old memory

That second step requires N actions if there are N data items in the
list. If you add one 'block' of memory to the list for each item you
add, you waste no memory, but every item added requires N steps to
copy all the old items. That means that after you've added N items, on the
order of N**2 steps have been taken (the k-th append copies k-1 items, so
roughly N**2/2 in total). That's very bad for performance.

If, on the other hand, you double the memory every time you run out,
you have to copy much less data, and in the end it turns out you need
roughly N steps to add N items to the list. That's a lot better, isn't
it?
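
A minimal copy-counting sketch of that trade-off (a toy model that assumes one
copy per element moved at each reallocation):

def copies_needed(n_appends, grow):
    # Whenever the list is full, reallocate to grow(size) and copy every
    # existing element into the new block.
    size = used = copies = 0
    for _ in range(n_appends):
        if used == size:
            copies += used
            size = grow(size)
        used += 1
    return copies

n = 10000
print(copies_needed(n, lambda size: size + 1))          # ~n*n/2 copies: quadratic
print(copies_needed(n, lambda size: max(1, 2 * size)))  # < 2*n copies: amortized linear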

Sybren
 
Lasse Vågsæther Karlsen

Sybren said:
If, on the other hand, you double the memory every time you run out,
you have to copy much less data, and in the end it turns out you need
roughly N steps to add N items to the list. That's a lot better, isn't
it?

This begs a different question along the same lines.

If I have a generator or other iterable producing a vast number of
items, and use it like this:

s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()
 
Peter Otten

Lasse said:
If I have a generator or other iterable producing a vast number of
items, and use it like this:

s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()


You can easily answer the speed aspect of your question using the timeit
module:

~ $ python2.4 -m timeit -s'iterable=range(1000)' '[k for k in iterable]'
10000 loops, best of 3: 111 usec per loop

~ $ python2.4 -m timeit -s'iterable=range(1000)' 's = [0]*len(iterable); it = iter(iterable)' 'for i in xrange(len(iterable)): s[i] = it.next()'
1000 loops, best of 3: 513 usec per loop

~ $ python2.4 -m timeit -s'iterable=range(1000)' 's = [0]*len(iterable)' 'for i, v in enumerate(iterable): s[i] = v'
1000 loops, best of 3: 269 usec per loop

~ $ python2.4 -m timeit -s'iterable=range(1000)' 'list(iterable)'
100000 loops, best of 3: 7.33 usec per loop

Peter
 
Steven D'Aprano

This begs a different question along the same lines.

Er, no it doesn't. "Begs the question" does _not_ mean "asks the question"
or "suggests the question". It means "assumes the truth of that which
needs to be proven".

http://en.wikipedia.org/wiki/Begging_the_question
http://www.worldwidewords.org/qa/qa-beg1.htm

(Both of these sources are far more forgiving of the modern mis-usage than
I am. Obviously.)

If I have a generator or other iterable producing a vast number of
items, and use it like this:

s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()


Faster? Maybe. Only testing can tell -- but I doubt it. But as for less
memory, look at the two situations.

In the first, you create a list of N objects.

In the second, you end up with the same list of N objects, plus an xrange
object, which is small and of essentially fixed size no matter how large N is.

So it won't use *less* memory -- at best, it will use just slightly more.

Is there a way from within Python to find out how much memory a single
object uses, and how much memory Python is using in total?
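
(For what it's worth, sys.getsizeof -- added in Python 2.6, so after this
thread -- answers the per-object half of that question; a quick sketch,
assuming such an interpreter:)

import sys

# sys.getsizeof reports the shallow size of one object in bytes; for a list
# that includes the over-allocated slots but not the objects it refers to.
L = []
for i in range(20):
    L.append(i)
    print(len(L), sys.getsizeof(L))   # the size only jumps when the list regrows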
 
Alex Martelli

Steven D'Aprano said:
s = [k for k in iterable]

if I know beforehand how many items iterable would possibly yield, would
a construct like this be faster and "use" less memory?

s = [0] * len(iterable)
for i in xrange(len(iterable)):
    s[i] = iterable.next()


Faster? Maybe. Only testing can tell -- but I doubt it. But as for less


Testing, of course, is trivially easy:

Helen:/tmp alex$ python -mtimeit -s'i=xrange(7)' 'L=[x for x in i]'
100000 loops, best of 3: 6.65 usec per loop

Helen:/tmp alex$ python -mtimeit -s'i=xrange(7)' 'L=list(i)'
100000 loops, best of 3: 5.26 usec per loop

So, using list() instead of that silly list comprehension does have an
easily measurable advantage. To check the complicated option...:

Helen:/tmp alex$ python -mtimeit -s'i=xrange(7)' '
s = [0]*7
u = iter(i)
for x in xrange(7):
  s[x] = u.next()
'
10000 loops, best of 3: 24.7 usec per loop

So, the "advantage" of all the complications is to slow down execution
by about four times -- net of all the juicy possibilities for confusion
and errors (there is no common Python type on which you can both call
len(...) AND the .next() method, for example -- a combination which
really makes no sense).

*SIMPLICITY* is the keyword here -- list(...) is by far the simplest, and
almost five times faster than those baroque gyrations above...


Alex
 
Fredrik Lundh

Alex said:
(there is no common Python type on which you can both call
len(...) AND the .next() method, for example -- a combination
which really makes no sense).
>>> L = [1, 2, 3]
>>> len(L)
3
>>> I = iter(L)
>>> I
<listiterator object at 0x...>
>>> len(I)
3
>>> I.next()
1
>>> len(I)
2
>>> I.next()
2
>>> len(I)
1
>>> I.next()
3
>>> len(I)
0
>>> I.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
StopIteration

(it's probably not a good idea to rely on this behaviour...)

</F>
 
Edvard Majakari

Steven D'Aprano said:
I'm discussing memory allocation techniques with somebody, and I'm trying to
find a quote from -- I think -- Tim Peters where he discusses the way Python
allocates memory when you append to lists. In basic terms, he says that every
time you try to append to a list that is already full, Python doubles the size
of the list. This wastes no more than 50% of the memory needed for that list,
but has various advantages -- and I'm damned if I can remember exactly what
those advantages were.

That method of allocating memory is not unique to Python. The idea is that if
you always double the amount of memory allocated, the interval between
allocations grows exponentially (assuming storage use grows linearly).

Memory allocation is often very expensive, because the OS (or the allocator)
frequently has to search for a large enough free block to give to the
application. What is even more expensive is the coalescing of free blocks,
which happens every now and then.

I guess you could do the same by tripling the memory every time you need more.
This would reduce the number of allocations even further, but you'd also waste
even more memory - up to 2/3 of it instead of 1/2. So doubling the size of the
chunks is the usual compromise, and the technique is quite common.
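
A toy model of that trade-off, assuming the capacity is simply multiplied by a
fixed factor whenever the list fills up (not CPython's actual policy):

def growth_stats(n_items, factor):
    # Multiply the capacity by `factor` whenever the list is full, and report
    # how many reallocations were needed and how much of the final block is
    # left unused after n_items appends.
    capacity, reallocs = 1, 0
    while capacity < n_items:
        capacity *= factor
        reallocs += 1
    return reallocs, capacity - n_items

for factor in (2, 3):
    print(factor, growth_stats(10**6, factor))
# Tripling needs fewer reallocations, but just after a resize up to
# (factor - 1) / factor of the block sits empty: 1/2 for doubling,
# 2/3 for tripling.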
 
