Memory pre-allocation for large lists

A

Avi Kak

Hello:

Is it possible in Python to define an empty list
of a predetermined size?

To illustrate my question further, I can do the
following in Perl

my @arr = ();
$#arr = 999;

that wold cause @arr to become an empty array
of size 1000. This array could be used to
store 1000 elements without any further memory
allocation for the array itself.

Can the same be done in Python?

Avi Kak
(e-mail address removed)
 
A

Aahz

Is it possible in Python to define an empty list of a predetermined
size?

To illustrate my question further, I can do the following in Perl

my @arr = ();
$#arr = 999;

that wold cause @arr to become an empty array of size 1000. This
array could be used to store 1000 elements without any further memory
allocation for the array itself.

Can the same be done in Python?

Can, yes. But why? Python uses an allocation mechanism for lists that
results in constant amortized time; there's really no reason to
pre-allocate.
 
J

Josiah Carlson

Can, yes. But why? Python uses an allocation mechanism for lists that
results in constant amortized time; there's really no reason to
pre-allocate.

It is a bit faster to pre-allocate, and sometimes you know how big the
sequence is going to be beforehand.

- Josiah
 
E

Erik Max Francis

Aahz said:
Can, yes. But why? Python uses an allocation mechanism for lists
that
results in constant amortized time; there's really no reason to
pre-allocate.

If you know the size of the (large) array in advance, it's completely
legitimate to preallocate the array to make the constant time factor.
 
J

Jeremy Fincher

Can, yes. But why? Python uses an allocation mechanism for lists that
results in constant amortized time; there's really no reason to
pre-allocate.

Python's allocation scheme for lists may be constant amortized time,
but it's *very* conservative and thus has a significant constant
factor. For example:

functor% python /usr/lib/python2.3/timeit.py 'L = [None]*1000' 'for _
in xrange(1000): pass'
1000 loops, best of 3: 248 usec per loop
functor% python /usr/lib/python2.3/timeit.py 'L = []' 'for _ in
xrange(1000): L.append(None)'
1000 loops, best of 3: 1.88e+03 usec per loop

So there's a factor of 7.5 difference in speed there. I doubt it
matters, but it at least lends some credibility to the idea that
preallocation might be useful.

Jeremy
 
P

Peter Otten

Jeremy said:
functor% python /usr/lib/python2.3/timeit.py 'L = [None]*1000' 'for _
in xrange(1000): pass'
1000 loops, best of 3: 248 usec per loop
functor% python /usr/lib/python2.3/timeit.py 'L = []' 'for _ in
xrange(1000): L.append(None)'
1000 loops, best of 3: 1.88e+03 usec per loop

So there's a factor of 7.5 difference in speed there. I doubt it
matters, but it at least lends some credibility to the idea that
preallocation might be useful.

In the real world, (some of) the default values have to be overwritten,
which reduces the speed advantage to a factor of 2:

$ timeit.py 'L=[None]*1000' 'for i in xrange(1000): L = None'
1000 loops, best of 3: 338 usec per loop
$ timeit.py 'L=[]' 'for i in xrange(1000): L.append(None)'
1000 loops, best of 3: 660 usec per loop

Peter
 
P

Paul Prescod

Aahz said:
...
Can, yes. But why? Python uses an allocation mechanism for lists that
results in constant amortized time; there's really no reason to
pre-allocate.

What if you just want to use index assignment without worrying about
indexing past the part of the list that hasn't been allocated yet?

Nevertheless, your question is legitimate. You seldom have to
pre-allocate lists in Python.

Paul Prescod
 
J

Jeff Epler

Jeremy said:
functor% python /usr/lib/python2.3/timeit.py 'L = [None]*1000' 'for _
in xrange(1000): pass'
1000 loops, best of 3: 248 usec per loop
functor% python /usr/lib/python2.3/timeit.py 'L = []' 'for _ in
xrange(1000): L.append(None)'
1000 loops, best of 3: 1.88e+03 usec per loop

So there's a factor of 7.5 difference in speed there. I doubt it
matters, but it at least lends some credibility to the idea that
preallocation might be useful.

In the real world, (some of) the default values have to be overwritten,
which reduces the speed advantage to a factor of 2:

$ timeit.py 'L=[None]*1000' 'for i in xrange(1000): L = None'
1000 loops, best of 3: 338 usec per loop
$ timeit.py 'L=[]' 'for i in xrange(1000): L.append(None)'
1000 loops, best of 3: 660 usec per loop


But what you're measuring here is function-call overhead.

# Item assignment
$ timeit.py 'L=[None]*1000' 'for i in xrange(1000): L = None'
1000 loops, best of 3: 1.58 msec per loop

# Method resolution and function call with 1 parameter
$ timeit.py 'L=[]' 'for i in xrange(1000): L.append(None)'
100 loops, best of 3: 2.99 msec per loop

# One-at-a-time insertion with no function call overhead
$ timeit.py '[None for i in xrange(1000)]'
1000 loops, best of 3: 1.9 msec per loop

# Method resolution and function call with 2 parameters
$ timeit.py 'L=[None]*1000' 'for i in xrange(1000): L.__setitem__(i, None)'
100 loops, best of 3: 4.39 msec per loop

# Method resolution and function call with 1 parameter
$ timeit.py 'L=[None]*1000' 'for i in xrange(1000): L.__getitem__(i)'
100 loops, best of 3: 3.28 msec per loop

Jeff
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,280
Latest member
BGBBrock56

Latest Threads

Top