Most efficient way to "pre-grow" a list?


kj

In Perl one can assign a value to any element of an array, even to
ones corresponding to indices greater than or equal to the length
of the array:

my @arr;
$arr[999] = 42;

perl grows the array as needed to accommodate this assignment. In
fact one common optimization in Perl is to "pre-grow" the array to
its final size, rather than having perl grow it piecemeal as required
by assignments like the one above:

my @arr;
$#arr = 999_999;

After assigning to $#arr (the last index of @arr) as shown above,
@arr has length 1,000,000, and all its elements are initialized to
undef.

In Python the most literal translation of the first code snippet
above triggers an IndexError exception:

arr = list()
arr[999] = 42

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

In fact, one would need to pre-grow the list sufficiently to be
able to make an assignment like this one. I.e. one needs the
equivalent of the second Perl snippet above.

The best I can come up with is this:

arr = [None] * 1000000

Is this the most efficient way to achieve this result?
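For what it's worth, the alternatives can be timed directly with the standard timeit module. A rough sketch (Python 3 syntax; the helper names and sizes here are arbitrary, not from the thread):

```python
# Compare two ways of building an n-element list of None:
# pre-allocating in one go vs. appending one element at a time.
import timeit

def preallocate(n):
    return [None] * n

def append_loop(n):
    arr = []
    for _ in range(n):
        arr.append(None)
    return arr

n = 100_000
t_pre = timeit.timeit(lambda: preallocate(n), number=10)
t_app = timeit.timeit(lambda: append_loop(n), number=10)
print('preallocate: %.4fs  append loop: %.4fs' % (t_pre, t_app))
```

The absolute numbers depend on the machine, but the relative cost of the two idioms is easy to check this way.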

TIA!

kynn
 

Jon Clements

[snip]
In fact, one would need to pre-grow the list sufficiently to be
able to make an assignment like this one.  I.e. one needs the
equivalent of the second Perl snippet above.

The best I can come up with is this:

arr = [None] * 1000000

Is this the most efficient way to achieve this result?

That's as good as it gets, I think.

If sparsely populated I might be tempted to use a dict (or maybe
defaultdict):

d = {999: 42, 10673: 123}
for idx in xrange(1000000):  # treat it as though it's a list of 1,000,000 items...
    print 'index %d has a value of %s' % (idx, d.get(idx, None))

Efficiency completely untested.
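A runnable sketch of the defaultdict variant mentioned above (Python 3 syntax; the sample keys are reused from the snippet, everything else is illustrative):

```python
# A sparse "array" where missing indices read as None automatically.
from collections import defaultdict

d = defaultdict(lambda: None)
d[999] = 42
d[10673] = 123

print(d[999])   # 42
print(d[5])     # None

# Caveat: merely *reading* a missing key inserts it into the dict,
# so heavy read traffic on absent indices grows the dict. Use
# d.get(idx) instead if that matters.
```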

Jon.
 

Andre Engels

In Perl one can assign a value to any element of an array, even to
ones corresponding to indices greater than or equal to the length
of the array:

 my @arr;
 $arr[999] = 42;

perl grows the array as needed to accommodate this assignment.  In
fact one common optimization in Perl is to "pre-grow" the array to
its final size, rather than having perl grow it piecemeal as required
by assignments like the one above:

 my @arr;
 $#arr = 999_999;

After assigning to $#arr (the last index of @arr) as shown above,
@arr has length 1,000,000, and all its elements are initialized to
undef.

In Python the most literal translation of the first code snippet
above triggers an IndexError exception:
arr = list()
arr[999] = 42

Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

In fact, one would need to pre-grow the list sufficiently to be
able to make an assignment like this one.  I.e. one needs the
equivalent of the second Perl snippet above.

The best I can come up with is this:

arr = [None] * 1000000

Is this the most efficient way to achieve this result?

It depends - what do you want to do with it? My first hunch would be
to use a dictionary instead of a list, then the whole problem
disappears. If there is a reason you don't want to do that, what is
it?
 

Raymond Hettinger

[kj]
In fact, one would need to pre-grow the list sufficiently to be
able to make an assignment like this one.  I.e. one needs the
equivalent of the second Perl snippet above.

The best I can come up with is this:

arr = [None] * 1000000

Is this the most efficient way to achieve this result?

Yes.


Raymond
 

Emily Rodgers

Andre Engels said:
arr = [None] * 1000000

Is this the most efficient way to achieve this result?

It depends - what do you want to do with it? My first hunch would be
to use a dictionary instead of a list, then the whole problem
disappears. If there is a reason you don't want to do that, what is
it?

I second this. It might seem a sensible thing to do in Perl, but I can't
imagine what you would actually want to do it for in Python - it seems
like an odd thing to want!
 

kj

Andre Engels said:
arr = [None] * 1000000

Is this the most efficient way to achieve this result?

It depends - what do you want to do with it? My first hunch would be
to use a dictionary instead of a list, then the whole problem
disappears. If there is a reason you don't want to do that, what is
it?
I second this. It might seem a sensible thing to do in Perl, but I can't
imagine what you would actually want to do it for in Python - it seems
like an odd thing to want!

As I said, this is considered an optimization, at least in Perl,
because it lets the interpreter allocate all the required memory
in one fell swoop, instead of having to reallocate it repeatedly
as the array grows. (Of course, like with all optimizations,
whether it's worth the bother is another question.)

Another situation where one may want to do this is if one needs to
initialize a non-sparse array in a non-sequential order, e.g. if
that's the way the data is initially received by the code. Of
course, there are many ways to skin such a cat; pre-allocating the
space and using direct list indexing is just one of them. I happen
to think it is a particularly straightforward one, but I realize that
others (you, Andre, etc.) may not agree.
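The non-sequential initialization scenario described above can be sketched as follows (Python 3 syntax; the sample (index, value) pairs are invented for illustration):

```python
# Data arrives as (index, value) pairs in no particular order,
# but the final array is dense and its size is known up front.
pairs = [(3, 'd'), (0, 'a'), (2, 'c'), (1, 'b')]

arr = [None] * len(pairs)   # pre-grow to the known final size
for idx, value in pairs:
    arr[idx] = value        # fill slots in arrival order

print(arr)   # ['a', 'b', 'c', 'd']
```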

kynn
 

gil_johnson

In Perl one can assign a value to any element of an array, even to
ones corresponding to indices greater than or equal to the length
of the array:

  my @arr;
  $arr[999] = 42;

perl grows the array as needed to accommodate this assignment.  In
fact one common optimization in Perl is to "pre-grow" the array to
its final size, rather than having perl grow it piecemeal as required
by assignments like the one above:

  my @arr;
  $#arr = 999_999;

After assigning to $#arr (the last index of @arr) as shown above,
@arr has length 1,000,000, and all its elements are initialized to
undef.

In Python the most literal translation of the first code snippet
above triggers an IndexError exception:
arr = list()
arr[999] = 42

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

In fact, one would need to pre-grow the list sufficiently to be
able to make an assignment like this one.  I.e. one needs the
equivalent of the second Perl snippet above.

The best I can come up with is this:

arr = [None] * 1000000

Is this the most efficient way to achieve this result?

TIA!

kynn

I don't have the code with me, but for huge arrays, I have used
something like:
arr = [initializer]
for i in range(N):
    arr.extend(arr)

This doubles the array every time through the loop, and you can add
the powers of 2 to get the desired result.
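A fleshed-out version of the doubling approach above (Python 3 syntax; the function name and the final trimming step are my own additions, not from the post):

```python
def pregrow(n, fill=None):
    # Build a list of exactly n copies of fill by repeated doubling.
    if n <= 0:
        return []
    arr = [fill]
    while len(arr) < n:
        arr.extend(arr)   # doubles the list each pass
    del arr[n:]           # trim the overshoot down to exactly n
    return arr

print(len(pregrow(1_000_000)))   # 1000000
```

The total work is O(n), since the extend sizes form a geometric series, though `[fill] * n` remains the simpler and faster idiom.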
Gil
 

r

In Perl one can assign a value to any element of an array, even to
ones corresponding to indices greater than or equal to the length
of the array:

  my @arr;
  $arr[999] = 42;

perl grows the array as needed to accommodate this assignment.  In
fact one common optimization in Perl is to "pre-grow" the array to
its final size, rather than having perl grow it piecemeal as required
by assignments like the one above:

  my @arr;
  $#arr = 999_999;

After assigning to $#arr (the last index of @arr) as shown above,
@arr has length 1,000,000, and all its elements are initialized to
undef.

In Python the most literal translation of the first code snippet
above triggers an IndexError exception:
arr = list()
arr[999] = 42

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list assignment index out of range

In fact, one would need to pre-grow the list sufficiently to be
able to make an assignment like this one.  I.e. one needs the
equivalent of the second Perl snippet above.

The best I can come up with is this:

arr = [None] * 1000000

Is this the most efficient way to achieve this result?

TIA!

kynn

You mean sum'in like dis?

class PerlishList(list):
    '''Hand-holding list object for even the most demanding Perl
    hacker'''

    def __init__(self, dim=0):
        list.__init__(self)
        if dim:
            self.__setitem__(dim, None)

    def __setitem__(self, idx, v):
        # Pad with None so that index idx exists before assigning;
        # a plain list would raise IndexError here.
        for _ in range(len(self), idx + 1):
            self.append(None)
        list.__setitem__(self, idx, v)

l = PerlishList(3)
l.append('a')
l.append('b')
print l
l[10] = 10
print l

;-)
 

Steven D'Aprano

I don't have the code with me, but for huge arrays, I have used
something like:
arr = [initializer]
for i in range(N):
    arr.extend(arr)

This doubles the array every time through the loop, and you can add the
powers of 2 to get the desired result.
Gil

Why is it better to grow the list piecemeal instead of just allocating a
list the size you want in one go?

arr = [x]*size_wanted
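One caveat worth noting alongside the multiply idiom (my own aside, in Python 3 syntax): every slot holds a reference to the *same* object x, which bites when x is mutable.

```python
# All three slots are references to one and the same inner list.
rows = [[]] * 3
rows[0].append(1)
print(rows)   # [[1], [1], [1]] -- probably not what you wanted

# A comprehension creates three independent inner lists.
safe = [[] for _ in range(3)]
safe[0].append(1)
print(safe)   # [[1], [], []]
```

With an immutable x such as None or 0 this sharing is harmless, which is why `[None] * n` is safe.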
 

Bruno Desthuilliers

kj a écrit :
As I said, this is considered an optimization, at least in Perl,
because it lets the interpreter allocate all the required memory
in one fell swoop, instead of having to reallocate it repeatedly
as the array grows.

IIRC, CPython has its own way to optimize list growth.
(Of course, like with all optimizations,
whether it's worth the bother is another question.)

My very humble opinion is that unless you spot a bottleneck (that is,
you have real performance issues AND the profiler identified list growth
as the culprit), the answer is a clear and obvious NO.
Another situation where one may want to do this is if one needs to
initialize a non-sparse array in a non-sequential order,

Then use a dict.
 

Andre Engels

Ok, he has a dict.

Now what? He needs a non-sparse array.

Let d be your dict.

Call the zeroth place in your array d[0], the first d[1], the 100000th
d[100000].
 

Luis Alberto Zarrabeitia Gomez

Quoting Andre Engels said:
Ok, he has a dict.

Now what? He needs a non-sparse array.

Let d be your dict.

Call the zeroth place in your array d[0], the first d[1], the 100000th
d[100000].

Following that reasoning, we could get rid of lists and arrays altogether.

Here's why that wouldn't work:

for x, y in zip(d, other):
    ... do something ...

Yes, we could also ignore zip and just use range/xrange to iterate over the
indices...

Lists and dictionaries have different semantics. It is one thing to argue
that you shouldn't be pre-growing a list for performance reasons before
being sure that it is a bottleneck, and a very different one to argue that,
because one operation (__setitem__) is the same for both structures, we
should not use lists for problems that may need list semantics.

Have you ever tried to read a list/matrix that you know is not sparse, but
whose size you don't know, and whose elements may not arrive in order? A
"grow-able" array would be just the right thing to use - currently I have to
settle for either hacking together my own grow-able array, or preloading the
data into a dict and then building a list with the [0]*size trick and
updating it. Not hard, not worthy of a PEP, but certainly not so easy to
dismiss.
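The dict-preload workaround described above can be sketched like this (Python 3 syntax; the sample data is invented):

```python
# Collect out-of-order data in a dict first, then flatten it into
# a dense list once the final size is known.
incoming = {2: 'c', 0: 'a', 1: 'b'}

size = max(incoming) + 1      # highest index seen determines the size
arr = [None] * size           # the [0]*size-style pre-grow step
for idx, value in incoming.items():
    arr[idx] = value

print(arr)   # ['a', 'b', 'c']
```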
 

Terry Reedy

Steven said:
I don't have the code with me, but for huge arrays, I have used
something like:
arr[0] = initializer
for i in range N:
arr.extend(arr)
This doubles the array every time through the loop, and you can add the
powers of 2 to get the desired result. Gil

Why is it better to grow the list piecemeal instead of just allocating a
list the size you want in one go?

It isn't.
arr = [x]*size_wanted

Is what I would do.
 

Ivan Illarionov

The best I can come up with is this:

arr = [None] * 1000000

Is this the most efficient way to achieve this result?

It is the most efficient SAFE way to achieve this result.

In fact, there IS a more efficient way, but it's dangerous, unsafe,
unpythonic and plain evil.

-- Ivan
 

kj

Quoting Andre Engels <[email protected]>:
Ok, he has a dict.

Now what? He needs a non-sparse array.

Let d be your dict.

Call the zeroth place in your array d[0], the first d[1], the 100000th
d[100000].
Following that reasoning, we could get rid of lists and arrays altogether.
Here's why that wouldn't work:
for x, y in zip(d, other):
    ... do something ...
Yes, we could also ignore zip and just use range/xrange to iterate over the
indices...
Lists and dictionaries have different semantics. It is one thing to argue
that you shouldn't be pre-growing a list for performance reasons before
being sure that it is a bottleneck, and a very different one to argue that,
because one operation (__setitem__) is the same for both structures, we
should not use lists for problems that may need list semantics.
Have you ever tried to read a list/matrix that you know is not sparse, but
whose size you don't know, and whose elements may not arrive in order? A
"grow-able" array would be just the right thing to use - currently I have to
settle for either hacking together my own grow-able array, or preloading the
data into a dict and then building a list with the [0]*size trick and
updating it. Not hard, not worthy of a PEP, but certainly not so easy to
dismiss.

Thanks. Well said.

Saludos,

kynn
 

sturlamolden

The best I can come up with is this:

arr = [None] * 1000000

Is this the most efficient way to achieve this result?

Yes, but why would you want to? Appending to a Python list has
amortized O(1) complexity. I am not sure about Perl, but in MATLAB
arrays are preallocated because resize has complexity O(n), instead of
amortized O(1). You don't need to worry about that in Python. Python
lists are resized with empty slots at the end, in proportion to the
size of the list. On average, this has the same complexity as pre-
allocation.
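The over-allocation described above can be observed directly (a small probe, Python 3 syntax; sys.getsizeof reports the bytes currently allocated for the list object, which only changes when the list is actually reallocated):

```python
# Count how often 1000 appends actually trigger a reallocation.
import sys

arr = []
reallocs = 0
last = sys.getsizeof(arr)
for i in range(1000):
    arr.append(i)
    size = sys.getsizeof(arr)
    if size != last:      # the underlying allocation grew
        reallocs += 1
        last = size

print('1000 appends, %d reallocations' % reallocs)
```

On CPython the reallocation count is a small fraction of the append count, which is exactly why appending is amortized O(1).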
 

sturlamolden

I don't have the code with me, but for huge arrays, I have used
something like:
arr = [initializer]
for i in range(N):
    arr.extend(arr)

This doubles the array every time through the loop, and you can add
the powers of 2 to get the desired result.
Gil

You could just use append instead of extend, though. The doubling trick is
O(N) in total (the extend sizes form a geometric series), but so is a plain
loop of appends, since append is amortized O(1) per element - and the append
loop is simpler.
 

sturlamolden

As I said, this is considered an optimization, at least in Perl,
because it lets the interpreter allocate all the required memory
in one fell swoop, instead of having to reallocate it repeatedly
as the array grows.

Python does not need to reallocate repeatedly as a list grows. That is
why it's called a 'list' and not an array.

There will be empty slots at the end of a list you can append to,
without reallocating. When the list is resized, it is reallocated with
even more of these. Thus the need to reallocate becomes more and more
rare as the list grows, and on average the complexity of appending is
just O(1).
 
