Most efficient way to "pre-grow" a list?


Carl Banks

I don't have the code with me, but for huge arrays, I have used
something like:
arr = [initializer]
for i in range(N):
    arr.extend(arr)
This doubles the array every time through the loop, and you can add
the powers of 2 to get the desired result.
Gil

You should really use append instead of extend. The above code is
O(N**2), with append it becomes O(N) on average.

I think the top one is O(N log N), and I'm suspicious that it's even
possible to grow a list in less than O(N log N) time without knowing
the final size in advance. Citation? Furthermore, why would it matter
to use extend instead of append; I'd assume extend uses the same
growth algorithm. (Although in this case, since the extend doubles the
size of the list, it most likely reallocates after each call.)

[None]*N is linear time and is better than growing the list using
append or extend.
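
For example (a minimal sketch; the size N and the values stored are arbitrary):

N = 1000000

arr = [None] * N          # one O(N) allocation up front
for i in range(N):
    arr[i] = i * i        # fill by index; the list never has to resize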


Carl Banks
 

exarkun

I don't have the code with me, but for huge arrays, I have used
something like:
arr = [initializer]
for i in range(N):
    arr.extend(arr)
This doubles the array every time through the loop, and you can add
the powers of 2 to get the desired result.
Gil

You should really use append instead of extend. The above code is
O(N**2), with append it becomes O(N) on average.

I think the top one is O(N log N), and I'm suspicious that it's even
possible to grow a list in less than O(N log N) time without knowing
the final size in advance. Citation? Furthermore, why would it matter
to use extend instead of append; I'd assume extend uses the same
growth algorithm. (Although in this case, since the extend doubles the
size of the list, it most likely reallocates after each call.)

[None]*N is linear time and is better than growing the list using
append or extend.

The Wikipedia page at http://en.wikipedia.org/wiki/Amortized_analysis
conveniently uses exactly this example to explain the concept of
amortized costs.
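
If you want to see that amortized behaviour directly, here is a rough sketch
using the timeit module (the helper build() is mine, and the absolute times
will vary by machine, but the totals should grow roughly linearly with n):

import timeit

def build(n):
    lst = []
    for i in range(n):
        lst.append(None)
    return lst

# Each call appends n items; total time grows roughly linearly with n,
# i.e. append is amortized O(1) despite occasional reallocations.
for n in (10**5, 2 * 10**5, 4 * 10**5):
    print(n, timeit.timeit(lambda: build(n), number=10))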

Jean-Paul
 

Paul Rudin

kj said:
The best I can come up with is this:

arr = [None] * 1000000

Is this the most efficient way to achieve this result?

Depends on what you take as given. You can do it with ctypes more
efficiently, but you can also shoot yourself in the foot.

Another possibility is to use a numpy array.
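
For illustration only - both lines are sketches, assuming ctypes or numpy is
an acceptable dependency - the two options could look like:

import ctypes
import numpy

N = 1000000

c_buf = (ctypes.c_uint32 * N)()                          # N uint32 slots, zero-filled C array
np_buf = numpy.full(N, 4294967295, dtype=numpy.uint32)   # N uint32 slots, all bits set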
 

gil_johnson

I don't have the code with me, but for huge arrays, I have used
something like:
arr = [initializer]
for i in range(N):
    arr.extend(arr)

This doubles the array every time through the loop, and you can add
the powers of 2 to get the desired result.
Gil

To all who asked about my previous post,

I'm sorry I posted such a cryptic note before, I should have waited
until I had the code available.

I am still new to Python, and I couldn't find a way to create an array
of size N, with each member initialized to a given value. If there is
one, please let me know.

I think Paul Rubin and Paul Rudin are right, if they really are 2
different people, an array is a better solution if you're dealing with
integers. They both mention numpy, which I know nothing about, or the
array module.

kj, what *are* you going to do with this list/array? As others have
pointed out, there are differences between lists, arrays, and
dictionaries.

The problem I was solving was this: I wanted an array of 32-bit
integers to be used as a bit array, and I wanted it initialized with
all bits set, that is, each member of the array had to be set to
4294967295. Of course, you could set your initializer to 0, or any
other 32-bit number.

Originally I found that the doubling method I wrote about before was a
LOT faster than appending the elements one at a time, and tonight I
tried the "list = initializer * N" method. Running the code below, the
doubling method is still fastest, at least on my system.

Of course, as long as you avoid the 'one at a time' method, we're
talking about fractions of a second, even for arrays that I think are
huge, like the 536,870,912 byte beastie below.

Code:
# Written in Python 3.x

import array
import time

#* * * * * * * * * * * * * * * * * * * * * *
# Doubling method, run time = 0.413938045502

t0 = time.time()

newArray = array.array('I')             # 32-bit unsigned integers

newArray.append(4294967295)

for i in range(27):             # 2**27 integers, 2**29 bytes
    newArray.extend(newArray)

print(time.time() - t0)

print(newArray[134217727]) # confirm array is fully initialized

#* * * * * * * * * * * * * * * * * * * * * *
# One at a time, run time = 28.5479729176

t0 = time.time()

newArray2 = array.array('I')

for i in range(134217728):        # the same size as above
    newArray2.append(4294967295)

print(time.time() - t0)

print(newArray2[134217727])                   # confirm array

#* * * * * * * * * * * * * * * * * * * * * *
# List with "*", run time = 1.06160402298

t0 = time.time()

newList = [4294967295] * 134217728

print(time.time() - t0)

print(newList[134217727])                      # confirm list

If, instead of 134,217,728 integers, I want something different, like
100,000,000, the method I use is:

Code:
#* * * * * * * * * * * * * * * * * * * * * *
# Not a power of 2, run time = 0.752086162567

t0 = time.time()

newArray = array.array('I')

tempArray = array.array('I')

tempArray.append(4294967295)

size = 100000000

while size:                           # chew through 'size' until it's gone
    if (size & 1):                    # if this bit of 'size' is 1
        newArray.extend(tempArray)    # add a copy of the temp array
    size >>= 1                        # chew off one bit
    tempArray.extend(tempArray)       # double the size of the temp array

print(time.time() - t0)

print(newArray[99999999])

#* * * * * * * * * * * * * * * * * * * * * *
# # Not a power of 2, list with "*", run time = 1.19271993637

t0 = time.time()

newList = [4294967295] * 100000000

print(time.time() - t0)

print(newList[99999999])

I think it is interesting that the shorter list takes longer than the
one that is a power of 2 in length. I think this may say that the
"list = initializer * N" method uses something similar to the doubling
method.

Also, tempArray (above) gets reallocated frequently, and demonstrates
that reallocation is not a big problem.
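
If you are curious, you can actually watch those reallocations: array.array
has a buffer_info() method that reports the current address and length of the
underlying buffer. A small sketch (the addresses and the exact reallocation
points will differ on your machine):

import array

tempArray = array.array('I', [4294967295])
last_addr = tempArray.buffer_info()[0]

for i in range(20):                        # keep it small; 2**20 items is plenty
    tempArray.extend(tempArray)
    addr, length = tempArray.buffer_info()
    if addr != last_addr:                  # the buffer moved when it grew
        print('reallocated at', length, 'items')
        last_addr = addr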

Finally, I just looked into calling C functions, and found
PyMem_Malloc, PyMem_Realloc, PyMem_Free, etc. in the Memory Management
section of the Python/C API Reference Manual. This gives you
uninitialized memory, and should be really fast, but it's 6:45 AM
here, and I don't have the energy to try it.

Gil
 

gb345

An earlier poster said:
Have you ever tried to read a list/matrix that you know is not sparse, but you
don't know the size, and it may not be in order? A "grow-able" array would just
be the right thing to use - currently I have to settle for either hacking
together my own grow-able array, or...
...Not hard, not worthy of a
PEP, but certainly not so easy to dismiss.



I'm pretty new to Python, and I thought I'd give this a try:

class array(list):
    """A list that grows as needed upon assignment.

    Any required padding is done with the value of the fill option.

    Attempts to retrieve an index greater than the maximum assigned
    index still produces an IndexError.
    """

    def __init__(self, seq='', fill=None):
        super(array, self).__init__(seq)
        self.fill = fill

    @property
    def max(self):
        return len(self) - 1

    def __setitem__(self, index, item):
        m = self.max
        while m < index:
            self.append(self.fill)
            m += 1
        super(array, self).__setitem__(index, item)


if __name__ == '__main__':
    x = array(('zero',))
    x[3] = 'it works!'
    print x

---------------------------------------------------------------------------
['zero', None, None, 'it works!']



Looks like it works, but it seems almost too easy. Did I miss
anything? Comments welcome. (E.g. I'm not sure that '' is the
best choice of default value for the seq parameter to __init__.)
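
For what it's worth, the sort of usage I have in mind looks roughly like this
(illustrative only, leaning on the fill option above):

sparse = array(fill=0)
sparse[5] = 42          # indices 0-4 get padded with the fill value
print sparse            # [0, 0, 0, 0, 0, 42]

try:
    sparse[10]          # past the maximum assigned index
except IndexError:
    print 'IndexError, as documented'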

Thanks in advance,

Gabe
 

Emile van Sebille

On 11/7/2009 5:18 PM Carl Banks said...
I think the top one is O(N log N), and I'm suspicious that it's even
possible to grow a list in less than O(N log N) time without knowing
the final size in advance. Citation?

Quoting Tim --

Python uses mildly exponential over-allocation, sufficient so
that growing a list takes *amortized* O(1) time per append.
Speed of indexed list access is much more important in most
apps than speed of append, but Python has good O() behavior
for both. O() behavior for deleting (or inserting) "in the
middle" of a list is atrocious, though (O(len(list)). Life
is a never-ending sequence of tradeoffs <wink>.

--

For full context see
http://groups.google.com/g/2e77fef2/t/82ee7a80757e84ab/d/6d1c154901794bb
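
Tim's point about operations in the middle is easy to check with the timeit
module (a rough sketch; absolute numbers depend on the machine, but the
insert(0, ...) case is dramatically slower):

import timeit

setup = "z = list(range(100000))"

# Amortized O(1): append/pop at the tail.
print(timeit.timeit("z.append(0); z.pop()", setup=setup, number=100000))

# O(len(z)): insert/delete at the head shifts every element.
print(timeit.timeit("z.insert(0, 0); z.pop(0)", setup=setup, number=100000))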


Emile
 

Gabriel Genellina

On Sun, 08 Nov 2009 10:08:35 -0300, gil_johnson wrote:
The problem I was solving was this: I wanted an array of 32-bit
integers to be used as a bit array, and I wanted it initialized with
all bits set, that is, each member of the array had to be set to
4294967295. Of course, you could set your initializer to 0, or any
other 32-bit number.

Originally I found that the doubling method I wrote about before was a
LOT faster than appending the elements one at a time, and tonight I
tried the "list = initializer * N" method. Running the code below, the
doubling method is still fastest, at least on my system.

Of course, as long as you avoid the 'one at a time' method, we're
talking about fractions of a second, even for arrays that I think are
huge, like the 536,870,912 byte beastie below.

I don't get the same results; one_item*N is faster in my tests. Note that
you're comparing disparate things:
# Doubling method, run time = 0.413938045502
newArray = array.array('I')            # 32-bit unsigned integers

Method 1 creates an array object; this takes roughly 2**29 bytes.

# One at a time, run time = 28.5479729176
newArray2 = array.array('I')
for i in range(134217728):             # the same size as above
    newArray2.append(4294967295)

This also creates an array object, *and* 134 million integer objects
along the way.
# List with "*", run time = 1.06160402298
newList = [4294967295] * 134217728

And this creates a list object, not an array.

Note that all three of your methods create global objects and never delete
them - so the last method is at a disadvantage.
A fairer test would use timeit, and run just one method at a time:

<code>
# Written in Python 3.x
# use xrange everywhere with Python 2.x

import array

INITIAL_VALUE = 4294967295
LOG2SIZE=25
SIZE = 2**LOG2SIZE

def method1a():
    newArray = array.array('I')
    newArray.append(INITIAL_VALUE)
    for i in range(LOG2SIZE):
        newArray.extend(newArray)
    assert len(newArray)==SIZE
    assert newArray[SIZE-1]==INITIAL_VALUE

def method2a():
    newArray = array.array('I')
    for i in range(SIZE):
        newArray.append(INITIAL_VALUE)
    assert len(newArray)==SIZE
    assert newArray[SIZE-1]==INITIAL_VALUE

def method3a():
    newArray = array.array('I', [INITIAL_VALUE]) * SIZE
    assert len(newArray)==SIZE
    assert newArray[SIZE-1]==INITIAL_VALUE

def method1l():
    newList = [INITIAL_VALUE]
    for i in range(LOG2SIZE):
        newList.extend(newList)
    assert len(newList)==SIZE
    assert newList[SIZE-1]==INITIAL_VALUE

def method2l():
    newList = []
    for i in range(SIZE):
        newList.append(INITIAL_VALUE)
    assert len(newList)==SIZE
    assert newList[SIZE-1]==INITIAL_VALUE

def method3l():
    newList = [INITIAL_VALUE] * SIZE
    assert len(newList)==SIZE
    assert newList[SIZE-1]==INITIAL_VALUE
</code>

D:\temp>python31 -m timeit -s "from arraygrow import method1a" "method1a()"
10 loops, best of 3: 411 msec per loop

D:\temp>python31 -m timeit -s "from arraygrow import method2a" "method2a()"
....aborted, too long...

D:\temp>python31 -m timeit -s "from arraygrow import method3a" "method3a()"
10 loops, best of 3: 377 msec per loop

D:\temp>python31 -m timeit -s "from arraygrow import method1l" "method1l()"
10 loops, best of 3: 628 msec per loop

D:\temp>python31 -m timeit -s "from arraygrow import method3l" "method3l()"
10 loops, best of 3: 459 msec per loop

So arrays are faster than lists, and in both cases one_item*N outperforms
your doubling algorithm.
Adding one item at a time is -at least- several hundred times slower; I
was not patient enough to wait.
Finally, I just looked into calling C functions, and found
PyMem_Malloc, PyMem_Realloc, PyMem_Free, etc. in the Memory Management
section of the Python/C API Reference Manual. This gives you
uninitialized memory, and should be really fast, but it's 6:45 AM
here, and I don't have the energy to try it.

No, those are for internal use only, you can't use the resulting pointer
in Python code. An array object is a contiguous memory block, so you don't
miss anything.
 

gil_johnson

[much cutting]

def method3a():
   newArray = array.array('I', [INITIAL_VALUE]) * SIZE
   assert len(newArray)==SIZE
   assert newArray[SIZE-1]==INITIAL_VALUE

[more cutting]

So arrays are faster than lists, and in both cases one_item*N outperforms  
your doubling algorithm.
Adding one item at a time is -at least- several hundred times slower; I  
was not patient enough to wait.
Finally, I just looked into calling C functions, and found
PyMem_Malloc, PyMem_Realloc, PyMem_Free, etc. in the Memory Management
section of the Python/C API Reference Manual. This gives you
uninitialized memory, and should be really fast, but it's 6:45 AM
here, and I don't have the energy to try it.

No, those are for internal use only, you can't use the resulting pointer  
in Python code. An array object is a contiguous memory block, so you don't  
miss anything.

Gabriel,
Thanks for the reply - your method3a was what I wanted;
I missed "array.array('I', [INITIAL_VALUE]) * SIZE" on my
earlier pass through the Python documentation. I included
the 'one at a time' method because that was my first shot
at the problem - I was so happy to find a method that would
get the job done today that I didn't look further than the
doubling method.

Yes, I know that I was comparing lists and arrays. Part of the
confusion in this thread is that the problem hasn't been
defined very well. The original post asked about arrays
but the solution proposed generated a list. At this point,
I have learned a lot about arrays, lists and dictionaries, and
will be better able to choose the right solution to future
problems.

It's too bad about the C functions, they sure sounded good.

And with that, I think I will withdraw from the field, and go
write some code and learn about timeit.

Gil
 

Francis Carr

Hmmm. I am trying some algorithms, and timeit reports that a
list.extend variant is fastest... WTH?! Really this seems like it
must be a "bug" in implementing the "[None]*x" idiom.


As expected, appending one-at-a-time is slowest (by an order of
magnitude):
% python -m timeit -s "N=1000000" \
"z=[2**32-1]" \
"while len(z)<N: z.append(z[0])"
10 loops, best of 3: 223 msec per loop

A method based on list.extend doubling is much better:
% python -m timeit -s "N=1000000" \
"z=[2**32-1]" \
"while len(z)<N: z.extend(z[:N-len(z)])"
10 loops, best of 3: 22.2 msec per loop

The straightforward "pythonic" solution is better yet:
% python -m timeit -s "N=1000000" \
"z=[2**32-1]*N"
100 loops, best of 3: 8.81 msec per loop

And the winner for speed is... over-allocate using list.extend, then
delete!
% python -m timeit -s "N=1000000" \
"z=[2**32-1]" \
"while len(z)<N: z.extend(z)" \
"del z[N:]"
100 loops, best of 3: 6.93 msec per loop


Using array.array('I') gives similar results:
% python -m timeit -s "import array" -s "N=1000000" \
"z=array.array('I', [2**32-1])" \
"while len(z)<N: z.append(z[0])"
10 loops, best of 3: 278 msec per loop

% python -m timeit -s "import array" -s "N=1000000" \
"z=array.array('I', [2**32-1])" \
"while len(z)<N: z.extend(z[:N-len(z)])"
100 loops, best of 3: 7.82 msec per loop

% python -m timeit -s "import array" -s "N=1000000" \
"z=array.array('I', [2**32-1])" \
"z=z*N"
100 loops, best of 3: 6.04 msec per loop

% python -m timeit -s "import array" -s "N=1000000" \
"z=array.array('I', [2**32-1])" \
"while len(z)<N: z.extend(z)" \
"del z[N:]"
100 loops, best of 3: 4.02 msec per loop


These results appear to scale up. I replaced N=1000000 < 2**20 with
N=2**25 and obtained the same relative ordering:
% python -m timeit -s "N=2**25" \
"z=[2**32-1]" \
"while len(z)<N: z.append(z[0])"
[Ctrl-C after a loooong wait]

% python -m timeit -s "N=2**25" \
"z=[2**32-1]" \
"while len(z)<N: z.extend(z[:N-len(z)])"
10 loops, best of 3: 872 msec per loop

% python -m timeit -s "N=2**25" \
"z=[2**32-1]*N"
10 loops, best of 3: 434 msec per loop

% python -m timeit -s "N=2**25" \
"z=[2**32-1]" \
"while len(z)<N: z.extend(z)" \
"del z[N:]"
10 loops, best of 3: 346 msec per loop


And just to see if array.array can help us when large amounts of
memory are in use:
% python -m timeit -s "import array" -s "N=2**25" \
"z=array.array('I', [2**32-1])" \
"while len(z)<N: z.extend(z)" \
"del z[N:]"
10 loops, best of 3: 149 msec per loop
If speed is the overriding concern, then there's another factor of two
right there.


Python version and hardware info:
Python 2.6.2 (release26-maint, Apr 19 2009, 01:58:18) [GCC 4.3.3] on
linux2
12 GB RAM on a quad-core 3 GHz Intel Xeon CPU


-- FC
 

Steven D'Aprano

Hmmm. I am trying some algorithms, and timeit reports that a
list.extend variant is fastest... WTH?! Really this seems like it must
be a "bug" in implementing the "[None]*x" idiom.
The straightforward "pythonic" solution is better yet:
% python -m timeit -s "N=1000000" \
"z=[2**32-1]*N"
100 loops, best of 3: 8.81 msec per loop


You're possibly including the time taken to calculate 2**32-1 in each
test, which is unnecessary. The peephole optimizer *may* be optimizing
that away to "z=[4294967295L]*N", or it may not. In any case, why not
make the test even simpler and use z=[None]*N? You're testing list
methods, not arithmetic and longs.

But regardless of what's inside the list, what Python does internally is
create an array of some size M, where M is some multiple of two and
larger than N, then set the first N elements to the same item, leaving
the rest unallocated. (That's what CPython does, other Pythons may behave
differently.) Then each time you grow the list, it doubles in size. So
over-allocating doesn't necessarily have the cost you think it does.
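
You can watch that over-allocation from pure Python with sys.getsizeof; a
quick sketch (the exact sizes and growth points depend on the CPython version
and platform):

import sys

lst = []
last = sys.getsizeof(lst)
for i in range(64):
    lst.append(None)
    size = sys.getsizeof(lst)
    if size != last:
        # The reported size only changes when the underlying array is
        # reallocated, and each reallocation leaves some spare room.
        print(len(lst), size)
        last = size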


And the winner for speed is... over-allocate using list.extend, then
delete!
% python -m timeit -s "N=1000000" \
"z=[2**32-1]" \
"while len(z)<N: z.extend(z)" \
"del z[N:]"
100 loops, best of 3: 6.93 msec per loop


I can't confirm that. I suspect you're running into a hardware-specific
cache effect or some side-effect of your specific setup.

For what it's worth, my timings compared to your timings are (msec):

                             Me   : You
Append one-at-a-time:        520  : 223
list.extend doubling:        18.5 : 22.2
Multiplication:              9.36 : 8.81
Over-allocate then delete:   9.42 : 6.93
 
