preallocate list

Jim · Apr 13, 2005

Hi all

Is this the best way to preallocate a list of integers?
listName = range(0,length)

What about non integers?

I've just claimed in the newsgroup above that pre-allocating helps but I
might be getting confused with matlab

If I have a file with a floating point number on each line, what is the
best way of reading them into a list (or other ordered structure)?

I was iterating with readline and appending to a list but it is taking ages.

Jim

rbt · Apr 13, 2005

Jim said:
If I have a file with a floating point number on each line, what is the
best way of reading them into a list (or other ordered structure)?

I was iterating with readline and appending to a list but it is taking
ages.

Perhaps you should use readlines (notice the s) instead of readline.

Bill Mill · Apr 13, 2005

Hi all

Is this the best way to preallocate a list of integers?
listName = range(0,length)

the 0 is unnecessary; range(length) does the same thing.

What about non integers?

arr = [myobject() for i in range(length)]

I've just claimed in the newsgroup above that pre-allocating helps but I
might be getting confused with matlab

If I have a file with a floating point number on each line, what is the
best way of reading them into a list (or other ordered structure)?

I was iterating with readline and appending to a list but it is taking ages.

I would profile your app to see that it's your append which is taking
ages, but to preallocate a list of strings would look like:

["This is an average length string" for i in range(approx_length)]

My guess is that it won't help to preallocate, but time it and let us
know. A test to back my guess:

import timeit, math

def test1():
lst = [0 for i in range(100000)]
for i in xrange(100000):
lst = math.sin(i) * i

def test2():
lst = []
for i in xrange(100000):
lst.append(math.sin(i) * i)

t1 = timeit.Timer('test1()', 'from __main__ import test1')
t2 = timeit.Timer('test2()', 'from __main__ import test2')
print "time1: %f" % t1.timeit(100)
print "time2: %f" % t2.timeit(100)

09:09 AM ~$ python test.py
time1: 12.435000
time2: 12.385000

Peace
Bill Mill
bill.mill at gmail.com

Jim · Apr 13, 2005

rbt said:
Perhaps you should use readlines (notice the s) instead of readline.

I don't know if I thought of that, but I'm tokenizing each line before
adding to a list of lists.

for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(factor)

Is this nasty?

Jim

Bill Mill · Apr 13, 2005

Just a correction:

I would profile your app to see that it's your append which is taking
ages, but to preallocate a list of strings would look like:

["This is an average length string" for i in range(approx_length)]

My guess is that it won't help to preallocate, but time it and let us
know. A test to back my guess:

import timeit, math

def test1():
lst = [0 for i in range(100000)]
for i in xrange(100000):
lst = math.sin(i) * i

def test2():
lst = []
for i in xrange(100000):
lst.append(math.sin(i) * i)

t1 = timeit.Timer('test1()', 'from __main__ import test1')
t2 = timeit.Timer('test2()', 'from __main__ import test2')
print "time1: %f" % t1.timeit(100)
print "time2: %f" % t2.timeit(100)

The results change slightly when I actually insert an integer, instead
of a float, with lst = i and lst.append(i):

09:14 AM ~$ python test.py
time1: 3.352000
time2: 3.672000

The preallocated list is slightly faster in most of my tests, but I
still don't think it'll bring a large performance benefit with it
unless you're making a truly huge list.

I need to wake up before pressing "send".

Peace
Bill Mill

Jim · Apr 13, 2005

Thanks for the suggestions. I guess I must ensure that this is my bottle
neck.
<code>
def readFactorsIntoList(self,filename,numberLoads):
factors = []
f = open(self.basedir + filename,'r')
line = f.readline()
tokens = line.split()
columns = len(tokens)
if int(columns) == number:
for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(loadFactor)
else:
for line in f:
tokens = line.split()
factors.append([float(tokens[0])] * number)
return factors
</code>

OK. I've just tried with 4 lines and the code works. With 11000 lines it
uses all CPU for at least 30 secs. There must be a better way.

Jim

Mike C. Fletcher · Apr 13, 2005

Jim said:
Thanks for the suggestions. I guess I must ensure that this is my
bottle neck.
....

for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(loadFactor)

....

You might try:

factors = [ [float(item) for item in line.split()] for line in f ]

avoiding the extra statements for appending to the lists. Also might try:

factors = [ map(float, line.split()) for line in f ]

though it uses the out-of-favour functional form for the mapping.

Good luck,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com

peufeu · Apr 13, 2005

what about :

factors = [map(float, line.split()) for line in file]

should be a hell of a lot faster and nicer.

Steven Bethard · Apr 13, 2005

Jim said:
Thanks for the suggestions. I guess I must ensure that this is my bottle
neck.
<code>
def readFactorsIntoList(self,filename,numberLoads):
factors = []
f = open(self.basedir + filename,'r')
line = f.readline()
tokens = line.split()
columns = len(tokens)
if int(columns) == number:
for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(loadFactor)
else:
for line in f:
tokens = line.split()
factors.append([float(tokens[0])] * number)
return factors
</code>

OK. I've just tried with 4 lines and the code works. With 11000 lines it
uses all CPU for at least 30 secs. There must be a better way.

Was your test on *just* this function? Or were you doing something with
the list produced by this function as well?

STeVe

Jim · Apr 13, 2005

Steven said:
Jim said:

Thanks for the suggestions. I guess I must ensure that this is my
bottle neck.
<code>
def readFactorsIntoList(self,filename,numberLoads):
factors = []
f = open(self.basedir + filename,'r')
line = f.readline()
tokens = line.split()
columns = len(tokens)
if int(columns) == number:
for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(loadFactor)
else:
for line in f:
tokens = line.split()
factors.append([float(tokens[0])] * number)
return factors
</code>

OK. I've just tried with 4 lines and the code works. With 11000 lines
it uses all CPU for at least 30 secs. There must be a better way.

Click to expand...

Was your test on *just* this function? Or were you doing something with
the list produced by this function as well?

Just this. I had a breakpoint on the return.

I'm going to try peufeu's line of code and I'll report back.

Jim

Jim · Apr 13, 2005

what about :

factors = [map(float, line.split()) for line in file]

should be a hell of a lot faster and nicer.

for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(factor)

Is this nasty?

Jim

Click to expand...

Oh the relief

Of course, line.split() is already a list.

Couple of seconds for the 10000 line file.

Thanks.

What I really want is a Numeric array but I don't think Numeric supports
importing files.

Jim

Jim · Apr 13, 2005

Steven said:
Jim wrote: ...

Was your test on *just* this function? Or were you doing something with
the list produced by this function as well?

STeVe

Well it's fast enough now. Thanks for having a look.

Jim

Steven Bethard · Apr 13, 2005

Jim said:
What I really want is a Numeric array but I don't think Numeric supports
importing files.

Hmmm... Maybe the scipy package?

I think scipy.io.read_array might help, but I've never used it.

STeVe

F. Petitjean · Apr 13, 2005

Le Wed, 13 Apr 2005 16:46:53 +0100, Jim a écrit :

What I really want is a Numeric array but I don't think Numeric supports
importing files.

Numeric arrays can be serialized from/to files through pickles :
import Numeric as N
help(N.load)
help(N.dump)
(and it is space efficient)

beliavsky · Apr 13, 2005

Jim said:
Hi all

Is this the best way to preallocate a list of integers?
listName = range(0,length)

For serious numerical work you should use Numeric or Numarray, as
others suggested. When I do allocate lists the initial values 0:n-1 are
rarely what I want, so I use

ivec = n*[None]

so that if I use a list element before intializing it, for example

ivec[0] += 1

I get an error message

File "xxnone.py", line 2, in ?
ivec[0] += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

This is in the same spirit as Python's (welcome) termination of a
program when one tries to use an uninitalized scalar variable.

Jim · Apr 13, 2005

F. Petitjean said:
Le Wed, 13 Apr 2005 16:46:53 +0100, Jim a écrit :

Numeric arrays can be serialized from/to files through pickles :
import Numeric as N
help(N.load)
help(N.dump)
(and it is space efficient)

Yeah thanks. I'm generating them using Matlab though so I'd have to get
the format the same. I use Matlab because I get the results I want. When
I get to know Python + scipy etc. better I might remove that step.

Thanks again

Jim

Jim · Apr 13, 2005

Steven said:
Hmmm... Maybe the scipy package?

I think scipy.io.read_array might help, but I've never used it.

STeVe

Sounds promising.

I only got Numeric because I wanted scipy but I've hardly explored it as
I kept running into problems even with the complicated examples cut and
paste into a file

Oh yeah, I wanted to explore the GA module but no docs

and I got busy
doing other stuff.

Thanks

Jim

Jim · Apr 13, 2005

ivec = n*[None]

so that if I use a list element before intializing it, for example

ivec[0] += 1

I get an error message

File "xxnone.py", line 2, in ?
ivec[0] += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

This is in the same spirit as Python's (welcome) termination of a
program when one tries to use an uninitalized scalar variable.

I feel foolish that I forgot about *. I've just started with Python then
took 2 weeks off. I'll explore pre-allocation when I'm back up to speed.

Yep, I use None a lot.

Thanks

Jim

Dan Christensen · Apr 13, 2005

Bill Mill said:
Bill Mill said:

I would profile your app to see that it's your append which is taking
ages, but to preallocate a list of strings would look like:

["This is an average length string" for i in range(approx_length)]

Click to expand...

I don't think there's any point putting strings into the preallocated
list. A list is just an array of pointers to objects, so any object
will do fine for preallocation, no matter what the list will be used for.

My guess is that it won't help to preallocate, but time it and let us
know. A test to back my guess:

import timeit, math

def test1():
lst = [0 for i in range(100000)]
for i in xrange(100000):
lst = math.sin(i) * i

def test2():
lst = []
for i in xrange(100000):
lst.append(math.sin(i) * i)

Click to expand...

....

The results change slightly when I actually insert an integer, instead
of a float, with lst = i and lst.append(i):

09:14 AM ~$ python test.py
time1: 3.352000
time2: 3.672000

If you use

lst = range(100000)

or even better

lst = [None]*100000

then test1 is more than twice as fast as test2:

time1: 2.437730
time2: 5.308054

(using python 2.4).

Your code

lst = [0 for i in range(100000)]

made python do an extra 100000-iteration loop.

Dan

John Machin · Apr 13, 2005

Thanks for the suggestions. I guess I must ensure that this is my bottle
neck.
<code>
def readFactorsIntoList(self,filename,numberLoads):

1. "numberLoads" is not used.

factors = []
f = open(self.basedir + filename,'r')
line = f.readline()
tokens = line.split()
columns = len(tokens)
if int(columns) == number:

2. "columns" is already an int (unless of course you've redefined
"len"!). Doing int(columns) is pointless.
3. What is "number"? Same as "numberLoads"?
4. Please explain in general what is the layout of your file and in
particular, what is the significance of the first line of the file and
of the above "if" test.

for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))

4. "factor" is built and then not used any more??

factors.append(loadFactor)

5. What is "loadFactor"? Same as "factor"?

else:
for line in f:
tokens = line.split()
factors.append([float(tokens[0])] * number)

6. You throw away any tokens in the line after the first??

return factors
</code>

OK. I've just tried with 4 lines and the code works.

Which code works? The code you posted? Please define "works".

With 11000 lines it
uses all CPU for at least 30 secs. There must be a better way.

Perhaps after you post the code that you've actually run, and
explained what your file layout is, and what you are trying to
achieve, then we can give you some meaningful help.

Cheers,

John

Escaping list of numbers for Postgres column names	1	Feb 23, 2013
Padding strings for a clean visual print out...	5	Dec 23, 2023
Minimum Total Difficulty	0	Nov 15, 2023
can't generate iterator from list	3	Sep 10, 2011
Fast list traversal	4	Nov 2, 2008
Format specification mini-language for list joining	5	Nov 10, 2012
Most efficient way to "pre-grow" a list?	30	Nov 6, 2009
Odd behaviour with list comprehension	2	Mar 1, 2008

preallocate list

Jim

rbt

Bill Mill

Jim

Bill Mill

Jim

Mike C. Fletcher

peufeu

Steven Bethard

Jim

Jim

Jim

Steven Bethard

F. Petitjean

beliavsky

Jim

Jim

Jim

Dan Christensen

John Machin

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads