preallocate list

J

Jim

Hi all

Is this the best way to preallocate a list of integers?
listName = range(0,length)

What about non integers?

I've just claimed in the newsgroup above that pre-allocating helps but I
might be getting confused with matlab ;)

If I have a file with a floating point number on each line, what is the
best way of reading them into a list (or other ordered structure)?

I was iterating with readline and appending to a list but it is taking ages.

Jim
 
R

rbt

Jim said:
If I have a file with a floating point number on each line, what is the
best way of reading them into a list (or other ordered structure)?

I was iterating with readline and appending to a list but it is taking
ages.

Perhaps you should use readlines (notice the s) instead of readline.
 
B

Bill Mill

Hi all

Is this the best way to preallocate a list of integers?
listName = range(0,length)

the 0 is unnecessary; range(length) does the same thing.
What about non integers?

arr = [myobject() for i in range(length)]
I've just claimed in the newsgroup above that pre-allocating helps but I
might be getting confused with matlab ;)

If I have a file with a floating point number on each line, what is the
best way of reading them into a list (or other ordered structure)?

I was iterating with readline and appending to a list but it is taking ages.

I would profile your app to see that it's your append which is taking
ages, but to preallocate a list of strings would look like:

["This is an average length string" for i in range(approx_length)]

My guess is that it won't help to preallocate, but time it and let us
know. A test to back my guess:

import timeit, math

def test1():
lst = [0 for i in range(100000)]
for i in xrange(100000):
lst = math.sin(i) * i

def test2():
lst = []
for i in xrange(100000):
lst.append(math.sin(i) * i)

t1 = timeit.Timer('test1()', 'from __main__ import test1')
t2 = timeit.Timer('test2()', 'from __main__ import test2')
print "time1: %f" % t1.timeit(100)
print "time2: %f" % t2.timeit(100)

09:09 AM ~$ python test.py
time1: 12.435000
time2: 12.385000

Peace
Bill Mill
bill.mill at gmail.com
 
J

Jim

rbt said:
Perhaps you should use readlines (notice the s) instead of readline.

I don't know if I thought of that, but I'm tokenizing each line before
adding to a list of lists.

for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(factor)

Is this nasty?

Jim
 
B

Bill Mill

Just a correction:

I would profile your app to see that it's your append which is taking
ages, but to preallocate a list of strings would look like:

["This is an average length string" for i in range(approx_length)]

My guess is that it won't help to preallocate, but time it and let us
know. A test to back my guess:

import timeit, math

def test1():
lst = [0 for i in range(100000)]
for i in xrange(100000):
lst = math.sin(i) * i

def test2():
lst = []
for i in xrange(100000):
lst.append(math.sin(i) * i)

t1 = timeit.Timer('test1()', 'from __main__ import test1')
t2 = timeit.Timer('test2()', 'from __main__ import test2')
print "time1: %f" % t1.timeit(100)
print "time2: %f" % t2.timeit(100)


The results change slightly when I actually insert an integer, instead
of a float, with lst = i and lst.append(i):

09:14 AM ~$ python test.py
time1: 3.352000
time2: 3.672000

The preallocated list is slightly faster in most of my tests, but I
still don't think it'll bring a large performance benefit with it
unless you're making a truly huge list.

I need to wake up before pressing "send".

Peace
Bill Mill
 
J

Jim

Thanks for the suggestions. I guess I must ensure that this is my bottle
neck.
<code>
def readFactorsIntoList(self,filename,numberLoads):
factors = []
f = open(self.basedir + filename,'r')
line = f.readline()
tokens = line.split()
columns = len(tokens)
if int(columns) == number:
for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(loadFactor)
else:
for line in f:
tokens = line.split()
factors.append([float(tokens[0])] * number)
return factors
</code>

OK. I've just tried with 4 lines and the code works. With 11000 lines it
uses all CPU for at least 30 secs. There must be a better way.

Jim
 
M

Mike C. Fletcher

Jim said:
Thanks for the suggestions. I guess I must ensure that this is my
bottle neck.
....

for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(loadFactor)
....

You might try:

factors = [ [float(item) for item in line.split()] for line in f ]

avoiding the extra statements for appending to the lists. Also might try:

factors = [ map(float, line.split()) for line in f ]

though it uses the out-of-favour functional form for the mapping.

Good luck,
Mike

________________________________________________
Mike C. Fletcher
Designer, VR Plumber, Coder
http://www.vrplumber.com
http://blog.vrplumber.com
 
P

peufeu

what about :

factors = [map(float, line.split()) for line in file]

should be a hell of a lot faster and nicer.
 
S

Steven Bethard

Jim said:
Thanks for the suggestions. I guess I must ensure that this is my bottle
neck.
<code>
def readFactorsIntoList(self,filename,numberLoads):
factors = []
f = open(self.basedir + filename,'r')
line = f.readline()
tokens = line.split()
columns = len(tokens)
if int(columns) == number:
for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(loadFactor)
else:
for line in f:
tokens = line.split()
factors.append([float(tokens[0])] * number)
return factors
</code>

OK. I've just tried with 4 lines and the code works. With 11000 lines it
uses all CPU for at least 30 secs. There must be a better way.

Was your test on *just* this function? Or were you doing something with
the list produced by this function as well?

STeVe
 
J

Jim

Steven said:
Jim said:
Thanks for the suggestions. I guess I must ensure that this is my
bottle neck.
<code>
def readFactorsIntoList(self,filename,numberLoads):
factors = []
f = open(self.basedir + filename,'r')
line = f.readline()
tokens = line.split()
columns = len(tokens)
if int(columns) == number:
for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(loadFactor)
else:
for line in f:
tokens = line.split()
factors.append([float(tokens[0])] * number)
return factors
</code>

OK. I've just tried with 4 lines and the code works. With 11000 lines
it uses all CPU for at least 30 secs. There must be a better way.


Was your test on *just* this function? Or were you doing something with
the list produced by this function as well?

Just this. I had a breakpoint on the return.

I'm going to try peufeu's line of code and I'll report back.

Jim
 
J

Jim

what about :

factors = [map(float, line.split()) for line in file]

should be a hell of a lot faster and nicer.
for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))
factors.append(factor)

Is this nasty?

Jim
Oh the relief :)

Of course, line.split() is already a list.

Couple of seconds for the 10000 line file.

Thanks.

What I really want is a Numeric array but I don't think Numeric supports
importing files.

Jim
 
J

Jim

Steven said:
Jim wrote: ...


Was your test on *just* this function? Or were you doing something with
the list produced by this function as well?

STeVe

Well it's fast enough now. Thanks for having a look.

Jim
 
S

Steven Bethard

Jim said:
What I really want is a Numeric array but I don't think Numeric supports
importing files.

Hmmm... Maybe the scipy package?

I think scipy.io.read_array might help, but I've never used it.

STeVe
 
F

F. Petitjean

Le Wed, 13 Apr 2005 16:46:53 +0100, Jim a écrit :
What I really want is a Numeric array but I don't think Numeric supports
importing files.
Numeric arrays can be serialized from/to files through pickles :
import Numeric as N
help(N.load)
help(N.dump)
(and it is space efficient)
 
B

beliavsky

Jim said:
Hi all

Is this the best way to preallocate a list of integers?
listName = range(0,length)

For serious numerical work you should use Numeric or Numarray, as
others suggested. When I do allocate lists the initial values 0:n-1 are
rarely what I want, so I use

ivec = n*[None]

so that if I use a list element before intializing it, for example

ivec[0] += 1

I get an error message

File "xxnone.py", line 2, in ?
ivec[0] += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

This is in the same spirit as Python's (welcome) termination of a
program when one tries to use an uninitalized scalar variable.
 
J

Jim

F. Petitjean said:
Le Wed, 13 Apr 2005 16:46:53 +0100, Jim a écrit :


Numeric arrays can be serialized from/to files through pickles :
import Numeric as N
help(N.load)
help(N.dump)
(and it is space efficient)
Yeah thanks. I'm generating them using Matlab though so I'd have to get
the format the same. I use Matlab because I get the results I want. When
I get to know Python + scipy etc. better I might remove that step.

Thanks again

Jim
 
J

Jim

Steven said:
Hmmm... Maybe the scipy package?

I think scipy.io.read_array might help, but I've never used it.

STeVe
Sounds promising.

I only got Numeric because I wanted scipy but I've hardly explored it as
I kept running into problems even with the complicated examples cut and
paste into a file ;)

Oh yeah, I wanted to explore the GA module but no docs :( and I got busy
doing other stuff.

Thanks

Jim
 
J

Jim

ivec = n*[None]

so that if I use a list element before intializing it, for example

ivec[0] += 1

I get an error message

File "xxnone.py", line 2, in ?
ivec[0] += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

This is in the same spirit as Python's (welcome) termination of a
program when one tries to use an uninitalized scalar variable.

I feel foolish that I forgot about *. I've just started with Python then
took 2 weeks off. I'll explore pre-allocation when I'm back up to speed.

Yep, I use None a lot.

Thanks

Jim
 
D

Dan Christensen

Bill Mill said:
Bill Mill said:
I would profile your app to see that it's your append which is taking
ages, but to preallocate a list of strings would look like:

["This is an average length string" for i in range(approx_length)]

I don't think there's any point putting strings into the preallocated
list. A list is just an array of pointers to objects, so any object
will do fine for preallocation, no matter what the list will be used for.
My guess is that it won't help to preallocate, but time it and let us
know. A test to back my guess:

import timeit, math

def test1():
lst = [0 for i in range(100000)]
for i in xrange(100000):
lst = math.sin(i) * i

def test2():
lst = []
for i in xrange(100000):
lst.append(math.sin(i) * i)

....

The results change slightly when I actually insert an integer, instead
of a float, with lst = i and lst.append(i):

09:14 AM ~$ python test.py
time1: 3.352000
time2: 3.672000


If you use

lst = range(100000)

or even better

lst = [None]*100000

then test1 is more than twice as fast as test2:

time1: 2.437730
time2: 5.308054

(using python 2.4).

Your code

lst = [0 for i in range(100000)]

made python do an extra 100000-iteration loop.

Dan
 
J

John Machin

Thanks for the suggestions. I guess I must ensure that this is my bottle
neck.
<code>
def readFactorsIntoList(self,filename,numberLoads):

1. "numberLoads" is not used.
factors = []
f = open(self.basedir + filename,'r')
line = f.readline()
tokens = line.split()
columns = len(tokens)
if int(columns) == number:

2. "columns" is already an int (unless of course you've redefined
"len"!). Doing int(columns) is pointless.
3. What is "number"? Same as "numberLoads"?
4. Please explain in general what is the layout of your file and in
particular, what is the significance of the first line of the file and
of the above "if" test.
for line in f:
factor = []
tokens = line.split()
for i in tokens:
factor.append(float(i))

4. "factor" is built and then not used any more??
factors.append(loadFactor)

5. What is "loadFactor"? Same as "factor"?
else:
for line in f:
tokens = line.split()
factors.append([float(tokens[0])] * number)

6. You throw away any tokens in the line after the first??
return factors
</code>

OK. I've just tried with 4 lines and the code works.

Which code works? The code you posted? Please define "works".

With 11000 lines it
uses all CPU for at least 30 secs. There must be a better way.

Perhaps after you post the code that you've actually run, and
explained what your file layout is, and what you are trying to
achieve, then we can give you some meaningful help.

Cheers,

John
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,766
Messages
2,569,569
Members
45,042
Latest member
icassiem

Latest Threads

Top