Anyone happen to have optimization hints for this loop?

D

dp_pearce

I have some code that takes data from an Access database and processes
it into text files for another application. At the moment, I am using
a number of loops that are pretty slow. I am not a hugely experienced
python user so I would like to know if I am doing anything
particularly wrong or that can be hugely improved through the use of
another method.

Currently, all of the values that are to be written to file are pulled
from the database and into a list called "domainVa". These values
represent 3D data and need to be written to text files using line
breaks to seperate 'layers'. I am currently looping through the list
and appending a string, which I then write to file. This list can
regularly contain upwards of half a million values...

count = 0
dmntString = ""
for z in range(0, Z):
for y in range(0, Y):
for x in range(0, X):
fraction = domainVa[count]
dmntString += " "
dmntString += fraction
count = count + 1
dmntString += "\n"
dmntString += "\n"
dmntString += "\n***\n

dmntFile = open(dmntFilename, 'wt')
dmntFile.write(dmntString)
dmntFile.close()

I have found that it is currently taking ~3 seconds to build the
string but ~1 second to write the string to file, which seems wrong (I
would normally guess the CPU/Memory would out perform disc writing
speeds).

Can anyone see a way of speeding this loop up? Perhaps by changing the
data format? Is it wrong to append a string and write once, or should
hold a file open and write at each instance?

Thank you in advance for your time,

Dan
 
D

Diez B. Roggisch

dp_pearce said:
I have some code that takes data from an Access database and processes
it into text files for another application. At the moment, I am using
a number of loops that are pretty slow. I am not a hugely experienced
python user so I would like to know if I am doing anything
particularly wrong or that can be hugely improved through the use of
another method.

Currently, all of the values that are to be written to file are pulled
from the database and into a list called "domainVa". These values
represent 3D data and need to be written to text files using line
breaks to seperate 'layers'. I am currently looping through the list
and appending a string, which I then write to file. This list can
regularly contain upwards of half a million values...

count = 0
dmntString = ""
for z in range(0, Z):
for y in range(0, Y):
for x in range(0, X):
fraction = domainVa[count]
dmntString += " "
dmntString += fraction
count = count + 1
dmntString += "\n"
dmntString += "\n"
dmntString += "\n***\n

dmntFile = open(dmntFilename, 'wt')
dmntFile.write(dmntString)
dmntFile.close()

I have found that it is currently taking ~3 seconds to build the
string but ~1 second to write the string to file, which seems wrong (I
would normally guess the CPU/Memory would out perform disc writing
speeds).

Can anyone see a way of speeding this loop up? Perhaps by changing the
data format? Is it wrong to append a string and write once, or should
hold a file open and write at each instance?

Don't use in-place adding to concatenate strings. It might lead to
quadaratic behavior.

Use the "".join()-idiom instead:

dmntStrings = []
.....
dmntStrings.append("\n")
.....
dmntFile.write("".join(dmntStrings))

Diez
 
T

Terry Reedy

Diez said:
dp_pearce wrote:

count = 0
dmntString = ""
for z in range(0, Z):
for y in range(0, Y):
for x in range(0, X):
fraction = domainVa[count]
dmntString += " "
dmntString += fraction

Minor point, just construct " "+domain[count] all at once
Don't use in-place adding to concatenate strings. It might lead to
quadaratic behavior.

Semantically, repeated extension of an immutable is inherently
quadratic. And it is for strings in Python until 2.6 or possibly 2.5
(not sure), when more sophisticated code was added because people kept
falling into this trap. But since the more sophisticated code 'cheats'
by mutating the immutable (with an algorithm similar to list.extend),I
am pretty sure it will only be used when there is only one reference to
the string and the extension is with +=, so that the extension can
reliably be done in-place without changing semantics. Thus, Python
programmers should still learn the following.
Use the "".join()-idiom instead:

(Or upgrade)
dmntStrings = []
....
dmntStrings.append("\n")
....
dmntFile.write("".join(dmntStrings))

Note that the minor point above will cut the number of things to be
joined nearly in half.

tjr
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top