Possible memory leak?

Tuvas

I have a function in a program that works something like this.

def load_pic_data(width, height, inpdat, filt=True):
    data=''
    total=0
    tnum=0
    size=100
    for y in range(0,height):
        row=''
        for x in range(0,width):
            index=2*(x+y*width)
            num=ord(inpdat[index+1])*256+ord(inpdat[index])
            if(vfilter.get() and d_filter and filt):
                num=round((num-(d_filter[index/2])))
            if(num<0):
                num=0
            if(num>255*64):
                num=255*64
            row=row+chr(num/64)
        data=data+row


The purpose of this part of the program is to take a 14-bit numerical
representation and convert it to an 8-bit representation. This will
later be displayed as an image. However, I've noticed the following
about this code: when I took a small picture, it went really fast, but
a larger picture took forever to run through. I added a print statement
to the y loop to see where it was getting hung up, and noticed that it
appears to be running slower as time goes on. I did a time.time()
timestamp to verify this, and had it confirmed. Any ideas as to what I
could do to make it run faster? Note that if this process is repeated,
it runs equally slowly. I suspect that if I somehow deallocate the row
variable after it's done, and the data variable when it's done, it will
run faster, but I just don't know how to do so. Thanks!
 
Fredrik Lundh

Tuvas said:
I have a function in a program that works something like this.

def load_pic_data(width, height, inpdat, filt=True):
    data=''
    total=0
    tnum=0
    size=100
    for y in range(0,height):
        row=''
        for x in range(0,width):
            index=2*(x+y*width)
            num=ord(inpdat[index+1])*256+ord(inpdat[index])
            if(vfilter.get() and d_filter and filt):
                num=round((num-(d_filter[index/2])))
            if(num<0):
                num=0
            if(num>255*64):
                num=255*64
            row=row+chr(num/64)
        data=data+row

The purpose of this part of the program is to take a 14-bit numerical
representation and convert it to an 8-bit representation. This will
later be displayed as an image. However, I've noticed the following
about this code: when I took a small picture, it went really fast, but
a larger picture took forever to run through. I added a print statement
to the y loop to see where it was getting hung up, and noticed that it
appears to be running slower as time goes on.

hint: what does

    row=row+chr(num/64)

do, and how often is it executed?

</F>
 
Steven D'Aprano

Tuvas said:
The purpose of this part of the program is to take a 14-bit numerical
representation and convert it to an 8-bit representation. This will
later be displayed as an image. However, I've noticed the following
about this code: when I took a small picture, it went really fast, but
a larger picture took forever to run through.

for i in range(10):
    for j in range(10):
        # do stuff with i and j
        # this loops 100 times.

for i in range(1000):
    for j in range(1000):
        # do stuff with i and j
        # this loops 1,000,000 times.

Of course the code gets slower as the image size increases.

But the real killer is this one line:

row=row+chr(num/64)

Bad, bad BAD idea. Every time you add two strings together, Python has to
copy BOTH strings. As row gets huge, this takes longer and longer to do.

A rule of thumb I use is, never add more than two strings together. Maybe
three. Certainly not more than four. Or five.

But absolutely not millions of strings, which is what you are doing.

The best way to handle this is to turn row into a list, and then once, at
the very end, convert the list into a string.

Instead of this:

row = ""
# processing in a loop...
row = row + chr(num/64)
# finished processing
print row

do this instead:

row = []
# processing in a loop...
row.append(chr(num/64))
# finished processing
row = "".join(row) # convert to a string in one hit
print row

You should find that runs much faster.
Tuvas said:
I suspect that if I somehow deallocate the row variable after it's
done, and the data variable when it's done, it will run faster, but I
just don't know how to do so.

Once row and data are no longer being used, Python automatically removes
them and reclaims the memory they used. You rarely need to worry about
that.
 
Paul Rubin

Steven D'Aprano said:
row = []
# processing in a loop...
row.append(chr(num/64))
# finished processing
row = "".join(row) # convert to a string in one hit
print row

Probably better to use StringIO or the array module.
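A minimal sketch of what that might look like for the inner loop
(Python 2, matching the thread; the toy inpdat and width values here
are made up for illustration):

import array
import cStringIO

inpdat = '\x00\x10' * 8       # toy 16-bit little-endian pixel data
width = 8

# cStringIO: write to an in-memory "file" instead of concatenating
buf = cStringIO.StringIO()
for x in range(width):
    num = ord(inpdat[2*x+1])*256 + ord(inpdat[2*x])
    buf.write(chr(num / 64))
row = buf.getvalue()

# array: store raw bytes and convert to a string once at the end
a = array.array('B')          # 'B' = unsigned byte
for x in range(width):
    num = ord(inpdat[2*x+1])*256 + ord(inpdat[2*x])
    a.append(num / 64)        # stores the byte directly, no chr() needed
assert a.tostring() == row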
 
Stephen Kellett

Steven D'Aprano said:
But the real killer is this one line:

row=row+chr(num/64)

Bad, bad BAD idea. Every time you add two strings together, Python has to
copy BOTH strings. As row gets huge, this takes longer and longer to do.

A rule of thumb I use is, never add more than two strings together. Maybe
three. Certainly not more than four. Or five.

But absolutely not millions of strings, which is what you are doing.

You would be able to visualize this process very well using Python
Memory Validator and Python Performance Validator.

http://www.softwareverify.com

Stephen
 
Travis E. Oliphant

Tuvas said:
I have a function in a program that works something like this.

def load_pic_data(width, height, inpdat, filt=True):
    data=''
    total=0
    tnum=0
    size=100
    for y in range(0,height):
        row=''
        for x in range(0,width):
            index=2*(x+y*width)
            num=ord(inpdat[index+1])*256+ord(inpdat[index])
            if(vfilter.get() and d_filter and filt):
                num=round((num-(d_filter[index/2])))
            if(num<0):
                num=0
            if(num>255*64):
                num=255*64
            row=row+chr(num/64)
        data=data+row


The purpose of this part of a program is to take a 14 bit numerical
representation and convert it to an 8 bit representation.

This is exactly the kind of thing NumPy (http://numeric.scipy.org) is
ideal for. If that's of interest to you, then I can offer particular
suggestions...

-Travis
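A minimal sketch of that idea, written against today's numpy API (the
Numeric API of the time differed in its details); the dark-frame
handling mirrors the OP's d_filter logic and is otherwise an
assumption:

import numpy as np

def load_pic_data_np(inpdat, d_filter=None):
    # Interpret the raw string as little-endian 16-bit unsigned ints.
    num = np.frombuffer(inpdat, dtype='<u2').astype(np.int32)
    if d_filter is not None:
        num = num - d_filter            # element-wise dark subtraction
    num = np.clip(num, 0, 255*64)       # clamp to the 14-bit range
    return (num // 64).astype(np.uint8).tobytes()  # scale to 8 bits

All the looping happens in C, so this runs orders of magnitude faster
than a per-pixel Python loop.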
 
Tuvas

Hmm. The problem is that the required form for the image is a string
of characters to convert with the tkimage interface, at least as I
understood it. Perhaps an array would work; I'll try that when I get
hold of the computer in question (one thing required is a Linux-only
library, and I don't have access to a Linux machine at the moment...).

However, I would like to make a few more statements. The max image size
is 1024x1024. The length of time it takes to do the row statement
increased fairly substantially every time it was run, on the order of
~.5%. That seems small, but when you run it 1M times, the last
operation will take 5000 times longer than the original, or about
~1 sec. And that's just for one row; the entire process would take an
eternity...

And the real kicker: when doing this with smaller images, i.e. about
128x128, the second image starts off at the same point of slowness
where the first one ended. That is, if the last operation of the first
image took .01 seconds to complete (having started at, say, around
.005), the next image would start at that length of time per row and
end at .02 or so, the third picture would take that long for each row,
and so on. It only does this on one particular computer (I've only had
the chance to run it on 2 machines to date, BTW).

There is a reason why the rows are pieced together as is, I must say,
but it's a tad bit complicated... I'll just defend it without giving a
real reason.

Thanks for the help!
 
Tuvas

Oh, I should also mention: I used a memory monitor and saw the amount
of memory being used go up with time, even after the function ended.
When I did the 10 128x128 pictures, no memory was ever deallocated
until I exited the program.
 
Steven D'Aprano

Tuvas said:
Oh, I should also mention: I used a memory monitor and saw the amount
of memory being used go up with time, even after the function ended.
When I did the 10 128x128 pictures, no memory was ever deallocated
until I exited the program.

Are you really trying to say that the amount of memory
being used is going up even when the function is not
being used?

If so, then at least part of your problem exists in
another part of your program.

If you are monitoring the amount of memory being
allocated to Python, then no, Python won't generally
return memory to the operating system while it is still
running. But the memory allocated to your data
structures will be reclaimed by Python when they are
not in use.
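A quick way to see the distinction, sketched for CPython (the numbers
a system monitor reports are platform-dependent):

import gc

data = ['x' * 1000 for i in range(100000)]  # allocate roughly 100 MB
del data      # drop the only reference; refcounting frees it at once
gc.collect()  # only needed if reference cycles are involved

# A system memory monitor may still show the process holding the
# memory: CPython keeps freed blocks in its own allocator for reuse
# rather than returning them to the OS, but new allocations will
# recycle that space.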
 
Steven D'Aprano

Tuvas said:
However, I would like to make a few more statements. The max image size
is 1024x1024. The length of time it takes to do the row statement
increased fairly substantially every time it was run, on the order of
~.5%. That seems small, but when you run it 1M times, the last
operation will take 5000 times longer than the original, or about
~1 sec. And that's just for one row; the entire process would take an
eternity...

I've just gone back and taken another look at your
original post. You haven't posted the actual code you
are using, have you? The code you posted doesn't
actually return a result, nor does it set a global
variable. As far as I can tell, it just churns for a
while, processing hard, and then throws the result away.

So I'm guessing that it isn't actually your production
code. Did you really think we would be able to debug
some code that you haven't shown us? I mean, some of
the folks on comp.lang.python are really good, but even
they aren't that good *wink*
 
Giovanni Bajo

Steven said:
But the real killer is this one line:

row=row+chr(num/64)

Bad, bad BAD idea. Every time you add two strings together, Python
has to copy BOTH strings. As row gets huge, this takes longer and
longer to do.

This is no longer true as of CPython 2.4 though. I'm not sure which version the
OP was using because he didn't say.
 
Steven D'Aprano

Giovanni Bajo said:
This is no longer true as of CPython 2.4 though. I'm not sure which
version the OP was using because he didn't say.

No, that's not correct. You are making a false
assumption. This is from the "What's New" for Python 2.4:

String concatenations in statements of the form s = s +
"abc" and s += "abc" are now performed more efficiently
in certain circumstances. This optimization won't be
present in other Python implementations such as Jython,
so you shouldn't rely on it; using the join() method of
strings is still recommended when you want to
efficiently glue a large number of strings together.
[end quote]

Note the "more efficiently in CERTAIN CIRCUMSTANCES"
[emphasis added]? That certainly does not mean that
concatenating large numbers of strings is no longer
slow. It just means that *sometimes* (almost always?
often? occasionally?) it isn't as slow as it used to be.

We really don't know what the optimization recognises,
how it works, or how much of a difference it makes.
Writing poor code, and trusting an implementation-
specific optimization to make it run "faster" (how much
faster?) is always a dangerous strategy.

Better to avoid the temptation if you can, appreciate
the improved performance for those times when you do
string concatenation, but continue to use the join
method anytime you are adding large numbers of strings.
 
Steven D'Aprano

Replying to myself... the first step towards perdition...
We really don't know what the optimization recognises, how it works, or
how much of a difference it makes.

Of course, by "we" I mean "those of us who haven't seen
and understood the CPython source code, or run detailed
benchmarks".
 
Fredrik Lundh

Steven D'Aprano said:
Giovanni Bajo wrote:
This is no longer true as of CPython 2.4 though. I'm not sure which
version the OP was using because he didn't say.

No, that's not correct. You are making a false
assumption. This is from the "What's New" for Python 2.4:

String concatenations in statements of the form s = s +
"abc" and s += "abc" are now performed more efficiently
in certain circumstances. This optimization won't be
present in other Python implementations such as Jython,
so you shouldn't rely on it; using the join() method of
strings is still recommended when you want to
efficiently glue a large number of strings together.
[end quote]

Note the "more efficiently in CERTAIN CIRCUMSTANCES"
[emphasis added]? That certainly does not mean that
concatenating large numbers of strings is no longer
slow. It just means that *sometimes* (almost always?
often? occasionally?) it isn't as slow as it used to be.

it only means that if the interpreter can make sure that nothing else
is using the target string, it's modified in place.

however, the string type doesn't use overallocation (beyond the
8-byte alignment provided by the memory allocator), so this fix
only avoids extra copying in a few specific cases. O(n*n/8) is
still quadratic...

</F>
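To illustrate the refcount condition Fredrik describes (a sketch; the
variable names are made up):

s = ''
for i in range(100000):
    s += 'x'   # s has a single reference: 2.4 can resize it in place

t = s          # a second reference to the same string object...
s += 'x'       # ...so this concatenation must copy all of s again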
 
Tuvas

FYI, to all who asked, I was indeed just monitoring the system
memory. I changed my approach to one that uses lists and simply joins
the pieces together at the end, and it seems to have improved the
speed. However, for some reason it still takes a small eternity to
process on one computer, and not the other. The oddest thing of all is
that the one that is taking longer is the better computer. I have been
using Linux, running Python 2.4.

The modified version of my code is now as follows. (Note: a few small
changes have been made to simplify things; however, these don't apply
to a full-scale picture, so they shouldn't slow anything down in the
slightest.)

def load_pic_data(width, height, inpdat, filt=True):
    ldata=[]
    total=0
    tnum=0
    size=100
    array=[]
    for y in range(0,height):
        row=[]
        ts=time.time()
        for x in range(0,width):
            index=2*(x+y*width)
            num=ord(inpdat[index+1])*256+ord(inpdat[index])
            if(vfilter.get() and d_filter and filt):
                num=round((num-(d_filter[index/2])))
            if(num<0):
                num=0
            if(num>255*64):
                num=255*64
            tba=chr(num/64)
            row.append(tba)
        srow=''.join(row)
        ldata.append(srow)
        print y, time.time()-ts
    data=''.join(ldata)

There is considerably more to the function; however, I've traced the
slowest part to this one. Note the statement "print y, time.time()-ts".
A few of its outputs are as follows, with the 1024x1024 image:

  1  0.0633
  2  0.07005
  3  0.06698
 20  0.07925
 30  0.08410
100  0.16255
200  0.270895
500  0.59182
900  1.06439

Note that at the time I wrote this, 900 was the highest row available.

Note that the value is consistent when a few nearby rows are compared,
but over time it increases until the program is running very slowly.
For some reason it does this on one computer and not another, and I
believe the 2 computers have identical Python configurations, i.e.,
same libraries, version, etc. Both run the same type of Linux as well.
I am no longer joining the strings one at a time, but only at the end.
What could be the source of such an odd problem? I understand that the
large image will take longer to process, but I would think that the
relationship should be more or less linear with the size, and not
exponential. Thanks for all of the help till now!
 
Giovanni Bajo

Steven said:
No, that's not correct. You are making a false
assumption.

Did you ever try to measure the code yourself?
This is from the "What's New" for Python 2.4:

String concatenations in statements of the form s = s +
"abc" and s += "abc" are now performed more efficiently
in certain circumstances. This optimization won't be
present in other Python implementations such as Jython,
so you shouldn't rely on it; using the join() method of
strings is still recommended when you want to
efficiently glue a large number of strings together.
[end quote]

Note the "more efficiently in CERTAIN CIRCUMSTANCES"
[emphasis added]? That certainly does not mean that
concatenating large numbers of strings is no longer
slow. It just means that *sometimes* (almost always?
often? occasionally?) it isn't as slow as it used to be.

We really don't know what the optimization recognises,
how it works, or how much of a difference it makes.
Writing poor code, and trusting an implementation-
specific optimization to make it run "faster" (how much
faster?) is always a dangerous strategy.

The code is poor by your definition of poor. By mine, it used to be
poor and it isn't anymore. Since I was interested in exactly WHICH
circumstances trigger the optimization, I investigated, and I know the
answer. I also know that the optimization *will* apply to the OP's
code. But maybe you want some numbers:

------- foo.py -----
def iters(n):
    s = ''
    for i in xrange(n):
        s += chr(i%64)
    return s

def iters2(n):
    L = []
    for i in xrange(n):
        L.append(chr(i%64))
    return "".join(L)
------- foo.py -----

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters(10000)"
100 loops, best of 3: 10.2 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters(20000)"
100 loops, best of 3: 20.5 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters(40000)"
100 loops, best of 3: 40.9 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters(80000)"
100 loops, best of 3: 81.6 msec per loop



[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters2(10000)"
100 loops, best of 3: 10.9 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters2(20000)"
100 loops, best of 3: 22.2 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters2(40000)"
100 loops, best of 3: 44 msec per loop

[D:\Work]python -m timeit -n100 -s "import foo" "foo.iters2(80000)"
100 loops, best of 3: 89.9 msec per loop


So, look, it's even faster than the solution you're proposing.
 
Tuvas

Very interesting results with the last test. I guess I'll go back to
my other code then; even if it is only a hair faster, it's still
faster...

It's odd. I just ran another test. There are 2 ways I can call my
load_pic function: first, through taking a picture, and second, by
loading a picture. For some reason, the exact same function takes the
same amount of time each run when the load picture path is used, but
not when called from the take picture path. Odd, isn't it... I don't
know why the time would increase for one but not for the other. The
data is passed in exactly the same format; it's just really odd... Oh
well, I'll get something figured out...
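One way to chase down a difference like that is to profile both call
paths; a sketch, with take_picture and load_picture standing in for
the OP's two (unnamed) entry points:

import profile  # cProfile only arrived in Python 2.5
import pstats

# Run the same workload through both entry points...
profile.run('take_picture()', 'take.prof')
profile.run('load_picture()', 'load.prof')

# ...then compare where the time goes in each report.
pstats.Stats('take.prof').sort_stats('cumulative').print_stats(10)
pstats.Stats('load.prof').sort_stats('cumulative').print_stats(10)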
 
Fredrik Lundh

Giovanni said:
------- foo.py -----
def iters(n):
    s = ''
    for i in xrange(n):
        s += chr(i%64)
    return s

def iters2(n):
    L = []
    for i in xrange(n):
        L.append(chr(i%64))
    return "".join(L)
------- foo.py -----

So, look, it's even faster than the solution you're proposing.

since you know the length, you can preallocate the list

def iters3(n):
    L = [None]*n
    for i in xrange(n):
        L[i] = chr(i%64)
    return "".join(L)

or use a preallocated array

import array

def iters4(n):
    L = array.array("B", [0])*n
    for i in xrange(n):
        L[i] = i%64
    return L.tostring()

on my machine, the last one is twice as fast as your "even faster"
solution under 2.4. in earlier versions, it's just under 5 times faster.

for the OP's problem, a PIL-based solution would probably be ~100
times faster than the array solution, but that's another story.

</F>
 
Tuvas

Fredrik said:
Giovanni said:
------- foo.py -----
def iters(n):
    s = ''
    for i in xrange(n):
        s += chr(i%64)
    return s

def iters2(n):
    L = []
    for i in xrange(n):
        L.append(chr(i%64))
    return "".join(L)
------- foo.py -----

So, look, it's even faster than the solution you're proposing.

since you know the length, you can preallocate the list

def iters3(n):
    L = [None]*n
    for i in xrange(n):
        L[i] = chr(i%64)
    return "".join(L)

or use a preallocated array

import array

def iters4(n):
    L = array.array("B", [0])*n
    for i in xrange(n):
        L[i] = i%64
    return L.tostring()

on my machine, the last one is twice as fast as your "even faster"
solution under 2.4. in earlier versions, it's just under 5 times faster.

for the OP's problem, a PIL-based solution would probably be ~100
times faster than the array solution, but that's another story.


What do you mean by a PIL-based solution? The reason I need to get the
data into the string is so I can pump it into PIL to give me my
image... If PIL has a way to make that easier, I don't know it, but I
would like to.
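The thread ends here, but a PIL-based version would presumably let the
library do the 16-to-8-bit conversion in C. A sketch of that idea,
untested against 2006-era PIL, whose 16-bit "I;16" support was
limited:

import Image  # PIL; in modern Pillow: from PIL import Image

def load_pic_data_pil(width, height, inpdat):
    # Wrap the raw little-endian 16-bit buffer as an image, no copying.
    im = Image.frombuffer('I;16', (width, height), inpdat,
                          'raw', 'I;16', 0, 1)
    # Promote to 32-bit pixels, scale 14-bit values down to 8 bits,
    # and convert to mode 'L'.  For 'I' images, point() requires the
    # function to be a pure scale/offset, hence the multiplication.
    return im.convert('I').point(lambda i: i * (1.0/64)).convert('L')

The resulting 'L' image can then go straight to something like
ImageTk.PhotoImage for display in Tkinter.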
 
