Possible memory leak?


Giovanni Bajo

Fredrik said:
since you know the length, you can preallocate the list

def iters3(n):
    L = [None]*n
    for i in xrange(n):
        L[i] = chr(i%64)
    return "".join(L)

or use a preallocated array

import array

def iters4(n):
    L = array.array("B", [0]) * n
    for i in xrange(n):
        L[i] = i % 64
    return L.tostring()


Of course. I was just trying to make a point about string accumulation being
O(n) and not O(n^2). You can get better factors with other solutions like the
ones you propose, but even the "simple" implementation (and most readable and
Pythonic, IMHO) behaves "well" under CPython 2.4.
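
For anyone who wants to check that on their own build, a rough timeit sketch (sizes and helper names are just illustrative) might look like this:

# Rough sketch: compare naive += accumulation with the join idiom at a
# few sizes.  If += is amortized O(n) on your interpreter, both columns
# should grow roughly linearly with n.  Trim the sizes on interpreters
# where += is still quadratic, or the last run will take a long time.
import timeit

def iters_concat(n):
    s = ''
    for i in xrange(n):
        s += chr(i % 64)
    return s

def iters_join(n):
    return ''.join(chr(i % 64) for i in xrange(n))

for n in (10000, 100000, 1000000):
    for name in ('iters_concat', 'iters_join'):
        t = timeit.Timer('%s(%d)' % (name, n),
                         'from __main__ import %s' % name)
        print '%-13s n=%-8d %.3fs' % (name, n, min(t.repeat(3, 1)))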
 

Steven D'Aprano

Very interesting results with the last test. I guess I'll go back to my
other code then; even if it is only a hair faster, it's still faster...

Can you say "premature optimization"?

Have you timed your code with realistic data? Not somebody else's code.
Not some cut-down piece of sample code with only three lines. Unless you
have timed YOUR code, you don't know what's faster.

Anyway, do you really care about "optimizing" code that takes an hour to
run to 59 minutes, 59 seconds, 997 milliseconds? Or, at the other extreme,
from 80 milliseconds to 77? Do you think anyone will notice?

I hope you do speed up your code, whatever way it takes. But I'm guessing
that if you aren't using the array module, you aren't getting anywhere
*near* as fast as you could be. (And that is JUST a guess -- time it to be
sure.)
It's odd, I just ran another test. There's 2 ways I can call my
load_pic function, first of all, through taking a picture, secondly by
loading a picture. For some reason, the exact same function takes the
same amount of time when the load picture function is used, but not
when called from the take picture function. Odd, isn't it... I don't
know why the time would increase for one, but not for the other. The
data is passed in exactly the same format, it's just really odd... Oh
well, I'll get something figured out...

What are you timing? Are you timing the execution time of the code, or the
elapsed time? They are not the same thing. Hint: think about your
operating system and what it is doing. Can you say "preemptive
multitasking"? Are you measuring network events, memory paging,
screen savers, disk I/O? Are the files you are loading from disk badly
fragmented?

If your code is not optimized for memory, and hence pages a lot, you
should be including that time in your measurements. But you probably don't
want to count elapsed time when the screen saver kicks in.
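
If it helps, here is a minimal sketch of the distinction using nothing but the standard time module (note that time.clock() measures CPU time on Unix but wall-clock time on Windows, so read the labels accordingly):

# Wall-clock vs. CPU time for a single call (Python 2.x).
import time

def measure(func, *args):
    wall = time.time()
    cpu = time.clock()
    result = func(*args)
    print "wall: %.3fs   cpu: %.3fs" % (time.time() - wall,
                                        time.clock() - cpu)
    return result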
 

Tim Peters

[Fredrik Lundh]
...
for the OP's problem, a PIL-based solution would probably be ~100
times faster than the array solution, but that's another story.
[Tuvas]
What do you mean by a PIL based solution? The reason I need to get the
data into the string list is so I can pump it into PIL to give me my
image... If PIL has a way to make it easier, I do not know it, but
would like to know it.

If your data is in an array.array, you can pass that directly to PIL's
(1.1.4) Image.frombuffer() constructor.
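
For example, something along these lines (a sketch only; the 8-bit greyscale mode and the image size are assumptions about your data):

# Build an 8-bit greyscale image straight from an array.array buffer.
import array
import Image   # PIL

width, height = 640, 480                            # assumed dimensions
pixels = array.array('B', [0]) * (width * height)   # one byte per pixel

# "raw" decoder arguments: rawmode, stride (0 = packed), orientation.
img = Image.frombuffer('L', (width, height), pixels, 'raw', 'L', 0, 1)
img.save('out.png')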
 

Steven D'Aprano

Of course. I was just trying to make a point about string accumulation being
O(n) and not O(n^2).

But according to Fredrik, string accumulation is still quadratic, even
with the optimizations added to Python 2.4. Quoting:

"it only means that if the interpreter can make sure that else is
using the target string, it's modified in place.

however, the string type doesn't use overallocation (beyond the
8-byte alignment provided by the memory allocator), so this fix
only avoids extra copying in a few specific cases. O(n*n/8) is
still quadratic..."

I presume Fredrik meant to say "nothing else".

Or have I misunderstood something?
 

Steven D'Aprano

Did you ever try to measure the code yourself?

I'm still running Python 2.3, it would give misleading results.

This is from the "What's New" for Python 2.4:

String concatenations in statements of the form s = s +
"abc" and s += "abc" are now performed more efficiently
in certain circumstances. This optimization won't be
present in other Python implementations such as Jython,
so you shouldn't rely on it; using the join() method of
strings is still recommended when you want to
efficiently glue a large number of strings together.
[end quote]

Note the "more efficiently in CERTAIN CIRCUMSTANCES"
[emphasis added]? That certainly does not mean that
concatenating large numbers of strings is no longer
slow. It just means that *sometimes* (almost always?
often? occasionally?) it isn't as slow as it used to be.

We really don't know what the optimization recognises,
how it works, or how much of a difference it makes.
Writing poor code, and trusting an implementation-
specific optimization to make it run "faster" (how much
faster?) is always a dangerous strategy.

The code is poor by your definition of poor. In my definition, it used to be
poor and it isn't anymore. Since I was interested in exactly WHICH
circumstances the optimization applies in, I investigated, and I know the
answer. I also know that the optimization *will* apply to the OP's code.

I think you are being overly optimistic about the OP's actual code.

It looks to me like the code posted can't possibly be his production code.
Both versions he has posted lack a return statement. He isn't posting his
actual code, only a modified version of it. He's even admitted it: "Note,
a few small changes have been made to simplify things, however, these
things don't apply to a full-scale picture, so they shouldn't slow anything
down in the slightest." Famous last words. If I had a dollar for every
time somebody said "I made some changes, but they shouldn't change
anything"...


But maybe you want some numbers: ....
So, look, it's even faster than the solution you're proposing.

But your test code isn't equivalent to the OP's code. I don't know if it
will make a difference, but his code looks more like this:

def iters3(n, m):
    data = ''
    for i in xrange(n):
        row = ''
        for j in xrange(m):
            row += chr(j%64)
        data += row
    return data

while yours is:

def iters(n):
    s = ''
    for i in xrange(n):
        s += chr(i%64)
    return s

I'm happy to agree that your code is optimized to the point it is
marginally faster than the list/join algorithm. But does iters3 optimize
as well? I don't know.
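
A quick timeit harness would tell you (a sketch, with arbitrary sizes):

# Doubling both n and m quadruples the amount of output; if each run
# takes roughly 4x as long as the previous one the algorithm is scaling
# linearly with the output size, while much faster growth points to
# quadratic behaviour.
import timeit

for n, m in ((100, 100), (200, 200), (400, 400)):
    t = timeit.Timer('iters3(%d, %d)' % (n, m),
                     'from __main__ import iters3')
    print n, m, min(t.repeat(3, 1))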

Even if it does, what generalisation do you learn from that? What do you
predict about this one?

def iters4(n):
    s = ''
    D = {}
    for i in xrange(n):
        s += chr(i%64)
        D[i] = s
    return s

At what level of complexity should the Python coder stop relying on
compiler optimizations to turn slow code into fast?

You have saved a handful of milliseconds for one particular version of
Python, at the cost of performance plummeting like a stone if the code
ever gets run under an older version, or on Jython, PyPy or IronPython, or
if the algorithm is used in a slightly more complicated way. I would much
rather have consistently good performance, even if not quite the fastest
possible, than a small speed increase on one platform and terrible
performance on everything else.
 

Fredrik Lundh

Steven said:
But according to Fredrik, string accumulation is still quadratic, even
with the optimizations added to Python 2.4. Quoting:

"it only means that if the interpreter can make sure that else is
using the target string, it's modified in place.

however, the string type doesn't use overallocation (beyond the
8-byte alignment provided by the memory allocator), so this fix
only avoids extra copying in a few specific cases. O(n*n/8) is
still quadratic..."

I presume Fredrik meant to say "nothing else".

correct.

the only way to get amortized O(n) behaviour is by adding small
strings to the end of a larger string, and hope that the memory
allocator can satisfy all reallocations without too much copying.

the list implementation, in contrast, uses controlled overallocation
and can therefore guarantee amortized O(n) even if realloc always
fails to reallocate in place.
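
a toy illustration of the idea (not how CPython actually sizes its lists, just the general shape of it):

# toy growable buffer that over-allocates by ~1/8 on each resize.
# because the spare slack grows with the buffer, n appends trigger
# only O(log n) resizes and the total copying stays O(n), even if
# every "realloc" has to copy everything.
class GrowBuffer:
    def __init__(self):
        self.slots = []     # allocated storage
        self.used = 0       # slots actually filled

    def append(self, ch):
        if self.used == len(self.slots):
            newsize = max(8, len(self.slots) + (len(self.slots) >> 3))
            self.slots.extend([None] * (newsize - len(self.slots)))
        self.slots[self.used] = ch
        self.used += 1

    def tostring(self):
        return ''.join(self.slots[:self.used])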

</F>
 

Tuvas

The times that I posted were the time it took to perform ONE row
iteration. As you can see, the time was going up fairly dramatically.
Why on earth could it be doing this? I understand that the time will
fluctuate somewhat depending upon what else the CPU is doing, but why
is the base time increasing so much for one machine doing something one
way, and not for another machine, apparently identically configured,
doing the same operation? That is the great question I have. Still, it
seems as though it cannot be solved as to why it's doing this. The only
thing I can think of that might be different is perhaps these results
came off of a slightly newer version of Python. One great problem is
that the data is read in streaming, i.e., the data enters this function as
a list (or tuple, I always mix those two up, but the one that uses
[]'s, not ()'s). Oh well, the solution will come, eventually. Thanks
for the help!
 

Tuvas

I have made one confirmation. The only identifiable difference that I
have seen is that one runs on Python 2.4.2, and the other on 2.4.1. Oddly
enough, it's the first one that is having more problems
than the second... Why that is, I don't know. It still could be
something else, but...
 

Marc 'BlackJack' Rintsch

The modified version of my code is now as follows: (Note, a few small
changes have been made to simplify things, however, these things don't
apply to a full-scale picture, so they shouldn't slow anything down in
the slightest.)

def load_pic_data(width,heigth,inpdat, filt=TRUE):
    ldata=[]
    total=0
    tnum=0
    size=100
    array=[]
    for y in range(0,heigth):
        row=[]
        ts=time.time()
        for x in range(0,width):
            index=2*(x+y*width)
            num=ord(inpdat[index+1])*256+ord(inpdat[index])
            if(vfilter.get() and d_filter and filt):
                num=round((num-(d_filter[index/2])))
            if(num<0):
                num=0
            if(num>255*64):
                num=255*64
            tba=chr(num/64)
            row.append(tba)
        srow=''.join(row)
        ldata.append(srow)
        print y,time.time()-ts
    data=''.join(ldata)

Do you always work on the whole `width` and `height` of the picture? If
yes, do you really need the `x` and `y` coordinates? The snippet above
would work if you just process all values as one sequence.

And using `array.array` for `inpdat` and the result may speed things up
too::

from array import array

# Replaces the body of load_pic_data(): one flat pass over all pixels.
inpdat_array = array('H', inpdat)
data = array('c')
for index in xrange(width * heigth):
    num = inpdat_array[index]
    if vfilter.get() and d_filter and filt:
        num = round(num - d_filter[index])
    if num < 0:
        num = 0
    if num > 255 * 64:
        num = 255 * 64
    data.append(chr(int(num) // 64))
return data.tostring()

If you don't need the `index` somewhere in the loop you can even get rid
of it by always subtracting the filter value and supplying a "zero filter"
if the condition of the first ``if`` is not fulfilled::

from array import array
from itertools import izip, repeat

inpdat_array = array('H', inpdat)
data = array('c')

# Use an endless stream of zeros when no real filter should be applied.
if not (vfilter.get() and d_filter and filt):
    filter_values = repeat(0)
else:
    filter_values = d_filter

for num, filter_value in izip(inpdat_array, filter_values):
    num = round(num - filter_value)
    if num < 0:
        num = 0
    if num > 255 * 64:
        num = 255 * 64
    data.append(chr(int(num) // 64))
return data.tostring()

Ciao,
Marc 'BlackJack' Rintsch
 

Tuvas

The reason it's done with width and height is that there are some other,
non-related processing functions going on at the same time.
I found the source of the slow-down: when 2
non-important lines of code were commented out, the program worked
perfectly.

if(vfilter.get() and d_filter and filt):
    num=round((num-(d_filter[index/2])))

I don't know why these lines of code are causing a slow-down, but they
are for some reason. What I will do is simply combine these 3
variables (which won't change from image to image) into one variable, which
will not have the problem that I've seen. But thanks for all of your
help!
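
A minimal sketch of that idea (apply_filter is just a made-up name; this assumes the three values really don't change while one picture is being processed):

# Evaluate the filter test once per image instead of once per pixel.
apply_filter = bool(vfilter.get() and d_filter and filt)

for x in range(0, width):
    index = 2 * (x + y * width)
    num = ord(inpdat[index + 1]) * 256 + ord(inpdat[index])
    if apply_filter:
        num = round(num - d_filter[index / 2])
    # ... rest of the inner loop unchanged ...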
 
