Memory leak?


Kim Petersen

Memory leak: malloc/free implementation, GC kicking in late, known bug, or something else?

Using python-2.2.2-26 on RH9 (Shrike) x86, fully patched.

The following program slowly eats up more and more memory when run on
large datasets... can anyone tell what the trouble is?

I've run it on up to 240,000 record sets so far, and it eats about 0.1% of my
memory per 1000 record sets (the exact amount doesn't really matter, does it?).

--
Kind regards

Kim Petersen - Kyborg A/S (Development)
IT - Innovationshuset
Havneparken 2
7100 Vejle
Tel. +4576408183 || Fax. +4576408188

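#!/usr/bin/python
#
# Created: 13:32 10/07-2003 by Kim Petersen <[email protected]>
#
# $Id$
from __future__ import generators
import gzip
import re

err1=re.compile(r"^'ERROR:\s+(.*?)' in '(.*)'\s*$")

def iterator(file):
   # Yield lines one at a time, reading roughly 1000 bytes' worth of
   # lines per call to readlines().
   buffer=[]
   while 1:
      if not buffer:
         buffer=file.readlines(1000)
         if not buffer:
            # readlines() returns an empty list at end of file; a bare
            # "raise" here would fail, so just end the generator.
            return
      line=buffer[0]
      del buffer[0]
      yield line

def getrec(lines):
   # Collect lines up to the next blank line: the last line holds the
   # dataset, the lines before it hold the error message.
   result=[]
   while 1:
      try:
         line=lines.next().rstrip()
      except StopIteration:
         # Input exhausted; return whatever has been collected so far.
         break
      if not line: break
      result.append(line)
   if not result: return None
   (error,dataset)=(result[:-1],eval(result[-1]))
   error=''.join(error)[16:]
   return error,dataset

if __name__ == "__main__":
   import sys

   lines=iterator(gzip.open("error.txt.gz"))
   i=0
   while 1:
      # Print a progress counter every 1000 records
      if (i%1000)==0:
         sys.stdout.write("%-10.10d\r" % (i,))
         sys.stdout.flush()
      rec=getrec(lines)
      if not rec: break
      (errline,dataset)=rec
      if not err1.match(errline):
         # Report error lines that do not match the expected pattern
         sys.stdout.write("%s\n" % (errline,))
         sys.stdout.write("%-10.10d\r" % (i,))
         sys.stdout.flush()
      i+=1
   sys.stdout.write("%-10.10d\n" % (i,))
   sys.stdout.flush()

# Local Variables:
# tab-width: 3
# py-indent-offset: 3
# End: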
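For readers on a current Python 3, file objects (including the ones returned by gzip.open) can be iterated line by line directly, so the hand-written buffering in iterator() is not strictly needed. Below is a minimal sketch, assuming the same layout getrec() expects (records separated by blank lines). The names records and read_gzip_records are hypothetical, and unlike getrec() it skips runs of blank lines instead of stopping at them, and it leaves the dataset line as text rather than passing it to eval().

```python
import gzip


def records(lines):
    """Yield each blank-line-separated record as a list of stripped lines.

    Runs of blank lines are skipped, and a final record is still yielded
    when the input does not end with a blank line.
    """
    record = []
    for line in lines:
        line = line.rstrip()
        if line:
            record.append(line)
        elif record:
            yield record
            record = []
    if record:
        yield record


def read_gzip_records(path):
    # Mode "rt" (Python 3.3+) opens the compressed file as text; iterating
    # over the handle reads lines lazily, so memory use stays flat.
    with gzip.open(path, "rt") as handle:
        for record in records(handle):
            yield record
```

The last line of each record is still the dataset text; decoding it without eval() is the subject of the rest of the thread.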

A.M. Kuchling

> Using python-2.2.2-26 on RH9 (shrike) x86 -fully patched
>
> The following program slowly eats up more and more memory when run on
> large datasets... can anyone tell what the trouble is?

Your code uses eval(), which is pretty heavyweight because it has to
tokenize, parse, and then evaluate the string. There have been a few memory
leaks in eval(), and perhaps you're running into one of them. Try using
int() or float() to convert strings to numbers instead of eval. As a bonus,
your program will be faster and much more secure (could an attacker tweak
your logfiles so you end up eval()ing os.unlink('/etc/passwd')?).

In general, using eval() is almost always a mistake; few programs need to
take arbitrary expressions as input.

--amk
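As a minimal sketch of the int()/float() suggestion, assuming the field holds a single number: try int() first and fall back to float(). A string that neither can parse, such as the os.unlink example, raises ValueError instead of being executed. parse_number is a hypothetical helper name, and the code is written for Python 3.

```python
def parse_number(text):
    """Convert a numeric string without eval(): int first, then float.

    Strings that float() cannot parse either (for example an expression
    such as "os.unlink('/etc/passwd')") raise ValueError instead of running.
    """
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        # float() also accepts forms such as "2.5", "1e3" and "-0.1"
        return float(text)
```

For example, parse_number("42") gives the int 42 and parse_number("1e3") gives 1000.0.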

Kim Petersen

A.M. Kuchling said:
> Your code uses eval(), which is pretty heavyweight because it has to
> tokenize, parse, and then evaluate the string. There have been a few memory
> leaks in eval(), and perhaps you're running into one of them. Try using
> int() or float() to convert strings to numbers instead of eval. As a bonus,
> your program will be faster and much more secure (could an attacker tweak
> your logfiles so you end up eval()ing os.unlink('/etc/passwd')?).

Thank you very much, it was eval().

This solved my problem (I now call get_list instead of eval). Is there a
more generic or efficient way of reading a list or expression? (I know the
version below fails for some strings, for instance.)

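import traceback

def get_value(s):
   # Convert one item of a printed list/tuple back into a Python value
   s=s.strip()
   if s.lower()=='none':
      return None
   elif s[0] in ['"',"'"]:
      return s[1:-1]
   elif s[-1]=='j':
      return complex(s)
   elif '.' in s or 'e' in s.lower():
      return float(s)
   else:
      return int(s)

def get_list(s):
   # Rebuild a list or tuple from its printed form; commas inside quoted
   # strings are not handled, since items are split on ', '
   try:
      s=s.strip()
      if s[0]=='(':
         robj=tuple
      else:
         robj=list
      body=s[1:-1].strip()
      if not body:
         return robj()
      items=body.split(', ')
      return robj(map(get_value,items))
   except:
      traceback.print_exc()
      print s
      return []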
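In later Python versions (2.6 and up, so not the 2.2 used in this thread) the standard library answers this question directly: ast.literal_eval() parses only literal syntax (strings, numbers, tuples, lists, dicts, booleans, None and a few more) and rejects anything else, including function calls. A minimal sketch in Python 3 follows; parse_literal is a hypothetical wrapper name. Unlike the split(', ') approach above, it copes with commas inside quoted strings.

```python
import ast


def parse_literal(text):
    """Turn a printed list/tuple (or other literal) back into a value.

    ast.literal_eval() evaluates literal syntax only; an expression such
    as os.unlink(...) raises ValueError instead of being executed.
    """
    return ast.literal_eval(text.strip())
```

For example, parse_literal("(1, 'x')") returns the tuple (1, 'x'), and parse_literal("[1, 'a, b', None]") keeps 'a, b' as a single string.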
