memory error

Bart Nessux

def windows():
    import os
    excludes = ['hiberfil.sys', 'ipnathlp.dll', 'helpctr.exe', 'etc',
                'etc', 'etc']
    size_list = []
    for root, dirs, files in os.walk('/'):
        total = [x for x in files if x not in excludes]
        for t in total:
            s = file(os.path.join(root,t))
            size = s.read()
            size_list.append(size)
            s.close()

windows()

The above function crashes with a memory error on Windows XP Pro at the
'size = s.read()' line. Page file usage (normally ~120 MB) rises to
300+ MB, and pythonw.exe consumes about 200 MB of physical RAM right
before the crash. The machine has 512 MB of RAM and is doing nothing
else while running the script.

I've written the script several ways, all with the same result. The only
difference I've noticed is that reading in binary mode ('rb') consumes almost
twice as much physical memory and makes the crash happen sooner.

Initially, I wanted to use Python to open every file on the system (that
could be opened) and read its contents so I'd know the size of each file,
then add all of the reads to a list that I'd sum up. Basically, I'm
attempting to add up all the bytes on the machine's disk drive.

Any ideas on what I'm doing wrong or suggestions on how to do this
differently?

Thanks,

Bart

Heiko Wundram

On Wednesday, 2 June 2004 15:11, Bart Nessux wrote:
size = s.read()

You read the complete content of the file here. size will not contain the
length of the file, but the complete file data. What you want is either
len(s.read()) (which is sloooooooooow), or have a look at os.path.getsize().
size_list.append(size)

This appends the complete file contents to the list, which should explain
the memory usage you're seeing...
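A minimal sketch of the os.path.getsize() approach, assuming the goal is simply the sum of all file sizes under a directory. The function name total_size and its excludes parameter are hypothetical; no file data is ever read, so memory use stays flat no matter how large the files are.

```python
import os


def total_size(top, excludes=()):
    """Sum file sizes under `top` via os.path.getsize(); no file data is read."""
    total = 0
    for root, dirs, files in os.walk(top):
        for name in files:
            if name in excludes:
                continue  # excluded by bare file name, as in the original post
            path = os.path.join(root, name)
            try:
                total += os.path.getsize(path)
            except OSError:
                # locked, vanished or otherwise inaccessible entries are skipped
                pass
    return total
```

Called as total_size('/', excludes=['hiberfil.sys', 'ipnathlp.dll', 'helpctr.exe']), it should walk the whole drive while holding only one directory listing at a time.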

HTH!

Heiko.

Benjamin Niemann

You're building an array containing the *contents* of all your files.
If you really need to use read(), use "size = len(s.read())", but this
still requires reading and holding a complete file in memory at a time (and
probably chokes when it stumbles over your divx collection ;)
I think using os.stat() would be better...
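The two options above can be compared side by side in a small sketch. The helper names size_by_reading and size_by_stat are hypothetical. The file is opened in binary mode, on the assumption that text mode on Windows would translate line endings and make len() disagree with the on-disk size.

```python
import os


def size_by_reading(path):
    # len(read()) approach: the whole file sits in memory while len() runs
    with open(path, 'rb') as f:
        return len(f.read())


def size_by_stat(path):
    # os.stat() only asks the filesystem for metadata; no file data is read
    return os.stat(path).st_size
```

For a regular file both should return the same number, but only size_by_stat stays cheap for multi-gigabyte files.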

fishboy

Yeah, what the other guys said about os.stat and os.path.getsize.
Also, if you really want to read the actual file data, just read it in
small chunks and add up their lengths.

Like (untested):

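import os

# exclusion list from the original post
excludes = ['hiberfil.sys', 'ipnathlp.dll', 'helpctr.exe']
numberofbytes = 0
CHUNKSIZE = 4096
for root, dirs, files in os.walk('/'):
    for name in files:
        if name not in excludes:
            try:
                # binary mode, so Windows doesn't translate line endings
                f = open(os.path.join(root, name), 'rb')
            except IOError:
                continue  # skip files that can't be opened
            while True:
                s = f.read(CHUNKSIZE)
                if not s:
                    f.close()
                    break
                numberofbytes += len(s)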

This way you never have more than 4k of data in memory at once.
(Well, it might be 8k; I don't know enough about the internals to tell
you when the previous 's' is garbage collected.)
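The same chunked loop can be packaged as a per-file helper; a sketch, with count_bytes as a hypothetical name. It uses iter() with a sentinel so the loop ends at end of file, and a with block so the file is closed even if a read fails. On the 8k question: in CPython the new chunk is created before the loop variable is rebound, so two chunks can briefly coexist, and the old one is freed by reference counting as soon as the name is rebound.

```python
from functools import partial

CHUNKSIZE = 4096


def count_bytes(path, chunksize=CHUNKSIZE):
    """Count a file's bytes by reading it in fixed-size chunks."""
    total = 0
    with open(path, 'rb') as f:
        # iter(callable, sentinel) keeps calling f.read(chunksize)
        # until it returns b'' (end of file)
        for chunk in iter(partial(f.read, chunksize), b''):
            total += len(chunk)
    return total
```

Inside the os.walk loop above, numberofbytes += count_bytes(os.path.join(root, name)) would replace the inner while loop.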