memory error

Discussion in 'Python' started by Bart Nessux, Jun 2, 2004.

  1. Bart Nessux

    Bart Nessux Guest

    def windows():
        import os
        excludes = ['hiberfil.sys', 'ipnathlp.dll', 'helpctr.exe', 'etc',
                    'etc', 'etc']
        size_list = []
        for root, dirs, files in os.walk('/'):
            total = [x for x in files if x not in excludes]
            for t in total:
                s = file(os.path.join(root,t))
                size = s.read()
                size_list.append(size)
                s.close()

    windows()

    The above function crashes with a memory error on Windows XP Pro at the
    'size = s.read()' line. Page File usage (normally ~120 MB) will rise to
    300+ MB and pythonw.exe will consume about 200 MB of actual ram right
    before the crash. The machine has 512 MB of ram and is doing nothing
    else while running the script.

    I've written the script several ways, all with the same result. I've
    noticed that a binary read 'rb' consumes almost twice as much physical
    memory and causes the crash to happen quicker, but that's about it.

    Initially, I wanted to use Python to open every file on the system (that
    could be opened) and read the contents so I'd know the size of the file
    and then add all of the reads to a list that I'd sum up. Basically
    attempt to add up all bytes on the machine's disk drive.

    Any ideas on what I'm doing wrong or suggestions on how to do this
    differently?

    Thanks,

    Bart
     
    Bart Nessux, Jun 2, 2004
    #1

  2. On Wednesday, June 2, 2004 at 15:11, Bart Nessux wrote:
    > size = s.read()


    You read the complete content of the file here. size will not contain the
    length of the file, but the complete file data. What you want is either
    len(s.read()) (which is sloooooooooow), or have a look at os.path.getsize().

    > size_list.append(size)


    This appends the complete file to the list. And as such should explain the
    memory usage you're seeing...
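
    A rough sketch of the getsize() approach (untested here; the excludes
    tuple is just a placeholder for your own list):

    ```python
    import os

    def total_bytes(top, excludes=()):
        # Walk the tree and sum sizes taken from directory metadata --
        # os.path.getsize() never loads the file contents into memory.
        total = 0
        for root, dirs, files in os.walk(top):
            for name in files:
                if name in excludes:
                    continue
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    pass  # skip files we can't stat (locked, no permission)
        return total
    ```

    Memory usage stays flat no matter how big the files are, since only
    one integer per file ever exists.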

    HTH!

    Heiko.
     
    Heiko Wundram, Jun 2, 2004
    #2

  3. Bart Nessux wrote:
    > def windows():
    >     import os
    >     excludes = ['hiberfil.sys', 'ipnathlp.dll', 'helpctr.exe', 'etc',
    >                 'etc', 'etc']
    >     size_list = []
    >     for root, dirs, files in os.walk('/'):
    >         total = [x for x in files if x not in excludes]
    >         for t in total:
    >             s = file(os.path.join(root,t))
    >             size = s.read()
    >             size_list.append(size)
    >             s.close()
    >
    > windows()
    >
    > The above function crashes with a memory error on Windows XP Pro at the
    > 'size = s.read()' line. Page File usage (normally ~120 MB) will rise to
    > 300+ MB and pythonw.exe will consume about 200 MB of actual ram right
    > before the crash. The machine has 512 MB of ram and is doing nothing
    > else while running the script.
    >
    > I've written the script several ways, all with the same result. I've
    > noticed that a binary read 'rb' consumes almost twice as much physical
    > memory and causes the crash to happen quicker, but that's about it.
    >
    > Initially, I wanted to use Python to open every file on the system (that
    > could be opened) and read the contents so I'd know the size of the file
    > and then add all of the reads to a list that I'd sum up. Basically
    > attempt to add up all bytes on the machine's disk drive.
    >
    > Any ideas on what I'm doing wrong or suggestions on how to do this
    > differently?

    You're building a list containing the *contents* of all your files.
    If you really need to use read(), use "size = len(s.read())", but this
    still requires reading and holding a complete file in memory at a time
    (and probably chokes when it stumbles over your divx collection ;)
    I think using os.stat() should be better...
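
    For instance (a quick sketch -- st_size comes from the directory
    entry, so nothing is read into memory):

    ```python
    import os

    def file_size(path):
        # os.stat() returns only metadata; the st_size field is the
        # file's length in bytes, no contents loaded.
        return os.stat(path).st_size
    ```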
     
    Benjamin Niemann, Jun 2, 2004
    #3
  4. Bart Nessux

    fishboy Guest

    On Wed, 02 Jun 2004 09:11:13 -0400, Bart Nessux
    <> wrote:

    > def windows():
    >     import os
    >     excludes = ['hiberfil.sys', 'ipnathlp.dll', 'helpctr.exe', 'etc',
    >                 'etc', 'etc']
    >     size_list = []
    >     for root, dirs, files in os.walk('/'):
    >         total = [x for x in files if x not in excludes]
    >         for t in total:
    >             s = file(os.path.join(root,t))
    >             size = s.read()
    >             size_list.append(size)
    >             s.close()
    >
    > windows()


    Yeah, what the other guys said about os.stat and os.path.getsize.
    Also, if you really want to read the actual file into memory, just get
    small chunks and add those up.

    Like (untested):

    numberofbytes = 0
    CHUNKSIZE = 4096
    for root, dirs, files in os.walk('/'):
        for name in files:
            if name not in excludes:
                f = file(os.path.join(root, name))
                while 1:
                    s = f.read(CHUNKSIZE)
                    if not s:
                        f.close()
                        break
                    numberofbytes += len(s)

    This way you never have more than 4k of data in memory at once.
    (Well, it might be 8k; I don't know enough about the internals to tell
    you when the previous 's' is garbage collected.)

    ><{{{*>
     
    fishboy, Jun 2, 2004
    #4
