How long can a str be in this Python code segment?

Discussion in 'Python' started by Stephen.Wu, Feb 1, 2010.

  1. Stephen.Wu

    Stephen.Wu Guest

    tmp = file.read()  # very huge file
    if targetStr in tmp:
        print("find it")
    else:
        print("not find")
    file.close()

    I found that once the file read by file.read() is large enough, it doesn't
    work. Could anyone give me some definite information on this problem?
     
    Stephen.Wu, Feb 1, 2010
    #1

  2. Chris Rebert

    Chris Rebert Guest

    On Mon, Feb 1, 2010 at 1:17 AM, Stephen.Wu <> wrote:
    > tmp = file.read()  # very huge file
    > if targetStr in tmp:
    >    print("find it")
    > else:
    >    print("not find")
    > file.close()
    >
    > I found that once the file read by file.read() is large enough, it doesn't
    > work. Could anyone give me some definite information on this problem?


    If the file's contents are larger than available memory, you'll get a
    MemoryError. To avoid this, you can read the file in chunks (or, if
    applicable, line by line) and check whether each chunk/line matches.

    Cheers,
    Chris
    --
    http://blog.rebertia.com
     
    Chris Rebert, Feb 1, 2010
    #2
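Chris's line-by-line suggestion can be sketched as follows. This is a minimal Python 3 illustration, not code from the thread: the function name file_contains_by_lines and the sample file contents are hypothetical. It opens the file in binary mode, so the target must be a bytes object, and it only finds matches that fit on a single line. A file with no newlines at all would still be read as one huge line, so this only helps for line-oriented data.

```python
import os
import tempfile


def file_contains_by_lines(path, target):
    """Return True if any single line of the file contains the bytes `target`.

    Iterating over a file object yields one line at a time, so only the
    current line is held in memory. A `target` that contains a newline can
    never match, because matches spanning two lines are not checked.
    """
    with open(path, "rb") as f:  # binary mode: no decoding, compare bytes
        for line in f:
            if target in line:
                return True
    return False


# Demo on a small temporary file standing in for the "very huge file".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first line\nsecond line with needle\nthird line\n")
try:
    found = file_contains_by_lines(tmp.name, b"needle")
    missing = file_contains_by_lines(tmp.name, b"haystack")
finally:
    os.remove(tmp.name)
print("find it" if found else "not find")
```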

  3. Gary Herron

    Gary Herron Guest

    Stephen.Wu wrote:
    > tmp = file.read()  # very huge file
    > if targetStr in tmp:
    >     print("find it")
    > else:
    >     print("not find")
    > file.close()
    >
    > I found that once the file read by file.read() is large enough, it doesn't
    > work. Could anyone give me some definite information on this problem?
    >
    >


    Python has no specific limit on string size other than available memory
    and, on a 32-bit build, the 32-bit address space. However, if your file
    is even a fraction of that size, you should not attempt to read it all
    into memory at once. Is there not a way to process your file in batches
    of a reasonable size?

    Gary Herron
     
    Gary Herron, Feb 1, 2010
    #3
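The hard upper bound Gary alludes to can be inspected directly. A minimal sketch, written for Python 3 (sys.maxsize also exists in Python 2.6+): CPython stores the length of a str or bytes object in a Py_ssize_t, and sys.maxsize is the largest value that type can hold, so no string can be longer than this. On a 64-bit build the limit is so large that available memory is what actually stops file.read().

```python
import sys

# sys.maxsize is the largest value a Py_ssize_t can hold. CPython stores the
# length of a str/bytes object in a Py_ssize_t, so this is a hard upper bound
# on string length; in practice available memory runs out long before it.
print(sys.maxsize)  # 2**63 - 1 on typical 64-bit builds, 2**31 - 1 on 32-bit

# Check recommended by the `platform` module docs for detecting a 64-bit build.
is_64bit = sys.maxsize > 2**32
print("64-bit build" if is_64bit else "32-bit build")
```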
  4. Stephen.Wu

    Stephen.Wu Guest

    On Feb 1, 5:26 pm, Chris Rebert <> wrote:
    > On Mon, Feb 1, 2010 at 1:17 AM, Stephen.Wu <> wrote:
    > > tmp = file.read()  # very huge file
    > > if targetStr in tmp:
    > >    print("find it")
    > > else:
    > >    print("not find")
    > > file.close()
    >
    > > I found that once the file read by file.read() is large enough, it doesn't
    > > work. Could anyone give me some definite information on this problem?
    >
    > If the file's contents are larger than available memory, you'll get a
    > MemoryError. To avoid this, you can read the file in chunks (or, if
    > applicable, line by line) and check whether each chunk/line matches.
    >
    > Cheers,
    > Chris
    > --http://blog.rebertia.com


    Actually, I already use the file.read(length) approach; I just want to
    know exactly what value of length I should pass. After some trials, I'm
    afraid the workable length doesn't equal the amount of physical memory...
     
    Stephen.Wu, Feb 1, 2010
    #4
  5. Stefan Behnel

    Stephen.Wu, 01.02.2010 10:17:
    > tmp = file.read()  # very huge file
    > if targetStr in tmp:
    >     print("find it")
    > else:
    >     print("not find")
    > file.close()
    >
    > I found that once the file read by file.read() is large enough, it doesn't
    > work. Could anyone give me some definite information on this problem?


    Others have already pointed out that reading the entire file into memory is
    not a good idea. Try reading chunks repeatedly instead.

    As it appears that you are simply trying to find out whether a file
    contains a specific byte sequence, you might find acora interesting:

    http://pypi.python.org/pypi/acora

    Also note that there are usually platform-optimised tools available to
    search content in files, e.g. grep. It's basically impossible to beat their
    raw speed even with hand-tuned Python code, so running the right tool via
    the subprocess module might be a solution.

    Stefan
     
    Stefan Behnel, Feb 1, 2010
    #5
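Stefan's grep suggestion could look roughly like this. It is a sketch under assumptions: it needs Python 3.5+ for subprocess.run, assumes a grep on PATH that supports -q (quiet) and -F (fixed string, not a regex) as POSIX grep does, and relies on grep's exit-status convention (0 = match, 1 = no match, greater than 1 = error). The function name grep_contains is hypothetical. Because grep matches line by line, the target must not contain a newline.

```python
import os
import shutil
import subprocess
import tempfile


def grep_contains(path, target):
    """Return True if the fixed string `target` occurs in the file at `path`.

    Delegates the search to an external grep: -q suppresses output and
    -F treats `target` as a literal string rather than a regular expression.
    "--" ends option parsing so a target starting with "-" is not read as a
    flag. grep exits with 0 on a match, 1 on no match and >1 on error.
    """
    result = subprocess.run(["grep", "-q", "-F", "--", target, path])
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError("grep failed with exit status %d" % result.returncode)


# Demo on a small temporary file; skipped if no grep is installed.
if shutil.which("grep"):
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"alpha\nbeta needle gamma\n")
    try:
        found = grep_contains(tmp.name, "needle")
        missing = grep_contains(tmp.name, "absent")
    finally:
        os.remove(tmp.name)
    print("find it" if found else "not find")
```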
  6. MRAB

    MRAB Guest

    Chris Rebert wrote:
    > On Mon, Feb 1, 2010 at 1:17 AM, Stephen.Wu <> wrote:
    >> tmp = file.read()  # very huge file
    >> if targetStr in tmp:
    >>     print("find it")
    >> else:
    >>     print("not find")
    >> file.close()
    >>
    >> I found that once the file read by file.read() is large enough, it doesn't
    >> work. Could anyone give me some definite information on this problem?
    >
    > If the file's contents are larger than available memory, you'll get a
    > MemoryError. To avoid this, you can read the file in chunks (or, if
    > applicable, line by line) and check whether each chunk/line matches.
    >

    If you're processing in chunks then you also need to consider the
    possibility that what you're looking for crosses a chunk boundary, of
    course. It's an easy case to miss! :)
     
    MRAB, Feb 1, 2010
    #6
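One common way to handle the boundary case MRAB raises is to carry the last len(target) - 1 bytes of what has been read so far into the next search, so a match split across two reads is still seen. A minimal Python 3 sketch of that idea; the function name file_contains and the 64 KiB default chunk size are illustrative choices, not from the thread. It searches bytes, so a text target such as targetStr would need encoding first (e.g. targetStr.encode("utf-8"), assuming the file is UTF-8).

```python
import os
import tempfile


def file_contains(path, target, chunk_size=64 * 1024):
    """Return True if the bytes `target` occur anywhere in the file.

    Reads `chunk_size` bytes at a time. The last len(target) - 1 bytes seen
    so far are prepended to the next chunk, so a match that straddles a
    chunk boundary is still found while memory stays near one chunk.
    """
    if not target:
        return True  # the empty byte string is contained in everything
    overlap = len(target) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False  # end of file, no match
            window = tail + chunk
            if target in window:
                return True
            # Keep just enough bytes to complete a match split across reads.
            tail = window[-overlap:] if overlap else b""


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdefghij")
try:
    # With chunk_size=4 the reads are b"abcd", b"efgh", b"ij", so b"def"
    # straddles the first boundary; a naive `target in chunk` would miss it.
    straddling = file_contains(tmp.name, b"def", chunk_size=4)
    absent = file_contains(tmp.name, b"xyz", chunk_size=4)
finally:
    os.remove(tmp.name)
print(straddling, absent)  # True False
```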
  7. Antoine Pitrou

    On Mon, 01 Feb 2010 01:33:09 -0800, Stephen.Wu wrote:
    >
    > Actually, I already use the file.read(length) approach; I just want to
    > know exactly what value of length I should pass. After some trials, I'm
    > afraid the workable length doesn't equal the amount of physical memory...


    There's no exact length you "should" set; just set something big enough
    that looping doesn't add any noticeable overhead, but small enough that
    it doesn't take too much memory. Something between 64 kB and 1 MB sounds
    reasonable.
     
    Antoine Pitrou, Feb 1, 2010
    #7
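The trade-off Antoine describes can be made concrete by counting how many read() calls a given chunk size costs. In this hypothetical Python 3.6+ helper, count_reads, the file data held in memory at any moment is about one chunk, while the number of loop iterations is roughly the file size divided by the chunk size. The 4 MiB test file and the three chunk sizes are arbitrary illustrative values.

```python
import os
import tempfile


def count_reads(path, chunk_size):
    """Read the whole file in chunk_size pieces; return (reads, largest piece).

    Only one chunk of file data is held at a time, and the number of loop
    iterations is about file_size / chunk_size.
    """
    reads = 0
    largest = 0
    with open(path, "rb") as f:
        # iter() with a sentinel calls f.read(chunk_size) until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            reads += 1
            largest = max(largest, len(chunk))
    return reads, largest


size = 4 * 1024 * 1024  # a 4 MiB test file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * size)
try:
    results = {cs: count_reads(tmp.name, cs)
               for cs in (4 * 1024, 64 * 1024, 1024 * 1024)}
finally:
    os.remove(tmp.name)
for cs, (reads, largest) in results.items():
    print(f"chunk_size={cs:>8}: {reads:>5} reads, at most {largest} bytes held")
```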

Similar Threads
  1. David
    Replies: 2, Views: 655 (Thomas G. Marshall, Aug 3, 2003)
  2. Trevor

    sizeof(str) or sizeof(str) - 1 ?

    Trevor, Apr 3, 2004, in forum: C Programming
    Replies: 9, Views: 955 (CBFalconer, Apr 10, 2004)
  3. Sullivan WxPyQtKinter

    It is fun.the result of str.lower(str())

    Sullivan WxPyQtKinter, Mar 7, 2006, in forum: Python
    Replies: 5, Views: 517 (Tim Roberts, Mar 9, 2006)
  4. Stefan Ram

    str.equals(null) or str==null ?

    Stefan Ram, Jul 31, 2006, in forum: Java
    Replies: 21, Views: 15,438 (Oliver Wong, Aug 3, 2006)
  5. maestro
    Replies: 1, Views: 432 (Chris, Aug 11, 2008)
  6. Casey Hawthorne
    Replies: 1, Views: 927 (Arne Vajhøj, Mar 18, 2009)
  7. Marco
    Replies: 6, Views: 1,733 (Marco, May 17, 2012)
  8. Mark Janssen
    Replies: 0, Views: 269 (Mark Janssen, Apr 12, 2013)