How long can a str be in this Python code segment?

Discussion in 'Python' started by Stephen.Wu, Feb 1, 2010.

  1. Stephen.Wu

    Stephen.Wu Guest

    tmp = file.read()  # very huge file
    if targetStr in tmp:
        print "find it"
    else:
        print "not find"
    file.close()

    I found that if what file.read() returns is huge enough, it doesn't work.
    Could anyone give me some definite information on this problem?
    Stephen.Wu, Feb 1, 2010
    #1

  2. Chris Rebert

    Chris Rebert Guest

    On Mon, Feb 1, 2010 at 1:17 AM, Stephen.Wu <> wrote:
    > tmp=file.read() (very huge file)
    > if targetStr in tmp:
    >    print "find it"
    > else:
    >    print "not find"
    > file.close()
    >
    > I found that if what file.read() returns is huge enough, it doesn't work.
    > Could anyone give me some definite information on this problem?


    If the file's contents are larger than available memory, you'll get a
    MemoryError. To avoid this, you can read the file in chunks (or, if
    applicable, line by line) and check whether each chunk/line matches.
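
    A minimal sketch of the line-by-line variant, assuming the target is a
    byte string that contains no newline. The function name and parameters
    are hypothetical, and a file with one enormous line would still end up
    in memory at once:

```python
def file_contains_line_by_line(path, target):
    """Return True if `target` (bytes without a newline) occurs in some line."""
    if b"\n" in target:
        # A match that spans two lines can never be seen one line at a time.
        raise ValueError("target must not contain a newline")
    with open(path, "rb") as f:
        for line in f:  # the file object yields one line per iteration
            if target in line:
                return True
    return False
```

    Only one line is held in memory at a time, so the file size no longer
    matters as long as individual lines stay reasonably short.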

    Cheers,
    Chris
    --
    http://blog.rebertia.com
    Chris Rebert, Feb 1, 2010
    #2

  3. Gary Herron

    Gary Herron Guest

    Stephen.Wu wrote:
    > tmp=file.read() (very huge file)
    > if targetStr in tmp:
    >    print "find it"
    > else:
    >    print "not find"
    > file.close()
    >
    > I found that if what file.read() returns is huge enough, it doesn't work.
    > Could anyone give me some definite information on this problem?
    >
    >


    Python has no specific limit on string size other than available memory
    and, on a 32-bit build, the address space. However, if your file size is
    even a fraction of that size, you should not attempt to read it all into
    memory at once. Is there not a way to process your file in batches of a
    reasonable size?

    Gary Herron
    Gary Herron, Feb 1, 2010
    #3
  4. Stephen.Wu

    Stephen.Wu Guest

    On Feb 1, 5:26 pm, Chris Rebert <> wrote:
    > On Mon, Feb 1, 2010 at 1:17 AM, Stephen.Wu <> wrote:
    > > tmp=file.read() (very huge file)
    > > if targetStr in tmp:
    > >    print "find it"
    > > else:
    > >    print "not find"
    > > file.close()

    >
    > > I found that if what file.read() returns is huge enough, it doesn't work.
    > > Could anyone give me some definite information on this problem?

    >
    > If the file's contents are larger than available memory, you'll get a
    > MemoryError. To avoid this, you can read the file in chunks (or, if
    > applicable, line by line) and check whether each chunk/line matches.
    >
    > Cheers,
    > Chris
    > --
    > http://blog.rebertia.com


    Actually, I am already using file.read(length). I just want to know
    exactly what value of length I should set. Judging from my trials, I'm
    afraid the limit isn't equal to the amount of physical memory...
    Stephen.Wu, Feb 1, 2010
    #4
  5. Stephen.Wu, 01.02.2010 10:17:
    > tmp=file.read() (very huge file)
    > if targetStr in tmp:
    >    print "find it"
    > else:
    >    print "not find"
    > file.close()
    >
    > I found that if what file.read() returns is huge enough, it doesn't work.
    > Could anyone give me some definite information on this problem?


    Others have already pointed out that reading the entire file into memory is
    not a good idea. Try reading chunks repeatedly instead.

    Since it appears that you are simply trying to find out whether a file
    contains a specific byte sequence, you might find acora interesting:

    http://pypi.python.org/pypi/acora

    Also note that there are usually platform optimised tools available to
    search content in files, e.g. grep. It's basically impossible to beat their
    raw speed even with hand-tuned Python code, so running the right tool using
    the subprocess module might be a solution.
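
    A rough sketch of that approach, assuming a POSIX-style grep is on the
    PATH and the target is plain text without newlines (grep -F treats each
    line of the pattern as a separate pattern). The function name is
    hypothetical:

```python
import subprocess


def grep_contains(path, target):
    """Return True if grep finds the fixed string `target` in the file."""
    result = subprocess.run(
        # -q: no output, stop at the first match; -F: fixed string, not a regex
        ["grep", "-q", "-F", "-e", target, "--", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:  # at least one match
        return True
    if result.returncode == 1:  # no match
        return False
    # Any other status means grep itself failed (e.g. unreadable file).
    raise RuntimeError("grep failed with exit status %d" % result.returncode)
```

    Passing the arguments as a list avoids shell quoting problems with the
    target string.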

    Stefan
    Stefan Behnel, Feb 1, 2010
    #5
  6. MRAB

    MRAB Guest

    Chris Rebert wrote:
    > On Mon, Feb 1, 2010 at 1:17 AM, Stephen.Wu <> wrote:
    >> tmp=file.read() (very huge file)
    >> if targetStr in tmp:
    >>    print "find it"
    >> else:
    >>    print "not find"
    >> file.close()
    >>
    >> I found that if what file.read() returns is huge enough, it doesn't work.
    >> Could anyone give me some definite information on this problem?

    >
    > If the file's contents are larger than available memory, you'll get a
    > MemoryError. To avoid this, you can read the file in chunks (or, if
    > applicable, line by line) and check whether each chunk/line matches.
    >

    If you're processing in chunks then you also need to consider the
    possibility that what you're looking for crosses a chunk boundary, of
    course. It's an easy case to miss! :)
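
    One way to handle the boundary case, sketched under the assumption that
    the file is opened in binary mode and the target is a byte string: carry
    the last len(target) - 1 bytes of what has been read over into the next
    search window. The function name and the default chunk size are
    illustrative choices, not anything prescribed in this thread:

```python
def file_contains(path, target, chunk_size=256 * 1024):
    """Return True if the byte string `target` occurs anywhere in the file."""
    if not target:
        return True  # the empty string is contained in everything
    overlap = len(target) - 1  # longest prefix a boundary can cut off
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # end of file reached without a match
                return False
            window = tail + chunk
            if target in window:
                return True
            # Keep just enough bytes to complete a match in the next chunk.
            tail = window[-overlap:] if overlap else b""
```

    Memory use stays near chunk_size plus len(target), regardless of how
    large the file is.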
    MRAB, Feb 1, 2010
    #6
  7. On Mon, 01 Feb 2010 01:33:09 -0800, Stephen.Wu wrote:
    >
    > Actually, I am already using file.read(length). I just want to know
    > exactly what value of length I should set. Judging from my trials, I'm
    > afraid the limit isn't equal to the amount of physical memory...


    There's no exact length you "should" set. Just set something big enough
    that looping doesn't add any noticeable overhead, but small enough that
    it doesn't take too much memory. Something between 64 kB and 1 MB sounds
    reasonable.
    Antoine Pitrou, Feb 1, 2010
    #7