How long can a str be in this Python code segment?

Stephen.Wu

tmp = file.read()  # very huge file
if targetStr in tmp:
    print("find it")
else:
    print("not find")
file.close()

I found that once the result of file.read() is huge beyond a certain extent, it doesn't
work, but could anyone give me some definite information on this problem?
 
Chris Rebert

Stephen.Wu said:
> tmp = file.read()  # very huge file
> if targetStr in tmp:
>     print("find it")
> else:
>     print("not find")
> file.close()
>
> I found that once the result of file.read() is huge beyond a certain extent, it doesn't
> work, but could anyone give me some definite information on this problem?

If the file's contents are larger than available memory, you'll get a
MemoryError. To avoid this, you can read the file in chunks (or, if
applicable, by lines) and check whether each chunk/line matches.
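
A minimal sketch of the line-by-line variant, assuming the target is a byte string that can never span a line break and that individual lines are of a reasonable length. The function name is made up for illustration:

```python
def file_contains_line_match(path, target):
    """Return True if any single line of the file contains target (bytes)."""
    with open(path, "rb") as f:
        # Iterating over a file object yields one line at a time,
        # so only the current line is held in memory.
        for line in f:
            if target in line:
                return True
    return False
```

If the file has no line breaks at all (binary data, for instance), this degenerates into reading everything at once, so fixed-size chunks are the safer choice there.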

Cheers,
Chris
 
Gary Herron

Stephen.Wu said:
> tmp = file.read()  # very huge file
> if targetStr in tmp:
>     print("find it")
> else:
>     print("not find")
> file.close()
>
> I found that once the result of file.read() is huge beyond a certain extent, it doesn't
> work, but could anyone give me some definite information on this problem?

Python has no specific limit on string size other than memory size and
perhaps the 32-bit address space and so on. However, if your file size is
even a fraction of that size, you should not attempt to read it all into
memory at once. Is there not a way to process your file in batches of a
reasonable size?
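
For reference, a small sketch that prints the theoretical ceiling on this interpreter. It relies on sys.maxsize, the largest value a Py_ssize_t can hold, which bounds the length of any str. The variable name is just for illustration, and in practice available memory runs out long before this number is reached:

```python
import sys

# sys.maxsize is the largest Py_ssize_t value, which caps len() of any
# str, bytes or list on this interpreter build.
max_len = sys.maxsize

print("upper bound on len(s):", max_len)
# Typically 2**31 - 1 on a 32-bit build and 2**63 - 1 on a 64-bit build.
print("looks like a 64-bit build:", max_len > 2**32)
```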

Gary Herron
 
Stephen.Wu

Chris Rebert said:
> If the file's contents are larger than available memory, you'll get a
> MemoryError. To avoid this, you can read the file in chunks (or, if
> applicable, by lines) and check whether each chunk/line matches.
>
> Cheers,
> Chris
> --
> http://blog.rebertia.com

Actually, I am already using file.read(length). I just want to know exactly
what value of length I should set. After some trials, I'm afraid the usable
length doesn't equal the amount of physical memory...
 
Stefan Behnel

Stephen.Wu, 01.02.2010 10:17:
> tmp = file.read()  # very huge file
> if targetStr in tmp:
>     print("find it")
> else:
>     print("not find")
> file.close()
>
> I found that once the result of file.read() is huge beyond a certain extent, it doesn't
> work, but could anyone give me some definite information on this problem?

Others have already pointed out that reading the entire file into memory is
not a good idea. Try reading chunks repeatedly instead.

As it appears that you are simply trying to find out whether a file contains a
specific byte sequence, you might find acora interesting:

http://pypi.python.org/pypi/acora

Also note that there are usually platform optimised tools available to
search content in files, e.g. grep. It's basically impossible to beat their
raw speed even with hand-tuned Python code, so running the right tool using
the subprocess module might be a solution.
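
A rough sketch of the grep route, assuming a POSIX grep is on the PATH and that the target is a plain string without newlines (grep matches line by line and treats a newline in the pattern as a separator between patterns). The function name is made up for illustration:

```python
import subprocess

def grep_contains(path, target):
    """Return True if grep finds the fixed string target in the file."""
    result = subprocess.run(
        # -q: no output, exit status only; -F: fixed string, not a regex;
        # -e: the next argument is the pattern; --: end of options.
        ["grep", "-q", "-F", "-e", target, "--", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        return True   # at least one matching line
    if result.returncode == 1:
        return False  # no matching line
    # Any other status means grep itself failed (unreadable file, etc.).
    raise RuntimeError("grep failed with exit status %d" % result.returncode)
```

Passing the arguments as a list avoids going through a shell, so the target needs no quoting.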

Stefan
 
MRAB

Chris said:
> If the file's contents are larger than available memory, you'll get a
> MemoryError. To avoid this, you can read the file in chunks (or, if
> applicable, by lines) and check whether each chunk/line matches.

If you're processing in chunks then you also need to consider the
possibility that what you're looking for crosses a chunk boundary, of
course. It's an easy case to miss! :)
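
One way to handle the boundary case, as a sketch rather than a definitive implementation: carry the last len(target) - 1 bytes of each chunk over to the next read, so a match straddling two chunks still appears in one piece. The function name and the default chunk size are assumptions for illustration:

```python
def file_contains(path, target, chunk_size=256 * 1024):
    """Return True if the byte string target occurs anywhere in the file."""
    if not target:
        return True  # the empty string is contained in everything
    overlap = len(target) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False  # end of file, nothing found
            window = tail + chunk
            if target in window:
                return True
            # Keep the last len(target) - 1 bytes so that a match
            # crossing the chunk boundary is seen on the next pass.
            tail = window[-overlap:] if overlap else b""
```

Memory use stays at roughly chunk_size + len(target) bytes regardless of the file size.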
 
Antoine Pitrou

On Mon, 01 Feb 2010 01:33:09 -0800, Stephen.Wu wrote:
> Actually, I am already using file.read(length). I just want to know exactly
> what value of length I should set. After some trials, I'm afraid the usable
> length doesn't equal the amount of physical memory...

There's no exact length you "should" set; just choose something big enough
that looping doesn't add any noticeable overhead, but small enough that
it doesn't take too much memory. Something between 64 kB and 1 MB sounds
reasonable.
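
To put rough numbers on that trade-off, here is a small sketch that counts how many read() calls a given chunk size implies. The 1 GiB file size is a made-up example, and the helper name is only for illustration:

```python
def reads_needed(file_size, chunk_size):
    """Number of read(chunk_size) calls that return data for a file."""
    # Integer ceiling division: a final partial chunk still costs one read.
    return (file_size + chunk_size - 1) // chunk_size

FILE_SIZE = 1024 ** 3  # hypothetical 1 GiB file

for chunk_size in (64 * 1024, 256 * 1024, 1024 * 1024):
    print(chunk_size // 1024, "KiB chunks:",
          reads_needed(FILE_SIZE, chunk_size), "reads")
```

Across that range the loop runs only a few thousand times for a gigabyte of data, while each chunk stays small enough to fit in memory comfortably.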
 
