Loading contents behind the scenes

s0suk3

Hi, I wanted to know how safe it is to do something like:

f = file("filename", "rb")
f.read()

for a possibly huge file. When calling f.read(), and not doing
anything with the return value, what is Python doing internally? Is it
loading the content of the file into memory (regardless of whether it
is discarding it immediately)?

In my case, what I'm doing is sending the return value through a
socket:

sock.send(f.read())

Is that gonna make a difference (memory-wise)? I guess I'm just
concerned with whether I can do a file.read() for any file in the
system in an efficient and memory-kind way, and with low overhead in
general. (For one thing, I'm not loading the contents into a
variable.)

Not that I'm saying that loading a huge file into memory will horribly
crash the system, but it's good to try to program in the safest way
possible. For example, if you try something like this in the
interpreter on a Windows machine, everything will start working
slowly, and you'll likely have to reboot the OS:

s = ((("abc" * 999999) * 999999) * 99999) * 999999
 
s0suk3

I am not a Python interpreter developer, but as a user, yes, I'd expect that to
happen. The method doesn't know that you are not doing anything with its return
value.


Doesn't matter. You allocate a string into which the contents are loaded (the
return value of 'f.read()'), and you hand over (a reference to) that string to
the 'send()' method.

Note that memory is allocated by data *values*, not by *variables* in Python
(they are merely references to values).
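To illustrate the point about values and references, here is a minimal sketch. The names `data`, `alias`, and `consume` are made up for the example. It only shows that binding another name or passing an argument does not copy the object, so skipping the variable does not save the allocation.

```python
import sys

data = b"x" * 10**6      # a single bytes object of about 1 MB is allocated here
alias = data             # a second name for the same object, not a second copy

same_object = alias is data
size_in_bytes = sys.getsizeof(data)


def consume(payload):
    # The parameter is just another reference to the caller's object.
    return payload is data


passed_by_reference = consume(data)
```

In the same way, `sock.send(f.read())` still builds the whole string in memory first; it merely never gives that string a name.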


Depends on your system, and your biggest file.

On a 32-bit platform, anything bigger than about 4GB (usually already at around
3GB) will crash the program for the simple reason that you are running out of
address space to store the bytes in.

To fix, read and write in blocks by passing a block size to the 'read()' call.

I see... Thanks for the reply.

So what would be a good approach to solve that problem? The best I can
think of is something like:

MAX_BUF_SIZE = 100000000 # about 100 MB

f = file("filename", "rb")
f.seek(0, 2) # relative to EOF
length = f.tell()
bPos = 0

while bPos < length:
    f.seek(bPos)
    bPos += sock.send(f.read(MAX_BUF_SIZE))
 
Diez B. Roggisch

You are aware that read() takes an integer argument to limit the number of bytes
returned, and of course advances the internal file position for you?

Diez
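A small sketch of the behaviour Diez describes, using an in-memory `io.BytesIO` object as a stand-in for a file opened in "rb" mode (the sample bytes are arbitrary). Each `read(n)` returns at most `n` bytes and continues from where the previous call stopped, so no explicit `seek()` is needed.

```python
import io

f = io.BytesIO(b"abcdefghij")   # stand-in for a real file opened with "rb"

first = f.read(4)               # at most 4 bytes
pos_after_first = f.tell()      # the position has moved past those bytes
second = f.read(4)              # continues right after the first read
rest = f.read(4)                # only 2 bytes are left, so 2 are returned
at_eof = f.read(4)              # an empty result signals end of file
```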
 
MRAB

I would go with:

f = file("filename", "rb")
while True:
    data = f.read(MAX_BUF_SIZE)
    if not data:
        break
    sock.sendall(data)
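The same loop can be wrapped in a reusable function. This is only a sketch: the name `send_file`, the `CHUNK_SIZE` value, and the byte-count return value are assumptions made for the example, and `open()` is used because `file()` exists only in Python 2. Note that `send()` may transmit fewer bytes than it was given, while `sendall()` keeps going until everything is sent or an error occurs, which is why no manual position bookkeeping is needed here.

```python
CHUNK_SIZE = 64 * 1024  # assumed value; much smaller than 100 MB is usually enough


def send_file(sock, path, chunk_size=CHUNK_SIZE):
    """Send the file at `path` over `sock` in chunks; return the bytes sent."""
    total = 0
    with open(path, "rb") as f:         # the file is closed even if sending fails
        while True:
            data = f.read(chunk_size)   # never holds more than chunk_size bytes
            if not data:                # empty result means end of file
                break
            sock.sendall(data)          # blocks until the whole chunk is sent
            total += len(data)
    return total
```

With this approach, memory use is bounded by the chunk size rather than by the size of the file.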
 
Gabriel Genellina

Another way is to use the shutil module:

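import shutil
fin = open("filename", "rb")
fout = sock.makefile("wb")  # binary write mode; the default mode is "r"
shutil.copyfileobj(fin, fout)
fout.flush()  # push any buffered data out to the socket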
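For reference, `shutil.copyfileobj` copies in chunks itself and accepts an optional third argument giving the buffer size. The sketch below uses in-memory `io.BytesIO` objects as stand-ins for the input file and the socket's file object, and the 64 KB buffer size is an arbitrary choice for the example.

```python
import io
import shutil

src = io.BytesIO(b"x" * 10**6)   # stand-in for the file opened with "rb"
dst = io.BytesIO()               # stand-in for the socket's file object

# The optional third argument is the size of each read/write chunk.
shutil.copyfileobj(src, dst, 64 * 1024)

copied = dst.getvalue()
```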
 
