Questions about file object and close()

John Marshall

Hi,

Does anyone see a problem with doing:
data = file("tata").read()

Each time this is done, I see a new file
descriptor allocated (Linux) but not
released.

1) Will there ever be a point where I
will have a problem with file
descriptors because the garbage
collector has _not_ yet collected the
file objects?

2) When I subclassed the file object as
follows:
-----
class MyFile(file):
    def close(self):
        print "MyFile.close()"
        file.close(self)
-----
and did a simple MyFile("tata"), I did not see
a call to MyFile.close(). Was I wrong to expect
MyFile.close() to be called?

3) There is no file.__del__() as far as I
can tell at the Python level. Are files
opened by the calls above properly
closed when the objects are destroyed
and collected?

Thanks,
John
 
Peter Hansen

John said:
Hi,

Does anyone see a problem with doing:
data = file("tata").read()

Each time this is done, I see a new file
descriptor allocated (Linux) but not
released.

1) Will there ever be a point where I
will have a problem with file
descriptors because the garbage
collector has _not_ yet collected the
file objects?

Should be easy to check. Write a loop which
does that many times. There are a finite
number of file descriptors available, so if
it's going to fail, it will fail fairly
quickly.
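Such a loop might look like the following sketch (written with open(), since the file builtin is Python 2 only; the temp file and the iteration count of 10000 are arbitrary choices, not from the thread):

```python
import os
import tempfile

# Create a throwaway file to read from repeatedly.
fd, path = tempfile.mkstemp()
os.write(fd, b"data")
os.close(fd)

# Open far more files than a typical per-process limit (often 1024
# on Linux) without ever calling close() ourselves. In CPython the
# file object is reclaimed, and its descriptor released, as soon as
# each expression finishes, so the loop runs to completion.
for i in range(10000):
    data = open(path).read()

os.remove(path)
```

On an implementation without reference counting, such as Jython, the same loop could exhaust descriptors before a garbage collection happens to run.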
3) There is no file.__del__() as far as I
can tell at the Python level. Are files
opened by the calls above properly
closed when the objects are destroyed
and collected?

Yes, but you can only count on this happening
in the CPython implementation. Nevertheless,
it's still widely considered more than just good style
to explicitly close your files within a finally
clause, even in CPython where technically you don't
have to in most cases:

f = file("tata")
try:
    data = f.read()
finally:
    f.close()

The above is quite robust and should be your model
for all file access, at least until you're much more
experienced with Python.

One should use the open().read() idiom only in small
utility scripts and other such short-running applications.

-Peter
 
John Marshall

Should be easy to check. Write a loop which
does that many times. There are a finite
number of file descriptors available, so if
it's going to fail, it will fail fairly
quickly.

I did do this and it did not fail. My concern
was since the close() is not done explicitly
by me, and does not seem to be called in
a file.__del__() or otherwise, I was not
sure. I want to be sure! Given your comment
below about the CPython implementation, there
is no guarantee, which seems unreasonable for
such a basic operation.

It seems to me that a file.__del__() _should_
call a file.close() to make sure that the file
is closed as a clean up procedure before
releasing the object. I cannot see why this
would not be the prescribed behavior and thus
my question. Isn't that what __del__
(destructors) are supposed to handle--cleaning
up?
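The behaviour John is describing can be sketched as a wrapper class whose __del__ performs the cleanup (AutoClosingFile is a hypothetical name; open() replaces the Python 2 file builtin):

```python
class AutoClosingFile(object):
    """Hypothetical wrapper that closes its file when reclaimed."""

    def __init__(self, path, mode="r"):
        self._f = open(path, mode)

    def read(self):
        return self._f.read()

    def close(self):
        if not self._f.closed:
            self._f.close()

    def __del__(self):
        # Cleanup of last resort: close if the caller never did.
        self.close()
```

As the replies note, when (or whether) __del__ runs is implementation-dependent, so this is a safety net, not a substitute for an explicit close().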
Yes, but you can only count on this happening
in the CPython implementation. Nevertheless,
it's still widely considered more than just good style
to explicitly close your files within a finally
clause, even in CPython where technically you don't
have to in most cases:

f = file("tata")
try:
    data = f.read()
finally:
    f.close()

The above is quite robust and should be your model
for all file access, at least until you're much more
experienced with Python.

How would more experience change this? Assuming I
am catching any exceptions I am interested in, why
wouldn't the following be just as good?
try:
    data = file("tata").read()
except:
    ...

One should use the open().read() idiom only in small
utility scripts and other such short-running applications.

I don't see why this is so only for small scripts. As
I question above, why doesn't the file object clean up
after itself as a guaranteed course of action?

Of course, I could implement my own file object to
guarantee the clean up and be on my way. But I am
still surprised at what I am seeing.

Thanks,
John
 
Peter Hansen

John said:
It seems to me that a file.__del__() _should_
call a file.close() to make sure that the file
is closed as a clean up procedure before
releasing the object.

I believe it does, but I tried your experiment
with subclassing file and didn't ever see a
call to close, so I can only assume that the
built-in __del__() is actually just calling the
builtin close() and bypassing my overridden close(),
although there could also be some other magic about
how files behave that explains this.
I don't see why this is so only for small scripts. As
I question above, why doesn't the file object clean up
after itself as a guaranteed course of action?

The issue is that although __del__ is calling
close, there is no guarantee in Python about when
__del__ is run, nor in fact that it will ever be
run. (If nothing else, a call to os._exit() will
always bypass normal shutdown.) In Jython, for
example, there is no reference counting the way CPython
does it, so __del__ methods are called only when the
object is garbage collected. When does that happen?
There's no guarantee: if you haven't explicitly closed
the file, it might not get closed until the interpreter
is shutting down (if then).

In CPython, you at least (currently) have sort of a
guarantee that the file will be closed when the object
is destroyed, which because of reference counting will
happen as soon as you "del file" or rebind the name
to another object, or whatever.
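That deterministic reclamation can be observed with a weak reference (a sketch for modern CPython, where io-based file objects support weak references):

```python
import os
import tempfile
import weakref

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path)
r = weakref.ref(f)   # watch the file object without keeping it alive
f = None             # rebind the name; the refcount drops to zero
# In CPython the object, and its descriptor, are reclaimed at once.
assert r() is None
os.remove(path)
```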

So in CPython, it is working properly (and you shouldn't
run out of file descriptors unless you are into
complicated code where the file objects are being kept
in cyclical data structures that cannot be reclaimed
through simple reference counting) but I cannot explain
why we don't see a subclass's close() method get called
when __del__ does, as it must, get called.

-Peter
 
Scott David Daniels

John said:
It seems to me that a file.__del__() _should_
[.... how he wishes it were designed ....]
Isn't that what __del__ (destructors) are supposed
to handle--cleaning up?

Just in case you are actually asking, and not simply complaining:

The existence of a __del__ method affects when a garbage collect
may remove an object (and may in some cases delay it). In Jython
(and on top of any system doing its own garbage collection),
there may be no control over when an object is deallocated.
A close on an output file that happens at an arbitrary
collection time may finalize I/O that has since been
deemed a mistake, and it might raise an I/O error at an
unpredictable point that prevents the rest of the
program from executing.
I don't see why this is so only for small scripts. As
I question above, why doesn't the file object clean up
after itself as a guaranteed course of action?

If you really want, define a function like:

def contents(filename):
    source = open(filename)
    try:
        return source.read()
    finally:
        source.close()

and then you can make your small uses clear.


-Scott David Daniels
(e-mail address removed)
 
John Marshall

I believe it does, but I tried your experiment
with subclassing file and didn't ever see a
call to close, so I can only assume that the
built-in __del__() is actually just calling the
builtin close() and bypassing my overridden close(),
although there could also be some other magic about
how files behave that explains this.

I took a look at the filemodule.c code and in the
file_dealloc() which is registered as the destructor,
a close() is done on the file, as you surmised.
In fact, file_dealloc() does not even call
file_close() but simply does an OS close().
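(For what it's worth, the io hierarchy that replaced filemodule.c in Python 3 behaves the way John expected: io.IOBase supplies a __del__ that calls the instance's close() through ordinary attribute lookup, so an overridden close() does run at destruction. A sketch:)

```python
import io
import os
import tempfile

closed = []

class LoggingFile(io.FileIO):
    # In Python 3, IOBase.__del__ calls self.close(), which
    # dispatches to this override.
    def close(self):
        closed.append(self.name)
        super().close()

fd, path = tempfile.mkstemp()
os.close(fd)

f = LoggingFile(path, "r")
del f                  # CPython reclaims immediately; close() runs
assert closed == [path]
os.remove(path)
```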
The issue is that although __del__ is calling
close, there is no guarantee in Python about when
__del__ is run, nor in fact that it will ever be
run.

In CPython, you at least (currently) have sort of a
guarantee that the file will be closed when the object
is destroyed, which because of reference counting will
happen as soon as you "del file" or rebind the name
to another object, or whatever.

So in CPython, it is working properly (and you shouldn't
run out of file descriptors unless you are into
complicated code where the file objects are being kept
in cyclical data structures that cannot be reclaimed
through simple reference counting) but I cannot explain
why we don't see a subclass's close() method get called
when __del__ does, as it must, get called.

Thanks for the explanation. After reading some of
the comp.lang.python stuff it seems that an improvement
to the Python FAQ would be worthwhile (unless I've
missed something else in it). The FAQ (1.6.11) says,
The del statement does not necessarily call
__del__ -- it simply decrements the object's
reference count, and if this reaches zero __del__
is called.

This really does suggest that Python in general
(the FAQ, as of 2004-12-09, does not say CPython
only) will call __del__ when the refcount reaches 0.

It seems that if one cannot expect __del__ to be
called when nothing references an object (under
any GC approach), then its only safe use is to
free memory/objects, not to release other
resources such as file descriptors, socket
descriptors, etc. Hence the recommendation to
explicitly call file.close() after reading.
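The explicit file.close() the thread recommends was later given dedicated syntax: the with statement (added in Python 2.5, shortly after this exchange), which guarantees the close even if read() raises. A minimal example:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"data")
os.close(fd)

with open(path) as f:
    data = f.read()
# Leaving the with block closed the file, on success or exception.
assert f.closed
assert data == "data"
os.remove(path)
```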

Thanks,
John
 
