Q on explicitly calling file.close


kj

There's something wonderfully clear about code like this:

# (1)
def spam(filename):
    for line in file(filename):
        do_something_with(line)

It is indeed pseudo-codely beautiful. But I gather that it is not
correct to do this, and that instead one should do something like

# (2)
def spam(filename):
    fh = file(filename)
    try:
        for line in fh:
            do_something_with(line)
    finally:
        fh.close()

...or alternatively, if the with-statement is available:

# (3)
def spam(filename):
    with file(filename) as fh:
        for line in fh:
            do_something_with(line)

Mind you, (3) is almost as simple as (1) (only one additional line),
but somehow it lacks (1)'s direct simplicity. (And it adds one
more indentation level, which I find annoying.) Furthermore, I
don't recall ever coming across either (2) or (3) "in the wild",
even after reading a lot of high-quality Python code (e.g. standard
library modules).

Finally, I was under the impression that Python closed filehandles
automatically when they were garbage-collected. (In fact (3)
suggests as much, since it does not include an explicit call to
fh.close.) If so, the difference between (1) and (3) does not seem
very big. What am I missing here?

kynn
 

MRAB

kj said:
There's something wonderfully clear about code like this:

# (1)
def spam(filename):
    for line in file(filename):
        do_something_with(line)

It is indeed pseudo-codely beautiful. But I gather that it is not
correct to do this, and that instead one should do something like

# (2)
def spam(filename):
    fh = file(filename)
    try:
        for line in fh:
            do_something_with(line)
    finally:
        fh.close()

...or alternatively, if the with-statement is available:

# (3)
def spam(filename):
    with file(filename) as fh:
        for line in fh:
            do_something_with(line)

Mind you, (3) is almost as simple as (1) (only one additional line),
but somehow it lacks (1)'s direct simplicity. (And it adds one
more indentation level, which I find annoying.) Furthermore, I
don't recall ever coming across either (2) or (3) "in the wild",
even after reading a lot of high-quality Python code (e.g. standard
library modules).

Finally, I was under the impression that Python closed filehandles
automatically when they were garbage-collected. (In fact (3)
suggests as much, since it does not include an explicit call to
fh.close.) If so, the difference between (1) and (3) does not seem
very big. What am I missing here?

CPython uses reference counting, so an object is garbage collected as
soon as there are no references to it, but that's just an implementation
detail.

Other implementations, such as Jython and IronPython, don't use
reference counting, so you don't know when an object will be garbage
collected, which means that the file might remain open for an unknown
time afterwards in case 1 above.

Most people use CPython, so it's not surprising that case 1 is so
common.
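
A minimal sketch of the difference MRAB describes; the comments state what
each implementation does, nothing in the code itself checks it:

def case1(filename):
    for line in open(filename):   # no name is ever bound to the file object
        do_something_with(line)
    # CPython: the file object's reference count hits zero as soon as the
    # loop finishes, so the file is closed right here.
    # Jython/IronPython: the object merely becomes unreachable; the file
    # stays open until the garbage collector eventually runs.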
 

Dave Angel

kj said:
There's something wonderfully clear about code like this:

# (1)
def spam(filename):
    for line in file(filename):
        do_something_with(line)

It is indeed pseudo-codely beautiful. But I gather that it is not
correct to do this, and that instead one should do something like

# (2)
def spam(filename):
    fh = file(filename)
    try:
        for line in fh:
            do_something_with(line)
    finally:
        fh.close()

...or alternatively, if the with-statement is available:

# (3)
def spam(filename):
    with file(filename) as fh:
        for line in fh:
            do_something_with(line)

Mind you, (3) is almost as simple as (1) (only one additional line),
but somehow it lacks (1)'s direct simplicity. (And it adds one
more indentation level, which I find annoying.) Furthermore, I
don't recall ever coming across either (2) or (3) "in the wild",
even after reading a lot of high-quality Python code (e.g. standard
library modules).

Finally, I was under the impression that Python closed filehandles
automatically when they were garbage-collected. (In fact (3)
suggests as much, since it does not include an explicit call to
fh.close.) If so, the difference between (1) and (3) does not seem
very big. What am I missing here?

kynn

We have to distinguish between reference counting and garbage collection.
As MRAB says, when the reference count goes to zero, the file is
immediately closed in the CPython implementation. So all three are
equivalent on that platform.

But if you're not sure the code will run on CPython, then you have to
have something that explicitly catches the out-of-scopeness of the file
object. Both your (2) and (3) do that, with different syntaxes.

DaveA
 

r

We have to distinguish between reference counting and garbage collection.
As MRAB says, when the reference count goes to zero, the file is
immediately closed in the CPython implementation. So all three are
equivalent on that platform.

But if you're not sure the code will run on CPython, then you have to
have something that explicitly catches the out-of-scopeness of the file
object.  Both your (2) and (3) do that, with different syntaxes.

DaveA

Stop being lazy and close the file. You don't want open file objects
just floating around in memory. Even the docs say something like
"yes, python will free the memory associated with a file object but
you can never *really* be sure *when* this will happen, so just
explicitly close the damn thing!". Besides, you can't guarantee that
any data has been written without calling f.flush() or f.close()
first. What if your program crashes and no data is written? huh?

I guess I could put my pants on by jumping into both legs at the same
time, thereby saving one step, but I might fall down and break my arm. I
would much rather just use the one-leg-at-a-time approach...
 

Dennis Lee Bieber

...or alternatively, if the with-statement is available:

# (3)
def spam(filename):
    with file(filename) as fh:
        for line in fh:
            do_something_with(line)
Finally, I was under the impression that Python closed filehandles
automatically when they were garbage-collected. (In fact (3)
suggests as much, since it does not include an explicit call to
fh.close.) If so, the difference between (1) and (3) does not seem
very big. What am I missing here?

In the case of (3), the with statement is in effect equivalent to:

fh = file(filename)
for line in fh:
    do something...
fh.close()

with the proper safeguards to ensure the close() is called. Essentially,
anything "opened" by a with clause is "closed" when the block is left.
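
For the curious, a simplified sketch of what that safeguard looks like; the
real expansion (PEP 343) also passes the exception details to __exit__, so
this is only an approximation:

mgr = file(filename)               # the file object is its own context manager
fh = mgr.__enter__()               # the value bound by "as fh"
try:
    for line in fh:
        do_something_with(line)
finally:
    mgr.__exit__(None, None, None) # for file objects this calls close()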
 

Tim Chase

CPython uses reference counting, so an object is garbage collected as
soon as there are no references to it, but that's just an implementation
detail.

Other implementations, such as Jython and IronPython, don't use
reference counting, so you don't know when an object will be garbage
collected, which means that the file might remain open for an unknown
time afterwards in case 1 above.

Most people use CPython, so it's not surprising that case 1 is so
common.

Additionally, many scripts just use a small number of files (say,
1-5 files), so having a file-handle open for the duration of the
run is minimal overhead.

On the other hand, when processing thousands of files, I always
explicitly close each file to make sure I don't exhaust some
file-handle limit the OS or interpreter may enforce.

-tkc
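
A small sketch of that second case (glob is used purely for illustration):
each file is closed before the next one is opened, so the number of open
handles never grows with the number of files processed.

import glob

def total_lines(pattern):
    total = 0
    for name in glob.glob(pattern):
        fh = open(name)
        try:
            total += sum(1 for _ in fh)
        finally:
            fh.close()
    return total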
 

r

True, but I find the with statement (while quite useful in general
practice) is not a "cure-all" for situations that need an exception
caught. In that case the laborious finger-wrecking syntax of
"f.close()" must be painstakingly typed letter by painful letter.

f-.-c-l-o-s-e-(-)

It's just not fair ;-(
 

Steven D'Aprano

Finally, I was under the impression that Python closed filehandles
automatically when they were garbage-collected. (In fact (3) suggests
as much, since it does not include an explicit call to fh.close.) If so,
the difference between (1) and (3) does not seem very big. What am I
missing here?

(1) Python the language will close file handles, but doesn't guarantee
when. Some implementations (e.g. CPython) will close them as soon as the
file object goes out of scope. Others (e.g. Jython) will close them
"eventually", which may be when the program exits.

(2) If the file object never goes out of scope, say because you've stored
a reference to it somewhere, the file will never be closed and you will
leak file handles. Since the OS only provides a finite number of them,
any program which uses a large number of files is at risk of running out
(see the sketch below).

(3) For quick and dirty scripts, or programs that only use one or two
files, relying on the VM to close the file is sufficient (although lazy
in my opinion *wink*) but for long-running applications using many files,
or for debugging, you may want more control over what happens when.
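
A sketch of the leak described in point (2): the handles stay reachable
through the module-level list, so even CPython's reference counting never
closes them, and a long enough stream of filenames will exhaust the OS
limit (the cache here is purely hypothetical).

open_files = []                  # hypothetical cache that keeps every handle alive

def leaky(filenames):
    for name in filenames:
        fh = open(name)
        open_files.append(fh)    # still referenced, so never closed
        # ... read from fh somewhere, but nobody ever calls fh.close()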
 

kj

Steven D'Aprano said:
(3) For quick and dirty scripts, or programs that only use one or two
files, relying on the VM to close the file is sufficient (although lazy
in my opinion *wink*)

It's not a matter of laziness or industriousness, but rather of
code readability. The real problem here is not the close() per
se, but rather all the additional machinery required to ensure that
the close happens. When the code is working with multiple file
handles simultaneously, one ends up with a thicket of try/finally's
that makes the code just *nasty* to look at. E.g., even with only
two files, namely an input and an output file, compare:

def nice(from_, to_):
    to_h = file(to_, "w")
    for line in file(from_):
        print >> to_h, munge(line)

def nasty(from_, to_):
    to_h = file(to_, "w")
    try:
        from_h = file(from_)
        try:
            for line in from_h:
                print >> to_h, munge(line)
        finally:
            from_h.close()
    finally:
        to_h.close()

I leave to your imagination the joys of reading the code for
hairy(from_, to_, log_), where log_ is a third file to collect
warning messages.

kynn
 

Steven D'Aprano

kj said:

It's not a matter of laziness or industriousness, but rather of code
readability. The real problem here is not the close() per se, but
rather all the additional machinery required to ensure that the close
happens. When the code is working with multiple file handles
simultaneously, one ends up with a thicket of try/finally's that makes
the code just *nasty* to look at.

Yep, that's because dealing with the myriad of things that *might* (but
probably won't) go wrong when dealing with files is *horrible*. Real
world code is almost always much nastier than the nice elegant algorithms
we hope for.

Most people know they have to deal with errors when opening files. The
best programmers deal with errors when writing to files. But only a few
of the most pedantic coders even attempt to deal with errors when
*closing* the file. Yes, closing the file can fail. What are you going to
do about it? At the least, you should notify the user, then continue.
Dying with an uncaught exception in the middle of processing millions of
records is Not Cool. But close failures are so rare that we just hope
we'll never experience one.
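
A hedged sketch of the pedantic option: report a failed close() and carry
on instead of dying mid-run (the helper name is made up for illustration).

import sys

def close_with_warning(fh):
    try:
        fh.close()
    except IOError, e:
        print >> sys.stderr, "warning: could not close %s: %s" % (fh.name, e)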

It really boils down to this... do you want to write correct code, or
elegant code?
 

Terry Reedy

Stephen said:
This is precisely why the with statement exists; to provide a cleaner
way to wrap a block in setup and teardown functions. Closing is one.
Yeah, you get some extra indentation-- but you sorta have to live with
it if you're worried about correct code. I think it's a good compromise
between your examples of nasty and nice :)

def compromise(from_, to_):
    with file(to_, "w") as to_h:
        with file(from_) as from_h:
            for line in from_h:
                print >> to_h, munge(line)

It's just too bad that 'with' doesn't support multiple separate "x as y"
clauses.

The developers already agreed with you ;-).

"With more than one item, the context managers are processed as if
multiple with statements were nested:

with A() as a, B() as b:
suite
is equivalent to

with A() as a:
with B() as b:
suite
Changed in version 3.1: Support for multiple context expressions.
"

(I suspect this will also be in 2.7)

Terry Jan Reedy
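
With that 3.1-style syntax (and, as Terry suspects, 2.7's), kj's earlier
two-file example collapses to something like the sketch below; munge() is
assumed from kj's post, and write() is used on the assumption that munge()
keeps the trailing newline:

def tidy(from_, to_):
    with open(from_) as from_h, open(to_, "w") as to_h:
        for line in from_h:
            to_h.write(munge(line))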
 

Jan Kaliszewski

05-09-2009 r said:
I find the with statement (while quite useful in general
practice) is not a "cure-all" for situations that need an exception
caught.

In what sense?

I think that:

with open(...) as f:
    foo...

is equivalent to:

f = open(...)
try:
    foo...
finally:
    f.close()

Obviously it doesn't substitute for catching with 'except', but I don't
see how it could disturb that.

Cheers,
*j
 

r

In what sense?

*ahem*! in the sense that the with statement (while quite useful in
general practice) is not a "cure-all" for situations that need an
exception caught ;-)
I think that:

    with open(...) as f:
       foo...

is equivalent to:

    f = open(...)
    try:
        foo...
    finally:
        f.close()

Obviously it doesn't substitute for catching with 'except'

My sentiments exactly...?


Get Enlightened -> http://jjsenlightenments.blogspot.com/
 

Gabriel Genellina

But what does that even mean? What's the -problem- that it isn't
'curing'? That 'with' doesn't completely replace all uses of 'try'? It
was never meant to-- that seems like a completely different problem
area. But, if you do want it to handle exception catching, that's just
a question of making a context manager that does so.

import contextlib

@contextlib.contextmanager
def file_with_errorhandling(filename, mode,
                            exceptions=(IOError, OSError)):
    fp = file(filename, mode)
    try:
        yield fp
    except exceptions:
        print "Handle file-operation errors gracefully here."
    fp.close()

with file_with_errorhandling(filename, 'r') as fp:
    fp.read() # or whatever

True, the context manager provided by file objects just lets exceptions
propagate up but that's usually the desired behavior. You can make
context managers for particular problems you run across that handle
exceptions if you want.

Note that to correctly emulate the file built-in context manager,
fp.close() should be inside a finally clause.
The with statement isn't about never having to type try again, I don't
think.

The with statement is intended as a replacement for common try/finally
blocks, not try/except blocks as the OP seems to imply.
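
Folding Gabriel's correction back into the quoted manager gives a sketch
like the following; the only change is that the close() now sits in a
finally clause, so the handle is released even when the exception is not
one of the handled types (or the handler itself raises):

import contextlib

@contextlib.contextmanager
def file_with_errorhandling(filename, mode,
                            exceptions=(IOError, OSError)):
    fp = file(filename, mode)
    try:
        yield fp
    except exceptions:
        print "Handle file-operation errors gracefully here."
    finally:
        fp.close()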
 

r

These days I've actually got the syntax and spelling memorized -
I can type "close()" without needing to look it up!

+1

You are so right David! I think some people around here need to look
up "code reuse". Here are a couple of simple templates for our friends
to study...

def read_file(fname, mode='rb'):
    '''open file and return contents'''
    try:
        f = open(fname, mode)
        s = f.read()
        return s
    except:
        return 0
    finally:
        f.close()

def write_file(fname, s, mode='wb'):
    '''open file, truncate, and write string'''
    try:
        f = open(fname, mode)
        f.write(s)
        return 1
    except:
        return 0
    finally:
        f.close()


#-- Extra Credit --#
Create an append_file() function that takes a <filename> and <string>
as args and appends to the file returning 1 on success, and 0 on
failure.

#-- Double Extra Credit --#
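
For completeness, one possible answer to the first extra-credit exercise,
keeping the same 1/0 return-code style but putting the open() in its own
try so that a failed open can't trigger a NameError in the finally clause:

def append_file(fname, s, mode='ab'):
    '''open file and append string'''
    try:
        f = open(fname, mode)
    except IOError:
        return 0
    try:
        f.write(s)
        return 1
    except IOError:
        return 0
    finally:
        f.close()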
 

Charles Yeomans

+1

You are so right David! I think some people around here need to look
up "code reuse". Here are a couple of simple templates for our friends
to study...

def read_file(fname, mode='rb'):
    '''open file and return contents'''
    try:
        f = open(fname, mode)
        s = f.read()
        return s
    except:
        return 0
    finally:
        f.close()

def write_file(fname, s, mode='wb'):
    '''open file, truncate, and write string'''
    try:
        f = open(fname, mode)
        f.write(s)
        return 1
    except:
        return 0
    finally:
        f.close()


Unfortunately, both of these simple templates have the following
problem -- if open fails, a NameError will be raised from the finally
block.


def read_file(fname, mode='rb'):
    '''open file and return contents'''
    f = open(fname, mode)
    try:
        return f.read()
    finally:
        f = f.close()

I removed the except block because I prefer exceptions to error codes.

def write_file(fname, s, mode='wb'):
    '''open file, truncate, and write string'''
    f = open(fname, mode)
    try:
        f.write(s)
    finally:
        f = f.close()

In addition to fixing the latent bug in the second simple template, I
took the opportunity to correct your heinous violation of command-query
separation.


Charles Yeomans
 

r

Unfortunately, both of these simple templates have the following  
problem -- if open fails, a NameError will be raised from the finally  
block.
(snip)
I removed the except block because I prefer exceptions to error codes.

How will the caller know an exception has occurred? What if logic
depends on the validation that a file *had* or *had not* been written
to, huh?
In addition to fixing the latent bug in the second simple template, I
took the opportunity to correct your heinous violation of command-query
separation.

Charles Yeomans

Oh I see! But what happens if the filename does not exist? What then?
"open" will blow chunks, that's what! Here is a version for our
paranoid-schizophrenic-sadomasochists out there...

def egor_read_file(fname, mode='rb'):
    print 'yes, master'
    try:
        f = open(fname, mode=mode)
    except IOError:
        return (0, 'But, the file no open master!')

    try:
        s = f.read()
    except NameError:
        return (0, 'the file still no open master!')

    try:
        f.close()
    except:
        print 'That file sure is tricky master!'

    return (s, 'Whew! here is the file contents, master')

MRAB wrote:
You should've used raw strings. :)

Rats! You got me on that one :)
 

Gabriel Genellina

On Thu, 10 Sep 2009 08:26:16 -0300, David C. Ullrich wrote:
Well first, we agree that putting the open() in the try part of a
try-finally is wrong. try-finally is supposed to ensure that
_allocated_ resources are cleaned up.

What you do below may work. But it's essentially throwing
out exception handling and using error codes instead. There
are plenty of reasons why exceptions are preferred. The
standard thing is this:

def UseResource(rname):
    r = get(rname)
    try:
        r.use()
    finally:
        r.cleanup()

And it is so widely used that it got its own syntax (the with statement)
and library support (contextlib, for creating custom context managers). In
any decent version of Python this idiom becomes:

with get(rname) as r:
    r.use()

assuming get(rname) returns a suitable object (that defines __enter__ and
__exit__).

What's above seems simpler. More important, if you do
it this way then you _always_ have to check the return
value of egor_read_file and take appropriate action -
complicates the code everywhere the function is called
as well as making the function more complicated.
Doing it as in UseResource() above you don't need
to worry about whether UseResource() failed
_except_ in situations where you're certain that
that's the right level to catch the error.

That's the standard argument showing why structured exceptions are a Good
Thing. I'd say everyone should be aware of that, given the ample usage of
exceptions in Python, but it looks like people require a reminder from time
to time.
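
A hypothetical sketch of what such a "suitable object" might look like,
reusing the names from the UseResource example (Resource, get, use and
cleanup are all made up for illustration):

class Resource(object):
    def __init__(self, name):
        self.name = name           # pretend something is allocated here
    def use(self):
        print "using", self.name
    def cleanup(self):
        print "releasing", self.name
    def __enter__(self):
        return self                # the value bound by "with ... as r"
    def __exit__(self, exc_type, exc_value, traceback):
        self.cleanup()             # always runs, like a finally clause
        return False               # do not suppress exceptions

def get(rname):
    return Resource(rname)

with get("spam") as r:
    r.use()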
 
