When should I explicitly close a file?

gelonida

Hi,


I've been told that the following code snippet is not good:

open("myfile","w").write(astring)

because I'm neither closing the file explicitly nor using the new 'with' syntax.

What exactly is the impact of not closing the file explicitly
(or implicitly with a 'with' block)?


Even with my example,
I'd have expected an exception to be raised if not all the data could be
written.

I'd also have expected all written data to be flushed as soon as the
file handle goes out of scope (that is, by the next line of my source
code).


Thanks for explaining exactly what kind of evil I could encounter
by not closing explicitly.
 
Chris Rebert

gelonida said:
I'd also have expected all written data to be flushed as soon as the
file handle goes out of scope (that is, by the next line of my source
code).

That very prompt cleanup is an artifact of CPython's garbage-collection
machinery (reference counting); the language specification itself does
not guarantee it, and indeed some of the other implementations
*explicitly don't* make that guarantee (and hence the data may not get
flushed in a timely manner on those implementations). Portability
of code is encouraged, hence the admonishment you encountered.

Cheers,
Chris
 
Steven D'Aprano

Hi,


I've been told that the following code snippet is not good:

open("myfile","w").write(astring)

because I'm neither closing the file explicitly nor using the new 'with' syntax.

What exactly is the impact of not closing the file explicitly
(or implicitly with a 'with' block)?


Your data may not actually be written to disk until the file is closed. If
you have a reference cycle, and Python crashes badly enough, the garbage
collector may never run and the file will never be closed, hence you will
lose data.
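
A minimal sketch of the buffering effect described above, assuming CPython's default buffering for a regular file; the scratch file and the variable names are made up for illustration:

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

f = open(path, "w")
f.write("my data\n")

# The text is still sitting in Python's write buffer, so with default
# buffering the file on disk is still empty at this point.
size_before_close = os.path.getsize(path)

f.close()  # flushes the buffer and releases the OS file handle

size_after_close = os.path.getsize(path)
print(size_before_close, size_after_close)

os.remove(path)
```

If the process died between write() and close(), the small write would most likely never reach the disk.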

If you are running something other than CPython (e.g. IronPython or
Jython) then the file might not be closed until your program exits. If
you have a long-running program that opens many, many files, it is
possible for you to run out of system file handles and be unable to open
any more.

Best practice is to explicitly close the file when you are done with it,
but for short scripts, I generally don't bother. Laziness is a virtue :)

But for library code and larger applications, I always explicitly close
the file, because I want to control exactly when the file is closed
rather than leave it up to the interpreter. I don't know if my code might
one day end up in a long-running Jython application so I try to code
defensively and avoid even the possibility of a problem.
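
For reference, the two usual ways of closing explicitly could look roughly like this. It is only a sketch: astring stands in for the data from the question, and the scratch directory is an assumption made so the example cleans up after itself.

```python
import os
import tempfile

astring = "some text\n"  # placeholder for the data from the question

# Hypothetical scratch directory, not part of the original snippet.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "myfile")

# Option 1: the 'with' statement closes the file when the block exits,
# even if write() raises.
with open(path, "w") as f:
    f.write(astring)

# Option 2: an explicit close() in a finally clause does the same job.
g = open(path, "a")
try:
    g.write(astring)
finally:
    g.close()

# Read the result back to confirm both writes landed.
with open(path) as h:
    contents = h.read()

os.remove(path)
os.rmdir(tmpdir)
```

Either form pins down the moment the file is closed, independent of which interpreter runs the code.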

Even with my example,
I'd have expected an exception to be raised if not all the data could be
written.

Generally if you get an exception while trying to *close* a file, you're
pretty much stuffed. What are you going to do? How do you recover?

My feeling is that you're probably safe with something as simple as

open("myfile", "w").write("my data\n")

but if you do something like

some_data_structure.filehandle = open("myfile", "w")
some_data_structure.filehandle.write("my data\n")
# ... lots more code here

and some_data_structure keeps the file open until the interpreter shuts
down, there *might* be rare circumstances where you won't be notified of
an exception, depending on the exact circumstances of timing of when the
file gets closed. In the worst case, the file might not be closed until
the interpreter is shutting down *and* has already dismantled the
exception infrastructure, and so you can't get an exception. I don't know
enough about the Python runtime (particularly about how it works during
shutdown) to know how real this danger is, but if it is a danger, I bet
it involves __del__ methods.
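
One way to avoid relying on interpreter shutdown in that second case is to give the owning object an explicit close(). The DataStore class below is a hypothetical stand-in for some_data_structure, sketched under the assumption that the object owns its file handle:

```python
import os
import tempfile


class DataStore:
    """Hypothetical stand-in for 'some_data_structure' above."""

    def __init__(self, path):
        self.filehandle = open(path, "w")

    def write(self, text):
        self.filehandle.write(text)

    def close(self):
        # Any error while flushing surfaces here, at a point the caller chose.
        self.filehandle.close()

    # Context-manager support so callers can use a 'with' block.
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised inside the block


# Hypothetical scratch file for the demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)

with DataStore(path) as store:
    store.write("my data\n")
    # ... lots more code here

closed_after_block = store.filehandle.closed
os.remove(path)
```

With this shape, the close happens while the exception machinery is still fully working, so a failed flush can be reported normally.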


I'd also have expected all written data to be flushed as soon as the
file handle goes out of scope (that is, by the next line of my source code).

This is only guaranteed with CPython, not other implementations.

My feeling is that explicit closing is pedantic and careful, implicit
closing is lazy and easy. You make your choice and take your chance :)
 
Chris Rebert

What about open('foo', 'w').close().
Does it have the same problems?

Well, no, but that's only because it's a pointless no-op that doesn't
really do anything besides possibly throwing an exception (e.g. if the
script didn't have write access to the current directory).

Cheers,
Chris
 
Dave Angel

gelonida said:
Hi,


I've been told that the following code snippet is not good:

open("myfile","w").write(astring)

because I'm neither closing the file explicitly nor using the new 'with' syntax.

What exactly is the impact of not closing the file explicitly
(or implicitly with a 'with' block)?


Even with my example,
I'd have expected an exception to be raised if not all the data could be
written.

I'd also have expected all written data to be flushed as soon as the
file handle goes out of scope (that is, by the next line of my source
code).


Thanks for explaining exactly what kind of evil I could encounter
by not closing explicitly.

Evil? No. Just undefined behavior.

The language does NOT guarantee that a close or even a flush will occur
when an object "goes out of scope." This is the same in Python as it is
in Java. There's also no exception for data not being flushed.

In one particular implementation of Python, called CPython, there are
some things that tend to help. So if you're sure you're always going to
be using this particular implementation, and understand what the
restrictions are, then go ahead and be sloppy. Similarly, on some
operating systems, files are flushed when a process ends. So if you know
your application is only going to run in those environments, you might not
bother closing files at the end of execution.

It all depends on how restrictive your execution environment is going to be.

DaveA
 
Ryan Kelly

Well, no, but that's only because it's a pointless no-op that doesn't
really do anything besides possibly throwing an exception (e.g. if the
script didn't have write access to the current directory).

Actually, it will create the file if it doesn't exist, and truncate it
to zero length if it does.
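
A small sketch of that behaviour, using a hypothetical scratch file rather than 'foo':

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration.
fd, path = tempfile.mkstemp()
os.close(fd)

# Start with a file that has some content.
with open(path, "w") as f:
    f.write("existing content\n")
size_before = os.path.getsize(path)

# Opening in "w" mode truncates, even though nothing is written.
open(path, "w").close()
size_after = os.path.getsize(path)

# If the file does not exist, the same call creates it (empty).
os.remove(path)
open(path, "w").close()
created = os.path.exists(path)
created_size = os.path.getsize(path)

os.remove(path)
```

So the one-liner is not a no-op: it is roughly the Python spelling of "create or empty this file".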


Ryan
 
Lawrence D'Oliveiro

gelonida said:
I've been told that the following code snippet is not good:

open("myfile","w").write(astring) ...

I do that for reads, but never for writes.

For writes, you want to give a chance for write errors to raise an exception
and alert the user, instead of failing silently, to avoid inadvertent data
loss. Hence the explicit close.
 
Lie Ryan

Lawrence D'Oliveiro said:
I do that for reads, but never for writes.

For writes, you want to give a chance for write errors to raise an exception
and alert the user, instead of failing silently, to avoid inadvertent data
loss. Hence the explicit close.

In short, in case of doubt, just be explicit.

Since in Python nothing is guaranteed about implicit file closing, you
must always close files explicitly.
 
Lawrence D'Oliveiro

Since in Python nothing is guaranteed about implicit file closing ...

It is guaranteed that objects with a reference count of zero will be
disposed. In my experiments, this happens immediately.
 
Steven D'Aprano

It is guaranteed that objects with a reference count of zero will be
disposed.

Not all Python implementations have reference counts at all, e.g. Jython
and IronPython. Neither of those closes files immediately.

In my experiments, this happens immediately.

Are your experiments done under PyPy, CLPython, or Pynie?
 
Alf P. Steinbach

* Lawrence D'Oliveiro:
It is guaranteed that objects with a reference count of zero will be
disposed.

Only in current CPython.

In my experiments, this happens immediately.

Depends what you mean, but even in current CPython destruction of a local can be
postponed indefinitely if a reference to the stack frame is kept somewhere.

And that happens, for example, when an exception is raised (until the handler
completes, but it doesn't necessarily complete for a Very Long Time).
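
The exception case can be sketched as follows. Resource and observers are hypothetical names; a weak reference is used only to observe whether the local object is still alive, and the final result assumes CPython-style prompt reclamation:

```python
import gc
import weakref


class Resource:
    """Hypothetical stand-in for an object such as an open file."""


observers = []


def work():
    r = Resource()                    # local that "should" die on return
    observers.append(weakref.ref(r))  # watch r without keeping it alive
    raise ValueError("boom")


try:
    work()
except ValueError as exc:
    saved = exc  # exception -> traceback -> frame of work() -> local r

alive_while_exception_kept = observers[0]() is not None

del saved     # drop the last reference to the exception and its traceback
gc.collect()  # usually unnecessary on CPython here, but harmless
alive_after_release = observers[0]() is not None
```

As long as something holds the exception (and with it the traceback), the frame of work() and every local in it stay alive, so a file opened there would stay open too.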


Cheers & hth.,

- Alf
 
Adam Tauno Williams

It is guaranteed that objects with a reference count of zero will be
disposed. In my experiments, this happens immediately.

That is a detail specific to the current implementation. Always close
files. Otherwise, in the future, or on a different run-time, your code
will break.
 
Lawrence D'Oliveiro

Chris said:
Experiment with an implementation other than CPython and prepare to be
surprised.

Any implementation that doesn’t do reference-counting is brain-damaged.
 
Steven D'Aprano

Any implementation that doesn’t do reference-counting is brain-damaged.

Funny, that's exactly what other people say about implementations that
*do* use reference counting.
 
Adam Tauno Williams

Any implementation that doesn’t do reference-counting is brain-damaged.

Why? There are much better ways to do memory management / garbage
collection; especially when dealing with large applications.
 
Alf P. Steinbach

* Adam Tauno Williams:

Depends on what the statement was meant to mean.

But for a literal context-free interpretation e.g. the 'sys.getrefcount'
function is not documented as CPython only and thus an implementation that
didn't do reference counting would not be a conforming Python implementation.

Whether it uses reference counting to destroy objects at earliest opportunity is
another matter.

There are much better ways to do memory management / garbage
collection; especially when dealing with large applications.

Depends on whether you're talking about Python implementations or as a matter of
general principle, and depends on how you define "better", "large" and so on.

On its own it's a pretty meaningless statement.

But although a small flame war erupted the last time I mentioned this, I think a
case can be made that Python is not designed for programming-in-the-large. And
that the current CPython scheme is eminently suitable for small scripts. But it
has its drawbacks, especially considering the various ways that stack frames can
be retained, and considering the documentation of 'gc.garbage', ...

"Objects that have __del__() methods and are part of a reference cycle cause
the entire reference cycle to be uncollectable, including objects not
necessarily in the cycle but reachable only from it."

... which means that a programming style assuming current CPython semantics and
employing RAII can be detrimental in a sufficiently large system.


Cheers & hth.,

- Alf
 
Steven D'Aprano

But for a literal context-free interpretation e.g. the 'sys.getrefcount'
function is not documented as CPython only and thus an implementation
that didn't do reference counting would not be a conforming Python
implementation.

Since Jython and IronPython are conforming Python implementations, and
Guido has started making policy decisions specifically to support these
other implementations (e.g. the language feature moratorium, PEP 3003), I
think we can assume that this is a documentation bug.

However, a Python implementation that always returned 0 for
sys.getrefcount would technically satisfy the word of the documentation,
if not the spirit.
 
Alf P. Steinbach

* Steven D'Aprano:
Since Jython and IronPython are conforming Python implementations, and
Guido has started making policy decisions specifically to support these
other implementations (e.g. the language feature moratorium, PEP 3003), I
think we can assume that this is a documentation bug.

The documentation for Jython specifies the same for 'sys.getrefcount'.

However, testing:

<output>
*sys-package-mgr*: processing new jar, 'C:\Program Files\jython2.5.1\jython.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\resources.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\rt.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\jsse.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\jce.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\charsets.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\dnsns.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\localedata.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\sunjce_provider.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\sunmscapi.jar'
*sys-package-mgr*: processing new jar, 'C:\Program Files\Java\jre6\lib\ext\sunpkcs11.jar'
A created
Traceback (most recent call last):
  File "c:\test\refcount.py", line 17, in <module>
    writeln( str( sys.getrefcount( a ) - 1 ) )
AttributeError: 'systemstate' object has no attribute 'getrefcount'
</output>

However, a Python implementation that always returned 0 for
sys.getrefcount would technically satisfy the word of the documentation,
if not the spirit.

Yes.

OK, learned something new: I thought Jython actually implemented getrefcount.

The Jython docs say it does...
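
Given the AttributeError above, portable code that wants a reference count might probe for the function instead of assuming it exists. A minimal sketch; refcount_or_none is a hypothetical helper name, not something from the thread or the standard library:

```python
import sys


def refcount_or_none(obj):
    """Return sys.getrefcount(obj) where available, else None.

    Hypothetical helper for illustration only.
    """
    getrefcount = getattr(sys, "getrefcount", None)
    if getrefcount is None:
        return None  # this implementation exposes no reference counts
    return getrefcount(obj)


a = object()
count = refcount_or_none(a)
print(count)
```

On CPython this prints a small integer; on an implementation without the function it prints None instead of raising.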


Cheers,

- Alf
 
