Allowing ref counting to close files: bad style?

Dan

Is this discouraged?:

for line in open(filename):
    <do something with line>

That is, should I do this instead?:

fileptr = open(filename)
for line in fileptr:
    <do something with line>
fileptr.close()

Can I count on the ref count going to zero to close the file?

How about a write case? For example:

class Foo(list):
    def __init__(self):
        self.extend([1, 2, 3, 4])
    def write(self, fileptr):
        for item in self:
            fileptr.write("%s\n" % item)

foo_obj = Foo()
foo_obj.write(open("the.file", "w"))

Is my data safer if I explicitly close, like this?:
fileptr = open("the.file", "w")
foo_obj.write(fileptr)
fileptr.close()

I understand that the upcoming 'with' statement will obviate this
question, but how about without 'with'?

/Dan
 
Paul Rubin

Dan said:
Is this discouraged?:

for line in open(filename):
    <do something with line>
Yes.

Can I count on the ref count going to zero to close the file?

You really shouldn't. It's a CPython artifact.
I understand that the upcoming 'with' statement will obviate this
question, but how about without 'with'?

f = open(filename)
try:
    for line in f:
        <do something with line>
finally:
    f.close()
 
sjdevnull

Paul said:
You really shouldn't. It's a CPython artifact.

I disagree, somewhat. No, you shouldn't count on the "ref count" per
se going to 0. And you shouldn't count on the file object being GC'd
_immediately_ after the last reference is destroyed. You should be able
to rely on it being GC'd at some point in the not-horribly-distant
future, though.

Doing an explicit .close() is not normally useful and muddies the code
(and introduces more lines for potential bugs to infest).

And yes, I know that the language spec technically allows for no GC at
all--it's a QOI issue, not a spec issue, but any implementation that
didn't GC would be useless as a general Python platform (perhaps useful
for specific embedded uses, but programming for such an environment
would be different from programming for rational python platforms in
bigger ways than this).

(And personally I think the benefits to programmers of guaranteeing
ref-counting semantics would outweigh the additional headaches for
Jython and other alternative implementations).
 
Paul Rubin

I disagree, somewhat. No, you shouldn't count on the "ref count" per
se going to 0. And you shouldn't count on the file object being GC'd
_immediately_ after the last reference is destroyed. You should be able
to rely on it being GC'd at some point in the not-horribly-distant
future, though.

Is there something in the language specification that says I should be
able to rely on something like that? In Jython, for example, I think
GC is handled totally by the underlying JVM and therefore totally up
to the Java implementation.
Doing an explicit .close() is not normally useful and muddies the code
(and introduces more lines for potential bugs to infest).

Yes, the "with" statement is the right way to do it.
And yes, I know that the language spec technically allows for no GC at
all--it's a QOI issue, not a spec issue, but any implementation that
QOI?

didn't GC would be useless as a general Python platform (perhaps useful

GC's typically track memory allocation but not file handle allocation.
If you're opening a lot of files, you could run out of fd's before the
GC ever runs.
(And personally I think the benefits to programmers of guaranteeing
ref-counting semantics would outweigh the additional headaches for
Jython and other alternative implementations).

Yes, "with" (doing an implicit close guaranteed to happen at the right
time) takes care of it properly.
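
For reference, a sketch of the 'with' form that PEP 343 introduces (the filename is just an example; in Python 2.5 it needs "from __future__ import with_statement"):

```python
import os
import tempfile

# Write to a temporary file so the sketch is self-contained.
path = os.path.join(tempfile.gettempdir(), "with_demo.txt")

with open(path, "w") as fileptr:
    fileptr.write("some data\n")
# fileptr is closed here, whether the block exited normally or
# via an exception -- no explicit close() or try/finally needed.
```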
 
Fredrik Lundh

Dan said:
Is this discouraged?:

for line in open(filename):
    <do something with line>

That is, should I do this instead?:

fileptr = open(filename)
for line in fileptr:
    <do something with line>
fileptr.close()

depends on the use case; in a small program that you know will only read
a few files, you can leave it to the system (especially on CPython). if
you're about to process a large number of files, or you're writing files,
it's usually better to be explicit.

note that to be really safe, you should use try/finally:

f = open(filename)
try:
    f.write(...)
finally:
    f.close()

</F>
 
sjdevnull

Paul said:
Is there something in the language specification that says I should be
able to rely on something like that?

No, as I said I know the language spec doesn't require any GC at all.
In Jython, for example, I think
GC is handled totally by the underlying JVM and therefore totally up
to the Java implementation.

Sure. But most Java GCs are pretty reasonable and for typical code
will run periodically (what I call the not-horribly-distant future).
Yes, the "with" statement is the right way to do it.
Ugh.


QOI?

Sorry, I had introduced and defined it earlier but wound up editing out
that sentence. Quality of implementation.
GC's typically track memory allocation but not file handle allocation.
If you're opening a lot of files, you could run out of fd's before the
GC ever runs.

Yes, if you're opening lots of files quickly without giving the GC time
to work then you may be stuck having to use some hack to support
non-refcounting implementations (or simply deciding that the cost of
doing so is not worth supporting implementations with nondeterministic
GC). Yet another reason I said:
Yes, "with" (doing an implicit close guaranteed to happen at the right
time) takes care of it properly.

In many cases, that's adding additional programmer burden to duplicate
information about an object's lifetime that's already in the code. In
simple cases, it uglifies the code with what should be an unnecessary
statement (and adds additional layers of indentation).

Guaranteeing ref-counting semantics at least for local variables when
you return from a function makes for more readable code and makes life
easier on the programmer.

It's obvious to the reader that in code like:

def myFunc(filename):
    f = open(filename, 'r')
    for line in f:
        # do something not using f

that f is used only in myFunc. Indeed, such scoping is a big part of
the point of having functions, and having to duplicate scope
declarations (via with statements or anything else) is broken.

Having f destructed at least when the function returns makes for more
readable code and fewer mistakes. CPython's refcounting behaves very
nicely in this regard, and Python programmers would be much better
served IMO if the language required at least this level of
sophistication from the GC (if not full ref-counting).
 
Paul Rubin

Sure. But most Java GCs are pretty reasonable and for typical code
will run periodically (what I call the not-horribly-distant future).

If your system allows max 100 files open and you're using 98 of them,
then "horribly distant future" can be awfully close by.

Ref counting is a rather primitive GC technique and implementations
shouldn't be stuck having to use it.
It's obvious to the reader that in code like:

def myFunc(filename):
    f = open(filename, 'r')
    for line in f:
        # do something not using f

That's not obvious except by recognizing the idiom and knowing the
special semantics of files. Otherwise, look at

def myOtherFunc(x):
    a = SomeClass(x) # make an instance of some class
    b = a.foo()
    # do something with b

One can't say for sure that 'a' can be destructed when the above
function finishes. Maybe a.foo() saved a copy of its 'self' argument
somewhere. It's the same thing with your file example: "for line in f"
calls f's iter method and then repeatedly calls f's next method.
Those methods could have side effects that save f somewhere.
Having f destructed at least when the function returns makes for more
readable code and fewer mistakes. CPython's refcounting behaves very
nicely in this regard,

The ref counting only works if it applies to all the lower scopes and
not just the local scope. That means you can't use any other type of GC.
 
sjdevnull

Paul said:
Ref counting is a rather primitive GC technique

I disagree strongly with this assertion. It's not as efficient overall
as other GC implementations, but it's not a case of "less efficient to
do the same task". Reference counting buys you deterministic GC in the
pretty common case where you do not have circular references--and
determinism is very valuable to programmers. Other GCs may be faster, but
they don't actually accomplish the same task.

I can come up with plenty of "superior" algorithms for all kinds of
tasks if I'm not bound to any particular semantics, but losing
correctness for speed is rarely a good idea.
 
Matthew Woodcraft

Dan said:
Is this discouraged?:

for line in open(filename):
    <do something with line>

That is, should I do this instead?:

fileptr = open(filename)
for line in fileptr:
    <do something with line>
fileptr.close()

One reason to use close() explicitly is to make sure that errors are
reported properly.

If you use close(), an error from the operating system will cause an
exception at a well-defined point in your code. With the implicit
close, an error will probably cause a message to be spewed to stderr
and you might never know about it.

If (as in your example) the file was open for reading only, errors from
close() are unlikely. But I do not think they are guaranteed not to
occur. If you were writing to the file, checking for errors on close()
is indispensable.
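
One way to sketch that (hedged; the filename and data are just examples): keep the close() inside its own try, so an I/O error from the final flush surfaces at a well-defined point:

```python
fileptr = open("the.file", "w")
try:
    fileptr.write("important data\n")
finally:
    try:
        # close() flushes buffered data, so a disk-full or other
        # I/O error is raised here, where you can report it,
        # instead of being spewed to stderr at GC time
        fileptr.close()
    except IOError:
        # handle or log the failure, then re-raise
        raise
```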

-M-
 
Paul Rubin

I disagree strongly with this assertion. It's not as efficient overall
as other GC implementations, but it's not a case of "less efficient to
do the same task". Reference counting buys you deterministic GC in the
pretty common case where you do not have circular references--and
determinism is very valuable to programmers. Other GCs may be faster, but
they don't actually accomplish the same task.

GC is supposed to create the illusion that all objects stay around
forever. It releases unreachable objects since the application can't
tell whether those objects are gone or not.

Closing a file is a state change in which stuff is supposed to
actually happen (buffers flushed, CLOSE message sent over socket,
etc.) That's independent of releasing it. In your example
(simplified):

def func(x):
    f = open_some_file(x)
    # do stuff with f

it might even be that the open call saves the file handle somewhere,
maybe for logging purposes. You presumably still want it closed at
function exit. The GC can't possibly do that for you. Relying on GC
to close files is simply a kludge that Python users have been relying
on, because doing it "manually" has been messy prior to 2.5.
I can come up with plenty of "superior" algorithms for all kinds of
tasks if I'm not bound to any particular semantics, but losing
correctness for speed is rarely a good idea.

Then don't write incorrect code that relies on the GC's implementation
accidents to make it work ;-). PEP 343 really is the right way to
handle this.
 
sjdevnull

Paul said:
GC is supposed to create the illusion that all objects stay around
forever. It releases unreachable objects since the application can't
tell whether those objects are gone or not.

No, that's not true of all GC implementations. Refcounting
implementations give much nicer deterministic guarantees.
 
BJörn Lindqvist

Is this discouraged?:

for line in open(filename):
    <do something with line>

In theory, it is. In practice, that is the way Python code is written
because it is more natural and to the point. Not just in hacked-together
scripts; lots of third-party modules include code like "data =
open(filename).read()" and similar idioms.
Is my data safer if I explicitly close, like this?:
fileptr = open("the.file", "w")
foo_obj.write(fileptr)
fileptr.close()

Have you ever experienced a problem caused by not explicitly closing
your file handles?
 
Dan

BJörn Lindqvist said:
Have you ever experienced a problem caused by not explicitly closing
your file handles?

No. If I had, I wouldn't have asked the question. It seems to work,
but can I really count on it?

I am a sample of one (In that happy place that Brooks described as
quadrant 1, where a person writes programs for himself to be run on his
own computer. Or perhaps to be run by a handful of co-workers.) Such a
small sample isn't statistically significant; a 100% success rate
doesn't mean much. I've also never had a burned CD go bad, but I know
that they do.

/Dan
 
sjdevnull

Dan said:
No. If I had, I wouldn't have asked the question. It seems to work,
but can I really count on it?

In CPython, you can rely on file objects to be closed when the last
reference to them is removed. In general, the language spec does not
guarantee that so if you use Jython or other implementations you cannot
rely on files being closed on last reference. (see my other posts in
this thread for why I think the language spec should be changed to
guarantee the ref-counting semantics at least in simple cases).
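
A small CPython-only sketch of that behaviour (do not rely on it on other implementations): a weak reference lets us observe the file object being reclaimed, and hence closed, the moment its refcount hits zero:

```python
import os
import tempfile
import weakref

path = os.path.join(tempfile.gettempdir(), "refcount_demo.txt")
open(path, "w").close()  # make sure the file exists

f = open(path)
alias = f
ref = weakref.ref(f)

del f        # one reference remains; the file stays open
assert not alias.closed

del alias    # refcount drops to zero: CPython closes and
             # reclaims the object immediately
```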
 
