Allowing ref counting to close file items bad style?

Discussion in 'Python' started by Dan, Aug 30, 2006.

  1. Dan

    Dan Guest

    Is this discouraged?:

    for line in open(filename):
        <do something with line>

    That is, should I do this instead?:

    fileptr = open(filename)
    for line in fileptr:
        <do something with line>
    fileptr.close()

    Can I count on the ref count going to zero to close the file?

    How about a write case? For example:

    class Foo(list):
        def __init__(self):
            self.extend([1, 2, 3, 4])
        def write(self, fileptr):
            for item in self:
                fileptr.write("%s\n" % item)

    foo_obj = Foo()
    foo_obj.write(open("the.file", "w"))

    Is my data safer if I explicitly close, like this?:
    fileptr = open("the.file", "w")
    foo_obj.write(fileptr)
    fileptr.close()

    I understand that the upcoming 'with' statement will obviate this
    question, but how about without 'with'?

    /Dan

    --
    dedded att verizon dott net
    Dan, Aug 30, 2006
    #1
  2. Dan

    Paul Rubin Guest

    Dan <> writes:
    > Is this discouraged?:
    >
    > for line in open(filename):
    >     <do something with line>


    Yes.

    > Can I count on the ref count going to zero to close the file?


    You really shouldn't. It's a CPython artifact.

    > I understand that the upcoming 'with' statement will obviate this
    > question, but how about without 'with'?


    f = open(filename)
    try:
        for line in f:
            <do something with line>
    finally:
        f.close()
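
A minimal runnable sketch of the try/finally idiom above, assuming a throwaway temporary file with made-up contents. The `ValueError` is a simulated failure, included only to show that the file still ends up closed when the loop body raises.

```python
import os
import tempfile

# Scratch file with illustrative contents (not from the thread).
fd, path = tempfile.mkstemp()
os.write(fd, b"one\ntwo\nthree\n")
os.close(fd)

seen = []
f = open(path)
try:
    try:
        for line in f:
            seen.append(line.rstrip("\n"))
            if len(seen) == 2:
                raise ValueError("simulated failure while processing")
    finally:
        # Runs whether the loop finishes or an exception propagates.
        f.close()
except ValueError:
    pass  # swallowed only so the demo can inspect the state afterwards

closed_after_error = f.closed
os.remove(path)
```

Here `closed_after_error` does not depend on any garbage collector: the `finally` clause closes the file on every exit path.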
    Paul Rubin, Aug 30, 2006
    #2
  3. Dan

    Dan Guest

    Paul Rubin wrote:
    > Dan <> writes:
    >> Is this discouraged?:
    >>
    >> for line in open(filename):
    >>     <do something with line>

    >
    > Yes.


    Well, not what I wanted to hear, but what I expected.

    Thanks,
    Dan

    --
    dedded att verizon dott net
    Dan, Aug 30, 2006
    #3
  4. Dan

    Guest

    Paul Rubin wrote:
    > Dan <> writes:
    > > Is this discouraged?:
    > >
    > > for line in open(filename):
    > >     <do something with line>

    >
    > Yes.
    >
    > > Can I count on the ref count going to zero to close the file?

    >
    > You really shouldn't. It's a CPython artifact.


    I disagree, somewhat. No, you shouldn't count on the "ref count" per
    se going to 0. And you shouldn't count on the file object being GC'd
    _immediately_ after the last reference is destroyed. You should be able
    to rely on it being GC'd at some point in the not-horribly-distant
    future, though.

    Doing an explicit .close() is not normally useful and muddies the code
    (and introduces more lines for potential bugs to infest).

    And yes, I know that the language spec technically allows for no GC at
    all--it's a QOI issue, not a spec issue, but any implementation that
    didn't GC would be useless as a general Python platform (perhaps useful
    for specific embedded uses, but programming for such an environment
    would be different from programming for rational python platforms in
    bigger ways than this).

    (And personally I think the benefits to programmers of guaranteeing
    ref-counting semantics would outweigh the additional headaches for
    Jython and other alternative implementations).
    , Aug 30, 2006
    #4
  5. Dan

    Paul Rubin Guest

    "" <> writes:
    > I disagree, somewhat. No, you shouldn't count on the "ref count" per
    > se going to 0. And you shouldn't count on the file object being GC'd
    > _immediately_ after the last reference is destroyed. You should be able
    > to rely on it being GC'd at some point in the not-horribly-distant
    > future, though.


    Is there something in the language specification that says I should be
    able to rely on something like that? In Jython, for example, I think
    GC is handled totally by the underlying JVM and therefore totally up
    to the Java implementation.

    > Doing an explicit .close() is not normally useful and muddies the code
    > (and introduces more lines for potential bugs to infest).


    Yes, the "with" statement is the right way to do it.

    > And yes, I know that the language spec technically allows for no GC at
    > all--it's a QOI issue, not a spec issue, but any implementation that


    QOI?

    > didn't GC would be useless as a general Python platform (perhaps useful


    GC's typically track memory allocation but not file handle allocation.
    If you're opening a lot of files, you could run out of fd's before the
    GC ever runs.

    > (And personally I think the benefits to programmers of guaranteeing
    > ref-counting semantics would outweigh the additional headaches for
    > Jython and other alternative implementations).


    Yes, "with" (doing an implicit close guaranteed to happen at the right
    time) takes care of it properly.
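
For comparison, a short sketch of the same pattern written with the `with` statement from PEP 343, assuming a Python version where `with` is available. The temporary file and its contents are illustrative, and the `RuntimeError` is a simulated failure.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, "w") as out:
    out.write("first\nsecond\n")
# Leaving the block closed the file, which also flushed the buffered text.

lines = []
try:
    with open(path) as f:
        for line in f:
            lines.append(line.rstrip("\n"))
            if len(lines) == 2:
                raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the file was already closed by the time we get here

closed_states = (out.closed, f.closed)
os.remove(path)
```

Both files are closed at the end of their blocks, on the normal path and on the exception path, without relying on when the objects are reclaimed.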
    Paul Rubin, Aug 30, 2006
    #5
  6. Dan wrote:

    > Is this discouraged?:
    >
    > for line in open(filename):
    >     <do something with line>
    >
    > That is, should I do this instead?:
    >
    > fileptr = open(filename)
    > for line in fileptr:
    >     <do something with line>
    > fileptr.close()


    depends on the use case; in a small program that you know will only read
    a few files, you can leave it to the system (especially on CPython). if
    you're about to process a large number of files, or you're writing files,
    it's usually better to be explicit.

    note that to be really safe, you should use try/finally:

    f = open(filename, "w")
    try:
        f.write(...)
    finally:
        f.close()

    </F>
    Fredrik Lundh, Aug 30, 2006
    #6
  7. Dan

    Guest

    Paul Rubin wrote:
    > "" <> writes:
    > > I disagree, somewhat. No, you shouldn't count on the "ref count" per
    > > se going to 0. And you shouldn't count on the file object being GC'd
    > > _immediately_ after the last reference is destroyed. You should be able
    > > to rely on it being GC'd at some point in the not-horribly-distant
    > > future, though.

    >
    > Is there something in the language specification that says I should be
    > able to rely on something like that?


    No, as I said I know the language spec doesn't require any GC at all.

    > In Jython, for example, I think
    > GC is handled totally by the underlying JVM and therefore totally up
    > to the Java implementation.


    Sure. But most Java GCs are pretty reasonable and for typical code
    will run periodically (what I call the not-horribly-distant future).

    > > Doing an explicit .close() is not normally useful and muddies the code
    > > (and introduces more lines for potential bugs to infest).

    >
    > Yes, the "with" statement is the right way to do it.


    Ugh.

    > > And yes, I know that the language spec technically allows for no GC at
    > > all--it's a QOI issue, not a spec issue, but any implementation that

    >
    > QOI?


    Sorry, I had introduced and defined it earlier but wound up editing out
    that sentence. Quality of implementation.

    > > didn't GC would be useless as a general Python platform (perhaps useful

    >
    > GC's typically track memory allocation but not file handle allocation.
    > If you're opening a lot of files, you could run out of fd's before the
    > GC ever runs.


    Yes, if you're opening lots of files quickly without giving the GC time
    to work then you may be stuck having to use some hack to support
    non-refcounting implementations (or simply deciding that the cost of
    doing so is not worth supporting implementations with nondeterministic
    GC). Yet another reason I said:

    > > (And personally I think the benefits to programmers of guaranteeing
    > > ref-counting semantics would outweigh the additional headaches for
    > > Jython and other alternative implementations).

    >
    > Yes, "with" (doing an implicit close guaranteed to happen at the right
    > time) takes care of it properly.


    In many cases, that's adding additional programmer burden to duplicate
    information about an object's lifetime that's already in the code. In
    simple cases, it uglifies the code with what should be an unnecessary
    statement (and adds additional layers of indentation).

    Guaranteeing ref-counting semantics at least for local variables when
    you return from a function makes for more readable code and makes life
    easier on the programmer.

    It's obvious to the reader that in code like:

    def myFunc(filename):
        f = open(filename, 'r')
        for line in f:
            # do something not using f

    that f is used only in myFunc. Indeed, such scoping is a big part of
    the point of having functions, and having to duplicate scope
    declarations (via with statements or anything else) is broken.

    Having f destructed at least when the function returns makes for more
    readable code and fewer mistakes. CPython's refcounting behaves very
    nicely in this regard, and Python programmers would be much better
    served IMO if the language required at least this level of
    sophistication from the GC (if not full ref-counting).
    , Aug 30, 2006
    #7
  8. Dan

    Paul Rubin Guest

    "" <> writes:
    > Sure. But most Java GCs are pretty reasonable and for typical code
    > will run periodically (what I call the not-horribly-distant future).


    If your system allows max 100 files open and you're using 98 of them,
    then "horribly distant future" can be awfully close by.

    > > > (And personally I think the benefits to programmers of guaranteeing
    > > > ref-counting semantics would outweigh the additional headaches for
    > > > Jython and other alternative implementations).


    Ref counting is a rather primitive GC technique and implementations
    shouldn't be stuck having to use it.

    > It's obvious to the reader that in code like:
    >
    > def myFunc(filename):
    >     f = open(filename, 'r')
    >     for line in f:
    >         # do something not using f


    That's not obvious except by recognizing the idiom and knowing the
    special semantics of files. Otherwise, look at

    def myOtherFunc(x):
        a = SomeClass(x) # make an instance of some class
        b = a.foo()
        # do something with b

    One can't say for sure that 'a' can be destructed when the above
    function finishes. Maybe a.foo() saved a copy of its 'self' argument
    somewhere. It's the same thing with your file example: "for line in f"
    calls f's iter method and then repeatedly calls f's next method.
    Those methods could have side effects that save f somewhere.

    > Having f destructed at least when the function returns makes for more
    > readable code and fewer mistakes. CPython's refcounting behaves very
    > nicely in this regard,


    The ref counting only works if it applies to all the lower scopes and
    not just the local scope. That means you can't use any other type of GC.
    Paul Rubin, Aug 30, 2006
    #8
  9. Dan

    Guest

    Paul Rubin wrote:
    > "" <> writes:
    > > > > (And personally I think the benefits to programmers of guaranteeing
    > > > > ref-counting semantics would outweigh the additional headaches for
    > > > > Jython and other alternative implementations).

    >
    > Ref counting is a rather primitive GC technique


    I disagree strongly with this assertion. It's not as efficient overall
    as other GC implementations, but it's not a case of "less efficient to
    do the same task". Reference counting buys you deterministic GC in the
    pretty common case where you do not have circular references--and
    determinism is very valuable to programmers. Other GCs may be faster, but
    they don't actually accomplish the same task.
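
A small sketch of the circular-reference caveat mentioned above. It describes CPython behaviour only; the `Node` class and `make_probe` helper are hypothetical, and the cycle collector is disabled during the demo so that only reference counting is at work until `gc.collect()` is called.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_probe(cyclic):
    a = Node()
    if cyclic:
        b = Node()
        a.other, b.other = b, a  # a and b now keep each other alive
    return weakref.ref(a)


gc.disable()  # keep the cycle collector from running during the demo
try:
    plain = make_probe(cyclic=False)
    cycle = make_probe(cyclic=True)
    plain_alive = plain() is not None
    cycle_alive_before = cycle() is not None
    gc.collect()  # an explicit collection still works while disabled
    cycle_alive_after = cycle() is not None
finally:
    gc.enable()
```

On CPython the acyclic object is gone as soon as `make_probe` returns, while the pair in the cycle lingers until the collector runs. Other implementations may reclaim both at some later, unspecified time.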

    I can come up with plenty of "superior" algorithms for all kinds of
    tasks if I'm not bound to any particular semantics, but losing
    correctness for speed is rarely a good idea.
    , Aug 30, 2006
    #9
  10. Dan <> wrote:
    > Is this discouraged?:
    >
    > for line in open(filename):
    >     <do something with line>
    >
    > That is, should I do this instead?:
    >
    > fileptr = open(filename)
    > for line in fileptr:
    >     <do something with line>
    > fileptr.close()


    One reason to use close() explicitly is to make sure that errors are
    reported properly.

    If you use close(), an error from the operating system will cause an
    exception at a well-defined point in your code. With the implicit
    close, an error will probably cause a message to be spewed to stderr
    and you might never know about it.

    If (as in your example) the file was open for reading only, errors from
    close() are unlikely. But I do not think they are guaranteed not to
    occur. If you were writing to the file, checking for errors on close()
    is indispensable.
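
To illustrate why close() matters for writes, here is a sketch in which a write error only surfaces when the buffer is flushed at close time. `FullDisk` is a hypothetical raw stream that simulates a full disk; it is an assumption for demonstration, not a real device.

```python
import errno
import io


class FullDisk(io.RawIOBase):
    """Hypothetical raw stream that behaves like a full disk."""

    def writable(self):
        return True

    def write(self, data):
        raise OSError(errno.ENOSPC, "No space left on device")


f = io.BufferedWriter(FullDisk())
f.write(b"important data\n")  # accepted: the bytes only reach the buffer

try:
    f.close()  # the flush happens here, so the error surfaces here
    close_error = None
except OSError as exc:
    close_error = exc
```

With an explicit close(), the failure arrives as an exception at a known line and can be handled. If the object were simply dropped, there would be no such point in the program at which to catch it.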

    -M-
    Matthew Woodcraft, Aug 30, 2006
    #10
  11. Dan

    Paul Rubin Guest

    "" <> writes:
    > I disagree strongly with this assertion. It's not as efficient overall
    > as other GC implementations, but it's not a case of "less efficient to
    > do the same task". Reference counting buys you deterministic GC in the
    > pretty common case where you do not have circular references--and
    > determinism is very valuable to programmers. Other GCs may be faster, but
    > they don't actually accomplish the same task.


    GC is supposed to create the illusion that all objects stay around
    forever. It releases unreachable objects since the application can't
    tell whether those objects are gone or not.

    Closing a file is a state change in which stuff is supposed to
    actually happen (buffers flushed, CLOSE message sent over socket,
    etc.) That's independent of releasing it. In your example
    (simplified):

    def func(x):
        f = open_some_file(x)
        # do stuff with f

    it might even be that the open call saves the file handle somewhere,
    maybe for logging purposes. You presumably still want it closed at
    function exit. The GC can't possibly do that for you. Relying on GC
    to close files is simply a kludge that Python users have been relying
    on, because doing it "manually" has been messy prior to 2.5.

    > I can come up with plenty of "superior" algorithms for all kinds of
    > tasks if I'm not bound to any particular semantics, but losing
    > correctness for speed is rarely a good idea.


    Then don't write incorrect code that relies on the GC's implementation
    accidents to make it work ;-). PEP 343 really is the right way to
    handle this.
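
A minimal sketch of the PEP 343 protocol applied to the situation described above, where the open call keeps the handle reachable. `Connection` and `open_log` are hypothetical names for a toy resource and a logging registry.

```python
open_log = []  # hypothetical registry that keeps every handle reachable


class Connection:
    """Toy resource following the PEP 343 context-manager protocol."""

    def __init__(self, name):
        self.name = name
        self.closed = False
        open_log.append(self)  # the open call saves the handle somewhere

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised in the block


def func(name):
    with Connection(name) as conn:
        return conn.name.upper()


value = func("db")
```

The object stays reachable through `open_log`, so no garbage collector would ever finalize it, yet it is closed at function exit because closing is tied to the block rather than to the object's lifetime.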
    Paul Rubin, Aug 31, 2006
    #11
  12. Dan

    Guest

    Paul Rubin wrote:
    > "" <> writes:
    > > I disagree strongly with this assertion. It's not as efficient overall
    > > as other GC implementations, but it's not a case of "less efficient to
    > > do the same task". Reference counting buys you deterministic GC in the
    > > pretty common case where you do not have circular references--and
    > > determinism is very valuable to programmers. Other GCs may be faster, but
    > > they don't actually accomplish the same task.

    >
    > GC is supposed to create the illusion that all objects stay around
    > forever. It releases unreachable objects since the application can't
    > tell whether those objects are gone or not.


    No, that's not true of all GC implementations. Refcounting
    implementations give much nicer deterministic guarantees.
    , Aug 31, 2006
    #12
  13. On 8/30/06, Dan <> wrote:
    > Is this discouraged?:
    >
    > for line in open(filename):
    >     <do something with line>


    In theory, it is. In practice, that is the way Python code is written
    because it is more natural and to the point. Not just in hacked-together
    scripts: lots of third-party modules include code like "data =
    open(filename).read()" and similar idioms.

    > Is my data safer if I explicitly close, like this?:
    > fileptr = open("the.file", "w")
    > foo_obj.write(fileptr)
    > fileptr.close()


    Have you ever experienced a problem caused by not explicitly closing
    your file handles?

    --
    mvh Björn
    BJörn Lindqvist, Aug 31, 2006
    #13
  14. Dan

    Dan Guest

    BJörn Lindqvist wrote:
    > On 8/30/06, Dan <> wrote:
    >> Is my data safer if I explicitly close, like this?:
    >> fileptr = open("the.file", "w")
    >> foo_obj.write(fileptr)
    >> fileptr.close()

    >
    > Have you ever experienced a problem caused by not explicitly closing
    > your file handles?
    >


    No. If I had, I wouldn't have asked the question. It seems to work,
    but can I really count on it?

    I am a sample of one (In that happy place that Brooks described as
    quadrant 1, where a person writes programs for himself to be run on his
    own computer. Or perhaps to be run by a handful of co-workers.) Such a
    small sample isn't statistically significant; a 100% success rate
    doesn't mean much. I've also never had a burned CD go bad, but I know
    that they do.

    /Dan

    --
    dedded att verizon dott net
    Dan, Sep 1, 2006
    #14
  15. Dan

    Guest

    Dan wrote:
    > BJörn Lindqvist wrote:
    > > On 8/30/06, Dan <> wrote:
    > >> Is my data safer if I explicitly close, like this?:
    > >> fileptr = open("the.file", "w")
    > >> foo_obj.write(fileptr)
    > >> fileptr.close()

    > > Have you ever experienced a problem caused by not explicitly closing
    > > your file handles?

    >
    > No. If I had, I wouldn't have asked the question. It seems to work,
    > but can I really count on it?


    In CPython, you can rely on file objects to be closed when the last
    reference to them is removed. In general, the language spec does not
    guarantee that so if you use Jython or other implementations you cannot
    rely on files being closed on last reference. (see my other posts in
    this thread for why I think the language spec should be changed to
    guarantee the ref-counting semantics at least in simple cases).
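
The CPython behaviour described above can be observed with a sketch like the following. It is an implementation detail rather than a language guarantee, and the temporary file is illustrative. The check relies on `os.fstat` raising `OSError` for a descriptor that has already been closed.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, "w")
f.write("data\n")
raw_fd = f.fileno()
del f  # drop the last reference without calling close()

try:
    os.fstat(raw_fd)
    closed_on_last_ref = False
except OSError:
    closed_on_last_ref = True  # the descriptor is no longer valid

with open(path) as check:
    contents = check.read()
os.remove(path)
```

On CPython the descriptor is released and the data flushed as soon as the reference disappears. On an implementation without reference counting, the same script may find the descriptor still open and the file still empty.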
    , Sep 1, 2006
    #15