does python have useless destructors?

  • Thread starter Michael P. Soulier

Aahz

David Turner said:

Objects with a __del__ method shall be reference counted. When the
reference count reaches zero, the __del__ method shall be called, and
any subobjects that have a __del__ method shall also be unreferenced.

The point at which the memory allocated to the object is freed is
largely irrelevant. The point is that there's a predictable time at
which __del__ is called. This is what enables the RAII idiom.

Now, this could be tricky to implement because we are now separating
the concepts of "destruction" and "finalization". But it's certainly
not impossible, and it would add a powerful new concept to the
language. So I don't think the idea should be rejected out of hand.

Is this clearer?

Problem is, that's exactly the situation we currently have in CPython, so
I don't see what the improvement is. Are you suggesting that Jython
change its semantics?
 

Marcin 'Qrczak' Kowalczyk

David Turner said:

I'm also not talking about modifying the garbage collector (if any).
I'm talking about a mechanism that is *independent* of the garbage
collector. To briefly resummarize:

Objects with a __del__ method shall be reference counted. When the
reference count reaches zero, the __del__ method shall be called, and
any subobjects that have a __del__ method shall also be unreferenced.

This is unimplementable without a CPU overhead equivalent to maintaining
reference counts for all objects.
 

Marcus Alanen

Hannu said:
It's not that simple when you compare it to C++ RAII idiom,
and the above code is actually wrong. If open() raises an
exception, myfile hasn't yet been assigned and myfile.close()
will raise another, unwanted exception of type "NameError".
The correct idiom is:

myfile = file("myfilepath", "w")
try:
    myfile.write(reallybigbuffer)
finally:
    myfile.close()
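
For contrast, the broken pattern Hannu refers to (not quoted above) presumably
looks like this; a hedged reconstruction:

try:
    myfile = file("myfilepath", "w")
    myfile.write(reallybigbuffer)
finally:
    myfile.close()   # NameError here if file() itself raised, because
                     # myfile was never bound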

Does anybody know when "myfile" is created if it hasn't been introduced
previously? I.e. is Python guaranteed to create the variable, call the
function, then assign, or call the function, create variable, then
assign? In the latter case, we (at least technically) have a very small
chance that the creation of the variable fails, for some reason or
another.

By the way, in one of our projects we obviously use several different
resources, so I wrote a simple/stupid script that checks for a resource
acquisition (here "file"), immediately followed by an appropriate "try"
followed at some point with "finally" and then a corresponding
destructor call (here "close"). Simple, but it finds lots of errors.
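
A hypothetical sketch of that kind of checker (the regular expression and the
reporting format here are invented for illustration):

import re
import sys

ACQUIRE = re.compile(r'=\s*file\(')

def check(path):
    # Warn when a "file(...)" acquisition is not immediately followed by a
    # "try:" line with a "finally:" and a ".close(" somewhere after it.
    lines = open(path).readlines()
    for i, line in enumerate(lines):
        if not ACQUIRE.search(line):
            continue
        rest = "".join(lines[i + 1:])
        protected = (i + 1 < len(lines)
                     and lines[i + 1].strip().startswith("try:")
                     and "finally:" in rest
                     and ".close(" in rest)
        if not protected:
            print("%s:%d: resource acquired without try/finally + close"
                  % (path, i + 1))

for name in sys.argv[1:]:
    check(name)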

Marcus
 

Isaac To

Humpty> I have been hearing a lot of references to Jython. This is yet
Humpty> another example of how coupling languages can stifle their
Humpty> progress: C++ is stifled by its need for compatibility with C,
Humpty> now clearly Python is becoming stifled by a need for
Humpty> compatibility with Jython.

Jython is not a separate language. It is just our favourite Python
language, running under the Java virtual machine. Perhaps it is "stifling"
the development of the Python language, but if it is, it is because we
explicitly *don't* want to introduce an implementation dependency (i.e.,
don't depend on the CPython implementation), not because we want to depend
on a certain language. Different people will have different ideas about
whether this is a good thing. For me, I'd say that I prefer finding a
different solution to problems arising from the unspecified finalization
behaviour, because specifying the finalization time would more or less
remove a use-case of the Python language completely, and I do think that
being able to use Python within Java, and to use Java objects from Jython
code without additional "glue code", is something that should be dearly
treasured. This is especially the case because the lack of specification
about when finalization happens is, most of the time, not an issue at all.

Regards,
Isaac.
 

Isaac To

David> Isaac, I think you've missed the main thrust of my suggestion
David> here. I'm not talking about "locally scoped objects" such as C++
David> has. I'm also not talking about modifying the garbage collector
David> (if any). I'm talking about a mechanism that is *independent* of
David> the garbage collector. To briefly resummarize:

David> Objects with a __del__ method shall be reference counted. When
David> the reference count reaches zero, the __del__ method shall be
David> called, and any subobjects that have a __del__ method shall also
David> be unreferenced.

I'd love to have an easy way to have some objects reference counted and
some not. But, quite simply put, it doesn't work. If you have an object
with a __del__ method, you want it to be reference counted. But what
happens if an instance of another class holds a reference to it? That
instance has to be reference counted as well; otherwise it will never be
deallocated, and the extra reference it holds to the object that wants
reference counting will keep that object alive too. And what if an
instance of yet another class holds a reference to *that*? It, again, has
to be reference counted. And don't forget that most such containers are
very generic objects that are everywhere in the language, in particular
the dictionaries that hold variables. And what if such a container does
not declare a __del__ itself? All references to your object still have to
go away whether or not a __del__ is declared, otherwise the scheme won't
work, because other objects are still holding references to your object.
So essentially, your suggestion would make all objects reference counted,
which is exactly what CPython does, and exactly what Jython is determined
*not* to do (because doing so would require an extra layer of abstraction
on top of Java objects).
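
A hedged illustration of the propagation argument (the class and variable
names here are invented for the example):

class Resource(object):
    def __del__(self):
        print("releasing")        # should run as soon as the last reference dies

def f():
    r = Resource()
    registry = {"current": r}     # a plain dict with no __del__ of its own
    return registry

d = f()
del d
# For Resource.__del__ to fire right here, the dict's lifetime must be
# tracked just as precisely as the Resource's, and so must the lifetime of
# anything that can hold the dict: the reference-counting requirement
# propagates to essentially every object in the program.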

Regards,
Isaac.
 

Roger Binns

Martin said:
"the same thing as shutdown" means that you clear out
all modules.

I meant that during shutdown the modules are forcibly
garbage collected, and to a certain extent Python
treats their names like weak references (the names
don't disappear, but the objects they point to are set
to None).

My comment was in the context of what the GC should
do when it is faced with a "hard" problem, such as
cycles and objects with __del__ methods. At the moment
CPython gives up.

I was suggesting that it could instead do something
similar: keep the names but point fields/objects at None.
CPython doesn't give up during shutdown.
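
A hedged sketch of the shutdown behaviour being described (the file name and
class are invented for the example):

import os

class TempFile(object):
    def __init__(self, path):
        self.path = path
        open(path, "w").close()
    def __del__(self):
        # If this runs during interpreter shutdown, the module's globals may
        # already have been rebound to None (the name "os" survives, but the
        # object it pointed to is gone), so guard against it.
        if os is not None and os.path.exists(self.path):
            os.remove(self.path)

keeper = TempFile("scratch.dat")   # still referenced at exit, so __del__
                                   # only runs during shutdown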

Roger
 

David Turner

Carl Banks said:
Wrong. It would be almost impossible in Python.


Python can never rely on someone not taking a reference to your
resource-owning object. Anyone can store an open file object in a
list, for example, and there might even be good reasons to do it.
Thus, you can never guarantee that the garbage collector will pick up
the object when the function exits, even if it were perfect.

Carl, I'm sorry, but you've *completely* missed the point here.

(a) I've stated several times already that we're *not* talking about
__del__ being called if/when the object is garbage collected. Garbage
collection and finalization are completely different things from
destruction.

(b) So what if multiple references to a deterministic object can be
made? It really doesn't matter. The point is that the destruction of
the object is still predictable.

OTOH, if you indiscreetly finalize (for example, calling close on a
file object) any resource-owning object as soon as the function that
created it exits, not waiting for the garbage collector to pick it up,
then you've pretty much stepped on the toes of someone else who may
still be using it.

That is *not* what I suggested. Please read my suggestion again.

It seems to me that you are failing to take into account the possibility
that these things can happen. But, in Python, they can and do.
Finalizing an object as soon as the name binding it goes out of scope
is not always the right thing to do.

I never said it was.


I'm not going to bother responding to the rest of your post. Please
read my original suggestion again. In case you can't find it, here is
yet another precis:

Objects that define __del__ shall have a reference count, which is
incremented when names are bound to the object and decremented when
those names go out of scope. The __del__ method is called when the
reference count reaches zero. This mechanism is orthogonal to garbage
collection.

Regards
David Turner
 

Marcin 'Qrczak' Kowalczyk

David Turner said:

Objects that define __del__ shall have a reference count, which is
incremented when names are bound to the object and decremented when
those names go out of scope. The __del__ method is called when the
reference count reaches zero. This mechanism is orthogonal to garbage
collection.

It's not statically known which variables hold objects which define __del__.
This implies that you must walk over all local variables of all function
activations, in addition to GC overhead, and you must manage reference
counts in all assignments. I'm afraid it's unacceptable.
 

Donn Cave

Quoth Carl Banks <[email protected]>:
....
| These are silly examples, of course, but with more intricate stuff (or
| with code by other people who are always less disciplined than you)
| this can become a real problem. Face it, people, it's ludicrous to
| rely on the garbage collector to finalize stuff for us.

`Face it?' We're not the ones talking about preposterous hypothetical
cases and hand-waving claims about intricate code written by people
who don't know what they're doing. There's boatloads of Python code
with constructs like the text = open(file, 'r').read() usage proposed
in this thread, and it's truly no problem. If you want to make that
into a laborious exercise in defensive programming, that's your problem.

It is a fact that there are some issues that have to be understood for
more general use of finalization, but that's part of understanding
Python's storage model. That model is in the end no simpler or more
intuitive than any other language - really worse than most - but it's
functional. At the cost of occasionally baffling the newcomer, it lets
us write modular code that doesn't impose on the caller to manage objects.
I don't need to know whether my function is the sole user of an object
and it falls to me to free it when I'm done, because the system takes
care of that. I get it, I use it, I forget about it. I realized what
a huge architectural difference that makes when I tried to translate a
non-trivial program to C. If the choice is either to abandon that
advantage for resources other than (small chunks of) memory, or learn
how to deal with the tricky corner cases, I'll take the latter.

Donn Cave, (e-mail address removed)
 

Carl Banks

David said:
Objects that define __del__ shall have a reference count, which is
incremented when names are bound to the object and decremented when
those names go out of scope. The __del__ method is called when the
reference count reaches zero. This mechanism is orthogonal to garbage
collection.

Are you aware that CPython already does this?

I assumed that you knew this, and I assumed that you were aware of the
limitations of reference counting (in particular, that you cannot rely
on a reference count ever going to zero). That is why I took your
statement to mean something else.

But if I assumed incorrectly, then you should know this: reference
counting simply doesn't take away the need to explicitly release
resources, unless you don't care about robustness (the thing you
claimed RAII gives you). It works for the common case most of the time,
but the danger always lurks that a reference could get trapped somewhere,
or a cycle could arise.

If you don't want that happening, then you'd better use the explicit
finally block, reference-counted finalization or not.
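
A minimal sketch of the kind of trap being described (the class name and
file path are invented for this example):

class Holder(object):
    def __init__(self, f):
        self.f = f
    def __del__(self):
        self.f.close()

def leaky():
    h = Holder(open("out.txt", "w"))
    h.back_ref = h              # a reference cycle: the count never reaches zero
    h.f.write("important data")

leaky()
# At this point the Holder's reference count never dropped to zero, so
# __del__ has not run and the data may still be sitting in an unflushed
# buffer: the prompt, refcount-driven finalization has silently been lost.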
 

Martin v. Löwis

Roger said:
My comment was in the context of what should the GC
do when it is faced with a "hard" problem, such as
cycles and objects with __del__ methods. At the moment
CPython gives up.

So in short: you are not proposing an implementable
solution. As a result, Peter Hansen's comment stands:

"Because nobody has yet proposed
a workable solution to the several conflicting
requirements"

A "workable solution" comes with a precise algorithm
as to what action to take in what order.

Regards,
Martin
 

Martin v. Löwis

Marcus said:
Does anybody know when "myfile" is created if it hasn't been introduced
previously? I.e. is Python guaranteed to create the variable, call the
function, then assign, or call the function, create variable, then
assign?

Neither, nor. Creation of the variable and assigning to it is an atomic
action for a global variable.

Local variables are created when the function starts.

In the latter case, we (at least technically) have a very small
chance that the creation of the variable fails, for some reason or
another.

In the example, the chance is very high that the variable does not get
set, i.e. if open() fails with an exception.

It *might* be that the assignment fails because it runs out of memory.

In either case, Python raises an exception, and subsequent statements
are not executed.
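
A small hedged sketch of the first failure mode (the path is deliberately
bogus); inside a function the unbound name shows up as UnboundLocalError,
a subclass of NameError:

def demo():
    try:
        myfile = open("/no/such/path", "w")
    finally:
        myfile.close()   # open() raised, myfile was never bound, so this
                         # line raises UnboundLocalError

demo()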

Regards,
Martin
 

Carl Banks

Donn said:
Quoth Carl Banks <[email protected]>:
...
| These are silly examples, of course, but with more intricate stuff (or
| with code by other people who are always less disciplined than you)
| this can become a real problem. Face it, people, it's ludicrous to
| rely on the garbage collector to finalize stuff for us.

`Face it?' We're not the ones talking about preposterous hypothetical
cases and hand-waving claims about intricate code written by people
who don't know what they're doing.

I was talking about intricate (as in slightly more intricate than a
simple open and close) code written by people who know what they're
doing but have to use code from other people who also claim to "know
what they're doing."

There's boatloads of Python code
with constructs like the text = open(file, 'r').read() usage proposed
in this thread, and it's truly no problem.

That's about as complex as you can get and still make that claim. As
soon as the object gets bound to something, problems can (and will)
happen.


[snip]
I don't need to know whether my function is the sole user of an object
and it falls to me to free it when I'm done, because the system takes
care of that. I get it, I use it, I forget about it.

The problem is, you can't always afford to forget about it. Sometimes
you have to make sure that at this point in the program, this resource
has been released.

If you're relying on garbage collection to do that for you, you're
asking for trouble.
 

John J. Lee

Brian van den Broek said:
I'm still learning Python as a first language since some BASIC quite
some time ago, so my level of knowledge/understanding is not too
sophisticated. From that standpoint, I am wondering why the code that
Michael P. Soulier provided above would worry an experienced Python
programmer.

If you have a file open for writing when the process exits, the data
you've .write()ten isn't guaranteed to actually get written to disk.
(Apparently, whether it does or not depends on the OS.)

So, if an exception occurs after the first .write(), and there's no
finally: there to .close() your file, you might unexpectedly lose the
data that's already been written.

More generally, you might hang on to a resource (much) longer than
necessary, which is a problem if that resource is expensive or scarce.
For example, a database connection.

[...]
I had thought I was being careful and smart by always checking for
filepath existence and always explicitly closing files, but I am
wondering what red flag I'm overlooking.
[...]

Explicitly checking for a file's existence is another Bad Thing,
because of race conditions. I think Alex Martelli has written about
this here before (google for LBYL and EAFP).
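
A small sketch of the contrast (the file name is a placeholder for this
example):

import os

path = "maybe_missing.txt"

# LBYL ("look before you leap"): racy, because the file can appear or
# disappear between the existence check and the open().
if os.path.exists(path):
    f = open(path)
    data = f.read()
    f.close()

# EAFP ("easier to ask forgiveness than permission"): no window for a race,
# just try it and deal with the failure.
try:
    f = open(path)
except IOError:
    data = None
else:
    data = f.read()
    f.close()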


John
 

Martin v. Löwis

John said:
If you have a file open for writing when the process exits, the data
you've .write()ten isn't guaranteed to actually get written to disk.
(Apparently, whether it does or not depends on the OS.)

So, if an exception occurs after the first .write(), and there's no
finally: there to .close() your file, you might unexpectedly lose the
data that's already been written.

That is not true: the data is not lost. The file is closed eventually
(e.g. when Python exits), at which point the data is flushed to disk.

Regards,
Martin
 

Roger Binns

Martin said:
So in short: you are not proposing an implementable
solution.

Actually I did. I am not saying this is not a hard
problem. It *is* a hard problem. It is also a very real
problem as we have seen handled in various ways in
various languages.

But at one point generating code for algorithms was hard,
and then compilers were invented. And memory management
was hard and GC was invented. Storage was hard and
filesystems were invented.

The whole point of languages is to make stuff easier.
And Python excels at letting the machine deal with
machine issues. None of the tens of thousands of lines
of Python code I have written even knows what byte
ordering a machine uses, or even its natural word
size.

So here is a hard problem. Feel free to prove that it
is insoluble (in the computer science sense), or let's
pick a solution that improves things.

As a programmer I would be very happy to get an exception
or similar diagnostic if I create a case that Python
can't currently cope with. I can change my code to fix
it. But making every consumer of every resource, even
indirect ones, deal with something the machine should
figure out is silly. People used to do that for memory,
and fortunately those days are behind us.

And if you want an example of how complex this can get,
try to modify the XML-RPC stuff to work over SSL and
do multiple requests/responses per connection. There
are several layers of libraries and external resources
involved, and currently a lot of code written by very
smart people that tries real hard, has to implement
its own reference counting in Python, and doesn't work
over SSL or handle more than one request per connection.

Roger
 

John J. Lee

Martin v. Löwis said:
That is not true: the data is not lost. The file is closed eventually
(e.g. when Python exits), in which case the data is flushed to disk.

I guess you're right.

Points for anyone able to guess what I was thinking of (I don't know
myself :)...


John
 

Aahz

The whole problem that this thread is about is that Python has this
bizarre scheme that an object will be garbage collected *unless* you
add a __del__ method, at which point the docs imply you will be lucky
for garbage collection to *ever* happen on the object.

Please keep the distinction between refcounting and GC clear. The
business with __del__ breaking memory management *only* occurs when you
need to use GC because you've got a cycle. Historically, Python didn't
have GC, and you had leaky memory whenever you had cycles. Now we've
got a much-improved situation.
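
A small sketch of the cycle-plus-__del__ case (behaviour of the Python 2
interpreters of the time; PEP 442 later changed this in Python 3.4):

import gc

class Node(object):
    def __del__(self):
        pass

a = Node()
b = Node()
a.other = b
b.other = a          # a cycle whose members both define __del__
del a, b

gc.collect()
print(gc.garbage)    # the cycle ends up here, uncollected, because the GC
                     # refuses to guess a safe order for the __del__ calls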
 

Aahz

The huge advantage that the RAII approach holds in this respect is
that the user of the library just does what comes naturally - for
example, he declares a file object and uses it. He would have done
that anyway. He doesn't need to know whether or not it's a RAII
object that needs a "with" or "using" or "dispose" or "try/finally"
clause.

Do you know any RAII approach that does not depend on stack-based locals
semantics? Python's global objects are extremely useful, and unless you
want to essentially create a new language, anything remotely resembling
pure RAII in the language core won't work.
 

David Turner

Isaac To said:
Jython is not a separate language. It is just our favourite Python
language, running under the Java virtual machine. Perhaps it is "stifling"
the development of the Python language, but if it is, it is because we
explicitly *don't* want to introduce an implementation dependency (i.e.,
don't depend on the CPython implementation), not because we want to depend
on a certain language. Different people will have different ideas about
whether this is a good thing. For me, I'd say that I prefer finding a
different solution to problems arising from the unspecified finalization
behaviour, because specifying the finalization time would more or less
remove a use-case of the Python language completely, and I do think that
being able to use Python within Java, and to use Java objects from Jython
code without additional "glue code", is something that should be dearly
treasured. This is especially the case because the lack of specification
about when finalization happens is, most of the time, not an issue at all.

You don't have to specify the finalization time in order to make the
destructors work. Destruction and finalization are *different
things*.

The D programming language somehow contrives to have both garbage
collection and working destructors. So why can't Python?


Regards
David Turner
 
