Distinguishing active generators from exhausted ones

Michal Kwiatkowski

Hi,

Is there a way to tell if a generator has been exhausted using pure
Python code? I've looked at CPython sources and it seems that
something like "active"/"exhausted" attribute on genobject is missing
from the API. For the time being I am using a simple C extension to
look at f_stacktop pointer of the generator frame, which seems to
differentiate active generators from exhausted ones. See
http://bazaar.launchpad.net/~ruby/pythoscope/support-python2.3/annotate/286/pythoscope/_util.c#L16
for complete source code.

I may be missing something obvious here. Is there a better way to tell
if a given generator object is still active or not?

Cheers,
mk
 
Jason Tackaberry

Is there a way to tell if a generator has been exhausted using pure
Python code? I've looked at CPython sources and it seems that

Upon a cursory look, after a generator 'gen' is exhausted (meaning
gen.next() has raised StopIteration), it seems that gen.gi_frame will be
None.
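
A minimal sketch of that check, assuming a CPython version where gi_frame is released once the generator finishes. It is written for Python 3, and the names is_exhausted and count_to are made up for illustration:

```python
def is_exhausted(gen):
    """Return True if the generator's frame has been released."""
    # Assumption: gi_frame becomes None once the generator has finished.
    return gen.gi_frame is None


def count_to(n):
    for i in range(n):
        yield i


g = count_to(2)
before = is_exhausted(g)   # generator has not been run to completion yet
list(g)                    # drain it, which raises StopIteration internally
after = is_exhausted(g)
print(before, after)
```

The check only inspects an attribute, so it never resumes the generator and cannot trigger side effects in it.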

Cheers,
Jason.
 
Michal Kwiatkowski

Upon a cursory look, after a generator 'gen' is exhausted (meaning
gen.next() has raised StopIteration), it seems that gen.gi_frame will be
None.

Only in Python 2.5 or higher though. I need to support Python 2.3 and
2.4 as well, sorry for not making that clear in the original post.

Cheers,
mk
 
Hendrik van Rooyen

Hi,

Is there a way to tell if a generator has been exhausted using pure
Python code? I've looked at CPython sources and it seems that
something like "active"/"exhausted" attribute on genobject is missing
from the API. For the time being I am using a simple C extension to
look at f_stacktop pointer of the generator frame, which seems to
differentiate active generators from exhausted ones. See
http://bazaar.launchpad.net/~ruby/pythoscope/support-python2.3/annotate/286/pythoscope/_util.c#L16
for complete source code.

I may be missing something obvious here. Is there a better way to tell
if a given generator object is still active or not?

Is there a reason why you cannot just call the next method and handle
the StopIteration when it happens?

- Hendrik
 
Michal Kwiatkowski

    foo = the_generator_object
    try:
        do_interesting_thing_that_needs(foo.next())
    except StopIteration:
        generator_is_exhausted()

In other words, don't LBYL, because it's EAFP. Whatever you need to do
that requires the next item from the generator, do that; you'll get a
specific exception if the generator is exhausted.

The thing is I don't need the next item. I need to know if the
generator has stopped without invoking it. Why - you may ask. Well,
the answer needs some explaining.

I'm working on the Pythoscope project (http://pythoscope.org) and I
use tracing mechanisms of CPython (sys.settrace) to capture function
calls and values passed to and from them. Now, the problem with
generators is that when they are ending (i.e. returning instead of
yielding) they return a None, which is in fact indistinguishable from
"yield None". That means I can't tell if the last captured None was in
fact yielded or is a bogus value which should be rejected. Let me show
you with an example.

import sys

def trace(frame, event, arg):
    if event != 'line':
        print frame, event, arg
    return trace

def gen1():
    yield 1
    yield None

def gen2():
    yield 1

sys.settrace(trace)
print "gen1"
g1 = gen1()
g1.next()
g1.next()
print "gen2"
g2 = gen2()
[x for x in g2]
sys.settrace(None)

The first generator isn't finished, it yielded 1 and None. Second one
is exhausted after yielding a single value (1). The problem is that,
under Python 2.4 or 2.3 both invocations will generate the same trace
output. So, to know whether the last None was actually a yielded value
I need to know if a generator is active or not.

Your solution, while it gives me an answer, is not acceptable because
generators can cause side effects (imagine a call to launch_rockets()
before the next yield statement ;).

Cheers,
mk
 
Aahz

Only in Python 2.5 or higher though. I need to support Python 2.3 and
2.4 as well, sorry for not making that clear in the original post.

Are you sure? It appears to work in Python 2.4; I don't have time to
check 2.3.
--
Aahz <*> http://www.pythoncraft.com/

"Many customs in this life persist because they ease friction and promote
productivity as a result of universal agreement, and whether they are
precisely the optimal choices is much less important." --Henry Spencer
 
Terry Reedy

Michal said:
The thing is I don't need the next item. I need to know if the
generator has stopped without invoking it.

Write a one-ahead iterator class, which I have posted before, that sets
.exhausted to True when next fails.
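
The class referred to was not included in this thread. A rough sketch of what a one-ahead wrapper might look like follows, written for Python 3; the class name OneAhead and its internals are assumptions for illustration, not the posted code:

```python
class OneAhead:
    """Wrap an iterator and prefetch one item, so .exhausted is known early."""

    _SENTINEL = object()  # marks "no prefetched item"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.exhausted = False
        self._advance()  # note: runs the underlying iterator one step now

    def _advance(self):
        try:
            self._next_item = next(self._it)
        except StopIteration:
            self._next_item = self._SENTINEL
            self.exhausted = True

    def __iter__(self):
        return self

    def __next__(self):
        if self.exhausted:
            raise StopIteration
        item = self._next_item
        self._advance()  # prefetch the following item before returning
        return item


w = OneAhead(x * x for x in range(3))
seen = []
while not w.exhausted:
    seen.append(next(w))
print(seen)
```

Because the wrapper always stays one step ahead, any code in the wrapped generator runs one step earlier than it would without the wrapper.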

tjr
 
greg

Michal said:
The first generator isn't finished, it yielded 1 and None. Second one
is exhausted after yielding a single value (1). The problem is that,
under Python 2.4 or 2.3 both invocations will generate the same trace
output.

This seems to be a deficiency in the trace mechanism.
There really ought to be a 'yield' event to distinguish
yields from returns.

You could put in a feature request on python-dev
concerning this.
 
Steven D'Aprano

Write a one-ahead iterator class, which I have posted before, that sets
.exhausted to True when next fails.


And hope that the generator doesn't have side-effects...
 
Terry Reedy

Steven said:
And hope that the generator doesn't have side-effects...

If run to exhaustion, the same number of side-effects happen.
The only difference is that they each happen one step sooner.
For reading a file that is irrelevant. Much else, and the iterator is
not just an iterator.

tjr
 
Steven D'Aprano

If run to exhaustion, the same number of side-effects happen. The only
difference is that they each happen one step sooner. For
reading a file that is irrelevant. Much else, and the iterator is not
just an iterator.

I believe the OP specifically said he needs to detect whether or not an
iterator is exhausted, without running it to exhaustion, so you shouldn't
assume that the generator has been exhausted.

When it comes to side-effects, timing matters. For example, a generator
which cleans up after it has run (deleting temporary files, closing
sockets, etc.) will leave the environment in a different state if run to
exhaustion than just before exhaustion. Even if you store the final
result in a one-ahead class, you haven't saved the environment, and that
may be significant.
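
To make the timing point concrete, here is a small sketch in Python 3. The generator with_cleanup and the log list are hypothetical stand-ins for a generator that deletes temporary files or closes sockets after its last yield:

```python
log = []


def with_cleanup():
    yield "a"
    yield "b"
    # Stand-in for cleanup; runs only when the generator is resumed after "b".
    log.append("cleanup")


g = with_cleanup()
first = next(g)
second = next(g)
cleanup_before_lookahead = list(log)  # snapshot: the last item has been yielded

# A look-ahead wrapper would make this extra call immediately after
# fetching "b", before the caller has processed "b".
lookahead = next(g, None)
cleanup_after_lookahead = list(log)   # snapshot after the extra resume
print(cleanup_before_lookahead, cleanup_after_lookahead)
```

The first snapshot is taken while the generator is suspended at its last yield, and the second after the extra resume that a look-ahead performs.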

(Of course, it's possible that it isn't significant. Not all differences
make a difference.)

The best advice is, try to avoid side-effects, especially in generators.
 
Dennis Lee Bieber

If run to exhaustion, the same number of side-effects happen.
The only difference is that they each happen one step sooner.
For reading a file that is irrelevant. Much else, and the iterator is
not just an iterator.

Why am I seeing visions of Pascal's pre-read I/O scheme (and the
confusion that gives -- accessing the first data item before doing an
explicit read)?
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 
Michal Kwiatkowski

Are you sure? It appears to work in Python 2.4; I don't have time to
check 2.3.

No, it does not work in Python 2.4. gi_frame can be None only in
Python 2.5 and higher.

Via "What’s New in Python 2.5" (http://docs.python.org/whatsnew/2.5.html):

"""
Another even more esoteric effect of this change: previously, the
gi_frame attribute of a generator was always a frame object. It’s now
possible for gi_frame to be None once the generator has been
exhausted.
"""

Cheers,
mk
 
Aahz

No, it does not work in Python 2.4. gi_frame can be None only in
Python 2.5 and higher.

You're right, I guess I must have made a boo-boo when I was switching
versions.
 
Terry Reedy

Steven said:
I believe the OP specifically said he needs to detect whether or not an
iterator is exhausted, without running it to exhaustion, so you shouldn't
assume that the generator has been exhausted.

I believe the OP said he needs to determine whether or not an iterator
(specifically generator) is exhausted without consuming an item when it
is not. That is slightly different. The wrapper I suggested makes that
easy. I am obviously not assuming exhaustion when there is a .exhausted
True/False flag to check.

There are two possible definitions of 'exhausted': 1) will raise
StopIteration on the next next() call; 2) has raised StopIteration at
least once. The wrapper converts 2) to 1), which is to say, it obeys
definition 1 once the underlying iteration has obeyed definition 2.

Since it is trivial to set 'exhausted=True' in the generator user code
once StopIteration has been raised (meaning 2), I presume the OP wants
the predictive meaning 1).
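
The two definitions can be told apart with a short sketch. It assumes Python 3.2 or later, where inspect.getgeneratorstate exists; that function is not available on the 2.3 and 2.4 releases discussed in this thread, and the generator name one_item is made up:

```python
import inspect


def one_item():
    yield 1


g = one_item()
states = [inspect.getgeneratorstate(g)]      # before the first next()

next(g)
# The next next() call will raise StopIteration (definition 1), but the
# generator has not returned yet, so it is not exhausted by definition 2.
states.append(inspect.getgeneratorstate(g))

next(g, None)                                # generator returns here
states.append(inspect.getgeneratorstate(g))  # exhausted by definition 2
print(states)
```

The reported state only changes once the generator has actually returned, so this kind of introspection answers definition 2, not the predictive definition 1.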

Without an iterator class wrapper, I see no way to predict what a
generator will do (especially, raise StopIteration the first time)
without analyzing its code and local variable state.

I said in response to YOU that once exhaustion has occurred, then the
same number of side effects would have occurred.

When it comes to side-effects, timing matters.

Sometimes. And I admitted that possibility (slightly garbled).

For example, a generator
which cleans up after it has run (deleting temporary files, closing
sockets, etc.) will leave the environment in a different state if run to
exhaustion than just before exhaustion. Even if you store the final
result in a one-ahead class, you haven't saved the environment, and that
may be significant.

Of course, an eager-beaver generator written to be a good citizen might
well close resources as soon as it knows *they* are exhausted, long
before *it* yields the last items from the in-memory last block read.
For all I know, file.readlines could do such.

Assuming that is not the case, the cleanup will not happen until
what turns out to be the final item is requested from the wrapper. Once
cleanup has happened, .exhausted will be set to True. If proper
processing of even the last item requires that cleanup not have
happened, then that and prediction of exhaustion are incompatible. One
who wants both should write an iterator class instead of a generator function.

(Of course, it's possible that it isn't significant. Not all differences
make a difference.)

The best advice is, try to avoid side-effects, especially in generators.

Agreed.

Terry Jan Reedy
 
Michal Kwiatkowski

There are two possible definitions of 'exhausted': 1) will raise
StopIteration on the next next() call; 2) has raised StopIteration at
least once. The wrapper converts 2) to 1), which is to say, it obeys
definition 1 once the underlying iteration has obeyed definition 2.

Since it is trivial to set 'exhausted=True' in the generator user code
once StopIteration has been raised (meaning 2), I presume the OP wants
the predictive meaning 1).

No, I meant the second meaning (i.e. generator is exhausted when it
has returned instead of yielding).

While, as you showed, it is trivial to create a generator that will
have the "exhausted" flag, in my specific case I have no control over
the user code. I have to use what the Python genobject API gives me
plus the context of the trace function.

Cheers,
mk
 
