Boolean value of generators

Tony

I have been using generators for the first time and wanted to check for
an empty result. Naively I assumed that generators would give
appropriate boolean values. For example

def xx():
    l = []
    for x in l:
        yield x

y = xx()
bool(y)


I expected the last line to return False but it actually returns True.
Is there any way I can enhance my generator or iterator to have the
desired effect?

Regards

Tony Middleton.
 

Cameron Simpson

| I have been using generators for the first time and wanted to check for
| an empty result. Naively I assumed that generators would give
| appropriate boolean values. For example
|
| def xx():
|     l = []
|     for x in l:
|         yield x
|
| y = xx()
| bool(y)
|
|
| I expected the last line to return False but it actually returns True.
| Is there any way I can enhance my generator or iterator to have the
| desired effect?

The generator is not the same as the values it yields.
What you're doing is like this:

    >>> def f():
    ...     return False
    ...
    >>> bool(f)
    True
    >>> bool(f())
    False

In your code, xx() returns a generator object. It is not None, nor any
kind of "false"-ish value. So bool() returns True.

The generator hasn't even _run_ at that point, so nobody has any idea if
iterating over it will return an empty sequence.

What you want is something like this:

values = list(xx())
bool(values)

or more clearly:

gen = xx()
values = list(gen)
bool(values)

You can see here that you actually have to iterate over the generator
before you know if it is (will be) empty.

Try this program:

def lines():
    print "opening foo"
    for line in open("foo"):
        yield line
    print "closing foo"

print "get generator"
L = lines()
print "iterate"
text = list(L)
print "done"

For me it does this:

get generator
iterate
opening foo
closing foo
done

You can see there that the generator _body_ doesn't even run until you
start the iteration.

Does this clarify things for you?

Cheers,
--
Cameron Simpson <[email protected]> DoD#743
http://www.cskk.ezoshosting.com/cs/

Winter is gods' way of telling us to polish.
- Peter Harper <[email protected]> <[email protected]>
 

Peter Otten

Tony said:
I have been using generators for the first time and wanted to check for
an empty result. Naively I assumed that generators would give
appropriate boolean values. For example

def xx():
    l = []
    for x in l:
        yield x

y = xx()
bool(y)


I expected the last line to return False but it actually returns True.
Is there any way I can enhance my generator or iterator to have the
desired effect?

* What would you expect

import random

def f():
    if random.randrange(2):
        yield 42

print bool(f())

to print? Schrödinger's Cat?

* You can wrap your generator into an object that reads one item in advance.
A slightly overengineered example:

http://code.activestate.com/recipes/577361-peek-ahead-an-iterator/

* I would recommend that you avoid the above approach. Pythonic solutions
favour EAFP (http://docs.python.org/glossary.html#term-eafp) over
look-before-you-leap:

try:
    value = next(y)
except StopIteration:
    print "ran out of values"
else:
    do_something_with(value)

or

value = next(y, default)
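
If there is no natural default, a unique sentinel object avoids any
ambiguity with values the generator might legitimately yield; a minimal
sketch of that variant (the _sentinel name is mine):

_sentinel = object()
value = next(y, _sentinel)
if value is _sentinel:
    print "ran out of values"
else:
    do_something_with(value)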

Peter
 

Carl Banks

I have been using generators for the first time and wanted to check for
an empty result.  Naively I assumed that generators would give
appropriate boolean values.  For example

def xx():
  l = []
  for x in l:
    yield x

y = xx()
bool(y)

I expected the last line to return False but it actually returns True.
Is there any way I can enhance my generator or iterator to have the
desired effect?

In general, the only way to test if a generator is empty is to try to
consume an item. (It's possible to write an iterator that consumes an
item and caches it to be returned on the next next(), and whose
boolean status indicates if there's an item left. I would guess the
recipe Peter Otten pointed you to does that.)
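
A minimal sketch of such a consume-and-cache iterator, just to illustrate
the idea (this is my own illustration, not the code from that recipe, and
the class name is made up):

class CachingIter(object):
    """Wraps an iterator, caching one item so emptiness can be tested."""
    _missing = object()

    def __init__(self, it):
        self._it = iter(it)
        self._cached = self._missing

    def __iter__(self):
        return self

    def next(self):
        # Hand back the cached item first, if there is one.
        if self._cached is not self._missing:
            value, self._cached = self._cached, self._missing
            return value
        return self._it.next()

    def __nonzero__(self):
        # Consume (and cache) one item to learn whether anything is left.
        if self._cached is self._missing:
            try:
                self._cached = self._it.next()
            except StopIteration:
                return False
        return True

With this, bool(CachingIter(xx())) is False for the empty generator from
the original post, while iteration still sees every item exactly once.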

The unfortunate thing about this is that functions written to iterate
over sequences that test if the sequence is empty with a boolean test
cannot be used with generators, and will fail silently. This hurts
duck typing.

This became an issue some releases ago (2.4, I think) when someone
decided duck typing was a good thing and so it would be a good idea if
iterators that did know if they were empty had a boolean status
indicating so. GvR angrily told them to change it back the next
release. I have to agree with GvR here: at least this way there is a
simple rule for when the boolean test works. (Sequences return a boolean
status indicating whether they're empty; other iterators return True.) The
better thing would be if boolean wasn't used to test for emptiness at
all; the whole concept of booleans in Python is overloaded and that
hurts duck typing.



Carl Banks
 

Carl Banks

* I would recommend that you avoid the above approach. Pythonic solutions
favour EAFP (http://docs.python.org/glossary.html#term-eafp) over
look-before-you-leap:

try:
    value = next(y)
except StopIteration:
    print "ran out of values"
else:
    do_something_with(value)

or

value = next(y, default)


Good idea but not always convenient. Sometimes you have to perform
some setup ahead of time if there are any items, and must not perform
that setup if there are no items. It's a PITA to use EAFP for
that, which is why an iterator that consumes and caches can be a
useful thing.
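
One way to sketch that setup-only-if-there-are-items pattern without a
full caching wrapper (do_setup and handle are placeholder names of mine):

import itertools

def process(items):
    it = iter(items)
    try:
        first = it.next()
    except StopIteration:
        return                    # no items at all: skip the setup entirely
    do_setup()                    # must run only when there is at least one item
    for item in itertools.chain([first], it):
        handle(item)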


Carl Banks
 

Paul Rubin

Carl Banks said:
In general, the only way to test if a generator is empty is to try to
consume an item. (It's possible to write an iterator that consumes an
item and caches it to be returned on the next next(), and whose
boolean status indicates if there's an item left. ...)

I remember thinking that Python would be better off if all generators
automatically cached an item, so you could test for emptiness, look
ahead at the next item without consuming it, etc. This might have been
a good change to make in Python 3.0 (it would have broken compatibility
with 2.x) but it's too late now.
 

Albert Hopkins

I have been using generators for the first time and wanted to check for
an empty result. Naively I assumed that generators would give
appropriate boolean values. For example

def xx():
    l = []
    for x in l:
        yield x

y = xx()
bool(y)

As people have already mentioned, generators are objects and objects
(usually) evaluate to True.

There may be times, however, that a generator may "know" that it
doesn't/isn't/won't generate any values, and so you may be able to
override boolean evaluation. Consider this example:

import datetime

class DaysSince(object):
    def __init__(self, day):
        self.day = day
        self.today = datetime.date.today()

    def __nonzero__(self):
        if self.day > self.today:
            return False
        return True

    def __iter__(self):
        one_day = datetime.timedelta(1)
        new_day = self.day
        while True:
            new_day = new_day + one_day
            if new_day <= self.today:
                yield new_day
            else:
                break

g1 = DaysSince(datetime.date(2010, 10, 10))
print bool(g1)
for day in g1:
    print day

g2 = DaysSince(datetime.date(2011, 10, 10))
print bool(g2)
for day in g2:
    print day
 

Tim Chase

I remember thinking that Python would be better off if all generators
automatically cached an item, so you could test for emptiness, look
ahead at the next item without consuming it, etc. This might have been
a good change to make in Python 3.0 (it would have broken compatibility
with 2.x) but it's too late now.

Generators can do dangerous things...I'm not sure I'd *want* to
have Python implicitly cache generators without an explicit
wrapper to request it:

import os
from fnmatch import fnmatch

def delete_info(root, pattern):
    for path, dirs, files in os.walk(root):
        for fname in files:
            if fnmatch(fname, pattern):
                full_path = os.path.join(path, fname)
                info = gather_info(full_path)
                os.unlink(full_path)
                yield full_path, info

location = '/'
user_globspec = '*.*'
deleter = delete_info(location, user_globspec)
if some_user_condition_determined_after_generator_creation:
    for path, info in deleter:
        report(path, info)

-tkc
 

Steven D'Aprano

Generators can do dangerous things...I'm not sure I'd *want* to have
Python implicitly cache generators without an explicit wrapper to
request it:

I'm sure that I DON'T want it. It would be a terrible change.

(1) Generators are lightweight. Adding overhead to cache the next value
adds value only for a small number of uses, but adds weight to all
generators.

(2) Generators are simple. There is a clear and obvious distinction
between "create the generator object by calling the generator function"
and "call the generated values by iterating over the generator object".
Admittedly the language is a bit clumsy, but the concept is simple -- you
have a generator function that you call, and it returns an iterable
object that yields values. Simple and straightforward. Caching blurs this
distinction -- calling the function also produces the first value,
caching it and hiding any StopIteration.

(3) Generators with side-effects. I know, I know, if you write functions
with side-effects, you're in a state of sin already, but there's no need
for Python to make it worse.

(4) Expensive generators. The beauty of generators is that they produce
values on demand. Making all generators cache their first value means
that you pay that cost even if you end up never needing the first value.

(5) Time dependent output of generators. The values yielded can depend on
the time at which you invoke the generator. Caching plays havoc with that.


None of this is meant to say "Never cache generator output", that would
be a silly thing to say. If you need an iterator with look-ahead, that
knows whether it is empty or not, go right ahead and use one. But don't
try to force it on everyone.
 

Paul Rubin

Steven D'Aprano said:
(4) Expensive generators. The beauty of generators is that they produce
values on demand. Making all generators cache their first value means
that you pay that cost even if you end up never needing the first value.

You wouldn't generate the cached value ahead of time. You'd just
remember the last generated value so that you could use it again.
Sort of like getc/ungetc.

An intermediate measure might be to have a stdlib wrapper that added
caching like this to an arbitrary generator. I've written such things a
few times in various clumsy ways. Having the caching available in the C
code would eliminate a bunch of indirection.
 

Steven D'Aprano

There may be times, however, that a generator may "know" that it
doesn't/isn't/won't generate any values, and so you may be able to
override boolean evaluation. Consider this example:
[snip example]


This is a good example, but it's not a generator, it's an iterator :)

The two are similar in that they both produce values lazily, as required,
but generators are a special case of iterators: generators are a special
form of the function syntax which returns a lightweight and simple
iterator. Iterators are more general. They're an interface rather than a
type, so any class you build which matches the iterator protocol is an
iterator, but only a function with a yield is a generator.

Other than this nit-pick, your idea of making a custom iterator with a
__nonzero__ method is an excellent one.
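
A side-by-side sketch of the distinction (the names here are mine, purely
for illustration):

# A generator: a function containing yield; calling it builds a generator object.
def countdown_gen(n):
    while n > 0:
        yield n
        n -= 1

# An iterator: any class implementing the iterator protocol (__iter__ and next).
class Countdown(object):
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def next(self):
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1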
 

Tim Chase

(3) Generators with side-effects. I know, I know, if you write functions
with side-effects, you're in a state of sin already, but there's no need
for Python to make it worse.

(4) Expensive generators. The beauty of generators is that they produce
values on demand. Making all generators cache their first value means
that you pay that cost even if you end up never needing the first value.

I'd consider "expensive generators" a subset of (or at least
intersecting with) "generators with side-effects"... that side effect
being the time consumed. Either way, I'm pretty firmly with you in
the "don't do it by default; let me explicitly wrap it if I want
it" camp.

-tkc
 

Steve Howell

You wouldn't generate the cached value ahead of time.  You'd just
remember the last generated value so that you could use it again.
Sort of like getc/ungetc.

An intermediate measure might be to have a stdlib wrapper that added
caching like this to an arbitrary generator.  I've written such things a
few times in various clumsy ways.  Having the caching available in the C
code would eliminate a bunch of indirection.

Is there an idiomatic way in Python to wrap a generator with a getc/
ungetc mechanism?

I know Paul is not alone in having written such things himself in
various clumsy ways.

This is my own clumsy attempt, but it seems like there should be a
simpler way to achieve what I'm doing.

def abc():
    yield 'a'
    yield 'b'
    yield 'c'

for letter in abc():
    print letter

class Wrap:
    def __init__(self, g):
        self.g = g
        self.use_cached = False

    def get(self):
        if self.use_cached:
            self.use_cached = False
            return self.value
        self.value = self.g.next()
        return self.value

    def unget(self):
        if self.use_cached:
            raise Exception('only one unget allowed')
        self.use_cached = True


w = Wrap(abc())
print w.get()
w.unget()
print w.get()
print w.get()
for letter in w.g:
    print letter
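
For the narrower case of pushing back a value you have already read,
itertools.chain gets close to getc/ungetc without a wrapper class; a
sketch of that idiom (reusing the abc() generator above):

import itertools

it = abc()
first = it.next()                    # "getc": read one value
it = itertools.chain([first], it)    # "ungetc": put it back at the front
for letter in it:
    print letter                     # 'a', 'b', 'c' -- nothing was lost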
 

Arnaud Delobelle

Paul Rubin said:
You wouldn't generate the cached value ahead of time. You'd just
remember the last generated value so that you could use it again.
Sort of like getc/ungetc.

An intermediate measure might be to have a stdlib wrapper that added
caching like this to an arbitrary generator. I've written such things a
few times in various clumsy ways. Having the caching available in the C
code would eliminate a bunch of indirection.

I've done such a thing myself a few times. I remember posting on
python-ideas a while ago (no time to find the thread ATM). My
suggestion was to add a function peekable(it) that returns an iterator
with a peek() method, whose behaviour is exactly the one that you
describe (i.e. similar to getc/ungetc). I also suggested that iterators
could optionally implement a peek() method themselves, in which case
peekable(it) would return the iterator unmodified. For example,
list_iterators, str_iterators and other iterators over sequences could
implement peek() at essentially no cost. I don't recall that this proposal
gained much traction!
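
A bare-bones version of that peekable idea (class and method names are my
own sketch, not code from the python-ideas thread):

class Peekable(object):
    _missing = object()

    def __init__(self, it):
        self._it = iter(it)
        self._cached = self._missing

    def __iter__(self):
        return self

    def peek(self, default=None):
        # Look at the next value without consuming it.
        if self._cached is self._missing:
            try:
                self._cached = self._it.next()
            except StopIteration:
                return default
        return self._cached

    def next(self):
        if self._cached is not self._missing:
            value, self._cached = self._cached, self._missing
            return value
        return self._it.next()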
 

Grant Edwards

I remember thinking that Python would be better off if all generators
automatically cached an item,

That would play havoc with generators that had side effects. Not
everybody who writes generators is using them to generate Fibonacci
numbers.
so you could test for emptiness, look ahead at the next item without
consuming it, etc.

And what happens when the generator is doing things like executing
database transactions?
 
