Why don't generators execute until first yield?

Discussion in 'Python' started by Martin Sand Christensen, May 7, 2008.

  1. Hi!

    First a bit of context.

    Yesterday I spent a lot of time debugging the following method in a
    rather slim database abstraction layer we've developed:

    ,----
    | def selectColumn(self, table, column, where={}, order_by=[], group_by=[]):
    |     """Performs a SQL select query returning a single column
    |
    |     The column is returned as a list. An exception is thrown if the
    |     result is not a single column."""
    |     query = build_select(table, [column], where, order_by, group_by)
    |     result = DBResult(self.rawQuery(query))
    |     if result.colcount != 1:
    |         raise QueryError("Query must return exactly one column", query)
    |     for row in result.fetchAllRowsAsList():
    |         yield row[0]
    `----

    I'd just rewritten the method as a generator rather than returning a
    list of results. The following test then failed:

    ,----
    | def testSelectColumnMultipleColumns(self):
    |     res = self.fdb.selectColumn('db3ut1', ['c1', 'c2'],
    |                                 {'c1': (1, 2)}, order_by='c1')
    |     self.assertRaises(db3.QueryError, self.fdb.selectColumn,
    |                       'db3ut1', ['c1', 'c2'], {'c1': (1, 2)}, order_by='c1')
    `----

    I expected this to raise a QueryError due to the result.colcount != 1
    constraint being violated (as was the case before), but that isn't what
    happens. The constraint is not violated until I fetch the first result
    from the generator.

    Now to the main point. When a generator function is run, it immediately
    returns a generator, and it does not run any code inside the generator.
    Not until generator.next() is called is any code inside the generator
    executed, giving it traditional lazy evaluation semantics. Why don't
    generators follow the usual eager evaluation semantics of Python and
    immediately execute up until right before the first yield instead?
    Giving generators special case semantics for no good reason is a really
    bad idea, so I'm very curious if there is a good reason for it being
    this way. With the current semantics it means that errors can pop up at
    unexpected times rather than the code failing fast.
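
    To make that concrete, here's a minimal interpreter session (the
    ValueError stands in for the QueryError above):

    >>> def gen():
    ...     raise ValueError("raised on the first next(), not on the call")
    ...     yield 1
    ...
    >>> g = gen()   # no exception here: the body hasn't started executing
    >>> g.next()    # only now does the body run and the error surface
    Traceback (most recent call last):
      ...
    ValueError: raised on the first next(), not on the call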

    Martin
     
    Martin Sand Christensen, May 7, 2008
    #1

  2. Ian Kelly (Guest)

    On Wed, May 7, 2008 at 2:29 AM, Martin Sand Christensen <> wrote:
    > Now to the main point. When a generator function is run, it immediately
    > returns a generator, and it does not run any code inside the generator.
    > Not until generator.next() is called is any code inside the generator
    > executed, giving it traditional lazy evaluation semantics. Why don't
    > generators follow the usual eager evaluation semantics of Python and
    > immediately execute up until right before the first yield instead?
    > Giving generators special case semantics for no good reason is a really
    > bad idea, so I'm very curious if there is a good reason for it being
    > this way. With the current semantics it means that errors can pop up at
    > unexpected times rather than the code failing fast.


    Isn't lazy evaluation sort of the whole point of replacing a list with
    an iterator? Besides which, running up to the first yield when
    instantiated would make the generator's first iteration inconsistent
    with the remaining iterations. Consider this somewhat contrived
    example:

    def printing_iter(stuff):
        for item in stuff:
            print item
            yield item

    Clearly, the idea here is to create a generator that wraps another
    iterator and prints each item as it yields it. But using your
    suggestion, this would instead print the first item at the time the
    generator is created, rather than when the first item is actually
    iterated over.

    If you really want a generator that behaves the way you describe, I
    suggest doing something like this:

    def myGenerator(args):
        immediate_setup_code()

        def generator():
            for item in actual_generator_loop():
                yield item
        return generator()
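
    The same idea can be packaged as a decorator if you need it in several
    places. A sketch (the name "eager" is just for illustration; note that
    it evaluates the first yielded *value* eagerly, not just the setup
    code):

    import itertools

    def eager(func):
        def wrapper(*args, **kwargs):
            gen = func(*args, **kwargs)
            first = gen.next()  # run the body up to and including the first yield
            return itertools.chain([first], gen)  # re-attach the buffered value
        return wrapper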
     
    Ian Kelly, May 7, 2008
    #2

  3. >>>>> "Ian" == Ian Kelly <> writes:
    Ian> Isn't lazy evaluation sort of the whole point of replacing a list
    Ian> with an iterator? Besides which, running up to the first yield when
    Ian> instantiated would make the generator's first iteration
    Ian> inconsistent with the remaining iterations.

    That wasn't my idea, although that may not have come across quite
    clearly enough. I wanted the generator to immediately run until right
    before the first yield so that the first call to next() would start with
    the first yield.

    My objection is that generators _by default_ have different semantics
    than the rest of the language. Lazy evaluation as a concept is great for
    all the benefits it can provide, but, as I've illustrated, strictly lazy
    evaluation semantics can be somewhat surprising at times and lead to
    problems that are hard to debug if you don't constantly bear the
    difference in mind. In this respect, it seems to me that my suggestion
    would be an improvement. I'm not any kind of expert on languages,
    though, and I may very well be missing a part of the bigger picture that
    makes it obvious why things should be as they are.

    As for code to slightly change the semantics of generators, that doesn't
    really address the issue as I see it: if you're going to apply such code
    to your generators, you're probably doing it exactly because you're
    aware of the difference in semantics, and you're not going to be
    surprised by it. You may still want to change the semantics, but for
    reasons that are irrelevant to my point.

    Martin
     
    Martin Sand Christensen, May 7, 2008
    #3
  4. >>>>> "Duncan" == Duncan Booth <> writes:
    [...]
    Duncan> Now try:
    Duncan>
    Duncan> for command in getCommandsFromUser():
    Duncan>     print "the result of that command was", execute(command)
    Duncan>
    Duncan> where getCommandsFromUser is a greedy generator that reads from stdin,
    Duncan> and see why generators don't work that way.

    I don't see a problem unless the generator isn't defined where it's
    going to be used. In other similar input bound use cases, such as the
    generator iterating over a query result set in my original post, I see
    even less of a problem. Maybe I'm simply daft and you need to spell it
    out for me. :)

    Martin
     
    Martin Sand Christensen, May 7, 2008
    #4
  5. Duncan Booth wrote:

    > It does this:
    >
    > >>> @greedy
    > ... def getCommandsFromUser():
    > ...     while True:
    > ...         yield raw_input('Command?')
    >
    > >>> for cmd in getCommandsFromUser():
    > ...     print "that was command", cmd
    >
    > Command?hello
    > Command?goodbye
    > that was command hello
    > Command?wtf
    > that was command goodbye
    > Command?



    Not here..


    In [7]: def getCommandsFromUser():
       ...:     while True:
       ...:         yield raw_input('Command?')
       ...:
       ...:

    In [10]: for cmd in getCommandsFromUser(): print "that was command", cmd
       ....:
    Command?hi
    that was command hi
    Command?there
    that was command there
    Command?wuwuwuw
    that was command wuwuwuw
    Command?
     
    Marco Mariani, May 7, 2008
    #5
  6. Marco Mariani wrote:

    > Not here..


    Oh, sorry, I obviously didn't see the @greedy decorator amongst all the
    quoting levels.

    Anyway, the idea doesn't make much sense to me :)
     
    Marco Mariani, May 7, 2008
    #6
  7. Duncan Booth wrote:

    > Perhaps if you'd copied all of my code (including the decorator that was
    > the whole point of it)...


    Sure, I missed the point. Python's >>> prompts become quoting levels and
    mess up messages.

    Anyway, I would be loath to start execution of a generator before
    starting to iterate through it, especially when generators are passed
    around. The current behavior makes perfect sense.
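
    For the record, a decorator along the lines Duncan described might look
    like this (a sketch reconstructed from his quoted session, since his
    actual code got lost in the quoting): it keeps the wrapped generator one
    value ahead of its consumer, which is exactly what produces the
    interleaved prompts in his example.

    def greedy(func):
        def wrapper(*args, **kwargs):
            gen = func(*args, **kwargs)
            pending = gen.next()       # eagerly run to (and past) the first yield
            def lookahead(pending):
                for nxt in gen:        # fetch the following value first...
                    yield pending      # ...then hand out the one already buffered
                    pending = nxt
                yield pending          # flush the last value once gen is exhausted
            return lookahead(pending)
        return wrapper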
     
    Marco Mariani, May 7, 2008
    #7
  8. Guest

    On May 7, 7:37 am, Marco Mariani <> wrote:
    > Duncan Booth wrote:
    > > Perhaps if you'd copied all of my code (including the decorator that was
    > > the whole point of it)...

    >
    > Sure, I missed the point. Python's >>> prompts become quoting levels and
    > mess up messages.
    >
    > Anyway, I would be loath to start execution of a generator before
    > starting to iterate through it, especially when generators are passed
    > around. The current behavior makes perfect sense.


    Question:

    >>> def f():
    ...     print 0
    ...     while 1:
    ...         yield 1
    ...
    >>> g = f()
    >>> g.next()
    0
    1
    >>> g.next()
    1
    >>> g.next()
    1

    This might fit the bill:

    >>> def dropfirst(h):
    ...     h.next()
    ...     return h
    ...
    >>> g = dropfirst(f())
    0
    >>> g.next()
    1
    >>> g.next()
    1
    >>> g.next()
    1

    However, since dropfirst discards a value, both caller -and- callee have
    to agree on that convention: the generator has to be written to be
    "first-dropped", and calling 'next' on it inherently causes side
    effects. @greedy (from earlier) frees the caller of that
    responsibility/obligation.

    What can follow without a lead?

    Alternatively, the definitions could lean harder on the 'generation'
    happening prior to the 'next', so that generators inherently don't cause
    side effects.

    Or, if first-dropped is to be no exception:

    >>> special = object()
    >>> def f():
    ...     print 0
    ...     yield special
    ...     while 1:
    ...         yield 1
    ...
    >>> g = f()
    >>> g.next()
    0
    <object object at 0x00980470>
    >>> g.next()
    1
    >>> g.next()
    1
    >>> g.next()
    1
     
    Guest, May 7, 2008
    #8
  9. > Now to the main point. When a generator function is run, it immediately
    > returns a generator, and it does not run any code inside the generator.
    > Not until generator.next() is called is any code inside the generator
    > executed, giving it traditional lazy evaluation semantics. Why don't
    > generators follow the usual eager evaluation semantics of Python and
    > immediately execute up until right before the first yield instead?
    > Giving generators special case semantics for no good reason is a really
    > bad idea, so I'm very curious if there is a good reason for it being
    > this way. With the current semantics it means that errors can pop up at
    > unexpected times rather than the code failing fast.


    The semantics of a generator are very clear: on .next(), run until the
    next yield is reached and then return the yielded value. Plus, of
    course, the handling of StopIteration.

    Your scenario would introduce a special case for the first run, making
    it necessary to keep additional state around (possibly introducing GC
    issues on the way), just for the sake of it. And it would violate the
    laziness a generator is all about. Think of a situation like this:

    def g():
        while True:
            yield time.time()

    Obviously you want to yield the time at the moment .next() is called,
    not something stored from ages ago. If any setup of the generator has to
    be done immediately, that's easy enough:

    def g():
        first_result = time.time()
        def _g():
            yield first_result
            while True:
                yield time.time()
        return _g()
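
    Used interactively, the difference shows up like this (an illustrative
    session; the timestamps are made up):

    >>> import time
    >>> gen = g()      # first_result is captured right here
    >>> time.sleep(5)
    >>> gen.next()     # the timestamp taken at creation, five seconds ago
    1210150800.0
    >>> gen.next()     # the time of this call
    1210150805.0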

    Diez
     
    Diez B. Roggisch, May 7, 2008
    #9
  10. Martin Sand Christensen wrote:
    > Why don't
    > generators follow the usual eager evaluation semantics of Python and
    > immediately execute up until right before the first yield instead?


    A great example of why this behavior would defeat some of the purpose of
    generators can be found in this amazing PDF presentation:

    http://www.dabeaz.com/generators/Generators.pdf

    > Giving generators special case semantics for no good reason is a really
    > bad idea, so I'm very curious if there is a good reason for it being
    > this way. With the current semantics it means that errors can pop up at
    > unexpected times rather than the code failing fast.


    Most assuredly they do have good reason. Consider the cases in the PDF
    I just mentioned. Building generators that work on the output of other
    generators allows assembling entire pipelines of behavior. A very
    powerful feature that would be impossible if the generators had the
    semantics you describe.

    If you want generators to behave as you suggest they should, then a
    conventional function that builds and returns a list is likely the
    better way to go.

    I use a generator anytime I want to iterate across something that is
    potentially expensive, in terms of memory or CPU, to compute all at
    once.
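
    As a toy version of the pipelines those slides build (the file name
    'app.log' is made up for the example): each stage pulls lazily from the
    previous one, so constructing the pipeline does no work at all. If each
    generator ran eagerly to its first yield on construction, merely wiring
    the stages together would already open the file and scan for the first
    match.

    def read_lines(path):
        # Lazily yield lines from a file, one at a time.
        for line in open(path):
            yield line.rstrip('\n')

    def matching(pattern, lines):
        # Pass through only the lines containing pattern.
        for line in lines:
            if pattern in line:
                yield line

    def numbered(lines):
        # Pair each line with a running count.
        n = 0
        for line in lines:
            n += 1
            yield n, line

    # Nothing runs until the for loop starts pulling values.
    for n, line in numbered(matching('ERROR', read_lines('app.log'))):
        print n, line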
     
    Michael Torrie, May 7, 2008
    #10
  11. Guest

    On May 7, 4:51 pm, Michael Torrie <> wrote:
    > Martin Sand Christensen wrote:
    > > Why don't
    > > generators follow the usual eager evaluation semantics of Python and
    > > immediately execute up until right before the first yield instead?

    >
    > A great example of why this behavior would defeat some of the purpose of
    > generators can be found in this amazing PDF presentation:
    >
    > http://www.dabeaz.com/generators/Generators.pdf
    >
    > > Giving generators special case semantics for no good reason is a really
    > > bad idea, so I'm very curious if there is a good reason for it being
    > > this way. With the current semantics it means that errors can pop up at
    > > unexpected times rather than the code failing fast.

    >
    > Most assuredly they do have good reason.  Consider the cases in the PDF
    > I just mentioned.  Building generators that work on the output of other
    > generators allows assembling entire pipelines of behavior.  A very
    > powerful feature that would be impossible if the generators had the
    > semantics you describe.
    >
    > If you want generators to behave as you suggest they should, then a
    > conventional function that builds and returns a list is likely the
    > better way to go.
    >
    > I use a generator anytime I want to iterate across something that is
    > potentially expensive, in terms of memory or CPU, to compute all at
    > once.


    The amount of concentration you can put into a program in one sitting (a
    fixed amount of time) is kind of limited. Sounds like @greedy was the
    way to go. Such an implementation may have a shot at inclusion in the
    future, but isn't functools kind of full? Has a 'wraptools' been
    written? Would it be any different?

    The naming of @greedy is also open to question. My humble opinion
    gravitates toward @early vs. @late; @yieldprior; @dropfirst;
    @cooperative. Thesaurus.com adds @ahead vs. @behind.
     
    Guest, May 8, 2008
    #11
