need help on generator...

Discussion in 'Python' started by Joh, Jan 21, 2005.

  1. Joh

    hello,

    i'm trying to understand how i could build following consecutive sets
    from a root one using generator :

    l = [1,2,3,4]

    would like to produce :

    [1,2], [2,3], [3,4], [1,2,3], [2,3,4]

    but unfortunately can not, i guess i can do it by using sub generator
    and maybe enumerate, please if you help could you explain a bit the
    trick ? looks like this sub generator thing mess me up.

    (i think it should maybe also have [1,], [2,], [3,], [1,2,3,4] , and
    then be filtered)

    bests
     
    Joh, Jan 21, 2005
    #1

  2. On 21 Jan 2005 05:58:03 -0800
    (Joh) wrote:

    > i'm trying to understand how i could build following consecutive sets
    > from a root one using generator :
    >
    > l = [1,2,3,4]
    >
    > would like to produce :
    >
    > [1,2], [2,3], [3,4], [1,2,3], [2,3,4]


    >>> def consecutive_sets(l):
    ...     for i in xrange(2, len(l)):
    ...         for j in xrange(0, len(l)-i+1):
    ...             yield l[j:j+i]
    ...
    >>> list(consecutive_sets([1,2,3,4]))
    [[1, 2], [2, 3], [3, 4], [1, 2, 3], [2, 3, 4]]

    --
    Denis S. Otkidach
    http://www.python.ru/ [ru]
     
    Denis S. Otkidach, Jan 21, 2005
    #2

  3. Joh wrote:

    >hello,
    >
    >i'm trying to understand how i could build following consecutive sets
    >from a root one using generator :
    >
    >l = [1,2,3,4]
    >
    >would like to produce :
    >
    >[1,2], [2,3], [3,4], [1,2,3], [2,3,4]
    >
    >

    Do you mean:

    [1,2], [2,3], [3,4], [1,2,3], [2,3,4], [1,3,4]


    (E.g. all elements in the power set except the empty set, the sets with
    one element and the sets with all elements.)
    Otherwise, please describe your desired sets in words - I cannot see
    the point.
     
    Laszlo Zsolt Nagy, Jan 21, 2005
    #3
  4. Craig Ringer

    On Fri, 2005-01-21 at 17:14 +0300, Denis S. Otkidach wrote:
    > On 21 Jan 2005 05:58:03 -0800
    > (Joh) wrote:
    >
    > > i'm trying to understand how i could build following consecutive sets
    > > from a root one using generator :
    > >
    > > l = [1,2,3,4]
    > >
    > > would like to produce :
    > >
    > > [1,2], [2,3], [3,4], [1,2,3], [2,3,4]

    >
    > >>> def consecutive_sets(l):
    > ...     for i in xrange(2, len(l)):
    > ...         for j in xrange(0, len(l)-i+1):
    > ...             yield l[j:j+i]


    Since you have a much faster brain than I (though I ended up with
    exactly the same thing barring variable names) and beat me to posting
    the answer, I'll post the inevitable awful generator expression version
    instead:

    consecutive_sets = ( x[offset:offset+subset_size]
                         for subset_size in xrange(2, len(x))
                         for offset in xrange(0, len(x) + 1 - subset_size) )
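    [Editor's sketch, not part of the original post: the same expression in Python 3 spelling (range instead of xrange), wrapped in a function so that x is an explicit parameter rather than a free variable.]

```python
def consecutive_sets(x):
    # subset_size runs over the lengths 2 .. len(x)-1,
    # offset over every starting position that still fits.
    return (x[offset:offset + subset_size]
            for subset_size in range(2, len(x))
            for offset in range(0, len(x) + 1 - subset_size))

print(list(consecutive_sets([1, 2, 3, 4])))
# [[1, 2], [2, 3], [3, 4], [1, 2, 3], [2, 3, 4]]
```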

    --
    Craig Ringer
     
    Craig Ringer, Jan 21, 2005
    #4
  5. Re: need help on need help on generator...

    Hi,

    I recently read David Mertz (IBM DeveloperWorks) about generators and got
    excited about using lazy constructs in my Python programming.

    But besides the fact that generators are either produced with the new "yield"
    reserved word or by defining the __new__ method in a class definition, I
    don't know much about them.

    In particular, I don't know what Python constructs does generate a generator.
    I know this is now the case for reading lines in a file or with the new
    "iterator" package. But what else ? Does Craig Ringer answer mean that list
    comprehensions are lazy ? Where can I find a comprehensive list of all the
    lazy constructions built in Python ? (I think that to easily distinguish lazy
    from strict constructs is an absolute programmer need -- otherwise you always
    end up wondering when is it that code is actually executed like in Haskell).

    Thank you

    Francis Girard
    FRANCE

    On Friday 21 January 2005 15:38, Craig Ringer wrote:
    > On Fri, 2005-01-21 at 17:14 +0300, Denis S. Otkidach wrote:
    > > On 21 Jan 2005 05:58:03 -0800
    > >
    > > (Joh) wrote:
    > > > i'm trying to understand how i could build following consecutive sets
    > > > from a root one using generator :
    > > >
    > > > l = [1,2,3,4]
    > > >
    > > > would like to produce :
    > > >
    > > > [1,2], [2,3], [3,4], [1,2,3], [2,3,4]
    > > >
    > > >>> def consecutive_sets(l):

    > >
    > > ... for i in xrange(2, len(l)):
    > > ... for j in xrange(0, len(l)-i+1):
    > > ... yield l[j:j+i]

    >
    > Since you have a much faster brain than I (though I ended up with
    > exactly the same thing barring variable names) and beat me to posting
    > the answer, I'll post the inevitable awful generator expression version
    > instead:
    >
    > consecutive_sets = ( x[offset:offset+subset_size]
    >                      for subset_size in xrange(2, len(x))
    >                      for offset in xrange(0, len(x) + 1 - subset_size) )
    >
    > --
    > Craig Ringer
     
    Francis Girard, Jan 21, 2005
    #5
  6. Craig Ringer

    On Fri, 2005-01-21 at 22:38 +0800, Craig Ringer wrote:

    > consecutive_sets = ( x[offset:offset+subset_size]
    >                      for subset_size in xrange(2, len(x))
    >                      for offset in xrange(0, len(x) + 1 - subset_size) )


    Where 'x' is list to operate on, as I should've initially noted. Sorry
    for the reply-to-self.

    I did say "awful" for a reason ;-)

    --
    Craig Ringer
     
    Craig Ringer, Jan 21, 2005
    #6
  7. Peter Otten

    Joh wrote:

    > i'm trying to understand how i could build following consecutive sets
    > from a root one using generator :
    >
    > l = [1,2,3,4]
    >
    > would like to produce :
    >
    > [1,2], [2,3], [3,4], [1,2,3], [2,3,4]
    >
    > but unfortunately can not, i guess i can do it by using sub generator
    > and maybe enumerate, please if you help could you explain a bit the
    > trick ? looks like this sub generator thing mess me up.


    Here is an (untested) variant that accepts any iterable while trying to
    remain memory-efficient. This makes it necessary to shuffle the order of
    the output a bit.

    from itertools import tee, islice

    def gen(iterable, start, end):
        it = iter(iterable)
        while True:
            it, a = tee(it)
            a = tuple(islice(a, end-1))
            for sz in xrange(start, len(a)+1):
                yield a[:sz]
            it.next()

    if __name__ == "__main__":
        print list(gen(range(1, 5), 2, 4))

    # prints:
    # [(1, 2), (1, 2, 3), (2, 3), (2, 3, 4), (3, 4)]
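    [Editor's note: Peter's variant needs two small changes to run on today's Python 3 - next(it) instead of it.next(), and an explicit return, since PEP 479 no longer lets a StopIteration escape from inside a generator. A sketch:]

```python
from itertools import tee, islice

def gen(iterable, start, end):
    # Yield every run of start..end-1 consecutive items, grouped by
    # starting position, keeping only a small window in memory.
    it = iter(iterable)
    while True:
        it, a = tee(it)
        a = tuple(islice(a, end - 1))
        for sz in range(start, len(a) + 1):
            yield a[:sz]
        try:
            next(it)            # slide the window forward by one item
        except StopIteration:   # input exhausted: end the generator
            return

print(list(gen(range(1, 5), 2, 4)))
# [(1, 2), (1, 2, 3), (2, 3), (2, 3, 4), (3, 4)]
```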

    Peter
     
    Peter Otten, Jan 21, 2005
    #7
  8. Craig Ringer

    Re: need help on need help on generator...

    On Fri, 2005-01-21 at 16:05 +0100, Francis Girard wrote:
    > I recently read David Mertz (IBM DeveloperWorks) about generators and
    > got excited about using lazy constructs in my Python programming.


    Speaking of totally great articles, and indirectly to laziness (though
    not lazily evaluated constructs), I was really impressed by this
    fantastic article on metaclasses:

    http://gnosis.cx/publish/programming/metaclass_1.html
    http://gnosis.cx/publish/programming/metaclass_2.html

    which shows that they're really just not that hard. That saved me an
    IMMENSE amount of utterly tedious coding just recently.

    > But besides the fact that generators are either produced with the new
    > "yield" reserved word or by defining the __new__ method in a class
    > definition, I don't know much about them.


    They can also be created with a generator expression under Python 2.4. A
    generator expression works much like a list comprehension, but returns a
    generator instead of a list, and is evaluated lazily. (It also doesn't
    pollute the outside namespace with its working variables).

    >>> print [ x for x in range(1,10)]
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> print ( x for x in xrange(1,10) )
    <generator object at 0x401e40ac>
    >>> print list(( x for x in xrange(1,10) ))
    [1, 2, 3, 4, 5, 6, 7, 8, 9]

    Note the use of xrange above for efficiency in the generator expressions.
    These examples are trivial and pointless, but hopefully get the point
    across.
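    [Editor's sketch, Python 3 spelling: the deferral Craig describes can be made visible by recording when the expression's body actually runs. noisy and events are made-up names for illustration.]

```python
events = []

def noisy(x):
    events.append(x)     # record the moment the body really executes
    return x * x

lazy = (noisy(x) for x in range(3))
assert events == []      # creating the genexp did no work at all
squares = list(lazy)     # pulling the values triggers the three calls
assert events == [0, 1, 2]
print(squares)           # [0, 1, 4]
```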

    > In particular, I don't know what Python constructs does generate a
    > generator.


    As far as I know, functions that use yield, and generator expressions. I
    was unaware of the ability to create them using a class with a __new__
    method, and need to check that out - I can imagine situations in which
    it might be rather handy.

    I'm not sure how many Python built-in functions and library modules
    return generators for things.

    > I know this is now the case for reading lines in a file or with the
    > new "iterator" package. But what else ? Does Craig Ringer answer mean
    > that list comprehensions are lazy ?


    Nope, but generator expressions are, and they're pretty similar.

    > Where can I find a comprehensive list of all the lazy constructions
    > built in Python ? (I think that to easily distinguish lazy from strict
    > constructs is an absolute programmer need -- otherwise you always end
    > up wondering when is it that code is actually executed like in
    > Haskell).


    I'm afraid I can't help you with that. I tend to take the view that side
    effects in lazily executed code are a bad plan, and use lazy execution
    for things where there is no reason to care when the code is executed.

    --
    Craig Ringer
     
    Craig Ringer, Jan 21, 2005
    #8
  9. On Friday 21 January 2005 16:06, Craig Ringer wrote:
    > On Fri, 2005-01-21 at 22:38 +0800, Craig Ringer wrote:
    > > consecutive_sets = ( x[offset:offset+subset_size]
    > >                      for subset_size in xrange(2, len(x))
    > >                      for offset in xrange(0, len(x) + 1 - subset_size) )

    >
    > Where 'x' is list to operate on, as I should've initially noted. Sorry
    > for the reply-to-self.
    >
    > I did say "awful" for a reason ;-)
    >
    > --
    > Craig Ringer


    First, I think that you mean :

    consecutive_sets = [ x[offset:offset+subset_size]
                         for subset_size in xrange(2, len(x))
                         for offset in xrange(0, len(x) + 1 - subset_size)]

    (with square brackets).

    Second,

    this is not lazy anymore (whereas Denis S. Otkidach's previous answer was)
    because the __whole__ list gets constructed __before__ any other piece of
    code has a chance to execute. The type of consecutive_sets is simply a list,
    not a generator.

    I'm just trying to understand and obviously I'm missing the point.

    Thank you

    Francis Girard
    FRANCE
     
    Francis Girard, Jan 21, 2005
    #9
  10. Re: need help on need help on generator...

    Really, thank you Craig Ringer for your great answer.


    > I'm afraid I can't help you with that. I tend to take the view that side
    > effects in lazily executed code are a bad plan, and use lazy execution
    > for things where there is no reason to care when the code is executed.



    I completely agree with this. But this is much more true in theory than in
    practice. In practice you might end up with big memory usage with lazy
    constructs, as the "system" has to store intermediate results (the execution
    context in the case of Python). These kinds of phenomena happen all the time
    in a language like Haskell -- at least for a beginner like me -- if you don't
    pay attention to it; and this makes the language a lot more difficult to
    master. Thus you have to keep an eye on performance even though, in FP, you
    should just have to "declare" your intentions and let the system manage the
    execution path.


    > http://gnosis.cx/publish/programming/metaclass_1.html
    > http://gnosis.cx/publish/programming/metaclass_2.html


    Thank you, I'll read that.

    Francis Girard
    FRANCE

    On Friday 21 January 2005 16:42, Craig Ringer wrote:
    > On Fri, 2005-01-21 at 16:05 +0100, Francis Girard wrote:
    > > I recently read David Mertz (IBM DeveloperWorks) about generators and
    > > got excited about using lazy constructs in my Python programming.

    >
    > Speaking of totally great articles, and indirectly to laziness (though
    > not lazily evaluated constructs), I was really impressed by this
    > fantastic article on metaclasses:
    >
    > http://gnosis.cx/publish/programming/metaclass_1.html
    > http://gnosis.cx/publish/programming/metaclass_2.html
    >
    > which shows that they're really just not that hard. That saved me an
    > IMMENSE amount of utterly tedious coding just recently.
    >
    > > But besides the fact that generators are either produced with the new
    > > "yield" reserved word or by defining the __new__ method in a class
    > > definition, I don't know much about them.

    >
    > They can also be created with a generator expression under Python 2.4. A
    > generator expression works much like a list comprehension, but returns a
    > generator instead of a list, and is evaluated lazily. (It also doesn't
    > pollute the outside namespace with its working variables).
    >
    > >>> print [ x for x in range(1,10)]

    >
    > [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >
    > >>> print ( x for x in xrange(1,10) )

    >
    > <generator object at 0x401e40ac>
    >
    > >>> print list(( x for x in xrange(1,10) ))

    >
    > [1, 2, 3, 4, 5, 6, 7, 8, 9]
    >
    > Note the use of xrange above for efficiency in the generator expressions.
    > These examples are trivial and pointless, but hopefully get the point
    > across.
    >
    > > In particular, I don't know what Python constructs does generate a
    > > generator.

    >
    > As far as I know, functions that use yield, and generator expressions. I
    > was unaware of the ability to create them using a class with a __new__
    > method, and need to check that out - I can imagine situations in which
    > it might be rather handy.
    >
    > I'm not sure how many Python built-in functions and library modules
    > return generators for things.
    >
    > > I know this is now the case for reading lines in a file or with the
    > > new "iterator" package. But what else ? Does Craig Ringer answer mean
    > > that list comprehensions are lazy ?

    >
    > Nope, but generator expressions are, and they're pretty similar.
    >
    > > Where can I find a comprehensive list of all the lazy constructions
    > > built in Python ? (I think that to easily distinguish lazy from strict
    > > constructs is an absolute programmer need -- otherwise you always end
    > > up wondering when is it that code is actually executed like in
    > > Haskell).

    >
    > I'm afraid I can't help you with that. I tend to take the view that side
    > effects in lazily executed code are a bad plan, and use lazy execution
    > for things where there is no reason to care when the code is executed.
    >
    > --
    > Craig Ringer
     
    Francis Girard, Jan 21, 2005
    #10
  11. Craig Ringer

    On Fri, 2005-01-21 at 16:54 +0100, Francis Girard wrote:

    > First, I think that you mean :
    >
    > consecutive_sets = [ x[offset:offset+subset_size]
    >                      for subset_size in xrange(2, len(x))
    >                      for offset in xrange(0, len(x) + 1 - subset_size)]
    >
    > (with square brackets).


    > I'm just trying to understand and obviously I'm missing the point.


    Yep. This:

    ( x for x in xrange(10) )

    will return a generator that calculates things as it goes, while this:

    [ x for x in xrange(10) ]

    will return a list.

    Check out:
    http://www.python.org/peps/pep-0289.html
    http://docs.python.org/whatsnew/node4.html
    http://www.python.org/dev/doc/newstyle/ref/genexpr.html
    for details.
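    [Editor's sketch, Python 3 spelling: a quick type check makes the parentheses-vs-brackets distinction concrete.]

```python
import types

gen = (x for x in range(10))   # parentheses: a generator, evaluated lazily
lst = [x for x in range(10)]   # square brackets: a list, built immediately

assert isinstance(gen, types.GeneratorType)
assert isinstance(lst, list)
assert list(gen) == lst        # draining the generator gives the same values
```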

    --
    Craig Ringer
     
    Craig Ringer, Jan 21, 2005
    #11
  12. Thank you,

    I immediately downloaded version 2.4, switching from version 2.3.

    Francis Girard
    FRANCE

    On Friday 21 January 2005 17:34, Craig Ringer wrote:
    > On Fri, 2005-01-21 at 16:54 +0100, Francis Girard wrote:
    > > First, I think that you mean :
    > >
    > > consecutive_sets = [ x[offset:offset+subset_size]
    > >                      for subset_size in xrange(2, len(x))
    > >                      for offset in xrange(0, len(x) + 1 - subset_size)]
    > >
    > > (with square brackets).
    > >
    > > I'm just trying to understand and obviously I'm missing the point.

    >
    > Yep. This:
    >
    > ( x for x in xrange(10) )
    >
    > will return a generator that calculates things as it goes, while this:
    >
    > [ x for x in xrange(10) ]
    >
    > will return a list.
    >
    > Check out:
    > http://www.python.org/peps/pep-0289.html
    > http://docs.python.org/whatsnew/node4.html
    > http://www.python.org/dev/doc/newstyle/ref/genexpr.html
    > for details.
    >
    > --
    > Craig Ringer
     
    Francis Girard, Jan 21, 2005
    #12
  13. Nick Coghlan

    Re: need help on need help on generator...

    Francis Girard wrote:
    > In particular, I don't know what Python constructs does generate a generator.
    > I know this is now the case for reading lines in a file or with the new
    > "iterator" package. But what else ? Does Craig Ringer answer mean that list
    > comprehensions are lazy ? Where can I find a comprehensive list of all the
    > lazy constructions built in Python ? (I think that to easily distinguish lazy
    > from strict constructs is an absolute programmer need -- otherwise you always
    > end up wondering when is it that code is actually executed like in Haskell).


    I don't think there is a *list* as such, but there are some rules of thumb for
    when lazy evaluation will take place (hopefully others will note any cases that
    I missed):

    1. Iterators (classes with a next() method, and an __iter__ method that returns
    'self') are lazily evaluated, as itr.next() is called to retrieve each value. I
    think you will find it is this method, rather than __new__ which is relevant to
    creating class-based generators. Note that "for x in itr" automatically calls
    itr.next() in order to obtain each new value of the loop variable.
    This iterator protocol is the basis of lazy evaluation in Python, and is
    described here:
    http://www.python.org/dev/doc/devel/lib/typeiter.html

    2. Iterables (classes with an __iter__ method) will return a lazy iterator via
    iter(obj). Actual iterators return themselves from __iter__, so iter(obj) is a
    good way to make sure you have an iterator.

    3. Generators (functions that use 'yield' instead of 'return') and generator
    expressions (like list comprehensions, but without the square brackets) are
    simply concise ways of creating iterators.

    4. The utility functions in the itertools module generally return iterators
    rather than lists (this shouldn't surprise anyone!)

    5. Several builtin functions return iterators rather than lists, specifically
    xrange(), enumerate() and reversed(). Other builtins that yield sequences
    (range(), sorted(), zip()) return lists.

    However, be aware that some things which accept any iterable don't take
    advantage of the lazy evaluation, and still cause the whole thing to be created
    in memory at once - "".join(itr) is currently one such operation.
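    [Editor's example of Nick's rule 1, a minimal class-based iterator. Note that Python 3 later renamed the next() method of this 2.x-era thread to __next__, which is the spelling used below.]

```python
class CountDown:
    """Lazily yields n, n-1, ..., 1 on demand."""
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self              # an iterator returns itself (rule 1)
    def __next__(self):          # spelled next() in the 2.x of this thread
        if self.n <= 0:
            raise StopIteration  # tells 'for' loops to stop
        self.n -= 1
        return self.n + 1

print(list(CountDown(3)))        # [3, 2, 1]
```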

    The sequence vs iterator distinction is somewhat unfortunate (since it breaks
    with TOOWTDI), but completing the transition to an iterator based approach isn't
    going to be possible until Python 3.0, when things that currently return lists
    can be converted to return iterators (i.e. it has been suggested that the
    fundamental construct in Python 3.x should be an iterator just as a list is the
    fundamental construct in Python 2.x)

    Regards,
    Nick.

    --
    Nick Coghlan | Brisbane, Australia
    ---------------------------------------------------------------
    http://boredomandlaziness.skystorm.net
     
    Nick Coghlan, Jan 22, 2005
    #13
  14. Re: need help on need help on generator...

    Francis Girard <> wrote:
    ...
    > But besides the fact that generators are either produced with the new "yield"
    > reserved word or by defining the __new__ method in a class definition, I
    > don't know much about them.


    Having __new__ in a class definition has nothing much to do with
    generators; it has to do with how the class is instantiated when you
    call it. Perhaps you mean 'next' (and __iter__)? That makes instances
    of the class iterators, just like iterators are what you get when you
    call a generator.

    > In particular, I don't know what Python constructs does generate a generator.


    A 'def' of a function whose body uses 'yield', and in 2.4 the new genexp
    construct.

    > I know this is now the case for reading lines in a file or with the new
    > "iterator" package.


    Nope, besides the fact that the module you're thinking of is named
    'itertools': itertools uses a lot of C-coded special types, which are
    iterators but not generators. Similarly, a file object is an iterator
    but not a generator.

    > But what else ?


    Since you appear to conflate generators and iterators, I guess the iter
    built-in function is the main one you missed. iter(x), for any x,
    either raises an exception (if x's type is not iterable) or else returns
    an iterator.
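    [Editor's sketch of Alex's description of iter() in action, using Python 3's next(it) spelling of it.next().]

```python
it = iter([10, 20, 30])      # a list is iterable, so iter() succeeds
assert next(it) == 10        # the iterator hands out one item at a time
assert list(it) == [20, 30]  # and remembers how far it has got

try:
    iter(42)                 # an int is not iterable...
except TypeError as exc:
    print("not iterable:", exc)   # ...so iter() raises, as Alex says
```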

    > Does Craig Ringer answer mean that list
    > comprehensions are lazy ?


    Nope, those were generator expressions.

    > Where can I find a comprehensive list of all the
    > lazy constructions built in Python ?


    That's yet a different question -- at least one needs to add the
    built-in xrange, which is neither an iterator nor a generator but IS
    lazy (a historical artefact, admittedly).

    But fortunately Python's built-ins are not all THAT many, so that's
    about it.

    > (I think that to easily distinguish lazy
    > from strict constructs is an absolute programmer need -- otherwise you always
    > end up wondering when is it that code is actually executed like in Haskell).


    Encapsulation doesn't let you "easily distinguish" issues of
    implementation. For example, the fact that a file is an iterator (its
    items being its lines) doesn't tell you if that's internally implemented
    in a lazy or eager way -- it tells you that you can code afile.next() to
    get the next line, or "for line in afile:" to loop over them, but does
    not tell you whether the code for the file object is reading each line
    just when you ask for it, or whether it reads all lines before and just
    keeps some state about the next one, or somewhere in between.

    The answer for the current implementation, BTW, is "in between" -- some
    buffering, but bounded consumption of memory -- but whether that tidbit
    of pragmatics is part of the file specs, heh, that's anything but clear
    (just as for other important tidbits of Python pragmatics, such as the
    facts that list.sort is wickedly fast, 'x in alist' isn't, 'x in adict'
    IS...).


    Alex
     
    Alex Martelli, Jan 22, 2005
    #14
  15. Re: need help on need help on generator...

    Nick Coghlan <> wrote:

    > 5. Several builtin functions return iterators rather than lists, specifically
    > xrange(), enumerate() and reversed(). Other builtins that yield sequences
    > (range(), sorted(), zip()) return lists.


    Yes for enumerate and reversed, no for xrange:

    >>> xx = xrange(7)
    >>> xx.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    AttributeError: 'xrange' object has no attribute 'next'
    >>>


    it SHOULD return an iterator, no doubt, but it doesn't (can't, for
    backwards compatibility reasons). Neither does it return a list: it
    returns "an `xrange' object", a specialized type that's not an iterator,
    though it's iterable. It's a type, btw:

    >>> xrange
    <type 'xrange'>
    >>>


    so it's not surprising that calling it returns instances of it
    (enumerate and reversed are also types, but *WITH* 'next'...).
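    [Editor's note: the same distinction survives in today's Python 3, where range() plays the role of xrange - iterable, but not itself an iterator.]

```python
r = range(7)                       # like 2.x xrange: lazy, but not an iterator
assert not hasattr(r, '__next__')  # no next method on the range object itself
it = iter(r)                       # iter() produces a real iterator from it
assert next(it) == 0
assert next(it) == 1
```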


    Alex
     
    Alex Martelli, Jan 22, 2005
    #15
  16. Craig Ringer

    Re: need help on need help on generator...

    On Sat, 2005-01-22 at 10:10 +0100, Alex Martelli wrote:

    > The answer for the current implementation, BTW, is "in between" -- some
    > buffering, but bounded consumption of memory -- but whether that tidbit
    > of pragmatics is part of the file specs, heh, that's anything but clear
    > (just as for other important tidbits of Python pragmatics, such as the
    > facts that list.sort is wickedly fast, 'x in alist' isn't, 'x in adict'
    > IS...).


    A particularly great example when it comes to unexpected buffering
    effects is the file iterator. Take code that reads a header from a file
    using an (implicit) iterator, then tries to read() the rest of the file.
    Taking the example of reading an RFC822-like message into a list of
    headers and a body blob:

    >>> inpath = '/tmp/msg.eml'
    >>> infile = open(inpath)
    >>> headers = []
    >>> for line in infile:
    ...     if not line.strip():
    ...         break
    ...     headers.append(tuple(line.split(':',1)))
    >>> body = infile.read()


    (By the way, if you ever implement this yourself for real, you should
    probably be hurt - use the 'email' or 'rfc822' modules instead. For one
    thing, reinventing the wheel is rarely a good idea. For another, the
    above code is horrid - in particular it doesn't handle malformed headers
    at all, isn't big on readability/comments, etc.)
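    [Editor's sketch of the recommended route, using today's email.parser; the message text here is invented.]

```python
from email.parser import Parser

raw = ("From: someone@example.com\n"
       "Subject: hello\n"
       "\n"
       "the body\n")

msg = Parser().parsestr(raw)      # handles header folding, edge cases, etc.
print(msg['Subject'])             # hello
print(msg.get_payload())          # the body, trailing newline included
```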

    If you run the above code on a saved email message, you'd expect 'body'
    to contain the body of the message, right? Nope. The iterator created
    from the file when you use it in that for loop does internal read-ahead
    for efficiency, and has already read in the entire file or at least a
    chunk more of it than you've read out of the iterator. It doesn't
    attempt to hide this from the programmer, so the file position marker is
    further into the file (possibly at the end on a smaller file) than you'd
    expect given the data you've actually read in your program.

    I'd be interested to know if there's a better solution to this than:

    >>> inpath = '/tmp/msg.eml'
    >>> infile = open(inpath)
    >>> initer = iter(infile)
    >>> headers = []
    >>> for line in initer:
    ...     if not line.strip():
    ...         break
    ...     headers.append(tuple(line.split(':',1)))
    >>> data = ''.join(x for x in initer)

    because that seems like a pretty ugly hack (and please ignore the
    variable names). Perhaps a way to get the file to seek back to the point
    last read from the iterator when the iterator is destroyed?

    --
    Craig Ringer
     
    Craig Ringer, Jan 22, 2005
    #16
  17. Re: need help on need help on generator...

    On Saturday 22 January 2005 10:10, Alex Martelli wrote:
    > Francis Girard <> wrote:
    > ...
    >
    > > But besides the fact that generators are either produced with the new
    > > "yield" reserved word or by defining the __new__ method in a class
    > > definition, I don't know much about them.

    >
    > Having __new__ in a class definition has nothing much to do with
    > generators; it has to do with how the class is instantiated when you
    > call it. Perhaps you mean 'next' (and __iter__)? That makes instances
    > of the class iterators, just like iterators are what you get when you
    > call a generator.
    >


    Yes, I meant "next".


    > > In particular, I don't know what Python constructs does generate a
    > > generator.

    >
    > A 'def' of a function whose body uses 'yield', and in 2.4 the new genexp
    > construct.
    >


    Ok. I guess I'll have to update to version 2.4 (from 2.3) to follow the
    discussion.

    > > I know this is now the case for reading lines in a file or with the new
    > > "iterator" package.

    >
    > Nope, besides the fact that the module you're thinking of is named
    > 'itertools': itertools uses a lot of C-coded special types, which are
    > iterators but not generators. Similarly, a file object is an iterator
    > but not a generator.
    >
    > > But what else ?

    >
    > Since you appear to conflate generators and iterators, I guess the iter
    > built-in function is the main one you missed. iter(x), for any x,
    > either raises an exception (if x's type is not iterable) or else returns
    > an iterator.
    >


    You're absolutely right, I take the one for the other and vice-versa. If I
    understand correctly, a "generator" produces something over which you can
    iterate with the help of an "iterator". Can you iterate (in the strict sense
    of an "iterator") over something not generated by a "generator" ?
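    [Editor's note: the answer turns out to be yes - ordinary containers are iterable with no generator anywhere in sight, as a quick check shows (Python 3 spelling).]

```python
import types

data = [1, 2, 3]           # a plain list: no generator involved
it = iter(data)            # a list_iterator, which is not a generator either
assert not isinstance(it, types.GeneratorType)
assert list(it) == [1, 2, 3]
```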


    > > Does Craig Ringer answer mean that list
    > > comprehensions are lazy ?

    >
    > Nope, those were generator expressions.
    >
    > > Where can I find a comprehensive list of all the
    > > lazy constructions built in Python ?

    >
    > That's yet a different question -- at least one needs to add the
    > built-in xrange, which is neither an iterator nor a generator but IS
    > lazy (a historical artefact, admittedly).
    >
    > But fortunately Python's built-ins are not all THAT many, so that's
    > about it.
    >
    > > (I think that to easily distinguish lazy
    > > from strict constructs is an absolute programmer need -- otherwise you
    > > always end up wondering when is it that code is actually executed like in
    > > Haskell).

    >
    > Encapsulation doesn't let you "easily distinguish" issues of
    > implementation. For example, the fact that a file is an iterator (its
    > items being its lines) doesn't tell you if that's internally implemented
    > in a lazy or eager way -- it tells you that you can code afile.next() to
    > get the next line, or "for line in afile:" to loop over them, but does
    > not tell you whether the code for the file object is reading each line
    > just when you ask for it, or whether it reads all lines before and just
    > keeps some state about the next one, or somewhere in between.
    >


    You're right. I was much more talking (mistakenly) about lazy evaluation of
    the arguments to a function (i.e. the function begins execution before its
    arguments get evaluated) -- in such a case I think it should be specified
    which arguments are "strict" and which are "lazy" -- but I don't think
    there's such a thing in Python (... well not yet as Python get more and more
    akin to FP).
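    [Editor's sketch: indeed Python evaluates function arguments strictly; per-argument laziness can only be simulated, e.g. by passing a zero-argument callable (a thunk) and invoking it on demand. expensive and pick are made-up names for illustration.]

```python
calls = []

def expensive():
    calls.append(1)            # record that the costly work really ran
    return sum(range(1000))

def pick(flag, eager_value, lazy_thunk):
    # lazy_thunk is only invoked when its value is actually needed
    return eager_value if flag else lazy_thunk()

assert pick(True, -1, expensive) == -1 and calls == []   # never evaluated
assert pick(False, -1, expensive) == sum(range(1000))    # evaluated once
assert calls == [1]
```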

    > The answer for the current implementation, BTW, is "in between" -- some
    > buffering, but bounded consumption of memory -- but whether that tidbit
    > of pragmatics is part of the file specs, heh, that's anything but clear
    > (just as for other important tidbits of Python pragmatics, such as the
    > facts that list.sort is wickedly fast, 'x in alist' isn't, 'x in adict'
    > IS...).
    >
    >
    > Alex


    Thank you

    Francis Girard
    FRANCE
     
    Francis Girard, Jan 22, 2005
    #17
  18. Craig Ringer

    Re: need help on need help on generator...

    On Sat, 2005-01-22 at 17:46 +0800, I wrote:

    > I'd be interested to know if there's a better solution to this than:
    >
    > >>> inpath = '/tmp/msg.eml'
    > >>> infile = open(inpath)
    > >>> initer = iter(infile)
    > >>> headers = []
    > >>> for line in initer:
    > ...     if not line.strip():
    > ...         break
    > ...     headers.append(tuple(line.split(':',1)))
    > >>> data = ''.join(x for x in initer)
    >
    > because that seems like a pretty ugly hack (and please ignore the
    > variable names). Perhaps a way to get the file to seek back to the point
    > last read from the iterator when the iterator is destroyed?


    And now, answering my own question (sorry):

    Answer: http://tinyurl.com/6avdt

    so we can slightly simplify the above to:

    >>> inpath = '/tmp/msg.eml'
    >>> infile = open(inpath)
    >>> headers = []
    >>> for line in infile:
    ...     if not line.strip():
    ...         break
    ...     headers.append(tuple(line.split(':',1)))
    >>> data = ''.join(x for x in infile)

    at the cost of making it less clear what's going on, and of having someone
    later go "duh, why isn't he using read() here instead?" -- but I can't seem
    to do much more than that.

    Might it be worth providing a way to have file objects seek back to the
    current position of the iterator when read() etc are called? If not, I
    favour the suggestion in the referenced post - file should probably fail
    noisily, or at least emit a warning.

    What are others thoughts on this?

    --
    Craig Ringer
     
    Craig Ringer, Jan 22, 2005
    #18
  19. Re: need help on need help on generator...

    Francis Girard <> wrote:
    ...
    > > A 'def' of a function whose body uses 'yield', and in 2.4 the new genexp
    > > construct.

    >
    > Ok. I guess I'll have to update to version 2.4 (from 2.3) to follow the
    > discussion.


    It's worth upgrading even just for the extra speed;-).


    > > Since you appear to conflate generators and iterators, I guess the iter
    > > built-in function is the main one you missed. iter(x), for any x,
    > > either raises an exception (if x's type is not iterable) or else returns
    > > an iterator.

    >
    > You're absolutely right, I take the one for the other and vice-versa. If I
    > understand correctly, a "generator" produces something over which you can
    > iterate with the help of an "iterator". Can you iterate (in the strict sense
    > of an "iterator") over something not generated by a "generator"?


    A generator function (commonly known as a generator), each time you call
    it, produces a generator object AKA a generator-iterator. To wit:

    >>> def f(): yield 23
    ...
    >>> f
    <function f at 0x75fe70>
    >>> x = f()
    >>> x
    <generator object at 0x75aa58>
    >>> type(x)
    <type 'generator'>

    A generator expression (genexp) also has a result which is a generator
    object:

    >>> x = (23 for __ in [0])
    >>> type(x)
    <type 'generator'>


    Iterators need not be generator-iterators, by any means. Generally, the
    way to make sure you have an iterator is to call iter(...) on something;
    if the something was already an iterator, NP, then iter's idempotent:

    >>> iter(x) is x
    True


    That's what "an iterator" means: some object x such that x.next is
    callable without arguments and iter(x) is x.

    Since iter(x) tries calling type(x).__iter__(x) [[slight simplification
    here by ignoring custom metaclasses, see recent discussion on python-dev
    as to why this is only 99% accurate, not 100% accurate]], one way to
    code an iterator is as a class. For example:

    class Repeater(object):
        def __iter__(self): return self
        def next(self): return 23

    Any instance of Repeater is an iterator which, as it happens, has just
    the same behavior as itertools.repeat(23), which is also the same
    behavior you get from iterators obtained by calling:

    def generepeat():
        while True: yield 23

    In other words, after:

    a = Repeater()
    b = itertools.repeat(23)
    c = generepeat()

    the behavior of a, b and c is indistinguishable, though you can easily
    tell them apart by introspection -- type(a) != type(b) != type(c).


    Python's penchant for duck typing -- behavior matters more, WAY more
    than implementation details such as type() -- means we tend to consider
    a, b and c fully equivalent. Focusing on ``generator'' is at this level
    an implementation detail.
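    A quick way to see the three-way equivalence in practice: itertools.islice
    caps each (infinite) iterator at five items. Note that __next__ is the
    modern Python 3 spelling of the next() method used in this thread; the
    alias keeps both names working.

```python
import itertools

class Repeater(object):
    def __iter__(self):
        return self
    def __next__(self):      # Python 3 spelling of the thread's next()
        return 23
    next = __next__          # keep the Python 2 name as well

def generepeat():
    while True:
        yield 23

a = Repeater()
b = itertools.repeat(23)
c = generepeat()

# behaviorally indistinguishable: the same first five values from each
samples = [list(itertools.islice(it, 5)) for it in (a, b, c)]
print(samples)

# ...but introspection tells them apart
print(type(a) is type(b), type(b) is type(c))   # -> False False
```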


    Most often, iterators (including generator-iterators) are used in a for
    statement (or equivalently a for clause of a listcomp or genexp), which
    is why one normally doesn't think about built-in ``iter'': it's called
    automatically by these ``for'' syntax-forms. In other words,

    for x in <<<whatever>>>:
        ...body...

    is just like:

    __tempiter = iter(<<<whatever>>>)
    while True:
        try: x = __tempiter.next()
        except StopIteration: break
        ...body...

    ((Simplification alert: the ``for'' statement has an optional ``else''
    which this allegedly "just like" form doesn't mimic exactly...))


    > You're right. I was much more talking (mistakenly) about lazy evaluation of
    > the arguments to a function (i.e. the function begins execution before its
    > arguments get evaluated) -- in such a case I think it should be specified
    > which arguments are "strict" and which are "lazy" -- but I don't think
    > there's such a thing in Python (... well not yet as Python gets more and more
    > akin to FP).


    Python's strict that way. To explicitly make some one argument "lazy",
    sorta, you can put a "lambda:" in front of it at call time, but then you
    have to "call the argument" to get it evaluated; a bit of a kludge.
    There's a PEP out to allow a ``prefix :'' to mean just the same as this
    "lambda:", but even though I co-authored it I don't think it lowers the
    kludge quotient by all that much.
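    A small illustration of the "lambda:" kludge (pick and pick_lazy are
    made-up names for this sketch; the point is that a wrapped argument only
    runs if the function actually calls it):

```python
def pick(flag, a, b):
    # strict: both a and b were already evaluated at the call site
    return a if flag else b

def pick_lazy(flag, a, b):
    # "lazy" arguments via the lambda kludge: the caller wraps each
    # argument in lambda:, and we must call it to force evaluation
    return a() if flag else b()

# a strict call would raise ZeroDivisionError before pick even ran:
#   pick(True, 1 + 1, 1 // 0)
print(pick_lazy(True, lambda: 1 + 1, lambda: 1 // 0))   # -> 2
```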

    Guido, our beloved BDFL, is currently musing about optional typing of
    arguments, which might perhaps open a tiny little crack towards letting
    some arguments be lazy. I don't think Guido wants to go there, though.

    My prediction is that even Python 3000 will be strict. At least this
    makes some things obvious at each call-site without having to study the
    way a function is defined, e.g., upon seeing
    f(a+b, c*d)
    you don't have to wonder, or study the ``def f'', to find out when the
    addition and the multiplication happen -- they happen before f's body
    gets a chance to run, and thus, in particular, if either operation
    raises an exception, there's nothing f can do about it.

    And that's a misunderstanding I _have_ seen repeatedly even in people
    with a pretty good overall grasp of Python, evidenced in code such as:
    self.assertRaises(SomeError, f(23))
    with astonishment that -- if f(23) does indeed raise SomeError -- this
    exception propagates, NOT caught by assertRaises; and if mistakenly
    f(23) does NOT raise, you typically get a TypeError about None not being
    callable. The way to do the above call, _since Python is strict_, is:
    self.assertRaises(SomeError, f, 23)
    i.e. give assertRaises the function (or other callable) and arguments to
    pass to it -- THIS way, assertRaises performs the call under its own
    control (inside the try clause of a try/except statement) and can and
    does catch and report things appropriately.
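    The difference is easy to reproduce with a minimal stand-in for the
    unittest method (assert_raises below is a sketch of what assertRaises
    does, not its real implementation):

```python
def assert_raises(exc_type, callable_, *args):
    # sketch of unittest's assertRaises: it performs the call itself,
    # inside its own try/except, so it can catch the exception
    try:
        callable_(*args)
    except exc_type:
        return True
    raise AssertionError("%s not raised" % exc_type.__name__)

def f(x):
    raise ValueError("boom")

# the right way: pass the callable and the argument separately
print(assert_raises(ValueError, f, 23))   # -> True

# the wrong way: f(23) runs *before* assert_raises is even entered,
# so the exception escapes at the call site
try:
    assert_raises(ValueError, f(23))
except ValueError:
    print("ValueError propagated past assert_raises")
```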

    The frequency of this misunderstanding is high enough to prove to me
    that strictness is not FULLY understood by some "intermediate level"
    Pythonistas. However, I doubt the right solution is to complicate
    Python with the ability to have some arguments be strict, and other
    lazy, much as sometimes one might yearn for it. _Maybe_ one could have
    some form that makes ALL arguments lazy, presenting them as an iterator
    to the function itself. But even then the form _should_, I suspect, be
    obvious at the call site, rather than visible only in the "def"...



    Alex
     
    Alex Martelli, Jan 22, 2005
    #19
  20. Re: need help on need help on generator...

    Craig Ringer <> wrote:

    > .>>> data = ''.join(x for x in infile)


    Maybe ''.join(infile) is a better way to express this functionality?
    Avoids 2.4 dependency and should be faster as well as more concise.
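    Both spellings consume the file's line iterator and produce the same
    string; ''.join(infile) just skips the needless genexp. A quick check,
    with io.StringIO standing in for the file:

```python
import io

text = "one\ntwo\nthree\n"

# the genexp form from the earlier post, and Alex's shorter form
via_genexp = ''.join(x for x in io.StringIO(text))
via_join = ''.join(io.StringIO(text))

print(via_genexp == via_join == text)   # -> True
```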


    > Might it be worth providing a way to have file objects seek back to the
    > current position of the iterator when read() etc are called? If not, I


    It's certainly worth doing a patch and see what the python-dev crowd
    thinks of it, I think; it might make it into 2.5.

    > favour the suggestion in the referenced post - file should probably fail
    > noisily, or at least emit a warning.


    Under what conditions, exactly, would you want such an exception?


    Alex
     
    Alex Martelli, Jan 22, 2005
    #20
