Proposed new PEP: print to expand generators

Discussion in 'Python' started by James J. Besemer, Jun 4, 2006.

  1. I would like to champion a proposed enhancement to Python. I describe the
    basic idea below, in order to gauge community interest. Right now, it's only
    an idea, and I'm sure there's room for improvement. And of course it's
    possible there's some serious "gotcha" I've overlooked. Thus I welcome any
    and all comments.

    If there's some agreement that this proposal is worth further consideration
    then I'll re-submit a formal document in official PEP format.

    Regards

    --jb

    PEP -- EXTEND PRINT TO EXPAND GENERATORS

    NUTSHELL

    I propose that we extend the semantics of "print" such that if the object to
    be printed is a generator then print would iterate over the resulting
    sequence of sub-objects and recursively print each of the items in order.

    E.g.,

    print obj

    under the proposal would behave something like

    import types

    if type( obj ) == types.GeneratorType:
        for item in obj:
            print item,  # recursive call
        print            # trailing newline
    else:
        print obj        # existing print behavior

    I know this isn't precisely how print would work, but I intentionally
    simplified the illustration to emphasize the intended change. Nevertheless,
    several points above expressly are part of this proposal (subject to
    discussion and possible revision):

    Print behavior does not change EXCEPT in the case
    that the object being printed is a generator.

    Enumerated items are printed with intervening spaces
    [alternatively: "" or "\n"].

    An enumerated sequence ends with a newline
    [alternatively: "" or " "].

    Iterators themselves could return iterators as elements, and the proposed
    change to print would recursively serialize any arbitrary "tree" of iterators.
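For readers who want to experiment, here is a minimal sketch of the recursive behavior described above, transcribed to Python 3, where print is a function. The name `expand_print` and its `stream` and `end` parameters are hypothetical and not part of the proposal. The separator handling is only approximate: this version leaves a space before each newline, which the Python 2 print statement would not.

```python
import io
import sys
import types

def expand_print(obj, stream=None, end="\n"):
    """Hypothetical helper: write obj to stream, expanding generators."""
    if stream is None:
        stream = sys.stdout
    if isinstance(obj, types.GeneratorType):
        for item in obj:
            # Recursive call: nested generators are expanded in place,
            # and each item is followed by a space instead of a newline.
            expand_print(item, stream, end=" ")
        stream.write("\n")            # an expanded sequence ends with a newline
    else:
        stream.write(str(obj) + end)  # ordinary objects print as before

# Usage: capture the output of a small "tree" of generators.
buf = io.StringIO()
nested = (x for x in [1, (c for c in "ab"), 2])
expand_print(nested, buf)
captured = buf.getvalue()
```

Note that the inner sequence ends with its own newline here, which is one of the points the proposal leaves open for discussion.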

    __str__() for complex user-defined objects then could return iterators, and
    arbitrarily complex structures could be printed out without glomming
    everything into a huge string -- only to throw it away in the end.

    I expect we likely also would want to modify str() itself to embody this
    serialization behavior. This additional change would support those cases
    where one actually does want the single large string in the end, say, to
    store into a UI widget. Still, the string would be constructed once at the
    end, much more efficiently than by building a bunch of smaller, intermediate
    strings.

    Then, in an abstract sense, we would not be changing print at all -- the new
    semantics would be embodied in the change to str(). However, in practice,
    we'd also want to modify print, as an important optimization for a more
    common use case.
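The idea of building the final string once can be sketched as follows, again in Python 3. The names `flatten` and `expand_str` are hypothetical, and joining the leaves with an empty separator is an assumption made for this illustration; the proposal itself does not fix the separator.

```python
import types

def flatten(obj):
    """Yield the leaves of a tree of nested generators, depth first."""
    if isinstance(obj, types.GeneratorType):
        for item in obj:
            yield from flatten(item)
    else:
        yield obj

def expand_str(obj):
    """Hypothetical str() variant: a single join over all the leaves."""
    return "".join(str(leaf) for leaf in flatten(obj))

def table():
    # A generator whose second item is itself a generator.
    yield "<table>"
    yield (row for row in ["<tr/>", "<tr/>"])
    yield "</table>"

result = expand_str(table())
```

Only the leaf strings and the final result are created; no intermediate string is built per nesting level.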

    The present behavior (displaying, e.g., "<generator object at 0x016BA288>")
    would still be available via

    print repr( generator )

    Note that this behavior presently results from all three of:

    print generator
    print str( generator )
    print repr( generator )

    So, this proposal merely ascribes useful new semantics to the first two of
    three redundant language constructs.

    MOTIVATION

    With increasingly complex objects, the print representation naturally becomes
    more complex. In particular, when an object consists of a collection of
    sub-objects, it's natural for its string representation to be defined
    recursively in terms of the sub-components' string representations, with some
    further indication of how they're held together.

    This is possible to do with the __str__ overload and the existing print
    semantics. However, existing semantics require constructing many otherwise
    unnecessary intermediate strings, and, as such, are grossly inefficient.
    Worse, each intermediate string is generally the catenation of several
    previous intermediaries, so the volume of intermediate results steadily
    increases throughout the conversion. Finally, the cost of string operations
    is proportional to the length of the strings in question, so I expect the
    overall cost increases significantly faster than in direct proportion to the
    size of the output (i.e. it's non-linear).

    E.g., instances of the following classes can become arbitrarily expensive to
    print out:

    class HtmlTable( object ):
        # ...
        def __str__( self ):
            return ( "<table"
                     + str( self.attr )
                     + ">\n"
                     + "".join([ str( row ) for row in self.head ])
                     + "".join([ str( row ) for row in self.rows ])
                     + "</table>\n" )

    class HtmlRow( object ):
        # ...
        def __str__( self ):
            return ( "<tr"
                     + str( self.attr )
                     + ">\n"
                     + "".join([ str( cell ) for cell in self.cells ])
                     + "</tr>\n" )

    class HtmlCell( object ):
        # ...
        def __str__( self ):
            return ( "<td"
                     + str( self.attr )
                     + ">\n"
                     + "".join([ str( datum ) for datum in self.data ])
                     + "</td>\n" )

    Clearly, printing an arbitrary HtmlTable might require a LOT of unnecessary
    string manipulation.

    Using the proposed extension, the above example could be implemented instead
    as something like:

    class HtmlTable( object ):
        # ...
        def __str__( self ):
            yield "<table"
            yield str( self.attr )
            yield ">\n"
            for row in self.head:
                yield str( row )
            for row in self.rows:
                yield str( row )
            yield "</table>\n"

    class HtmlRow( object ):
        # ...
        def __str__( self ):
            yield "<tr"
            yield str( self.attr )
            yield ">\n"
            for cell in self.cells:
                yield str( cell )
            yield "</tr>\n"

    class HtmlCell( object ):
        # ...
        def __str__( self ):
            yield "<td"
            yield str( self.attr )
            yield ">\n"
            for datum in self.data:
                yield str( datum )
            yield "</td>\n"

    With the new extension, the individual bits of data are simply output in the
    proper order, virtually eliminating unnecessary string operations, resulting
    in a huge performance improvement. In fact, in the common case where all of
    the leaf nodes are literal strings, then the entire HTML table (or page!)
    could be written out without any string manipulation -- the existing strings
    are simply written out from their present locations in memory!
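The streaming effect can be tried today without any language change. The sketch below is Python 3 and rests on several assumptions: the method name `parts`, the constructors, and the helper `write_parts` are all hypothetical, and `parts` stands in for the generator-returning `__str__` of the proposal, since current Python requires `__str__` to return a string.

```python
import io

class HtmlCell:
    def __init__(self, data, attr=""):
        self.data, self.attr = data, attr

    def parts(self):
        # Yield the pieces in output order; nothing is concatenated.
        yield "<td"
        yield self.attr
        yield ">\n"
        for datum in self.data:
            yield str(datum)
        yield "</td>\n"

class HtmlRow:
    def __init__(self, cells, attr=""):
        self.cells, self.attr = cells, attr

    def parts(self):
        yield "<tr"
        yield self.attr
        yield ">\n"
        for cell in self.cells:
            yield from cell.parts()   # delegate to the sub-object
        yield "</tr>\n"

def write_parts(node, stream):
    """Write each piece straight to the stream, one at a time."""
    stream.writelines(node.parts())

# Usage: render one row with two cells into an in-memory stream.
row = HtmlRow([HtmlCell(["a"]), HtmlCell(["b"])])
out = io.StringIO()
write_parts(row, out)
rendered = out.getvalue()
```

Each leaf string is handed to the stream as it is produced, so no per-row or per-table intermediate string is assembled.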

    Furthermore, there's greater clarity and economy of expression in the
    proposed new method.

    The primary motivation behind this proposal is to eliminate unnecessary
    overhead, while retaining all the convenience of the existing semantics of
    string representations of custom objects.

    While it's not 100% backwards compatible, it assigns a new meaning to one of
    several redundant, little-used existing language constructs.


    ALTERNATIVES

    In lieu of the proposed change, users can define their own auxiliary function
    to generate the output. E.g.:

    import sys

    class HtmlTable( object ):
        # ...
        def pr( self, stream=sys.stdout ):
            print >>stream, "<table"
            print >>stream, str( self.attr )
            print >>stream, ">\n"
            for row in self.head:
                print >>stream, row
            for row in self.rows:
                print >>stream, row
            print >>stream, "</table>"

    I myself have successfully used this technique in a variety of applications.

    Pro:
    Requires no changes to Python

    Con:
    The solution has to be "hand crafted" in each case,
    subject to user errors.

    The solution only works if the user expressly maintains the
    convention throughout his class hierarchy.

    The solution is not interchangeable with objects
    from other authors.

    ///
    James J. Besemer, Jun 4, 2006
    #1

  2. Roy Smith

    "James J. Besemer" wrote:

    > I propose that we extend the semantics of "print" such that if the object to
    > be printed is a generator then print would iterate over the resulting
    > sequence of sub-objects and recursively print each of the items in order.


    I believe the functionality you desire already exists, or something very
    close to it, in the pprint (pretty printer) module.
    Roy Smith, Jun 4, 2006
    #2

  3. James J. Besemer wrote:
    > I propose that we extend the semantics of "print" such that if the
    > object to be printed is a generator then print would iterate over the
    > resulting sequence of sub-objects and recursively print each of the
    > items in order.


    I don't feel like searching for the specific python-dev threads right
    now, but something like this has been suggested before (I think with a
    "%i" formatting code), and Guido felt strongly that the addition or
    removal of a simple print statement shouldn't change the behavior of the
    surrounding code.

    Consider code like::

    items = get_generator_or_None()
    for item in items:
        do_something(item)

    Now let's say I insert a debugging line like::

    items = get_generator_or_None()
    print "make sure this isn't None:", items
    for item in items:
        do_something(item)

    My debugging line now just broke the rest of my code. That's not good.
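The failure mode is easy to reproduce in current Python, because a generator can be iterated only once. In the sketch below, `get_items` is a hypothetical stand-in for `get_generator_or_None()`, and `list(items)` plays the role of a print that expands its argument.

```python
def get_items():
    # Hypothetical stand-in for get_generator_or_None() in the post above.
    yield from ("a", "b", "c")

items = get_items()
debug_view = list(items)   # what an expanding print would do: iterate fully
remaining = list(items)    # the loop that follows now sees nothing
```

After the first pass `remaining` is empty, so the `for` loop in the example would silently do no work.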


    The other reason I don't think this PEP should go forward (at least as
    it is) is that Python 3000 is already going to turn the print statement
    into a function (though the exact details of that function have not been
    hashed out yet). So adding extra cruft to the print statement is kind
    of wasted effort.

    STeVe
    Steven Bethard, Jun 4, 2006
    #3
  4. James J. Besemer wrote:
    >

    (snip)
    >
    > PEP -- EXTEND PRINT TO EXPAND GENERATORS
    >
    > NUTSHELL
    >
    > I propose that we extend the semantics of "print" such that if the
    > object to be printed is a generator then print would iterate over the
    > resulting sequence of sub-objects and recursively print each of the
    > items in order.
    >


    Please, don't:

    from itertools import cycle
    def mygen():
        return cycle('this is a very bad idea'.split())
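To see why this is a problem: the iterator returned by `mygen()` never ends, so any consumer that expands it fully would loop forever. A sketch in Python 3, using `itertools.islice` to impose an explicit bound (the name `first_eight` is just for illustration):

```python
from itertools import cycle, islice

def mygen():
    return cycle('this is a very bad idea'.split())

# Take a bounded prefix; iterating mygen() without a bound never finishes.
first_eight = list(islice(mygen(), 8))
```

The six words repeat indefinitely, so the eight-item prefix wraps around to the start of the sentence.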
    Bruno Desthuilliers, Jun 5, 2006
    #4
