Suggesting a new feature - "Inverse Generators"

Discussion in 'Python' started by Jordan Rastrick, Mar 25, 2005.

  1. First, a disclaimer. I am a second year Maths and Computer Science
    undergraduate, and this is my first time ever on Usenet (I guess I'm
    part of the http generation). On top of that, I have been using Python
    for a grand total of about a fortnight now. Hence, I apologise if what
    follows is a stupid suggestion, or if its already been made somewhere
    else, or if this is not the appropriate place to make it, etc. But I
    did honestly do some background checking through the Python
    documentation, www.python.org, FAQs, and so forth before making this
    post. Basically what I'm trying to say here is, please don't flame this
    post :)

    Anyway, I have a suggestion to make for a rather radical new feature in
    Python. Rather than just struggling to explain and justify it straight
    out, I'll give the background of how I thought up the feature - and how
    I worked through the ideas I came up with. The example may seem a bit
    trivial to start, but bear with me, because I think theres at least
    some possibility the suggestion has merit.

    I am writing a project in Python atm, and one thing it needs to do is
    parse a very simple text file to get a list of records. Its a
    straightforward kind of thing that I (and I'm sure you) have done many
    times before in many languages.

    Each bit of interesting data in the file occurs over 3 or 4 lines of
    text. On top of this there are lines of random uninteresting junk as
    well.

    I decided to write a generator to filter out the junk lines, strip
    excess whitespace and the like (generators are one of my favourite
    features in Python). So the main body of code looks like this:

    def filterJunk(lines):
    for line in lines:
    # Do filtering & cleaning up stuff on line.
    # ...
    yield line

    def getData(filename):
    recordList = []
    input = file(filename)
    lines = line.readlines()
    input.close()
    for line in filterJunk(lines):
    # ... create records and add to recordList

    So far, so good. Next comes the business of actually creating the
    records from the lines of interesting text.

    You can probably see the problem I ran into - I need to look at 3 or 4
    lines at a time to create the record. Theres no easy way out of this
    requirement, because whether the record is contained in 3 or 4 lines is
    conditional on the value of the second line. But the for loop isn't
    going to allow me to do that.
    So I rewrote the for loop above as:

    while True:
    try:
    # ....do stuff repeatedly with lines.next() to add to recordList[]
    except StopIteration:
    pass

    This works. (And yes, there are also a multitude of other reasonably
    simple ways to approach this problem, some of which I've used in this
    past. And probably some fancy ways involving map and reduce as well.)

    But it seems ugly to me somehow. And I'm suspicious of anything that
    hints of ugliness (in Computer Science, anyway.) I suppose that
    Python's getting me into the habit of assuming theres always an elegant
    looking solution.

    I tried, and failed, to think of some sort of Generator-based concept
    (I really do love generators! So simple, but so useful!) that would
    allow me to retain a for loop. I guess that, essentially, I feel this
    task is really just iterating over the lines, and that iterating over
    something in Python is a job for a for loop. Generators allow you to
    use for loops all regular iteration. So I wanted to have something
    "vaugely Generator like", that would allow me to get back to using a
    for loop.

    It occured to me - what if I had some sort of "inverse generator" - a
    function with a resumable call mechanism that retained state, but that
    would resume when accepting input rather than returning output. For
    want of a better term, call it an Acceptor.

    If you think of a Generator as a method acting as (an abstraction of) a
    source of sequential data, then the Acceptor correspondingly is a sink.


    I started playing round with the Acceptor idea, thinking about how it
    might work. I imagined a simple mirroring of Generator syntax - using a
    new keyword, accept, to take in a value, just as Generator's use yield
    to return a value. Here's my first attempt at 'coding' my precious for
    loop using an imaginary Acceptor:

    def combineIntoRecord(): # This is an acceptor function
    optionalline = None # We may not get given a value for this line
    accept firstline
    accept secondline
    if condition(secondline):
    accept optionalline
    accept lastline
    r = createRecord(firstline, secondline, lastline, optionalline)
    return r

    recordlist = []
    for line in lines:
    recordlist.append(combineIntoRecord(line))

    I've since thought about it somewhat, and its clear I got this first
    try just plain wrong. Its got a number of hideous defects. First,
    combineIntoRecord() is 'accepting' its values by taking a magical
    argument, line, that does not appear in the argument list.

    Second, in this particular syntax, do
    combineIntoRecord(x) at line 50
    and
    combineIntoRecord(y) at line 200
    want x and y to be combined into the same record, using the old state,
    or is the second trying to start a "new call" to the method? For an
    extreme example, imagine calls made in different modules. Basically,
    the jump into combineIntoRecord code needs to somehow specify an
    'instance' of the function, with a given existing state. The method
    call needs to be bound to something, somehow.

    Worst of all, though, since combineIntoRecord(line) doesn't always
    return a value, how does the interpreter know when to call append? It
    seems the 'magic' of the acceptor is spilling out into the surrounding
    code, making it impossible to understand or intelligently parse. The
    magicness needs to be contained within the acceptor itself.

    How to resolve these issues?
    Well, thinking about how generators achieve their magic-like effects:
    - A generator, despite being a function, doesn't return a value in
    the usual sense.
    - Calls to a generator don't result in a value, they result in an
    Iterator object.
    - The magic happens at the 'yield' keyword.
    So, analguously:
    - Perhaps call to combineIntoRecord() shouldn't return a value
    - It should return an acceptor object instance instead
    - Every call results in a different object - solves problem 2 above
    - Acceptor objects have a special method for feeding them values,
    takes care of problem 1 above
    BUT:
    - When and what does an acceptor object 'return'?
    - We don't actually want to restrict what an acceptor *function*
    returns.
    - Acceptors (unlike generators) should be able to return just as per
    ordinary functions
    - Its not the output of the function thats 'magic', its the input to
    the function
    Thus:
    - We must confine the magic to the input the function recieves - its
    argument list.
    - Have a 'magic argument' that 'takes in' the accepted values.
    - This is fine - it keeps the magic within the call of the acceptor
    function
    - If an acceptor returns like an ordinary function, it needs to
    return when called
    - Hence it needs to know its accepted values "all at once"

    >From our earlier thinking, an acceptor, fundamentally, accepts a

    sequence of data.

    So let the first argument to an acceptor function call be a "magic
    sequence" that contains all the values we want to be accepted. Other
    arguments may be passed as normal.

    A disadvantage: the power of the acceptor is restricted in this
    version, since you can't have elements Accepted at any arbitrary point
    in the code. This is really a good thing, though - with the old version
    it was the case that too much power was making the idea totally
    unworkable. What we've done is essentially bound our Acceptor to
    something, namely a sequence. Note, though, that we can use any
    abstract source of sequential data - including a Generator - so theres
    still quite a lot of flexibility in how it gets data for accepting.

    To illustrate this model of an Acceptor, heres an example of a simple
    Acceptor that returns the first element of a sequence that satisfies
    some condtion (which is presumed to be a function):

    def firstThatSatisfies(condition):
    accept element
    if condition(element): return element

    To use, we simply call:

    first = firstThatSatisfies(somelist, condition)

    somelist, not appearing in the argument list, gets fed into the accept
    statement by the 'acceptor magic' .

    Note how explicit iteration has been removed from this canonical
    algorithm altogether. I dont know about you, but I think it looks kind
    of neat.

    A Slight Digression:

    Clearly it might be true that condtion(element) == 0 for every element
    in some list. This brings us to the issue of an accept statment that
    doesnt get fed any data.

    There are two possible ways to deal with this. One is to insist every
    accept statement must get called, and raise an exception if this doesnt
    happen. However, I prefer the seemingly more flexible option where you
    simply say "Bad luck, once the magic sequece is empty, any remaining
    acceptor code doesnt get executed, and any acceptor state is lost."
    This means an acceptor might not always give a well defined return
    value, but this is no worse than something like

    def f()
    x = someCalculation()
    bool = someConditionThatWillAlwaysBeFalse()
    if false: return x

    This conventional example has a return value of None, and an Acceptor
    that doesnt reach an explict return statement can be treated exactly
    the same way.

    Digression

    Now, going back to my original problem and the first Acceptor I wrote:

    def combineIntoRecord(): # This is an acceptor function
    optionalline = None # We may not get given a value for this line
    accept firstline
    accept secondline
    if condition(secondline):
    accept optionalline
    accept lastline
    r = createRecord(firstline, secondline, lastline, optionalline)
    return r

    This doesnt work anymore - it produces and returns the first record we
    want, and then stops. We want a list of all records .

    Maybe we could use a loop.The 'accepting magic' would need to be
    confied to this loop. And the loop needs to know when the magic stream
    of data comes to an end to know when to return the final value. Maybe
    use another special keyword:

    def combineIntoRecord():
    recordList = [].
    while acceptingonly
    # ....accept values, append createRecord to recordLlist


    Perhaps. I don't really like the idea of having any magic new keywords
    other than accept itself; if it needs this sort of explicit while loop,
    maybe it shouldnt be in an Acceptor at all (the whole original point of
    this idea was to get rid of while loop after all!) **

    So are Acceptors pretty useless, after all that? Maybe. Then again,
    maybe not.

    What is this combineIntoRecord(), that I have written using
    non-existent language features, trying to do, in a general sense?

    It wants to take some sequence (the lines in a file) and produce
    another sequence, that in some way depends first. Howevers it wants to
    be able define basically any arbitary dependence between the sequences,
    taking the elements from the first and producing output to the second
    at its leisure.

    The 'taking at its leisure' comes by being virtue of an Acceptor.
    'Producing at its leisure' is a feature already available in Python.
    Its called a Generator.

    Behold:

    # An Acceptor/Generator!!!
    def combineIntoRecords():
    optionalline = None # We may not get given a value for this line
    accept firstline
    accept secondline
    if condition(secondline):
    accept optionalline
    accept lastline
    yield createRecord(firstline, secondline, optionalline,
    lastline)

    def getData(lines):
    return list(combineIntoRecords(filterJunk(lines)))

    So, alas, my beloved for loop was not saved after all.

    Fortunately, the dreaded While loop has been vanquished. In fact all
    loopiness has disappeared. Along with any other semblance of a main
    method.

    I think I like the finished result. I'll leave if for you to decide if
    you like it too.
    Jordan Rastrick, Mar 25, 2005
    #1
    1. Advertising

  2. Sorry about the mangled formatting... like i said, first time on Usenet

    Suggestions, comments, replies, etc most welcome.

    This definitely includes replies of the form: "This is stupid,
    because..."
    provided it isnt followed with "youre a jerk who knows nothing.
    Period."

    Heres a follow up rant in the form of a Q&A, trying to answer what I
    think are likely or interesting questions:

    ----------------------------------------------------

    Q: Is this idea even remotely feasible to actually implement?

    A: No Idea

    Q: Are you volunteering to implement it?

    A: Not really. I'd have no clue where to start. I probably don't even
    have enough C experience to begin to understand the Python source. But
    if some real Python dev (or devs)actually wants to implement this, and
    is willing to humour an undergraduate by somehow utilising assistance
    in form a very limited set of programming skills, well, then I'd be
    highly flattered and more than happy to help :)

    Q: The whole document is full of half-formed ideas, hundredth-formed
    ideas,
    mistakes(conceptual/typograhpcial/stylistic/grammatical/terminological/programming/...),
    inconsistencies, random nonsense......

    A: Firstly, thats not a question, its a statment.

    Secondly, see the very first paragraph.

    Thirdly, the whole document, from initial attempts to code the original
    for loop (yes, it is a real program I'm trying to write here), to
    thinking of Acceptors, refining the model to something vaguely
    coherent, coming up with examples, and typing it all out, was produced
    on the fly, in the space of 5 hours or so. I attempted to give it a
    semblance of coherent structure at first, but soon gave up. Its
    basically stream of conciousness production. The bit where the first,
    nonsensical example is given, is literally where I first tried to
    forumlate the idea in code, saw the massive holes, stopped, thought
    about it for a few minutes, and came backed and typed out the much
    improved second version. The whole thing is totally unedited.Oh, and I
    had virtually no sleep last night.

    A real answer, with respect to the issue of terminology: If I use
    technical terms loosely, e.g., if I improperly use 'method' for
    'function' or vice versa (a hang up from dealing with languages that
    only have one e.g. Java, or the other, e.g. C) , please forgive me. I
    hope what I'm trying to say is still clear enough.

    And as for coding style, Python itself is inconsistent (or to be more
    diplomatic, not obsessively concerned with the relatively minor issue
    of) using underscore_seperated_words or mixedCaseInstead, etc. And
    again, I cry the nefarious influence of Java....

    Q: Did you steal this from somewhere else?

    A: No, its entirely something I thought of, by myself, in the last 5
    hours. Thats not to say something like it (or more likely, better than
    it) hasnt been thought of somewhere else by someone else. But if it
    has, I havent seen it.

    To give credit where due though: I was motivated by the various
    examples, from sources like www.python.org, of how Generators can be
    used to do things really elegantly. Also, I'm still tossing around
    ideas in the back of my mind from a great talk given at my university
    by Rob Pike, of Google, about a language he helped devise and implement
    called sawzall (or something close). Its used to do processing with
    Google's massive data sets and highly parallel computing architecture,
    yet its incredibly simple to understand and very efficient. I guess it
    was only tenuously related to the idea presented here. But it was
    certainly very inspiring in a general computer-science-enthusiasm kind
    of way.

    Q: Why did you go to all this trouble thinking up this feature when you
    could have solved the original problem in so many other, simpler ways?

    A: I honestly can't give a decent justification :) Whenever I've
    needed to do things like:

    line = readline from file
    if (line is some sort of 'extra' data)
    do something with line
    line = readline from file # get the 'regular' data
    ....

    in the past, in Pascal, Java and the like, it always irked me. Don't
    ask why, it just did.

    I thought I could get around it with generator in Python, but I
    couldnt. In trying to think of a generator solution, I more or less
    randomly thought of extending the generator concept somehow, and I
    decided to see where it took me. I thought it took me to somewhere
    interesting, and possibly worthy of further investigation.

    Q: What about using some sort of 'sequential data sink' that doesn't
    use 'pretending to be a function' syntax, e.g. a simple Class of some
    sort?

    A: Maybe. You can have Iterators without Generators. Maybe you can have
    "inverse Iterators" without "inverse Generators", I havent thought
    about it.
    I love the 'pretending to be a function' syntax, its part of what makes
    Generators so elegant. So I tried to emulate it.

    Q: What could be some of the benefits of introducing this feature? What
    could it be used for?

    A: Again, I don't really know. Past the examples I've given, I havent
    really thought about possible applications.

    Potentially there might be efficiency gains using an Acceptor in place
    of a regular function that just processes a sequence, in the same way
    that using a Generator can be more efficient than using an actual
    sequence like a list.

    Then again, there might not.

    Q: ....So you've put forward this idea without really stopping to think
    if its actually useful?

    A: The idea seems *potentially* powerful and useful to me. Once more,
    for emphasis - I'm most definitely NOT trying to assert this is a great
    idea. I'm saying that maybe its a good idea, so lets put it on the
    table. That way, if it is really bad, lots of really smart people can
    shoot it down really quickly and save me wasting any more time thinking
    about it.

    Q: Maybe you should have put a bit more personal though/research into
    the idea before putting it out into the public domain? Or at least
    waited until you had more experience in Python?

    A: I'm not an academic or a professional, I don't have any reputation
    worth preserving. I tried not to suggest this in a forum where it'd
    distract the people who are responsible for developing Python from
    doing their very important work. Anyone could choose to ignore the post
    if they so desired - I havent wasted anyone's time, unless they've
    conciously chosen to waste it. And you read far enough to get to this
    question, so I assume you were at least vaguely interested.

    Q: Why not make a smaller, more tenuous suggestion, get some feedback,
    and go from there, rather than making one big ranting post?

    A: I suppose I just wanted to get all the ideas out while they were
    stll in my head. Its the kind of chaotically creative approach I often
    apply (though not intentionally) to programming, with results that are
    occasionally great and often disastrous.

    Q: In Python, there were list comprehensions, and then generator
    comprehensions. So what about Acceptor comprehensions?

    A: Yeah, sure, maybe. It sounds like a cool idea at least. Havent
    thought about it.

    Q: Why did you choose this particular syntax and naming?

    A: I just used what seemed the most obvious way to do things. I'm sure
    better versions could be thought of.

    Some advantages: its simple. The use of accept parallels the use of
    yield in generators.
    Its, arguably, pretty intutiitive.

    Disadvantages: It could be a bit too verbose, having to type accept for
    each magic argument. Perhaps an abbreviated sytax might be good

    e.g. accept arg1, arg2
    in place of
    accept arg1
    accept arg2

    so long as it didn't conflict/confuse with existing syntax, for tuple
    assigment, etc.

    Another possible variant: the 'magic sequence' could be explicitly
    named in an Acceptor's argument list. I kind of like the way the
    Acceptor is 'bound' to its magic sequence by the call, with the
    sequence itself disappering from the argument list... its sort of
    vaguely similar to how x magically appears in the argument list when
    making the method call x.someMethod(otherArgs)

    Q: Why do you keep using the word magic everywhere?

    A: "Any sufficiently advanced techonology is indistinguishable from
    magic."
    Seems like a good enough term for a language feature that I can
    conceive, and can think in a limited fashion about some of its effects,
    but have no real understanding of its possible mechanisms. Or just for
    stuff that seems cool, powerful and slightly mysterious. :) Besides,
    once I'd started using it, I think it helped to be consistent.

    Q: Can't you already achieve this precise effect very easily with
    existing syntax (filter, list comprehensions, simple recursion,
    something)?

    A: Maybe, I don't know. If you can think of how, I'd like to hear it.

    Q: What about backwards compatibility issues?

    A: The only obvious one I can think of is the use of accept as a
    keyword, which would conflict with any variables called accept. Im sure
    one exists in some code somewhere. Hopefully, not too much code would
    be broken. Possibly a different syntax wouldnt even have to use a new
    keyword, or could use one that wouldnt cause much disruption to code.

    I imagine the issue would not be significantly worse than the
    introduction of yield for generators was. Or any other new language
    feature for that matter.

    Note, this is one of the reasons I'd be reluctant to see other special
    keywords/functions aside from accept used in the mechanism.

    Q: Why don't you want Acceptors to be able to know when the magic
    sequence ends?

    A: I don't know, I just don't like it. My intuition is you'd end up
    writing loops in acceptor functions that could have just been written
    in more or less exactly the same way without an acceptor, instead of
    writing Acceptor focused code. There's already an implict loop
    iterating through the sequence. Adding an additional loop with
    something like
    while accepting:
    accept ....
    is complicating things. Note neither of the two finished examples had
    any loops in them at all.

    As usual, I could be completely wrong.

    A possible compromise: Have an optional finally block at the end of the
    method, that executes when the magic sequence runs out of elements.
    This could serve much the same function as while accepting: without
    reintroducing banished loopiness. And maybe a similar thing, but prior
    to accepting. A 'first' block. So:

    def combineIntoRecord(): # This is an acceptor function
    first:
    recordList = []
    optionalline = None # We may not get given a value for this line
    accept firstline
    accept secondline
    if condition(secondline):
    accept optionalline
    accept lastline
    recordList.append(createRecord(firstline, secondline, lastline,
    optionalline))
    finally:
    return recordList

    Of course, thats another new keyword, plus an existing one
    overloaded.... again, I'm not a big fan.....

    Basically, I can summarise my views on what should and shouldn't be
    included in the mechanism thusly:
    Iwant to be able to 'accept' in an arbitrarily (or at least very)
    complicated way within an Acceptor. I reckon that is what would give
    them their power, and could lead to cool applications.

    But conversely I don't want Acceptors to 'need to do everything',
    burdening them with extra syntax to allow them to behave like regular
    functions. They're not intended as a super technique that solves every
    problem, they're intended to find and fill a particular programming
    niche. Better that niche be small, but Acceptors be a perfect fit for
    it - obviously the right answer for some problems, and equally,
    obviously the wrong answer for others.

    For comparison, think about lambda functions in Python allowed to
    contain arbitrary sequences of statements, not just single expressions.
    Anything that can't fit easily in a lambda, doesnt belong there. The
    same goes for Acceptors.

    Most imporant of all, the motivation of the whole exercise is to make
    things simpler and more elegant, not more complex and ugly.

    Q: Isn't it just perverse to want to code algorithms that are
    inherently suited to looping in syntax that does not explicitly loop?

    A: Perhaps. The lack of looping wasn't actually how I originally
    envisioned the final result.

    But you need to think long and hard about what is meant by an
    "algorithm inherently suited to looping." Try using a purely functional
    language (e.g. Haskell) and you might see what I mean.

    As far as my limited understanding of these things go, from a
    theoretical viewpoint, looping is actually only a special case of
    recursion. Its just that in structured, low-level, imperative
    languages, looping is the standard technique for solving a certain
    class of problems. So if like most people you learn to program in such
    a language, you'll naturally tend to think 'loops.' Hence the issue is
    really something inherent to your way of thinking about programs, and
    not to the actual problem.

    Q: You don't need to use yield, or a loop, or first/finally blocks, to
    make the final example work. Just pass in an empty list, and append it
    at the end of the method; return nothing.

    A: Yeah, I just now realised that. In fact possibly all the benefits of
    both the while Accepting and the first/finally ideas could be realised
    by preparing and passing in any nessecary intial state, and using
    callback on the passed objects instead of return. That way of
    implementing things seems cleaner and more in line with the 'Acceptor
    feel', in fact. This is definitely a line of thought that needs to be
    explored further....

    For what its worth, conceptually, that solution pretty much works the
    same as using yield. But I like the idea of an Acceptor/Generator, so
    I'll let it stand in its own right. Needs a better name name though :)

    Q: What's the project you were working on when you thought of this?

    A: Nothing particularily interesting or important ;-)

    Q: Have you got anything more to add?

    A: Yeah, I'm tired, and going to bed now.

    I'll be interested to see if this actually generates a substantial
    discussion of any sort.

    And I do apologise for the length of this rant. I hope that by reading
    it, you've at the very least gained some sort of amusement and sense of
    overwhelming superiority in light of my outrageously stupid
    thoughts.....
    Jordan Rastrick, Mar 25, 2005
    #2
    1. Advertising

  3. "Jordan Rastrick" <> wrote in message
    news:...

    > def combineIntoRecord(): # This is an acceptor function
    > optionalline = None # We may not get given a value for this line
    > accept firstline
    > accept secondline
    > if condition(secondline):
    > accept optionalline
    > accept lastline
    > r = createRecord(firstline, secondline, lastline, optionalline)
    > return r


    > recordlist = []
    > for line in lines:
    > recordlist.append(combineIntoRecord(line))


    How about doing it this way?

    class Acceptor:
    def __init__(self, gen):
    self.gen = gen
    def next(self):
    firstline = self.gen.next()
    secondline = self.gen.next()
    if condition(secondline):
    optionalline = self.gen.next()
    accept lastline
    r = createRecord(firstline, secondline, lastline, optinalline)
    return r

    This is just a generator done longhand. If anything in Acceptor.next raises
    StopIteration, so will Acceptor.next itself. Which means that you can now
    write this:

    for r in Acceptor(line):
    recordlist.append(r)

    or, for that matter,

    recordlist = list(Acceptor(line))

    Incidentally, I did not try to fix the bug in your code that if
    condition(secondline) is false, optionalline never gets set so the program
    will crash :)
    Andrew Koenig, Mar 25, 2005
    #3
  4. Jordan Rastrick

    Tim Hochberg Guest

    Jordan Rastrick wrote:

    [CHOP]

    > Behold:
    >
    > # An Acceptor/Generator!!!
    > def combineIntoRecords():
    > optionalline = None # We may not get given a value for this line
    > accept firstline
    > accept secondline
    > if condition(secondline):
    > accept optionalline
    > accept lastline
    > yield createRecord(firstline, secondline, optionalline,
    > lastline)


    That's a nice example, but why not:

    def combineIntoRecords(iterable):
    iterator = iter(iterable)
    optionalline = None
    firstline = iterator.next()
    secondline = iterator.next()
    if condition(secondline):
    optionalline = iterator.next()
    lastline = iterator.next()
    yield createRecord(firstline, secondline, optionalline, lastline)

    ???

    -tim

    >
    > def getData(lines):
    > return list(combineIntoRecords(filterJunk(lines)))
    >
    > So, alas, my beloved for loop was not saved after all.
    >
    > Fortunately, the dreaded While loop has been vanquished. In fact all
    > loopiness has disappeared. Along with any other semblance of a main
    > method.
    >
    > I think I like the finished result. I'll leave if for you to decide if
    > you like it too.
    >
    Tim Hochberg, Mar 25, 2005
    #4
  5. Thanks for the very fast feedback :)

    I specifically set optionalline = None to deal with that bug you
    mentioned, with the implicit assumption createRecord knows how to deal
    with a None argument. If that guard got destroyed in the copy paste
    process, my bad.

    As for you solution, yes, you could do it that way.

    But I'm not so much interested in alternate solutions to the problem
    itself, which is to be honest trivial. I'm intereseted in the
    implications of the imaginary solution of the Acceptor function.

    Just as you can have Iterators (an abstract class based solution to
    iteration) without Generators, yet Generators are still of interest, so
    you could solve this problem without Acceptors, yet Acceptors are
    (potentially) of interest.

    Basically, an Acceptor is linked to a Generator (or some other
    sequence) in the way you have linked the class implementation to a
    Generator. So this is a kind of implementation of an Acceptor using a
    class. I like the Acceptor syntax, though. And I'm wondering if maybe
    there are more complex examples that are harder or impossible to do
    your way...

    I think I already put something (briefer) in the Q&A bit along these
    lines....
    Jordan Rastrick, Mar 25, 2005
    #5
  6. Tim Hochberg wrote:
    > Jordan Rastrick wrote:
    >


    itertools.groupby enables you to do this, you just need to define a suitable
    grouping function, that stores its state:

    For example, if short lines should be appended to the previous line:

    from itertools import groupby
    linesource = """\
    Here is a long line, long line, long line
    and this is short
    and this is short
    Here is a long line, long line, long line
    and this is short""".splitlines()

    def record(item, seq = [0]):
    if len(item) > 20:
    seq[0] +=1
    return seq[0]


    >>> for groupnum, lines in groupby(linesource, record):

    ... print "".join(lines)
    ...
    Here is a long line, long line, long lineand this is shortand this is short
    Here is a long line, long line, long lineand this is short
    >>>


    Michael
    Michael Spencer, Mar 25, 2005
    #6
  7. Yes, granted.

    This is basically the same as Andrew's reply, except with Iterators in
    place of generators, so I'll let my answer to that stand. In fact, its
    my solution, but with iter.next() in place of accept :)

    This is probably something like how I wanted to solve the problem when
    I first was looking at it and wanting a generator based solution. I
    ended up thinking of this whole Acceptor business as part of getting me
    to this point.

    I'll try to reduce my pages of ranting to a single question. In
    posting, i was wondering if the "syntactic sugar" (Acceptors) that i
    invented to implement the solution is of any general interest. So are
    there maybe examples less straightforward than this one, where
    Acceptors work better? Or can you always just "turn the generator
    inside out" in the way you have done here?

    If you can always do it your way, well, thats a powerful design
    pattern, and it just goes to show my faith in Generators was justified
    :) And that I wasnt thinking hard /clearly enough about how to use
    them.

    There are other issues, like "Does the Acceptor syntax, although
    perhaps functionally equivalent to other methods, ever make the code
    more readable, easier to parse, etc?" But they're a lot less important
    i'd say.
    Jordan Rastrick, Mar 25, 2005
    #7
  8. Wow, if I'm going to get replies (with implemented solutions!) this
    quickly, I'll post here more often :)

    This is the most different to my solution, and also the shortest, and
    therefore the most interesting, reply so far. Its also the last one
    I'll reply to before I go to bed. zzzz....

    Its taken me a while to get a rough understanding of this code, but I
    think I have some idea. Correct me if I'm wrong.

    groupby groups based on value of line(record)

    Record returns 1 for the first line, 1 of the second, 1 for the 3rd,
    then 2 for the 4th because seq[0] gets incremented since len(line) > 20

    OK thats fair enough. But how does record retain state between calls?
    How is that related to the fact your storing your value as a singleton
    list, instead just an int?

    It seems a little confusing to be honest, probably mainly due to my
    unfamiliarity with groupby. Retaining state between method calls is
    part of what interests me so much about the Generator/ Acceptor case.
    Here youre retaining state between calls with none of the special
    syntax used for example in Generators.

    How? Is it a side effect of the way groupby uses record? If so, then
    thats a littleoblique and unreadable for my liking.

    Is the state the record returns passed back to it somehow? I take it
    gets passed a value for seq at some point, seeing as how you've
    bothered to define it as a default argument rather than just seq = [0]
    on the first line.

    That works, but at the cost of having to return and pass all of state
    every call. I imagine other solutions (Generator/Acceptor based etc)
    would be substanitally more efficient. And again more readable, to a
    Python beginner such as myself at least.

    Still, this is fascinating.... going to have to spend some time
    experimenting with groupby as soon as I get a chance....
    Jordan Rastrick, Mar 25, 2005
    #8
  9. >
    > I'll try to reduce my pages of ranting to a single question. In
    > posting, i was wondering if the "syntactic sugar" (Acceptors) that i
    > invented to implement the solution is of any general interest. So are
    > there maybe examples less straightforward than this one, where
    > Acceptors work better? Or can you always just "turn the generator
    > inside out" in the way you have done here?
    >
    > If you can always do it your way, well, thats a powerful design
    > pattern, and it just goes to show my faith in Generators was justified
    > :) And that I wasnt thinking hard /clearly enough about how to use
    > them.
    >
    > There are other issues, like "Does the Acceptor syntax, although
    > perhaps functionally equivalent to other methods, ever make the code
    > more readable, easier to parse, etc?" But they're a lot less important
    > i'd say.



    To me your acceptors look like micro-threads or similar concepts that have
    been popped up here every now and then - but so far it seems they didn't
    end up beeing included. I can't say for what reasons though.

    Just yesterday I looked into stackless python to grasp what it does, and
    while it is not syntactically different to standard python, it seems to
    make coding the way you intend to do possible.


    --
    Regards,

    Diez B. Roggisch
    Diez B. Roggisch, Mar 25, 2005
    #9
  10. Jordan Rastrick wrote:
    > Wow, if I'm going to get replies (with implemented solutions!) this
    > quickly, I'll post here more often :)


    That is indeed typical of this most attentive group :)
    >
    > Its taken me a while to get a rough understanding of this code, but I
    > think I have some idea.

    It is just an example jotted in 2 min - no doubt it could be made clearer.

    > Correct me if I'm wrong.
    > groupby groups based on value of line(record)
    >

    No, groupby, groups on the value of record(item), where item is given by
    iterating over linesource

    You should check the itertools documentation:
    http://docs.python.org/lib/itertools-functions.html

    > Record returns 1 for the first line, 1 of the second, 1 for the 3rd,
    > then 2 for the 4th because seq[0] gets incremented since len(line) > 20


    In this case, it doesn't matter what record returns, as long as it is equal for
    successive values of item that should be grouped
    >
    > OK thats fair enough. But how does record retain state between calls?
    > How is that related to the fact your storing your value as a singleton
    > list, instead just an int?


    You are asking about the fundamental behavior of Python: argument passing,
    mutable objects and scopes.
    >
    > It seems a little confusing to be honest, probably mainly due to my
    > unfamiliarity with groupby. Retaining state between method calls is
    > part of what interests me so much about the Generator/ Acceptor case.
    > Here youre retaining state between calls with none of the special
    > syntax used for example in Generators.
    >
    > How? Is it a side effect of the way groupby uses record? If so, then
    > thats a littleoblique and unreadable for my liking.


    No, it's nothing special about groupby. record simply stores its state in a
    mutable default parameter. This isn't general good practice: at least you have
    to be careful with it. You can see the behavior in the following example:
    >>> def accumulate(value, accum = []):

    ... accum.append(value)
    ... return accum
    ...
    >>> accumulate(1)

    [1]
    >>> accumulate(2)

    [1, 2]
    >>> accumulate(6)

    [1, 2, 6]
    >>>


    ....
    >
    > Still, this is fascinating.... going to have to spend some time
    > experimenting with groupby as soon as I get a chance....
    >

    Experimenting is good. So is the the documentation:
    http://docs.python.org/tut/tut.html

    Michael
    Michael Spencer, Mar 25, 2005
    #10
  11. Jordan Rastrick

    Terry Reedy Guest

    Terminology: To me and some (most?) others posting here and, I believe,
    both the docs and the common meaning of 'generator', a Python generator is
    the particular kind of iterator that produces multiple values on request
    and which is created by the generator function that you write.

    Acceptors (consumers): The concept is known to both CS and Python
    developers. According to Tim Peters, Python generators constitute
    'semi-coroutines'. Any full coroutine mechanism (Stackless, for instance)
    will allow the reverse (not inverse, which would undo rather than
    complement). For various reasons, the Python developers choose that data
    chains should be written as consumer code getting data from a generator,
    which might in turn get data from a generator.

    In your example, opening and reading the lines of the data file could be
    done by filterjunk, not the main function, but I can see reasons both ways.

    For your specific problem, you could, I believe, use an intermediate
    generator (which could also be combined with filterjunk) that combines
    lines into complete text records. Something like (ignoring any fussy
    details left out):

    def textrecord(file):
    trlines = []
    for line in filterjunk(file):
    trlines.append(line)
    if complete(trline): yield ''.join(trlines)
    # where 'complete' is code to determine if have all lines in record or
    not

    Terry J. Reedy
    Terry Reedy, Mar 25, 2005
    #11
  12. Jordan Rastrick

    Serge Orlov Guest

    Michael Spencer wrote:
    > > Still, this is fascinating.... going to have to spend some time
    > > experimenting with groupby as soon as I get a chance....
    > >

    > Experimenting is good. So is the the documentation:
    > http://docs.python.org/tut/tut.html


    Reading documentation is a good idea, but I think your example would
    be more clear for Jordan if you used function attributes:

    def record(item):
    if len(item) > 20:
    record.seq +=1
    return record.seq
    record.seq = 0

    Serge.
    Serge Orlov, Mar 25, 2005
    #12
  13. Michael Spencer wrote:
    > itertools.groupby enables you to do this, you just need to define a
    > suitable grouping function, that stores its state:


    Michael, this would make a great Python Cookbook Recipe.

    --Scott David Daniels
    Scott David Daniels, Mar 25, 2005
    #13
  14. On Fri, 25 Mar 2005 08:46:12 -0800, Michael Spencer <> wrote:

    >Tim Hochberg wrote:
    >> Jordan Rastrick wrote:
    >>

    >
    >itertools.groupby enables you to do this, you just need to define a suitable
    >grouping function, that stores its state:
    >
    >For example, if short lines should be appended to the previous line:
    >
    >from itertools import groupby
    >linesource = """\
    >Here is a long line, long line, long line
    >and this is short
    >and this is short
    >Here is a long line, long line, long line
    >and this is short""".splitlines()
    >
    >def record(item, seq = [0]):
    > if len(item) > 20:
    > seq[0] +=1
    > return seq[0]
    >
    >
    > >>> for groupnum, lines in groupby(linesource, record):

    > ... print "".join(lines)
    > ...
    > Here is a long line, long line, long lineand this is shortand this is short
    > Here is a long line, long line, long lineand this is short
    > >>>

    Nice, but I think "record" is a bit opaque semantically.
    How about group_id or generate_incrementing_unique_id_for_each_group_to_group_by or such?

    Regards,
    Bengt Richter
    Bengt Richter, Mar 25, 2005
    #14
  15. Scott David Daniels wrote:
    > Michael Spencer wrote:
    >
    >> itertools.groupby enables you to do this, you just need to define a
    >> suitable grouping function, that stores its state:

    >
    >
    > Michael, this would make a great Python Cookbook Recipe.
    >

    OK, will do. What would you call it? Something like: "Stateful grouping of
    iterable items"

    [Bengt]:
    > Nice, but I think "record" is a bit opaque semantically.
    > How about group_id or generate_incrementing_unique_id_for_each_group_to_group_by or such?
    >
    > Regards,
    > Bengt Richter


    Agreed, it's an issue. I think the most natural name is groupby - but that
    would cause more trouble. What do you think about 'grouping' ?
    I would use 'generate_incrementing_unique_id_for_each_group_to_group_by', but
    then people might think I'm trying to outdo Bob Ippolito :)

    [Serge]:
    > I think your example would
    > be more clear for Jordan if you used function attributes:
    >
    > def record(item):
    > if len(item) > 20:
    > record.seq +=1
    > return record.seq
    > record.seq = 0


    That does read better than the mutable default argument hack. Is this use of
    function attributes generally encouraged? (I tend to think of func_dict for
    meta-data, used only outside the function) Thoughts?

    Michael
    Michael Spencer, Mar 25, 2005
    #15
  16. Michael Spencer wrote:
    > Scott David Daniels wrote:
    >> Michael Spencer wrote:

    > OK, will do. What would you call it? Something like: "Stateful
    > grouping of iterable items"


    How about "Using groupby to fill lines"?

    --Scott David Daniels
    Scott David Daniels, Mar 25, 2005
    #16
  17. On Fri, 25 Mar 2005 12:07:23 -0800, Michael Spencer <> wrote:

    >Scott David Daniels wrote:
    >> Michael Spencer wrote:
    >>
    >>> itertools.groupby enables you to do this, you just need to define a
    >>> suitable grouping function, that stores its state:

    >>
    >>
    >> Michael, this would make a great Python Cookbook Recipe.
    >>

    >OK, will do. What would you call it? Something like: "Stateful grouping of
    >iterable items"
    >
    >[Bengt]:
    >> Nice, but I think "record" is a bit opaque semantically.
    >> How about group_id or generate_incrementing_unique_id_for_each_group_to_group_by or such?
    >>
    >> Regards,
    >> Bengt Richter

    >
    >Agreed, it's an issue. I think the most natural name is groupby - but that
    >would cause more trouble. What do you think about 'grouping' ?
    >I would use 'generate_incrementing_unique_id_for_each_group_to_group_by', but
    >then people might think I'm trying to outdo Bob Ippolito :)
    >
    >[Serge]:
    >> I think your example would
    >> be more clear for Jordan if you used function attributes:
    >>
    >> def record(item):
    >> if len(item) > 20:
    >> record.seq +=1
    >> return record.seq
    >> record.seq = 0

    >
    >That does read better than the mutable default argument hack. Is this use of
    >function attributes generally encouraged? (I tend to think of func_dict for
    >meta-data, used only outside the function) Thoughts?
    >

    Personally, I don't like depending on an externally bound (and rebindable) (usually global)
    name as a substitute for "self." You could always use a class to carry state, e.g. (untested)

    class Grouper(object):
    def __init__(self): self.group_id = 0
    def __call__(self, item):
    if len(item) > 20: self.group_id += 1 # advance id to next group
    return self.group_id
    # ...
    grouper = Grouper()
    # ...
    for groupnum, lines in groupby(linesource, grouper):
    print "".join(lines)

    Or (guess I better actually try this one ;-)

    >>> linesource = """\

    ... Here is a long line, long line, long line
    ... and this is short
    ... and this is short
    ... Here is a long line, long line, long line
    ... and this is short""".splitlines()
    >>>
    >>> for groupnum, lines in groupby(linesource, type('',(),{'n':0, '__call__':lambda s,i: setattr(s,'n', s.n+(i>20)) or s.n})()):

    ... print "".join(lines)
    ...
    Here is a long line, long line, long line
    and this is short
    and this is short
    Here is a long line, long line, long line
    and this is short

    Regards,
    Bengt Richter
    Bengt Richter, Mar 25, 2005
    #17
  18. > No, it's nothing special about groupby. record simply stores its
    state in a
    > mutable default parameter. This isn't general good practice: at

    least you have
    > to be careful with it. You can see the behavior in the following

    example:
    > >>> def accumulate(value, accum = []):

    > ... accum.append(value)
    > ... return accum
    > ...
    > >>> accumulate(1)

    > [1]
    > >>> accumulate(2)

    > [1, 2]
    > >>> accumulate(6)

    > [1, 2, 6]
    > >>>


    Wow.... I'd never seen this kind of thing in examples of Python code.
    Although its really neat, it doesn't really make sense, intuituvely to
    me. Why does accum remember its state - I suppose its to do with the
    scope of arguments (as opposed to method variables) or something like
    that?

    Still, thats powerful. But I see why its not standard use - it could
    be easily abused!
    Jordan Rastrick, Mar 26, 2005
    #18
  19. Jordan Rastrick

    Peter Otten Guest

    Jordan Rastrick wrote:

    >> No, it's nothing special about groupby. record simply stores its

    > state in a
    >> mutable default parameter. This isn't general good practice: at

    > least you have
    >> to be careful with it. You can see the behavior in the following

    > example:
    >> >>> def accumulate(value, accum = []):

    >> ... accum.append(value)
    >> ... return accum
    >> ...
    >> >>> accumulate(1)

    >> [1]
    >> >>> accumulate(2)

    >> [1, 2]
    >> >>> accumulate(6)

    >> [1, 2, 6]
    >> >>>

    >
    > Wow.... I'd never seen this kind of thing in examples of Python code.
    > Although its really neat, it doesn't really make sense, intuituvely to
    > me. Why does accum remember its state - I suppose its to do with the
    > scope of arguments (as opposed to method variables) or something like
    > that?


    Michael's accumulator uses the fact that default arguments are only
    evaluated once -- when the function is created. The behaviour shown above
    is actually a common trap every newbie has to experience once until he
    learns the workaround:

    def accumulate(value, a=None):
    if a is None:
    a = []
    a.append(value)
    return a

    > Still, thats powerful. But I see why its not standard use - it could
    > be easily abused!


    There are limitations, too. If you want more than one accumulator you have
    to pass the accum argument explicitly or wrap accumulate() into a factory:

    def make_accumulator():
    def accumulate(value, a=[]):
    a.append(value)
    return a
    return accumulate

    Sill, you cannot get hold of the result of the accumulation without
    modifying it. One way to fix that:

    >>> def make_accumulator():

    .... a = []
    .... def accumulate(value):
    .... a.append(value)
    .... return a, accumulate
    ....
    >>> items1, accu1 = make_accumulator()
    >>> for i in range(4): accu1(i)

    ....
    >>> items1

    [0, 1, 2, 3]
    >>> items2, accu2 = make_accumulator()
    >>> for i in "abc": accu2(i)

    ....
    >>> items2

    ['a', 'b', 'c']
    >>> items1

    [0, 1, 2, 3]

    Now this is all nice and dandy to play around with and learn something about
    Python's scoping rules, but you can get the same functionality in a
    straightforward way with a callable object (like Bengt Richter's Grouper)
    and that is what I would recommend.

    Peter
    Peter Otten, Mar 26, 2005
    #19
  20. "Jordan Rastrick" <> wrote in message
    news:...

    > But I'm not so much interested in alternate solutions to the problem
    > itself, which is to be honest trivial. I'm intereseted in the
    > implications of the imaginary solution of the Acceptor function.


    Of course.

    But you'll get more people interested in your alternatives if you can come
    up with some use cases that the existing language can't handle easily.
    Andrew Koenig, Mar 26, 2005
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Rim
    Replies:
    20
    Views:
    697
    Corey Coughlin
    Jul 8, 2003
  2. Replies:
    11
    Views:
    427
  3. Antoon Pardon
    Replies:
    3
    Views:
    261
    Steven Bethard
    Mar 6, 2006
  4. mttc
    Replies:
    9
    Views:
    321
    Martien Verbruggen
    Nov 25, 2008
  5. Mark Owert
    Replies:
    3
    Views:
    89
    Kris Eiben
    Sep 12, 2003
Loading...

Share This Page