Suggesting a new feature - "Inverse Generators"

J

Jordan Rastrick

First, a disclaimer. I am a second year Maths and Computer Science
undergraduate, and this is my first time ever on Usenet (I guess I'm
part of the http generation). On top of that, I have been using Python
for a grand total of about a fortnight now. Hence, I apologise if what
follows is a stupid suggestion, or if its already been made somewhere
else, or if this is not the appropriate place to make it, etc. But I
did honestly do some background checking through the Python
documentation, www.python.org, FAQs, and so forth before making this
post. Basically what I'm trying to say here is, please don't flame this
post :)

Anyway, I have a suggestion to make for a rather radical new feature in
Python. Rather than just struggling to explain and justify it straight
out, I'll give the background of how I thought up the feature - and how
I worked through the ideas I came up with. The example may seem a bit
trivial to start, but bear with me, because I think theres at least
some possibility the suggestion has merit.

I am writing a project in Python atm, and one thing it needs to do is
parse a very simple text file to get a list of records. Its a
straightforward kind of thing that I (and I'm sure you) have done many
times before in many languages.

Each bit of interesting data in the file occurs over 3 or 4 lines of
text. On top of this there are lines of random uninteresting junk as
well.

I decided to write a generator to filter out the junk lines, strip
excess whitespace and the like (generators are one of my favourite
features in Python). So the main body of code looks like this:

def filterJunk(lines):
for line in lines:
# Do filtering & cleaning up stuff on line.
# ...
yield line

def getData(filename):
recordList = []
input = file(filename)
lines = line.readlines()
input.close()
for line in filterJunk(lines):
# ... create records and add to recordList

So far, so good. Next comes the business of actually creating the
records from the lines of interesting text.

You can probably see the problem I ran into - I need to look at 3 or 4
lines at a time to create the record. Theres no easy way out of this
requirement, because whether the record is contained in 3 or 4 lines is
conditional on the value of the second line. But the for loop isn't
going to allow me to do that.
So I rewrote the for loop above as:

while True:
try:
# ....do stuff repeatedly with lines.next() to add to recordList[]
except StopIteration:
pass

This works. (And yes, there are also a multitude of other reasonably
simple ways to approach this problem, some of which I've used in this
past. And probably some fancy ways involving map and reduce as well.)

But it seems ugly to me somehow. And I'm suspicious of anything that
hints of ugliness (in Computer Science, anyway.) I suppose that
Python's getting me into the habit of assuming theres always an elegant
looking solution.

I tried, and failed, to think of some sort of Generator-based concept
(I really do love generators! So simple, but so useful!) that would
allow me to retain a for loop. I guess that, essentially, I feel this
task is really just iterating over the lines, and that iterating over
something in Python is a job for a for loop. Generators allow you to
use for loops all regular iteration. So I wanted to have something
"vaugely Generator like", that would allow me to get back to using a
for loop.

It occured to me - what if I had some sort of "inverse generator" - a
function with a resumable call mechanism that retained state, but that
would resume when accepting input rather than returning output. For
want of a better term, call it an Acceptor.

If you think of a Generator as a method acting as (an abstraction of) a
source of sequential data, then the Acceptor correspondingly is a sink.


I started playing round with the Acceptor idea, thinking about how it
might work. I imagined a simple mirroring of Generator syntax - using a
new keyword, accept, to take in a value, just as Generator's use yield
to return a value. Here's my first attempt at 'coding' my precious for
loop using an imaginary Acceptor:

def combineIntoRecord(): # This is an acceptor function
optionalline = None # We may not get given a value for this line
accept firstline
accept secondline
if condition(secondline):
accept optionalline
accept lastline
r = createRecord(firstline, secondline, lastline, optionalline)
return r

recordlist = []
for line in lines:
recordlist.append(combineIntoRecord(line))

I've since thought about it somewhat, and its clear I got this first
try just plain wrong. Its got a number of hideous defects. First,
combineIntoRecord() is 'accepting' its values by taking a magical
argument, line, that does not appear in the argument list.

Second, in this particular syntax, do
combineIntoRecord(x) at line 50
and
combineIntoRecord(y) at line 200
want x and y to be combined into the same record, using the old state,
or is the second trying to start a "new call" to the method? For an
extreme example, imagine calls made in different modules. Basically,
the jump into combineIntoRecord code needs to somehow specify an
'instance' of the function, with a given existing state. The method
call needs to be bound to something, somehow.

Worst of all, though, since combineIntoRecord(line) doesn't always
return a value, how does the interpreter know when to call append? It
seems the 'magic' of the acceptor is spilling out into the surrounding
code, making it impossible to understand or intelligently parse. The
magicness needs to be contained within the acceptor itself.

How to resolve these issues?
Well, thinking about how generators achieve their magic-like effects:
- A generator, despite being a function, doesn't return a value in
the usual sense.
- Calls to a generator don't result in a value, they result in an
Iterator object.
- The magic happens at the 'yield' keyword.
So, analguously:
- Perhaps call to combineIntoRecord() shouldn't return a value
- It should return an acceptor object instance instead
- Every call results in a different object - solves problem 2 above
- Acceptor objects have a special method for feeding them values,
takes care of problem 1 above
BUT:
- When and what does an acceptor object 'return'?
- We don't actually want to restrict what an acceptor *function*
returns.
- Acceptors (unlike generators) should be able to return just as per
ordinary functions
- Its not the output of the function thats 'magic', its the input to
the function
Thus:
- We must confine the magic to the input the function recieves - its
argument list.
- Have a 'magic argument' that 'takes in' the accepted values.
- This is fine - it keeps the magic within the call of the acceptor
function
- If an acceptor returns like an ordinary function, it needs to
return when called
- Hence it needs to know its accepted values "all at once"
From our earlier thinking, an acceptor, fundamentally, accepts a
sequence of data.

So let the first argument to an acceptor function call be a "magic
sequence" that contains all the values we want to be accepted. Other
arguments may be passed as normal.

A disadvantage: the power of the acceptor is restricted in this
version, since you can't have elements Accepted at any arbitrary point
in the code. This is really a good thing, though - with the old version
it was the case that too much power was making the idea totally
unworkable. What we've done is essentially bound our Acceptor to
something, namely a sequence. Note, though, that we can use any
abstract source of sequential data - including a Generator - so theres
still quite a lot of flexibility in how it gets data for accepting.

To illustrate this model of an Acceptor, heres an example of a simple
Acceptor that returns the first element of a sequence that satisfies
some condtion (which is presumed to be a function):

def firstThatSatisfies(condition):
accept element
if condition(element): return element

To use, we simply call:

first = firstThatSatisfies(somelist, condition)

somelist, not appearing in the argument list, gets fed into the accept
statement by the 'acceptor magic' .

Note how explicit iteration has been removed from this canonical
algorithm altogether. I dont know about you, but I think it looks kind
of neat.

A Slight Digression:

Clearly it might be true that condtion(element) == 0 for every element
in some list. This brings us to the issue of an accept statment that
doesnt get fed any data.

There are two possible ways to deal with this. One is to insist every
accept statement must get called, and raise an exception if this doesnt
happen. However, I prefer the seemingly more flexible option where you
simply say "Bad luck, once the magic sequece is empty, any remaining
acceptor code doesnt get executed, and any acceptor state is lost."
This means an acceptor might not always give a well defined return
value, but this is no worse than something like

def f()
x = someCalculation()
bool = someConditionThatWillAlwaysBeFalse()
if false: return x

This conventional example has a return value of None, and an Acceptor
that doesnt reach an explict return statement can be treated exactly
the same way.

Digression

Now, going back to my original problem and the first Acceptor I wrote:

def combineIntoRecord(): # This is an acceptor function
optionalline = None # We may not get given a value for this line
accept firstline
accept secondline
if condition(secondline):
accept optionalline
accept lastline
r = createRecord(firstline, secondline, lastline, optionalline)
return r

This doesnt work anymore - it produces and returns the first record we
want, and then stops. We want a list of all records .

Maybe we could use a loop.The 'accepting magic' would need to be
confied to this loop. And the loop needs to know when the magic stream
of data comes to an end to know when to return the final value. Maybe
use another special keyword:

def combineIntoRecord():
recordList = [].
while acceptingonly
# ....accept values, append createRecord to recordLlist


Perhaps. I don't really like the idea of having any magic new keywords
other than accept itself; if it needs this sort of explicit while loop,
maybe it shouldnt be in an Acceptor at all (the whole original point of
this idea was to get rid of while loop after all!) **

So are Acceptors pretty useless, after all that? Maybe. Then again,
maybe not.

What is this combineIntoRecord(), that I have written using
non-existent language features, trying to do, in a general sense?

It wants to take some sequence (the lines in a file) and produce
another sequence, that in some way depends first. Howevers it wants to
be able define basically any arbitary dependence between the sequences,
taking the elements from the first and producing output to the second
at its leisure.

The 'taking at its leisure' comes by being virtue of an Acceptor.
'Producing at its leisure' is a feature already available in Python.
Its called a Generator.

Behold:

# An Acceptor/Generator!!!
def combineIntoRecords():
optionalline = None # We may not get given a value for this line
accept firstline
accept secondline
if condition(secondline):
accept optionalline
accept lastline
yield createRecord(firstline, secondline, optionalline,
lastline)

def getData(lines):
return list(combineIntoRecords(filterJunk(lines)))

So, alas, my beloved for loop was not saved after all.

Fortunately, the dreaded While loop has been vanquished. In fact all
loopiness has disappeared. Along with any other semblance of a main
method.

I think I like the finished result. I'll leave if for you to decide if
you like it too.
 
J

Jordan Rastrick

Sorry about the mangled formatting... like i said, first time on Usenet

Suggestions, comments, replies, etc most welcome.

This definitely includes replies of the form: "This is stupid,
because..."
provided it isnt followed with "youre a jerk who knows nothing.
Period."

Heres a follow up rant in the form of a Q&A, trying to answer what I
think are likely or interesting questions:

----------------------------------------------------

Q: Is this idea even remotely feasible to actually implement?

A: No Idea

Q: Are you volunteering to implement it?

A: Not really. I'd have no clue where to start. I probably don't even
have enough C experience to begin to understand the Python source. But
if some real Python dev (or devs)actually wants to implement this, and
is willing to humour an undergraduate by somehow utilising assistance
in form a very limited set of programming skills, well, then I'd be
highly flattered and more than happy to help :)

Q: The whole document is full of half-formed ideas, hundredth-formed
ideas,
mistakes(conceptual/typograhpcial/stylistic/grammatical/terminological/programming/...),
inconsistencies, random nonsense......

A: Firstly, thats not a question, its a statment.

Secondly, see the very first paragraph.

Thirdly, the whole document, from initial attempts to code the original
for loop (yes, it is a real program I'm trying to write here), to
thinking of Acceptors, refining the model to something vaguely
coherent, coming up with examples, and typing it all out, was produced
on the fly, in the space of 5 hours or so. I attempted to give it a
semblance of coherent structure at first, but soon gave up. Its
basically stream of conciousness production. The bit where the first,
nonsensical example is given, is literally where I first tried to
forumlate the idea in code, saw the massive holes, stopped, thought
about it for a few minutes, and came backed and typed out the much
improved second version. The whole thing is totally unedited.Oh, and I
had virtually no sleep last night.

A real answer, with respect to the issue of terminology: If I use
technical terms loosely, e.g., if I improperly use 'method' for
'function' or vice versa (a hang up from dealing with languages that
only have one e.g. Java, or the other, e.g. C) , please forgive me. I
hope what I'm trying to say is still clear enough.

And as for coding style, Python itself is inconsistent (or to be more
diplomatic, not obsessively concerned with the relatively minor issue
of) using underscore_seperated_words or mixedCaseInstead, etc. And
again, I cry the nefarious influence of Java....

Q: Did you steal this from somewhere else?

A: No, its entirely something I thought of, by myself, in the last 5
hours. Thats not to say something like it (or more likely, better than
it) hasnt been thought of somewhere else by someone else. But if it
has, I havent seen it.

To give credit where due though: I was motivated by the various
examples, from sources like www.python.org, of how Generators can be
used to do things really elegantly. Also, I'm still tossing around
ideas in the back of my mind from a great talk given at my university
by Rob Pike, of Google, about a language he helped devise and implement
called sawzall (or something close). Its used to do processing with
Google's massive data sets and highly parallel computing architecture,
yet its incredibly simple to understand and very efficient. I guess it
was only tenuously related to the idea presented here. But it was
certainly very inspiring in a general computer-science-enthusiasm kind
of way.

Q: Why did you go to all this trouble thinking up this feature when you
could have solved the original problem in so many other, simpler ways?

A: I honestly can't give a decent justification :) Whenever I've
needed to do things like:

line = readline from file
if (line is some sort of 'extra' data)
do something with line
line = readline from file # get the 'regular' data
....

in the past, in Pascal, Java and the like, it always irked me. Don't
ask why, it just did.

I thought I could get around it with generator in Python, but I
couldnt. In trying to think of a generator solution, I more or less
randomly thought of extending the generator concept somehow, and I
decided to see where it took me. I thought it took me to somewhere
interesting, and possibly worthy of further investigation.

Q: What about using some sort of 'sequential data sink' that doesn't
use 'pretending to be a function' syntax, e.g. a simple Class of some
sort?

A: Maybe. You can have Iterators without Generators. Maybe you can have
"inverse Iterators" without "inverse Generators", I havent thought
about it.
I love the 'pretending to be a function' syntax, its part of what makes
Generators so elegant. So I tried to emulate it.

Q: What could be some of the benefits of introducing this feature? What
could it be used for?

A: Again, I don't really know. Past the examples I've given, I havent
really thought about possible applications.

Potentially there might be efficiency gains using an Acceptor in place
of a regular function that just processes a sequence, in the same way
that using a Generator can be more efficient than using an actual
sequence like a list.

Then again, there might not.

Q: ....So you've put forward this idea without really stopping to think
if its actually useful?

A: The idea seems *potentially* powerful and useful to me. Once more,
for emphasis - I'm most definitely NOT trying to assert this is a great
idea. I'm saying that maybe its a good idea, so lets put it on the
table. That way, if it is really bad, lots of really smart people can
shoot it down really quickly and save me wasting any more time thinking
about it.

Q: Maybe you should have put a bit more personal though/research into
the idea before putting it out into the public domain? Or at least
waited until you had more experience in Python?

A: I'm not an academic or a professional, I don't have any reputation
worth preserving. I tried not to suggest this in a forum where it'd
distract the people who are responsible for developing Python from
doing their very important work. Anyone could choose to ignore the post
if they so desired - I havent wasted anyone's time, unless they've
conciously chosen to waste it. And you read far enough to get to this
question, so I assume you were at least vaguely interested.

Q: Why not make a smaller, more tenuous suggestion, get some feedback,
and go from there, rather than making one big ranting post?

A: I suppose I just wanted to get all the ideas out while they were
stll in my head. Its the kind of chaotically creative approach I often
apply (though not intentionally) to programming, with results that are
occasionally great and often disastrous.

Q: In Python, there were list comprehensions, and then generator
comprehensions. So what about Acceptor comprehensions?

A: Yeah, sure, maybe. It sounds like a cool idea at least. Havent
thought about it.

Q: Why did you choose this particular syntax and naming?

A: I just used what seemed the most obvious way to do things. I'm sure
better versions could be thought of.

Some advantages: its simple. The use of accept parallels the use of
yield in generators.
Its, arguably, pretty intutiitive.

Disadvantages: It could be a bit too verbose, having to type accept for
each magic argument. Perhaps an abbreviated sytax might be good

e.g. accept arg1, arg2
in place of
accept arg1
accept arg2

so long as it didn't conflict/confuse with existing syntax, for tuple
assigment, etc.

Another possible variant: the 'magic sequence' could be explicitly
named in an Acceptor's argument list. I kind of like the way the
Acceptor is 'bound' to its magic sequence by the call, with the
sequence itself disappering from the argument list... its sort of
vaguely similar to how x magically appears in the argument list when
making the method call x.someMethod(otherArgs)

Q: Why do you keep using the word magic everywhere?

A: "Any sufficiently advanced techonology is indistinguishable from
magic."
Seems like a good enough term for a language feature that I can
conceive, and can think in a limited fashion about some of its effects,
but have no real understanding of its possible mechanisms. Or just for
stuff that seems cool, powerful and slightly mysterious. :) Besides,
once I'd started using it, I think it helped to be consistent.

Q: Can't you already achieve this precise effect very easily with
existing syntax (filter, list comprehensions, simple recursion,
something)?

A: Maybe, I don't know. If you can think of how, I'd like to hear it.

Q: What about backwards compatibility issues?

A: The only obvious one I can think of is the use of accept as a
keyword, which would conflict with any variables called accept. Im sure
one exists in some code somewhere. Hopefully, not too much code would
be broken. Possibly a different syntax wouldnt even have to use a new
keyword, or could use one that wouldnt cause much disruption to code.

I imagine the issue would not be significantly worse than the
introduction of yield for generators was. Or any other new language
feature for that matter.

Note, this is one of the reasons I'd be reluctant to see other special
keywords/functions aside from accept used in the mechanism.

Q: Why don't you want Acceptors to be able to know when the magic
sequence ends?

A: I don't know, I just don't like it. My intuition is you'd end up
writing loops in acceptor functions that could have just been written
in more or less exactly the same way without an acceptor, instead of
writing Acceptor focused code. There's already an implict loop
iterating through the sequence. Adding an additional loop with
something like
while accepting:
accept ....
is complicating things. Note neither of the two finished examples had
any loops in them at all.

As usual, I could be completely wrong.

A possible compromise: Have an optional finally block at the end of the
method, that executes when the magic sequence runs out of elements.
This could serve much the same function as while accepting: without
reintroducing banished loopiness. And maybe a similar thing, but prior
to accepting. A 'first' block. So:

def combineIntoRecord(): # This is an acceptor function
first:
recordList = []
optionalline = None # We may not get given a value for this line
accept firstline
accept secondline
if condition(secondline):
accept optionalline
accept lastline
recordList.append(createRecord(firstline, secondline, lastline,
optionalline))
finally:
return recordList

Of course, thats another new keyword, plus an existing one
overloaded.... again, I'm not a big fan.....

Basically, I can summarise my views on what should and shouldn't be
included in the mechanism thusly:
Iwant to be able to 'accept' in an arbitrarily (or at least very)
complicated way within an Acceptor. I reckon that is what would give
them their power, and could lead to cool applications.

But conversely I don't want Acceptors to 'need to do everything',
burdening them with extra syntax to allow them to behave like regular
functions. They're not intended as a super technique that solves every
problem, they're intended to find and fill a particular programming
niche. Better that niche be small, but Acceptors be a perfect fit for
it - obviously the right answer for some problems, and equally,
obviously the wrong answer for others.

For comparison, think about lambda functions in Python allowed to
contain arbitrary sequences of statements, not just single expressions.
Anything that can't fit easily in a lambda, doesnt belong there. The
same goes for Acceptors.

Most imporant of all, the motivation of the whole exercise is to make
things simpler and more elegant, not more complex and ugly.

Q: Isn't it just perverse to want to code algorithms that are
inherently suited to looping in syntax that does not explicitly loop?

A: Perhaps. The lack of looping wasn't actually how I originally
envisioned the final result.

But you need to think long and hard about what is meant by an
"algorithm inherently suited to looping." Try using a purely functional
language (e.g. Haskell) and you might see what I mean.

As far as my limited understanding of these things go, from a
theoretical viewpoint, looping is actually only a special case of
recursion. Its just that in structured, low-level, imperative
languages, looping is the standard technique for solving a certain
class of problems. So if like most people you learn to program in such
a language, you'll naturally tend to think 'loops.' Hence the issue is
really something inherent to your way of thinking about programs, and
not to the actual problem.

Q: You don't need to use yield, or a loop, or first/finally blocks, to
make the final example work. Just pass in an empty list, and append it
at the end of the method; return nothing.

A: Yeah, I just now realised that. In fact possibly all the benefits of
both the while Accepting and the first/finally ideas could be realised
by preparing and passing in any nessecary intial state, and using
callback on the passed objects instead of return. That way of
implementing things seems cleaner and more in line with the 'Acceptor
feel', in fact. This is definitely a line of thought that needs to be
explored further....

For what its worth, conceptually, that solution pretty much works the
same as using yield. But I like the idea of an Acceptor/Generator, so
I'll let it stand in its own right. Needs a better name name though :)

Q: What's the project you were working on when you thought of this?

A: Nothing particularily interesting or important ;-)

Q: Have you got anything more to add?

A: Yeah, I'm tired, and going to bed now.

I'll be interested to see if this actually generates a substantial
discussion of any sort.

And I do apologise for the length of this rant. I hope that by reading
it, you've at the very least gained some sort of amusement and sense of
overwhelming superiority in light of my outrageously stupid
thoughts.....
 
A

Andrew Koenig

def combineIntoRecord(): # This is an acceptor function
optionalline = None # We may not get given a value for this line
accept firstline
accept secondline
if condition(secondline):
accept optionalline
accept lastline
r = createRecord(firstline, secondline, lastline, optionalline)
return r
recordlist = []
for line in lines:
recordlist.append(combineIntoRecord(line))

How about doing it this way?

class Acceptor:
def __init__(self, gen):
self.gen = gen
def next(self):
firstline = self.gen.next()
secondline = self.gen.next()
if condition(secondline):
optionalline = self.gen.next()
accept lastline
r = createRecord(firstline, secondline, lastline, optinalline)
return r

This is just a generator done longhand. If anything in Acceptor.next raises
StopIteration, so will Acceptor.next itself. Which means that you can now
write this:

for r in Acceptor(line):
recordlist.append(r)

or, for that matter,

recordlist = list(Acceptor(line))

Incidentally, I did not try to fix the bug in your code that if
condition(secondline) is false, optionalline never gets set so the program
will crash :)
 
T

Tim Hochberg

Jordan Rastrick wrote:

[CHOP]
Behold:

# An Acceptor/Generator!!!
def combineIntoRecords():
optionalline = None # We may not get given a value for this line
accept firstline
accept secondline
if condition(secondline):
accept optionalline
accept lastline
yield createRecord(firstline, secondline, optionalline,
lastline)

That's a nice example, but why not:

def combineIntoRecords(iterable):
iterator = iter(iterable)
optionalline = None
firstline = iterator.next()
secondline = iterator.next()
if condition(secondline):
optionalline = iterator.next()
lastline = iterator.next()
yield createRecord(firstline, secondline, optionalline, lastline)

???

-tim
 
J

Jordan Rastrick

Thanks for the very fast feedback :)

I specifically set optionalline = None to deal with that bug you
mentioned, with the implicit assumption createRecord knows how to deal
with a None argument. If that guard got destroyed in the copy paste
process, my bad.

As for you solution, yes, you could do it that way.

But I'm not so much interested in alternate solutions to the problem
itself, which is to be honest trivial. I'm intereseted in the
implications of the imaginary solution of the Acceptor function.

Just as you can have Iterators (an abstract class based solution to
iteration) without Generators, yet Generators are still of interest, so
you could solve this problem without Acceptors, yet Acceptors are
(potentially) of interest.

Basically, an Acceptor is linked to a Generator (or some other
sequence) in the way you have linked the class implementation to a
Generator. So this is a kind of implementation of an Acceptor using a
class. I like the Acceptor syntax, though. And I'm wondering if maybe
there are more complex examples that are harder or impossible to do
your way...

I think I already put something (briefer) in the Q&A bit along these
lines....
 
M

Michael Spencer

Tim said:
Jordan Rastrick wrote:

itertools.groupby enables you to do this, you just need to define a suitable
grouping function, that stores its state:

For example, if short lines should be appended to the previous line:

from itertools import groupby
linesource = """\
Here is a long line, long line, long line
and this is short
and this is short
Here is a long line, long line, long line
and this is short""".splitlines()

def record(item, seq = [0]):
if len(item) > 20:
seq[0] +=1
return seq[0]

... print "".join(lines)
...
Here is a long line, long line, long lineand this is shortand this is short
Here is a long line, long line, long lineand this is short
Michael
 
J

Jordan Rastrick

Yes, granted.

This is basically the same as Andrew's reply, except with Iterators in
place of generators, so I'll let my answer to that stand. In fact, its
my solution, but with iter.next() in place of accept :)

This is probably something like how I wanted to solve the problem when
I first was looking at it and wanting a generator based solution. I
ended up thinking of this whole Acceptor business as part of getting me
to this point.

I'll try to reduce my pages of ranting to a single question. In
posting, i was wondering if the "syntactic sugar" (Acceptors) that i
invented to implement the solution is of any general interest. So are
there maybe examples less straightforward than this one, where
Acceptors work better? Or can you always just "turn the generator
inside out" in the way you have done here?

If you can always do it your way, well, thats a powerful design
pattern, and it just goes to show my faith in Generators was justified
:) And that I wasnt thinking hard /clearly enough about how to use
them.

There are other issues, like "Does the Acceptor syntax, although
perhaps functionally equivalent to other methods, ever make the code
more readable, easier to parse, etc?" But they're a lot less important
i'd say.
 
J

Jordan Rastrick

Wow, if I'm going to get replies (with implemented solutions!) this
quickly, I'll post here more often :)

This is the most different to my solution, and also the shortest, and
therefore the most interesting, reply so far. Its also the last one
I'll reply to before I go to bed. zzzz....

Its taken me a while to get a rough understanding of this code, but I
think I have some idea. Correct me if I'm wrong.

groupby groups based on value of line(record)

Record returns 1 for the first line, 1 of the second, 1 for the 3rd,
then 2 for the 4th because seq[0] gets incremented since len(line) > 20

OK thats fair enough. But how does record retain state between calls?
How is that related to the fact your storing your value as a singleton
list, instead just an int?

It seems a little confusing to be honest, probably mainly due to my
unfamiliarity with groupby. Retaining state between method calls is
part of what interests me so much about the Generator/ Acceptor case.
Here youre retaining state between calls with none of the special
syntax used for example in Generators.

How? Is it a side effect of the way groupby uses record? If so, then
thats a littleoblique and unreadable for my liking.

Is the state the record returns passed back to it somehow? I take it
gets passed a value for seq at some point, seeing as how you've
bothered to define it as a default argument rather than just seq = [0]
on the first line.

That works, but at the cost of having to return and pass all of state
every call. I imagine other solutions (Generator/Acceptor based etc)
would be substanitally more efficient. And again more readable, to a
Python beginner such as myself at least.

Still, this is fascinating.... going to have to spend some time
experimenting with groupby as soon as I get a chance....
 
D

Diez B. Roggisch

I'll try to reduce my pages of ranting to a single question. In
posting, i was wondering if the "syntactic sugar" (Acceptors) that i
invented to implement the solution is of any general interest. So are
there maybe examples less straightforward than this one, where
Acceptors work better? Or can you always just "turn the generator
inside out" in the way you have done here?

If you can always do it your way, well, thats a powerful design
pattern, and it just goes to show my faith in Generators was justified
:) And that I wasnt thinking hard /clearly enough about how to use
them.

There are other issues, like "Does the Acceptor syntax, although
perhaps functionally equivalent to other methods, ever make the code
more readable, easier to parse, etc?" But they're a lot less important
i'd say.


To me your acceptors look like micro-threads or similar concepts that have
been popped up here every now and then - but so far it seems they didn't
end up beeing included. I can't say for what reasons though.

Just yesterday I looked into stackless python to grasp what it does, and
while it is not syntactically different to standard python, it seems to
make coding the way you intend to do possible.
 
M

Michael Spencer

Jordan said:
Wow, if I'm going to get replies (with implemented solutions!) this
quickly, I'll post here more often :)

That is indeed typical of this most attentive group :)
Its taken me a while to get a rough understanding of this code, but I
think I have some idea.
It is just an example jotted in 2 min - no doubt it could be made clearer.
Correct me if I'm wrong.
groupby groups based on value of line(record)
No, groupby, groups on the value of record(item), where item is given by
iterating over linesource

You should check the itertools documentation:
http://docs.python.org/lib/itertools-functions.html
Record returns 1 for the first line, 1 of the second, 1 for the 3rd,
then 2 for the 4th because seq[0] gets incremented since len(line) > 20

In this case, it doesn't matter what record returns, as long as it is equal for
successive values of item that should be grouped
OK thats fair enough. But how does record retain state between calls?
How is that related to the fact your storing your value as a singleton
list, instead just an int?

You are asking about the fundamental behavior of Python: argument passing,
mutable objects and scopes.
It seems a little confusing to be honest, probably mainly due to my
unfamiliarity with groupby. Retaining state between method calls is
part of what interests me so much about the Generator/ Acceptor case.
Here youre retaining state between calls with none of the special
syntax used for example in Generators.

How? Is it a side effect of the way groupby uses record? If so, then
thats a littleoblique and unreadable for my liking.

No, it's nothing special about groupby. record simply stores its state in a
mutable default parameter. This isn't general good practice: at least you have
to be careful with it. You can see the behavior in the following example:
>>> def accumulate(value, accum = []):
... accum.append(value)
... return accum
...
>>> accumulate(1) [1]
>>> accumulate(2) [1, 2]
>>> accumulate(6) [1, 2, 6]
>>>
....

Still, this is fascinating.... going to have to spend some time
experimenting with groupby as soon as I get a chance....
Experimenting is good. So is the the documentation:
http://docs.python.org/tut/tut.html

Michael
 
T

Terry Reedy

Terminology: To me and some (most?) others posting here and, I believe,
both the docs and the common meaning of 'generator', a Python generator is
the particular kind of iterator that produces multiple values on request
and which is created by the generator function that you write.

Acceptors (consumers): The concept is known to both CS and Python
developers. According to Tim Peters, Python generators constitute
'semi-coroutines'. Any full coroutine mechanism (Stackless, for instance)
will allow the reverse (not inverse, which would undo rather than
complement). For various reasons, the Python developers choose that data
chains should be written as consumer code getting data from a generator,
which might in turn get data from a generator.

In your example, opening and reading the lines of the data file could be
done by filterjunk, not the main function, but I can see reasons both ways.

For your specific problem, you could, I believe, use an intermediate
generator (which could also be combined with filterjunk) that combines
lines into complete text records. Something like (ignoring any fussy
details left out):

def textrecord(file):
trlines = []
for line in filterjunk(file):
trlines.append(line)
if complete(trline): yield ''.join(trlines)
# where 'complete' is code to determine if have all lines in record or
not

Terry J. Reedy
 
S

Scott David Daniels

Michael said:
itertools.groupby enables you to do this, you just need to define a
suitable grouping function, that stores its state:

Michael, this would make a great Python Cookbook Recipe.

--Scott David Daniels
(e-mail address removed)
 
B

Bengt Richter

Tim said:
Jordan Rastrick wrote:

itertools.groupby enables you to do this, you just need to define a suitable
grouping function, that stores its state:

For example, if short lines should be appended to the previous line:

from itertools import groupby
linesource = """\
Here is a long line, long line, long line
and this is short
and this is short
Here is a long line, long line, long line
and this is short""".splitlines()

def record(item, seq = [0]):
if len(item) > 20:
seq[0] +=1
return seq[0]

... print "".join(lines)
...
Here is a long line, long line, long lineand this is shortand this is short
Here is a long line, long line, long lineand this is short
Nice, but I think "record" is a bit opaque semantically.
How about group_id or generate_incrementing_unique_id_for_each_group_to_group_by or such?

Regards,
Bengt Richter
 
M

Michael Spencer

Scott said:
Michael, this would make a great Python Cookbook Recipe.
OK, will do. What would you call it? Something like: "Stateful grouping of
iterable items"

[Bengt]:
Nice, but I think "record" is a bit opaque semantically.
How about group_id or generate_incrementing_unique_id_for_each_group_to_group_by or such?

Regards,
Bengt Richter

Agreed, it's an issue. I think the most natural name is groupby - but that
would cause more trouble. What do you think about 'grouping' ?
I would use 'generate_incrementing_unique_id_for_each_group_to_group_by', but
then people might think I'm trying to outdo Bob Ippolito :)

[Serge]:
I think your example would
be more clear for Jordan if you used function attributes:

def record(item):
if len(item) > 20:
record.seq +=1
return record.seq
record.seq = 0

That does read better than the mutable default argument hack. Is this use of
function attributes generally encouraged? (I tend to think of func_dict for
meta-data, used only outside the function) Thoughts?

Michael
 
S

Scott David Daniels

Michael said:
OK, will do. What would you call it? Something like: "Stateful
grouping of iterable items"

How about "Using groupby to fill lines"?

--Scott David Daniels
(e-mail address removed)
 
B

Bengt Richter

Scott said:
Michael, this would make a great Python Cookbook Recipe.
OK, will do. What would you call it? Something like: "Stateful grouping of
iterable items"

[Bengt]:
Nice, but I think "record" is a bit opaque semantically.
How about group_id or generate_incrementing_unique_id_for_each_group_to_group_by or such?

Regards,
Bengt Richter

Agreed, it's an issue. I think the most natural name is groupby - but that
would cause more trouble. What do you think about 'grouping' ?
I would use 'generate_incrementing_unique_id_for_each_group_to_group_by', but
then people might think I'm trying to outdo Bob Ippolito :)

[Serge]:
I think your example would
be more clear for Jordan if you used function attributes:

def record(item):
if len(item) > 20:
record.seq +=1
return record.seq
record.seq = 0

That does read better than the mutable default argument hack. Is this use of
function attributes generally encouraged? (I tend to think of func_dict for
meta-data, used only outside the function) Thoughts?
Personally, I don't like depending on an externally bound (and rebindable) (usually global)
name as a substitute for "self." You could always use a class to carry state, e.g. (untested)

class Grouper(object):
def __init__(self): self.group_id = 0
def __call__(self, item):
if len(item) > 20: self.group_id += 1 # advance id to next group
return self.group_id
# ...
grouper = Grouper()
# ...
for groupnum, lines in groupby(linesource, grouper):
print "".join(lines)

Or (guess I better actually try this one ;-)
... Here is a long line, long line, long line
... and this is short
... and this is short
... Here is a long line, long line, long line
... and this is short""".splitlines() ... print "".join(lines)
...
Here is a long line, long line, long line
and this is short
and this is short
Here is a long line, long line, long line
and this is short

Regards,
Bengt Richter
 
J

Jordan Rastrick

No, it's nothing special about groupby. record simply stores its
state in a
mutable default parameter. This isn't general good practice: at least you have
to be careful with it. You can see the behavior in the following example:
def accumulate(value, accum = []):
... accum.append(value)
... return accum
...
accumulate(1) [1]
accumulate(2) [1, 2]
accumulate(6) [1, 2, 6]

Wow.... I'd never seen this kind of thing in examples of Python code.
Although its really neat, it doesn't really make sense, intuituvely to
me. Why does accum remember its state - I suppose its to do with the
scope of arguments (as opposed to method variables) or something like
that?

Still, thats powerful. But I see why its not standard use - it could
be easily abused!
 
P

Peter Otten

Jordan said:
No, it's nothing special about groupby. record simply stores its state in a
mutable default parameter. This isn't general good practice: at least you have
to be careful with it. You can see the behavior in the following example:
def accumulate(value, accum = []):
... accum.append(value)
... return accum
...
accumulate(1) [1]
accumulate(2) [1, 2]
accumulate(6) [1, 2, 6]

Wow.... I'd never seen this kind of thing in examples of Python code.
Although its really neat, it doesn't really make sense, intuituvely to
me. Why does accum remember its state - I suppose its to do with the
scope of arguments (as opposed to method variables) or something like
that?

Michael's accumulator uses the fact that default arguments are only
evaluated once -- when the function is created. The behaviour shown above
is actually a common trap every newbie has to experience once until he
learns the workaround:

def accumulate(value, a=None):
if a is None:
a = []
a.append(value)
return a
Still, thats powerful. But I see why its not standard use - it could
be easily abused!

There are limitations, too. If you want more than one accumulator you have
to pass the accum argument explicitly or wrap accumulate() into a factory:

def make_accumulator():
def accumulate(value, a=[]):
a.append(value)
return a
return accumulate

Sill, you cannot get hold of the result of the accumulation without
modifying it. One way to fix that:
.... a = []
.... def accumulate(value):
.... a.append(value)
.... return a, accumulate
....
items1, accu1 = make_accumulator()
for i in range(4): accu1(i) ....
items1 [0, 1, 2, 3]
items2, accu2 = make_accumulator()
for i in "abc": accu2(i) ....
items2 ['a', 'b', 'c']
items1
[0, 1, 2, 3]

Now this is all nice and dandy to play around with and learn something about
Python's scoping rules, but you can get the same functionality in a
straightforward way with a callable object (like Bengt Richter's Grouper)
and that is what I would recommend.

Peter
 
A

Andrew Koenig

But I'm not so much interested in alternate solutions to the problem
itself, which is to be honest trivial. I'm intereseted in the
implications of the imaginary solution of the Acceptor function.

Of course.

But you'll get more people interested in your alternatives if you can come
up with some use cases that the existing language can't handle easily.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top