Pre-PEP: reverse iteration methods

R

Raymond Hettinger

Here is a discussion draft of a potential PEP.
The ideas grew out of the discussion on pep-284.
Comments are invited. Dart throwing is optional.


Raymond Hettinger

-------------------------------------------------------------

PEP: 323
Title: Add Reverse Iteration Methods
Version: $Revision: 1.1 $
Last-Modified: $Date: 2003/03/11 04:49:44 $
Author: Raymond Hettinger <python at rcn.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Sep-2003
Python-Version: 2.4
Post-History: 23-Sep-2003


Abstract
========

This proposal is to extend the API of several sequence types
to include methods for iterating over the sequence in reverse.


Motivation
==========

For indexable objects, current methods for reverse iteration are
error prone, unnatural, and not especially readable::

for i in xrange(n-1, -1, -1):
print seqn

One other current approach involves reversing a list before iterating
over it. That technique wastes computer cycles, memory, and lines of
code. Also, it only works with lists (strings, for example, do not define
a reverse method)::

rseqn = list(seqn)
rseqn.reverse()
for elem in rseqn:
print elem

Reverse iteration is much less common than forward iteration, but it
does arise regularly in practice.


Proposal
========

Add a method called iter_backwards() to sequence objects that can benefit
from it. The above examples then simplify to::

for i in xrange(n).iter_backwards():
print seqn

for elem in seqn.iter_backwards():
print elem

The new protocol would be applied to lists, strings, xranges objects,
and possibly other sequence objects as well (depending on use cases
and implementation issues). It would not apply to unordered collections
like dicts and sets.

No language syntax changes are needed.


Open Issues
===========

* Should tuples be included? In the past they have been denied some list
like behaviors such as count() and index().

* Should file objects be included? Implementing reverse iteration may not
be easy though it would be useful on occasion.

* Should enumerate() be included? It would only provide reverse iteration
whenever the underlying sequence supported it.


Copyright
=========

This document has been placed in the public domain.
 
A

Andrew Dalke

Raymond Hettinger:
Proposal
========

Add a method called iter_backwards() to sequence objects that can benefit
from it. The above examples then simplify to::

for i in xrange(n).iter_backwards():
print seqn

for elem in seqn.iter_backwards():
print elem


What about letting 'iter_backwards' be a builtin which
looks for __riter__ and if it doesn't exist, get the length
and do old-fashioned offset indexing?

Something like this untested code

def iter_backwards(obj):
try:
riter = getattr(obj, "__riter__")
except AttributeError:
n = len(obj)
while n != 0:
n = n -1
yield obj[n]
else:
for term in riter():
yield term

which would be used like

for i in iter_backwards(xrange(n)):
print seqn

for elem in iter_backwards(seqn):
print elem

Andrew
(e-mail address removed)
 
J

John Roth

Overall impression: I like it.
More comments interspersed.

John Roth

Raymond Hettinger said:
Here is a discussion draft of a potential PEP.
The ideas grew out of the discussion on pep-284.
Comments are invited. Dart throwing is optional.


Raymond Hettinger

-------------------------------------------------------------

PEP: 323
Title: Add Reverse Iteration Methods
Version: $Revision: 1.1 $
Last-Modified: $Date: 2003/03/11 04:49:44 $
Author: Raymond Hettinger <python at rcn.com>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Sep-2003
Python-Version: 2.4
Post-History: 23-Sep-2003


Abstract
========

This proposal is to extend the API of several sequence types
to include methods for iterating over the sequence in reverse.


Motivation
==========

For indexable objects, current methods for reverse iteration are
error prone, unnatural, and not especially readable::

for i in xrange(n-1, -1, -1):
print seqn

One other current approach involves reversing a list before iterating
over it. That technique wastes computer cycles, memory, and lines of
code.


It also has the "returns None instead of an object" wart.
Also, it only works with lists (strings, for example, do not define
a reverse method)::

rseqn = list(seqn)
rseqn.reverse()
for elem in rseqn:
print elem

Reverse iteration is much less common than forward iteration, but it
does arise regularly in practice.
Absolutely.


Proposal
========

Add a method called iter_backwards() to sequence objects that can benefit
from it. The above examples then simplify to::

for i in xrange(n).iter_backwards():
print seqn

for elem in seqn.iter_backwards():
print elem

The new protocol would be applied to lists, strings, xranges objects,
and possibly other sequence objects as well (depending on use cases
and implementation issues). It would not apply to unordered collections
like dicts and sets.


I presume that the result of iter_backwards() is an iterator
object with the desired behavior. In other words, it's not
the original object that has methods to handle the iteration
protocol.
No language syntax changes are needed.


Open Issues
===========

* Should tuples be included? In the past they have been denied some list
like behaviors such as count() and index().

I'd say yes, but I don't know the reason why tuples are missing methods.
* Should file objects be included? Implementing reverse iteration may not
be easy though it would be useful on occasion.

Are we talking text files or binary files here? Offhand, I'm not even
sure there is a binary file iterator, although if there is reverse iteration
shouldn't be difficult. Reverse iteration for text files simply means
implementing
the reverse scan in C, since the standard library doesn't support it. For
small enough files, it may be easier to simply internally use getlines() and
then use iter_reverse() on the list.
* Should enumerate() be included? It would only provide reverse iteration
whenever the underlying sequence supported it.

I don't see why not.

In general, I support the notion that a concept should be implemented
pervasively, unless there's a clear reason not to do it in a special case.
Special cases are just that - if they really are special, you trip over them
once and then remember them. What gets messy is something that's
partially implemented, where there is no clear rule to determine when
it is and when it isn't.

John Roth
 
T

Terry Reedy

Raymond Hettinger said:
for i in xrange(n).iter_backwards():
print seqn

for elem in seqn.iter_backwards():
print elem


I would prefer a much shorter name, such as riter for reverse/right
iter. This goes along with current nomenclature such as rfind, etc.

Terry J. Reedy
 
J

John Roth

Andrew Dalke said:
Raymond Hettinger:
Proposal
========

Add a method called iter_backwards() to sequence objects that can benefit
from it. The above examples then simplify to::

for i in xrange(n).iter_backwards():
print seqn

for elem in seqn.iter_backwards():
print elem


What about letting 'iter_backwards' be a builtin which
looks for __riter__ and if it doesn't exist, get the length
and do old-fashioned offset indexing?


There are objects that support iteration that
don't support len(), such as file objects.
This has got the advantage, of course, that it would
automatically work on all objects that support
an sequence-like protocol, though.

Frankly, I prefer the notion of a method.
Maybe make it a method of object that
automatically works on all new style
objects that support a sequence-like
protocol?

John Roth
 
A

Andrew Dalke

John Roth:
There are objects that support iteration that
don't support len(), such as file objects.

Sure. Then it'll given an exception. The implementation
should turn around and raise the proper exception there
instead of a "len doesn't exist" one.

There's also a problem that len works on a dict but
__getitem__(int) will only work if the dict stores 0->n-1
as keys.
This has got the advantage, of course, that it would
automatically work on all objects that support
an sequence-like protocol, though.
Yup.

Frankly, I prefer the notion of a method.

While I don't. There are many, many ways to
iterate through a list. (What about all evens
followed by all odds?) Right now they are all
done by using a function which creates an iterator
across a container. If it's a method then you've
said that reverse_iter is special enough that it
must be a method, which seems strange for something
which is so infrequently used.

Andrew
(e-mail address removed)
 
S

Sean Ross

Raymond Hettinger said:
Here is a discussion draft of a potential PEP. [snip]
Proposal
========

Add a method called iter_backwards() to sequence objects that can benefit
from it. The above examples then simplify to::

for i in xrange(n).iter_backwards():
print seqn

for elem in seqn.iter_backwards():
print elem


Hi.
This will mostly just be some alternate naming suggestions:

How about just ".backwards", where backwards acts like a read-only property
that returns a generator for reverse iteration?

for i in xrange(n).backwards:
print seqn

for elem in seqn.backwards:
print elem

It's not as explicit as iter_backwards(), but it's shorter, cleaner, and
appears
obvious (to me, anyway). Or, perhaps, 'ibackwards', or 'ireverse' ; or
ibackwards() or ireverse(). (being reminiscent of imap() and izip())
'.reverse' would be good, but, of course, it's already taken...

If one could be made, a fast** general purpose "reverse(iterable)" (or
ireverse(iterable)) function would be my preference (at the very least, it
would
avoid having to add iter_backwards() to several type definitions).

# e.g.
for i in reverse(xrange(n)):
print seqn

for index, item in reverse(enumerate(seqn)):
print "index:%s item:%s"%(index, item)

# that sort of thing...





** Yep, fast. Something where xrange(n-1, -1, -1) is not exorbitantly
faster than reverse(xrange(n)). Similarly, for sequence types, you'd
want reverse(list) to be atleast somewhere near as quick as:

# python-faq, entry 4.6
list.reverse()
try:
for x in list:
"do something with x"
finally:
list.reverse()
 
R

Raymond Hettinger

[Raymond Hettinger]
[John Roth]
Absolutely.

Because the question will arise, I did a quick use case analysis from
the standard library. I'll attach the following to the PEP. Feel free
to comment or add other use cases.


Raymond Hettinger

--------------------------------------------------------------------------


Use Case Analysis
=================

Here are some instances of reverse iteration taken from the standard
library and comments on why reverse iteration was necessary:

* atexit.exit_handlers() uses::
while _exithandlers:
func, targs, kargs = _exithandlers.pop()
. . .
The application dictates the need to run exit handlers in the
reverse order they were built. The "while alist: alist.pop()"
form is readable and clean; however, it would be slightly faster
and clearer with:
for func, target, kargs in _exithandlers.iter_backwards():
. . .
del _exithandlers

* difflib.get_close_matches() uses::
result.sort() # Retain only the best n.
result = result[-n:] # Move best-scorer to head of list.
result.reverse() # Strip scores.
return [x for score, x in result]
The need for reverse iteration arises from a requirement to return
a portion of a sort in an order opposite of the sort criterion. The
list comprehension is incidental (the third step of a Schwartzian
transform). This particular use case can met with extended slicing,
but the code is somewhat unattractive and hard to visually verify::
result.sort()
return result[:-n-1:-1]

* heapq.heapify() uses "for i in xrange(n//2 - 1, -1, -1)" because
higher-level orderings are more easily formed from pairs of
lower-level orderings. A forward version of this algorithm is
possible; however, that would complicate the rest of the heap code
which iterates over the underlying list in the opposite direction.

* mhlib.test() uses::
testfolders.reverse();
for t in testfolders:
do('mh.deletefolder(%s)' % `t`)
The need for reverse iteration arises because the tail of the underlying
list is altered during iteration.

* platform._dist_try_harder() uses "for n in range(len(verfiles)-1, -1, -1)"
because the loop deletes selected elements from "verfiles" but needs to
leave the rest of the list intact for further iteration. This use
case could be more efficiently met with itertools.ifilter but the
lambda condition and functional style would render it less readable.

* random.shuffle uses "for i in xrange(len(x)-1, 0, -1)" because the
algorithm is most easily understood as randomly selecting elements
from an ever diminishing pool. In fact, the algorithm can be run in
a forward direction but is less intuitive and rarely presented that
way in literature.

* rfc822.Message.__delitem__() uses:
list.reverse()
for i in list:
del self.headers
The need for reverse iteration arises because the tail of the underlying
list is altered during iteration.
 
A

Andrew Dalke

Sean Ross:
This will mostly just be some alternate naming suggestions:

How about just ".backwards", where backwards acts like a read-only property
that returns a generator for reverse iteration?

-1 from me. 'backwards' looks to me like something which changes
the list ordering in place, like what 'reverse' does. Or something
which returns a new list but in reverse order. It's an adjective,
when it should be a verb.

ibackwords would work, but so would riter or reviter or
variations thereof.

Andrew
(e-mail address removed)
 
S

Stephen Horne

Proposal
========

Add a method called iter_backwards() to sequence objects that can benefit
from it. The above examples then simplify to::

for i in xrange(n).iter_backwards():
print seqn

for elem in seqn.iter_backwards():
print elem


That is a pretty long name. Can we use the convention from itertools
where an 'i' prefix is sufficient to suggest an iterator, giving
'ibackwards' or 'ireverse' or similar.

Second, I'm not quite ready to drop my property idea ;-)

An advantage is that the object returned by the property can sensibly
support more than just iteration - e.g. slicing (as touched on in the
PEP284 thread). So for instance...
x = range(10)
list(x.backward) [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
list(x.backward [::2])
[8, 6, 4, 2, 0]
* Should file objects be included? Implementing reverse iteration may not
be easy though it would be useful on occasion.

IMO no - doing this essentially needs the whole file to be read into
memory, in which case you may as well read the whole file into a list
and then iterate the list backwards.
* Should enumerate() be included? It would only provide reverse iteration
whenever the underlying sequence supported it.

Why not...

for i, j in enumerate (listname.iter_backwards ()) :

in other words, as enumerate can handle the already-reversed
sequence/iteration, I don't see the point.
 
S

Stephen Horne

-1 from me. 'backwards' looks to me like something which changes
the list ordering in place, like what 'reverse' does. Or something
which returns a new list but in reverse order. It's an adjective,
when it should be a verb.

I disagree with this. IMO something that modifies a value in place
should be named using a verb (such as reverse, or sort, or update...).
Something that returns a value without modifying its parameters/object
should be named to describe what it returns - normally a noun or
adjective (such as globals, len, isinstance).

So if we had a sorting function that returns a sorted version of the
parameter without modifying its parameter, I'd want it to be called
'sorted' as opposed to the current in-place 'sort'.

Of course there are non-modifying functions that are named as verbs
(map, filter and reduce being particularly obvious), but that does
seems slightly wrong to me - though for the ones I've listed there is
basically an overriding convention (they are well known names).
 
S

Stephen Horne

While I don't.

To me, the reason to use a method (or property) is simply that most
types cannot be efficiently 'backwardised'. For instance, iterators in
general would have to be buffered into a list an then the resulting
list iterated backwards. That could be useful, but the overhead is
sufficient that I think explicitly writing 'list (x).backward ()'
rather than 'x.backward ()' would be a good thing.

Having a method (or property) explicitly associates it with the
object/class being handled.
 
S

Stephen Horne

-1 from me. 'backwards' looks to me like something which changes
the list ordering in place, like what 'reverse' does. Or something
which returns a new list but in reverse order. It's an adjective,
when it should be a verb.

I disagree with this. IMO something that modifies a value in place
should be named using a verb (such as reverse, or sort, or update...).
Something that returns a value without modifying its parameters/object
should be named to describe what it returns - normally a noun or
adjective (such as globals, len, isinstance).

So if we had a sorting function that returns a sorted version of the
parameter without modifying its parameter, I'd want it to be called
'sorted' as opposed to the current in-place 'sort'.

Of course there are non-modifying functions that are named as verbs
(map, filter and reduce being particularly obvious), but that does
seems slightly wrong to me - though for the ones I've listed there is
basically an overriding convention (they are well known names).
 
S

Stephen Horne

While I don't.

To me, the reason to use a method (or property) is simply that most
types cannot be efficiently 'backwardised'. For instance, iterators in
general would have to be buffered into a list an then the resulting
list iterated backwards. That could be useful, but the overhead is
sufficient that I think explicitly writing 'list (x).backward ()'
rather than 'x.backward ()' would be a good thing.

Having a method (or property) explicitly associates it with the
object/class being handled.
 
?

=?ISO-8859-1?Q?Hannu_Kankaanp=E4=E4?=

Andrew Dalke said:
Sure. Then it'll given an exception. The implementation
should turn around and raise the proper exception there
instead of a "len doesn't exist" one.

There's also a problem that len works on a dict but
__getitem__(int) will only work if the dict stores 0->n-1
as keys.


How about using a temporary sequence if __riter__ isn't defined?
It's slow, but would work with all non-infinite iterators that
don't have strange side effects (vast majority).


def riter(obj):
try:
rit = getattr(obj, "__riter__")
except AttributeError:
# assuming list has __riter__, the following could be:
# for x in riter(list(obj)):
# yield x
seq = list(obj)
n = len(seq)
while n != 0:
n -= 1
yield seq[n]
else:
for term in rit():
yield term

# reverse iteration of file
f = file('riter.py')
try:
for line in riter(f):
print line,
finally:
f.close()

# reverse iteration of a dictionary
for x in riter({'foo':1, 'bar':2}):
print x,


I changed it to use a shorter name as suggested by Reedy, to be similar
with rfind etc. I also prefer a function instead of a method, as the function
can provide (slow) default behaviour for iterators that don't explicitely
support reverse iteration.
 
S

sebastien

PEP: 323
Title: Add Reverse Iteration Methods
Version: $Revision: 1.1 $

I don't need this often enough to want it as a method or a builtin
function.
But it would ne nice to have a function in the itertool module.

I would call it riter(), it should work on any sequence with len()
support or on any object wich define the __riter__ special method.

In practice, I mostly use reverse iteration when I most remove some
few objects on the sequence I'm itering to, so I have more neeed for a
renumerate() function:

for index, value in itertools.renumerate(sequence):
if not value:
del sequence(index)
 
R

Raymond Hettinger

[Raymond Hettinger]
[Stephen Horne]
That is a pretty long name. Can we use the convention from itertools
where an 'i' prefix is sufficient to suggest an iterator, giving
'ibackwards' or 'ireverse' or similar.

That sounds reasonable to me. I'll add that alternative to the PEP.

If the discussion on enumerate() is any indication, then the naming
discussion will be much more lively than on the idea itself. Perhaps,
Alex will once again be able to suggest a musically perfect, beautiful
Italian name.

BTW, if you think iter_backwards is long, then take a look at some
of the method names in the sets module.

Second, I'm not quite ready to drop my property idea ;-)

An advantage is that the object returned by the property can sensibly
support more than just iteration - e.g. slicing (as touched on in the
PEP284 thread). So for instance...
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

I'm not clear on the details of what you're proposing. What is
type(x.backward)?
Please show what a pure python version would look like
(taking UserList as an example).


IMO no - doing this essentially needs the whole file to be read into
memory, in which case you may as well read the whole file into a list
and then iterate the list backwards.

Oh, it could be done using seeks and whatnot;
however, the code might be grotesque.


Why not...

for i, j in enumerate (listname.iter_backwards ()) :

in other words, as enumerate can handle the already-reversed
sequence/iteration, I don't see the point.

The point is that enumerate will starting numbering from zero
for the last item:
[(0, 'c'), (1, 'b'), (2, 'a')]

If you want the indices to match their original position, then you
would need something like:
[(2, 'c'), (1, 'b'), (0, 'a')]

The implementation would need to access both mystr.iterbackwards()
and mystr.__len__().

At first glance, this is a can of worms, a pandora's box,
a gordian knot, or some unnamed code smell unsuitable
for this mixed metaphor.



Raymond Hettinger
 
J

John Roth

Stephen Horne said:
IMO no - doing this essentially needs the whole file to be read into
memory, in which case you may as well read the whole file into a list
and then iterate the list backwards.

Actually, it doesn't. It does require that the file be read into the
buffers backwards, but from there it's simply a matter of doing
a reverse scan for the line ending characters. The difficulty is that
the entire algorithm would have to be done in C, without any
significant help from the standard library.

John Roth
 
?

=?ISO-8859-1?Q?Hannu_Kankaanp=E4=E4?=

Stephen Horne said:
Why not...

for i, j in enumerate (listname.iter_backwards ()) :

in other words, as enumerate can handle the already-reversed
sequence/iteration, I don't see the point.

That's not the same:
list(enumerate([10,11,12])) [(0, 10), (1, 11), (2, 12)]
list(enumerate([10,11,12].riter())) [(0, 12), (1, 11), (2, 10)]
list(enumerate([10,11,12]).riter())
[(2, 12), (1, 11), (0, 10)]

Indices would also be reversed.
 
P

Peter Otten

Raymond said:
If you want the indices to match their original position, then you
would need something like:
[(2, 'c'), (1, 'b'), (0, 'a')]

The implementation would need to access both mystr.iterbackwards()
and mystr.__len__().

At first glance, this is a can of worms, a pandora's box,
a gordian knot, or some unnamed code smell unsuitable
for this mixed metaphor.

You can count down starting at -1 for reverse enumeration and keep that can
of pandora's knots closed :)

The only drawback I see would be that the following behaviour might not be
what you expect:
[(i, c, "uvwxyz") for i, c in reverse_enumerate("abc")] [(-1, 'c', 'z'), (-2, 'b', 'y'), (-3, 'a', 'x')]


Peter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,743
Messages
2,569,478
Members
44,898
Latest member
BlairH7607

Latest Threads

Top