a sequence question

Chris Wright

Hi,

1) I want to iterate over a list "N at a time"
sort of like:

# Two at a time... won't work, obviously
>>> for a, b in [1,2,3,4]:
...     print a,b
...
Traceback (most recent call last):


Is there a nifty way to do this with list comprehensions,
or do I just have to loop over the list?

cheers and thanks

chris wright
 
Roy Smith

Chris Wright said:
Hi,

1) I want to iterate over a list "N at a time"

You could do it with slicing and zip:
>>> l = [1, 2, 3, 4, 5, 6, 7, 8]
>>> zip (l[::2], l[1::2])
[(1, 2), (3, 4), (5, 6), (7, 8)]

To my eyes, that's a bit cryptic, but it works and it's certainly
compact. I don't use either zip() or extended slicing a lot; perhaps if
I used them more often, the above would be more obvious to me if I read
it in somebody else's code.

The interesting thing would be generalizing this to the "N at a time"
case. I think this works:

def nzip (list0, n):
    args = []
    for i in range(n):
        slice = list0[i::n]
        args.append (slice)
    return zip (*args)

l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print nzip (l, 3)

Roy-Smiths-Computer:play$ ./nzip.py
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]

but I haven't given any thought to what happens if the length of the
list isn't a multiple of n (exercise for the reader). It's also
annoying that the above generates a bunch of temporary lists. It would
be cool if there was a way to have the intermediates be generator
expressions, but I'm not that good with that stuff, so I'll leave that
as an exercise for other readers :)
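One possible answer to both exercises, as a rough sketch only: it assumes Python 3 (where zip is lazy) and uses itertools.islice in place of the extended slices, so no temporary lists are built. The name nzip_lazy is made up for this example, and the argument has to be a real sequence such as a list, not a one-shot iterator, because each islice call needs its own pass over the data.

```python
from itertools import islice

def nzip_lazy(seq, n):
    """Group seq into n-tuples without building temporary lists."""
    # islice(seq, i, None, n) lazily yields seq[i], seq[i+n], seq[i+2n], ...
    columns = [islice(seq, i, None, n) for i in range(n)]
    # zip stops at the shortest column, so a trailing partial group is dropped.
    return zip(*columns)

print(list(nzip_lazy([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12], 3)))
print(list(nzip_lazy(list(range(1, 11)), 3)))
```

When the length is not a multiple of n, the leftover items are silently dropped, because zip stops as soon as one of its arguments runs out.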
 
F. Petitjean

On Fri, 28 Jan 2005 13:59:45 GMT, Chris Wright wrote:
Hi,

1) I want to iterate over a list "N at a time"


Is there a nifty way to do this with list comprehensions,
or do I just have to loop over the list?

cheers and thanks
seq = xrange(1, 9)  # an iterable [1, 2, ... 8]
N = 2
it = (iter(seq),)*N  # a tuple containing N times the *same* iterator on seq
print zip(*it)  # the list you are after

from itertools import izip
help(izip)
it = (iter(seq),)*2
for tup in izip(*it):
    print tup
 
Duncan Booth

Chris said:
1) I want to iterate over a list "N at a time"
sort of like:

# Two at a time... won't work, obviously
for a, b in [1,2,3,4]:
... print a,b
...

Try this:

l = [1, 2, 3, 4]
for a, b in zip(*[iter(l)]*2):
    print a, b

zip(*[iter(seq)]*N) will group by N (but if there are any odd items at the
end it will ignore them).

map(None, *[iter(seq)]*N) will group by N padding the last item with None
if it needs to.
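For anyone reading this on Python 3: map(None, ...) is no longer supported there, and itertools.zip_longest plays the padding role instead. A minimal sketch of both variants, assuming Python 3 (the variable names are only for illustration):

```python
from itertools import zip_longest

seq = [1, 2, 3, 4, 5]
n = 2

# The same iterator repeated n times: each tuple consumes n consecutive items.
truncated = list(zip(*[iter(seq)] * n))

# zip_longest pads the final group instead of dropping it
# (fillvalue defaults to None).
padded = list(zip_longest(*[iter(seq)] * n))

print(truncated)  # [(1, 2), (3, 4)]
print(padded)     # [(1, 2), (3, 4), (5, None)]
```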
 
Michael Hartl

For problems like this I use a partition function defined in a utils.py
file that I use (based on Peter Norvig's utils file at
http://aima.cs.berkeley.edu/python/utils.py). Using partition, the
problem you posed can be solved by writing

for a, b in partition([1, 2, 3, 4], 2):
    print a, b

The implementation of partition I use is simple-minded; the previous
posts in this thread suggest some more sophisticated ways to attack it
using generators.

def partition(seq, partsize):
    """Partition a sequence into subsequences of length partsize."""
    ls = len(seq)
    assert ls % partsize == 0, ('length %s, partition size %s\n'
                                % (ls, partsize))
    return [seq[i:(i+partsize)] for i in range(0, ls, partsize)]

Michael
 
Nick Coghlan

Duncan said:
Try this:

l = [1, 2, 3, 4]
for a, b in zip(*[iter(l)]*2):
    print a, b

zip(*[iter(seq)]*N) will group by N (but if there are any odd items at the
end it will ignore them).

map(None, *[iter(seq)]*N) will group by N padding the last item with None
if it needs to.

For anyone else who was as bemused as I was that Duncan's and F. Petitjean's
suggestions actually *work*, this was what I had to do to figure out *why* they
work:

Py> l = [1, 2, 3, 4]
Py> itr = iter(l)
Py> zip(itr) # Put all items from iterator in position 1
[(1,), (2,), (3,), (4,)]
Py> itr = iter(l)
Py> zip(itr, itr) # Put every second item in position 2
[(1, 2), (3, 4)]

Using zip(*[iter(l)]*N) or zip(*(iter(l),)*N) simply extends the above to the
general case.
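The key point is that every slot in the repeated list or tuple is the same iterator object. A small sketch that makes this visible, assuming Python 3 (so next(itr) rather than itr.next()); the variable names are only for illustration:

```python
l = [1, 2, 3, 4]
its = [iter(l)] * 2

# Both list slots refer to one iterator object, not two independent ones.
same = its[0] is its[1]

# Pulling from either slot advances the one shared position.
first = next(its[0])
second = next(its[1])
pair = (first, second)

# zip repeats this "take one item from each argument" step until it runs out.
rest = list(zip(*its))

print(same, pair, rest)  # True (1, 2) [(3, 4)]
```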

I'd definitely recommend hiding this trick inside a function. Perhaps something
like (using Michael's function name):

from itertools import izip, repeat, chain

def partition(seq, part_len):
    return izip(*((iter(seq),) * part_len))

def padded_partition(seq, part_len, pad_val=None):
    itr = iter(seq)
    if (len(seq) % part_len != 0):
        padding = repeat(pad_val, part_len)
        itr = chain(itr, padding)
    return izip(*((itr,) * part_len))

Py> list(partition(range(10), 2))
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
Py> list(partition(range(10), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
Py> list(padded_partition(range(10), 2))
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
Py> list(padded_partition(range(10), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
Py> list(padded_partition(range(10), 3, False))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, False, False)]
Py> zip(*padded_partition(range(10), 3))
[(0, 3, 6, 9), (1, 4, 7, None), (2, 5, 8, None)]

Not sure how useful that last example is, but I thought it was cute :)

Cheers,
Nick.
 
Steven Bethard

Nick said:
I'd definitely recommend hiding this trick inside a function. Perhaps
something like (using Michael's function name):

from itertools import izip, repeat, chain

def partition(seq, part_len):
    return izip(*((iter(seq),) * part_len))

def padded_partition(seq, part_len, pad_val=None):
    itr = iter(seq)
    if (len(seq) % part_len != 0):
        padding = repeat(pad_val, part_len)
        itr = chain(itr, padding)
    return izip(*((itr,) * part_len))

I think you can write that second one so that it works for iterables
without a __len__:

py> import itertools
py> def padded_partition(iterable, part_len, pad_val=None):
...     itr = itertools.chain(
...         iter(iterable), itertools.repeat(pad_val, part_len - 1))
...     return itertools.izip(*[itr]*part_len)
...
py> list(padded_partition(itertools.islice(itertools.count(), 10), 2))
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
py> list(padded_partition(itertools.islice(itertools.count(), 10), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]

I just unconditionally pad the iterable with 1 less than the partition
size... I think that works right, but I haven't tested it any more than
what's shown.
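A rough way to test it a bit further is to compare against a plain slicing version over a range of lengths and partition sizes. This is only a sketch, assuming Python 3 (where the built-in zip is lazy and stands in for izip); the helper name expected is made up for the comparison.

```python
from itertools import chain, repeat

def padded_partition(iterable, part_len, pad_val=None):
    # Pad unconditionally with one less than the partition size.
    itr = chain(iter(iterable), repeat(pad_val, part_len - 1))
    return zip(*[itr] * part_len)

def expected(items, part_len, pad_val=None):
    # Reference result built with plain slicing and explicit padding.
    groups = []
    for i in range(0, len(items), part_len):
        chunk = items[i:i + part_len]
        chunk += [pad_val] * (part_len - len(chunk))
        groups.append(tuple(chunk))
    return groups

# Collect every (length, part_len) combination where the two disagree.
mismatches = [
    (length, part_len)
    for length in range(0, 25)
    for part_len in range(1, 7)
    if list(padded_partition(iter(range(length)), part_len))
    != expected(list(range(length)), part_len)
]
print(mismatches)  # [] means the two versions agree on every case tried
```

The reason it works: with at most part_len - 1 padding items available, the padding alone can never fill a whole extra group, so it only ever completes a partial final group.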

Steve
 
todddeluca

Chris said:
Hi,

1) I want to iterate over a list "N at a time"
sort of like:

# Two at a time... won't work, obviously
for a, b in [1,2,3,4]:
... print a,b
...
Traceback (most recent call last):


Is there a nifty way to do this with list comprehensions,
or do I just have to loop over the list?

cheers and thanks

chris wright

I wouldn't call this nifty, but it does use a list comprehension.
(n-(len(l)%n))%n is the amount of padding, and
(len(l)+(n-(len(l)%n))%n)/n is the number of groups (calculated by
adding the padding to the length of l and then dividing by n):

>>> l = range(10)
>>> n = 3
>>> [(l+[None]*((n-(len(l)%n))%n))[i*n:(i+1)*n] for i in
...  xrange((len(l)+(n-(len(l)%n))%n)/n)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]
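Broken into named steps, the arithmetic is perhaps easier to follow. This is a sketch assuming Python 3, where integer division has to be written // and range returns a lazy object, so it is wrapped in list(); the names padding, groups and padded are only for illustration.

```python
l = list(range(10))
n = 3

# Items needed to reach the next multiple of n; the outer % n gives 0
# (not n) when len(l) is already a multiple.
padding = (n - len(l) % n) % n

# Padded length divided by group size.
groups = (len(l) + padding) // n

padded = l + [None] * padding
result = [padded[i * n:(i + 1) * n] for i in range(groups)]

print(padding, groups)  # 2 4
print(result)           # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, None, None]]
```

In Python the padding can also be written as -len(l) % n, since % always returns a result with the sign of the divisor.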

Regards,
Todd
 
Nick Coghlan

Steven said:
I think you can write that second one so that it works for iterables
without a __len__:

py> def padded_partition(iterable, part_len, pad_val=None):
...     itr = itertools.chain(
...         iter(iterable), itertools.repeat(pad_val, part_len - 1))
...     return itertools.izip(*[itr]*part_len)
...
py> list(padded_partition(itertools.islice(itertools.count(), 10), 2))
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
py> list(padded_partition(itertools.islice(itertools.count(), 10), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]

I just unconditionally pad the iterable with 1 less than the partition
size... I think that works right, but I haven't tested it any more than
what's shown.

I think you're right - I was looking at padding unconditionally, but because I
was padding with the actual partition length, it didn't work correctly when the
padding wasn't needed.

Padding with one less than the partition length fixes that quite neatly.

Cheers,
Nick.
 
Nick Coghlan

David said:
Using zip(*[iter(l)]*N) or zip(*(iter(l),)*N) simply extends the above to
the general case.


Clearly true.
But can you please go into much more detail for a newbie?
I see that [iter(l)]*N produces an N element list with each element being
the same iterator object, but after that
http://www.python.org/doc/2.3.5/lib/built-in-funcs.html
just didn't get me there.

See if the following interactive examples clear things up at all:

# The unclear version
Py> itr = iter(range(10))
Py> zipped = zip(*(itr,)*3) # How does this bit work?
Py> print "\n".join(map(str, zipped))
(0, 1, 2)
(3, 4, 5)
(6, 7, 8)

# Manual zip, printing as we go
Py> itr = iter(range(10))
Py> try:
...     while 1: print (itr.next(), itr.next(), itr.next())
... except StopIteration:
...     pass
...
(0, 1, 2)
(3, 4, 5)
(6, 7, 8)

# Manual zip, actually behaving somewhat like the real thing
Py> itr = iter(range(10))
Py> zipped = []
Py> try:
...     while 1: zipped.append((itr.next(), itr.next(), itr.next()))
... except StopIteration:
...     pass
...
Py> print "\n".join(map(str, zipped))
(0, 1, 2)
(3, 4, 5)
(6, 7, 8)
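Generalising the manual loop to any number of arguments gives a rough pure-Python model of what zip does. This is a sketch only, assuming Python 3 syntax; zip_model is a made-up name, not how CPython actually implements zip, and it returns a list the way the Python 2 zip does.

```python
def zip_model(*iterables):
    """Rough pure-Python model of zip, returning a list of tuples."""
    # iter() on an iterator returns the iterator itself, so passing the
    # same iterator three times gives three references to one object.
    iterators = [iter(it) for it in iterables]
    result = []
    if not iterators:
        return result
    while True:
        row = []
        for it in iterators:      # ask each argument in turn, left to right
            try:
                row.append(next(it))
            except StopIteration:
                return result     # any exhausted argument ends the whole zip
        result.append(tuple(row))

itr = iter(range(10))
print(zip_model(itr, itr, itr))  # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
```

The leftover 9 is consumed while building a fourth row and then discarded when the next request hits StopIteration, which is why the partial group disappears.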

Cheers,
Nick.
 
David Isaac

Alan said:
I see that [iter(l)]*N produces an N element list with each element being
the same iterator object, but after that
http://www.python.org/doc/2.3.5/lib/built-in-funcs.html
just didn't get me there.

Nick Coghlan said:
Py> itr = iter(range(10))
Py> zipped = zip(*(itr,)*3) # How does this bit work?
# Manual zip, actually behaving somewhat like the real thing
Py> itr = iter(range(10))
Py> zipped = []
Py> try:
...     while 1: zipped.append((itr.next(), itr.next(), itr.next()))
... except StopIteration:
...     pass


http://www.python.org/doc/2.3.5/lib/built-in-funcs.html says:

"This function returns a list of tuples,
where the i-th tuple contains the i-th element from each of the argument
sequences."

So an "argument sequence" can in fact be any iterable,
and these in turn are asked *in rotation* for their yield, right?
So we pass the (identical) iterables in a tuple or list,
thereby allowing a variable number of arguments.
We unpack the argument list with '*',
which means we have provided three iterables as arguments.
And then zip works as "expected",
once we have learned to expect zip to "rotate" through the arguments.
Is that about right?

If that is right, I still cannot extract it from the doc cited above.
So where should I have looked?

Thanks,
Alan Isaac
 
Nick Coghlan

David said:
If that is right, I still cannot extract it from the doc cited above.
So where should I have looked?

Ouch. The terminology's evolved, and it looks to me like the docs for the older
builtins haven't been updated to track it.

The terminology has pretty much settled to 'iterable' for anything which returns
a sensible result from iter(obj), 'iterator' for any iterable which returns
itself from iter(obj), 'reiterable' for any iterable which is not an iterator,
and 'sequence' for any reiterable which supports len(obj) and integer indexing.
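Those distinctions can be checked interactively. A small sketch, assuming Python 3 and using a list as the example sequence (the variable names are only for illustration):

```python
l = [1, 2, 3]
itr = iter(l)

# Iterator: iter() hands back the very same object.
iterator_returns_itself = iter(itr) is itr

# Reiterable: each iter() call gives a fresh, independent iterator.
list_returns_fresh = iter(l) is not iter(l)

# An iterator can be consumed only once; a reiterable can be walked again.
first_pass, second_pass = list(itr), list(itr)
list_twice = (list(l), list(l))

# Sequence: also supports len() and integer indexing; a plain iterator
# does not support len().
try:
    len(itr)
    iterator_has_len = True
except TypeError:
    iterator_has_len = False

print(iterator_returns_itself, list_returns_fresh)  # True True
print(first_pass, second_pass)                      # [1, 2, 3] []
print(len(l), l[0], iterator_has_len)               # 3 1 False
```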

That's not the terminology the older docs use, though, even in the most recent
versions of that page [1].

For most of them it's OK, since the text clarifies what the term means in
context (e.g. that 'sequence' actually means 'iterable' for some function
signatures). zip() doesn't do that though - it actually accepts iterables, but
only talks about sequences.

A bug report on Sourceforge would help in getting the problem fixed for the 2.5
docs (possibly even the 2.4.1 docs if it happens soon). 2.3's a lost cause
though, since 2.3.5 is already out the door and only another security bug is
likely to trigger a new 2.3 release.

For the 'left-to-right' evaluation thing, that's technically an implementation
artifact of the CPython implementation, since the zip() docs don't make any
promises. So updating the docs to include that information would probably be a
bigger issue, as it involves behaviour which is currently not defined by the
library.

Cheers,
Nick.

[1] http://www.python.org/dev/doc/devel/lib/built-in-funcs.html
 
David Isaac

Nick Coghlan said:
A bug report on Sourceforge would help in getting the problem fixed for the 2.5
docs
Done.


For the 'left-to-right' evaluation thing, that's technically an implementation
artifact of the CPython implementation, since the zip() docs don't make any
promises. So updating the docs to include that information would probably be a
bigger issue, as it involves behaviour which is currently not defined by the
library.

OK, thanks.

Alan Isaac
 
Nick Coghlan

David said:
A bug report on Sourceforge would help in getting the problem fixed for
the 2.5 docs

Done.

Bug 1121416, for anyone else interested. Looks like Raymond agrees with me
about the left-to-right evaluation of iterables being an overspecification.

Anyway, that means the zip and izip based solutions are technically version and
implementation specific. Fortunately, the final versions are fairly easy to turn
into a custom generator that doesn't rely on izip:

from itertools import islice, chain, repeat

def partition(iterable, part_len):
    itr = iter(iterable)
    while 1:
        item = tuple(islice(itr, part_len))
        if len(item) < part_len:
            raise StopIteration
        yield item

def padded_partition(iterable, part_len, pad_val=None):
    padding = repeat(pad_val, part_len-1)
    itr = chain(iter(iterable), padding)
    return partition(itr, part_len)

Py> list(partition(range(10), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
Py> list(padded_partition(range(10), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
Py> list(padded_partition(range(10), 3, True))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, True, True)]
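One note for later Python versions: since Python 3.7 (PEP 479), a StopIteration raised inside a generator body is turned into a RuntimeError, so the generator has to finish with a plain return instead. A sketch of the same two functions adjusted for Python 3, otherwise unchanged:

```python
from itertools import chain, islice, repeat

def partition(iterable, part_len):
    itr = iter(iterable)
    while True:
        item = tuple(islice(itr, part_len))
        if len(item) < part_len:
            # A bare return ends the generator cleanly; a trailing
            # partial group is dropped, as in the original.
            return
        yield item

def padded_partition(iterable, part_len, pad_val=None):
    # Padding with part_len - 1 values can complete a partial final
    # group but can never create an extra group on its own.
    padding = repeat(pad_val, part_len - 1)
    return partition(chain(iter(iterable), padding), part_len)

print(list(partition(range(10), 3)))
print(list(padded_partition(range(10), 3)))
```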

Well spotted on the fact that the way we were using zip/izip was undocumented,
btw :)

Cheers,
Nick.
 
