inverse of izip


Steven Bethard

So I know that zip(*) is the inverse of zip(), e.g.:
>>> zip(*zip(range(10), range(10)))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]

What's the inverse of izip? Of course, I could use zip(*) or izip(*),
e.g.:
>>> zip(*itertools.izip(range(10), range(10)))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
>>> x, y = itertools.izip(*itertools.izip(range(10), range(10)))
>>> x, y
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9))

But then I get a pair of tuples, not a pair of iterators. Basically,
I want to convert an iterator of tuples into a tuple of iterators.

Steve
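
For readers on Python 3, where the built-in zip() is lazy in the same way itertools.izip was in Python 2, the behaviour described above can be reproduced with the following minimal sketch. It assumes Python 3 and only illustrates the problem, it is not a solution:

```python
# Python 3 sketch: built-in zip() is lazy, like itertools.izip in Python 2.
pairs = zip(range(10), range(10))   # an iterator of 2-tuples

# Unpacking with * consumes `pairs` completely before the outer zip() is
# called, and each result is a fully built tuple, not an iterator.
x, y = zip(*pairs)

print(type(x).__name__)        # tuple
print(x == tuple(range(10)))   # True
```

So the round trip works, but everything ends up in memory as tuples, which is exactly what the question wants to avoid.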
 

Satchidanand Haridas

Steven said:
So I know that zip(*) is the inverse of zip(), e.g.:
>>> zip(*zip(range(10), range(10)))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]

What's the inverse of izip? Of course, I could use zip(*) or izip(*),
e.g.:
>>> zip(*itertools.izip(range(10), range(10)))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
>>> x, y = itertools.izip(*itertools.izip(range(10), range(10)))
>>> x, y
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9))

But then I get a pair of tuples, not a pair of iterators. Basically,
I want to convert an iterator of tuples into a tuple of iterators.

Steve

---------------------------------
Traceback (most recent call last):
File "<stdin>", line 1, in ?
StopIteration

-----------------------------


Regards,
Satchit


----
Satchidanand Haridas (sharidas at zeomega dot com)

ZeOmega (www.zeomega.com)
Open Minds' Open Solutions

#20,Rajalakshmi Plaza,
South End Road,
Basavanagudi,
Bangalore-560 004, India
 

Steven Bethard

I'm assuming you popped this one off without actually reading my
email. No worries - it happens sometimes. You'll note, however, that
this is exactly what I said didn't work:

Steven said:
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9))

But then I get a pair of tuples, not a pair of iterators. Basically,
I want to convert an iterator of tuples into a tuple of iterators.

I want the elements returned by the itertools.izip object to be
iterators, not tuples or lists.

Steve
 

Steven Bethard

Steven Bethard said:
What's the inverse of izip? Of course, I could use zip(*) or izip(*),
e.g.:
>>> zip(*itertools.izip(range(10), range(10)))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
>>> x, y = itertools.izip(*itertools.izip(range(10), range(10)))
>>> x, y
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9))

But then I get a pair of tuples, not a pair of iterators. Basically,
I want to convert an iterator of tuples into a tuple of iterators.

Sorry to respond to myself, but after playing around with itertools for a
while, this seems to work:
>>> import itertools
>>> starzip = lambda iterables: ((tuple[i] for tuple in itr) for i, itr in enumerate(itertools.tee(iterables)))
>>> starzip(itertools.izip(range(10), range(10)))
>>> x, y = starzip(itertools.izip(range(10), range(10)))
>>> x
>>> list(x)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(y)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Seems like a bit of work for the inverse of izip though so I'll wait to see if
anyone else has a better solution. (Not to mention, it wouldn't be a single
line solution if I wasn't using 2.4...)

Steve
 

Satchidanand Haridas

Hi,

How about using iter() to get another solution like the following:
>>> starzip2 = lambda it: tuple([iter(x) for x in itertools.izip(*it)])
>>> l, m = starzip2(itertools.izip(range(10), range(10)))
>>> l
>>> list(l)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(m)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Thanks,

Satchit





Steven Bethard

Satchidanand Haridas said:
How about using iter() to get another solution like the following:
>>> starzip2 = lambda it: tuple([iter(x) for x in itertools.izip(*it)])
>>> l, m = starzip2(itertools.izip(range(10), range(10)))
>>> l
>>> list(l)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(m)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


Unfortunately, I think this exhausts the iterators too early because it
applies * to the iterator:
>>> def range10():
...     for i in range(10):
...         yield i
...     print "exhausted"
...
>>> l, m = starzip2(itertools.izip(range10(), range10()))
exhausted

I believe we only get one "exhausted" because as soon as one iterator is used
up with izip, the next iterator is discarded. But we are hitting "exhausted"
before we ever ask for an element from the starzip2 iterators, so it looks to
me like all the pairs from the first iterator are read into memory before the
second iterators are ever accessed...

Steve
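
The effect described above can be checked on Python 3 with a small sketch. It assumes zip() as the stand-in for itertools.izip, and it records the "exhausted" event in a list (the hypothetical name log) instead of printing it, so the result is easy to inspect:

```python
log = []

def range10():
    """Yield 0..9, then record that the generator ran to completion."""
    for i in range(10):
        yield i
    log.append("exhausted")

def starzip2(it):
    # Same shape as the starzip2 lambda above, with zip standing in for izip.
    return tuple(iter(x) for x in zip(*it))

l, m = starzip2(zip(range10(), range10()))

# Nothing has been requested from l or m yet, but the source is already drained.
print(log)  # ['exhausted']
```

Only one "exhausted" is recorded because zip() stops as soon as the first generator finishes and never resumes the second one.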
 

Peter Otten

Steven said:
Steven Bethard said:
What's the inverse of izip? Of course, I could use zip(*) or izip(*),
e.g.:
zip(*itertools.izip(range(10), range(10)))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
x, y = itertools.izip(*itertools.izip(range(10), range(10)))
x, y
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9))

But then I get a pair of tuples, not a pair of iterators. Basically,
I want to convert an iterator of tuples into a tuple of iterators.

Sorry to respond to myself, but after playing around with itertools for a
while, this seems to work:
>>> import itertools
>>> starzip = lambda iterables: ((tuple[i] for tuple in itr) for i, itr in enumerate(itertools.tee(iterables)))
>>> starzip(itertools.izip(range(10), range(10)))
>>> x, y = starzip(itertools.izip(range(10), range(10)))
>>> x
>>> list(x)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(y)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Seems like a bit of work for the inverse of izip though so I'll wait to
see if anyone else has a better solution. (Not to mention, it wouldn't be a
single line solution if I wasn't using 2.4...)


Because Python supports function definitions you only have to do it once :)

However, your sample data is badly chosen. Unless I have made a typo
repeating your demo, you are getting the same (last) sequence twice due to
late binding of i.
>>> import itertools as it
>>> def starzip(iterables):
...     return ((t[i] for t in itr) for (i, itr) in
...             enumerate(it.tee(iterables)))
...
>>> map(list, starzip(it.izip("123", "abc")))
[['1', '2', '3'], ['a', 'b', 'c']]
>>> x, y = starzip(it.izip("123", "abc"))
>>> list(x)
['a', 'b', 'c']
>>> list(y)
['a', 'b', 'c']

Here's my fix.

# requires Python 2.4
import itertools as it

def cut(itr, index):
    # avoid late binding of index
    return (item[index] for item in itr)

def starzip(tuples):
    a, b = it.tee(tuples)
    try:
        tuple_len = len(a.next())
    except StopIteration:
        raise ValueError(
            "starzip() does not allow an empty sequence as argument")
    t = it.tee(b, tuple_len)
    return (cut(itr, index) for (index, itr) in enumerate(t))

a, b, c = starzip(it.izip("abc", [1,2,3], "xyz"))
print a, b, c
assert list(a) == list("abc")
assert list(b) == [1, 2, 3]
assert list(c) == list("xyz")

Peter
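
For anyone reading this on Python 3, the same idea might be ported roughly as follows. This is a sketch, not Peter's original code: it assumes zip() in place of it.izip, uses next() instead of the .next() method, and returns a tuple rather than a generator so the result is literally a tuple of iterators.

```python
import itertools

def cut(itr, index):
    # index is a function argument, so each generator keeps its own value.
    return (item[index] for item in itr)

def starzip(tuples):
    """Turn an iterator of equal-length tuples into a tuple of iterators."""
    probe, rest = itertools.tee(tuples)
    try:
        # Peek at the first tuple to learn how many columns there are.
        width = len(next(probe))
    except StopIteration:
        raise ValueError(
            "starzip() does not allow an empty sequence as argument")
    columns = itertools.tee(rest, width)
    return tuple(cut(itr, index) for index, itr in enumerate(columns))

a, b, c = starzip(zip("abc", [1, 2, 3], "xyz"))
print(list(a), list(b), list(c))
# ['a', 'b', 'c'] [1, 2, 3] ['x', 'y', 'z']
```

As in the original, the number of columns is taken from the first tuple only, so all tuples are assumed to have the same length.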
 

Satchidanand Haridas

Steven said:
How about using iter() to get another solution like the following:
>>> starzip2 = lambda it: tuple([iter(x) for x in itertools.izip(*it)])
>>> l, m = starzip2(itertools.izip(range(10), range(10)))
>>> list(l)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(m)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


Unfortunately, I think this exhausts the iterators too early because it
applies * to the iterator:
Could you expand on what you mean by exhaust the iterators too early?

The reason I ask is that the * operator is applied to
((1,1),(2,2),....(9,9)). The operation of the
itertools.izip(range10(),range10()) is completed before the * operation
is applied. And the iter() simply converts the result of the inverse
izip operation into an iterator. I hope the above was not too
confusing. :)

I am trying to understand a little about the izip myself. Thanks.

Satchit

>>> def range10():
...     for i in range(10):
...         yield i
...     print "exhausted"
...
>>> l, m = starzip2(itertools.izip(range10(), range10()))
exhausted

I believe we only get one "exhausted" because as soon as one iterator is used
up with izip, the next iterator is discarded. But we are hitting "exhausted"
before we ever ask for an element from the starzip2 iterators, so it looks to
me like all the pairs from the first iterator are read into memory before the
second iterators are ever accessed...

Steve

Could you expand on what you mean by exhaust the iterators too early?

The reason I ask is that the * operator is applied to the tuple
((1,1),(2,2),...(9,9)). Actually to the iterator which is called 10
times, each time returning (i,i) for 0<=i<10. When the iterator is
called the 11th time, it prints "exhausted".

So the operation of the itertools.izip(range10(),range10()) is completed
and "exhausted" is printed before the * operation is applied. The iter()
simply converts the result of the inverse izip operation into an
iterator. I hope the above was not too confusing. :)

I am trying to understand what goes on inside izip myself. Thanks.

Regards,
Satchit
 

Steven Bethard

Peter Otten said:
However, your sample data is badly chosen. Unless I have made a typo
repeating your demo, you are getting the same (last) sequence twice due to
late binding of i.
[snip]
>>> map(list, starzip(it.izip("123", "abc")))
[['1', '2', '3'], ['a', 'b', 'c']]
>>> x, y = starzip(it.izip("123", "abc"))
>>> list(x)
['a', 'b', 'c']
>>> list(y)
['a', 'b', 'c']

I knew there was something funny about binding in generators, but I
couldn't remember what... Could you explain why 'map(list, ...)'
works, but 'x, y = ...' doesn't? I read the PEP, but I'm still not
clear on this point.

Thanks,

Steve
 

Steven Bethard

Satchidanand Haridas said:
Could you expand on what you mean by exhaust the iterators too early?

The reason I ask is that the * operator is applied to
((1,1),(2,2),....(9,9)). The operation of the
itertools.izip(range10(),range10()) is completed before the * operation
is applied. And the iter() simply converts the result of the inverse
izip operation into an iterator. I hope the above was not too
confusing. :)

Yeah, the difference is a little subtle here. What we have before you
use the * operator is an iterator that will yield (1,1) then (2,2) up
to (9,9). Note that we don't actually have the tuple
((1,1),(2,2),....(9,9)) yet, just an iterator that will produce the
same elements. If your list is very large and you don't want to keep
it all in memory at once, it's crucial that we have the iterator here,
not the tuple.

When you use the * operator, Python converts the iterable following
the * into the argument list of the function. This means that if
you're using an iterable, it reads all of the elements of the iterable
into memory at once. That's why my range10 iterators printed
"exhausted" after the * application -- all their elements had been
read into memory. Again, if your list is very large, this is a bad
thing because you now have all the elements of the list in memory at
the same time. My other solution (well, Peter Otten's correction of
my solution) never has the whole list in memory at the same time --
each time enumerate generates a tuple and its index, each of the
iterators returned by starzip generates its appropriate items.[*]

Steve

[*] Of course, if you exhaust one of the iterators before the others,
itertools.tee's implicit cache will actually store all the elements,
so starzip would really only be efficient if you wanted to iterate
through the sub-iterators in lockstep. This means you'd probably want
to itertools.izip them back together at some point, but being able to
starzip them means you can wrap the individual iterators with extra
functionality if necessary.
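
The caveat in the footnote can be observed with a short Python 3 sketch. The names source and pulled are hypothetical helpers introduced here to count how many pairs have actually been produced:

```python
import itertools

pulled = []

def source():
    # Record each pair at the moment it is actually produced.
    for i in range(5):
        pulled.append(i)
        yield (i, i * 10)

a, b = itertools.tee(source())
xs = (t[0] for t in a)
ys = (t[1] for t in b)

# Lockstep: one step on each side pulls a single pair from the source.
first = (next(xs), next(ys))
print(first, len(pulled))    # (0, 0) 1

# Draining one side forces tee to buffer every remaining pair for the other.
rest_xs = list(xs)
print(rest_xs, len(pulled))  # [1, 2, 3, 4] 5
```

After the second step all five pairs have been pulled even though ys has only delivered one value, so the remaining pairs sit in tee's internal buffer until ys catches up.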
 

Peter Otten

Steven said:
Peter Otten said:
However, your sample data is badly chosen. Unless I have made a typo
repeating your demo, you are getting the same (last) sequence twice due
to late binding of i.
[snip]
>>> map(list, starzip(it.izip("123", "abc")))
[['1', '2', '3'], ['a', 'b', 'c']]
>>> x, y = starzip(it.izip("123", "abc"))
>>> list(x)
['a', 'b', 'c']
>>> list(y)
['a', 'b', 'c']

I knew there was something funny about binding in generators, but I
couldn't remember what... Could you explain why 'map(list, ...)'
works, but 'x, y = ...' doesn't? I read the PEP, but I'm still not
clear on this point.

Maybe the following example can illustrate what I think is going on:

import itertools as it

def starzip(iterables):
    return ((t[i] for t in itr)
            for (i, itr) in
            enumerate(it.tee(iterables)))

# the order of calls equivalent to map(list, starzip(...))
s = starzip(it.izip("abc", "123"))
x = s.next()
# the local variable i in starzip() shared by x and y
# is now 0
print x.next(),
print x.next(),
print x.next()
y = s.next()
# i is now 1, but because no further calls to x.next()
# will occur it doesn't matter
print y.next(),
print y.next(),
print y.next()

s = starzip(it.izip("abc", "123"))
x = s.next() # i is 0
y = s.next() # i is 1
# both x and y yield t[1]
print x.next(),
print x.next(),
print x.next()

print y.next(),
print y.next(),
print y.next()

You can model the nested generator expressions' behaviour with the following
function - which I think is much clearer.

def starzip(iterables):
    def inner(itr):
        for t in itr:
            yield t[i]

    for (i, itr) in enumerate(it.tee(iterables)):
        yield inner(itr)

Note how itr is passed explicitly, i.e. it is not affected by later
rebindings in starzip(), whereas i is looked up in inner()'s surrounding
namespace at every yield.

Peter
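
The same late-binding behaviour can be reproduced on Python 3, where nested generator expressions follow the same scoping rules. The sketch below assumes zip() in place of it.izip and uses the hypothetical name starzip_buggy for the nested-generator version:

```python
import itertools

def starzip_buggy(pairs):
    # i is a free variable of the inner generator expression. It is looked up
    # when each item is produced, not when the inner generator is created.
    return ((t[i] for t in itr) for i, itr in enumerate(itertools.tee(pairs)))

# Consuming each sub-iterator before asking for the next one hides the problem.
one_at_a_time = [list(g) for g in starzip_buggy(zip("123", "abc"))]
print(one_at_a_time)     # [['1', '2', '3'], ['a', 'b', 'c']]

# Unpacking creates both sub-iterators first, so i is already 1 for both.
x, y = starzip_buggy(zip("123", "abc"))
print(list(x), list(y))  # ['a', 'b', 'c'] ['a', 'b', 'c']
```

Passing the index through a function argument, as cut() does in the fix earlier in the thread, gives each sub-iterator its own copy of the value.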
 

Steven Bethard

Peter Otten said:
You can model the nested generator expressions' behaviour with the following
function - which I think is much clearer.

def starzip(iterables):
    def inner(itr):
        for t in itr:
            yield t[i]

    for (i, itr) in enumerate(it.tee(iterables)):
        yield inner(itr)

Note how itr is passed explicitly, i.e. it is not affected by later
rebindings in starzip(), whereas i is looked up in inner()'s surrounding
namespace at every yield.


Thanks, that was really helpful! It also clarifies why your solution works
right; your code basically does:

def starzip(iterables):
    def inner(itr, i):
        for t in itr:
            yield t[i]

    for i, itr in enumerate(itertools.tee(iterables)):
        yield inner(itr, i)

where i is now passed explicitly too.

Thanks again,

Steve
 

Satchidanand Haridas

Hi Steve,

Thanks for the explanation. I understand izip a little better now.

Regards,
Satchit



 
