Pool Module: iterator does not yield consistently with different chunksizes

syockit

I've been playing around with custom iterators to map into Pool. When
I run the code below:

def arif(arr):
    return arr

def permutate(n):
    k = 0
    a = list(range(6))
    while k<n:
        for i in range(6):
            a.insert(0, a.pop(5)+6)
        #yield a[:] <-- produces correct results
        yield a
        k += 1
    return

def main():
    from multiprocessing import Pool
    pool = Pool()
    chksize = 15
    for x in pool.imap_unordered(arif, permutate(100), chksize):
        print(x)

if __name__=="__main__":
    main()

..... will output something like this:


[36, 37, 38, 39, 40, 41]
[36, 37, 38, 39, 40, 41]
[36, 37, 38, 39, 40, 41]
[36, 37, 38, 39, 40, 41]
[36, 37, 38, 39, 40, 41]
[36, 37, 38, 39, 40, 41]
[72, 73, 74, 75, 76, 77]
[72, 73, 74, 75, 76, 77]
[72, 73, 74, 75, 76, 77]
[72, 73, 74, 75, 76, 77]
[72, 73, 74, 75, 76, 77]
[72, 73, 74, 75, 76, 77]
[108, 109, 110, 111, 112, 113]
[108, 109, 110, 111, 112, 113]
[108, 109, 110, 111, 112, 113]
[108, 109, 110, 111, 112, 113]
[108, 109, 110, 111, 112, 113]
[108, 109, 110, 111, 112, 113]
[144, 145, 146, 147, 148, 149]

.... where each result is duplicated a number of times equal to the chunk
size, and the results in between are lost. Using a[:] instead, I get:

[6, 7, 8, 9, 10, 11]
[12, 13, 14, 15, 16, 17]
[18, 19, 20, 21, 22, 23]
[24, 25, 26, 27, 28, 29]
[30, 31, 32, 33, 34, 35]
[36, 37, 38, 39, 40, 41]
[42, 43, 44, 45, 46, 47]
[48, 49, 50, 51, 52, 53]

..... it comes out okay. Any explanation for such behavior?

Ahmad Syukri
 
Peter Otten


Python passes references around, not copies. Consider

from itertools import islice

it = permutate(100)
chunksize = 15
while True:
    # collect up to chunksize items before anything is dispatched
    chunk = tuple(islice(it, chunksize))
    if not chunk:
        break
    # dispatch items in chunk
    print(chunk)

All chunksize items of a chunk are calculated before the chunk is dispatched.
When you yield the same list every time in permutate(), the earlier items in
the chunk are that very list, so they see every change you make to it with
the intention of updating it to the next value.

Peter
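
A minimal sketch of the aliasing described above, not taken from the thread: it re-creates permutate() with the original "yield a", and collects one chunk with islice as in the loop above. The helper name first_chunk and the chunk size of 3 are assumptions chosen only for illustration; this is not how Pool is implemented internally.

```python
from itertools import islice


def permutate(n):
    """Yield the SAME list object n times, mutating it between yields."""
    a = list(range(6))
    for _ in range(n):
        for _ in range(6):
            a.insert(0, a.pop(5) + 6)
        yield a  # no copy: every yielded item is the same object


def first_chunk(iterable, chunksize):
    """Collect one chunk eagerly (hypothetical helper, for illustration)."""
    return tuple(islice(iterable, chunksize))


chunk = first_chunk(permutate(100), 3)

# Every slot of the chunk refers to one list object,
# so all slots show the state after the last update.
print(all(item is chunk[0] for item in chunk))
print(chunk)
```

Because all three slots hold the same object, the values from the first two steps are no longer visible anywhere in the chunk by the time it is complete.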
 
Dave Angel

While I didn't actually try to follow all your code, I suspect your
problem is that when you yield the same object multiple times, the
references are all being saved, and when they are finally evaluated they
all show the final value. If the values are supposed to be independent,
somebody has to copy the list, and the [:] does that.

DaveA
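
A small sketch of the copy-based fix mentioned above, under stated assumptions: permutate_copies and chunks are hypothetical names introduced here, and the chunking helper only imitates eager grouping with islice; it is not Pool's own code.

```python
from itertools import islice


def permutate_copies(n):
    """Like permutate() from the question, but yields a new list each time."""
    a = list(range(6))
    for _ in range(n):
        for _ in range(6):
            a.insert(0, a.pop(5) + 6)
        yield a[:]  # shallow copy: later changes to `a` cannot affect it


def chunks(iterable, chunksize):
    """Group items eagerly into tuples (hypothetical helper, not Pool's code)."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, chunksize))
        if not chunk:
            return
        yield chunk


for chunk in chunks(permutate_copies(4), 3):
    # each item in a chunk is now a distinct list with its own values
    print(chunk)
```

A generator written this way could be passed to pool.imap_unordered(arif, ..., chksize) in place of permutate(100); the original post reports that yielding a[:] gave the expected results. A shallow copy is enough here because the list holds only integers.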