multiprocessing and dictionaries


Bjorn Meyer

I am trying to convert a piece of code that currently uses the thread module
to the multiprocessing module.

The way I have it set up, a chunk of code reads a text file and assigns
multiple values from the file to each dictionary key. I am using locks to
write the values to the dictionary.
The values are written as follows:
mydict.setdefault(key, []).append(value)

The problem that I have run into is that using multiprocessing, the key gets
set, but the values don't get appended.
I've even tried the Manager().dict() option, but it doesn't seem to work.

Is this not supported at this time or am I missing something?

Thanks in advance.

Bjorn
 

Piet van Oostrum

I think you should give more information. Try to make a *minimal* program
that shows the problem and include it in your posting or supply a
download link.
 

Bjorn Meyer

Here is what I have been using as a test.
This pretty much mimics what I am trying to do.
I put both threading and multiprocessing in the example which shows the output
that I am looking for.

#!/usr/bin/env python

import threading
from multiprocessing import Manager, Process

name = ('test1','test2','test3')
data1 = ('dat1','dat2','dat3')
data2 = ('datA','datB','datC')

def thread_test(name,data1,data2, d):
    for nam in name:
        for num in range(0,3):
            d.setdefault(nam, []).append(data1[num])
            d.setdefault(nam, []).append(data2[num])
    print 'Thread test dict:',d

def multiprocess_test(name,data1,data2, mydict):
    for nam in name:
        for num in range(0,3):
            mydict.setdefault(nam, []).append(data1[num])
            mydict.setdefault(nam, []).append(data2[num])
    print 'Multiprocess test dic:',mydict

if __name__ == '__main__':
    mgr = Manager()
    md = mgr.dict()
    d = {}

    m = Process(target=multiprocess_test, args=(name,data1,data2,md))
    m.start()
    t = threading.Thread(target=thread_test, args=(name,data1,data2,d))
    t.start()

    m.join()
    t.join()

    print 'Thread test:',d
    print 'Multiprocess test:',md


Thanks
Bjorn
 

Piet van Oostrum

Bjorn Meyer said:
BM> def multiprocess_test(name,data1,data2, mydict):
BM>     for nam in name:
BM>         for num in range(0,3):
BM>             mydict.setdefault(nam, []).append(data1[num])
BM>             mydict.setdefault(nam, []).append(data2[num])
BM>     print 'Multiprocess test dic:',mydict

I guess what's happening is this:

mydict.setdefault(nam, []) returns a list, initially an empty list ([]).
This list gets appended to. However, the list you get back is a local copy
inside the multiprocess_test process, so the append is not reflected in the
original list held by the manager, and all your updates get lost.
You will have to do operations directly on the dictionary itself, not on
any intermediary objects. Of course with threading the situation is
different, as all operations are local.
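
To make the explanation above concrete, here is a minimal sketch that shows the effect with a Manager alone. It uses Python 3 syntax (the thread's code is Python 2), and the function name and the keys 'lost' and 'kept' are made up for illustration only.

```python
from multiprocessing import Manager


def show_copy_semantics():
    """Return the managed dict's contents after two styles of update."""
    with Manager() as mgr:
        shared = mgr.dict()

        # setdefault() stores [] in the manager process, but the list that
        # comes back through the proxy is a copy, so append() only changes
        # that local copy.
        shared.setdefault('lost', []).append('dat1')

        # Fetching the list, changing it locally and assigning it back goes
        # through the proxy's item assignment, so the manager sees the update.
        values = shared.setdefault('kept', [])
        values.append('dat1')
        shared['kept'] = values

        # Copy into a plain dict before the manager shuts down.
        return dict(shared)


if __name__ == '__main__':
    print(show_copy_semantics())
```

The 'lost' key ends up with an empty list, while the 'kept' key holds the appended value, because only the second form writes the changed list back through the dictionary proxy.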

This works:

def multiprocess_test(name,data1,data2, mydict):
    print name, data1, data2
    for nam in name:
        for num in range(0,3):
            mydict.setdefault(nam, [])
            mydict[nam] += [data1[num]]
            mydict[nam] += [data2[num]]
    print 'Multiprocess test dic:',mydict

If you have more than one process operating on the dictionary
simultaneously you have to beware of race conditions!!
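
One way to guard against the race mentioned above is to put the read and the write-back under a lock created by the same manager. The following is only a sketch in Python 3 syntax, not code from this thread: the helper names append_value, worker and run_demo, the key 'test1' and the value tags are all hypothetical.

```python
from multiprocessing import Manager, Process


def append_value(shared, lock, key, value):
    """Append value to shared[key] while holding the lock."""
    with lock:
        # Read the current list (a copy), extend it, and write it back.
        # Without the lock another process could write between these steps.
        shared[key] = shared.get(key, []) + [value]


def worker(shared, lock, key, values):
    for value in values:
        append_value(shared, lock, key, value)


def run_demo(count=50):
    """Let two processes append to the same key; return the final list."""
    with Manager() as mgr:
        shared = mgr.dict()
        lock = mgr.Lock()
        procs = [
            Process(target=worker,
                    args=(shared, lock, 'test1',
                          ['%s%d' % (tag, i) for i in range(count)]))
            for tag in ('a', 'b')
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        return list(shared['test1'])


if __name__ == '__main__':
    print(len(run_demo()))
```

Without the lock, two processes can both read the same old list and each write back their own extended copy, so one of the two appends is silently overwritten.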
 

Bjorn Meyer

Excellent. That works perfectly.

Thank you for your response, Piet.

Bjorn
 
