Re: problem in implementing multiprocessing

Discussion in 'Python' started by James Mills, Jan 19, 2009.

  1. James Mills

    James Mills Guest

    On Mon, Jan 19, 2009 at 3:50 PM, gopal mishra <> wrote:
    > I know this is not an I/O-bound problem; I am creating heavy objects in the
    > process, adding these objects to a queue and getting them back in my main
    > program via the queue.
    > You can test this sample code:
    >
    > import time
    > from multiprocessing import Process, Queue
    >
    > class Data(object):
    >     def __init__(self):
    >         self.y = range(1, 1000000)
    >
    > def getdata(queue):
    >     data = Data()
    >     queue.put(data)
    >
    > if __name__ == '__main__':
    >     t1 = time.time()
    >     d1 = Data()
    >     d2 = Data()
    >     t2 = time.time()
    >     print "without multiProcessing total time:", t2-t1
    >     # multiProcessing
    >     queue = Queue()
    >     Process(target=getdata, args=(queue,)).start()
    >     Process(target=getdata, args=(queue,)).start()
    >     s1 = queue.get()
    >     s2 = queue.get()
    >     t2 = time.time()
    >     print "multiProcessing total time::", t2-t1


    The reason your code above doesn't work as you
    expect and the multiprocessing part takes longer
    is because your Data objects are creating a list
    (a rather large list) of ints. Use xrange instead of range.
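
    For reference, the change being suggested is a one-line edit to the
    posted Data class (Python 2 syntax, matching the original post);
    xrange yields its numbers lazily instead of materializing a
    million-element list, so the object that gets pickled onto the Queue
    stays small:

        class Data(object):
            def __init__(self):
                # lazy range object instead of a list of ~1,000,000 ints
                self.y = xrange(1, 1000000)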

    Here's what I get (using xrange):

    $ python test.py
    without multiProcessing total time: 1.50203704834e-05
    multiProcessing total time:: 0.116630077362

    cheers
    James
    James Mills, Jan 19, 2009
    #1

  2. Carl Banks

    Carl Banks Guest

    On Jan 18, 10:00 pm, "James Mills" <> wrote:
    > On Mon, Jan 19, 2009 at 3:50 PM, gopal mishra <> wrote:
    snip
    >
    > The reason your code above doesn't work as you
    > expect and the multiprocessing part takes longer
    > is because your Data objects are creating a list
    > (a rather large list) of ints.


    I'm pretty sure gopal is creating a deliberately large object to use
    as a test case, so switching to xrange isn't going to help here.

    Since multiprocessing serializes and deserializes the data while
    passing it from process to process, passing very large objects would
    have very high latency and overhead. IOW, gopal's diagnosis is
    correct. It's just not practical to share very large objects among
    separate processes.
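
    To see that overhead directly, one can time the pickling step that
    multiprocessing.Queue performs behind the scenes. A rough sketch
    against the Data class from the original post (Python 2):

        import time
        import cPickle as pickle

        class Data(object):
            def __init__(self):
                self.y = range(1, 1000000)

        if __name__ == '__main__':
            d = Data()
            t1 = time.time()
            s = pickle.dumps(d, pickle.HIGHEST_PROTOCOL)  # roughly what put() does
            d2 = pickle.loads(s)                          # roughly what get() does
            t2 = time.time()
            print "pickle round-trip:", t2 - t1, "seconds for", len(s), "bytes"

    Much of the extra time in the multiprocessing version goes into this
    serialize/deserialize round trip, plus process startup.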

    For simple data like large arrays of floating point numbers, the data
    can be shared with an mmapped file or some other memory-sharing
    scheme, but actual Python objects can't be shared this way. If you
    have complex data (networks and hierarchies and such) it's a lot
    harder to share this information among processes.
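
    As a sketch of that kind of memory sharing for flat numeric data (not
    from the thread, just an illustration): multiprocessing.Array places a
    ctypes array in shared memory, so the children write into the buffer
    directly and nothing is pickled element by element.

        from multiprocessing import Process, Array

        def fill(shared, start, stop):
            # each child writes its half straight into the shared buffer
            for i in xrange(start, stop):
                shared[i] = i * 0.5

        if __name__ == '__main__':
            n = 1000000
            shared = Array('d', n, lock=False)   # 'd' = C double, no lock wrapper
            p1 = Process(target=fill, args=(shared, 0, n // 2))
            p2 = Process(target=fill, args=(shared, n // 2, n))
            p1.start(); p2.start()
            p1.join(); p2.join()
            print "last value:", shared[n - 1]

    This only works because the payload is a flat array of a single C
    type; as Carl says, a general Python object graph can't be shared
    this way.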


    Carl Banks
    Carl Banks, Jan 19, 2009
    #2

  3. Aaron Brady

    Aaron Brady Guest

    On Jan 19, 3:09 am, Carl Banks <> wrote:
    snip
    > Since multiprocessing serializes and deserializes the data while
    > passing it from process to process, passing very large objects would
    > have very high latency and overhead. IOW, gopal's diagnosis is
    > correct. It's just not practical to share very large objects among
    > separate processes.


    You could pass composite objects back and forth by passing them in
    pieces. You'd have to construct the object so that no single piece
    needs access to the entire data structure; that is, each piece only
    needs access to a few other small pieces.
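
    A hypothetical illustration of the piecewise idea (the chunk size and
    helper names here are invented): the worker ships the payload as
    modest chunks followed by a sentinel, so no single put()/get() has to
    carry the whole structure at once.

        from multiprocessing import Process, Queue

        CHUNK = 100000

        def produce(queue):
            data = range(1, 1000000)            # large payload built in the child
            for i in xrange(0, len(data), CHUNK):
                queue.put(data[i:i + CHUNK])    # one modest piece at a time
            queue.put(None)                     # sentinel: no more pieces

        if __name__ == '__main__':
            queue = Queue()
            Process(target=produce, args=(queue,)).start()
            result = []
            while True:
                piece = queue.get()
                if piece is None:
                    break
                result.extend(piece)
            print "reassembled", len(result), "items"

    The total serialization work is about the same; the point is that
    neither side ever needs the whole structure in a single message.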

    > For simple data like large arrays of floating point numbers, the data
    > can be shared with an mmapped file or some other memory-sharing
    > scheme, but actual Python objects can't be shared this way. If you
    > have complex data (networks and hierarchies and such) it's a lot
    > harder to share this information among processes.


    It wouldn't hurt to have a minimal set of Python objects that are
    'persistent live', that is, stored out of memory in their native
    form. The only problem is, they can't contain references to volatile
    objects. (I don't believe POSH addresses this.)
    Aaron Brady, Jan 19, 2009
    #3
