Is this a bug in multiprocessing or in my script?

Discussion in 'Python' started by erikcw, Aug 5, 2009.

  1. erikcw

    erikcw Guest

    Hi,

    I'm trying to get multiprocessing to work consistently with my
    script. I keep getting random tracebacks with no helpful
    information. Sometimes it works, sometimes it doesn't.

    Traceback (most recent call last):
    File "scraper.py", line 144, in <module>
    print pool.map(scrape, range(10))
    File "/usr/lib/python2.6/multiprocessing/pool.py", line 148, in map
    return self.map_async(func, iterable, chunksize).get()
    File "/usr/lib/python2.6/multiprocessing/pool.py", line 422, in get
    raise self._value
    TypeError: expected string or buffer

    It's not always the same traceback, but they are always short like
    this. I'm running Python 2.6.2 on Ubuntu 9.04.

    Any idea how I can debug this?

    Thanks!
    Erik
     
    erikcw, Aug 5, 2009
    #1

  2. erikcw

    sturlamolden Guest

    On Aug 5, 4:37 am, erikcw <> wrote:

    > It's not always the same traceback, but they are always short like
    > this. I'm running Python 2.6.2 on Ubuntu 9.04.
    >
    > Any idea how I can debug this?


    In my experience, multiprocessing is fragile. Scripts tend to fail
    for no obvious reason, cause processes to be orphaned and linger,
    leak system-wide resources, etc. For example, multiprocessing uses
    os._exit to stop a spawned process, even though it inevitably
    results in resource leaks on Linux (it should use sys.exit). Gaël
    Varoquaux and I noticed this when we implemented shared memory
    ndarrays for numpy; we consistently got memory leaks with System V
    IPC for no obvious reason. Even after Jesse Noller was informed of
    the problem (about half a year ago), the bug still lingers. It is
    easy to edit multiprocessing's forking.py file on your own, but bugs
    like this are a pain in the ass, and I suspect multiprocessing has
    many of them. Of course, unless you show us your whole script,
    identifying the source of your bug will be impossible. But it may
    very likely be in multiprocessing as well. The quality of this
    module is not impressive. I am beginning to think that
    multiprocessing should never have made it into the Python standard
    library. The GIL cannot be that bad! If you can't stand the GIL, get
    a Unix (or Mac, Linux, Cygwin) and use os.fork. Or simply switch to
    a non-GIL Python: IronPython or Jython.

    Allow me to show you something better. With os.fork we can write code
    like this:

    class parallel(object):

        def __enter__(self):
            # call os.fork

        def __exit__(self, exc_type, exc_value, traceback):
            # call sys.exit in the child processes and
            # os.waitpid in the parent

        def __call__(self, iterable):
            # return different sub-sequences depending on
            # child or parent status


    with parallel() as p:
        # parallel block starts here

        for item in p(iterable):
            # whatever

        # parallel block ends here
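    Filled in, the skeleton might look like this. This is a sketch of
    the idea, not sturlamolden's actual code; the names nprocs and rank
    and the round-robin slicing are my assumptions, and it is POSIX-only
    (it needs os.fork):

    ```python
    import os

    class parallel(object):
        # Fork-based parallel block (POSIX only). 'nprocs' and 'rank'
        # are assumed names, not from the original post.

        def __init__(self, nprocs=2):
            self.nprocs = nprocs
            self.rank = 0            # the parent keeps rank 0

        def __enter__(self):
            self.pids = []
            for r in range(1, self.nprocs):
                pid = os.fork()
                if pid == 0:         # child: record our rank, stop forking
                    self.rank = r
                    self.pids = []
                    break
                self.pids.append(pid)
            return self

        def __call__(self, iterable):
            # hand each process every nprocs-th item, offset by its rank
            return list(iterable)[self.rank::self.nprocs]

        def __exit__(self, exc_type, exc_value, tb):
            if self.rank:            # children terminate at the block's end
                os._exit(0)
            for pid in self.pids:    # the parent reaps its children
                os.waitpid(pid, 0)
            return False
    ```

    Replacing `parallel` with a dummy context manager whose `__call__`
    returns the whole iterable gives the serial version the post
    mentions for development and testing.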

    This makes parallel code a lot cleaner than anything you can do with
    multiprocessing, allowing you to use constructs similar to OpenMP.
    Further, if you make 'parallel' a dummy context manager, you can
    develop and test the algorithms serially. The only drawback is that
    you have to use Cygwin to get os.fork on Windows, and forking will be
    less efficient (no copy-on-write optimization). Well, this is just one
    example of why Windows sucks from the perspective of the programmer.
    But it also shows that you can do much better by not using
    multiprocessing at all.

    The only case I can think of where multiprocessing would be useful
    is I/O bound code on Windows. But here you will almost always resort
    to C extension modules. For I/O bound code, Python tends to give you
    a 200x speed penalty over C. If you are resorting to C anyway, you
    can just use OpenMP in C for your parallel processing. We can thus
    forget about multiprocessing here as well, given that we have access
    to the C code. If we don't, it is still very likely that the C code
    releases the GIL, and we can get away with using Python threads
    instead of multiprocessing.

    IMHO, if you are using multiprocessing, you are very likely to have a
    design problem.

    Regards,
    Sturla
     
    sturlamolden, Aug 5, 2009
    #2

  3. erikcw

    Jesse Noller Guest

    On Aug 5, 1:21 am, sturlamolden <> wrote:
    > On Aug 5, 4:37 am, erikcw <> wrote:
    > > It's not always the same traceback, but they are always short like
    > > this.  I'm running Python 2.6.2 on Ubuntu 9.04.
    > >
    > > Any idea how I can debug this?
    >
    > In my experience, multiprocessing is fragile. Scripts tend to fail
    > for no obvious reason, cause processes to be orphaned and linger,
    > leak system-wide resources, etc.
    > [snip]


    Sturla;

    That bug was fixed unless I'm missing something. Also, patches and
    continued bug reports are welcome.

    jesse
     
    Jesse Noller, Aug 5, 2009
    #3
  4. erikcw

    sturlamolden Guest

    sturlamolden, Aug 5, 2009
    #4
  5. erikcw

    sturlamolden Guest

  6. erikcw

    ryles Guest

    On Aug 4, 10:37 pm, erikcw <> wrote:
    > Traceback (most recent call last):
    >   File "scraper.py", line 144, in <module>
    >     print pool.map(scrape, range(10))
    >   File "/usr/lib/python2.6/multiprocessing/pool.py", line 148, in map
    >     return self.map_async(func, iterable, chunksize).get()
    >   File "/usr/lib/python2.6/multiprocessing/pool.py", line 422, in get
    >     raise self._value
    > TypeError: expected string or buffer


    This is almost certainly due to your scrape call raising an
    exception. In the parent process, multiprocessing will detect when
    one of its workers has terminated with an exception and then
    re-raise it. However, only the exception, and not the original
    traceback, is made available, which makes debugging more difficult
    for you. Here's a simple example which demonstrates this behavior:

    >>> from multiprocessing import Pool
    >>> def evil_on_8(x):
    ...     if x == 8: raise ValueError("I DONT LIKE THE NUMBER 8")
    ...     return x + 1
    ...
    >>> pool = Pool(processes=4)
    >>> pool.map(evil_on_8, range(5))
    [1, 2, 3, 4, 5]
    >>> pool.map(evil_on_8, range(10)) # 8 will cause evilness.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/bb/real/3ps/lib/python2.6/multiprocessing/pool.py", line 148, in map
        return self.map_async(func, iterable, chunksize).get()
      File "/bb/real/3ps/lib/python2.6/multiprocessing/pool.py", line 422, in get
        raise self._value
    ValueError: I DONT LIKE THE NUMBER 8
    >>>

    My recommendation is that you wrap your scrape code inside a
    try/except and log any exception. I usually do this with
    logging.exception(), or, if logging is not in use, the traceback
    module. After that you can simply re-raise it.
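    A sketch of that pattern (this scrape is a hypothetical stand-in for
    the OP's function, and the ZeroDivisionError is just an example
    failure):

    ```python
    import logging

    logging.basicConfig(level=logging.ERROR)

    def scrape(x):
        # stand-in for the OP's real scraper
        return 100 // x          # fails when x == 0

    def logged_scrape(x):
        # wrap the real work: log the full traceback in the worker,
        # then re-raise so Pool.map still sees the failure
        try:
            return scrape(x)
        except Exception:
            logging.exception("scrape(%r) failed in worker", x)
            raise
    ```

    Then pass `logged_scrape` instead of `scrape` to `pool.map`, and
    the worker's log will contain the traceback that `Pool` discards.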
     
    ryles, Aug 5, 2009
    #6
  7. >>>>> sturlamolden <> (s) wrote:

    >s> On 5 Aug, 15:40, Jesse Noller <> wrote:
    >>> Sturla;
    >>>
    >>> That bug was fixed unless I'm missing something.


    >s> It is still in SVN. Change every call to os._exit to sys.exit
    >s> please. :)


    Calling os.exit in a child process may be dangerous. It can cause
    unflushed buffers to be flushed twice: once in the parent and once in
    the child.
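    The double flush is easy to reproduce (my illustration, not part of
    Piet's post): write to a block-buffered stdout, fork, and let the
    child exit via sys.exit; both processes then flush their own copy of
    the buffer.

    ```python
    import subprocess
    import sys

    # In this child script, "hello " sits in the stdio buffer at fork
    # time (stdout is a pipe, hence block-buffered), so the child
    # flushes it at sys.exit and the parent flushes it again at exit.
    CODE = """\
    import os, sys
    sys.stdout.write("hello ")
    pid = os.fork()
    if pid == 0:
        sys.exit(0)
    os.waitpid(pid, 0)
    """

    def run():
        # run the script in a fresh interpreter and capture its output;
        # POSIX only, since the script uses os.fork
        return subprocess.check_output([sys.executable, "-c", CODE]).decode()
    ```

    On Linux `run()` returns `"hello hello "`: the single write appears
    twice, once per process.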
    --
    Piet van Oostrum <>
    URL: http://pietvanoostrum.com [PGP 8DAE142BE17999C4]
    Private email:
     
    Piet van Oostrum, Aug 5, 2009
    #7
  8. erikcw

    Jesse Noller Guest

    On Aug 5, 3:40 pm, sturlamolden <> wrote:
    > On 5 Aug, 21:36, sturlamolden <> wrote:
    >
    > >http://svn.python.org/view/python/branches/release26-maint/Lib/multip...

    >
    > >http://svn.python.org/view/python/branches/release31-maint/Lib/multip...

    >
    > http://svn.python.org/view/python/trunk/Lib/multiprocessing/forking.p...


    Since the bug was never filed in the tracker (it was sent to my
    personal mail box, and I dropped it - sorry), I've filed a new one:

    http://bugs.python.org/issue6653

    In the future please use the bug tracker to file and track bugs with,
    so things are not as lossy.

    jesse
     
    Jesse Noller, Aug 5, 2009
    #8
  9. erikcw

    sturlamolden Guest

    On 5 Aug, 22:07, Piet van Oostrum <> wrote:

    > Calling os.exit in a child process may be dangerous. It can cause
    > unflushed buffers to be flushed twice: once in the parent and once in
    > the child.


    I assume you mean sys.exit. If this is the case, multiprocessing
    needs a mechanism to choose between os._exit and sys.exit for child
    processes. Calling os._exit might also be dangerous, because it
    could prevent necessary clean-up code from executing (e.g. in C
    extensions). I had a case where shared memory on Linux (System V
    IPC) leaked due to os._exit: the deallocator for my extension type
    never got to execute in child processes. The deallocator was needed
    to release the shared segment when its reference count dropped to 0.
    Changing to sys.exit solved the problem. On Windows there was no
    leak, because the kernel did the reference counting.
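    The cleanup-skipping behaviour itself is easy to demonstrate (an
    illustrative sketch, not sturlamolden's code): register an atexit
    hook in a fresh interpreter and compare the two exit calls.

    ```python
    import subprocess
    import sys

    TEMPLATE = """\
    import atexit, os, sys
    atexit.register(lambda: sys.stdout.write("cleanup ran"))
    {exit_call}
    """

    def exit_output(exit_call):
        # run a tiny script in a fresh interpreter and capture its output
        code = TEMPLATE.format(exit_call=exit_call)
        return subprocess.check_output([sys.executable, "-c", code]).decode()

    # exit_output("sys.exit(0)")  -> "cleanup ran"  (atexit hooks execute)
    # exit_output("os._exit(0)")  -> ""             (process dies immediately)
    ```

    The same skipping applies to any finalizer that runs at interpreter
    shutdown, which is why os._exit can leak resources that sys.exit
    would release.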
     
    sturlamolden, Aug 5, 2009
    #9
  10. erikcw

    sturlamolden Guest

    On 5 Aug, 22:28, Jesse Noller <> wrote:

    > http://bugs.python.org/issue6653
    >
    > In the future please use the bug tracker to file and track bugs with,
    > so things are not as lossy.


    Ok, sorry :)

    Also see Piet's comment here. He has a valid case against sys.exit in
    some cases. Thus it appears that both ways of shutting down child
    processes might be dangerous: If we don't want buffers to flush we
    have to use os._exit. If we want clean-up code to execute we have to
    use sys.exit. If we want both we are screwed. :(
     
    sturlamolden, Aug 5, 2009
    #10
  11. erikcw

    Jesse Noller Guest

    On Aug 5, 4:41 pm, sturlamolden <> wrote:
    > On 5 Aug, 22:28, Jesse Noller <> wrote:
    >
    > >http://bugs.python.org/issue6653

    >
    > > In the future please use the bug tracker to file and track bugs with,
    > > so things are not as lossy.

    >
    > Ok, sorry :)
    >
    > Also see Piet's comment here. He has a valid case against sys.exit in
    > some cases. Thus it appears that both ways of shutting down child
    > processes might be dangerous: If we don't want buffers to flush we
    > have to use os._exit. If we want clean-up code to execute we have to
    > use sys.exit. If we want both we are screwed. :(


    Comments around this bug should go in the bug report - again, so we
    don't lose them. I do not personally subscribe to this group, so
    it's very easy to miss things.

    jesse
     
    Jesse Noller, Aug 5, 2009
    #11
