interrupted system call w/ Queue.get

Discussion in 'Python' started by Philip Winston, Feb 18, 2011.

  1. We have a multiprocess Python program that uses Queue to communicate
    between processes. Recently we've seen some errors while blocked
    waiting on Queue.get:

    IOError: [Errno 4] Interrupted system call

    What causes the exception? Is it necessary to catch this exception
    and manually retry the Queue operation? Thanks.

    We have some Python 2.5 and 2.6 machines that have run this program
    for many thousands of hours with no errors. But we have one 2.5 machine
    and one 2.7 machine that seem to get the error very often.
    Philip Winston, Feb 18, 2011

  2. Are you getting this when your application is shut down?

    I'm pretty sure you can safely ignore this exception and

    James Mills, Feb 18, 2011

  3. Unix divides system calls up into "slow" and "fast". The difference is
    how they react to signals.

    Fast calls are things which are expected to return quickly. A canonical
    example would be getuid(), which just returns a number it looks up in a
    kernel data structure. Fast syscalls cannot be interrupted by signals.
    If a signal arrives while a fast syscall is running, delivery of the
    signal is delayed until after the call returns.

    Slow calls are things which may take an indeterminate amount of time to
    return. An example would be a read on a network socket; it will block
    until a message arrives, which may be forever. Slow syscalls get
    interrupted by signals. If a signal arrives while a slow syscall is
    blocking, the call fails with errno EINTR. This lets your code have a chance to
    do whatever is appropriate, which might be clean up in preparation for
    process shutdown, or maybe just ignore the interrupt and re-issue the
    system call.

    Here's a short Python program which shows how this works (tested on
    MacOS-10.6, but should be portable to just about any POSIX box):

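    #!/usr/bin/env python

    import socket
    import signal
    import os

    def handler(sig_num, stack_frame):
        return  # no-op; having a handler installed is what matters

    print "my pid is", os.getpid()
    signal.signal(signal.SIGUSR1, handler)
    s = socket.socket(type=socket.SOCK_DGRAM)
    s.bind(("", 0))
    s.recv(1024)  # blocks until a datagram or a signal arrives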

    Run this in one window. It should print out its process number, then
    block on the recv() call. In another window, send it a SIGUSR1. You
    should get something like:

    play$ ./
    my pid is 6969
    Traceback (most recent call last):
    File "./", line 14, in <module>
    socket.error: [Errno 4] Interrupted system call
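
    Sending the signal from the second window is usually done with the
    shell's kill command, but the same thing can be done from Python with
    os.kill. The following is only a sketch, assuming a Unix system: the
    process signals itself so the example is self-contained, and the
    `received` list and the bounded wait loop are illustrative additions,
    not part of the program above.

```python
import os
import signal
import time

received = []

def handler(sig_num, stack_frame):
    """Record which signal arrived."""
    received.append(sig_num)

previous = signal.signal(signal.SIGUSR1, handler)

# From another process you would pass the pid printed by the target
# program; here the process signals itself to stay self-contained.
os.kill(os.getpid(), signal.SIGUSR1)

# Python-level handlers run in the main thread between bytecodes, so give
# the interpreter a brief, bounded chance to run the handler.
deadline = time.time() + 1.0
while not received and time.time() < deadline:
    time.sleep(0.01)

signal.signal(signal.SIGUSR1, previous)  # restore the earlier handler
```
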

    Whether you need to catch this exception and retry is a deeper question
    which I can't answer. My guess is the interrupted system call is the
    Queue trying to acquire a lock, but there's no predicting what the
    signal is. I'm tempted to say that it's a bug in Queue that it doesn't
    catch this exception internally, but people who know more about the
    Queue implementation than I do should chime in.

    As for only some machines showing the error: yup, that's the nature of
    signal delivery race conditions in multithreaded programs. Every machine
    will behave a little bit differently, with no rhyme or reason. Google
    "undefined behavior" for more details :) The whole POSIX signal delivery
    mechanism dates back to the earliest Unix implementations, long before
    there were threads or networks. At this point, it's got many layers of
    duct tape.
    Roy Smith, Feb 18, 2011
  4. The exception is caused by a syscall returning EINTR. A syscall will
    return EINTR when a signal arrives and interrupts whatever that
    syscall was trying to do. Typically a signal won't interrupt the
    syscall unless you've installed a signal handler for that signal.
    However, you can avoid the interruption by using `signal.siginterrupt`
    to disable interruption on that signal after you've installed the
    handler.

    As for the other questions - I don't know, it depends how and why it
    happens, and whether it prevents your application from working.
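
    The `signal.siginterrupt` approach could look roughly like this. It is
    a minimal sketch, not code from this thread: `noop_handler` and
    `install_restarting_handler` are hypothetical names, SIGUSR1 is only an
    example signal, and it assumes a Unix system. The order of the two
    calls matters, because `signal.signal` resets the signal back to
    interrupting behaviour.

```python
import signal

def noop_handler(sig_num, stack_frame):
    """Hypothetical handler that does nothing with the signal."""
    pass

def install_restarting_handler(sig_num, handler):
    """Install handler, then ask the OS to restart interrupted syscalls."""
    previous = signal.signal(sig_num, handler)
    # Must come after signal.signal(), which resets the flag to True.
    signal.siginterrupt(sig_num, False)
    return previous

previous_handler = install_restarting_handler(signal.SIGUSR1, noop_handler)
```

    Not every blocking call is restarted this way on every platform, so a
    retry loop around the call may still be needed.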

    Jean-Paul Calderone, Feb 18, 2011
  5. We did not try "signal.siginterrupt" because we were not installing
    any signal handlers; perhaps some library code is doing it without us knowing
    about it. Plus I still don't know what signal was causing the

    Instead based on Dan Stromberg's reply (
    lists/python-list/595310/) I wrote a drop-in replacement for Queue
    called RetryQueue which fixes the problem for us:

    from multiprocessing.queues import Queue
    import errno

    def retry_on_eintr(function, *args, **kw):
        # Re-issue the call for as long as it fails with EINTR.
        while True:
            try:
                return function(*args, **kw)
            except IOError, e:
                if e.errno == errno.EINTR:
                    continue
                else:
                    raise

    class RetryQueue(Queue):
        """Queue which will retry if interrupted with EINTR."""
        def get(self, block=True, timeout=None):
            return retry_on_eintr(Queue.get, self, block, timeout)
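
    The retry loop can be exercised without any real signals. The sketch
    below is illustrative only: `flaky_get` and `calls` are hypothetical
    stand-ins for a Queue.get that is interrupted twice before succeeding,
    and the helper is written with `except ... as` so it runs on Python 2.6
    and later, including Python 3.

```python
import errno

def retry_on_eintr(function, *args, **kw):
    """Call function, retrying for as long as it fails with EINTR."""
    while True:
        try:
            return function(*args, **kw)
        except IOError as e:
            if e.errno != errno.EINTR:
                raise  # some other I/O error: let it propagate

calls = {"count": 0}

def flaky_get():
    """Hypothetical stand-in for Queue.get: interrupted twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] <= 2:
        raise IOError(errno.EINTR, "Interrupted system call")
    return "item"

result = retry_on_eintr(flaky_get)
```

    One thing to keep in mind with this pattern: when a timeout is passed,
    each retry starts the full timeout again, so the total wait can exceed
    the requested timeout.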

    As to whether this is a bug or just our own malignant signal-related
    settings I'm not sure. Certainly it's not desirable to have your
    blocking waits interrupted. I did see several EINTR issues in Python
    but none obviously about Queue exactly:

    Philip Winston, Mar 22, 2011