interrupted system call w/ Queue.get

Discussion in 'Python' started by Philip Winston, Feb 18, 2011.

  1. We have a multiprocess Python program that uses Queue to communicate
    between processes. Recently we've seen some errors while blocked
    waiting on Queue.get:

    IOError: [Errno 4] Interrupted system call

    What causes the exception? Is it necessary to catch this exception
    and manually retry the Queue operation? Thanks.

    We have some Python 2.5 and 2.6 machines that have run this program
    for many thousands of hours with no errors. But we have one 2.5 machine
    and one 2.7 machine that seem to get the error very often.
     
    Philip Winston, Feb 18, 2011
    #1

  2. James Mills

    On Fri, Feb 18, 2011 at 11:46 AM, Philip Winston <> wrote:
    > We have a multiprocess Python program that uses Queue to communicate
    > between processes.  Recently we've seen some errors while blocked
    > waiting on Queue.get:
    >
    > IOError: [Errno 4] Interrupted system call
    >
    > What causes the exception?  Is it necessary to catch this exception
    > and manually retry the Queue operation?  Thanks.


    Are you getting this when your application is shutting down?

    I'm pretty sure you can safely ignore this exception and
    continue.

    cheers
    James

    --
    -- James Mills
    --
    -- "Problems are solved by method"
     
    James Mills, Feb 18, 2011
    #2

  3. Roy Smith

    In article <>,
    Philip Winston <> wrote:

    > We have a multiprocess Python program that uses Queue to communicate
    > between processes. Recently we've seen some errors while blocked
    > waiting on Queue.get:
    >
    > IOError: [Errno 4] Interrupted system call
    >
    > What causes the exception?


    Unix divides system calls into "slow" and "fast" ones. The difference
    is how they react to signals.

    Fast calls are things which are expected to return quickly. A canonical
    example would be getuid(), which just returns a number it looks up in a
    kernel data structure. Fast syscalls cannot be interrupted by signals.
    If a signal arrives while a fast syscall is running, delivery of the
    signal is delayed until after the call returns.

    Slow calls are things which may take an indeterminate amount of time to
    return. An example would be a read on a network socket; it will block
    until a message arrives, which may be forever. Slow syscalls get
    interrupted by signals. If a signal arrives while a slow syscall is
    blocking, the call fails with errno set to EINTR. This gives your code
    a chance to do whatever is appropriate, which might be cleaning up in
    preparation for process shutdown, or simply ignoring the interrupt and
    re-issuing the system call.

    Here's a short Python program which shows how this works (tested on
    MacOS-10.6, but should be portable to just about any POSIX box):

    -----------------------------------------------------
    #!/usr/bin/env python

    import socket
    import signal
    import os

    def handler(sig_num, stack_frame):
        return

    print "my pid is", os.getpid()
    signal.signal(signal.SIGUSR1, handler)
    s = socket.socket(type=socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    s.recv(1024)
    -----------------------------------------------------

    Run this in one window. It should print out its process ID, then
    block on the recv() call. In another window, send it a SIGUSR1 (for
    example, kill -USR1 with the printed pid). You should get something
    like:

    play$ ./intr.py
    my pid is 6969
    Traceback (most recent call last):
      File "./intr.py", line 14, in <module>
        s.recv(1024)
    socket.error: [Errno 4] Interrupted system call
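
    A caveat for anyone reproducing this on a newer interpreter: the demo
    reflects Python 2 behavior. Since Python 3.5 (PEP 475), when a signal
    interrupts a blocking call and the handler returns normally, the
    interpreter runs the handler and then re-issues the call, so a Python 3
    port of this script would simply keep blocking in recv(). The call is
    abandoned only if the handler raises. A minimal sketch of that, assuming
    Python 3.5+ on a Unix system (it uses signal.setitimer and SIGALRM and
    must run in the main thread); Interrupted and read_with_alarm are
    hypothetical names:

```python
import os
import signal


class Interrupted(Exception):
    """Hypothetical exception raised from the signal handler."""


def read_with_alarm(seconds=0.1):
    """Block in os.read() on an empty pipe until SIGALRM fires.

    Returns the message of the exception that ended the read.
    """
    def handler(signum, frame):
        # Raising here is what breaks out of the blocking call on
        # Python 3.5+; a handler that just returned would let the
        # interpreter retry read() after EINTR.
        raise Interrupted("SIGALRM")

    r, w = os.pipe()  # nothing is ever written, so read() blocks
    old = signal.signal(signal.SIGALRM, handler)
    try:
        signal.setitimer(signal.ITIMER_REAL, seconds)
        try:
            os.read(r, 1)
        except Interrupted as exc:
            return str(exc)
        return None
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, old)       # restore previous handler
        os.close(r)
        os.close(w)


if __name__ == "__main__":
    print(read_with_alarm())  # prints SIGALRM
```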

    > Is it necessary to catch this exception
    > and manually retry the Queue operation? Thanks.


    That's a deeper question which I can't answer. My guess is the
    interrupted system call is the Queue trying to acquire a lock, but
    there's no predicting what the signal is. I'm tempted to say that it's
    a bug in Queue that it doesn't catch this exception internally, but
    people who know more about the Queue implementation than I do should
    chime in.

    > We have some Python 2.5 and 2.6 machines that have run this program
    > for many 1,000 hours with no errors. But we have one 2.5 machine and
    > one 2.7 machine that seem to get the error very often.


    Yup, that's the nature of signal delivery race conditions in
    multithreaded programs. Every machine will behave a little bit
    differently, with no rhyme or reason. Google "undefined behavior" for
    more details :) The whole posix signal delivery mechanism dates back
    to the earliest Unix implementations, long before there were threads or
    networks. At this point, it's got many layers of duct tape.
     
    Roy Smith, Feb 18, 2011
    #3
  4. On Feb 17, 8:46 pm, Philip Winston <> wrote:
    > We have a multiprocess Python program that uses Queue to communicate
    > between processes.  Recently we've seen some errors while blocked
    > waiting on Queue.get:
    >
    > IOError: [Errno 4] Interrupted system call
    >
    > What causes the exception?  Is it necessary to catch this exception
    > and manually retry the Queue operation?  Thanks.
    >


    The exception is caused by a syscall returning EINTR. A syscall will
    return EINTR when a signal arrives and interrupts whatever that syscall
    was trying to do. Typically a signal won't interrupt the syscall
    unless you've installed a signal handler for that signal. However,
    you can avoid the interruption by using `signal.siginterrupt` to
    disable interruption on that signal after you've installed the
    handler.
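
    A minimal sketch of that suggestion, assuming a Unix system;
    install_restarting_handler and on_usr1 are hypothetical names. Passing
    False to signal.siginterrupt asks the OS to restart system calls that
    this signal interrupts instead of failing them with EINTR. The call has
    to come after signal.signal, because installing a handler resets the
    signal to "interruptible". On Python 3.5+ the interpreter already
    retries most interrupted calls itself, so this mainly matters on
    Python 2.

```python
import signal

received = []


def on_usr1(signum, frame):
    # Hypothetical handler: just record that the signal arrived.
    received.append(signum)


def install_restarting_handler(signum, handler):
    """Install handler for signum, then ask for interrupted syscalls to restart.

    Returns the previous handler so the caller can restore it.
    """
    previous = signal.signal(signum, handler)
    # Must come after signal.signal(): installing a handler resets the
    # flag to "interrupt". False means "restart interrupted syscalls".
    signal.siginterrupt(signum, False)
    return previous


if __name__ == "__main__":
    old = install_restarting_handler(signal.SIGUSR1, on_usr1)
    # signal.raise_signal needs Python 3.8+; use os.kill(os.getpid(), ...) before that.
    signal.raise_signal(signal.SIGUSR1)
    print(received)
    signal.signal(signal.SIGUSR1, old)
```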

    As for the other questions - I don't know, it depends how and why it
    happens, and whether it prevents your application from working
    properly.

    Jean-Paul
     
    Jean-Paul Calderone, Feb 18, 2011
    #4
  5. On Feb 18, 10:23 am, Jean-Paul Calderone
    <> wrote:
    > The exception is caused by a syscall returning EINTR.  A syscall will
    > return EINTR when a signal arrives and interrupts whatever that syscall
    > was trying to do.  Typically a signal won't interrupt the syscall
    > unless you've installed a signal handler for that signal.  However,
    > you can avoid the interruption by using `signal.siginterrupt` to
    > disable interruption on that signal after you've installed the
    > handler.
    >
    > As for the other questions - I don't know, it depends how and why it
    > happens, and whether it prevents your application from working
    > properly.


    We did not try "signal.siginterrupt" because we were not installing
    any signal handlers ourselves; perhaps some library code is doing it
    without our knowledge. Plus, I still don't know which signal was
    causing the problem.
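
    One way to narrow down which library installed a handler is to list the
    signals whose disposition is not the default, in each process (for
    example at the start of a worker function). A rough sketch, assuming
    Python 3.5+ for the signal.Signals enum; list_installed_handlers is a
    hypothetical name. signal.getsignal only reports what the interpreter
    knows about: handlers set via signal.signal, plus None for handlers
    that were already present when the interpreter started. A C extension
    that calls sigaction() later will not show up here.

```python
import signal


def list_installed_handlers():
    """Return {signal name: handler} for signals not at their default action.

    Only reflects what the interpreter knows: handlers set via
    signal.signal(), and (as None) handlers already present at startup.
    """
    found = {}
    for sig in signal.Signals:
        try:
            handler = signal.getsignal(sig)
        except (OSError, ValueError):
            continue  # not queryable on this platform
        if handler != signal.SIG_DFL:
            found[sig.name] = handler
    return found


if __name__ == "__main__":
    for name, handler in sorted(list_installed_handlers().items()):
        print(name, handler)
```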

    Instead, based on Dan Stromberg's reply
    (http://code.activestate.com/lists/python-list/595310/), I wrote a
    drop-in replacement for Queue called RetryQueue, which fixes the
    problem for us:

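    from multiprocessing.queues import Queue
    import errno

    def retry_on_eintr(function, *args, **kw):
        """Call function(*args, **kw), re-issuing it if a signal interrupts it."""
        while True:
            try:
                return function(*args, **kw)
            except IOError, e:  # Python 2 syntax; 2.6+ also accepts "as e"
                if e.errno == errno.EINTR:
                    # Interrupted by a signal: try again. Note that a
                    # timeout, if given, restarts from scratch on each retry.
                    continue
                else:
                    raise

    class RetryQueue(Queue):
        """Queue which will retry if interrupted with EINTR."""
        def get(self, block=True, timeout=None):
            return retry_on_eintr(Queue.get, self, block, timeout)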
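
    One side effect of retry_on_eintr: when get() is called with a timeout,
    each retry starts the full timeout over, so a steady stream of signals
    could stretch the wait well past what the caller asked for. A sketch
    that keeps a fixed deadline instead; get_with_deadline is a
    hypothetical helper, written to run on Python 2.6+ and 3 (it uses
    time.monotonic where available and falls back to time.time). On
    Python 3.5+, PEP 475 already makes the interpreter retry most
    interrupted calls with a recomputed timeout, so this mainly matters
    for Python 2.

```python
import errno
import time

# time.monotonic is unaffected by wall-clock changes but only exists on
# Python 3.3+; fall back to time.time on older interpreters.
_now = getattr(time, "monotonic", time.time)


def get_with_deadline(queue, block=True, timeout=None):
    """Like queue.get(block, timeout), but retries on EINTR without
    extending the overall timeout."""
    deadline = None if timeout is None else _now() + timeout
    while True:
        remaining = None
        if deadline is not None:
            # Never pass a negative timeout; 0 means "don't wait".
            remaining = max(0.0, deadline - _now())
        try:
            return queue.get(block, remaining)
        except EnvironmentError as e:  # IOError on Python 2, OSError on 3
            if e.errno != errno.EINTR:
                raise
            # Interrupted by a signal: loop and retry with the time left.
```

    Callers would use it in place of q.get(timeout=5.0) on an ordinary
    multiprocessing Queue, e.g. item = get_with_deadline(q, timeout=5.0).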

    As to whether this is a bug or just our own malignant signal-related
    settings, I'm not sure. Certainly it's not desirable to have your
    blocking waits interrupted. I did see several EINTR issues in the
    Python tracker, but none specifically about Queue:
    http://bugs.python.org/issue1068268
    http://bugs.python.org/issue1628205
    http://bugs.python.org/issue10956

    -Philip
     
    Philip Winston, Mar 22, 2011
    #5