select.select and socket.setblocking

Discussion in 'Python' started by Laszlo Nagy, Dec 30, 2008.

  1. Laszlo Nagy

    Laszlo Nagy Guest

    I'm using this method to read from a socket:

    def read_data(self,size):
        """Read data from connection until a given size."""
        res = ""
        fd = self.socket.fileno()
        while not self.stop_requested.isSet():
            remaining = size - len(res)
            if remaining<=0:
                break
            # Give one second for an incoming connection so we can stop the
            # server in seconds when needed
            ready = select.select([fd], [], [], 1)
            if fd in ready[0]:
                # 8192 is recommended by socket.socket manual.
                data = self.socket.recv(min(remaining,8192))
                if not data:
                    # select returns the fd but there is no data to read
                    # -> connection closed!
                    raise TransportError("Connection closed.")
                else:
                    res += data
            else:
                pass
        if self.stop_requested.isSet():
            raise SystemExit(0)
        return res


    This works: if I close the socket on the other side, then I see this in
    the traceback:

    File "/usr/home/gandalf/Python/Projects/OrbToy/orb/endpoint.py", line
    233, in read_data
    raise TransportError("Connection closed.")
    TransportError: Connection closed.

    Also, when I call stop_requested.set(), the thread stops within one
    second.

    When I switch to non-blocking mode, my code works exactly the same way,
    or at least I see no difference.

    I have read the socket programming howto (
    http://docs.python.org/howto/sockets.html#sockets ) but it does not
    explain how a blocking socket + select is different from a non blocking
    socket + select. Is there any difference?

    Thanks
    Laszlo Nagy, Dec 30, 2008
    #1

  2. Laszlo Nagy wrote:
    > I'm using this method to read from a socket:
    >
    > def read_data(self,size):
    >     """Read data from connection until a given size."""
    >     res = ""
    >     fd = self.socket.fileno()
    >     while not self.stop_requested.isSet():
    >         remaining = size - len(res)
    >         if remaining<=0:
    >             break
    >         # Give one second for an incoming connection so we can stop the
    >         # server in seconds when needed
    >         ready = select.select([fd], [], [], 1)
    >         if fd in ready[0]:
    >             # 8192 is recommended by socket.socket manual.
    >             data = self.socket.recv(min(remaining,8192))
    >             if not data:
    >                 # select returns the fd but there is no data to read
    >                 # -> connection closed!
    >                 raise TransportError("Connection closed.")
    >             else:
    >                 res += data
    >         else:
    >             pass
    >     if self.stop_requested.isSet():
    >         raise SystemExit(0)
    >     return res
    >
    >
    > This works: if I close the socket on the other side, then I see this in
    > the traceback:
    >
    > File "/usr/home/gandalf/Python/Projects/OrbToy/orb/endpoint.py", line
    > 233, in read_data
    > raise TransportError("Connection closed.")
    > TransportError: Connection closed.
    >
    > Also when I call stop_requested.set() then the thread stops within one
    > seconds.
    >
    > Then I switch to non blocking mode, my code works exactly the same way,
    > or at least I see no difference.
    >
    > I have read the socket programming howto (
    > http://docs.python.org/howto/sockets.html#sockets ) but it does not
    > explain how a blocking socket + select is different from a non blocking
    > socket + select. Is there any difference?
    >
    > Thanks
    >

    Couple of remarks:

    1. AFAIK, select in Python also accepts socket objects, or anything
    which has a fileno() method returning an integer. So you don't need to
    extract the fileno from the socket (Python will do it for you) although it
    does no harm.

    2. IMO, the behaviour of your code is correct: with TCP, when the
    remote end disconnects, your end receives a 'read event' without
    data; you should just handle the fact that recv returns nothing as
    normal, not as an error, and close your end of the connection.

    If you are interested in socket errors, you should
    also fill the third 'fd-set' in the select call, and after select
    returns check whether fd is in it:

    ready = select.select( [fd],[], [fd] )
    if fd in ready[2]:
        # raise your error here
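
    A minimal sketch of remarks 1 and 2, assuming Python 3 and using
    socket.socketpair() as a stand-in for a real TCP connection (the names
    a, b, nothing_pending and peer_closed are illustrative only). It passes
    the socket object itself to select() and shows that a peer close is
    reported as "readable" with an empty recv():

```python
import select
import socket

# socketpair() gives two connected stream sockets, standing in for a TCP link.
a, b = socket.socketpair()

# Socket objects can be passed directly; select() calls fileno() on them.
# The third list reports "exceptional conditions" (for example out-of-band
# data); an ordinary disconnect shows up in the first (readable) list instead.
readable, _, exceptional = select.select([a], [], [a], 0)
nothing_pending = (readable == [] and exceptional == [])

# After the peer closes, the socket becomes readable and recv() returns b"".
b.close()
readable, _, exceptional = select.select([a], [], [a], 1)
data = a.recv(8192) if a in readable else None
peer_closed = (data == b"")

a.close()
```

    The empty result is the normal end-of-stream signal, so the usual
    reaction is to close the local socket rather than treat it as a failure.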

    3. AFAIK (sorry, I feel acronym-ly today ;), there is no difference in
    select between blocking and non-blocking mode. The difference is in the
    recv (again, assuming that you use TCP as protocol, that is AF_INET,
    SOCK_STREAM), which in the blocking case would wait to receive all the
    bytes that you requested, or the disconnection, in the other case would
    return immediately (and you should check the number of returned bytes,
    and when you read the remaining bytes of the message put the pieces
    together). I myself tend to avoid using non-blocking sockets, since
    blocking sockets are much easier to handle...

    HTH

    Ciao
    ------
    FB
    Francesco Bochicchio, Dec 30, 2008
    #2

  3. Grant Edwards wrote:
    > On 2008-12-30, Francesco Bochicchio <> wrote:
    >
    >> 3. AFAIK (sorry, I feel acronym-ly today ;), there is no difference in
    >> select between blocking and non-blocking mode. The difference is in the
    >> recv (again, assuming that you use TCP as protocol, that is AF_INET,
    >> SOCK_STREAM), which in the blocking case would wait to receive all the
    >> bytes that you requested,

    >
    > No, in blocking mode it will wait to receive _some_ data (1 or
    > more bytes). The "requested" amount is strictly an upper
    > limit: recv won't return more than the requested number of
    > bytes, but it might return less.
    >


    Uhm. In my experience, with TCP, recv only returned fewer bytes than
    requested if the remote end disconnected. I always check the return
    value of recv and signal an error if fewer bytes were read than
    expected, but this error has never occurred (and I have been using
    sockets for about 20 years, in various languages, on various flavors of
    Unix and occasionally on Windows). Maybe I have always been lucky? :)

    And, on some Unices, the recv system call also returns when a signal
    interrupts the syscall, but I half-remember reading that Python's recv
    repeats the system call by itself in such a case ... although this might
    be only my wish ...

    > In non-blocking mode, it will always return immediately, either
    > with some data, no data (other end closed), or an EAGAIN or
    > EWOULDBLOCK error (I forget which).
    >
    >> [...] I myself tend to avoid using non-blocking sockets, since
    >> blocking sockets are much easier to handle...

    >
    > That depends on whether you can tolerate blocking or not. In
    > an event-loop, blocking is generally not allowed.
    >

    What I usually do, when I cannot block is:

    - use socket in blocking mode
    - do a select with a very small timeout and do a recv only if the select
    returns with input events
    - (with TCP) do a recv for the exact amount of bytes that I expect
    (this means having a user protocol that carries the message size in the
    header, but this is usually the case).

    This usually worked for me.

    If my process (or thread) has only to deal with socket I/O, I make a
    blocking select, and then make an 'exact' recv on whichever socket the
    select signals.
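
    Since a single recv() may return fewer bytes than requested, an "exact"
    read is usually written as a loop. The following is only a sketch,
    assuming Python 3; the 4-byte length header and the helper names
    recv_exact, recv_message and slow_sender are hypothetical, and
    socket.socketpair() stands in for a TCP connection:

```python
import select
import socket
import struct
import threading
import time

HEADER = struct.Struct("!I")  # hypothetical framing: 4-byte big-endian length

def recv_exact(sock, size, timeout=1.0):
    """Collect exactly `size` bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            continue  # timed out; a real server could check a stop flag here
        data = sock.recv(min(remaining, 8192))
        if not data:
            raise ConnectionError("peer closed before %d bytes arrived" % size)
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)

def recv_message(sock):
    """Read one length-prefixed message: header first, then the body."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)

def slow_sender(sock):
    # Deliver one frame in two pieces to force short reads on the receiver.
    payload = b"helloworld"
    frame = HEADER.pack(len(payload)) + payload
    sock.sendall(frame[:7])
    time.sleep(0.2)
    sock.sendall(frame[7:])

left, right = socket.socketpair()
sender = threading.Thread(target=slow_sender, args=(right,))
sender.start()
message = recv_message(left)
sender.join()
left.close()
right.close()
```

    Because the loop only ever asks for the bytes still missing from the
    current message, it cannot swallow the start of the next one.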

    Ciao
    ----
    FB
    Francesco Bochicchio, Dec 31, 2008
    #3
  4. Francesco Bochicchio wrote:
    >>
    >> No, in blocking mode it will wait to receive _some_ data (1 or
    >> more bytes). The "requested" amount is strictly an upper
    >> limit: recv won't return more than the requested number of
    >> bytes, but it might return less.
    >>

    >
    > Uhm. In my experience, with TCP protocol recv only returned less than
    > the required bytes if the remote end disconnects. I always check the
    > returned value of recv and signal an error if the read bytes are less
    > than the expected ones, but this error is never occurred (and its about
    > 20 years that I use sockets in various languages and various flavor of
    > unix and occasionally on windows. Maybe have always been lucky ? :)
    >


    BTW, this is not a rhetorical or ironic question... my applications
    mostly run on LANs or dedicated WANs so maybe they never experienced the
    kind of network congestion that could cause recv to return fewer bytes
    than expected ...

    but then, IIRC TCP guarantees that the packet is fully received by
    hand-shaking at transport level between sender and receiver. And once the
    packet is fully in the receiver buffer, why should recv choose to give
    back to the application only a piece of it?

    Ciao
    -----
    FB
    Francesco Bochicchio, Dec 31, 2008
    #4
  5. Saju Pillai

    Saju Pillai Guest

    On Dec 31, 2:01 pm, Francesco Bochicchio <> wrote:
    > Grant Edwards wrote:
    >
    > > On 2008-12-30, Francesco Bochicchio <> wrote:

    >
    > >> 3. AFAIK (sorry, I feel acronym-ly today ;), there is no difference in
    > >> select between blocking and non-blocking mode. The difference is in the
    > >> recv (again, assuming that you use TCP as protocol, that is AF_INET,
    > >> SOCK_STREAM), which in the blocking case would wait to receive all the
    > >> bytes that you requested,

    >
    > > No, in blocking mode it will wait to receive _some_ data (1 or
    > > more bytes).  The "requested" amount is strictly an upper
    > > limit: recv won't return more than the requested number of
    > > bytes, but it might return less.

    >
    > Uhm. In my experience, with TCP protocol recv only returned less than
    > the required bytes if the remote end disconnects. I always check the


    What if the sending end actually sent less than you asked for ?

    -srp

    > returned value of recv and signal an error if the read bytes are less
    > than the expected ones, but this error is never occurred (and its about
    > 20 years that I use sockets in various languages and various flavor of
    > unix and occasionally on windows. Maybe  have always been lucky ? :)
    >
    > And, on some unices  system call recv also returns when a signal
    > interrupts the syscall, but I half-remember reading that python recv in
    > such a case repeat the system call by itself ... although this might be
    > only my desire ...
    >
    > > In non-blocking mode, it will always return immediately, either
    > > with some data, no data (other end closed), or an EAGAIN or
    > > EWOULDBLOCK error (I forget which).

    >
    > >> [...] I myself tend to avoid using non-blocking sockets, since
    > >> blocking sockets are much easier to handle...

    >
    > > That depends on whether you can tolerate blocking or not.  In
    > > an event-loop, blocking is generally not allowed.

    >
    > What I usually do, when I cannot block is:
    >
    > - use socket in blocking mode
    > - do a select with a very small timeout and do a recv only if the select
    > returns with input events
    > - (with TCP) do a recv for the exact amount of bytes that I expect (
    > this mean having a user protocol that carries the message size in the
    > header, but this is usually the case ).
    >
    > This usually worked for me.
    >
    > If my process (or thread) has only to deal with socket I/O, I make a
    > blocking select, and then make an 'exact' recv on whichever socket the
    > select signals.
    >
    > Ciao
    > ----
    > FB
    Saju Pillai, Dec 31, 2008
    #5
  6. < ... >

    >> Uhm. In my experience, with TCP protocol recv only returned less than
    >> the required bytes if the remote end disconnects. I always check the

    >
    > What if the sending end actually sent less than you asked for ?
    >
    > -srp
    >


    In blocking mode and with TCP, recv waits until more bytes are
    received (mixing up the next message with the previous one, then
    losing sync and being unable to interpret the received data) or
    until the remote end disconnects.

    Yes, this is bad, and it is a good reason why socket receives should be
    handled in non-blocking mode if you receive data from untrusted
    sources. But luckily for me, as I said in the other post, I used sockets
    mostly to communicate between specific applications on a private LAN or
    WAN, so I could afford to ignore the problem.

    Ciao
    ----
    FB
    Francesco Bochicchio, Dec 31, 2008
    #6
  7. Saju Pillai

    Saju Pillai Guest

    On Dec 31, 7:48 pm, Francesco Bochicchio <> wrote:
    > < ... >
    >
    > >> Uhm. In my experience, with TCP protocol recv only returned less than
    > >> the required bytes if the remote end disconnects. I always check the

    >
    > > What if the sending end actually sent less than you asked for ?

    >
    > > -srp

    >
    > In blocking mode and with TCP protocol, the recv waits until more bytes
    > are received -  mixing up the next message with the previous one and


    Is this correct? IIRC even in blocking mode recv() can return with
    fewer bytes than requested, unless the MSG_WAITALL flag is supplied.
    Blocking mode only guarantees that recv() will wait for a message if
    none is available - but not that it *will* return the number of bytes
    requested.
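
    A small sketch of the MSG_WAITALL behaviour, assuming Python 3 on a
    platform where socket.MSG_WAITALL is defined; socket.socketpair() stands
    in for a TCP connection and slow_sender is a hypothetical helper:

```python
import socket
import threading
import time

def slow_sender(sock):
    # Ten bytes delivered in two pieces, 0.2 seconds apart.
    sock.sendall(b"hello")
    time.sleep(0.2)
    sock.sendall(b"world")

left, right = socket.socketpair()
sender = threading.Thread(target=slow_sender, args=(right,))
sender.start()

# With MSG_WAITALL a blocking recv() waits until all 10 bytes have arrived.
# An error or a disconnect can still cut the result short, so checking the
# length remains a good idea. Without the flag, recv(10) may return as soon
# as the first piece is available.
data = left.recv(10, socket.MSG_WAITALL)

sender.join()
left.close()
right.close()
```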

    -srp

    > then loosing the 'sync' and being unable to interpretate the received
    > data -  or the remote end disconnects.
    >
    > Yes this is bad,  and is a good reason why socket receive should be
    > handled   in non-blocking mode if you receive data from untrusted
    > sources. But luckily for me, as I said in the other post, I used socket
    > mostly to communicate between specific applications on a private LAN or
    > WAN, so I could afford to ignore the problem.
    >
    > Ciao
    > ----
    > FB
    Saju Pillai, Dec 31, 2008
    #7
  8. Saju Pillai wrote:
    > On Dec 31, 7:48 pm, Francesco Bochicchio <> wrote:
    >
    > Is this correct ? IIRC even in blocking mode recv() can return with
    > less bytes than requested, unless the MSG_WAITALL flag is supplied.
    > Blocking mode only guarantees that recv() will wait for a message if
    > none is available - but not that it *will* return the number of bytes
    > requested.
    >
    > -srp
    >


    You are right ... most of my socket experience predates MSG_WAITALL, and
    I forgot that now the default behaviour is different ... oops ...

    Ciao
    ------
    FB
    Francesco Bochicchio, Dec 31, 2008
    #8
  9. "Francesco Bochicchio" <> wrote:

    > but then, IIRC TCP guarantees that the packet is fully received by
    > hand-shaking at transport level between sender and receiver. Ad once the
    > packet is fully in the receiver buffer, why should recv choose to give
    > back to the application only a piece of it?


    This depends a lot on the definition of "package" -

    At the TCP/IP level, the protocol is quite complex - there
    are all sorts of info flowing back and forth, telling the
    transmitter how much space the receiver has available.
    So your "record" or "package" could be split up...

    But it gets worse, or better, depending on your point of view:

    At the ethernet level, a packet is less than 1.5k - so if your
    record is longer, it can also be split up - OTOH, if it all
    fits into one ethernet packet, there is every chance that
    it won't be split up, unless you send a lot of them in a row,
    without waiting for a response - if you are running something that
    sends a small request and listens for a small answer, then you
    will probably never see a record split - but if you run a kind
    of sliding window protocol that streams a lot of data (even in
    small packets) then sooner or later one of them will be partly
    delivered...

    - Hendrik
    Hendrik van Rooyen, Jan 1, 2009
    #9

  10. > Can you post an example program that exhibits the behavior you
    > describe?
    >
    >


    I was forgetting about the MSG_WAITALL flag ...
    When I started programming with sockets, it was on a platform (IIRC
    Solaris) that behaved as if MSG_WAITALL were set by default
    (actually, I don't remember it being mentioned at all in the man pages).
    This sort of biased my understanding of the matter. I actually used that
    flag recently - on Linux - to get the same behavior I was used to, but
    forgot about that.

    My bad :)

    Ciao
    ------
    FB
    Francesco Bochicchio, Jan 1, 2009
    #10
  11. Bryan Olson

    Bryan Olson Guest

    Laszlo Nagy wrote:
    [...]
    > I have read the socket programming howto (
    > http://docs.python.org/howto/sockets.html#sockets ) but it does not
    > explain how a blocking socket + select is different from a non blocking
    > socket + select. Is there any difference?


    There is, but it may not affect you. There are cases where a socket can
    select() as readable, but not be readable by the time of a following
    recv() or accept() call. All such cases with which I'm familiar call for
    a non-blocking socket.

    Where does this come up? Suppose that to take advantage of multi-core
    processors, our server runs as four processes, each with a single thread
    that responds to events via select(). Clients all connect to the same
    server port, so the socket listening on that port is shared by all four
    processes. A perfectly reasonable architecture (though with many more
    processes the simple implementation suffers the "thundering herd problem").

    Two of our processes may be waiting on select() when a new connection
    comes in. The select() call returns in both processes, showing the
    socket ready for read, so both call accept() to complete the connection.
    The O.S. ensures that accept() [and recv()] are atomic, so one process
    gets the new connection; what happens in the other depends on whether we
    use a blocking or non-blocking socket, and clearly we want non-blocking.
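
    The race can be imitated inside one process. This is only a sketch,
    assuming Python 3 and a working loopback interface; the two select()
    calls play the role of the two worker processes, and names such as
    second_result are illustrative:

```python
import select
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(5)
listener.setblocking(False)       # the point of the exercise

# One client connects, so exactly one connection is pending.
client = socket.create_connection(listener.getsockname())

# Both "workers" see the listening socket as readable...
ready_1, _, _ = select.select([listener], [], [], 1)
ready_2, _, _ = select.select([listener], [], [], 1)

# ...but only the first accept() can succeed.
conn, _addr = listener.accept()
try:
    listener.accept()
    second_result = "accepted"
except BlockingIOError:
    # A non-blocking listener reports "nothing to accept" immediately;
    # a blocking one would stall here until another client connected.
    second_result = "would block"

conn.close()
client.close()
listener.close()
```

    The losing worker simply goes back to select(), which is why the
    non-blocking listener is the safer choice in this architecture.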


    --
    --Bryan
    Bryan Olson, Jan 3, 2009
    #11
  12. Saju Pillai

    Saju Pillai Guest

    Bryan Olson <> wrote:

    >
    >Where does this come up? Suppose that to take advantage of multi-core
    >processors, our server runs as four processes, each with a single thread
    >that responds to events via select(). Clients all connect to the same
    >server port, so the socket listening on that port is shared by all four
    >processes. A perfectly reasonable architecture (though with many more
    >processes the simple implementation suffers the "thundering herd problem").



    Which is why it is common for real world servers to serialize the
    select()/accept() code - usually via a file lock or a semaphore.
    -srp
    --
    http://saju.net.in

    >
    >Two of our processors may be waiting on select() when a new connections
    >comes in. The select() call returns in both processes, showing the
    >socket ready for read, so both call accept() to complete the connection.
    > The O.S. ensures that accept() [and recv()] are atomic, so one process
    >gets the new connection; what happens in the other depends on whether we
    >use a blocking or non-blocking socket, and clearly we want non-blocking.
    >
    >
    >--
    >--Bryan
    Saju Pillai, Jan 3, 2009
    #12
  13. Roy Smith

    Roy Smith Guest

    Bryan Olson <> wrote:

    > There are cases where a socket can select() as readable, but not be
    > readable by the time of a following recv() or accept() call. All such
    > cases with which I'm familiar call for a non-blocking socket.


    I used to believe that if select() said data was ready for reading, a
    subsequent read/recv/recvfrom() call could not block. It could return an
    error, but it could not block. I was confident of this until just a few
    months ago when reality blew up in my face.

    The specific incident involved a bug in the Linux kernel. If you received
    a UDP packet with a checksum error, the select() would return when the
    packet arrived, *before* the checksum was checked. By the time you did the
    recv(), the packet had been discarded and the recv() would block.

    This led me on a big research quest (including some close readings of
    Stevens, which appeared to say that this couldn't happen). The more I
    read, the more I (re) discovered just how vague and poorly written the
    Berkeley Socket API docs are :)

    The bottom line is that Bryan is correct -- regardless of what the various
    man pages and textbooks say, in the real world, it is possible for a read()
    to block after select() says the descriptor is ready. The right way to
    think about select() is to treat it as a heuristic which can make a polling
    loop more efficient, but should never be relied upon to predict the future.

    Neither the negative nor positive behavior is guaranteed. There's no
    guaranteed response time; just because select() hasn't returned yet doesn't
    mean a descriptor couldn't be read without blocking in another thread right
    now. And, just because it has returned, that doesn't mean by the time you
    get around to reading, there will still be anything there.
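
    One way to act on this advice is to combine select() with a
    non-blocking socket, so that a stale readiness report costs a retry
    instead of a stalled thread. A minimal sketch, assuming Python 3;
    try_recv is a hypothetical helper and socket.socketpair() stands in for
    a real connection:

```python
import select
import socket

def try_recv(sock, bufsize=8192, timeout=1.0):
    """Return bytes, b"" on disconnect, or None if nothing could be read."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    try:
        return sock.recv(bufsize)
    except BlockingIOError:
        # select() said "ready" but the data was gone: treat it as a hint only.
        return None

a, b = socket.socketpair()
a.setblocking(False)

# Imitate a stale readiness report by calling recv() with nothing queued.
try:
    a.recv(8192)
    stale = "got data"
except BlockingIOError:
    stale = "would block"

b.sendall(b"ping")
got = try_recv(a)              # data is queued, so it is returned
idle = try_recv(a, timeout=0)  # nothing left: select() reports no readiness

a.close()
b.close()
```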
    Roy Smith, Jan 3, 2009
    #13
