select.select and socket.setblocking


Laszlo Nagy

I'm using this method to read from a socket:

def read_data(self,size):
    """Read data from connection until a given size."""
    res = ""
    fd = self.socket.fileno()
    while not self.stop_requested.isSet():
        remaining = size - len(res)
        if remaining<=0:
            break
        # Give one second for an incoming connection so we can stop the
        # server in seconds when needed
        ready = select.select([fd], [], [], 1)
        if fd in ready[0]:
            # 8192 is recommended by socket.socket manual.
            data = self.socket.recv(min(remaining,8192))
            if not data:
                # select returns the fd but there is no data to read
                # -> connection closed!
                raise TransportError("Connection closed.")
            else:
                res += data
        else:
            pass
    if self.stop_requested.isSet():
        raise SystemExit(0)
    return res


This works: if I close the socket on the other side, then I see this in
the traceback:

File "/usr/home/gandalf/Python/Projects/OrbToy/orb/endpoint.py", line
233, in read_data
raise TransportError("Connection closed.")
TransportError: Connection closed.

Also when I call stop_requested.set() then the thread stops within one
second.

Then I switch to non-blocking mode, and my code works exactly the same way,
or at least I see no difference.

I have read the socket programming howto (
http://docs.python.org/howto/sockets.html#sockets ) but it does not
explain how a blocking socket + select is different from a non blocking
socket + select. Is there any difference?

Thanks
 

Francesco Bochicchio

Laszlo Nagy wrote:
I have read the socket programming howto (
http://docs.python.org/howto/sockets.html#sockets ) but it does not
explain how a blocking socket + select is different from a non blocking
socket + select. Is there any difference?

Thanks

A couple of remarks:

1. AFAIK, select in Python also accepts socket objects, or anything
which has a fileno() method returning an integer. So you don't need to
extract the fileno from the socket (Python will do it for you), although it
does no harm.

2. IMO, the behaviour of your code is correct: with the TCP protocol, when
the remote end disconnects, your end receives a 'read event' without
data; you should just handle the fact that recv returns nothing as
normal, not as an error, and close your end of the connection.

If you are interested in socket errors, you should
also fill the third 'fd-set' (the exceptional-conditions list) in the
select call, and after select returns check whether fd is in it:

    ready = select.select([fd], [], [fd])
    if fd in ready[2]:
        # raise your error here
        raise TransportError("Socket error.")

3. AFAIK (sorry, I feel acronym-ly today ;), there is no difference in
select between blocking and non-blocking mode. The difference is in the
recv (again, assuming that you use TCP as protocol, that is AF_INET,
SOCK_STREAM), which in the blocking case would wait to receive all the
bytes that you requested, or the disconnection, in the other case would
return immediately (and you should check the number of returned bytes,
and when you read the remaining bytes of the message put the pieces
together). I myself tend to avoid using non-blocking sockets, since
blocking sockets are much easier to handle...
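
Remark 1 is easy to check. Below is a minimal Python 3 sketch (the code quoted in this thread is Python 2). It uses socket.socketpair() purely as a convenient stand-in for a real connection; that choice is an assumption for illustration and not part of the original post.

```python
import select
import socket

# socketpair() returns two already-connected sockets, so no server is needed.
a, b = socket.socketpair()
a.sendall(b"ping")

# Pass the socket object itself; select() calls its fileno() internally.
readable, _, _ = select.select([b], [], [], 1.0)

# The returned list contains the same object that was passed in.
data = readable[0].recv(4096) if readable else b""

a.close()
b.close()
```

Passing the object rather than the descriptor also means the ready list can be used directly for recv(), without a separate fd-to-socket lookup.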

HTH

Ciao
 

Francesco Bochicchio

Grant Edwards wrote:
No, in blocking mode it will wait to receive _some_ data (1 or
more bytes). The "requested" amount is strictly an upper
limit: recv won't return more than the requested number of
bytes, but it might return less.

Uhm. In my experience, with the TCP protocol recv only returned fewer than
the requested bytes if the remote end disconnected. I always check the
returned value of recv and signal an error if fewer bytes were read
than expected, but this error has never occurred (and I have been using
sockets for about 20 years, in various languages, on various flavors of
Unix and occasionally on Windows). Maybe I have always been lucky? :)

And, on some Unices the recv system call also returns when a signal
interrupts it, but I half-remember reading that Python's recv
repeats the system call by itself in such a case ... although this might be
only my desire ...

In non-blocking mode, it will always return immediately, either
with some data, no data (other end closed), or an EAGAIN or
EWOULDBLOCK error (I forget which).

[...] I myself tend to avoid using non-blocking sockets, since
blocking sockets are much easier to handle...

That depends on whether you can tolerate blocking or not. In
an event-loop, blocking is generally not allowed.

What I usually do, when I cannot block is:

- use socket in blocking mode
- do a select with a very small timeout and do a recv only if the select
returns with input events
- (with TCP) do a recv for the exact amount of bytes that I expect (this
means having a user protocol that carries the message size in the
header, but this is usually the case).

This usually worked for me.

If my process (or thread) has only to deal with socket I/O, I make a
blocking select, and then make an 'exact' recv on whichever socket the
select signals.
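
Since a single recv may return fewer bytes than requested, an 'exact' read is usually written as a loop. The following Python 3 sketch shows one way to do that together with a size-in-the-header protocol. The helper names recv_exact and recv_message, and the 4-byte big-endian length prefix, are assumptions made for this example; nothing in the thread specifies them.

```python
import socket
import struct

def recv_exact(sock, size):
    """Call recv() repeatedly until exactly `size` bytes have arrived."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)   # may return fewer bytes than asked
        if not chunk:
            raise ConnectionError(
                "peer closed with %d bytes still missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def recv_message(sock):
    """Read one message: a 4-byte big-endian length, then the payload."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

# Demo on a connected pair of blocking sockets.
a, b = socket.socketpair()
for payload in (b"first", b"second message"):
    a.sendall(struct.pack("!I", len(payload)) + payload)
a.close()

messages = [recv_message(b), recv_message(b)]
b.close()
```

Because the loop only ever asks for the bytes still missing, it never reads into the next message, however the stream happens to be chunked on the way.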

Ciao
 

Francesco Bochicchio

Francesco Bochicchio wrote:
Uhm. In my experience, with the TCP protocol recv only returned fewer than
the requested bytes if the remote end disconnected. I always check the
returned value of recv and signal an error if fewer bytes were read
than expected, but this error has never occurred (and I have been using
sockets for about 20 years, in various languages, on various flavors of
Unix and occasionally on Windows). Maybe I have always been lucky? :)

BTW, this is not a rhetorical or ironic question... my applications
mostly run on LANs or dedicated WANs, so maybe they never experienced the
kind of network congestion that could cause recv to return less than the
expected amount of bytes ...

But then, IIRC TCP guarantees that the packet is fully received by
hand-shaking at transport level between sender and receiver. And once the
packet is fully in the receiver buffer, why should recv choose to give
back to the application only a piece of it?

Ciao
 

Saju Pillai

Francesco Bochicchio wrote:
Uhm. In my experience, with TCP protocol recv only returned less than
the required bytes if the remote end disconnects. I always check the [...]

What if the sending end actually sent less than you asked for?

-srp
 

Francesco Bochicchio

What if the sending end actually sent less than you asked for ?

-srp

In blocking mode and with the TCP protocol, recv waits until more bytes
are received (mixing up the next message with the previous one, and
then losing the 'sync' and being unable to interpret the received
data) or until the remote end disconnects.

Yes, this is bad, and it is a good reason why socket receive should be
handled in non-blocking mode if you receive data from untrusted
sources. But luckily for me, as I said in the other post, I used sockets
mostly to communicate between specific applications on a private LAN or
WAN, so I could afford to ignore the problem.

Ciao
 

Saju Pillai

In blocking mode and with TCP protocol, the recv waits until more bytes
are received -  mixing up the next message with the previous one and

Is this correct? IIRC even in blocking mode recv() can return with
fewer bytes than requested, unless the MSG_WAITALL flag is supplied.
Blocking mode only guarantees that recv() will wait for a message if
none is available - but not that it *will* return the number of bytes
requested.
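
The difference can be seen with a small Python 3 sketch. It assumes a platform where socket.MSG_WAITALL is defined and honoured for stream sockets (Linux and most Unix systems), and it uses socket.socketpair() plus a helper thread, both chosen only for this illustration. Even with MSG_WAITALL, recv can still return early if the peer closes or an error occurs.

```python
import socket
import threading
import time

a, b = socket.socketpair()

# Plain blocking recv: returns as soon as *some* data is available,
# even though up to 6 bytes were requested.
a.sendall(b"abc")
partial = b.recv(6)

def send_in_two_parts():
    """Hypothetical sender whose message arrives in two pieces."""
    a.sendall(b"abc")
    time.sleep(0.2)          # the second half shows up noticeably later
    a.sendall(b"def")

# With MSG_WAITALL the call keeps waiting until all 6 bytes have
# arrived (or the peer closes / an error occurs).
sender = threading.Thread(target=send_in_two_parts)
sender.start()
full = b.recv(6, socket.MSG_WAITALL)
sender.join()

a.close()
b.close()
```

Code that has to run where the flag may not be available can fall back to a loop around recv instead.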

-srp
 

Francesco Bochicchio

Saju Pillai wrote:
Is this correct? IIRC even in blocking mode recv() can return with
fewer bytes than requested, unless the MSG_WAITALL flag is supplied.
Blocking mode only guarantees that recv() will wait for a message if
none is available - but not that it *will* return the number of bytes
requested.

-srp

You are right ... most of my socket experience predates MSG_WAITALL, and
I forgot that now the default behaviour is different ... oops ...

Ciao
 

Hendrik van Rooyen

Francesco Bochicchio said:
but then, IIRC TCP guarantees that the packet is fully received by
hand-shaking at transport level between sender and receiver. And once the
packet is fully in the receiver buffer, why should recv choose to give
back to the application only a piece of it?

This depends a lot on the definition of "package" -

At the TCP/IP level, the protocol is quite complex - there
are all sorts of info flowing back and forth, telling the
transmitter how much space the receiver has available.
So your "record" or "package" could be split up...

But it gets worse, or better, depending on your point of view:

At the Ethernet level, a frame typically carries at most about 1.5k of
payload - so if your record is longer, it can also be split up. OTOH, if
it all fits into one Ethernet frame, there is every chance that
it won't be split up, unless you send a lot of them in a row
without waiting for a response. If you are running something that
sends a small request and listens for a small answer, then you
will probably never see a record split - but if you run a kind
of sliding window protocol that streams a lot of data (even in
small packets) then sooner or later one of them will be partly
delivered...

- Hendrik
 

Francesco Bochicchio

Can you post an example program that exhibits the behavior you
describe?

I was forgetting about the MSG_WAITALL flag ...
When I started programming with sockets, it was on a platform (IIRC
Solaris) that by default behaved as if MSG_WAITALL were set
(actually, I don't remember it being mentioned at all in the man pages).
This sort of biased my understanding of the matter. I actually used that
flag recently - on Linux - to get the same behavior I was used to, but
forgot about that.

My bad :)

Ciao
 

Bryan Olson

Laszlo Nagy wrote:
[...]
I have read the socket programming howto (
http://docs.python.org/howto/sockets.html#sockets ) but it does not
explain how a blocking socket + select is different from a non blocking
socket + select. Is there any difference?

There is, but it may not affect you. There are cases where a socket can
select() as readable, but not be readable by the time of a following
recv() or accept() call. All such cases with which I'm familiar call for
a non-blocking socket.

Where does this come up? Suppose that to take advantage of multi-core
processors, our server runs as four processes, each with a single thread
that responds to events via select(). Clients all connect to the same
server port, so the socket listening on that port is shared by all four
processes. A perfectly reasonable architecture (though with many more
processes the simple implementation suffers the "thundering herd problem").

Two of our processes may be waiting on select() when a new connection
comes in. The select() call returns in both processes, showing the
socket ready for read, so both call accept() to complete the connection.
The O.S. ensures that accept() [and recv()] are atomic, so one process
gets the new connection; what happens in the other depends on whether we
use a blocking or non-blocking socket, and clearly we want non-blocking.
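
A minimal Python 3 sketch of the non-blocking variant follows. It runs in a single process, so it cannot reproduce the real race; the final accept() call on an empty queue merely stands in for the process that lost it. The helper name try_accept and the loopback address are assumptions for this example.

```python
import select
import socket

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(5)
listener.setblocking(False)          # accept() must never hang

def try_accept(listening_sock, timeout):
    """Return a new connection, or None if there is nothing to accept."""
    readable, _, _ = select.select([listening_sock], [], [], timeout)
    if not readable:
        return None
    try:
        conn, _addr = listening_sock.accept()
    except BlockingIOError:
        # Someone else took the connection between select() and accept().
        return None
    return conn

nothing = try_accept(listener, 0.1)              # no client yet

client = socket.create_connection(listener.getsockname())
conn = try_accept(listener, 1.0)                 # picks up the client

# Stand-in for the losing process: the queue is empty again, so a
# non-blocking accept() fails immediately instead of waiting.
try:
    listener.accept()
    lost_race_raises = False
except BlockingIOError:
    lost_race_raises = True

for s in (client, conn, listener):
    s.close()
```

With a blocking listening socket, that last accept() would instead wait until the next client connected, stalling the losing process's event loop.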
 

Saju Pillai

Bryan Olson said:
Where does this come up? Suppose that to take advantage of multi-core
processors, our server runs as four processes, each with a single thread
that responds to events via select(). Clients all connect to the same
server port, so the socket listening on that port is shared by all four
processes. A perfectly reasonable architecture (though with many more
processes the simple implementation suffers the "thundering herd problem").


Which is why it is common for real world servers to serialize the
select()/accept() code - usually via a file lock or a semaphore.
-srp
--
http://saju.net.in
 

Roy Smith

Bryan Olson said:
There are cases where a socket can select() as readable, but not be
readable by the time of a following recv() or accept() call. All such
cases with which I'm familiar call for a non-blocking socket.

I used to believe that if select() said data was ready for reading, a
subsequent read/recv/recvfrom() call could not block. It could return an
error, but it could not block. I was confident of this until just a few
months ago when reality blew up in my face.

The specific incident involved a bug in the Linux kernel. If you received
a UDP packet with a checksum error, the select() would return when the
packet arrived, *before* the checksum was checked. By the time you did the
recv(), the packet had been discarded and the recv() would block.

This led me on a big research quest (including some close readings of
Stevens, which appeared to say that this couldn't happen). The more I
read, the more I (re) discovered just how vague and poorly written the
Berkeley Socket API docs are :)

The bottom line is that Bryan is correct -- regardless of what the various
man pages and textbooks say, in the real world, it is possible for a read()
to block after select() says the descriptor is ready. The right way to
think about select() is to treat it as a heuristic which can make a polling
loop more efficient, but should never be relied upon to predict the future.

Neither the negative nor positive behavior is guaranteed. There's no
guaranteed response time; just because select() hasn't returned yet doesn't
mean a descriptor couldn't be read without blocking in another thread right
now. And, just because it has returned, that doesn't mean by the time you
get around to reading, there will still be anything there.
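
Treating select() as a hint suggests a read loop along the following lines. This is a Python 3 sketch loosely modelled on the read_data method from the first post, not a drop-in replacement: it is a free function, it uses ConnectionError where the original raised TransportError, and the poll_interval parameter is an assumption added for the example.

```python
import select
import socket
import threading

def read_data(sock, size, stop_requested, poll_interval=1.0):
    """Read exactly `size` bytes, checking `stop_requested` between waits."""
    sock.setblocking(False)          # recv() must never hang
    chunks = []
    remaining = size
    while remaining > 0:
        if stop_requested.is_set():
            raise SystemExit(0)
        readable, _, _ = select.select([sock], [], [], poll_interval)
        if not readable:
            continue                 # timeout: go round and re-check the flag
        try:
            data = sock.recv(min(remaining, 8192))
        except BlockingIOError:
            # select() was only a hint: nothing to read after all.
            continue
        if not data:
            raise ConnectionError("Connection closed.")
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)

# Demo on a connected pair of sockets.
a, b = socket.socketpair()
a.sendall(b"hello world")
result = read_data(b, 11, threading.Event())
a.close()
b.close()
```

If select() reports the socket readable but the data has vanished by the time recv() runs, the loop simply goes back to waiting, so the stop flag is still checked about once per poll_interval.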
 
