asynchat sends data on async_chat.push and .push_with_producer


ludvig.ericson

Hello,

My question concerns asynchat in particular. With the following half-
pseudo code in mind:

class Example(asynchat.async_chat):
    def readable(self):
        if foo:
            self.push_with_producer(ProducerA())
        return asynchat.async_chat.readable(self)

Now, asyncore will call the readable function just before a select(),
and it is meant to determine whether or not to include that asyncore
dispatcher in the select map for reading.

The problem with this code is that it has the unexpected side effect
of _immediately_ trying to send, regardless of whether the async_chat
object is actually writable.

Both asynchat.push_with_producer and .push call .initiate_send(),
which in turn calls .send if there's data buffered. While this might
seem logical, it isn't at all.

Suppose that when Example.readable is called, the socket has already
been closed. There are two scenarios in which it could have been
closed: a) the remote endpoint closed the connection, or b) the
producer ProducerA somehow closed the connection (my case).

Obviously, calling send on a socket that has been closed will result
in an error - EBADF, "Bad file descriptor".
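The error is trivial to reproduce with a bare socket:

```python
import errno
import socket

# A closed socket's file descriptor is gone, so any send() fails with
# EBADF ("Bad file descriptor") - the same error an eager
# initiate_send() can trigger from inside readable().
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.close()

try:
    s.send(b"data")
    err = None
except OSError as e:
    err = e.errno

print(err == errno.EBADF)
```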

So, my question is: Why does asynchat.push* call self.initiate_send?
Nothing in the name "push" suggests that it'll transmit immediately,
disregarding potential "closedness". Removing the two calls
to .initiate_send() in the two push functions would still mean data is
sent, but only when data can be sent - which is, IMO, what should be
done.
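The behavior I'm arguing for can be sketched without asynchat at all
(the class and method names below are hypothetical stand-ins, not the
real asynchat API): push only queues, and the socket is touched
exclusively from the loop's write callback:

```python
from collections import deque

class DeferredPushChannel:
    """Hypothetical sketch: push() only queues data; actual socket
    writes happen solely in handle_write(), i.e. only after select()
    has reported the socket writable."""

    def __init__(self, sock):
        self.sock = sock
        self.out_buffer = deque()

    def push(self, data):
        # Just queue - no immediate send attempt, so a closed socket
        # cannot blow up here with EBADF.
        self.out_buffer.append(data)

    def writable(self):
        return bool(self.out_buffer)

    def handle_write(self):
        # Called by the event loop once select() says we can write.
        data = self.out_buffer.popleft()
        sent = self.sock.send(data)
        if sent < len(data):
            self.out_buffer.appendleft(data[sent:])

class _FakeSocket:
    """Test double: records what was sent, accepts 4 bytes per call."""
    def __init__(self):
        self.sent = b""
    def send(self, data):
        chunk = data[:4]
        self.sent += chunk
        return len(chunk)

chan = DeferredPushChannel(_FakeSocket())
chan.push(b"hello world")   # nothing hits the socket yet
queued = chan.sock.sent     # still b"" at this point
while chan.writable():      # the event loop's write phase
    chan.handle_write()
```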

Thankful for insights,
Ludvig.
 

Josiah Carlson

Ludvig,

In a substantial way, I agree with you. Calling initiate_send()
within push or push_with_producer is arguably a misfeature (which you
have argued).

In a pure world, the only writing that is done would be within the
handle_send() callbacks within the select loop. Then again, in a
perfect world, calling readable() and writable() would have no strange
side effects (as your example below has), and all push*() calls would
be made within the handle_*() methods.

We do not live in a pure world, Python isn't pure (practicality beats
purity), and by attempting to send some data each time a .push*()
method is called, there are measurable increases in transfer rates.

In the particular case you are looking at (and complaining about ;) ),
if you want to bypass the initiate_send() call, you can dig into the
particular implementation of asynchat you are using (the internals may
change in 2.6 and 3.x versus 2.5 and previous), and append your output
to the outgoing queue. You could even abstract out the push*() calls
for a non-auto-sending version (easy), write your own initiate_send()
method that checks the stack to verify that it's being called from
handle_send() (also easy), or any one of many other work-arounds.
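The stack-checking workaround can be sketched with the inspect module;
everything here is a hypothetical stand-in, not real asynchat code:

```python
import inspect

class GuardedSender:
    """Hypothetical sketch of the stack-checking idea: only let
    initiate_send() proceed when it was reached via handle_send()."""

    def __init__(self):
        self.sends = 0

    def _called_from(self, name):
        # Walk the current call stack looking for a frame whose
        # function name matches.
        return any(frame.function == name
                   for frame in inspect.stack())

    def initiate_send(self):
        if not self._called_from("handle_send"):
            return  # refuse eager sends from push()/readable()
        self.sends += 1

    def push(self, data):
        # Stock behavior tries to send right away...
        self.initiate_send()  # ...but the guard makes it a no-op here.

    def handle_send(self):
        self.initiate_send()

g = GuardedSender()
g.push(b"x")      # guard blocks the eager send
g.handle_send()   # guard allows the send from the loop
```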

Yes, it would be convenient to not have push*() actually send data
when called in some cases, but in others, the increase in data
transfer rates and/or reduction in latency is substantial.

- Josiah
 

ludvig.ericson

> In a pure world, the only writing that is done would be within the
> handle_send() callbacks within the select loop. Then again, in a
> perfect world, calling readable() and writable() would have no strange
> side effects (as your example below has), and all push*() calls would
> be made within the handle_*() methods.

It wouldn't have those side-effects if push really just pushed. :p
> We do not live in a pure world, Python isn't pure (practicality beats
> purity), and by attempting to send some data each time a .push*()
> method is called, there are measurable increases in transfer rates.

> -- 8< --
> Yes, it would be convenient to not have push*() actually send data
> when called in some cases, but in others, the increase in data
> transfer rates and/or reduction in latency is substantial.

If it increases transfer speed that much, the calling application
almost has to be broken, or at least not designed as it should be - of
course there are such applications, but you know...

Anyway, I went for a subclassing way of dealing with it, and it works
fine.

Thanks for the reply though, hadn't considered possibly "flawed"
applications where the asyncore loop isn't revisited as often as it
should. :->
Ludvig
 

Giampaolo Rodola'

> We do not live in a pure world, Python isn't pure (practicality beats
> purity), and by attempting to send some data each time a .push*()
> method is called, there are measurable increases in transfer rates.

Good point. I'd like to ask a question: if we had a default
asyncore.loop timeout of (say) 0.01 ms instead of 30, could we avoid
such a problem?
I've always found it weird that asyncore has such a high default
timeout value.
Twisted, for example, uses a default of 0.01 ms for all its reactors.
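For reference, the timeout only caps how long select() waits when
nothing is ready; a socket that is already readable makes it return at
once regardless of the timeout value:

```python
import select
import socket
import time

# select()'s timeout is an upper bound on the wait when no socket is
# ready. If a socket already has data, even a large timeout returns
# immediately - so shrinking the default changes nothing under load.
a, b = socket.socketpair()
a.send(b"ping")           # make `b` readable before we select

start = time.monotonic()
readable, _, _ = select.select([b], [], [], 30)  # 30 s timeout
elapsed = time.monotonic() - start

print(readable == [b], elapsed < 1.0)
a.close(); b.close()
```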
 

Giampaolo Rodola'

> Anyway, I went for a subclassing way of dealing with it, and it works
> fine.

As Josiah already stated, pay attention to the changes that will be
applied to asyncore's internals in Python 2.6 and 3.0 (in particular,
you can see how things will change by looking at the patch provided in
bug #1736190).
Your subclass might not work on all implementations.

--- Giampaolo
http://code.google.com/p/pyftpdlib
 

Josiah Carlson

> It wouldn't have those side-effects if push really just pushed. :p


> -- 8< --


> If it increases transfer speed that much, the calling application
> almost has to be broken, or at least not designed as it should be - of
> course there are such applications, but you know...

It's not a matter of being broken at all, it's a matter of control
flow. When we immediately try to send whenever a .push() call is
made, the underlying TCP/IP stack will accept a reasonably large
amount of data before it actually fills up (the most recent FreeBSD,
from what I understand, will accept up to 1 meg, which is how they are
able to saturate 10Gbit links), and by tossing the data into the
TCP/IP buffer early, the data gets sent earlier, thus reducing
latency.

Further, because we are making more actual calls to socket.send(),
assuming the underlying TCP/IP buffer isn't filled (which may or may
not be a good assumption), and assuming that the link has more
capacity than is being used (usually the case on LANs and high-speed
internet links), putting more data into the buffer to be handled by
the underlying link layers will also increase transfer speeds.

When the socket.send() calls are delayed until the next pass through
the loop, and we aren't doing an initial send, then we don't get the
benefit of the underlying TCP/IP socket layer buffering.

In my experience over high-speed connections (LANs, Gbit WAN links,
local machine connections), I have found that increasing block sizes
to 32k significantly improves performance for bandwidth-constrained
applications, as there are far fewer blocks to toss to the underlying
layers, less Python code execution (Python 2.5 has a default block
size of 512 bytes, or 64x as much Python execution to send the same
amount of data, and one of the proposed 2.6 changes is to up this to a
more reasonable 4096 bytes), and more effective use of the TCP/IP
buffers (which are typically 64k or larger).
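The 64x figure is just the ratio of block sizes; a quick sanity check
(the helper below is mine, not asyncore code):

```python
# Rough arithmetic behind the block-size argument: the number of
# socket.send() calls (and the per-call Python overhead) scales
# inversely with the block size.
def send_calls(total_bytes, block_size):
    # Ceiling division: number of blocks needed to move the payload.
    return -(-total_bytes // block_size)

payload = 1 << 20                      # 1 MiB transfer
calls_512 = send_calls(payload, 512)   # Python 2.5 default
calls_4096 = send_calls(payload, 4096) # proposed 2.6 default
calls_32k = send_calls(payload, 32768) # block size from the text

print(calls_512 // calls_32k)          # 64x more calls at 512 bytes
```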
> Anyway, I went for a subclassing way of dealing with it, and it works
> fine.
>
> Thanks for the reply though, hadn't considered possibly "flawed"
> applications where the asyncore loop isn't revisited as often as it
> should. :->
> Ludvig

Again, it's not about the application being flawed, it's a matter of
control flow. ;) Also, it's not a matter of any timeouts in the
select/poll loops (as Giampaolo suggested); if any socket is readable
or writable, those calls will return immediately (a few hundred
microseconds per call isn't bad).

- Josiah
 
