Scatter/gather on sockets?

Roy Smith

I've got a bunch of strings in a list:

vector = []
vector.append ("foo")
vector.append ("bar")
vector.append ("baz")

I want to send all of them out a socket in a single send() call, so
they end up in a single packet (assuming the MTU is large enough). I
can do:

mySocket.send ("".join (vector))

but that involves creating an intermediate string. Is there a more
efficient way, that doesn't involve that extra data copy?
 
Peter Hansen

Roy said:
I've got a bunch of strings in a list:

vector = []
vector.append ("foo")
vector.append ("bar")
vector.append ("baz")

I want to send all of them out a socket in a single send() call, so
they end up in a single packet (assuming the MTU is large enough). I
can do:

mySocket.send ("".join (vector))

but that involves creating an intermediate string. Is there a more
efficient way, that doesn't involve that extra data copy?

Two possible answers that I can see:

A. No, the send call is implemented in C and requires a single buffer
with the entire piece of data that will be sent. Ultimately this gets
passed down to the NIC hardware in some fashion, so there's certainly no
hope of using something like a generator to send it in pieces.

B. Don't bother trying, because even if the MTU is large enough there is
absolutely no guarantee that the packet will stay intact all the way
through the network anyway (even if you use sendall() instead of send()).

So fixing your design not to require this appears to be the only viable
solution.

-Peter
 
Anthony Greene

Roy Smith said:
I've got a bunch of strings in a list:

vector = []
vector.append ("foo")
vector.append ("bar")
vector.append ("baz")

I want to send all of them out a socket in a single send() call, so
they end up in a single packet (assuming the MTU is large enough). I
can do:

mySocket.send ("".join (vector))

but that involves creating an intermediate string. Is there a more
efficient way, that doesn't involve that extra data copy?

Is sendall() what you're looking for?
 
Roy Smith

Peter Hansen said:
B. Don't bother trying, because even if the MTU is large enough there is
absolutely no guarantee that the packet will stay intact all the way
through the network anyway (even if you use sendall() instead of send()).

This is true, but I'm generating the message being sent in very small
chunks (often as small as 4 bytes at a time), and typically need to flush a
packet out onto the network after a few dozen bytes. Maybe at most a few
hundred. I don't know of any networks with MTUs smaller than that.
Measurements show a 10-fold improvement in protocol throughput with large
packets vs. small ones. The only question is what's the most efficient way
in Python to generate the large packets.

Peter Hansen said:
So fixing your design not to require this appears to be the only viable
solution.

My design is not broken. I'm writing code to drive a pre-existing binary
communications protocol. It is what it is. The functionality I seek
exists at the Unix system call level (writev, sendmsg), but doesn't appear
to be exposed in the Python socket API.
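
For readers on a later Python: since Python 3.3 the socket module has a sendmsg() method on Unix-like systems that accepts a sequence of buffers, which is the gather-write described above. The following is only a minimal sketch under that assumption. The helper name send_gathered and the local socket pair are illustrative, not part of the original protocol code, and the operating system may limit how many buffers one call accepts.

```python
import socket

def send_gathered(sock, chunks):
    """Send every chunk, using as few sendmsg() calls as possible."""
    # memoryview lets us trim a partially sent buffer without copying it.
    buffers = [memoryview(c) for c in chunks if len(c)]
    total = 0
    while buffers:
        sent = sock.sendmsg(buffers)  # one system call for all buffers
        total += sent
        # Drop buffers that went out completely.
        while buffers and sent >= len(buffers[0]):
            sent -= len(buffers[0])
            buffers.pop(0)
        # Trim the first remaining buffer if it was only partly sent.
        if buffers and sent:
            buffers[0] = buffers[0][sent:]
    return total

# Demonstration over a local socket pair instead of a real network peer.
left, right = socket.socketpair()
vector = [b"foo", b"bar", b"baz"]
sent_count = send_gathered(left, vector)
received = right.recv(1024)
left.close()
right.close()
```

No joined string is built in Python here; the kernel collects the pieces. Whether that is measurably faster than "".join() for a few hundred bytes would need to be timed.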
 
Roy Smith

Anthony Greene said:
I've got a bunch of strings in a list:

vector = []
vector.append ("foo")
vector.append ("bar")
vector.append ("baz")

I want to send all of them out a socket in a single send() call, so
they end up in a single packet (assuming the MTU is large enough). I
can do:

mySocket.send ("".join (vector))

but that involves creating an intermediate string. Is there a more
efficient way, that doesn't involve that extra data copy?

Is sendall() what you're looking for?

No. sendall() is actually what I'm using now. It handles the other side
of the issue: issuing repeated send() calls if the system fragments your
buffer. I'm trying to aggregate lots of small buffers into one large one.
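
To make the distinction concrete, here is a rough sketch of the loop that sendall() performs on one already-assembled buffer. The function name send_everything is hypothetical and the socket pair is only there so the example runs on its own; this is written for Python 3 and is not the actual implementation inside the socket module.

```python
import socket

def send_everything(sock, data):
    """Roughly what sendall() does: call send() until nothing is left."""
    view = memoryview(data)
    while len(view):
        sent = sock.send(view)  # may accept fewer bytes than offered
        view = view[sent:]      # slicing a memoryview does not copy

# Demonstration over a local socket pair.
left, right = socket.socketpair()
send_everything(left, b"foobarbaz")
echoed = right.recv(1024)
left.close()
right.close()
```

Note that the loop starts from a single buffer, so it does nothing to combine many small strings into one.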
 
Paul Rubin

Roy Smith said:
This is true, but I'm generating the message being sent in very small
chunks (often as small as 4 bytes at a time), and typically need to flush a
packet out onto the network after a few dozen bytes. Maybe at most a few
hundred. I don't know of any networks with MTUs smaller than that.
Measurements show a 10-fold improvement in protocol throughput with large
packets vs. small ones. The only question is what's the most efficient way
in Python to generate the large packets.

Probably: build up the packet with cStringIO or with the array module
instead of as a list of small strings. But if you time both versions
I don't think it'll matter much. Python (at least CPython) simply
will not be very fast no matter what you do. The overhead of building
a large string (with ''.join, say) from a bunch of small ones isn't
that big a deal compared with what you already lose in interpreter
overhead running the application.
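
A small sketch of how one might time the alternatives. It assumes Python 3, where io.BytesIO takes the place of cStringIO and a bytearray stands in for the array module; the 4-byte chunks and the function names are made up for illustration, and the timings will vary by machine.

```python
import io
import timeit

chunks = [b"abcd"] * 64  # hypothetical 4-byte pieces, 256 bytes in total

def build_with_join():
    # One intermediate bytes object, built in a single pass.
    return b"".join(chunks)

def build_with_bytesio():
    # Append each piece to an in-memory buffer, then take its contents.
    buf = io.BytesIO()
    for chunk in chunks:
        buf.write(chunk)
    return buf.getvalue()

def build_with_bytearray():
    # Grow a mutable byte buffer in place.
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
    return bytes(buf)

for build in (build_with_join, build_with_bytesio, build_with_bytearray):
    seconds = timeit.timeit(build, number=10_000)
    print(f"{build.__name__}: {seconds:.4f} s for 10,000 packets")
```

All three produce the same bytes, so the choice can be made purely on the measured numbers.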
 
