Sending binary pickled data through TCP


David Hirschfield

I have a pair of programs which trade python data back and forth by
pickling up lists of objects on one side (using
pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket
connection to the receiver, who unpickles the data and uses it.

So far this has been working fine, but I now need a way of separating
multiple chunks of pickled binary data in the stream being sent back and
forth.

Questions:

Is it safe to do what I'm doing? I didn't think there was anything
fundamentally wrong with sending binary pickled data, especially in the
closed, safe environment these programs operate under...but maybe I'm
making a poor assumption?

I was going to separate the chunks of pickled data with some well-formed
string, but couldn't that string potentially randomly appear in the
pickled data? Do I just pick an extremely
unlikely-to-be-randomly-generated string as the separator? Is there some
string that will definitely NEVER show up in pickled binary data?

I thought about base64 encoding the data and then decoding it on the
opposite side (like what xmlrpclib does), but that turns out to be a
very expensive operation, which I want to avoid: speed is of the essence
in this situation.

Is there a reliable way to determine the byte count of some pickled
binary data? Can I rely on len(<pickled data>) == bytes?

Thanks for all responses,
-David
 

Nick Craig-Wood

Paul Rubin said:
As for the network representation, DJB proposes this format:
http://cr.yp.to/proto/netstrings.txt

Netstrings are cool and you'll find some python implementations if you
search.

The format is basically "number:string,", i.e. "12:hello world!,"
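
A minimal sketch of that netstring framing applied to pickled data, assuming Python 3 and bytes payloads. The helper names encode_netstring and decode_netstring are made up for illustration and are not taken from any particular netstring library:

```python
import pickle


def encode_netstring(payload):
    """Wrap a bytes payload as b"<length>:<payload>,"."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(buffer):
    """Return (payload, rest), or (None, buffer) if no complete netstring has arrived yet."""
    head, sep, tail = buffer.partition(b":")
    if not sep:
        return None, buffer          # length prefix not complete yet
    length = int(head.decode("ascii"))
    if len(tail) < length + 1:
        return None, buffer          # payload or trailing comma still missing
    if tail[length:length + 1] != b",":
        raise ValueError("netstring is missing its trailing comma")
    return tail[:length], tail[length + 1:]


# Two pickled chunks back to back in one stream.
stream = (encode_netstring(pickle.dumps([1, 2, 3], pickle.HIGHEST_PROTOCOL))
          + encode_netstring(pickle.dumps({"a": 1}, pickle.HIGHEST_PROTOCOL)))
first, stream = decode_netstring(stream)
second, stream = decode_netstring(stream)
```

Because the receiver reads exactly the announced number of bytes, it does not matter whether ":" or "," also occur inside the pickled data.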

Or you could use escaping which is what I usually do. This has the
advantage that you don't need to know how long the data is in advance.

E.g., these are from a scheme which uses \t to separate arguments and
\r or \n to separate transactions. These are then escaped in the
actual data using these functions:

import re

def escape(s):
    """This escapes the string passed in, changing CR, LF, TAB and \\ into
    \\r, \\n, \\t and \\\\"""
    # Backslash must be escaped first so the escapes added below are not doubled
    s = s.replace("\\", "\\\\")
    s = s.replace("\r", "\\r")
    s = s.replace("\n", "\\n")
    s = s.replace("\t", "\\t")
    return s

# str.maketrans is the Python 3 spelling of the original string.maketrans
def unescape(s, _unescape_mapping = str.maketrans('tnr','\t\n\r'), _unescape_re = re.compile(r'\\([rnt\\])')):
    """This unescapes the string passed in, changing \\r, \\n, \\t and \\\\ into
    CR, LF, TAB and \\"""
    def _translate(m):
        # Map the character after the backslash: t, n, r become TAB, LF, CR
        return m.group(1).translate(_unescape_mapping)
    return _unescape_re.sub(_translate, s)

(These functions have been through the optimisation mill which is why
they may not look immediately like how you might first think of
writing them!)
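
To show how escaping could separate binary pickles, here is a rough bytes-based sketch, assuming Python 3 and "\n" as the only transaction separator. The names escape_bytes, unescape_bytes, frame and split_frames are hypothetical and this is a variation on the functions above, not the scheme they came from:

```python
import pickle
import re

_ESCAPES = {b"\\": b"\\\\", b"\r": b"\\r", b"\n": b"\\n", b"\t": b"\\t"}
_UNESCAPES = {b"\\": b"\\", b"r": b"\r", b"n": b"\n", b"t": b"\t"}
_ESCAPE_RE = re.compile(rb"[\\\r\n\t]")
_UNESCAPE_RE = re.compile(rb"\\([rnt\\])")


def escape_bytes(data):
    """Replace backslash, CR, LF and TAB bytes with two-byte escapes."""
    return _ESCAPE_RE.sub(lambda m: _ESCAPES[m.group(0)], data)


def unescape_bytes(data):
    """Reverse escape_bytes in a single left-to-right pass."""
    return _UNESCAPE_RE.sub(lambda m: _UNESCAPES[m.group(1)], data)


def frame(obj):
    """Pickle obj, escape it, and terminate it with a newline separator."""
    return escape_bytes(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)) + b"\n"


def split_frames(buffer):
    """Return (decoded objects, leftover bytes of an incomplete frame)."""
    *complete, rest = buffer.split(b"\n")
    return [pickle.loads(unescape_bytes(part)) for part in complete], rest
```

After escaping, a raw newline byte can no longer occur inside a frame, so the separator is unambiguous. The cost is that every byte of the payload has to be scanned on both sides.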
 

MRAB

David said:
[...]

Is there a reliable way to determine the byte count of some pickled
binary data? Can I rely on len(<pickled data>) == bytes?

Instead of communicating directly with the TCP socket, you could talk
to it via an object which precedes each chunk with a byte count. If
you're working with multiple streams of pickled data, each chunk
could also carry an identifier specifying which stream it belongs
to.
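
One way this might look, as a sketch only, assuming Python 3. The 6-byte header layout (a 2-byte stream id followed by a 4-byte length, in network byte order) and the function names are assumptions chosen for illustration, not something specified in this thread:

```python
import pickle
import socket
import struct

# Hypothetical header: unsigned 16-bit stream id, unsigned 32-bit payload length.
HEADER = struct.Struct("!HI")


def send_chunk(sock, stream_id, obj):
    """Pickle obj and send it prefixed with its stream id and byte count."""
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    sock.sendall(HEADER.pack(stream_id, len(payload)) + payload)


def recv_exactly(sock, count):
    """Read exactly count bytes; recv() may return fewer than requested."""
    parts = []
    while count:
        part = sock.recv(count)
        if not part:
            raise ConnectionError("socket closed in the middle of a chunk")
        parts.append(part)
        count -= len(part)
    return b"".join(parts)


def recv_chunk(sock):
    """Return (stream_id, obj) for the next chunk on the socket."""
    stream_id, length = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return stream_id, pickle.loads(recv_exactly(sock, length))
```

In Python 3, pickle.dumps returns a bytes object, so len() of it is the number of bytes that go on the wire. Unpickling can execute arbitrary code, so this is only reasonable between trusted peers, such as the closed environment David describes.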
 

Irmen de Jong

David said:
I have a pair of programs which trade python data back and forth by
pickling up lists of objects on one side (using
pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket
connection to the receiver, who unpickles the data and uses it.

So far this has been working fine, but I now need a way of separating
multiple chunks of pickled binary data in the stream being sent back and
forth.
[...]

Save yourself the trouble of implementing some sort of IPC mechanism
over sockets, and give Pyro a swing: http://pyro.sourceforge.net

In Pyro almost all of the nastiness that is usually associated with socket
programming is shielded from you, and you'll get much more as well
(a complete Pythonic IPC library).

It may be a bit heavy for what you are trying to do but it may
be the right choice to avoid troubles later when your requirements
get more complex and/or you discover problems with your networking code.

Hth,
---Irmen de Jong
 
