Custom Protocol over TCP

Jeffrey Spoon

Hello, I was toying with the idea of writing a relatively simple P2P
system. I decided that TCP would be the best bet, with maybe UDP for
discovering other nodes. Initially I was going to use binary headers
for my protocol to keep overhead down, but apparently this is
problematic: it is not at all friendly to anything non-C++, and there
are byte-ordering issues etc. That could be a problem, as I'll be using
Java...

Anyway, should I just use plain text for my protocol messages? It still
seems wasteful, as you're basically spending 8 bits on each character,
whereas you could stuff a lot more info into a binary packet. I suppose
this is more of a general programming/networking question, but since I
will be using Java I thought this was a good place to start.


Cheers.
 
Yamin

The issue you will be dealing with is 'endianness'. Java is perfectly
capable of dealing with it. Almost any language is.

Plain text is a little more interesting. You will have to choose your
text encoding, and it seems you want to use something like UTF-8.
Supposing you do that, once you get down to actually transmitting
binary data as text, you will be using about twice as much bandwidth:
it takes 2 hexadecimal characters (as text) to represent one byte.

Since this is a P2P application, it's doubtful you'd want to have
everyone wasting this kind of bandwidth. I'd go with binary data and
specify the use of network byte order (big-endian). That is also the
order Java's DataInputStream and DataOutputStream use, so that may
help you out as well.
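
As a minimal sketch of the byte-order point, using only the standard
library: DataOutputStream writes multi-byte values most significant
byte first (network byte order), and ByteBuffer defaults to big-endian
but can be switched. The class name ByteOrderDemo and the sample value
are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    /** Encodes an int with DataOutputStream, which always writes big-endian. */
    static byte[] viaDataStream(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Encodes an int with ByteBuffer in the requested byte order. */
    static byte[] viaByteBuffer(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }

    /** Formats bytes as space-separated lowercase hex pairs. */
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        System.out.println(hex(viaDataStream(value)));                          // 0a 0b 0c 0d
        System.out.println(hex(viaByteBuffer(value, ByteOrder.BIG_ENDIAN)));    // 0a 0b 0c 0d
        System.out.println(hex(viaByteBuffer(value, ByteOrder.LITTLE_ENDIAN))); // 0d 0c 0b 0a
    }
}
```

Whichever order the protocol specifies, both ends only need to agree
on it; the language itself is not the obstacle.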

Yamin
 
Steve Horsley

Jeffrey said:
Hello I was toying with the idea of writing a relatively simple P2P
system. I decided using TCP would be the best bet and maybe UDP for
discovering other nodes. However, initially I was going to use binary
headers for my protocol to keep overhead down, but apparently this is
problematic and not at all friendly to anything non-C++, as well as
byte-ordering issues etc. Could be an issue as I'll be using Java...

Java and most other languages have no problem with binary data, PROVIDED
that the data format is defined properly. This definition must state,
byte by byte, what is sent over the wire. The problem comes when lazy
C/C++ programmers send an image of an in-memory structure and don't
actually KNOW what they are sending over the wire (which would probably
change if they used a different compiler or different compile options
against the same source code). It is very hard to decode messages where
even the sender doesn't know what he's sending, and equally hard to encode
messages when the recipient doesn't know what format he expects. The same
issue applies with binary file data transferred between programs, although
for some reason file contents do tend to get documented more accurately.
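
To make "defined byte by byte" concrete, here is a rough sketch of a
purely hypothetical 8-byte header (not taken from any real protocol):
byte 0 is the version, byte 1 the message type, bytes 2-3 a 16-bit
flags field, and bytes 4-7 a 32-bit payload length, all big-endian.
The name HeaderCodec and every field are assumptions for illustration.

```java
final class HeaderCodec {
    static final int HEADER_LENGTH = 8;

    /** Writes each field at its documented offset, most significant byte first. */
    static byte[] encode(int version, int type, int flags, int payloadLength) {
        byte[] h = new byte[HEADER_LENGTH];
        h[0] = (byte) version;                 // byte 0: version
        h[1] = (byte) type;                    // byte 1: message type
        h[2] = (byte) (flags >>> 8);           // bytes 2-3: flags
        h[3] = (byte) flags;
        h[4] = (byte) (payloadLength >>> 24);  // bytes 4-7: payload length
        h[5] = (byte) (payloadLength >>> 16);
        h[6] = (byte) (payloadLength >>> 8);
        h[7] = (byte) payloadLength;
        return h;
    }

    // Java bytes are signed, so mask with 0xFF before combining them.
    static int version(byte[] h) { return h[0] & 0xFF; }

    static int type(byte[] h) { return h[1] & 0xFF; }

    static int flags(byte[] h) { return ((h[2] & 0xFF) << 8) | (h[3] & 0xFF); }

    static int payloadLength(byte[] h) {
        return ((h[4] & 0xFF) << 24) | ((h[5] & 0xFF) << 16)
             | ((h[6] & 0xFF) << 8) | (h[7] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] header = encode(1, 2, 0x0102, 300);
        System.out.println(version(header) + " " + type(header) + " "
                + flags(header) + " " + payloadLength(header)); // 1 2 258 300
    }
}
```

Because the layout is spelled out offset by offset, a decoder in any
other language can be written from the description alone.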

Jeffrey said:
Anyway, should I just use plain text for my protocol messages? It still
seems wasteful, as you're basically wasting 8 bits with each character,
whereas you could stuff a lot more info into a binary packet. I suppose
this is more of a general programming/networking question, but since I
will be using Java I thought this was a good place to start.

Unless you have real bandwidth concerns, I would suggest that the easier
debugging will outweigh the extra bandwidth, and that text is probably
the better choice. It makes the protocol easier for others to understand
too, which is a consideration if you hope for wide adoption.

Text headers also tend to be easier to extend, add to, abuse etc.
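
A small sketch of why text headers extend easily, assuming a made-up
"Name: value" line format terminated by a blank line (similar in
spirit to HTTP or mail headers). A parser like this simply carries
along fields it has never heard of. TextHeaderParser and the field
names are illustrative only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class TextHeaderParser {
    /** Parses "Name: value" lines up to the first blank line. */
    static Map<String, String> parse(String message) {
        Map<String, String> headers = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(new StringReader(message))) {
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                int colon = line.indexOf(':');
                if (colon < 0) {
                    continue; // skip malformed lines instead of failing
                }
                String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
                String value = line.substring(colon + 1).trim();
                headers.put(name, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return headers;
    }

    public static void main(String[] args) {
        String message = "Type: ARTICLE\r\nLength: 1024\r\nX-New-Field: whatever\r\n\r\nbody";
        // An old client only reads the fields it knows; the new one is harmless.
        System.out.println(parse(message));
    }
}
```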

Steve
 
Jeffrey Spoon

Yamin said:
The issue you will be dealing with is 'endianness'. Java is perfectly
capable of dealing with it. Almost any language is.

Plain text is a little more interesting. You will have to choose your
text encoding...which you seem to want to use something like UTF-8.
Supposing you do that, once you get down to actually transmitting data,
you will be wasting about twice as much bandwidth.
It takes 2 hexadecimal characters (as text) to uniquely represent one
byte.

Since this is a P2P application, it's doubtful you'd want to have
everyone wasting this kind of bandwidth. I'd go with binary data and
specify the use of network byte order. Network byte order is the
default for Java streams I think as well....so that may help you out as
well.

Yamin

Yes I was considering using UTF-8. Although I will actually be GZipping
the article body to keep bandwidth down (the article headers would be
too small to bother compressing). But I was considering making the
actual protocol packets binary.
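
A rough sketch of the GZip idea, assuming Java 9 or later (for
InputStream.readAllBytes) and UTF-8 article bodies. BodyCompression is
a made-up name. GZip adds a fixed header and trailer, which is why
tiny inputs such as short headers come out larger than they went in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class BodyCompression {
    /** GZips the UTF-8 bytes of an article body. */
    static byte[] compress(String body) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the GZip trailer has been written.
        return bytes.toByteArray();
    }

    /** Reverses compress(): gunzips and decodes as UTF-8. */
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            body.append("This is a line of a fairly repetitive article body.\n");
        }
        byte[] packed = compress(body.toString());
        System.out.println("original bytes:   " + body.length());
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + decompress(packed).equals(body.toString()));
    }
}
```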

I'm not too sure how to go about this in Java. I take it I would define
my protocol so that each bit at a given offset means something
specific, then build a byte and send it through the byte stream, or
send several bytes for a multi-byte header?
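
That is roughly how it goes. As a hedged sketch, assume a made-up
one-byte layout: bits 7-4 hold a message type (0-15), bit 3 is a
"compressed" flag, and bits 2-0 hold a TTL (0-7). Shifts and masks
build the byte, and OutputStream.write(int) sends its low 8 bits.
PackedByte and the layout are illustrative, not part of any real
protocol.

```java
import java.io.ByteArrayOutputStream;

final class PackedByte {
    /** Packs type (4 bits), compressed flag (1 bit) and TTL (3 bits) into one byte value. */
    static int pack(int type, boolean compressed, int ttl) {
        if (type < 0 || type > 15 || ttl < 0 || ttl > 7) {
            throw new IllegalArgumentException("type must be 0-15 and ttl 0-7");
        }
        return (type << 4) | (compressed ? 0x08 : 0) | ttl;
    }

    static int type(int packed) { return (packed >>> 4) & 0x0F; }

    static boolean compressed(int packed) { return (packed & 0x08) != 0; }

    static int ttl(int packed) { return packed & 0x07; }

    public static void main(String[] args) {
        // A ByteArrayOutputStream stands in for the socket's output stream.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        wire.write(pack(5, true, 3));

        // On the receiving side, mask with 0xFF because Java bytes are signed.
        int received = wire.toByteArray()[0] & 0xFF;
        System.out.println(Integer.toHexString(received)); // 5b
        System.out.println(type(received) + " " + compressed(received) + " " + ttl(received)); // 5 true 3
    }
}
```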


Thanks for the help.
 
Jeffrey Spoon

Steve Horsley said:
Java and most other languages have no problem with binary data, PROVIDED
that the data format is defined properly. This definition must state,
byte by byte, what is sent over the wire. The problem comes when lazy
C/C++ programmers send an image of an in-memory structure and don't
actually KNOW what they are sending over the wire (which would probably
change if they used a different compiler or different compile options
against the same source code). It is very hard to decode messages where
even the sender doesn't know what he's sending, and equally hard to
encode messages when the recipient doesn't know what format he expects.
The same issue applies with binary file data transferred between
programs, although for some reason file contents do tend to get
documented more accurately.

Okay. I was actually looking at the Gnutella protocol to see how they
move things around. They seem to use a binary system (apart from direct
file transfers which are HTTP). I was a bit confused as to how they
actually encode the data into bytes though and how I would achieve that
in Java.
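
For what it's worth, here is a sketch of a Gnutella-style header in
Java using ByteBuffer. The layout is as I recall it from the Gnutella
0.4 specification (a 16-byte descriptor ID, then one byte each for
payload type, TTL and hops, then a 4-byte little-endian payload
length, 23 bytes in total), so check it against the spec before
relying on it. The class and method names are made up.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class GnutellaStyleHeader {
    static final int LENGTH = 23;

    /** Lays the fields out in order; multi-byte integers are little-endian. */
    static byte[] encode(byte[] id, int payloadType, int ttl, int hops, int payloadLength) {
        if (id.length != 16) {
            throw new IllegalArgumentException("id must be 16 bytes");
        }
        ByteBuffer buf = ByteBuffer.allocate(LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(id);                  // bytes 0-15: descriptor ID
        buf.put((byte) payloadType);  // byte 16: payload type
        buf.put((byte) ttl);          // byte 17: time to live
        buf.put((byte) hops);         // byte 18: hops so far
        buf.putInt(payloadLength);    // bytes 19-22: payload length
        return buf.array();
    }

    static int ttl(byte[] header) { return header[17] & 0xFF; }

    static int payloadLength(byte[] header) {
        // Absolute getInt reads 4 bytes at offset 19 in the buffer's byte order.
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(19);
    }

    public static void main(String[] args) {
        byte[] header = encode(new byte[16], 0x00, 7, 0, 258);
        System.out.println(header.length);         // 23
        System.out.println(ttl(header));           // 7
        System.out.println(payloadLength(header)); // 258
    }
}
```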

Steve Horsley said:
Unless you have real bandwidth concerns, I would suggest that the easier
debugging will outweigh the extra bandwidth, and that text is probably
the better choice. It makes the protocol easier for others to understand
too, which is a consideration if you hope for wide adoption.

Text headers also tend to be easier to extend, add to, abuse etc.


Well, bandwidth is a pretty major concern. Also I'd be worried about
the abuse of headers (I suppose I could include a hash in the packets
or something). But I can see text is going to be a lot easier to do. I
won't decide which way to go until I know what I'm doing.
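
A minimal sketch of the "hash in the packets" idea, assuming SHA-256
appended after the payload; HashedPacket and its methods are made-up
names. Note the limit of this approach: a bare hash catches accidental
corruption, but anyone who alters the payload on purpose can simply
recompute it, so guarding against deliberate abuse would need a keyed
MAC or a signature instead.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class HashedPacket {
    static final int DIGEST_LENGTH = 32; // SHA-256 produces 32 bytes

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide SHA-256
        }
    }

    /** Returns payload followed by its SHA-256 digest. */
    static byte[] seal(byte[] payload) {
        byte[] packet = Arrays.copyOf(payload, payload.length + DIGEST_LENGTH);
        System.arraycopy(sha256(payload), 0, packet, payload.length, DIGEST_LENGTH);
        return packet;
    }

    /** Recomputes the digest over the payload part and compares it to the trailer. */
    static boolean verify(byte[] packet) {
        if (packet.length < DIGEST_LENGTH) {
            return false;
        }
        int payloadLength = packet.length - DIGEST_LENGTH;
        byte[] payload = Arrays.copyOf(packet, payloadLength);
        byte[] digest = Arrays.copyOfRange(packet, payloadLength, packet.length);
        return MessageDigest.isEqual(sha256(payload), digest);
    }

    public static void main(String[] args) {
        byte[] packet = seal("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(verify(packet)); // true
        packet[0] ^= 0x01;                  // simulate a corrupted byte
        System.out.println(verify(packet)); // false
    }
}
```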

Thanks.
 
Phil Staite

Note, UDP is problematic for anyone behind a firewall... :-(

You may want to include in your protocol an exchange of "known
servers/nodes" so that if/once you get out past your firewall you can
rapidly expand the list of known nodes.
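
One way such a "known nodes" exchange could look on the wire, as a
sketch only: a 16-bit count followed by one 6-byte entry per peer
(4-byte IPv4 address, 16-bit port), all big-endian. The format, the
IPv4-only assumption and the PeerListCodec name are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

final class PeerListCodec {
    /** Builds a peer from raw IPv4 octets; no DNS lookup is involved. */
    static InetSocketAddress peer(int a, int b, int c, int d, int port) {
        try {
            byte[] octets = {(byte) a, (byte) b, (byte) c, (byte) d};
            return new InetSocketAddress(InetAddress.getByAddress(octets), port);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    static byte[] encode(List<InetSocketAddress> peers) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeShort(peers.size());               // 16-bit entry count
            for (InetSocketAddress p : peers) {
                out.write(p.getAddress().getAddress()); // 4 bytes, assuming IPv4
                out.writeShort(p.getPort());            // 16-bit port
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static List<InetSocketAddress> decode(byte[] data) {
        List<InetSocketAddress> peers = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readUnsignedShort();
            for (int i = 0; i < count; i++) {
                byte[] ip = new byte[4];
                in.readFully(ip);
                int port = in.readUnsignedShort();
                peers.add(new InetSocketAddress(InetAddress.getByAddress(ip), port));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return peers;
    }

    public static void main(String[] args) {
        List<InetSocketAddress> known = new ArrayList<>();
        known.add(peer(192, 168, 0, 10, 6000));
        known.add(peer(10, 0, 0, 7, 50000));
        byte[] wire = encode(known);
        System.out.println(wire.length);                 // 14
        System.out.println(decode(wire).equals(known));  // true
    }
}
```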
 
Jeffrey Spoon

Phil Staite said:
Note, UDP is problematic for anyone behind a firewall... :-(

You may want to include in your protocol an exchange of "known
servers/nodes" so that if/once you get out past your firewall you can
rapidly expand the list of known nodes.

Ah. Originally I thought of doing it entirely in TCP, but decided that
would be wasteful for what would basically be a ping/pong at the start,
before going on to the more permanent connection. I was going to use
the local cache host scheme (less centralised than having a few host
lookup servers). I suppose I could use a sort of "push" through other
nodes to get out through the firewall somehow.
 
