Sizes and types for network programming

Jorgen Grahn

> Hi Everyone,
> I have a question, and I hope it's not obvious!
>
> As I understand it, C++ does not define the number of bits in a byte;
> this is architecture dependent. The one thing we always know is that
> sizeof(char) == 1. The sizes of int, long, etc. will be architecture
> dependent, i.e. the number of bits in an int on a 32-bit machine may be
> different from that on a 64-bit machine.
>
> I am writing a program which writes information across a network, to
> both 32- and 64-bit architectures. I am writing on Linux (sorry, I don't
> want to get OS-specific). What I do not understand is how I should
> define my types. Basically, if I am writing out from a 32-bit machine,
> and reading from both 32- and 64-bit machines, how should I define my
> data types?

In a way it's very simple. TCP and UDP push octets around -- as
infinite streams of octets in the TCP case, as chunks of N octets in
the UDP case.

If you only use the socket interface, that's the only data structure
you have, so those are the terms in which you have to define your
protocol: octet by octet.

If you're typecasting something else to send or receive, you're doing
something wrong.
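
For illustration, a minimal sketch of building an outgoing packet octet
by octet (the layout here -- a 16-bit sequence number followed by the
payload -- is made up for the example):

    #include <stdint.h>
    #include <vector>

    // Hypothetical layout: a 16-bit sequence number in big-endian
    // (network) byte order, then the payload octets.
    std::vector<unsigned char> encode(uint16_t seq,
                                      const std::vector<unsigned char>& payload)
    {
        std::vector<unsigned char> buf;
        buf.push_back(static_cast<unsigned char>(seq >> 8));    // high octet
        buf.push_back(static_cast<unsigned char>(seq & 0xff));  // low octet
        buf.insert(buf.end(), payload.begin(), payload.end());
        return buf;  // pass &buf[0] and buf.size() to send()/sendto()
    }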

/Jorgen
 
Jorgen Grahn

> If you are sure you want a binary transfer, then you have to first fix
> the interchange format, e.g. the packet starts with a 16-bit unsigned
> integer in little-endian order, followed by a 2-byte gap, then a signed
> 32-bit integer in little-endian order, etc. This would be similar to the
> TIFF format, for example.

It's IMHO much better to specify network byte order (big-endian):
- there are standard functions (htons() etc.) in the socket library
for doing the conversion
- even on little-endian machines these are almost cost-free, because
they're compiler builtins, probably mapping to specific CPU instructions
- that's what everyone else uses
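
For example, writing a 16-bit value in network byte order can be done
either with the conversion functions or with plain shifts; a sketch
(the function names are mine):

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>  // htons(), on POSIX systems

    // (a) via the socket library:
    void put16(unsigned char* out, uint16_t value)
    {
        uint16_t be = htons(value);
        memcpy(out, &be, 2);
    }

    // (b) via explicit shifts -- needs no library and works the same
    // on big- and little-endian hosts:
    void put16_shifts(unsigned char* out, uint16_t value)
    {
        out[0] = static_cast<unsigned char>(value >> 8);
        out[1] = static_cast<unsigned char>(value & 0xff);
    }
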
> Then you can implement this interchange format
> on all needed platforms. When doing that you can use the fixed-size types
> from <stdint.h>; in case of structs you might need compiler-specific
> struct packing and alignment pragmas or directives.

I strongly recommend against typecasting raw structs and sending them.
- The compiler-specific problem you mention above
- It removes focus from the idea of having a stable protocol, to
"the format is whatever my struct looks like today"
- You lose as soon as the struct contains a pointer, a std::string,
an enum or just about anything besides integers and arrays.
- You have to take a copy of the struct anyway, to do the
byte-swapping. (Byte-swapping the original leads to insanity,
at least in the transmission case.)

Better to have a function

FooMessage parse(const unsigned char* buf, unsigned len);

which picks octet by octet according to the rules of the protocol, and
builds a FooMessage, which can be a quite normal C++ object.

Or, you can pass around the buffer and parse data from it when it's
needed. I've used that approach successfully too.
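
For concreteness, a sketch of such a parse function, for a FooMessage
that I'll pretend consists of a 16-bit id and a 32-bit count, both
big-endian (the layout is invented for the example):

    #include <stdint.h>
    #include <stdexcept>

    struct FooMessage {
        uint16_t id;
        uint32_t count;
    };

    FooMessage parse(const unsigned char* buf, unsigned len)
    {
        if (len < 6) throw std::runtime_error("FooMessage: short packet");
        FooMessage msg;
        msg.id    = static_cast<uint16_t>(buf[0] << 8 | buf[1]);
        msg.count = static_cast<uint32_t>(buf[2]) << 24
                  | static_cast<uint32_t>(buf[3]) << 16
                  | static_cast<uint32_t>(buf[4]) << 8
                  | static_cast<uint32_t>(buf[5]);
        return msg;
    }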

/Jorgen
 
Öö Tiib

> I agree with Yannick.  I've been involved in debugging an ASN.1
> encoding error since back in June, and we're still not done. It's easy
> to see that the encoding is incorrect, but very hard to see why and
> where it goes wrong.
>
> Look at some of the popular protocols: HTTP, SMTP, NNTP. Very easy to
> parse; even easier to read manually during debugging (which I assume
> was what Yannick was thinking of).
>
> Then you'd have to maintain two protocols: the binary you use live,
> and a text protocol, with a debugging utility which converts to/from
> the binary protocol.
>
> Plus, that tool would be useless when you're looking at captured
> network traffic, whereas tcpdump, tcpflow, Wireshark and similar tools
> handle text-based protocols quite well.
>
> I *really* recommend text-based protocols for all normal uses.  If
> it's too much data, add an optional zlib compression layer.
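
(Such a layer is indeed small; a sketch with zlib's one-shot API,
error handling omitted:)

    #include <string>
    #include <vector>
    #include <zlib.h>

    // Compress one text-protocol message before sending it.
    std::vector<unsigned char> pack(const std::string& text)
    {
        uLongf len = compressBound(text.size());
        std::vector<unsigned char> out(len);
        compress(&out[0], &len,
                 reinterpret_cast<const Bytef*>(text.data()), text.size());
        out.resize(len);  // compress() updated len to the actual size
        return out;
    }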

Probably this is typical bait for one very old argued-to-death
discussion again?

I have seen a lot of strong suggestions and recommendations to use text
protocols; OTOH most real things that are not meant to transport human-
readable/editable text/script/config files (like SMTP, NNTP and HTTP
are) in practice use binary protocols as a rule. I think they do
it very correctly.

I have worked with both types of protocols and I have noticed that
the amount of problems with text is bigger. In cases when products made
by different organizations were using the same protocol (or data files),
the integration work in the case of text was greater. So now I agree with
a text protocol only when it is for reading/editing by a human or it is
XML with a well-defined schema (like WXS or XSD). How complex text can
be is easily realized by reviewing the Apache Xerces code base.

1) Size. Binary formats usually zip to even smaller results. Also the
size of the unpacked binary may be predicted more easily.
2) Navigability. Text needs to be read and parsed to navigate in it,
for example to bypass irrelevant (sometimes huge) parts of it.
3) Comparison. There are lots of text variants that can be considered
equal to each other (see the sketch after this list). Text (if it is
ever meant to be read or edited by a human) is often formatted to
provide better readability; for example white-space indents are added
at will, or trailing zeroes / thousands separators for numbers.
4) Binary compatibility. Sometimes there is a need to apply some sort of
digital processing, for example a need to digitally sign parts of the
data. That is achieved with an algorithm that turns it (or its sub-parts)
into a "canonical form". The endianness/byte-size etc. code for a binary
format is child's play compared with an algorithm converting a text
to some canonical form.
5) Usability in a human interface. Computer-generated text is not really
readable for a human (especially the canonical form) and so there is a
tendency to add beautifier and localization algorithms or to convert
it into some graphical form.
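
A tiny demonstration of point 3: the same value, spelled three ways
that a byte-wise comparison sees as three different messages, while a
fixed binary encoding has exactly one form:

    #include <cstdio>
    #include <cstring>

    int main()
    {
        const char* a = "42";
        const char* b = " 42";   // indentation added "at will"
        const char* c = "42.0";  // trailing zero for readability
        // Both comparisons are non-zero although the value is the same:
        std::printf("%d %d\n", std::strcmp(a, b) != 0, std::strcmp(a, c) != 0);

        unsigned char bin[2] = { 0, 42 };  // the one 16-bit big-endian form
        (void)bin;
        return 0;
    }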

Finally I have realized that it is way more complex to mess with text
than to use binary and to convert from the binary to text (or a tree, or
graph, or table) where needed. This is not the case with XML, since
there are relatively good libraries supporting it, but other text I
prefer to avoid wherever I can. XML I like only if I may use XML-
processing libraries in the product.
 
Alf P. Steinbach /Usenet

* Öö Tiib, on 18.09.2010 22:48:
> I have seen a lot of strong suggestions and recommendations to use text
> protocols; OTOH most real things that are not meant to transport human-
> readable/editable text/script/config files (like SMTP, NNTP and HTTP
> are) in practice use binary protocols as a rule. I think they do
> it very correctly.

Huh. Amazingly it works using Telnet to do e.g. FTP, even though FTP is binary.
I'm guessing the Telnet program converts between text and binary?


Cheers, & hth.,

- Alf (simpleton)
 
Öö Tiib

> * Öö Tiib, on 18.09.2010 22:48:
>> [snip]
>
> Huh. Amazingly it works using Telnet to do e.g. FTP, even though FTP is binary.
> I'm guessing the Telnet program converts between text and binary?

You mean when you are transporting binary data with text? In those
places base64 encoding or something similar is used. Base64 is not
very human-readable but is a necessary hack to keep the binary intact
over a text protocol. Or maybe I misunderstood what you meant.
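
To show why it keeps the binary intact: base64 maps every 3 octets to 4
characters from a 64-symbol alphabet, so no octet value can collide with
the text protocol's framing. A minimal encoder sketch:

    #include <cstddef>
    #include <string>
    #include <vector>

    std::string base64(const std::vector<unsigned char>& in)
    {
        static const char tbl[] =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            "0123456789+/";
        std::string out;
        std::size_t i = 0;
        for (; i + 2 < in.size(); i += 3) {      // whole 3-octet groups
            unsigned v = in[i] << 16 | in[i+1] << 8 | in[i+2];
            out += tbl[v >> 18];
            out += tbl[(v >> 12) & 63];
            out += tbl[(v >> 6) & 63];
            out += tbl[v & 63];
        }
        if (i + 1 == in.size()) {                // 1 octet left: pad "=="
            unsigned v = in[i] << 16;
            out += tbl[v >> 18];
            out += tbl[(v >> 12) & 63];
            out += "==";
        } else if (i + 2 == in.size()) {         // 2 octets left: pad "="
            unsigned v = in[i] << 16 | in[i+1] << 8;
            out += tbl[v >> 18];
            out += tbl[(v >> 12) & 63];
            out += tbl[(v >> 6) & 63];
            out += '=';
        }
        return out;
    }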
 
Brian Wood

> I agree with Yannick.  I've been involved in debugging an ASN.1
> encoding error since back in June, and we're still not done. It's easy
> to see that the encoding is incorrect, but very hard to see why and
> where it goes wrong.


I seldom run into the sort of problem you mention when using
binary, but I'm working with it daily and am not afraid of
it, so that may help. Maintaining and improving a disciplined
process is essential to having happy binary days.


Brian Wood
Ebenezer Enterprises
http://webEbenezer.net
(651) 251-9384

But everything should be done in a fitting and orderly way.
1 Corinthians 14:40
 
Alf P. Steinbach /Usenet

* Öö Tiib, on 18.09.2010 23:22:
> You mean when you are transporting binary data with text? In those
> places base64 encoding or something similar is used. Base64 is not
> very human-readable but is a necessary hack to keep the binary intact
> over a text protocol. Or maybe I misunderstood what you meant.

I'm just not aware of any commonly used binary high-level protocols.

In the old days we had some binary RPC protocols, for sure, even monstrosities
like CORBA, but then came SOAP and other text-based thingies; some stuck in the
dino world may be using binary protocols, but that's them.

Btw., FTP is a text-based protocol (from the dino age).


Cheers & hth.,

- Alf
 
Öö Tiib

> * Öö Tiib, on 18.09.2010 23:22:
>> [snip]
>
> I'm just not aware of any commonly used binary high-level protocols.
>
> In the old days we had some binary RPC protocols, for sure, even monstrosities
> like CORBA, but then came SOAP and other text-based thingies; some stuck in the
> dino world may be using binary protocols, but that's them.

SSL/TLS. I think most data in the network that has some value goes with
something like that.

CORBA is really better than SOAP when speed is needed. A dinosaur that
is smaller, quicker and more flexible than its opponent? I don't
think that CORBA goes anywhere. Also most SOAP uses MTOM to gain some
speed back, and that turns it into sort of nonsense, since it is again
binary that is transferred.

SOAP is an XML-based protocol like the open document file format is an
XML-based file format. XML is good for data transfer when using good
libraries. Since application code does not have to deal with that XML
text, XML is a low-level transport format there. However, when someone
starts to send/receive (possibly XML-like) texts, constructing and
parsing them with iostreams, then it is usually a LOT more pain than
binary.
 
James Kanze

[...]
> It's IMHO much better to specify network byte order (big-endian):

More generally, it's probably better to just choose some
existing format, like XDR, and use it. Why bother going through
the work of specifying one yourself, when you don't have to?
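
For instance, with Sun RPC's XDR (still shipped on most Unix systems;
newer glibc may need libtirpc), encoding an int is just:

    #include <rpc/xdr.h>

    // XDR fixes the wire format (big-endian, 4-byte units), so both
    // ends only call xdr_int(); a sketch, error handling reduced to
    // the return value.
    bool encode_int(char* buf, unsigned size, int value)
    {
        XDR xdrs;
        xdrmem_create(&xdrs, buf, size, XDR_ENCODE);
        bool ok = xdr_int(&xdrs, &value) != 0;
        xdr_destroy(&xdrs);
        return ok;
    }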
 
James Kanze

> * James Kanze, on 18.09.2010 12:16:
>>> * James Kanze, on 17.09.2010 12:36:
>>>> [snip]
>>>> I'd be interested to know which current machine has 9-bit
>>>> bytes.
>>> Unisys 2200.
>> Presumably that would be a machine using 18-bit or
>> 36-bit word addressing.
> It's 36 bit one's complement.
> I've posted this information before (including a link to the C++
> programming manual, IIRC).
>
> Hopefully that's the very last of the dinosaur-machines?

My impression from their web pages is that they are gradually
migrating away from it. Their MCP architecture: 8-bit
bytes, but 48-bit signed magnitude :).
 
James Kanze

> * Öö Tiib, on 18.09.2010 23:22:
>> [snip]
>
> I'm just not aware of any commonly used binary high-level
> protocols.

FTP has a binary option. On the FTP clients I've used, you
choose between binary and ascii (with the default being ascii);
what was really meant was transparent or text.

This is really only an issue because FTP is used to transmit
arbitrary user data (and the option only affects that data).
Historically necessary because text needed to be transcoded
(ASCII/EBCDIC, etc.), but still relevant today for the line
endings: transmit a file in ASCII mode from a Unix machine to
Windows, and the line endings will be adjusted.
> In the old days we had some binary RPC protocols, for sure,
> even monstrosities like CORBA, but then came SOAP and other
> text-based thingies; some stuck in the dino world may be using
> binary protocols, but that's them.

In the dino world, most protocols were binary, because when your
top transmission speed is 4800 baud, every byte makes a
difference. Today, it depends. Most protocols designed for a
wide area network are text, because it's basically impossible to
know who's at fault when they don't work otherwise, but there
are still a number of binary protocols in widespread use: NFS
(based on RPC, which uses XDR), or SMB. For purely local use, I
could see using XDR today, especially in a Unix-only world
(since you likely have an implementation of it already on your
machine). But it is the exceptional case.
> Btw., FTP is a text-based protocol (from the dino age).

On the control channel. The data channel supports several
different formats, including transparent, which is literally
just a binary dump (but which is probably the most used today,
since it can handle compressed data, which other formats can't).

BTW: your "from the dino age" carries a strong pejorative
connotation, at least to me. FTP was designed after I started
programming. And a lot of things back then were very good. In
some cases (e.g. Intel architecture), one can argue that it was
the worst solution that prevailed. There's nothing wrong with
having withstood the tests of time.
 
Goran

> Then you'd have to maintain two protocols: the binary you use live,
> and a text protocol, with a debugging utility which converts to/from
> the binary protocol.

I think we have a misunderstanding. I didn't mean "have both text and
binary transfer". I merely suggested "have a conversion to a textual (or
even some other visual) representation".
> Plus, that tool would be useless when you're looking at captured
> network traffic, whereas tcpdump, tcpflow, Wireshark and similar tools
> handle text-based protocols quite well.

Heh, the binary protocol I need to use is recognized by Wireshark.
Obviously, there's a way to get your own protocol handler in (or
whatever it's called in Wireshark or any other such product).

Goran.
 
Keith H Duggar

On Sep 18, 11:56 pm, "Alf P. Steinbach /Usenet" wrote:
> BTW: your "from the dino age" carries a strong pejorative
> connotation, at least to me.  FTP was designed after I started
> programming.  And a lot of things back then were very good.  In
> some cases (e.g. Intel architecture), one can argue that it was
> the worst solution that prevailed.  There's nothing wrong with
> having withstood the tests of time.

It is a common human character flaw to look back on those who and
that which came before with dismissal and disrespect. This flaw has
prevailed for at least two thousand years and it is why civilization
is doomed to repeat failure, reinvent inferior solutions, establish
new jargon for old ideas, etc.

Very very few people have the wisdom and humility to appreciate that
they stand on the shoulders of giants (or at the very least equally
intelligent and resourceful ancestors) as Einstein realized. Fewer
still have the brains and good sense to avail themselves of the view
(and its foundation).

To put it more politely, many of those in cyberspace who /act/ and
want to be /seen/ as smart amount to very little more than Beavis
and Butt-head with a wikipeducation and a google-brain.

KHD
 
Alf P. Steinbach /Usenet

* Keith H Duggar, on 20.09.2010 03:30:
> It is a common human character flaw to look back on those who and
> that which came before with dismissal and disrespect. This flaw has
> prevailed for at least two thousand years and it is why civilization
> is doomed to repeat failure, reinvent inferior solutions, establish
> new jargon for old ideas, etc.
>
> Very very few people have the wisdom and humility to appreciate that
> they stand on the shoulders of giants (or at the very least equally
> intelligent and resourceful ancestors) as Einstein realized. Fewer
> still have the brains and good sense to avail themselves of the view
> (and its foundation).
>
> To put it more politely, many of those in cyberspace who /act/ and
> want to be /seen/ as smart amount to very little more than Beavis
> and Butt-head with a wikipeducation and a google-brain.

Well, to put it bluntly, it seems that you think innuendo is a great thing.

:)


Cheers,

- Alf (old dino)
 
Jorgen Grahn

> I think we have a misunderstanding. I didn't mean "have both text and
> binary transfer". I merely suggested "have a conversion to a textual (or
> even some other visual) representation".

No misunderstanding, but I was not clear enough. I just find that to
test the protocol, I need a command-line tool for using it, which in
turn means a well-defined text-based language -- for both input and
output.

(The same feature as telnet and netcat provide for text protocols.)

Not a big burden, perhaps, if you use a "scripting" language like
Python or Tcl.
> Heh, the binary protocol I need to use is recognized by Wireshark.
> Obviously, there's a way to get your own protocol handler in (or
> whatever it's called in Wireshark or any other such product).

Yes, but that's an additional significant burden if you choose to
design a new binary protocol. (I think we were discussing new
protocols here; I see no reason to refuse to use /existing/ binary
protocols, if they are well-defined.)

/Jorgen
 
Jorgen Grahn

> I seldom run into the sort of problem you mention when using
> binary, but I'm working with it daily and am not afraid of
> it, so that may help.

I can assure you that fear or inexperience are not the reasons I
prefer text to binary protocols.

/Jorgen
 
