Sizes and types for network programming


Michael Hull

Hi Everyone,
I have a question, and I hope it's not obvious!

As I understand it, C++ does not define the number of bits in a byte;
this is architecture-dependent. The one thing we always know is that
sizeof(char) == 1. The sizes of int, long, etc. will be architecture-
dependent, i.e. the number of bits in an int on a 32-bit machine may be
different from that on a 64-bit machine.

I am writing a program which writes information across a network to
both 32- and 64-bit architectures. I am writing on Linux (sorry, I don't
want to get OS-specific); what I do not understand is how I should
define my types. Basically, if I am writing out from a 32-bit machine,
and reading from both 32- and 64-bit machines, how should I define my
datatypes? I hope this is not OT - I am using Linux sockets, but I
guess the question applies to all OSes/libraries.

I would have expected that some header file would define int8, int16,
int32, int64 for these kinds of scenarios, and I'm sure that I have
seen these kinds of definitions, but I can't seem to find them on my
local system or via Google.

Any advice would be greatly appreciated,

Thanks

Mike
 

Ian Collins

Hi Everyone,
I have a question, and I hope it's not obvious!

As I understand it, C++ does not define the number of bits in a byte;
this is architecture-dependent. The one thing we always know is that
sizeof(char) == 1. The sizes of int, long, etc. will be architecture-
dependent, i.e. the number of bits in an int on a 32-bit machine may be
different from that on a 64-bit machine.

I am writing a program which writes information across a network to
both 32- and 64-bit architectures. I am writing on Linux (sorry, I don't
want to get OS-specific); what I do not understand is how I should
define my types. Basically, if I am writing out from a 32-bit machine,
and reading from both 32- and 64-bit machines, how should I define my
datatypes? I hope this is not OT - I am using Linux sockets, but I
guess the question applies to all OSes/libraries.

I would have expected that some header file would define int8, int16,
int32, int64 for these kinds of scenarios, and I'm sure that I have
seen these kinds of definitions, but I can't seem to find them on my
local system or via Google.

Any advice would be greatly appreciated,

It's more than likely your system will have the C99 header <stdint.h>,
which declares the C99 fixed-width integer types (intN_t). While not
part of the current C++ standard, these are widely supported.
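For example (a minimal sketch, assuming your toolchain ships the header;
these are the standard C99 names):

    #include <stdint.h>

    int8_t   i8;   // exactly 8 bits
    int16_t  i16;  // exactly 16 bits
    int32_t  i32;  // exactly 32 bits
    int64_t  i64;  // exactly 64 bits
    uint32_t u32;  // unsigned variants (uintN_t) exist as well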
 

Goran Pusic

Hi Everyone,
I have a question, and I hope it's not obvious!

As I understand it, C++ does not define the number of bits in a byte;
this is architecture-dependent. The one thing we always know is that
sizeof(char) == 1. The sizes of int, long, etc. will be architecture-
dependent, i.e. the number of bits in an int on a 32-bit machine may be
different from that on a 64-bit machine.

I am writing a program which writes information across a network to
both 32- and 64-bit architectures. I am writing on Linux (sorry, I don't
want to get OS-specific); what I do not understand is how I should
define my types. Basically, if I am writing out from a 32-bit machine,
and reading from both 32- and 64-bit machines, how should I define my
datatypes? I hope this is not OT - I am using Linux sockets, but I
guess the question applies to all OSes/libraries.

I would have expected that some header file would define int8, int16,
int32, int64 for these kinds of scenarios, and I'm sure that I have
seen these kinds of definitions, but I can't seem to find them on my
local system or via Google.

There is always a way to define your data types with correct sizes in
your toolchain (the compiler is most affected). AFAIK, exact sizes are
not specified by the C++ standard (they are by C, I think). You use a
"tailored" header for that. Said header is shipped for each toolchain
you support.

You define your serialized/streamed/marshaled format in a toolchain-
agnostic manner. Specifically for numbers, you need to know the
endianness, too. That influences the way you (de)serialize them on the
sending/receiving sides.

That's kinda the base.
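For a 32-bit value, a rough sketch of what I mean (assuming 8-bit bytes
and a big-endian wire format; the function names are just placeholders):

    #include <stdint.h>

    // Write a 32-bit value into a buffer in big-endian ("network") order,
    // independently of the host's own byte order.
    void put_u32(unsigned char* buf, uint32_t value)
    {
        buf[0] = (value >> 24) & 0xFF;
        buf[1] = (value >> 16) & 0xFF;
        buf[2] = (value >> 8)  & 0xFF;
        buf[3] = value         & 0xFF;
    }

    // Reassemble it on the receiving side, again without caring what the
    // host's endianness happens to be.
    uint32_t get_u32(const unsigned char* buf)
    {
        return (uint32_t(buf[0]) << 24)
             | (uint32_t(buf[1]) << 16)
             | (uint32_t(buf[2]) << 8)
             |  uint32_t(buf[3]);
    }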

hth, Goran.
 

Brian Wood

Hi Everyone,
I have a question, and I hope it's not obvious!

As I understand it, C++ does not define the number of bits in a byte;
this is architecture-dependent. The one thing we always know is that
sizeof(char) == 1. The sizes of int, long, etc. will be architecture-
dependent, i.e. the number of bits in an int on a 32-bit machine may be
different from that on a 64-bit machine.

I am writing a program which writes information across a network to
both 32- and 64-bit architectures. I am writing on Linux (sorry, I don't
want to get OS-specific); what I do not understand is how I should
define my types. Basically, if I am writing out from a 32-bit machine,
and reading from both 32- and 64-bit machines, how should I define my
datatypes? I hope this is not OT - I am using Linux sockets, but I
guess the question applies to all OSes/libraries.

I would have expected that some header file would define int8, int16,
int32, int64 for these kinds of scenarios, and I'm sure that I have
seen these kinds of definitions, but I can't seem to find them on my
local system or via Google.

Any advice would be greatly appreciated,

Thanks

Mike


Perhaps the C++ Middleware Writer would be of interest
to you -- http://webEbenezer.net. Also there's some
software in an archive on this page --
http://webEbenezer.net/build_integration.html -- that
deals with portably marshalling the types you've
mentioned.

Brian Wood
 

Brian Wood

The datatype sizes are only the tip of the iceberg.

In network transfers, the data is typically serialized in some more
portable format, e.g. a textual one. There are also specific libraries
for that. Network transfer is typically slow, so the
serialization/deserialization overhead is probably insignificant.

In a relay race, if one of the team members is slow
the other members shouldn't say, "Joe is so slow it
doesn't really matter how we run." The other teams
have their own slow "Joe" so if you want to win, the
whole team is going to have to work for it.


Brian Wood
www.wnd.com
 

Bo Persson

Goran said:
There is always a way to define your data types with correct sizes
in your toolchain (the compiler is most affected). AFAIK, exact sizes
are not specified by the C++ standard (they are by C, I think).

Well, C99 typedefs int32_t for a 32-bit type *if there is one*. If int
happens to be 36 or 48 bits, the typedef would be missing.

C++0x will do the same.
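A small sketch of the distinction (the exact-width name may be missing,
but the "least" and "fast" variants are always provided by <stdint.h>):

    #include <stdint.h>

    int32_t       a;  // defined only if the platform has an exact 32-bit type
    int_least32_t b;  // always defined: smallest type with at least 32 bits
    int_fast32_t  c;  // always defined: the "fastest" type with at least 32 bits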

You use a
"tailored" header for that. Said header is shipped for each toolchain
you support.

You define your serialized/streamed/marshaled format in a toolchain-
agnostic manner. Specifically for numbers, you need to know the
endianness, too. That influences the way you (de)serialize them on
the sending/receiving sides.

Right, you define the transport format for the network, and implement
that on each platform. VERY hard to do that portably.


Bo Persson
 

James Kanze

In a relay race, if one of the team members is slow
the other members shouldn't say, "Joe is so slow it
doesn't really matter how we run." The other teams
have their own slow "Joe" so if you want to win, the
whole team is going to have to work for it.

The analogy doesn't hold. In the case of network transfers, the
serialization/deserialization will largely be taking place in
parallel with the transmission. And even if it weren't: the
difference in times will typically be several orders of
magnitude, so the serialization/deserialization really doesn't
make a significant difference (supposing it's not completely
stupid).
 

Joshua Maurice

Hi Everyone,
I have a question, and I hope it's not obvious!

As I understand it, C++ does not define the number of bits in a byte;
this is architecture-dependent. The one thing we always know is that
sizeof(char) == 1. The sizes of int, long, etc. will be architecture-
dependent, i.e. the number of bits in an int on a 32-bit machine may be
different from that on a 64-bit machine.

I'm not sure exactly what you are saying, so let me make sure it's
clear. sizeof(char) == 1, aka the size of a char is exactly one C++
byte. A C++ byte is not the same thing as a byte in other contexts. In
other contexts, generally a byte is an octet, aka 8 bits. According to
the C++ standard, and some exotic hardware perhaps, a C++ byte may be
8 bits, 9 bits, 64 bits, etc. Thus, the effective size of char is also
implementation dependent. Ex: you can't serialize a 64 bit char to a
data stream, then deserialize those 64 bits of information into an 8
bit char on a regular desktop.
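If code assumes 8-bit chars, it can at least check that assumption at
compile time (a sketch; C++0x will have static_assert, but the old
array-size trick works today):

    #include <climits>

    // Fails to compile on any platform where a char is not exactly 8 bits.
    typedef char assert_eight_bit_char[CHAR_BIT == 8 ? 1 : -1];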
 

James Kanze

James said:
Goran Pusic wrote:
[...]
You define your serialized/streamed/marshaled format in a
toolchain-agnostic manner. Specifically for numbers, you need to
know the endianness, too. That influences the way you
(de)serialize them on the sending/receiving sides.
Right, you define the transport format for the network, and
implement that on each platform. VERY hard to do that portably.
Not that hard. Floating point can be tricky, but the rest is
generally pretty straightforward (and the C library has
functions which simplify the floating point as well, provided
you use a binary format in the transport).
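For instance, a sketch of the frexp()/ldexp() approach for doubles (it
ignores infinities and NaNs, and assumes a 64-bit integer type is
available for the mantissa):

    #include <stdint.h>
    #include <math.h>

    // frexp() splits a double into a fraction and a base-2 exponent, both
    // of which can then be transported as ordinary integers.
    void split_double(double value, int64_t* mantissa, int32_t* exponent)
    {
        int exp;
        double frac = frexp(value, &exp);  // value == frac * 2^exp
        *mantissa = int64_t(frac * 4611686018427387904.0);  // frac * 2^62
        *exponent = exp;
    }

    // The receiving side reassembles the value with ldexp().
    double join_double(int64_t mantissa, int32_t exponent)
    {
        return ldexp(double(mantissa) / 4611686018427387904.0, exponent);
    }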
And of course, there is a question of how portable you really
have to be. For a lot of people, 2's complement and IEEE
floating point are acceptable restrictions (even though they
exclude most, if not all, mainframes).
Ok, less hard if you can live without some of harder parts. :)

Well put. But when was the last time you had to take Unisys
mainframes into account? Or even an IBM mainframe (which will
only cause problems for floating point)?
 

James Kanze

I'm not sure exactly what you are saying, so let me make sure it's
clear. sizeof(char) == 1, aka the size of a char is exactly one C++
byte. A C++ byte is not the same thing as a byte in other contexts.

No. C++ requires that a byte be at least 8 bits: historically,
6 and 7 bit bytes were common. (The reason Fortran uses such
a small character set, and doesn't distinguish case, is that it
was first developed on a machine with 6 bit bytes.) Also, C++
requires that an integral number of bytes occupy all of the bits
in any integral type: a PDP-10 traditionally used 5 seven bit
bytes in a 36 bit word (but the size of a byte was programmable,
so 4 nine bit bytes would work for C/C++).
In other contexts, generally a byte is an octet, aka 8 bits.

This is the most frequent situation today. It wasn't in the
past, and the first use of byte referred to six-bit bytes.

There's still one platform today where a byte is 9 bits. And
I think some of the embedded processors punt, and make a byte 32
bits (and sizeof(int) 1); this doesn't correspond to the
classical definition, however, which requires that a byte be
smaller than a word.
According to the C++ standard, and some exotic hardware
perhaps, a C++ byte may be 8 bits, 9 bits, 64 bits, etc. Thus,
the effective size of char is also implementation dependent.
Ex: you can't serialize a 64 bit char to a data stream, then
deserialize those 64 bits of information into an 8 bit char on
a regular desktop.

Sure you can (and people do). It just requires some special
handling.
 

Goran Pusic

Right, you define the transport format for the network, and implement
that on each platform. VERY hard to do that portably.

Yeah, possibly, but what else do you suggest? You can simplify issues
by going the "all is text" route, and even there, you need to know that
you work with e.g. octets (or something else?), what text encoding is
used, and how floating point numbers are written out...

In practice, if you go binary, you define your transport format so
that it fits your "most important" platform well (or, ;-), the second
most important, so that you are forced to verify on the "first" one),
and you tailor for the others. Effectively, you write code that
converts "network" data types to "host" data types (reception side)
and vice versa (transmission side).
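For example, a sketch of that conversion using the POSIX byte-order
helpers (assuming the wire format is plain 32-bit big-endian integers):

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>  // htonl()/ntohl() on POSIX systems

    // Transmission side: convert a host value to network (big-endian)
    // order before copying it into the outgoing buffer.
    void encode_u32(unsigned char* out, uint32_t host_value)
    {
        uint32_t net_value = htonl(host_value);
        memcpy(out, &net_value, sizeof net_value);
    }

    // Reception side: the inverse conversion.
    uint32_t decode_u32(const unsigned char* in)
    {
        uint32_t net_value;
        memcpy(&net_value, in, sizeof net_value);
        return ntohl(net_value);
    }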

Goran.
 

red floyd

Well put.  But when was the last time you had to take Unisys
mainframes into account?  Or even an IBM mainframe (which will
only cause problems for floating point)?

Last week. EBCDIC vs. ASCII issues are a PITA.
 

Joshua Maurice

No.  C++ requires that a byte be at least 8 bits: historically,
6 and 7 bit bytes were common.  (The reason Fortran uses such
a small character set, and doesn't distinguish case, is that it
was first developed on a machine with 6 bit bytes.)  Also, C++
requires that an integral number of bytes occupy all of the bits
in any integral type: a PDP-10 traditionally used 5 seven bit
bytes in a 36 bit word (but the size of a byte was programmable,
so 4 nine bit bytes would work for C/C++).

Yo James. I'm not sure what's going on here. It sounds like you're
trying to correct some misinformation, but I don't understand exactly
how you're correcting me. You ripped my post apart, but I don't see
any real disagreement, nor any real corrections.

For example, why say "No."? I said that sizeof(char) == 1 always, and
I said a C++ byte may not be the same thing as a byte in other contexts,
like Java, common desktops, common networking, etc. Exactly to what
does "No." refer? You mention hardware which does not have 8 bit bytes
as though that contradicts something which I said. I don't see what
that could possibly be. Moreover, immediately following this quote in
the same post, I admit of the existence of hardware which does not
have 8 bit bytes, so I don't see the benefit of singling out my post
and adding this as though you're correcting me.
This is the most frequent situation today.  It wasn't in the
past, and the first use of byte referred to six-bit bytes.

There's still one platform today where a byte is 9 bits.  And
I think some of the embedded processors punt, and make a byte 32
bits (and sizeof(int) 1); this doesn't correspond to the
classical definition, however, which requires that a byte be
smaller than a word.

Indeed. Would you be happier if I said "in other contexts /today/,
generally a byte is an octet, aka 8 bits."?

Moreover, in the very next sentence of my first post in the thread
(quoted below), I mention that there is hardware, (today) exotic,
which does not have 8 bit bytes.
Sure you can (and people do).  It just requires some special
handling.

No, you cannot take the arbitrary information in a 64 bit char on one
system and shove that into an 8 bit char on another system, which is
exactly what I said. However, yes you can do serialization between
separate hardware which has differently sized bytes, which I did not
deny.
 

James Kanze

Yo James. I'm not sure what's going on here. It sounds like you're
trying to correct some misinformation, but I don't understand exactly
how you're correcting me. You ripped my post apart, but I don't see
any real disagreement, nor any real corrections.

The only real misinformation was the impression your post gave,
that the "common" meaning of byte (outside of C/C++) was eight
bits. I wasn't disagreeing with your facts concerning C++, but
trying to point out that in the non-C++ context, byte is even
less precise than in C++; that your statement "In other
contexts, generally a byte is an octet, aka 8 bits" isn't quite
true---you may have said "generally", but historically, even
"generally" wouldn't hold. There are differences between the
C++ definition of byte (e.g. in C++, a byte can be the same size
as a word, and it must be at least 8 bits, neither of which is
true "generally"), but your posting seemed to give the
impression that the C++ definition was somehow "looser"; that in
other contexts a byte could only be 8 bits.
For example, why say "No."? I said that sizeof(char) ==
1 always, and I said a C++ byte may not be the same thing as
byte in other contexts, like Java, common desktops, common
networking, etc.

The intent of the standard is that the C++ byte be the same
thing as a byte in other contexts. (Not all other contexts, of
course. Java redefines byte even more than C++.) There are
only two cases where this should be violated: on machines which
don't have bytes, and on machines whose natural byte size is
less than 8 bits.

On common desktops, bytes are 8 bits. Both in general use, and
in C++.

In common networking, there aren't bytes. Networking protocols
are defined in terms of octets, not bytes, precisely because
byte isn't precise enough for them.
Exactly to what
does "No." refer? You mention hardware which does not have 8 bit bytes
as though that contradicts something which I said. I don't see what
that could possibly be. Moreover, immediately following this quote in
the same post, I admit of the existence of hardware which does not
have 8 bit bytes, so I don't see the benefit of singling out my post
and adding this as though you're correcting me.
Indeed. Would you be happier if I said "in other contexts /today/,
generally a byte is an octet, aka 8 bits."?

Not really, unless you defined the contexts. There are still
machines being sold today with 9 bit bytes.
Moreover, in the very next sentence of my first post in the thread
(quoted below), I mention that there is hardware, (today) exotic,
which does not have 8 bit bytes.
No, you cannot take the arbitrary information in a 64 bit char on one
system and shove that into an 8 bit char on another system, which is
exactly what I said. However, yes you can do serialization between
separate hardware which has differently sized bytes, which I did not
deny.

You cannot take arbitrary information on one system, and just
shove it into some arbitrary data type on another system.
That's true even if the size of byte is the same on both
systems. You certainly can serialize a 64 bit char in a way
that it can be read, without loss of information, on a system
with 8 bit char.
 

Goran

- Don't forget that transmission over the network may not be 100%
reliable.  It is much easier to verify data serialised to text and
recover than it is for binary.

What a strange thing to say! And IMHO untrue, too. Parsing text gets
complicated quite easily.

On the plus side, with text, one often uses some sort of markup (e.g.
tag=value, or XML, or whatever). That helps with recovery. But that
only __seems__ easier, because the same is just as easily done with
binary. Just tag every datum with an identifier, and you have the same
thing (and there's still no need to parse numbers and markup).

Text is only interesting when stuff coming over the wire actually must
be read by a human, or during debugging (but then, converting binary
to text gives the same result).

Goran.
 

Alf P. Steinbach /Usenet

* James Kanze, on 17.09.2010 12:36:
[snip]
The only real misinformation was the impression your post gave,
that the "common" meaning of byte (outside of C/C++) was eight
bits. I wasn't disagreeing with your facts concerning C++, but
trying to point out that in the non-C++ context, byte is even
less precise than in C++; that your statement "In other
contexts, generally a byte is an octet, aka 8 bits" isn't quite
true---you may have said "generally", but historically, even
"generally" wouldn't hold. There are differences between the
C++ definition of byte (e.g. in C++, a byte can be the same size
as a word, and it must be at least 8 bits, neither which are
true "generally"), but your posting seemed to give the
impression that the C++ definition was somehow "looser"; that in
other contexts a byte could only be 8 bits.

Uhm, "generally" and the present tense "is" covers it well.

Some embedded computers have 16-bit bytes. E.g., <url:
http://focus.ti.com/lit/ug/spru024e/spru024e.pdf> documents a C compiler with
CHAR_BIT=16, and it is recent enough that it may still be in use. However, is
there really any C++ compiler with CHAR_BIT=16?

I'd be interested to know which current machine has 9-bit bytes. Presumably that
would be a machine using 18-bit or 36-bit word addressing. And presumably you're
not referring to the EDSAC? <g>


Cheers,

- Alf
 

Keith H Duggar

Hi Everyone,
I have a question, and I hope it's not obvious!

As I understand it, C++ does not define the number of bits in a byte;
this is architecture-dependent. The one thing we always know is that
sizeof(char) == 1. The sizes of int, long, etc. will be architecture-
dependent, i.e. the number of bits in an int on a 32-bit machine may be
different from that on a 64-bit machine.

... I am writing on Linux ...

Excellent. Type the following at your command line

man xdr

and go from there.
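A quick sketch of what that looks like in code (xdrmem_create()/xdr_int()
are the traditional SunRPC interface from <rpc/xdr.h>; some systems ship
it separately, e.g. in libtirpc):

    #include <rpc/xdr.h>

    // Encode an int into a buffer in XDR format (4 bytes, big-endian).
    // The same xdr_int() call decodes it if the stream is created with
    // XDR_DECODE instead of XDR_ENCODE.
    bool encode_int(char* buf, unsigned int len, int value)
    {
        XDR xdrs;
        xdrmem_create(&xdrs, buf, len, XDR_ENCODE);
        bool ok = xdr_int(&xdrs, &value) != 0;
        xdr_destroy(&xdrs);
        return ok;
    }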

KHD
 

James Kanze

* James Kanze, on 17.09.2010 12:36:
[snip]
I'd be interested to know which current machine has 9-bit
bytes.

Unisys 2200.
Presumably that would be a machine using 18-bit or
36-bit word addressing.

It's 36 bit one's complement.

I've posted this information before (including a link to the C++
programming manual, IIRC).
 

Alf P. Steinbach /Usenet

* James Kanze, on 18.09.2010 12:16:
* James Kanze, on 17.09.2010 12:36:
[snip]
I'd be interested to know which current machine has 9-bit
bytes.

Unisys 2200.
Presumably that would be a machine using 18-bit or
36-bit word addressing.

It's 36 bit one's complement.

I've posted this information before (including a link to the C++
programming manual, IIRC).

Thanks.

Hopefully that's the very last of the dinosaur-machines?


Cheers,

- Alf
 

Jorgen Grahn

What a strange thing to say! And IMHO untrue, too.

I agree with Yannick. I've been involved in debugging an ASN.1
encoding error since back in June, and we're still not done. It's easy
to see that the encoding is incorrect, but very hard to see why and
where it goes wrong.
Parsing text gets complicated quite easily.

Look at some of the popular protocols: HTTP, SMTP, NNTP. Very easy to
parse; even easier to read manually during debugging (which I assume
was what Yannick was thinking of).
On the plus side, with text, one often uses some sort of markup (e.g.
tag=value, or XML, or whatever). That helps with recovery. But that
only __seems__ easier, because the same is just as easily done with
binary. Just tag every datum with an identifier, and you have the same
thing (and there's still no need to parse numbers and markup).

Text is only interesting when stuff coming over the wire actually must
be read by a human, or during debugging (but then, converting binary
to text gives the same result).

Then you'd have to maintain two protocols: the binary you use live,
and a text protocol, with a debugging utility which converts to/from
the binary protocol.

Plus, that tool would be useless when you're looking at captured
network traffic, whereas tcpdump, tcpflow, Wireshark and similar tools
handle text-based protocols quite well.

I *really* recommend text-based protocols for all normal uses. If
it's too much data, add an optional zlib compression layer.
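Even something this simple is already a workable text format (a sketch:
one record per line; real code would want more error handling):

    #include <stdio.h>

    // Sender: format one record as a single text line.
    int write_record(char* buf, size_t len, long id, double value)
    {
        return snprintf(buf, len, "id=%ld value=%.17g\n", id, value);
    }

    // Receiver: parse the line back; true if both fields were found.
    bool read_record(const char* line, long* id, double* value)
    {
        return sscanf(line, "id=%ld value=%lg", id, value) == 2;
    }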

/Jorgen
 
