htons, htonl, ntohs, ntohl

James Harris

On the plus side htons and its friends are a great idea. On the minus side
they seem to be badly specified or at least incomplete.

I would say they are a great idea for two reasons:

1. they can be null operations on some hardware and thus cost nothing to use

2. rather than a programmer having to encode various byte extracts and
shifts htons etc can use the instructions provided in the machine's
instruction set to swap bytes around.

To illustrate that latter point, if we had to reverse the byte order of
two-byte and four-byte unsigned values in an HLL the code might be

((val >> 8) & 0xff) | ((val & 0xff) << 8)
((val & 0xff) << 24) | ((val & 0xff00) << 8) | ((val >> 8) & 0xff00) |
((val >> 24) & 0xff)

Hopefully the compiler will realise that it can drop some of the operations
but without something such as htons or ntohs the programmer still has to
write some long-winded code. By contrast, many CPUs can carry out such changes
much more simply. For example, on a Pentium the two sequences above could be
one instruction each

rol ax, 8
bswap eax

The first swaps the two bytes of a 16-bit value. The second reverses the
order of all four bytes of a 32-bit value.
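
As a rough sketch, the two expressions above can be wrapped in small C functions. The names swap16 and swap32 are made up for illustration and are not part of any standard API; whether a given compiler reduces them to a single rotate or bswap instruction depends on the compiler and its optimisation settings.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t val)
{
    return (uint16_t)(((val >> 8) & 0xffu) | ((val & 0xffu) << 8));
}

/* Hypothetical helper: reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t val)
{
    return ((val & 0x000000ffu) << 24) |
           ((val & 0x0000ff00u) << 8)  |
           ((val >> 8) & 0x0000ff00u)  |
           ((val >> 24) & 0x000000ffu);
}
```

Applying either function twice gives back the original value, which is a quick sanity check for this kind of code.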

(Incidentally, that code gives the lie to the oft-made assertion that
assembly code has to be longer than HLL code!)

My point is that htons and friends are great in that they remove the need
for a programmer to fiddle with that stuff and can be extremely fast.
However, they have some weaknesses:

1. htons doesn't address the issue of communicating with a machine which has
a different idea of the size of a short. AIUI a short on one machine might
be 16-bit but on another 64-bit. (Hence it's poorly specified.)

2. htons is not designed for handling data that we know to be one endianness
or the other. In that case we know the endianness of the data; that's
defined by a spec. But we don't know the endianness of the machine we are
running on! ISTM the existing operations should remain because they do have
their uses but that there should be other similar operations for dealing
with specific sizes. (Hence they are incomplete.)

The above text is already long so I'll post separately about defining such
operations.

James
 
Siri Cruise

James Harris said:
My point is that htons and friends are great in that they remove the need
for a programmer to fiddle with that stuff and can be extremely fast.
However, they have some weaknesses:

These were intended for internet programming. Binary integers and unsigneds in
IPv4 packets are either one byte, two bytes, or four bytes, and multibyte
integers all have a defined byte order, the network byte order. htons et al
transparently convert between host representations and network packet
representations.

Outside of Microsoft, other vendors were faced with binary representations that
had to work on different hosts, such as TIFF tags or tables in removable disc
packs. Some of these settled on the network byte order and depended on htons et
al.
 
James Kuyper

On 08/23/2013 08:17 AM, James Harris wrote:
....
1. htons doesn't address the issue of communicating with a machine which has
a different idea of the size of a short. AIUI a short on one machine might
be 16-bit but on another 64-bit. (Hence it's poorly specified.)

POSIX requires that htons be declared as
uint16_t htons(uint16_t hostshort);

uint16_t is required to have exactly 16 bits, and the size of a short is
irrelevant. If your system has a declaration that is in terms of short
int, then it's a different htons(), one that doesn't conform to POSIX,
at least not to the current version.
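
A minimal sketch of what the fixed-width declarations imply, assuming a POSIX system that provides <arpa/inet.h> and a C11 compiler for _Static_assert. The helper names round_trip16, round_trip32 and htons_is_noop are invented for this example.

```c
#include <arpa/inet.h>  /* POSIX: htons, htonl, ntohs, ntohl */
#include <limits.h>
#include <stdint.h>

/* Where they exist, uint16_t and uint32_t have exactly 16 and 32 bits. */
_Static_assert(sizeof(uint16_t) * CHAR_BIT == 16, "uint16_t is 16 bits");
_Static_assert(sizeof(uint32_t) * CHAR_BIT == 32, "uint32_t is 32 bits");

/* Hypothetical helper: host -> network -> host must give back the input. */
uint16_t round_trip16(uint16_t v)
{
    return ntohs(htons(v));
}

/* Hypothetical helper: the same round trip for 32-bit values. */
uint32_t round_trip32(uint32_t v)
{
    return ntohl(htonl(v));
}

/* Hypothetical helper: returns 1 on hosts where htons changes nothing,
   that is, on big-endian hosts. */
int htons_is_noop(void)
{
    return htons(0x1234) == 0x1234;
}
```

Nothing here depends on the width of short or long on the host; only the 16-bit and 32-bit types appear in the interface.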
 
Keith Thompson

James Kuyper said:
On 08/23/2013 08:17 AM, James Harris wrote:
...

POSIX requires that htons be declared as
uint16_t htons(uint16_t hostshort);

uint16_t is required to have exactly 16 bits, and the size of a short is
irrelevant. If your system has a declaration that is in terms of short
int, then it's a different htons(), one that doesn't conform to POSIX,
at least not to the current version.

It could conform to POSIX on a system where uint16_t is a typedef for
unsigned short (since typedefs, as you know, don't create new types).
 
Joe Pfeiffer

James Harris said:
On the plus side htons and its friends are a great idea. On the minus side
they seem to be badly specified or at least incomplete.

I would say they are a great idea for two reasons:

1. they can be null operations on some hardware and thus cost nothing to use

2. rather than a programmer having to encode various byte extracts and
shifts htons etc can use the instructions provided in the machine's
instruction set to swap bytes around.

To illustrate that latter point, if we had to reverse the byte order of
two-byte and four-byte unsigned values in an HLL the code might be

((val >> 8) & 0xff) | ((val & 0xff) << 8)
((val & 0xff) << 24) | ((val & 0xff00) << 8) | ((val >> 8) & 0xff00) |
((val >> 24) & 0xff)

Hopefully the compiler will realise that it can drop some of the operations
but without such as htons or ntohs the programmer still has to write some
long-winded code. By contrast, many CPUs can carry out such changes much
more simply. For example, on a Pentium the two sequences above could be one
instruction each

rol ax, 8
bswap eax

The first swaps the two bytes of a 16-bit value. The second reverses the
order of all four bytes of a 32-bit value.

(Incidentally, that code gives the lie to the oft-made assertion that
assembly code has to be longer than HLL code!)

My point is that htons and friends are great in that they remove the need
for a programmer to fiddle with that stuff and can be extremely fast.
However, they have some weaknesses:

1. htons doesn't address the issue of communicating with a machine which has
a different idea of the size of a short. AIUI a short on one machine might
be 16-bit but on another 64-bit. (Hence it's poorly specified.)

The names are unfortunate; the functions are perfectly well specified.
The name htons() gives the impression that it's for 'short's, whatever
that means on the host machine, but it's actually declared (according to
the man page on my machine) as

uint16_t htons(uint16_t hostshort);

So it actually does the right thing with a 16 bit value, no matter what
the host machine's idea of a short is.

2. htons is not designed for handling data that we know to be one endianness
or the other. In that case we know the endianness of the data; that's
defined by a spec. But we don't know the endianness of the machine we are
running on! ISTM the existing operations should remain because they do have
their uses but that there should be other similar operations for dealing
with specific sizes. (Hence they are incomplete.)

I don't understand your point here. The whole idea is to make it
so we don't have to care about the endianness of the machine we're
writing our code on.
 
James Kuyper

It could conform to POSIX on a system where uint16_t is a typedef for
unsigned short (since typedefs, as you know, don't create new types).

You're right, of course. I was thinking mainly in terms of cases like
James Harris' hypothetical 64-bit short, for which that would not be
possible.
 
glen herrmannsfeldt

Joe Pfeiffer said:
(snip)

(snip)

The names are unfortunate; the functions are perfectly well specified.
The name htons() gives the impression that it's for 'short's, whatever
that means on the host machine, but it's actually declared (according to
the man page on my machine) as
uint16_t htons(uint16_t hostshort);
So it actually does the right thing with a 16 bit value, no matter what
the host machine's idea of a short is.

Seems to me less of a problem than the host machine's idea of long.

Except for some strange cases, short has been pretty consistently
just 16 bits, but htonl() and ntohl() were defined in terms of long.

In years past, int was either 16 or 32 bits, and long
was 32 bits. When Alpha came out, with 64 bit long, as far as I
understand it, all the IP code failed to compile.

-- glen
 
James Harris

Joe Pfeiffer said:
....


I don't understand your point here. The whole idea is to make it
so we don't have to care about the endianness of the machine we're
writing our code on.

I didn't express it well. I was thinking of the preprocessor's ignorance. I
meant that there should be macros for reading and writing data with specific
sizes and endianness such as

ui16_LE and ui16_BE
ui32_LE and ui32_BE
ui64_LE and ui64_BE
and possibly ui32_PE ;-)

On the subject of how things 'should' be the real solution would be if C
allowed multibyte declarations to be tagged with specific endiannesses, and
for structures to be defined without automatic padding. Then the above
macros would not be needed.
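
One way such operations might look, written as functions rather than macros. This is only a sketch: the names get_ui16_le, get_ui16_be, get_ui32_le and put_ui32_be are hypothetical, loosely modelled on the ui16_LE style above, and the code assumes 8-bit bytes. Because the values are assembled from individual bytes, the code never needs to know the endianness of the machine it runs on.

```c
#include <stdint.h>

/* Hypothetical: read a 16-bit unsigned value stored little-endian at p. */
uint16_t get_ui16_le(const unsigned char *p)
{
    return (uint16_t)((unsigned)p[0] | ((unsigned)p[1] << 8));
}

/* Hypothetical: read a 16-bit unsigned value stored big-endian at p. */
uint16_t get_ui16_be(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Hypothetical: read a 32-bit unsigned value stored little-endian at p. */
uint32_t get_ui32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical: write v to p[0..3] in big-endian order. */
void put_ui32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

The remaining combinations (put_ui16_le, get_ui32_be, the 64-bit variants and so on) would follow the same pattern.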

James
 
Joe Pfeiffer

glen herrmannsfeldt said:
Seems to me less of a problem than the host machine's idea of long.

Except for some strange cases, short has been pretty consistently
just 16 bits, but htonl() and ntohl() were defined in terms of long.

No, from the same man page:

uint32_t htonl(uint32_t hostlong);

In years past, int was either 16 or 32 bits, and long
was 32 bits. When Alpha came out, with 64 bit long, as far as I
understand it, all the IP code failed to compile.

Hopefully this resulted in the code being rewritten in terms of uint32_t
instead of long.
 
Ian Collins

Joe said:
No, from the same man page:

uint32_t htonl(uint32_t hostlong);

Note the past tense! The POSIX interfaces were updated post-C99, just
in time for the increase in popularity of little-endian 64 bit
platforms. I guess newcomers will miss the significance of the names.

Hopefully this resulted in the code being rewritten in terms of uint32_t
instead of long.

Alpha predated C99.
 
Joe Pfeiffer

James Harris said:
I didn't express it well. I was thinking of the preprocessor's ignorance. I
meant that there should be macros for reading and writing data with specific
sizes and endianness such as

ui16_LE and ui16_BE
ui32_LE and ui32_BE
ui64_LE and ui64_BE
and possibly ui32_PE ;-)

Ah, OK. Yes, I can see that could be a useful enhancement.
 
Jorgen Grahn

My point is that htons and friends are great in that they remove the need
for a programmer to fiddle with that stuff and can be extremely fast.

Well, other things can be extremely fast too. Compilers can optimize
the p[0] + 256*p[1] example I gave in one of the earlier threads, and
I wouldn't be surprised if someone told me they already do.

However, they have some weaknesses:

1. htons doesn't address the issue of communicating with a machine which has
a different idea of the size of a short. AIUI a short on one machine might
be 16-bit but on another 64-bit. (Hence it's poorly specified.)

2. htons is not designed for handling data that we know to be one endianness
or the other. [...]

3. When you've reached the point where you can and need to call
ntohs() you've already done something dangerous. In ntohs(n), where
did n come from? If not from the BSD socket API, most likely from
an expression involving ugly and not-obviously-safe casts such as
*(uint16_t*)buf.
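
To illustrate the concern in point 3: a cast like *(uint16_t*)buf can misbehave if buf is not suitably aligned, and it runs into the aliasing rules. A commonly used alternative is to copy the bytes into a properly typed object first. The sketch below assumes a POSIX system for ntohs; the function name read_net16 and its offset parameter are invented for the example.

```c
#include <arpa/inet.h>  /* POSIX: ntohs */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a network-order 16-bit field located at
   buf + offset without casting the buffer pointer. */
uint16_t read_net16(const unsigned char *buf, size_t offset)
{
    uint16_t net;

    /* memcpy works for any alignment of buf + offset. */
    memcpy(&net, buf + offset, sizeof net);
    return ntohs(net);
}
```

The caller is still responsible for making sure that offset + 2 does not run past the end of the buffer.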

/Jorgen
 
Stephen Sprunk

My point is that htons and friends are great in that they remove
the need for a programmer to fiddle with that stuff and can be
extremely fast.

Well, other things can be extremely fast too. Compilers can
optimize the p[0] + 256*p[1] example I gave in one of the earlier
threads, and I wouldn't be surprised if someone told me they already
do.

I've seen GCC on x86 recognize and replace the shift/or idiom with a
load (and byteswap, if applicable). It might also strength-reduce the
multiply/add version and then recognize the idiom, but the code would be
clearer to human readers if you used the idiom in the first place;
that's the point of idioms.

S
 
Jorgen Grahn

My point is that htons and friends are great in that they remove
the need for a programmer to fiddle with that stuff and can be
extremely fast.

Well, other things can be extremely fast too. Compilers can
optimize the p[0] + 256*p[1] example I gave in one of the earlier
threads, and I wouldn't be surprised if someone told me they already
do.

I've seen GCC on x86 recognize and replace the shift/or idiom with a
load (and byteswap, if applicable). It might also strength-reduce the
multiply/add version and then recognize the idiom, but the code would be
clearer to human readers if you used the idiom in the first place;
that's the point of idioms.

That's what I do in real code, but in that other thread I chose the
"+ 256*" thing to indicate that it was a sketch, not something to copy
& paste into your own code. I admit it wasn't quite clear there, and
it didn't become clearer when J.H. restarted the thread three times.

/Jorgen
 
James Harris

Stephen Sprunk said:
My point is that htons and friends are great in that they remove
the need for a programmer to fiddle with that stuff and can be
extremely fast.

Well, other things can be extremely fast too. Compilers can
optimize the p[0] + 256*p[1] example I gave in one of the earlier
threads, and I wouldn't be surprised if someone told me they already
do.

I've seen GCC on x86 recognize and replace the shift/or idiom with a
load (and byteswap, if applicable). It might also strength-reduce the
multiply/add version and then recognize the idiom, but the code would be
clearer to human readers if you used the idiom in the first place;
that's the point of idioms.

I'm puzzled as to why Jorgen's expression isn't idiomatic as it stands.

p[0] + 256 * p[1]
p[0] + (p[1] << 8)
p[0] | p[1] << 8

Is any one of these more or less idiomatic than the others?

That aside, the latter two should be faster if the compiler does nothing
clever.

James
 
James Kuyper

On 08/27/2013 06:36 PM, James Harris wrote:
....
p[0] + 256 * p[1]
p[0] + (p[1] << 8)
p[0] | p[1] << 8

Is any one of these more or less idiomatic than the others?

That aside, the latter two should be faster if the compiler does nothing
clever.

That's not necessarily true; I've heard rumors of machines where the
first one is the one that a naive compiler will generate the fastest
code for. Of course, compilers dumb enough to generate significantly
different code for those three expressions are pretty rare nowadays,
unless you deliberately disable optimization.
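
For what it's worth, the three expressions compute the same value when p points at unsigned char. A small sketch that checks this exhaustively follows; the function names are invented, 8-bit bytes are assumed, and the explicit conversions to unsigned are there to keep the shifts well defined even where int is only 16 bits wide.

```c
/* Hypothetical: multiply-and-add form. */
unsigned le16_mul(const unsigned char *p)
{
    return p[0] + 256u * p[1];
}

/* Hypothetical: shift-and-add form. */
unsigned le16_shift_add(const unsigned char *p)
{
    return p[0] + ((unsigned)p[1] << 8);
}

/* Hypothetical: shift-and-or form; << binds tighter than |. */
unsigned le16_shift_or(const unsigned char *p)
{
    return p[0] | (unsigned)p[1] << 8;
}

/* Returns 1 if all three forms agree for every pair of byte values. */
int all_forms_agree(void)
{
    unsigned char p[2];
    unsigned lo, hi;

    for (lo = 0; lo < 256; lo++) {
        for (hi = 0; hi < 256; hi++) {
            p[0] = (unsigned char)lo;
            p[1] = (unsigned char)hi;
            if (le16_mul(p) != le16_shift_add(p) ||
                le16_mul(p) != le16_shift_or(p))
                return 0;
        }
    }
    return 1;
}
```

Which form a particular compiler turns into the fastest code is a separate question that this check says nothing about.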
 
