htons, htonl, ntohs, ntohl

Discussion in 'C Programming' started by James Harris, Aug 23, 2013.

  1. James Harris

    James Harris Guest

    On the plus side htons and its friends are a great idea. On the minus side
    they seem to be badly specified or at least incomplete.

    I would say they are a great idea for two reasons:

    1. they can be null operations on some hardware and thus cost nothing to use

    2. rather than a programmer having to encode various byte extracts and
    shifts htons etc can use the instructions provided in the machine's
    instruction set to swap bytes around.

    To illustrate that latter point, if we had to reverse the byte order of
    two-byte and four-byte unsigned values in an HLL the code might be

    ((val >> 8) & 0xff) | ((val & 0xff) << 8)
    ((val & 0xff) << 24) | ((val & 0xff00) << 8) | ((val >> 8) & 0xff00) |
    ((val >> 24) & 0xff)

    Hopefully the compiler will realise that it can drop some of the operations
    but without something like htons or ntohs the programmer still has to write
    some long-winded code. By contrast, many CPUs can carry out such changes much
    more simply. For example, on a Pentium the two sequences above could be one
    instruction each

    rol ax, 8
    bswap eax

    The first swaps the two bytes of a 16-bit value. The second reverses the
    order of all four bytes of a 32-bit value.

    (Incidentally, that code gives the lie to the oft-made assertion that
    assembly code has to be longer than HLL code!)
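
For reference, the two expressions above can be wrapped up as small C functions. This is only a sketch: the names swap16 and swap32 are made up for illustration and are not part of any standard API, and whether a compiler reduces them to a single rol or bswap depends on the compiler and its optimisation settings.

```c
#include <stdint.h>

/* Reverse the two bytes of a 16-bit value (hypothetical helper). */
static inline uint16_t swap16(uint16_t val)
{
    return (uint16_t)(((val >> 8) & 0xff) | ((val & 0xff) << 8));
}

/* Reverse the four bytes of a 32-bit value (hypothetical helper). */
static inline uint32_t swap32(uint32_t val)
{
    return ((val & 0xff) << 24) | ((val & 0xff00) << 8) |
           ((val >> 8) & 0xff00) | ((val >> 24) & 0xff);
}
```

With these, swap16(0x1234) yields 0x3412 and swap32(0x12345678) yields 0x78563412; applying either function twice gives back the original value.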

    My point is that htons and friends are great in that they remove the need
    for a programmer to fiddle with that stuff and can be extremely fast.
    However, they have some weaknesses:

    1. htons doesn't address the issue of communicating with a machine which has
    a different idea of the size of a short. AIUI a short on one machine might
    be 16-bit but on another 64-bit. (Hence it's poorly specified.)

    2. htons is not designed for handling data that we know to be one endianness
    or the other. In that case we know the endianness of the data; that's
    defined by a spec. But we don't know the endianness of the machine we are
    running on! ISTM the existing operations should remain because they do have
    their uses but that there should be other similar operations for dealing
    with specific sizes. (Hence they are incomplete.)

    The above text is already long so I'll post separately about defining such
    operations.

    James
     
    James Harris, Aug 23, 2013
    #1

  2. James Harris

    Siri Cruise Guest

    These were intended for internet programming. Binary integers and unsigneds in
    IPv4 packets are either one byte, two bytes, or four bytes, and multibyte
    integers all have a defined byte order, the network byte order. htons et al
    transparently convert between host representations and network packet
    representations.

    Outside of Microsoft, other vendors were faced with binary representations that
    had to work on different hosts, such as TIFF tags or tables in removable disc
    packs. Some of these settled on the network byte order and depended on htons et
    al.
     
    Siri Cruise, Aug 23, 2013
    #2

  3. James Harris

    James Kuyper Guest

    On 08/23/2013 08:17 AM, James Harris wrote:
    ....
    POSIX requires that htons be declared as
    uint16_t htons(uint16_t hostshort);

    uint16_t is required to have exactly 16 bits, and the size of a short is
    irrelevant. If your system has a declaration that is in terms of short
    int, then it's a different htons(), one that doesn't conform to POSIX,
    at least not to the current version.
     
    James Kuyper, Aug 23, 2013
    #3
  4. It could conform to POSIX on a system where uint16_t is a typedef for
    unsigned short (since typedefs, as you know, don't create new types).
     
    Keith Thompson, Aug 23, 2013
    #4
  5. James Harris

    Joe Pfeiffer Guest

    The names are unfortunate; the functions are perfectly well specified.
    The name htons() gives the impression that it's for 'short's, whatever
    that means on the host machine, but it's actually declared (according to
    the man page on my machine) as

    uint16_t htons(uint16_t hostshort);

    So it actually does the right thing with a 16 bit value, no matter what
    the host machine's idea of a short is.
    I don't understand your point here. The whole idea is to make it
    so we don't have to care about the endianness of the machine we're
    writing our code on.
     
    Joe Pfeiffer, Aug 23, 2013
    #5
  6. James Harris

    James Kuyper Guest

    You're right, of course. I was thinking mainly in terms of cases like
    James Harris' hypothetical 64-bit short, for which that would not be
    possible.
     
    James Kuyper, Aug 23, 2013
    #6
  7. Seems to me less of a problem than the host machine's idea of long.

    Except for some strange cases, short has been pretty consistently
    just 16 bits, but htonl() and ntohl() were defined in terms of long.

    In years past, int was either 16 or 32 bits, and long
    was 32 bits. When Alpha came out, with 64 bit long, as I
    understand it, all the IP code failed to compile.

    -- glen
     
    glen herrmannsfeldt, Aug 23, 2013
    #7
  8. James Harris

    James Harris Guest

    I didn't express it well. I was thinking of the preprocessor's ignorance. I
    meant that there should be macros for reading and writing data with specific
    sizes and endianness such as

    ui16_LE and ui16_BE
    ui32_LE and ui32_BE
    ui64_LE and ui64_BE
    and possibly ui32_PE ;-)

    On the subject of how things 'should' be the real solution would be if C
    allowed multibyte declarations to be tagged with specific endiannesses, and
    for structures to be defined without automatic padding. Then the above
    macros would not be needed.
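
A rough sketch of what such fixed-size, fixed-endianness readers might look like, written as inline functions rather than macros. The names load_ui16_le and friends are hypothetical, chosen to echo the ui16_LE style above; nothing like them is in standard C. Because they assemble the value from individual bytes, they do not need to know the endianness of the host.

```c
#include <stdint.h>

/* Read a 16-bit little-endian value from two bytes (hypothetical helper). */
static inline uint16_t load_ui16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 16-bit big-endian value from two bytes (hypothetical helper). */
static inline uint16_t load_ui16_be(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a 32-bit little-endian value from four bytes (hypothetical helper). */
static inline uint32_t load_ui32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a 32-bit big-endian value from four bytes (hypothetical helper). */
static inline uint32_t load_ui32_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

The 64-bit variants would follow the same pattern with uint64_t casts and four more bytes.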

    James
     
    James Harris, Aug 23, 2013
    #8
  9. James Harris

    Joe Pfeiffer Guest

    No, from the same man page:

    uint32_t htonl(uint32_t hostlong);
    Hopefully this resulted in the code being rewritten in terms of uint32_t
    instead of long.
     
    Joe Pfeiffer, Aug 23, 2013
    #9
  10. James Harris

    Ian Collins Guest

    Note the past tense! The POSIX interfaces were updated post-C99, just
    in time for the increase in popularity of little-endian 64 bit
    platforms. I guess newcomers will miss the significance of the names.
    Alpha predated C99.
     
    Ian Collins, Aug 24, 2013
    #10
  11. James Harris

    Joe Pfeiffer Guest

    Ah, OK. Yes, I can see that could be a useful enhancement.
     
    Joe Pfeiffer, Aug 25, 2013
    #11
  12. (snip, I wrote)
    When did uint32_t come out? Alpha was about 1992.

    -- glen
     
    glen herrmannsfeldt, Aug 26, 2013
    #12
  13. James Harris

    James Kuyper Guest

    It came out in 1999, as part of C99.
     
    James Kuyper, Aug 26, 2013
    #13
  14. James Harris

    Ian Collins Guest

    Didn't I answer that last week?
     
    Ian Collins, Aug 26, 2013
    #14
  15. James Harris

    Jorgen Grahn Guest

    Well, other things can be extremely fast too. Compilers can optimize
    the p[0] + 256*p[1] example I gave in one of the earlier threads, and
    I wouldn't be surprised if someone told me they already do.
    3. When you've reached the point where you can and need to call
    ntohs() you've already done something dangerous. In ntohs(n), where
    did n come from? If not from the BSD socket API, most likely from
    an expression involving ugly and not-obviously-safe casts such as
    *(uint16_t*)buf.
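
One common way to avoid the cast is to copy the bytes into a properly typed object with memcpy before calling ntohs(). The sketch below assumes a POSIX system (for <arpa/inet.h>); the function name read_net16 and its offset parameter are made up for illustration.

```c
#include <arpa/inet.h>  /* ntohs (POSIX) */
#include <stdint.h>
#include <string.h>     /* memcpy, size_t */

/* Read a network-order 16-bit field at any byte offset in buf.
   Hypothetical helper: memcpy makes no alignment or aliasing
   assumptions, unlike *(uint16_t*)(buf + offset). */
static inline uint16_t read_net16(const unsigned char *buf, size_t offset)
{
    uint16_t wire;
    memcpy(&wire, buf + offset, sizeof wire);
    return ntohs(wire);
}
```

The caller is still responsible for checking that offset + 2 does not run past the end of the buffer.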

    /Jorgen
     
    Jorgen Grahn, Aug 27, 2013
    #15
  16. I've seen GCC on x86 recognize and replace the shift/or idiom with a
    load (and byteswap, if applicable). It might also strength-reduce the
    multiply/add version and then recognize the idiom, but the code would be
    clearer to human readers if you used the idiom in the first place;
    that's the point of idioms.

    S
     
    Stephen Sprunk, Aug 27, 2013
    #16
  17. James Harris

    Jorgen Grahn Guest

    That's what I do in real code, but in that other thread I chose the
    "+ 256*" thing to indicate that it was a sketch, not something to copy
    & paste into your own code. I admit it wasn't quite clear there, and
    it didn't become clearer when J.H. restarted the thread three times.

    /Jorgen
     
    Jorgen Grahn, Aug 27, 2013
    #17
  18. James Harris

    James Harris Guest

    He did?

    James
     
    James Harris, Aug 27, 2013
    #18
  19. James Harris

    James Harris Guest

    I'm puzzled as to why Jorgen's expression isn't idiomatic as it stands.

    p[0] + 256 * p[1]
    p[0] + (p[1] << 8)
    p[0] | p[1] << 8

    Is any one of these more or less idiomatic than the others?

    That aside, the latter two should be faster if the compiler does nothing
    clever.
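
For anyone who wants to compare the three forms side by side, here they are as functions. This is a sketch with made-up names, assuming p points at two bytes in little-endian order; the explicit unsigned arithmetic is an addition to keep the result type the same in all three.

```c
/* Three ways to build a 16-bit value from two little-endian bytes.
   All function names here are hypothetical. */

static inline unsigned le16_mul(const unsigned char *p)
{
    return p[0] + 256u * p[1];
}

static inline unsigned le16_add(const unsigned char *p)
{
    return p[0] + ((unsigned)p[1] << 8);
}

static inline unsigned le16_or(const unsigned char *p)
{
    return p[0] | ((unsigned)p[1] << 8);
}
```

All three return the same value for any pair of bytes, since p[0] only occupies the low eight bits and so adding and OR-ing cannot differ.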

    James
     
    James Harris, Aug 27, 2013
    #19
  20. James Harris

    James Kuyper Guest

    On 08/27/2013 06:36 PM, James Harris wrote:
    ....
    That's not necessarily true; I've heard rumors of machines where the
    first one is the one that a naive compiler will generate the fastest
    code for. Of course, compilers dumb enough to generate significantly
    different code for those three expressions are pretty rare nowadays,
    unless you deliberately disable optimization.
     
    James Kuyper, Aug 27, 2013
    #20
