Endian swaps with C++; comments please

Discussion in 'C++' started by Aaron Graham, Jan 31, 2006.

  1. Aaron Graham

    Aaron Graham Guest

    /**
    * Sample usage:
    * unsigned long longvar = 0x12345678;
    * unsigned long be_longvar = endian::host_to_big(longvar);
    * unsigned short shortvar = 0x1234;
    * unsigned short le_shortvar = endian::host_to_little(shortvar);
    */

    // for std::reverse:
    #include <algorithm>
    #include <limits>

    // for endian information:
    #include <endian.h>
    // Linux uses __BYTE_ORDER
    // FreeBSD and Apple/Darwin use _BYTE_ORDER
    // Some other BSD variants use BYTE_ORDER
    #if (defined __BYTE_ORDER && __BYTE_ORDER==__BIG_ENDIAN) || \
    (defined _BYTE_ORDER && _BYTE_ORDER== _BIG_ENDIAN) || \
    (defined BYTE_ORDER && BYTE_ORDER== BIG_ENDIAN)
    #define IS_BIG_ENDIAN 1
    #else
    #define IS_BIG_ENDIAN 0
    #endif

    namespace endian {

    // This function will copy the supplied value and return a byte-swapped
    // version of it. This function may/should be optimized for specific
    // architectures when necessary. It may also be necessary to create
    // partial specializations for certain types, since the current state
    // of this function only allows some fundamental types to be swapped.
    template <typename _type>
    _type byteswap(_type val) {
        if (std::numeric_limits<_type>::is_specialized &&
            !std::numeric_limits<_type>::is_signed) {
            // Found a type that is specialized and is unsigned.
            switch (sizeof(_type)) {
            case 1:
                return val;
            case 2:
                return ((val & 0x00ff) << 8) | ((val & 0xff00) >> 8);
            case 4:
                return ((val & 0x000000ff) << 24) | ((val & 0x0000ff00) << 8) |
                       ((val & 0x00ff0000) >> 8) | ((val & 0xff000000) >> 24);
            }
        }
        // Swap this type using a different/fallback/hacky method:
        unsigned char* v = reinterpret_cast<unsigned char*>(&val);
        std::reverse(v, v + sizeof(_type));
        return val;
    }

    template <typename _type>
    _type host_to_big(_type val) {
        return IS_BIG_ENDIAN ? val : byteswap(val);
    }

    template <typename _type>
    _type host_to_little(_type val) {
        return IS_BIG_ENDIAN ? byteswap(val) : val;
    }

    template <typename _type>
    _type big_to_host(_type val) {
        return IS_BIG_ENDIAN ? val : byteswap(val);
    }

    template <typename _type>
    _type little_to_host(_type val) {
        return IS_BIG_ENDIAN ? byteswap(val) : val;
    }

    } // end namespace endian

    // Don't need this definition anymore:
    #undef IS_BIG_ENDIAN
    Aaron Graham, Jan 31, 2006
    #1

  2. Aaron Graham wrote:
    > [...]
    > switch (sizeof(_type)) {
    > case 1:
    > return val;
    > case 2:
    > return ((val & 0x00ff) << 8) | ((val & 0xff00) >> 8);


    This assumes that 'sizeof' returns the number of octets. It doesn't.
    It returns the number of 'bytes'. Please read up on the difference.

    > case 4:
    > return ((val & 0x000000ff) << 24) | ((val & 0x0000ff00) << 8) |
    > ((val & 0x00ff0000) >> 8) | ((val & 0xff000000) >> 24);
    > }
    > }
    > [..]
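
To make the byte/octet point concrete: `sizeof` counts bytes, and a byte holds `CHAR_BIT` bits, which the standard only guarantees to be at least 8. A minimal sketch of how the 8-bit assumption could be stated explicitly, assuming a C++11 compiler for `static_assert` (not available when this thread was written); `bit_width_of` is a hypothetical helper name, not something from the posted header:

```cpp
#include <climits>  // CHAR_BIT
#include <cstddef>  // std::size_t

// The mask-and-shift cases in byteswap() only make sense when a byte is an
// octet, so refuse to compile on platforms where that does not hold.
static_assert(CHAR_BIT == 8, "mask-and-shift byteswap assumes 8-bit bytes");

// Hypothetical helper: width of T in bits rather than in bytes.
template <typename T>
constexpr std::size_t bit_width_of() {
    return sizeof(T) * CHAR_BIT;
}
```

On a platform with wider bytes the assertion fails at compile time, which keeps the porting work localized to this one header.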


    V
    Victor Bazarov, Jan 31, 2006
    #2

  3. Aaron Graham

    Howard Guest

    "Aaron Graham" <> wrote in message
    news:...
    > /**
    > * Sample usage:
    > * unsigned long longvar = 0x12345678;
    > * unsigned long be_longvar = endian::host_to_big(longvar);
    > * unsigned short shortvar = 0x1234;
    > * unsigned short le_shortvar = endian::host_to_little(shortvar);
    > */


    I've never seen a need to swap actual integer variable values. The only
    time I execute any swapping code is when I'm writing out an integer-type
    variable to disk (or reading it back), when that data might be read on
    another platform. We decided on a standard for all integers in the files,
    and all platforms must write (and read) in that format.

    So, on each platform, we have read and write functions for the numeric data
    types, which stream in/out the data in the order we need.

    On the Mac, for example, the read and write functions simply read/write the
    bytes from first memory location to last, while on Windows, we read/write
    the bytes in reverse order.

    This way, there's never a stored numeric variable in memory (aside from
    perhaps in a buffer), which we have to worry about the "endianness" of.

    -Howard
    Howard, Jan 31, 2006
    #3
  4. Aaron Graham

    andrea Guest

    hello,

    have a look at:

    man htonl

    ("network" byte order is big-endian)


    > /**
    > * Sample usage:
    > * unsigned long longvar = 0x12345678;
    > * unsigned long be_longvar = endian::host_to_big(longvar);
    > * unsigned short shortvar = 0x1234;
    > * unsigned short le_shortvar = endian::host_to_little(shortvar);
    > */
    >
    > // for std::reverse:
    > #include <algorithm>
    > #include <limits>
    >
    > // for endian information:
    > #include <endian.h>
    > // Linux uses __BYTE_ORDER
    > // FreeBSD and Apple/Darwin use _BYTE_ORDER
    > // Some other BSD variants use BYTE_ORDER
    > #if (defined __BYTE_ORDER && __BYTE_ORDER==__BIG_ENDIAN) || \
    > (defined _BYTE_ORDER && _BYTE_ORDER== _BIG_ENDIAN) || \
    > (defined BYTE_ORDER && BYTE_ORDER== BIG_ENDIAN)
    > #define IS_BIG_ENDIAN 1
    > #else
    > #define IS_BIG_ENDIAN 0
    > #endif
    >
    > namespace endian {
    >
    > // This function will copy the supplied value and return a byte-swapped
    > // version of it. This function may/should be optimized for specific
    > // architectures when necessary. It may also be necessary to create
    > // partial specializations for certain types, since the current state
    > // of this function only allows some fundamental types to be swapped.
    > template <typename _type>
    > _type byteswap(_type val) {
    > if (std::numeric_limits<_type>::is_specialized &&
    > !std::numeric_limits<_type>::is_signed) {
    > // Found a type that is specialized and is unsigned.
    > switch (sizeof(_type)) {
    > case 1:
    > return val;
    > case 2:
    > return ((val & 0x00ff) << 8) | ((val & 0xff00) >> 8);
    > case 4:
    > return ((val & 0x000000ff) << 24) | ((val & 0x0000ff00) << 8) |
    > ((val & 0x00ff0000) >> 8) | ((val & 0xff000000) >> 24);
    > }
    > }
    > // Swap this type using a different/fallback/hacky method:
    > unsigned char* v = reinterpret_cast<unsigned char*>(&val);
    > std::reverse(v, v + sizeof(_type));
    > return val;
    > }
    >
    > template <typename _type>
    > _type host_to_big(_type val) {
    > return IS_BIG_ENDIAN ? val : byteswap(val);
    > }
    >
    > template <typename _type>
    > _type host_to_little(_type val) {
    > return IS_BIG_ENDIAN ? byteswap(val) : val;
    > }
    >
    > template <typename _type>
    > _type big_to_host(_type val) {
    > return IS_BIG_ENDIAN ? val : byteswap(val);
    > }
    >
    > template <typename _type>
    > _type little_to_host(_type val) {
    > return IS_BIG_ENDIAN ? byteswap(val) : val;
    > }
    >
    > } // end namespace endian
    >
    > // Don't need this definition anymore:
    > #undef IS_BIG_ENDIAN
    >
    andrea, Jan 31, 2006
    #4
  5. Aaron Graham

    Aaron Graham Guest

    andrea wrote:
    > have a look at:
    >
    > man htonl


    I'm already very familiar with it. I was looking for a more general
    solution. htonl only works for long, and htons only works for short.
    What about 64-bit quantities?
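
For the 64-bit question, the posted `byteswap` has no `case 8`, so an 8-byte type would go through the `std::reverse` fallback. A dedicated mask-and-shift version might look like the sketch below; `byteswap64` is a hypothetical name, and the code assumes `std::uint64_t` is available on the target:

```cpp
#include <cstdint>

// Hypothetical helper, not part of the posted header: reverse the eight
// octets of a 64-bit unsigned value in three mask-and-shift passes.
inline std::uint64_t byteswap64(std::uint64_t v) {
    // Swap the two 32-bit halves.
    v = ((v & 0x00000000FFFFFFFFULL) << 32) | ((v & 0xFFFFFFFF00000000ULL) >> 32);
    // Swap the 16-bit pairs inside each half.
    v = ((v & 0x0000FFFF0000FFFFULL) << 16) | ((v & 0xFFFF0000FFFF0000ULL) >> 16);
    // Swap the two octets inside each 16-bit pair.
    v = ((v & 0x00FF00FF00FF00FFULL) << 8) | ((v & 0xFF00FF00FF00FF00ULL) >> 8);
    return v;
}
```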

    And #include <netinet/in.h> brings in a lot of baggage (#defines
    mostly) that is not desirable in portable C++ code. For instance, if
    you #include <netinet/in.h> in vxWorks (and likely other BSD systems),
    you get #defines of the following symbols: m_len, m_data, m_type,
    m_flags, and many others. You can imagine what kind of problems you
    would have trying to port/compile code that uses hungarian notation
    (not that I use HN).

    Thanks for your suggestion.
    Aaron
    Aaron Graham, Jan 31, 2006
    #5
  6. Aaron Graham

    Aaron Graham Guest

    > This assumes that 'sizeof' returns the number of octets. It doesn't.
    > It returns the number of 'bytes'. Please read up on the difference.


    I was not familiar with the distinction. I suppose systems that use
    differently-sized-bytes would have to port this function, or let it
    fall back to std::reverse. I'm not averse to having to port this
    function for specific architectures, as long as the porting is highly
    localized. Obviously, some architectures have native endian swapping
    capabilities in their instruction sets, and it would be best to take
    advantage of those as well (as I said in the comments).
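
As an illustration of keeping the architecture-specific part localized, the sketch below wraps compiler-provided swap routines behind one function. It assumes a reasonably recent GCC or Clang (which provide `__builtin_bswap32`) or MSVC (which provides `_byteswap_ulong`); the wrapper name `byteswap32` is hypothetical, and compilers contemporary with this thread may lack these builtins:

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <stdlib.h>  // _byteswap_ulong
#endif

// Hypothetical wrapper: use a compiler builtin where one is known to exist,
// otherwise fall back to portable masks and shifts.
inline std::uint32_t byteswap32(std::uint32_t v) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(v);  // GCC and Clang builtin
#elif defined(_MSC_VER)
    return _byteswap_ulong(v);    // MSVC intrinsic
#else
    return (v << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | (v >> 24);
#endif
}
```

Callers see only `byteswap32`, so adding another toolchain means adding one more branch here.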

    Thanks for your input.
    Aaron
    Aaron Graham, Jan 31, 2006
    #6
  7. Aaron Graham

    Aaron Graham Guest

    > I've never seen a need to swap actual integer variable values. The only
    > time I execute any swapping code is when I'm writing out an integer-type
    > variable to disk (or reading it back), when that data might be read on
    > another platform. We decided on a standard for all integers in the files,
    > and all platforms must write (and read) in that format.
    >
    > So, on each platform, we have read and write functions for the numeric data
    > types, which stream in/out the data in the order we need.


    This begs the question a little bit. Somewhere, something has to do
    the endian swapping. Besides, I don't always have control over file
    formats I read and write. For instance, FLAC files use big endian for
    metadata blocks, but the Vorbis comment metadata block uses little
    endian internally.

    Aaron
    Aaron Graham, Jan 31, 2006
    #7
  8. Aaron Graham

    red floyd Guest

    andrea wrote:
    > hello,
    >
    > have a look at:
    >
    > man htonl
    >


    [redacted]

    1. Please do not top post.
    2. htonl is a good solution, but it is not part of Standard C++. It is
    a POSIX-ism that is implemented practically everywhere, but it's not in
    the Standard. As such, it doesn't meet the OP's choice for a standard
    C++ only solution. (of course, <endian.h> is also system specific....)
    red floyd, Jan 31, 2006
    #8
  9. Aaron Graham

    Aaron Graham Guest

    [...]
    > 2. htonl is a good solution, but it is not part of Standard C++. It is
    > a POSIX-ism that is implemented practically everywhere, but it's not in
    > the Standard. As such, it doesn't meet the OP's choice for a standard
    > C++ only solution. (of course, <endian.h> is also system specific....)


    htonl really _isn't_ a good solution, because it doesn't do anything on
    big-endian machines. What if you're trying to read little-endian data
    on a big-endian machine?

    I agree that the #include <endian.h> is an ugly wart. Is there a
    better way to know endianness at compile time? Is there a better
    standard compiler built-in that will give you this information?

    Aaron
    Aaron Graham, Feb 1, 2006
    #9
  10. Aaron Graham

    red floyd Guest

    Aaron Graham wrote:
    > [...]
    >> 2. htonl is a good solution, but it is not part of Standard C++. It is
    >> a POSIX-ism that is implemented practically everywhere, but it's not in
    >> the Standard. As such, it doesn't meet the OP's choice for a standard
    >> C++ only solution. (of course, <endian.h> is also system specific....)

    >
    > htonl really _isn't_ a good solution, because it doesn't do anything on
    > big-endian machines. What if you're trying to read little-endian data
    > on a big-endian machine?


    Oh, good point. I got fixated on putting stuff into network byte order,
    and forgot the general byteswap case.

    I think you'll have to go compiler dependent and use the appropriate
    manifest defines, or specify a command line option (or a custom endian.h
    for each target platform).
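
A sketch of the "manifest defines or command-line option" approach, assuming a compiler that predefines `__BYTE_ORDER__` and the `__ORDER_*__` constants (current GCC and Clang do; other toolchains would need their own branch). `HOST_IS_BIG_ENDIAN` and `host_is_big_endian` are hypothetical names. Since C++20 the standard library also offers `std::endian` in `<bit>`, which removes the need for macros where it is available.

```cpp
// Allow the build system to decide: -DHOST_IS_BIG_ENDIAN=0 or =1.
#if defined(HOST_IS_BIG_ENDIAN)
// Value supplied on the command line; nothing to detect.
#elif defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define HOST_IS_BIG_ENDIAN 1
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define HOST_IS_BIG_ENDIAN 0
#else
#error "Unknown byte order: define HOST_IS_BIG_ENDIAN=0 or 1 on the command line"
#endif

// The result is a compile-time constant, so branches on it can be folded.
inline bool host_is_big_endian() {
    return HOST_IS_BIG_ENDIAN != 0;
}
```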
    red floyd, Feb 1, 2006
    #10
  11. Aaron Graham wrote:
    > /**
    > * Sample usage:
    > * unsigned long longvar = 0x12345678;
    > * unsigned long be_longvar = endian::host_to_big(longvar);
    > * unsigned short shortvar = 0x1234;
    > * unsigned short le_shortvar = endian::host_to_little(shortvar);
    > */


    Check this post:
    http://groups.google.com/group/comp.lang.c /msg/061db1be797a255f?hl=en&

    Usage:

    NetworkOrder<int> val = 0x12345678;

    int x = val;
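
For readers who cannot reach the linked post, here is a hypothetical sketch of what a wrapper with this usage could look like. It is not the class from that post, whose implementation may differ; it assumes 8-bit bytes, an integer type `T`, and a C++11 library for `std::make_unsigned`. The `data()` accessor is an addition for inspecting the stored octets.

```cpp
#include <cstddef>      // std::size_t
#include <type_traits>  // std::make_unsigned

// Hypothetical sketch: keeps the value as big-endian octets and converts
// to and from the host representation on assignment and on read.
template <typename T>
class NetworkOrder {
    typedef typename std::make_unsigned<T>::type U;
    unsigned char bytes_[sizeof(T)];  // most significant octet first

    void store(T value) {
        U u = static_cast<U>(value);
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            bytes_[i] = static_cast<unsigned char>(
                (u >> (8 * (sizeof(T) - 1 - i))) & 0xFFu);
        }
    }

public:
    NetworkOrder(T value = T()) { store(value); }
    NetworkOrder& operator=(T value) { store(value); return *this; }

    // Rebuild the host value from the stored octets.
    operator T() const {
        U u = 0;
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            u = static_cast<U>((u << 8) | bytes_[i]);
        }
        return static_cast<T>(u);
    }

    // Raw octets in network order, e.g. for writing to a socket or file.
    const unsigned char* data() const { return bytes_; }
};
```

Because the conversion uses shifts on the value rather than pointer casts, this sketch needs no knowledge of the host byte order.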
    Gianni Mariani, Feb 1, 2006
    #11
  12. Aaron Graham wrote:

    > htonl really _isn't_ a good solution, because it doesn't do anything on
    > big-endian machines. What if you're trying to read little-endian data
    > on a big-endian machine?
    >
    > I agree that the #include <endian.h> is an ugly wart. Is there a
    > better way to know endianness at compile time? Is there a better
    > standard compiler built-in that will give you this information?


    Why do you need to know at compile time?

    The compiler's optimizer can (and does on compilers I've tested)
    eliminate dead code when doing a "run time" endianness check.
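
A minimal sketch of such a "run time" check, for illustration. The function names are hypothetical, and the code assumes the host is either plain little-endian or plain big-endian. Whether a given compiler folds the probe into a constant and drops the unused branch depends on the compiler and version, as the gcc results later in this thread show.

```cpp
#include <cstdint>
#include <cstring>  // std::memcpy

// Runtime probe: inspect the lowest-addressed byte of the value 1.
inline bool host_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical 32-bit counterpart to endian::host_to_big that decides
// at run time instead of through a preprocessor macro.
inline std::uint32_t host_to_big_runtime(std::uint32_t v) {
    if (!host_is_little_endian()) {
        return v;  // already big-endian, nothing to do
    }
    return (v << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) | (v >> 24);
}
```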

    This is one of those classic premature optimization issues.
    Gianni Mariani, Feb 1, 2006
    #12
  13. Aaron Graham

    andrea Guest

    >>2. htonl is a good solution, but it is not part of Standard C++. It is
    >>a POSIX-ism that is implemented practically everywhere, but it's not in
    >>the Standard. As such, it doesn't meet the OP's choice for a standard
    >>C++ only solution. (of course, <endian.h> is also system specific....)


    Well, it is not the Standard but from your snippet it was clear that you
    are working in a unix-like environment...

    > htonl really _isn't_ a good solution, because it doesn't do anything on
    > big-endian machines. What if you're trying to read little-endian data
    > on a big-endian machine?


    I understand the desire to generalize the code as much as possible but,
    IMHO, one should mainly aim at simplicity and efficiency. Foreseeing the
    possibility to read little-endian data could be good for completeness
    but I would write the data in big-endian, instead.

    bye,
    andrea
    andrea, Feb 1, 2006
    #13
  14. Aaron Graham wrote:
    > andrea wrote:
    > > have a look at:
    > >
    > > man htonl

    >
    > I'm already very familiar with it. I was looking for a more general
    > solution. htonl only works for long, and htons only works for short.
    > What about 64-bit quantities?


    Note that on 64-bit Linux, sizeof(long) == 8. I wonder whether htonl
    operates on long rather than int32_t.
    Maxim Yegorushkin, Feb 1, 2006
    #14
  15. Aaron Graham

    Aaron Graham Guest

    > Well, it is not the Standard but from your snippet it was clear that you
    > are working in a unix-like environment...


    Well, I'm not working in a Windows environment, anyway...

    > I understand the desire to generalize the code as much as possible but,
    > IMHO, one should mainly aim at simplicity and efficiency. Foreseeing the
    > possibility to read little-endian data could be good for completeness
    > but I would write the data in bigendian, instead.


    I think my solution is simple and efficient and general. The compiler
    will take care of the optimizations to the point where it's just as
    efficient for longs as htonl is (more efficient, if you consider that
    byteswap can be optimized for specific architectures).

    It's not possible to always write files in big-endian, because I don't
    dictate the endian-ness of popular file formats. If I write a wma file
    using big-endian, for instance, nobody else will be able to read it.

    Aaron
    Aaron Graham, Feb 1, 2006
    #15
  16. Aaron Graham

    Aaron Graham Guest

    > Why do you need to know at compile-time ?

    Endian swapping is used in tight loops all the time, and is commonly
    used on resource-lean embedded systems (I am often in situations where
    both of these points are applicable).

    > The compiler's optimizer can (and does on compilers I've tested)
    > eliminate dead code when doing a "run time" endianness check.


    I'm not sure I understand what you're saying. If the compiler can't
    determine at compile-time which branch you're going to be taking, it
    can't assume there's any dead code. If you mean that endian-ness
    checks that are commonly regarded as "runtime" are actually "compile
    time" checks with some compilers, then I think that may be true. But
    the most common one:

    unsigned x = 1;
    return !(*(char*)(&x));

    ... is not optimized away by gcc, even at the highest optimization
    level (at least, not in any of the disassemblies I've looked at).
    There's probably a good reason for it, but I don't know what that is.

    > This is one of those classic premature optimization issues.


    How do you know this? How do you know I'm not attempting to create a
    good general solution to a problem where I've determined that endian
    swapping is a significant contributor to slow performance?

    Aaron
    Aaron Graham, Feb 1, 2006
    #16
  17. Aaron Graham wrote:

    > unsigned x = 1;
    > return !(*(char*)(&x));
    >
    > ... is not optimized away by gcc, even at the highest optimization
    > level (at least, not in any of the disassemblies I've looked at).
    > There's probably a good reason for it, but I don't know what that is.


    In my investigations, that *was* optimized away including the dead code.

    What did you test?
    Gianni Mariani, Feb 1, 2006
    #17
  18. Aaron Graham

    Howard Guest

    "Aaron Graham" <> wrote in message
    news:...
    >> I've never seen a need to swap actual integer variable values. The only
    >> time I execute any swapping code is when I'm writing out an integer-type
    >> variable to disk (or reading it back), when that data might be read on
    >> another platform. We decided on a standard for all integers in the
    >> files,
    >> and all platforms must write (and read) in that format.
    >>
    >> So, on each platform, we have read and write functions for the numeric
    >> data
    >> types, which stream in/out the data in the order we need.

    >
    > This begs the question a little bit. Somewhere, something has to do
    > the endian swapping. Besides, I don't always have control over file
    > formats I read and write. For instance, FLAC files use big endian for
    > metadata blocks, but the Vorbis comment metadata block uses little
    > endian internally.
    >
    > Aaron
    >


    There doesn't ever have to be any swapping, as such. All data-ordering can
    be done while reading and writing. If you know the ordering of the data to
    be read or written, code that into your reading and writing routines for the
    specific data you're handling. And there's no need to know what your
    machine's internal byte-ordering is, since you can use mask&shift (or
    multiplication/division) operations, which work the same, regardless of the
    internal physical byte-ordering.
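
The mask-and-shift approach described here can be sketched as follows. The helper names are hypothetical, and the code assumes 8-bit bytes and data already read into an `unsigned char` buffer. The same functions cover formats that mix byte orders, such as the FLAC and Vorbis comment case mentioned earlier in the thread.

```cpp
#include <cstdint>

// Hypothetical helpers: build or split a value octet by octet. Shifts act
// on the value, not on memory, so the host byte order never matters.
inline std::uint32_t load_be32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

inline std::uint32_t load_le32(const unsigned char* p) {
    return (static_cast<std::uint32_t>(p[3]) << 24) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
            static_cast<std::uint32_t>(p[0]);
}

inline void store_be32(unsigned char* p, std::uint32_t v) {
    p[0] = static_cast<unsigned char>((v >> 24) & 0xFFu);
    p[1] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    p[2] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    p[3] = static_cast<unsigned char>(v & 0xFFu);
}
```

The trade-off against an in-place swap is that values are converted at the I/O boundary only, which is the design Howard describes.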

    I'm pretty sure this is covered in the FAQ...?

    -Howard
    Howard, Feb 1, 2006
    #18
  19. Aaron Graham

    Aaron Graham Guest

    Gianni Mariani wrote:
    > Aaron Graham wrote:
    >
    > > unsigned x = 1;
    > > return !(*(char*)(&x));
    > >
    > > ... is not optimized away by gcc, even at the highest optimization
    > > level (at least, not in any of the disassemblies I've looked at).
    > > There's probably a good reason for it, but I don't know what that is.

    >
    > In my investigations, that *was* optimized away including the dead code.
    >
    > What did you test ?


    Okay, you're right: I tried a couple more compilers that I have
    sitting around on one of my dev machines. It seems that gcc 2.95.2
    does the optimization, but I can't get it to happen with the latest gcc
    4.0.2 for linux-x86. Maybe a bug?

    Try this:
    #include <stdio.h>
    void tell_endian() {
        unsigned x = 1;
        if (*(char*)&x) printf("little endian\n");
        else printf("big endian\n");
    }

    Doing an objdump of the results of "gcc-4.0.2 -O3 -c -o foo foo.c"
    gives me this:

    00000000 <tell_endian>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 ec 18                sub    $0x18,%esp
       6:   c7 45 fc 01 00 00 00    movl   $0x1,0xfffffffc(%ebp)
       d:   80 7d fc 00             cmpb   $0x0,0xfffffffc(%ebp)
      11:   74 15                   je     28 <tell_endian+0x28>
      13:   83 ec 0c                sub    $0xc,%esp
      16:   68 00 00 00 00          push   $0x0
                            17: R_386_32    .rodata.str1.1
      1b:   e8 fc ff ff ff          call   1c <tell_endian+0x1c>
                            1c: R_386_PC32  printf
      20:   83 c4 10                add    $0x10,%esp
      23:   c9                      leave
      24:   c3                      ret
      25:   8d 76 00                lea    0x0(%esi),%esi
      28:   83 ec 0c                sub    $0xc,%esp
      2b:   68 1b 00 00 00          push   $0x1b
                            2c: R_386_32    .rodata.str1.1
      30:   e8 fc ff ff ff          call   31 <tell_endian+0x31>
                            31: R_386_PC32  printf
      35:   83 c4 10                add    $0x10,%esp
      38:   c9                      leave
      39:   c3                      ret
    Aaron Graham, Feb 1, 2006
    #19
  20. Aaron Graham wrote:
    > Gianni Mariani wrote:
    >
    >>Aaron Graham wrote:
    >>
    >>
    >>> unsigned x = 1;
    >>> return !(*(char*)(&x));
    >>>
    >>>... is not optimized away by gcc, even at the highest optimization
    >>>level (at least, not in any of the disassemblies I've looked at).
    >>>There's probably a good reason for it, but I don't know what that is.

    >>
    >>In my investigations, that *was* optimized away including the dead code.
    >>
    >>What did you test ?

    >
    >
    > Okay, you're right: I tried a couple more compilers that I have
    > sitting around on one of my dev machines. It seems that gcc 2.95.2
    > does the optimization, but I can't get it to happen with the latest gcc
    > 4.0.2 for linux-x86. Maybe a bug?


    I changed it to:

    bool tell_endian()
    {
        unsigned x = 1;
        return *(char*)&x;
    }

    g++ 3.4.2 produces:

    00000000 <_Z11tell_endianv>:
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   b8 01 00 00 00          mov    $0x1,%eax
       8:   c9                      leave
       9:   c3                      ret


    g++ 4.0.0 produces:

       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 ec 10                sub    $0x10,%esp
       6:   c7 45 fc 01 00 00 00    movl   $0x1,0xfffffffc(%ebp)
       d:   31 c0                   xor    %eax,%eax
       f:   80 7d fc 00             cmpb   $0x0,0xfffffffc(%ebp)
      13:   0f 95 c0                setne  %al
      16:   c9                      leave
      17:   c3                      ret

    compile line:
    g++ -O3 -c -o endian_test.o endian_test.cpp


    Seems like a serious optimizer regression to me.

    With g++ 3.4.2 it appears that it creates the right code even on -O1
    level optimization.
    Gianni Mariani, Feb 2, 2006
    #20
