Big-endian, little-endian and sizeof() in different systems

Discussion in 'C++' started by Javier, Jun 15, 2007.

  1. Javier

    Javier Guest

    Hello people,
    I'm recoding a library that I made a few months ago, and now that I'm
    reading what I wrote I have some questions.

    My program reads black and white images from a bitmap (BMP 24bpp
    without compression). It has its words and dwords stored in
    little-endian, so I do a conversion to big-endian when reading full
    words or dwords.
    I have done this because my system is big-endian.
    But now... what if one compiles the library in a little-endian system?

    And... I use char (which I have read is equal to unsigned short
    int) as 'byte'.
    And this is the other question: is sizeof(char) a 'byte' always?
    How can I define byte, word and dword (8, 16, 32 bits) without making
    the assumption that sizeof(char) is a byte (8 bits)?

    Thanks.
     
    Javier, Jun 15, 2007
    #1

  2. Javier

    dasjotre Guest

    On 15 Jun, 14:57, Javier <> wrote:
    > Hello people,
    > I'm recoding a library that I made a few months ago, and now that I'm
    > reading what I wrote I have some questions.
    >
    > My program reads black and white images from a bitmap (BMP 24bpp
    > without compression). It has its words and dwords stored in little-
    > endian, so I do a conversion to big-endian when reading full words or
    > dwords.


    hton and ntoh, mostly used in networking.

    > I have done this because my system is big-endian.
    > But now... what if one compiles the library in a little-endian system?


    if you use hton and ntoh when reading the files you will have no
    problem.

    > And... I use char (which I have read is equal to unsigned short
    > int) as 'byte'.
    > And this is the other question: is sizeof(char) a 'byte' always?
    > How can I define byte, word and dword (8, 16, 32 bits) without making
    > the assumption that sizeof(char) is a byte (8 bits)?
    >


    sizeof(char) is always 1.
    You could use stdint.h.
    It's a C header (Boost has cstdint.hpp too).
    It defines fixed-width types
    like intXX_t, where XX is the number of bits.

    regards

    DS
     
    dasjotre, Jun 15, 2007
    #2

  3. Javier

    Andre Kostur Guest

    Javier <> wrote in news:1181915876.026011.283520
    @q66g2000hsg.googlegroups.com:

    > Hello people,
    > I'm recoding a library that I made a few months ago, and now that I'm
    > reading what I wrote I have some questions.
    >
    > My program reads black and white images from a bitmap (BMP 24bpp
    > without compression). It has its words and dwords stored in little-
    > endian, so I do a conversion to big-endian when reading full words or
    > dwords.
    > I have done this because my system is big-endian.
    > But now... what if one compiles the library in a little-endian system?


    Drifting somewhat off-topic (endianness is a platform-specific issue).
    Many systems have some sort of include file which will define a macro
    which will tell you the endianness of the platform that you're compiling
    for. Using that knowledge you can construct a function which converts
    from little-endian to the endianness of the platform that you're on.
    (I'd suggest using the hton* and ntoh* family of functions but those go
    between host and big-endian). So anytime you need to be concerned about
    the endianness you can pass it to your function to convert it from
    little-endian to host-endian (which means that for some platforms your
    function does nothing, and some it does the byte flip).
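A minimal sketch of such a conversion function for 32-bit values. It assumes the GCC/Clang predefined macros __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__; other compilers need their own test in the #if. The names swap_bytes32 and le32_to_host are made up for illustration, and hosts whose byte order is not detected are simply treated as little-endian.

```cpp
#include <cstdint>

// Hypothetical helper: reverses the four bytes of a 32-bit value.
inline std::uint32_t swap_bytes32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24)
         | ((v & 0x0000FF00u) << 8)
         | ((v & 0x00FF0000u) >> 8)
         | ((v & 0xFF000000u) >> 24);
}

// Converts a value whose bytes were copied verbatim from a little-endian
// file into the byte order of the host.
// ASSUMPTION: __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__ are the GCC/Clang
// predefined macros; anything not detected as big-endian is treated as
// little-endian.
inline std::uint32_t le32_to_host(std::uint32_t v)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) \
    && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    return swap_bytes32(v);   // big-endian host: flip the bytes
#else
    return v;                 // little-endian host: nothing to do
#endif
}
```

On a little-endian build le32_to_host compiles down to a no-op, which matches the "for some platforms your function does nothing" remark above.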

    > And... I use char (which I have read is equal to unsigned short
    > int) as 'byte'.


    Where did you read that? All the standard says:

    sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)

    (And certain minimum range constraints) In most platforms that I've dealt
    with, sizeof(char) is 1, and sizeof(short) is 2.

    > And this is the other question: is sizeof(char) a 'byte' always?
    > How can I define byte, word and dword (8, 16, 32 bits) without making
    > the assumption that sizeof(char) is a byte (8 bits)?


    Use platform-specific includes. (Or more recent C headers, IIRC). Or
    find some sort of portability layer library. Some compilers define
    things like uint8_t, uint16_t and the like. Or libraries such as ACE
    define ACE_UINT32, ACE_UINT64, and that sort of thing.
     
    Andre Kostur, Jun 15, 2007
    #3
  4. Javier

    dasjotre Guest

    On 15 Jun, 16:53, Andre Kostur <> wrote:
    > Where did you read that? All the standard says:
    >
    > sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
    >
    > (And certain minimum range constraints) In most platforms that I've dealt
    > with, sizeof(char) is 1, and sizeof(short) is 2.


    sizeof(char) must be 1 regardless of actual implementation.

    regards

    DS
     
    dasjotre, Jun 15, 2007
    #4
  5. Javier

    James Kanze Guest

    On Jun 15, 3:57 pm, Javier <> wrote:

    > I'm recoding a library that I made a few months ago, and now that I'm
    > reading what I wrote I have some questions.


    > My program reads black and white images from a bitmap (BMP 24bpp
    > without compression). It has its words and dwords stored in little-
    > endian, so I do a conversion to big-endian when reading full words or
    > dwords.
    > I have done this because my system is big-endian.
    > But now... what if one compiles the library in a little-endian system?


    The endian-ness of the internal representation shouldn't make a
    difference. You deal with values, not with physical
    representation. Basically, to read little endian, you use
    something like:

    uint32_t
    read( uint8_t const* buffer )
    {
        return (uint32_t( buffer[ 0 ] ) )
            || (uint32_t( buffer[ 1 ] ) << 8)
            || (uint32_t( buffer[ 2 ] ) << 16)
            || (uint32_t( buffer[ 3 ] ) << 24) ;
    }

    Works regardless of the byte order. (I've seen at least 3
    different byte orders for 32 bit integers.)

    > And... I use char (which I have read is equal to unsigned short
    > int) as 'byte'.


    That's generally not true. On most machines today (there are a
    few exceptions), char is 8 bits; short must be at least 16.
    Also, very often, char is signed. I tend to avoid it for that
    reason as well; shifting signed values doesn't always work as
    expected.
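To illustrate the signedness point, here is a small sketch that reads a 16-bit little-endian value from a plain char buffer. It assumes 8-bit bytes, and the name read_le16 is invented for the example. Each byte is converted to unsigned char before shifting, so a byte such as 0xF2 is not sign-extended on implementations where plain char is signed.

```cpp
#include <cstdint>

// Reads a 16-bit little-endian value from a plain char buffer.
// ASSUMPTION: bytes are 8 bits wide.
// Going through unsigned char first means a byte with the top bit set
// stays in the range 0..255 instead of becoming a negative int.
inline std::uint16_t read_le16(char const* buffer)
{
    unsigned const lo = static_cast<unsigned char>(buffer[0]);
    unsigned const hi = static_cast<unsigned char>(buffer[1]);
    return static_cast<std::uint16_t>(lo | (hi << 8));
}
```

Without the casts, a signed char holding 0xF2 would be promoted to a negative int, and OR-ing it into the result would set bits that do not belong to that byte.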

    > And this is the other question: is sizeof(char) a 'byte' always?


    That's the definition in the standard: char is a byte.
    Sizeof(char) is guaranteed to be 1. As I said above, on most
    machines today, it is 8 bits. The standard requires at least 8
    bits, although in the past, 6 and 7 bit bytes were common (as
    were 9 and 10 bits). From what I have heard, some DSPs define
    char to have 32 bits, with all of the integral types having a
    sizeof 1. Also legal.
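These guarantees can be written down as compile-time checks. The sketch below assumes a compiler with static_assert and constexpr (C++11, so newer than this thread); storage_bits is a made-up helper name.

```cpp
#include <climits>
#include <cstddef>

// sizeof counts in units of char, so this holds on every implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");

// CHAR_BIT is the number of bits in a char; the standard requires at least 8.
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// Hypothetical helper: number of bits of storage a type occupies
// (padding bits, if any, are included).
template <typename T>
constexpr std::size_t storage_bits()
{
    return sizeof(T) * CHAR_BIT;
}
```

On a typical desktop platform storage_bits<char>() is 8; on the DSPs mentioned above it could be 32 while sizeof(char) is still 1.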

    > How can I define byte, word and dword (8, 16, 32 bits) without making
    > the assumption that sizeof(char) is a byte (8 bits)?


    How portable do you want to be? C has a header, <stdint.h>,
    which conditionally defines a certain number of integral types
    with fixed, exact length, i.e. uint8_t is an unsigned integral
    type with exactly 8 bits, int32_t is a signed, 2's complement
    integral type with exactly 32 bits, etc. If the underlying
    hardware doesn't support the type, it is not defined.
    Regrettably, support for this header seems to be rather spotty.
    But it's not too difficult to knock up your own version; put it
    in an isolated, system-dependent directory, where you know that
    you have to adapt it each time you port to a new machine.

    As I said, however, the presence of the definitions is
    conditioned on the existence of the actual types. Not every
    machine around today uses 8 bit bytes, and not every machine
    uses 2's complement. Still, for many applications, portability
    to only those machines that do is a quite acceptable
    restriction.
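A sketch of what such definitions might look like, using the byte/word/dword names from the original question. It assumes <cstdint> is available (standard since C++11; with older compilers the C header <stdint.h> or a hand-written equivalent would take its place). The namespace name bmp is invented to keep the short names out of the global scope.

```cpp
#include <climits>
#include <cstdint>

namespace bmp {   // hypothetical namespace, for illustration only

// These typedefs only compile where the exact-width types exist,
// which is the portability restriction described above.
typedef std::uint8_t  byte;    // exactly 8 bits
typedef std::uint16_t word;    // exactly 16 bits
typedef std::uint32_t dword;   // exactly 32 bits

// The exact-width types have no padding bits, so these checks hold
// wherever the typedefs compile at all.
static_assert(sizeof(byte)  * CHAR_BIT == 8,  "byte must be 8 bits");
static_assert(sizeof(word)  * CHAR_BIT == 16, "word must be 16 bits");
static_assert(sizeof(dword) * CHAR_BIT == 32, "dword must be 32 bits");

}  // namespace bmp
```

If the code ever needs to build on a platform without these types, std::uint_least8_t and friends are always present, at the cost of no longer being exact.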

    (And BTW: a word is normally 32 bits, and a dword 64. 16 bits
    is a hword, at least in IBM-speak.)

    --
    James Kanze (Gabi Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Jun 16, 2007
    #5
  6. Javier

    Gavin Deane Guest

    On 16 Jun, 11:57, James Kanze <> wrote:
    > The endian-ness of the internal representation shouldn't make a
    > difference. You deal with values, not with physical
    > representation. Basically, to read little endian, you use
    > something like:
    >
    > uint32_t
    > read( uint8_t const* buffer )
    > {
    > return (uint32_t( buffer[ 0 ] ) )
    > || (uint32_t( buffer[ 1 ] ) << 8)
    > || (uint32_t( buffer[ 2 ] ) << 16)
    > || (uint32_t( buffer[ 3 ] ) << 24) ;
    > }


    Did you mean | instead of || there?

    Gavin Deane
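For reference, a sketch of the quoted function with bitwise OR, assuming that is what was intended. With logical OR (||) the expression can only yield 0 or 1. The function is renamed read_le32 here purely for illustration, and <cstdint> is assumed to be available.

```cpp
#include <cstdint>

// Assembles a 32-bit value from four bytes stored least significant
// byte first. It works on values, not on the in-memory representation,
// so the result is the same whatever the byte order of the host.
inline std::uint32_t read_le32(std::uint8_t const* buffer)
{
    return  std::uint32_t(buffer[0])
         | (std::uint32_t(buffer[1]) << 8)
         | (std::uint32_t(buffer[2]) << 16)
         | (std::uint32_t(buffer[3]) << 24);
}
```

Each byte is widened to uint32_t before the shift, so none of the shifts happens in a narrower or signed type.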
     
    Gavin Deane, Jun 16, 2007
    #6
  7. On Fri, 15 Jun 2007 13:57:56 -0000, Javier wrote:

    >Hello people,
    >I'm recoding a library that I made a few months ago, and now that I'm
    >reading what I wrote I have some questions.
    >
    >My program reads black and white images from a bitmap (BMP 24bpp
    >without compression). It has its words and dwords stored in little-
    >endian, so I do a conversion to big-endian when reading full words or
    >dwords.
    >I have done this because my system is big-endian.
    >But now... what if one compiles the library in a little-endian system?


    As James Kanze pointed out you don't have to worry about the internal
    representation used by your C++ implementation, only the external
    representation of the values. Unfortunately, that's a point that few
    people seem to understand (after I explained it in the corresponding
    talk page, for instance, someone still added a totally bogus
    "determining the byte order" example to the Endianness entry of the
    English Wikipedia).
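The same idea applies when writing: the external representation is produced from the value with shifts and masks, and the host's byte order never enters into it. A minimal sketch, assuming <cstdint> is available; write_le32 is an invented name and is not part of the library mentioned below.

```cpp
#include <cstdint>

// Stores a 32-bit value into buffer[0..3], least significant byte first.
// Only arithmetic on the value is used, so the output is identical on
// big-endian and little-endian hosts.
inline void write_le32(std::uint32_t value, std::uint8_t* buffer)
{
    buffer[0] = static_cast<std::uint8_t>( value        & 0xFFu);
    buffer[1] = static_cast<std::uint8_t>((value >> 8)  & 0xFFu);
    buffer[2] = static_cast<std::uint8_t>((value >> 16) & 0xFFu);
    buffer[3] = static_cast<std::uint8_t>((value >> 24) & 0xFFu);
}
```

Because nothing here inspects how the integer is laid out in memory, there is no need to detect the platform's byte order at all.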

    If the GNU GPL version 2 isn't a problem for you then you may find
    this useful:


    <http://breeze.svn.sourceforge.net/viewvc/breeze/trunk/breeze/endianness/endian_codec.hpp>

    (Since my library aims at being generally useful any feedback is much
    appreciated. NOTE: I haven't committed the file width.hpp yet: if you
    are dealing with unsigned types only then you can implement it as

    #include "breeze/meta/constant.hpp"
    #include <limits>

    namespace breeze {
    namespace meta {

    template< typename T >
    class width
        : public constant< T, std::numeric_limits< T >::digits >
    {
    };

    }
    }

    Optionally, you can also add a #include <cstddef> and this
    specialization

    template< typename T, std::size_t n >
    class width< T[ n ] >
        : public constant< std::size_t, n * width< T >::value >
    {
    };

    which will allow you to work with built-in arrays as well. Well, this
    is untested, I just typed it in the newsreader window, but it should
    work :))
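For readers without the library at hand, here is a self-contained sketch of the same two templates. The header breeze/meta/constant.hpp is not shown in the post, so this assumes that constant behaves like std::integral_constant (C++11) and substitutes a local stand-in. That is a guess for illustration, not the library's actual definition.

```cpp
#include <cstddef>
#include <limits>
#include <type_traits>

// Stand-in for breeze::meta::constant.
// ASSUMPTION: it exposes a static member `value`, like
// std::integral_constant; the real header is not shown in the post.
template <typename T, T v>
struct constant : std::integral_constant<T, v> {};

// Number of value bits of an unsigned integer type.
template <typename T>
struct width : constant<T, std::numeric_limits<T>::digits> {};

// Built-in arrays: element count times the width of one element.
template <typename T, std::size_t n>
struct width<T[n]> : constant<std::size_t, n * width<T>::value> {};
```

With this stand-in, width<unsigned char>::value equals the number of bits in a byte, and an array of four unsigned shorts has four times the width of one.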

    --
    Gennaro Prota -- Need C++ expertise? I'm available
    https://sourceforge.net/projects/breeze/
    (replace 'address' with 'name.surname' to mail)
     
    Gennaro Prota, Jun 16, 2007
    #7
