Big-endian, little-endian and sizeof() in different systems

Javier · Jun 15, 2007

Hello people,
I'm recoding a library that I made a few months ago, and now that I'm
reading what I wrote I have some questions.

My program reads black and white images from a bitmap (BMP 24bpp
without compression). The file stores its words and dwords in
little-endian order, so I do a conversion to big-endian when reading
full words or dwords.
I have done this because my system is big-endian.
But now... what if someone compiles the library on a little-endian system?

And... I use char (which I have read is equivalent to unsigned short
int) as 'byte'.
And this is the other question: is sizeof(char) always a 'byte'?
How can I define byte, word and dword (8, 16, 32 bits) without
assuming that sizeof(char) is a byte (8 bits)?

Thanks.

dasjotre · Jun 15, 2007

> My program reads black and white images from a bitmap (BMP 24bpp
> without compression). The file stores its words and dwords in
> little-endian order, so I do a conversion to big-endian when reading
> full words or dwords.

Look at hton* and ntoh* (htonl, ntohl, etc.), mostly used in networking.

> I have done this because my system is big-endian.
> But now... what if someone compiles the library on a little-endian system?

if you use hton and ntoh when reading the files you will have no
problem.
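A minimal sketch of the ntoh* route, assuming a POSIX system where ntohl() comes from <arpa/inet.h> (on Windows it lives in <winsock2.h>). The helper name from_big_endian is hypothetical. Note that ntohl() converts from big-endian (network) order, so it only yields the right value for data stored big-endian; applied to the little-endian fields of a BMP file it would give byte-swapped results on any host.

```cpp
#include <arpa/inet.h>  // POSIX ntohl(); assumption: a POSIX platform
#include <cstdint>
#include <cstring>

// Hypothetical helper: interpret 4 bytes stored most significant byte
// first (big-endian, "network order") as a host integer.
std::uint32_t from_big_endian(const unsigned char bytes[4])
{
    std::uint32_t raw;
    std::memcpy(&raw, bytes, sizeof raw);  // raw bytes in host layout
    return ntohl(raw);                     // no-op on big-endian hosts
}
```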

> And... I use char (which I have read is equivalent to unsigned short
> int) as 'byte'.
> And this is the other question: is sizeof(char) always a 'byte'?
> How can I define byte, word and dword (8, 16, 32 bits) without
> assuming that sizeof(char) is a byte (8 bits)?

sizeof(char) is always 1.
you could use stdint.h;
it's a C header (boost has cstdint.hpp too).
it defines fixed-width types
like intXX_t, where XX is the number of bits.
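A short sketch of this suggestion in current C++, where <cstdint> is part of the standard library (since C++11). The names byte_t, word_t and dword_t are hypothetical aliases for the original poster's byte, word and dword. The exact-width types are optional in the standard: they exist only if the platform has an unsigned type of exactly that width, so on an exotic machine the code fails to compile instead of silently using a wider type.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical aliases for the poster's "byte", "word" and "dword".
typedef std::uint8_t  byte_t;   // exactly 8 bits
typedef std::uint16_t word_t;   // exactly 16 bits
typedef std::uint32_t dword_t;  // exactly 32 bits

// Documents the intent: numeric_limits<T>::digits is the number of
// value bits of an unsigned type.
static_assert(std::numeric_limits<byte_t>::digits == 8, "byte_t is 8 bits");
static_assert(std::numeric_limits<word_t>::digits == 16, "word_t is 16 bits");
static_assert(std::numeric_limits<dword_t>::digits == 32, "dword_t is 32 bits");
```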

regards

DS

Andre Kostur · Jun 15, 2007

> My program reads black and white images from a bitmap (BMP 24bpp
> without compression). The file stores its words and dwords in
> little-endian order, so I do a conversion to big-endian when reading
> full words or dwords.
> I have done this because my system is big-endian.
> But now... what if someone compiles the library on a little-endian system?

Drifting somewhat off-topic (endianness is a platform-specific issue):
many systems have some sort of include file that defines a macro
telling you the endianness of the platform you're compiling for.
Using that knowledge you can write a function that converts from
little-endian to the endianness of the platform you're on. (I'd
suggest using the hton* and ntoh* family of functions, but those
convert between host order and big-endian.) So any time you need to
be concerned about endianness, you can pass the value to your function
to convert it from little-endian to host-endian (which means that on
some platforms your function does nothing, and on others it flips the
bytes).
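A minimal sketch of this approach. It assumes GCC or Clang, which predefine __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__; other compilers need their own platform macro (C++20 later standardized std::endian in <bit> for the same purpose). The function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Assumption: GCC/Clang-style predefined byte-order macros.
#if !defined(__BYTE_ORDER__)
#error "No byte-order macro: adapt this check for your compiler"
#endif

// Reverse the order of the four bytes of a 32-bit value.
inline std::uint32_t byte_swap32(std::uint32_t v)
{
    return  (v >> 24)
          | ((v >> 8) & 0x0000FF00u)
          | ((v << 8) & 0x00FF0000u)
          |  (v << 24);
}

// Convert a value whose bytes were copied verbatim from little-endian
// data into host order: a no-op on little-endian hosts, a flip otherwise.
inline std::uint32_t le_to_host32(std::uint32_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return v;
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return byte_swap32(v);
#else
#error "Mixed-endian host: not handled by this sketch"
#endif
}

// Copy four little-endian bytes from a buffer and fix up their order.
inline std::uint32_t load_le32(const unsigned char* p)
{
    std::uint32_t v;
    std::memcpy(&v, p, sizeof v);  // raw bytes, host interpretation
    return le_to_host32(v);
}
```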

> And... I use char (which I have read is equivalent to unsigned short
> int) as 'byte'.

Where did you read that? All the standard says is:

sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)

(plus certain minimum range constraints). On most platforms that I've
dealt with, sizeof(char) is 1, and sizeof(short) is 2.

> And this is the other question: is sizeof(char) always a 'byte'?
> How can I define byte, word and dword (8, 16, 32 bits) without
> assuming that sizeof(char) is a byte (8 bits)?

Use platform-specific includes (or more recent C headers, IIRC), or
find some sort of portability layer library. Some compilers define
things like uint8_t, uint16_t and the like, and libraries such as ACE
define ACE_UINT32, ACE_UINT64, and that sort of thing.

dasjotre · Jun 15, 2007

> Where did you read that? All the standard says is:
>
> sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long)
>
> (plus certain minimum range constraints). On most platforms that I've
> dealt with, sizeof(char) is 1, and sizeof(short) is 2.

sizeof(char) must be 1 regardless of actual implementation.

regards

DS

James Kanze · Jun 16, 2007

> My program reads black and white images from a bitmap (BMP 24bpp
> without compression). The file stores its words and dwords in
> little-endian order, so I do a conversion to big-endian when reading
> full words or dwords.
> I have done this because my system is big-endian.
> But now... what if someone compiles the library on a little-endian system?

The endian-ness of the internal representation shouldn't make a
difference. You deal with values, not with physical
representation. Basically, to read little endian, you use
something like:

uint32_t
read( uint8_t const* buffer )
{
    return (uint32_t( buffer[ 0 ] ) )
        || (uint32_t( buffer[ 1 ] ) << 8)
        || (uint32_t( buffer[ 2 ] ) << 16)
        || (uint32_t( buffer[ 3 ] ) << 24) ;
}

Works regardless of the byte order. (I've seen at least 3
different byte orders for 32 bit integers.)
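The same value-based technique applied to a stream, which is closer to how a BMP loader reads its headers. This is a sketch: read_le is a hypothetical helper, it assumes nbytes is between 1 and 4, and the file should be opened with std::ios::binary so no newline translation touches the bytes.

```cpp
#include <cstdint>
#include <istream>
#include <sstream>    // std::istringstream, handy for testing with in-memory data
#include <stdexcept>

// Hypothetical helper: read an unsigned little-endian integer of nbytes
// bytes (1 to 4) from a binary stream. The value is assembled with shifts,
// so the result does not depend on the host's byte order.
std::uint32_t read_le(std::istream& in, int nbytes)
{
    if (nbytes < 1 || nbytes > 4)
        throw std::invalid_argument("read_le: nbytes must be 1..4");
    std::uint32_t value = 0;
    for (int i = 0; i < nbytes; ++i) {
        const std::istream::int_type c = in.get();  // 0..255, or eof()
        if (c == std::istream::traits_type::eof())
            throw std::runtime_error("read_le: unexpected end of input");
        value |= std::uint32_t(c) << (8 * i);       // byte i has weight 256^i
    }
    return value;
}
```

For example, a BMP file header starts with the two bytes 'B' and 'M', so read_le(in, 2) on a valid file returns 0x4D42, and the next read_le(in, 4) gives the file size as a dword.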

> And... I use char (which I have read is equivalent to unsigned short
> int) as 'byte'.

That's generally not true. On most machines today (there are a
few exceptions), char is 8 bits; short must be at least 16.
Also, very often, char is signed. I tend to avoid it for that
reason as well; shifting signed values doesn't always work as
expected.
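To illustrate the point about signed char: a hypothetical helper that combines two raw chars into a 16-bit little-endian value. Converting each char to unsigned char first keeps every byte in the range 0..255 whether plain char is signed or unsigned on the platform.

```cpp
#include <cstdint>

// Hypothetical helper: combine two raw chars (low byte first) into a
// 16-bit value. Each char goes through unsigned char, so its value is
// 0..255 whatever the signedness of plain char.
// The naive std::uint16_t(p[0] | (p[1] << 8)) would typically give 0xFFF0
// for the bytes {0xF0, 0x01} where char is signed: 0xF0 becomes -16, and
// the sign bits of -16 cover the high byte.
std::uint16_t le16_from_chars(const char* p)
{
    const unsigned char lo = static_cast<unsigned char>(p[0]);
    const unsigned char hi = static_cast<unsigned char>(p[1]);
    return static_cast<std::uint16_t>(lo | (hi << 8));
}
```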

> And this is the other question: is sizeof(char) always a 'byte'?

That's the definition in the standard: char is a byte, and
sizeof(char) is guaranteed to be 1. As I said above, on most
machines today it is 8 bits. The standard requires at least 8
bits, although in the past 6- and 7-bit bytes were common (as
were 9- and 10-bit bytes). From what I have heard, some DSPs define
char to have 32 bits, with all of the integral types having a
sizeof of 1. Also legal.
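The relationship between sizeof and bits can be checked at compile time with CHAR_BIT from <climits>. A small sketch; storage_bits is a hypothetical name. On a DSP where char is 32 bits, storage_bits<char>() would be 32 while sizeof(char) is still 1.

```cpp
#include <climits>  // CHAR_BIT: the number of bits in a byte, i.e. in a char
#include <cstddef>
#include <limits>

// sizeof measures in bytes and sizeof(char) is 1 by definition, so the
// number of bits an object of type T occupies is sizeof(T) * CHAR_BIT.
template <typename T>
constexpr std::size_t storage_bits()
{
    return sizeof(T) * CHAR_BIT;
}

// Guaranteed on every conforming implementation.
static_assert(sizeof(char) == 1, "sizeof(char) is 1 by definition");
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

// numeric_limits<T>::digits counts only value bits; for unsigned char,
// which has no padding bits, it equals CHAR_BIT.
static_assert(std::numeric_limits<unsigned char>::digits == CHAR_BIT,
              "unsigned char uses every bit of its byte");
```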

> How can I define byte, word and dword (8, 16, 32 bits) without
> assuming that sizeof(char) is a byte (8 bits)?

How portable do you want to be? C has a header, <stdint.h>,
which conditionally defines a certain number of integral types
with fixed, exact width, i.e. uint8_t is an unsigned integral
type with exactly 8 bits, int32_t is a signed, 2's complement
integral type with exactly 32 bits, etc. If the underlying
hardware doesn't support the type, it is not defined.
Regrettably, support for this header seems to be rather spotty.
But it's not too difficult to knock up your own version; put it
in an isolated, system-dependent directory, where you know that
you have to adapt it each time you port to a new machine.
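A sketch of such a home-made header, relying only on <climits>, which every C and C++ implementation provides. The my_uint* names are hypothetical. Each type is selected by its range, and the build stops with an error when no match exists, which is exactly the point where the header needs adapting for a new machine.

```cpp
// my_stdint.hpp: a hypothetical hand-rolled substitute for <stdint.h>,
// meant to live in a system-dependent directory.
#include <climits>

#if UCHAR_MAX == 0xFF
typedef unsigned char  my_uint8;   // 8-bit bytes
#else
#error "no 8-bit unsigned type: adapt this header"
#endif

#if USHRT_MAX == 0xFFFF
typedef unsigned short my_uint16;
#else
#error "no 16-bit unsigned type: adapt this header"
#endif

#if UINT_MAX == 0xFFFFFFFF
typedef unsigned int   my_uint32;
#elif ULONG_MAX == 0xFFFFFFFF
typedef unsigned long  my_uint32;
#else
#error "no 32-bit unsigned type: adapt this header"
#endif
```

On a platform that does ship <stdint.h>, the same header can simply typedef these names to uint8_t, uint16_t and uint32_t.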

As I said, however, the presence of the definitions is
conditioned on the existence of the actual types. Not every
machine around today uses 8-bit bytes, and not every machine
uses 2's complement. Still, for many applications, portability
to only those machines that do is a quite acceptable
restriction.

(And BTW: a word is normally 32 bits, and a dword 64. 16 bits
is a hword, at least in IBM-speak.)

Gavin Deane · Jun 16, 2007

> The endian-ness of the internal representation shouldn't make a
> difference. You deal with values, not with physical
> representation. Basically, to read little endian, you use
> something like:
>
> uint32_t
> read( uint8_t const* buffer )
> {
>     return (uint32_t( buffer[ 0 ] ) )
>         || (uint32_t( buffer[ 1 ] ) << 8)
>         || (uint32_t( buffer[ 2 ] ) << 16)
>         || (uint32_t( buffer[ 3 ] ) << 24) ;
> }

Did you mean | instead of || there?
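For reference, the reader with bitwise OR, which is what the question suggests was intended. With the logical ||, every operand is converted to bool, so the function could only ever return 0 or 1. The name read_le32 is a hypothetical rename for clarity.

```cpp
#include <cstdint>

// Assemble a 32-bit value from four bytes stored least significant first.
// Bitwise | combines the shifted bytes; each byte occupies its own 8 bits,
// so no bits overlap and the result is the same on any host byte order.
std::uint32_t read_le32(std::uint8_t const* buffer)
{
    return  std::uint32_t(buffer[0])
         | (std::uint32_t(buffer[1]) << 8)
         | (std::uint32_t(buffer[2]) << 16)
         | (std::uint32_t(buffer[3]) << 24);
}
```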

Gavin Deane

Gennaro Prota · Jun 16, 2007

> I have done this because my system is big-endian.
> But now... what if someone compiles the library on a little-endian system?

As James Kanze pointed out, you don't have to worry about the internal
representation used by your C++ implementation, only about the external
representation of the values. Unfortunately, that's a point that few
people seem to understand (even after I explained it on the corresponding
talk page, for instance, someone still added a totally bogus
"determining the byte order" example to the Endianness entry of the
English Wikipedia).

If the GNU GPL version 2 isn't a problem for you, then you may find
this useful:

<http://breeze.svn.sourceforge.net/viewvc/breeze/trunk/breeze/endianness/endian_codec.hpp>

(Since my library aims to be generally useful, any feedback is much
appreciated. NOTE: I haven't committed the file width.hpp yet; if you
are dealing with unsigned types only, you can implement it as
#include "breeze/meta/constant.hpp"
#include <limits>

namespace breeze {
namespace meta {

template< typename T >
class width
    : public constant< T, std::numeric_limits< T >::digits >
{
};

}
}

Optionally, you can also add a #include <cstddef> and this
specialization

template< typename T, std::size_t n >
class width< T[ n ] >
    : public constant< std::size_t, n * width< T >::value >
{
};

which will allow you to work with built-in arrays as well. Well, this
is untested; I just typed it into the newsreader window, but it should
work.

)
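A sketch of how a bit width derived from std::numeric_limits<T>::digits (the same quantity the width helper above exposes) can drive a generic little-endian codec. This is a hypothetical illustration, not the interface of the breeze endian_codec.hpp file; it assumes T is an unsigned integer type whose width is a multiple of 8 and that the external format is a sequence of 8-bit octets.

```cpp
#include <limits>

// Hypothetical generic codec: write value as octets, least significant first.
template <typename T>
void encode_le(T value, unsigned char* out)
{
    static_assert(!std::numeric_limits<T>::is_signed, "unsigned types only");
    static_assert(std::numeric_limits<T>::digits % 8 == 0, "whole octets only");
    const int octets = std::numeric_limits<T>::digits / 8;
    for (int i = 0; i < octets; ++i) {
        out[i] = static_cast<unsigned char>(value & 0xFFu);  // lowest octet
        value = static_cast<T>(value >> 8);                  // move to the next
    }
}

// Inverse: rebuild the value from octets, most significant first.
template <typename T>
T decode_le(const unsigned char* in)
{
    static_assert(!std::numeric_limits<T>::is_signed, "unsigned types only");
    static_assert(std::numeric_limits<T>::digits % 8 == 0, "whole octets only");
    const int octets = std::numeric_limits<T>::digits / 8;
    T value = 0;
    for (int i = octets - 1; i >= 0; --i)
        value = static_cast<T>((value << 8) | (in[i] & 0xFFu));
    return value;
}
```

Because only shifts and masks on values are used, neither function needs to know the host's byte order; the external order is fixed by the loop direction alone.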

Big Endian and Little Endian	2	Jul 7, 2006
Little Endian to Big Endian	9	Jun 14, 2005
How to eliminate the bitmap difference in little endian and big endian?	7	Aug 29, 2005
stack increase direction and big-endian or little-endia	3	Oct 23, 2005
Setting Bitfields on big endian and little endian	1	Aug 9, 2008
About little big endian in C	33	Oct 8, 2007
Reading little-endian data from a file in a portable manner	46	Jul 16, 2010
BIG or little endian	26	Oct 16, 2007
