Please someone test this on a Big-Endian System

ThazKool

I want to see if this code works the way it should on a Big-Endian
system. Also, if anyone has any ideas on how to determine this at
compile-time so that I use the right decoding or encoding functions, I
would greatly appreciate the help.

Thanks,
Ché


#include <iostream>

int main( int argc, char* argv[] )
{
    // Default system to little endian
    bool isLittleEndian = true;

    // Check whether this platform is big-endian or little endian
    wchar_t a = L'a';
    unsigned char* testChar = reinterpret_cast<unsigned char*>( &a );

    // Big Endian should display nothing on output here
    std::cout << (unsigned char*) testChar << std::endl;

    if( testChar == 0 )
    {
        isLittleEndian = false;

        // Big Endian should display "Big Endian Success" here
        std::cout << "Big Endian Success" << std::endl;
    }

    return 0;
}
 
Heinz Ozwirk

It might work, but it also might not. You are assuming that char and
wchar_t have different sizes. This may not always be the case. You also
assume that enough high bits of L'a' are zero to make a big-endian system
think a char* pointing to a wchar_t actually points to an empty string. Then
you are using reinterpret_cast in a way that is undefined (or unspecified?)
behaviour (casting between pointers to unrelated types always is). And
finally, a pointer to a local variable will never be 0, so "testChar == 0"
will never be true, no matter which byte order the system is using (if any).

To test for endianness you should

1) Test if CHAR_BIT (from <climits>) is equal to 8. Endianness is only
defined for systems internally using octets. If CHAR_BIT is not equal to 8
you cannot access octets on that system, at least not in an easy way.

2) Test if sizeof(wchar_t) == 2. The test below only makes sense for pairs
of octets. So, if wchar_t is not a pair of octets, you have to think about
something else.

3) Assign a well-known value to a wchar_t variable. (L'a' is not a
well-known value. There is a good chance that it will be 0x0061, but it
might be something completely different.) Use something like 0xFEFF instead.
(0xFEFF is the Unicode byte-order mark, but other values will do, too.) Then
get the values of the two chars (octets) occupying the same space as the
variable and compare them with 0xFE and 0xFF:

wchar_t wc = 0xFEFF;
unsigned char const* cp = reinterpret_cast<unsigned char*>(&wc);
if (cp[0] == 0xFE && cp[1] == 0xFF)
{
    // Big-Endian
}
else if (cp[0] == 0xFF && cp[1] == 0xFE)
{
    // Little-Endian
}
else
{
    // Something completely different
}

Alas, that code also depends on a cast between pointers to unrelated types.
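
One way to sidestep the cast altogether is to copy the bytes out with std::memcpy. The following is only a sketch under assumptions that are not from the post above: it swaps wchar_t for std::uint16_t (so the two-octet requirement is met wherever that type exists), and the names ByteOrder16 and DetectByteOrder16 are made up for illustration. As far as I know, reading an object's bytes through an unsigned char pointer is permitted by the aliasing rules anyway, but copying makes the intent hard to misread.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical names, not from the thread.
enum class ByteOrder16 { Big, Little, Unknown };

// Reports how this machine lays out a 16-bit unsigned integer in memory.
ByteOrder16 DetectByteOrder16()
{
    const std::uint16_t probe = 0xFEFF;        // byte-order mark value
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);  // copy the bytes, no pointer cast

    if (bytes[0] == 0xFE && bytes[1] == 0xFF)
        return ByteOrder16::Big;
    if (bytes[0] == 0xFF && bytes[1] == 0xFE)
        return ByteOrder16::Little;
    return ByteOrder16::Unknown;               // something completely different
}
```

A caller would compare the result against ByteOrder16::Little or ByteOrder16::Big and treat ByteOrder16::Unknown as "do not guess".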

But why do you need to know the endianness of the system your program runs
on? Usually you only have to convert from one kind of byte order to another
when you are reading from an external source (file, network connection) or
writing to such a destination. And in those situations you can easily
convert between the external format and the format used in a program without
knowing the byte order of the system itself. You only have to know the
external byte order. Then you can convert in a portable way.

To read a Unicode (UTF-16) string, read the string into an array of bytes
(unsigned char will probably be a good choice on most systems, but add some
test that CHAR_BIT is really equal to 8). Then convert pairs of those octets
into values of a type large enough to hold a UTF-16 code unit:

if (ExternalFormatIsLittleEndian)
{
    for (int i = 0; i < BytesRead; i += 2)
        internalString[i / 2] = externalString[i] + 256 *
                                externalString[i + 1];
}
else
{
    for (int i = 0; i < BytesRead; i += 2)
        internalString[i / 2] = externalString[i] * 256 +
                                externalString[i + 1];
}
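
The loop above can be wrapped into a small self-contained function. This is a sketch only: the names ExternalOrder and DecodeUnits are hypothetical, std::vector stands in for whatever buffers the real program uses, and a trailing odd byte is simply ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names, not from the thread.
enum class ExternalOrder { LittleEndian, BigEndian };

// Combines pairs of octets into 16-bit code units. Only the external byte
// order is consulted; the host's own byte order never enters the arithmetic.
std::vector<std::uint16_t> DecodeUnits(const std::vector<unsigned char>& bytes,
                                       ExternalOrder order)
{
    std::vector<std::uint16_t> units;
    units.reserve(bytes.size() / 2);

    // A trailing odd byte, if any, is ignored in this sketch.
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        const unsigned first  = bytes[i];
        const unsigned second = bytes[i + 1];
        const unsigned value  = (order == ExternalOrder::LittleEndian)
                                    ? first + 256u * second
                                    : first * 256u + second;
        units.push_back(static_cast<std::uint16_t>(value));
    }
    return units;
}
```

Because the result is built with multiplication and addition rather than by reinterpreting memory, the same source should behave identically on little-endian and big-endian hosts.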

Before you write internal data to an external destination, you must of
course convert your internal representation to the external one, but again
you can do so without knowing the internal byte order. You only have to know
how bytes should be arranged outside your program.
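
The writing direction can be sketched the same way. Again the names (ExternalOrder, EncodeUnits) are hypothetical and std::vector is just a convenient stand-in for the real output buffer.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical names, not from the thread.
enum class ExternalOrder { LittleEndian, BigEndian };

// Splits each 16-bit code unit into two octets arranged in the requested
// external order, using arithmetic only.
std::vector<unsigned char> EncodeUnits(const std::vector<std::uint16_t>& units,
                                       ExternalOrder order)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(units.size() * 2);

    for (std::uint16_t unit : units)
    {
        const unsigned char low  = static_cast<unsigned char>(unit % 256u);
        const unsigned char high = static_cast<unsigned char>(unit / 256u);

        if (order == ExternalOrder::LittleEndian)
        {
            bytes.push_back(low);   // least significant octet first
            bytes.push_back(high);
        }
        else
        {
            bytes.push_back(high);  // most significant octet first
            bytes.push_back(low);
        }
    }
    return bytes;
}
```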

HTH
Heinz
 
ThazKool

I really appreciate your help. There was at least one silly mistake, as
I copied and added the code to main without testing. You are
completely correct on some of the issues that I was unaware of. My
desire to do this was formed out of uncertainty. I want to make
portable Unicode handling functions that can interface directly with,
say, a person casually typing a const wchar_t* literal such as
L"Hello World" into C++ without worry.

Thank you for your help.
 
Frederick Gotham

ThazKool posted:

#include <iostream>

int main( int argc, char* argv[] )
{
    // Default system to little endian
    bool isLittleEndian = true;

    // Check whether this platform is big-endian or little endian
    wchar_t a = L'a';
    unsigned char* testChar = reinterpret_cast<unsigned char*>( &a );

    // Big Endian should display nothing on output here
    std::cout << (unsigned char*) testChar << std::endl;


Oh good lord Jesus no!

There are perfectly portable ways of doing this, and this is not one of
them!

Check out some code I posted recently on comp.std.c++

http://groups.google.ie/group/comp.std.c++/msg/320642c7b4a21366?hl=en&
 
Howard

Gernot Frisch said:
Please do not use the name of the Lord in vein.

In "vein"? As in intravenously? I think you meant "in vain", as in "unless
you REALLY mean it!" :)

-Howard
 
Frederick Gotham

Gernot Frisch posted:

Please do not use the name of the Lord in vein.


Sorry.

I myself am not religious, and so have no quibble with such exclamations.

However, I realise that contributors to the group may be religious, and as
I have no desire to offend any of you, I will refrain from any such future
religious references, be they positive or negative.
 
ThazKool

For me, I am spiritual and not blinded by religion. Your exclamation,
from what I gather, was said with love and tolerance for the group and
the curious. It was not associated with any resentment or other ill
spirits. If anything, I am blessed by your contribution. Being
spiritual, I will not harbor any resentment for your actions. Even
though I am Catholic, I know that many of my brethren harbor resentment
over what they judge as unholy. The only thing I can say is do as Jesus
does. If Jesus were here, he would probably only say "Why not?", and
after seeing your contribution he would be pleased. There has been no
harm done here. I appreciate your code, Frederick, and I have a
question for you.

I shortened some of it to test for endianness. Your code is more robust
and gives you the actual byte order. Is the byte order something that I
should be concerned about, or can I get away with my shortened code? I
guess since I am only dealing with wchar_t in my code I will be OK.
Also, I only want this code to work on compilers where
sizeof( wchar_t ) == 2. I am not concerned about the rest. Are there
any CPUs that have non-sequential byte orders out there? I am curious,
and thanks once again.

template<typename T>
inline bool IsLittleEndian()
{
    // Set only the most significant byte (assumes 8-bit bytes): a 32-bit
    // type shifts left by 24. Shifting a T rather than an int keeps the
    // shift in range for types wider than int.
    T testType = static_cast<T>( static_cast<T>( 1 ) << ( sizeof( T ) * 8 - 8 ) );

    // Get the first byte of the type
    const unsigned char *firstByte =
        reinterpret_cast<const unsigned char*>( &testType );

    // Little endian stores the most significant byte last,
    // so the first byte is zero
    return *firstByte == 0;
}

template<typename T>
inline bool IsBigEndian()
{
    // Set only the most significant byte (assumes 8-bit bytes): a 32-bit
    // type shifts left by 24. Shifting a T rather than an int keeps the
    // shift in range for types wider than int.
    T testType = static_cast<T>( static_cast<T>( 1 ) << ( sizeof( T ) * 8 - 8 ) );

    // Get the first byte of the type
    const unsigned char *firstByte =
        reinterpret_cast<const unsigned char*>( &testType );

    // Big endian stores the most significant byte first,
    // so the first byte is non-zero
    return *firstByte != 0;
}
 
Frederick Gotham

ThazKool posted:
The only thing I can say is do as Jesus does. If Jesus were here, he
would probably only say "Why not?", and after seeing your contribution
he would be pleased.

I was with you up until the point where you made things subjective by
injecting your own religious beliefs.

People from all over the world view this newsgroup; people of different
nationalities, different religions, different cultures, different socio-
economic standings -- and they shouldn't have to read Christianity-specific
praise akin to your praising of Jesus above, nor should they have had to
read my original exclamation.

My own view is that the newsgroup should be kept free of religion PERIOD --
that means no religion-related exclamations (e.g. Jesus No!), no religion-
specific greetings (e.g. As-Salamu Alaykum).

If you're religious / spiritual / philosophical, then that's great -- but
please keep it to yourself on this newsgroup. In communication channels
like this one, such things divide more people than they unite.

I shortened some of it to test for endianness. Your code is more robust
and gives you the actual byte order. Is the byte order something that I
should be concerned about, or can I get away with my shortened code?


If an unsigned integer consists of 4 bytes, then the number of possible
arrangements is the factorial of 4, i.e. 4!, which is 24.

My own code allows for all 24 arrangements. (Actually, it allows for any
number of bytes too, and thus any number of arrangements.)
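
To make the idea of byte arrangements concrete, here is a small sketch that reports which byte of a 4-byte unsigned value sits at which address offset. It is not the code referred to above; the name ByteArrangement32 is hypothetical, and it assumes std::uint32_t exists (which implies 8-bit bytes and no padding).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper, not from the thread. Element k of the result is the
// significance rank (1 = least significant, 4 = most significant) of the
// byte stored at address offset k.
std::array<unsigned, 4> ByteArrangement32()
{
    const std::uint32_t probe = 0x04030201u;   // each byte holds its own rank
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);

    std::array<unsigned, 4> ranks{};
    for (std::size_t k = 0; k < ranks.size(); ++k)
        ranks[k] = bytes[k];
    return ranks;
}
```

A little-endian machine yields {1, 2, 3, 4} and a big-endian machine {4, 3, 2, 1}; any of the other 22 permutations would indicate a mixed byte order.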

The code could be simplified if it only had to distinguish between Big-
endian and Little-endian.

First though, two things must be assured:

(1) The integer type contains no padding.
(2) The integer type consists of at least 2 bytes.

I shall use boost's static assert to make sure of these things (
http://www.boost.org/doc/html/boost_staticassert.html ).

#include <climits>   // CHAR_BIT
#include <iostream>
#include <limits>
#include <boost/static_assert.hpp>

enum Endianness { BigEndian, LittleEndian };

Endianness DetermineEndianness()
{
    /* First, ensure that there's no padding: */

    BOOST_STATIC_ASSERT( sizeof(unsigned) * CHAR_BIT
                         == std::numeric_limits<unsigned>::digits );


    /* Now ensure that there's at least 2 bytes: */

    BOOST_STATIC_ASSERT( sizeof(unsigned) >= 2 );


    /* Now it's safe to play! */

    unsigned i = 1;

    /* The first byte is 1 on little-endian (LittleEndian == 1)
       and 0 on big-endian (BigEndian == 0). */

    return static_cast<Endianness>( reinterpret_cast<char&>(i) );
}

int main()
{
    std::cout << "This machine is: ";

    switch( DetermineEndianness() )
    {
    case LittleEndian:

        std::cout << "Little-endian.\n";
        break;

    case BigEndian:

        std::cout << "Big-endian.\n";
    }
}

I guess since I am only dealing with wchar_t in my
code I will be ok.


No, no, no.

Also, I only want this code to work on compilers
where sizeof( wchar_t ) == 2.


Fair enough, but as I have demonstrated, there's no need to go the non-
portable route.

Are there any CPUs that have non-sequential byte orders out there?


Yes, and here's info on them:

http://en.wikipedia.org/wiki/Endianness
 
ThazKool

I agree with you wholeheartedly. Separation of church from state;
separation of church from code. Any suggestion can be taken or left
alone. Spirituality is an attribute an atheist, an agnostic, or a
religious fanatic can have. It is not exclusive to the religious like
most people think. I have researched the real meanings of these words
where most have not. I apologize for not properly directing my
Catholic, Jesus comment. It was directed to the one that blasted you.
That one did not realize that he was not following his own doctrine. I
disagree though with keeping spiritual principles out of the newsgroup.
I do agree with keeping religion out, and you are correct about the
fact that it divides more than it unifies. Spiritual principles with
love and tolerance unite people, such as "Help your brother coders and
show your appreciation when they help you". This benefits the group, but
most people don't understand the difference between spirituality and
religion. They think that they are one and the same, and they are not.
Well, I do have to really and truly thank you. You have given me great
wisdom that I have had difficulty ascertaining from web browsing. I am
indebted to you.

Much thanks and much appreciation,
Ché
 
Frederick Gotham

ThazKool posted:
I agree with you wholeheartedly. Separation of church from state;
separation of church from code. Any suggestion can be taken or left
alone. Spirituality is an attribute an atheist, an agnostic, or a
religious fanatic can have. It is not exclusive to the religious like
most people think. I have researched the real meanings of these words
where most have not.


Indeed, you come across as the kind of person who has put great thought
into these concepts. For the majority of people however, I'd say that
they have a very blurry distinction between religion, philosophy and
spirituality, and so they feel like they're on thin ice when trying to
discuss one without mixing it with the other.

I apologize for not properly directing my
Catholic, Jesus comment. It was directed to the one that blasted you.
That one did not realize that he was not following his own doctrine.


If you belong to Religion A, then it's very easy to notice when someone
inappropriately makes reference to Religion B (perhaps by greeting you in
a very Religion-specific way, e.g. "God be with you", or "As-Salamu
Alaykum"). However, when you do this yourself, it's harder to notice.

Sometimes though, phrases which have a religious foundation become worn
and are no longer particularly religious. I am not a Christian, nor do I
currently consider myself a practitioner of any religion, but I still
exclaim the likes of "Oh my God!", or "Jesus don't do that!". In my own
head, these are mere exclamations, and have no religious connotations.
My original exclamation in this thread was not intended as bearing any
religious significance.

However, this is a newsgroup, with people from all over the globe, and so
I can't presume to be interpreted by others in the way I wish to be
interpreted. Solution... ? Keep everything religion-neutral, nationality-
neutral, culture-neutral.

I disagree though with keeping spiritual principles out of the
newsgroup.


Yes, but as I mentioned above, some people may feel on thin ice trying
to discuss spiritual principles.

Spiritual principles with love and tolerance unite people such as
"Help your brother coders and show your appreciation when they help
you".


(I think I call philosophy what you call spirituality.)

That's a fine example of conveying a spiritual or philosophical
principle without getting religion-specific. I myself would render it in
a more colloquial and gender-non-specific kind of way though:

Let's all help each other out, be courteous and appreciative, and we'll
all have a more enjoyable experience.

This benefits the group, but most people don't understand the
difference between spirituality and religion. They think that they are
one and the same, and they are not.


Yes, but that is purely because they haven't taken any considerable
amount of time to ponder over these concepts.

Well, I do have to really and truly thank you. You
have given me great wisdom that I have had difficulty ascertaining from
web browsing.


You're welcome, I'm glad to help.

I am indebted to you.


Of course not! I help here because I enjoy it.
 
Diego Martins

Gernot said:
Please do not use the name of the Lord in vein.

What is wrong with the phrase "Oh good lord Jesus no!"?
It was funny and expressed the writer's feelings very well.
I am sure Jesus, wherever He is, laughed together with me :)

I feel sick watching fanatics like Gernot Frisch.
These fanatic people are a bunch of bum onanists.

Fanatics do not code well, too

And have short dicks.
 
red floyd

Diego said:
what is wrong with the phrase "Oh good lord Jesus no!" ?
It was funny and expressed very well the writer feelings
I am sure Jesus, whenever He is, laughed together with me :)

To Gernot: You know, there *are* people who don't believe that Jesus
was divine. Besides, maybe Diego was talking to his buddy Jesus? Jesus
is a fairly common name among Latinos in the US.
 
Frederick Gotham

Diego Martins posted:

<snip>


And to think I was preparing a response up until I read the last line.
 
