Macro for setting MSB - Intended to work on both Little and Big-endian machines

Myth__Buster

Hi All,

Here is my attempt at setting the MSB of an integer depending on whether the underlying machine is little- or big-endian. Any comments, suggestions, or views are appreciated.

Here I have assumed the following. Although I don't store 1ULL in a variable in my program (the LL, for long long, is there to force 1 to be held in a multi-byte memory resource, say a register), it will be accessed as a multi-byte value. Hence 1 will be stored in the LSB of the most-significant byte of the value held in that multi-byte resource, and not in the LSB of its least-significant byte. Please let me know if this is correct.

<Code>

#include <stdio.h>
#include <limits.h>

#define LSET_MSB(x) ((x) = (x) | 1ULL << (sizeof(x) * CHAR_BIT - 1))

#define BSET_MSB(x) ((x) = (x) | 1ULL << (CHAR_BIT - 1))

#define LIITE_ENDIAN (1ULL & 1)

int main(void)
{
    unsigned long long int x = 1;

    printf("x : %llu\n", x);
    printf("x : %#llx\n", x);

    if ( LIITE_ENDIAN )
    {
        printf("Little\n");
        LSET_MSB(x);
    }
    else
    {
        printf("Big\n");
        BSET_MSB(x);
    }

    printf("x : %llu\n", x);
    printf("x : %#llx\n", x);

    return 0;
}

</Code>

<OutputOnMyMachine>
x : 1
x : 0x1
Little
x : 9223372036854775809
x : 0x8000000000000001
</OutputOnMyMachine>

Cheers,
Raghavan
 
Myth__Buster

By the way, I know the bitwise operations in C are independent of underlying machine's endian-ness and that's why I have attempted to figure out what's that
endian-ness.

Now, please note that the paragraph dealing with 1LL rather 1ULL in the corrected post is written with big-endian in mind but I forgot to mention it.

Okay, here is why I thought (1ULL & 1) in C would mean different from the usual or-ring two 1's: 1ULL, because of its type LL (long long) being guaranteed to be large enough to demand more than a byte to get stored. So, for a given big-endian machine where long long is 8 bytes wide, 1ULL will be stored (in some memory resource, say a register, even for temporary or intermediate usages) as the least-significant-bit (LS-bit) of its most-significant-byte (MS-byte) in memory being set, given that the LS-byte will be stored at a higher address on such a machine, unlike on a little-endian machine.

However, now I realize that C abstracts the way in which 1ULL would be stored in memory and 1ULL will just mean the number 1 regardless of the underlying endian-ness.
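
To make that concrete, here is a minimal sketch of a single MSB-setting macro that works on the value alone. The names SET_MSB, set_msb_ull, set_msb_uint and check_value_level_and are illustrative and not from the thread, and the sketch assumes an unsigned integer lvalue whose type has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative name, not from the thread. The shift is defined on the
 * value of x, so the same macro serves little- and big-endian machines.
 * Assumes x is an unsigned integer lvalue whose type has no padding bits. */
#define SET_MSB(x) ((x) |= 1ULL << (sizeof(x) * CHAR_BIT - 1))

/* Returns v with its most significant bit set. */
unsigned long long set_msb_ull(unsigned long long v)
{
    SET_MSB(v);
    return v;
}

/* Same operation on a narrower type; the 1ULL mask is converted back to
 * unsigned int by the compound assignment. */
unsigned int set_msb_uint(unsigned int v)
{
    SET_MSB(v);
    return v;
}

/* (1ULL & 1) is a value-level expression: it is 1 on every machine, so it
 * cannot serve as a byte-order test. */
void check_value_level_and(void)
{
    assert((1ULL & 1) == 1);
}
```

This is the same shift as LSET_MSB in the opening post; under the stated assumptions no separate big-endian variant should be needed.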

The above confusion came up in my mind as I was comparing the most obvious way of checking out a machine's endian-ness as under with some other supposed way.

// Except that I have an explicit variable(actually, its memory location) to
// hold and represent the value 1, there is no difference from the numeral-
// literal 1 which would anyway be held in a temporary register for such
// operations.

int x = 1;
if ( *(char *)&x == 1 )
{
    printf("Little-endian\n");
}

// How about this? I think this and above should be same, isn't it?

int x = 1;
if ( x & 1 )
{
    printf("Little-endian\n");
}

---

However, I see that (x & 1) on big-endian clearly abstracts the way 1 is laid out in memory and just takes the LS-bit of MS-byte and and-s with numerical 1 as if 1 was at the LS-bit of MS-byte like on little-endian machine.


In essence, I just realized that when you deal with memory directly through pointers to variables in C, you can get close to the endian-ness of the underlying hardware, but not when you deal with the variables just by their value.

In the end, it's all about direct vs indirect access! Good.
 
Myth__Buster

. . . that might be
((byte*)&x)[0], ((byte*)&x)[3], or ((byte*)&x)[1]

Yeah, agreed and I have just realized what I am doing exactly which I have written in my latest post above.

But, how can it be ((byte*)&x)[1]? This is possible if the machine is neither little nor big-endian but some mixed or different one altogether, right? And why not ((byte*)&x)[2]? Isn't there a machine which would give this in this context?
 
James Kuyper

By the way, I know the bitwise operations in C are independent of
underlying machine's endian-ness and that's why I have attempted to
figure out what's that endian-ness.

Now, please note that the paragraph dealing with 1LL rather 1ULL in
the corrected post is written with big-endian in mind but I forgot to
mention it.

Okay, here is why I thought (1ULL & 1) in C would mean different from
the usual or-ring two 1's: 1ULL, because of its type LL(long long)
being guaranteed to be large enough to demand more than a byte to get
stored. So, for a given big-endian machine where long long is 8 bytes
wide, 1ULL will be stored(in some memory resource say register, even
for temporary or intermediate usages) as the
least-significant-bit(LS-bit) of its most-significant-byte(MS-byte)

Endianness is detectable in portable C code only when a value is stored
in an object. That's because the only way to determine the endianness is
to access the individual bytes of the object, which requires use of a
union or type-punning. There's nothing you can portably do to determine
the endianness of a register, because you can't take the address of a
register, and you can't force a compiler to put a union object in a
register (the 'register' keyword is just a suggestion, which the
compiler is free to ignore).
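
A minimal sketch of that byte-level access, assuming an unsigned int object and using memcpy into an unsigned char array rather than a union (the function name lowest_byte_index is illustrative, not from the thread):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the address offset of the first nonzero byte in an unsigned int
 * object that holds the value 1. Only the object representation is
 * inspected; no value-level operator is involved. */
size_t lowest_byte_index(void)
{
    unsigned int x = 1;
    unsigned char bytes[sizeof x];
    size_t i;

    memcpy(bytes, &x, sizeof x);          /* copy the stored bytes */
    for (i = 0; i < sizeof x && bytes[i] == 0; i++)
        ;
    assert(i < sizeof x);  /* x is nonzero, so some byte must be nonzero */
    return i;
}
```

On a little-endian int this would return 0, and on a big-endian int it would return sizeof(unsigned int) - 1; any other result points to some other ordering.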

I don't understand why you think the least significant bit would be
residing in the most significant byte. Whatever the reason is for that
expectation, it's incorrect.

in memory being set given that LS-byte will be stored at an higher
address on such a machine unlike on a little-endian machine.

However, now I realize that C abstracts the way in which 1ULL would
be stored in memory and 1ULL will just mean number 1 regardless of
the underlying endian-ness.

The above confusion came up in my mind as I was comparing the most
obvious way of checking out a machine's endian-ness as under with
some other supposed way.

// Except that I have an explicit variable(actually, its memory location) to
// hold and represent the value 1, there is no difference from the numeral-
// literal 1 which would anyway be held in a temporary register for such
// operations.

Actually, there's one huge and highly relevant difference. You can use
the expression &x when you have a variable, whereas &1 is a syntax
error. Without an address that can be converted to char*, there's no way
to test endianness.

int x = 1;
if ( *(char *)&x == 1 )
{
    printf("Little-endian\n");
}

<pedantic>
Keep in mind that if sizeof(int)==4, which is quite common nowadays,
there are 4! = 24 different possible byte orders (most of which are
exceedingly uncommon). Only one of those orderings is called
little-endian, and only one is called big-endian - the others are
generically called middle-endian. At least two of those other orders
(2143 and 3412) have actually been used. I once ran into a web page that
identified 11 of those 24 orders as having been used in specific
contexts, which it identified - but I didn't think to bookmark it, and
I've never been able to find it again.

Your test will identify 6 of those possible orders as little-endian,
only one of which actually is. Strictly speaking, you can identify an
ordering as little endian only by checking the first sizeof(int)-1 bytes.
</pedantic>
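
As a rough sketch of checking every byte rather than only the first, the following fills each byte of an unsigned int with its significance rank and then compares the stored layout against the two named orders. The enum and function names are illustrative, and the sketch assumes unsigned int has no padding bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

enum byte_order { ORDER_LITTLE, ORDER_BIG, ORDER_OTHER };

/* bytes[i] is expected to hold the significance rank of the byte at
 * address offset i: 1 for the least significant byte, n for the most. */
enum byte_order classify_bytes(const unsigned char *bytes, size_t n)
{
    int little = 1, big = 1;
    size_t i;

    assert(bytes != NULL && n > 0);
    for (i = 0; i < n; i++) {
        if (bytes[i] != i + 1) little = 0;   /* rank rises with address */
        if (bytes[i] != n - i) big = 0;      /* rank falls with address */
    }
    if (little) return ORDER_LITTLE;         /* also the answer when n == 1 */
    if (big) return ORDER_BIG;
    return ORDER_OTHER;
}

/* Builds a value whose k-th least significant byte holds k + 1, then
 * classifies the way that value is laid out in memory. */
enum byte_order unsigned_int_order(void)
{
    unsigned int x = 0;
    unsigned char bytes[sizeof x];
    size_t k;

    for (k = 0; k < sizeof x; k++)
        x |= (unsigned int)(k + 1) << (k * CHAR_BIT);
    memcpy(bytes, &x, sizeof x);
    return classify_bytes(bytes, sizeof x);
}
```

With 4-byte input, a layout such as 3,4,1,2 is reported as ORDER_OTHER, and so is 1,3,2,4, which a first-byte-only test would have accepted as little-endian.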

<more pedantic>
The standard doesn't require that the least significant bit of an 'int'
be in the same location as the least significant bit when the byte that
contains it is interpreted as unsigned char. It is extremely unlikely
that this issue will ever come up.
</more pedantic>

<even more pedantic>
The standard doesn't even require that the CHAR_BIT least significant
bits are all stored in the same byte. In principle, an 8-bit byte could
contain bits 0,4,8,12,16,20,24, and 28. The next byte could contain bits
1,5,9,13,17,21,25, and 29, etc. There's a total of 64! different
possible bit-orderings. However, I think you can fairly assume that any
system for which such things are true was designed by aliens. :)
</even more pedantic>

// How about this? I think this and above should be same, isn't it?

int x = 1;
if ( x & 1 )
{
    printf("Little-endian\n");
}

Given x==1, the expression x&1 is guaranteed to be true regardless of
endianness (even in the most pedantic case I discussed above), which
makes it a very bad way to test for endianness.

---

However, I see that (x & 1) on big-endian clearly abstracts the way 1
is laid out in memory and just takes the LS-bit of MS-byte and and-s
with numerical 1 as if 1 was at the LS-bit of MS-byte like on
little-endian machine.

If 'int' is big-endian, the least significant bit will be in the last
byte, which is the least significant byte when using a big-endian
representation. If 'int' is little-endian, that bit will be in the first
byte, which is the least significant byte when using little-endian
notation. But either way, it will reside in the least-significant byte,
not the most significant one.
 
guinness.tony

int x = 1;
if ( *(char *)&x == 1 )
{
    printf("Little-endian\n");
}

For portability, you will also need

assert(sizeof x > 1);

There are many architectures out there (some TI DSPs spring to mind) where char and int share the same size. When that is the case, your test will not detect big-endianness.
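
Put together with the quoted test, that might look like the sketch below. The function name first_byte_holds_one is illustrative, and unsigned char is used for the byte access as an assumption to keep the comparison free of sign issues.

```c
#include <assert.h>

/* Returns 1 if the first byte of an int holding 1 is itself 1, else 0.
 * The assertion rejects platforms where int occupies a single byte,
 * because there the result says nothing about byte order. */
int first_byte_holds_one(void)
{
    int x = 1;

    assert(sizeof x > 1);
    return *(unsigned char *)&x == 1;
}
```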
 
Keith Thompson

Myth__Buster said:
*was at the LS-bit of LS-byte like on little-endian machine . . .

Without any quoted context, we can't tell what this refers to. Not all
newsreaders provide an easy way to view the parent article; mine does,
but it doesn't seem to provide an easy way to get back, so I don't use
it much. Please quote enough of the parent article for your followup to
make sense.
 
Keith Thompson

China Blue White said:
. . . that might be

((byte*)&x)[0], ((byte*)&x)[3], or ((byte*)&x)[1]

Yeah, agreed and I have just realized what I am doing exactly which I
have written in my latest post above.

But, how can it be ((byte*)&x)[1]? This is possible if the machine is
neither little nor big-endian but some mixed or different one
altogether, right? And why not ((byte*)&x)[2], isn't there a machine
which would give this in this context.

It's PDP-11/VAX endianness where the byte order is <1><msb><lsb><2>.

The PDP-11 used a middle-endian representation, but I don't believe the
VAX ever did.
 
Myth__Buster

For portability, you will also need

assert(sizeof x > 1);

There are many architectures out there (some TI DSPs spring to mind) where char and int share the same size. When that is the case, your test will not detect big-endianness.

Well, regardless of the size of char in terms of bits, and hence the value of CHAR_BIT, sizeof(char) is guaranteed to be 1. But CHAR_BIT can be big enough to represent the largest integer data type in C, long long; i.e., sizeof(long long) == 1 is possible. So, if that is the case, in C there would be no portable way of checking the endian-ness of such a machine. In fact, with respect to integers, endian-ness has no role on such a machine, as there is no more than one byte even in the largest integer data type, so there is no byte order to vary in that integer's memory layout.
 
Myth__Buster

Endianness is detectable in portable C code only when a value is stored
in an object. That's because the only way to determine the endianness is
to access the individual bytes of the object, which requires use of a
union or type-punning.

Yes, I realized after writing the opening post of this thread that we really need to deal with memory identified by addresses to figure out the byte-ordering. I have noted this in my post above, following up on my earlier post.

There's nothing you can portably do to determine
the endianness of a register, because you can't take the address of a
register, and you can't force a compiler to put a union object in a
register (the 'register' keyword is just a suggestion, which the
compiler is free to ignore).

Yes, I am aware of the fact that registers are not addressable and that the 'register' keyword is not a command but a request. However, I thought the compile-time constants such as 1ULL would also be held in a temporary memory location in big-endian representation before being moved to a register for any operation using that numerical constant.

I don't understand why you think the least significant bit would be
residing in the most significant byte. Whatever the reason is for that
expectation, it's incorrect.

Well, I should have said, 'most-significant-address' instead of most-significant-byte. Sorry for the confusion.

Actually, there's one huge and highly relevant difference. You can use
the expression &x when you have a variable, whereas &1 is a syntax
error.

Yes, I know that &1 doesn't make any sense, as the '&' operator needs an lvalue (location value) to operate on, which the compile-time numerical constant 1 is not associated with. But I didn't mention it here since I didn't think it was really necessary. In fact, I can go on mentioning such differences: 1++, 1--, --1, ++1, &(1), 1 = 2, 1 += 2, and so on! :)

<pedantic>
Keep in mind that if sizeof(int)==4, which is quite common nowadays,
there are 4! = 24 different possible byte orders (most of which are
exceedingly uncommon). Only one of those orderings is called
little-endian, and only one is called big-endian - the others are
generically called middle-endian. At least two of those other orders
(2143 and 3412) have actually been used. I once ran into a web page that
identified 11 of those 24 orders as having been used in specific
contexts, which it identified - but I didn't think to bookmark it, and
I've never been able to find it again.

So, you mean we have to iterate over bytes and figure out from one byte at a time. Right?

Your test will identify 6 of those possible orders as little-endian,
only one of which actually is. Strictly speaking, you can identify an
ordering as little endian only by checking the first sizeof(int)-1 bytes.
</pedantic>

Yup. I hope you are referring to long long type here of size 8 bytes wherein you are not considering first and last bytes for middle/mixed endian-ness checking.

<more pedantic>
The standard doesn't require that the least significant bit of an 'int'
be in the same location as the least significant bit when the byte that
contains it is interpreted as unsigned char. It is extremely unlikely
that this issue will ever come up.
</more pedantic>

Yeah, this is what allows us to check the endian-ness of a machine using a variable's address to know at what byte in it is the number 1 stored if that variable's value is 1. And if that issue comes up, then that would break many programs I guess even the simple ones:

int x = 1;
(*(unsigned char *)&x & 1) == 1; // This will be incorrectly true even with a
                                 // big-endian machine on which sizeof(int) >
                                 // sizeof(char).

<even more pedantic>
The standard doesn't even require that the CHAR_BIT least significant
bits are all stored in the same byte. In principle, an 8-bit byte could
contain bits 0,4,8,12,16,20,24, and 28. The next byte could contain bits
1,5,9,13,17,21,25, and 29, etc. There's a total of 64! different
possible bit-orderings. However, I think you can fairly assume that any
system for which such things are true was designed by aliens. :)
</even more pedantic>

And that would be a pain big time for the compiler designer! :)

Given x==1, the expression x&1 is guaranteed to be true regardless of
endianness (even in the most pedantic case I discussed above), which
makes it a very bad way to test for endianness.

Yes, I have realized that after posting this thread.

If 'int' is big-endian, the least significant bit will be in the last
byte, which is the least significant byte when using a big-endian
representation. If 'int' is little-endian, that bit will be in the first
byte, which is the least significant byte when using little-endian
notation. But either way, it will reside in the least-significant byte,
not the most significant one.

Yes, I know that. As mentioned above, I intended to say that the least-significant byte will be stored at the most-significant address in the case of a big-endian machine.


- Raghavan
 
Myth__Buster

Without any quoted context, we can't tell what this refers to. Not all
newsreaders provide an easy way to view the parent article; mine does,
but it doesn't seem to provide an easy way to get back, so I don't use
it much. Please quote enough of the parent article for your followup to
make sense.

--
Keith Thompson (The_Other_Keith) (e-mail address removed) <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Sorry, since the posts were right next to each other, I thought of not pasting the entire paragraph. The paragraph to which it applies is

"However, I see that (x & 1) on big-endian clearly abstracts the way 1 is laid out in memory and just takes the LS-bit of MS-byte and and-s with numerical 1 as if 1 was at the LS-bit of MS-byte like on little-endian machine."

But, here MS-byte shall be read as MS-address.

Thanks.
 
James Kuyper

Yes, I am aware of the fact that registers are not addressable and 'register' keyword is not a command but request. However, I thought the compile-time constants such as 1ULL would also be held in a temporary memory location in big-endian representation before being moved to a register for any operation using that numerical-constant.

They might be - but there's no portable C code that can be used to
determine whether such values are stored in big-endian or little-endian
format. The flip side of this is that there is correspondingly no reason
why you should ever care - if there were a reason to care, the behavior
associated with that reason would provide a mechanism for checking the
endianness.

Well, I should have said, 'most-significant-address' instead of most-significant-byte. Sorry for the confusion.

No, the term "most-significant" simply isn't meaningful when applied to
an address. Perhaps you meant the "last address"?

....

So, you mean we have to iterate over bytes and figure out from one byte at a time. Right?

Correct.


Yup. I hope you are referring to long long type here of size 8 bytes wherein you are not considering first and last bytes for middle/mixed endian-ness checking.

No, I was very explicitly referring to int values with 4 bytes, which
are quite common nowadays. In principle, the number of possible byte
orderings for an 8 byte long long would be 8! = 40320 (the number
actually used is far smaller, of course). Applied to such an integer,
your test would incorrectly identify 7!-1 = 5039 of those orderings as
little-endian.
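
The counting behind those figures can be sketched as follows: the first-byte test pins down only where the least significant byte sits, so the other n-1 bytes may appear in any of (n-1)! arrangements, exactly one of which is true little-endian. The function names are illustrative, not from the thread.

```c
#include <assert.h>

/* n! for small n; 8! = 40320 fits comfortably in unsigned long. */
unsigned long factorial(unsigned int n)
{
    unsigned long result = 1;

    while (n > 1)
        result *= n--;
    return result;
}

/* Number of distinct byte orderings of an nbytes-wide integer. */
unsigned long possible_byte_orders(unsigned int nbytes)
{
    return factorial(nbytes);
}

/* Orderings that pass the first-byte test without being little-endian. */
unsigned long false_little_endian(unsigned int nbytes)
{
    assert(nbytes >= 1);
    return factorial(nbytes - 1) - 1;
}
```

For 4 bytes this gives 24 orderings with 5 false positives (6 accepted, one genuine); for 8 bytes it gives 40320 orderings with 5039 false positives.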

Yeah, this is what allows us to check the endian-ness of a machine using a variable's address to know at what byte in it is the number 1 stored if that variable's value is 1. And if that issue comes up, then that would break many programs I guess even the simple ones:

Many of my programs would be completely unaffected - they're written to
avoid making unportable assumptions about byte-ordering or bit-ordering.

To be fair, a large part of the nominal portability of my code is due to
the fact that the HDF library <http://www.hdfgroup.org/> is responsible
for hiding many such portability issues from my code. We're required by
our client to use that library, and could not port our code to a
platform which does not have a working installation of HDF. A working
HDF library would have to include routines for converting data from the
format of HDF files to the native format on that machine, and
vice-versa, so that work would have already been done for me. Assuming
that had already been done, my own code would require little if any
additional modifications.
 
Keith Thompson

Myth__Buster said:
Sorry I thought since the posts were next-to-next, I thought of not
pasting the entire paragraph. The paragraph to which it applies is

"However, I see that (x & 1) on big-endian clearly abstracts the way 1
is laid out in memory and just takes the LS-bit of MS-byte and and-s
with numerical 1 as if 1 was at the LS-bit of MS-byte like on
little-endian machine."

But, here MS-byte shall be read as MS-address.

Take a look at
https://gist.github.com/Keith-S-Thompson/5248300
to see what your followup looks like in my newsreader. This shows how
badly Google Groups messes up Usenet posts. Note the double-spacing of
quoted text -- and the quadruple-spacing of quoted quoted text.

Something I didn't mention before: Please don't quote signatures (the
stuff following the "-- " at the bottom of most posts).

Please either fix up your posts before clicking "Send", or use a real
newsreader.

Thanks.
 
glen herrmannsfeldt

James Kuyper said:
(snip)
Endianness is detectable in portable C code only when a value is stored
in an object. That's because the only way to determine the endianess is
to access the individual bytes of the object, which requires used of a
union or type-punning.
(snip)

<pedantic>
Keep in mind that if sizeof(int)==4, which is quite common nowadays,
there are 4! = 24 different possible byte orders (most of which are
exceedingly uncommon). Only one of those orderings is called
little-endian, and only one is called big-endian - the others are
generically called middle-endian. At least two of those other orders
(2143 and 3412) have actually been used. I once ran into a web page that
identified 11 of those 24 orders as having been used in specific
contexts, which it identified - but I didn't think to bookmark it, and
I've never been able to find it again.

I suppose on a bit addressable machine there are even more
possibilities.

Still, the most common example of middle-endian is VAX floating point.
In storage, they are little endian 16 bit words in big endian order.
If you look at them in little endian byte order, they look like big
endian words in little endian order. For example, when initializing
a floating point variable with a hexadecimal constant in VAX Fortran.

For another interesting bit order, consider the bits in the FAT entries
in the FAT12 file system. (It is different for odd and even entries.)

-- glen
 
glen herrmannsfeldt

(snip)
Well, regardless of the sizeof(char) in terms of bits and hence the
value CHAR_BIT, sizeof(char) is guaranteed to be 1. But, CHAR_BIT
can be big enough to represent the largest integer data type in
C: long long i.e., sizeof(long long) == 1 is possible.
So, if that is the case, in C there would be no portable way of
checking the endian-ness of such a kind of machine. In fact, with
respect to integers, endian-ness has no role in such a machine as
there are no more than one byte even in the largest integer data
type to play with the byte-order of that integer in the respective
memory layout.

Well, a system could have sizeof(long long)==1, but sizeof(double)
longer, so that endianness did matter in floating point. (Assuming
a known floating point representation.)

-- glen
 
glen herrmannsfeldt

(snip, someone wrote)
The PDP-11 used a middle-endian representation, but I don't believe the
VAX ever did.

The VAX floating point format is middle endian. Adapted from a floating
point system that was used with some models of the PDP-11.

Little endian 16 bit words are stored in big-endian order. This is
visible when initializing floating point variables with hex constants
in VAX Fortran. That is, the constant is considered a little endian
integer, when mapped into memory.

All binary integer representations on VAX are little endian. Packed
decimal (BCD) integers are big endian, similar to the IBM 360
representation.

-- glen
 
