Q about endian-ness/portability

Joe C

I have some code that performs bitwise operations on files. I'm trying to
make the code portable on different endian systems. This is not work/school
related...just trying to learn/understand.

My computer is a little-endian 32-bit machine (Intel, imagine that). My code deals
with binary data in files, and I've been treating all data as native words
(32-bit) for performance reasons. I found out that if I treat the data as
long long (64-bit) I only suffer a 5% performance hit. So...I'm thinking
that making it 64-bit might be a forward-looking way to support future
platforms.
Anyway...I don't have access to a 64-bit Big-endian system, and I want to
make sure I understand how the data is internally represented on such a
system.

Suppose I have a file containing 8 bytes. In ASCII, it contains:
"abcdefgh"
In hex, the file contains:
61 62 63 64 65 66 67 68
The file is then read into memory on my machine (2 different ways) and on a
hypothetical big-endian 64-bit machine. Each system does an operation on
the data then writes the data to a binary file. Will all three files
contain the identical bit-sequence? Thanks.

Case1)
I read this data as binary into a 2-element array of 32-bit words, on my
little-endian machine using something like:
in.read(reinterpret_cast<char*>(array), 8)
after which:
array[0] == 1684234849 == 0x64636261
array[1] == 1751606885 == 0x68676665

I then do the following transformation (rotate "right" 1-bit) and write the
binary output to a file:
int carrybit = array[0] & 1;
array[0] = (array[0] >> 1) | ((array[1] & 1) << 31);
array[1] = (array[1] >> 1) | (carrybit << 31);
ofstream out ("fileout1.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(array);
out.write(o, 8);

Case2)
I read this data as binary into long long variable (64-bit), on my
little-endian machine using something like:
in.read(reinterpret_cast<char*>(&variable), 8)
after which:
variable == 7523094288207667809 == 0x6867666564636261

I then do the following transformation (rotate "right" 1-bit) and write the
binary output to a file:
variable = (variable >> 1) | (variable << 63);
ofstream out ("fileout2.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(&variable);
out.write(o, 8);

Case3)
I read this data as binary into a 64-bit variable on a hypothetical
big-endian machine using something like:
in.read(reinterpret_cast<char*>(&variable), 8)
after which:
variable == 7017280452245743464 == 0x6162636465666768

I then do the following transformation (rotate "left" 1-bit) and write the
binary output to a file:
variable = (variable << 1) | (variable >> 63);
ofstream out ("fileout3.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(&variable);
out.write(o, 8);

_______________________

The question...do all three files contain identical data, namely(hex):
30 b1 31 b2 32 b3 33 b4

Thanks for your help.

Joe
 
Kevin Saff

Joe C said:
I have some code that performs bitwise operations on files. I'm trying to
make the code portable on different endian systems. This is not work/school
related...just trying to learn/understand.

My computer is a little-endian 32-bit machine (Intel, imagine that). My code deals
with binary data in files, and I've been treating all data as native words
(32-bit) for performance reasons. I found out that if I treat the data as
long long (64-bit) I only suffer a 5% performance hit. So...I'm thinking
that making it 64-bit might be a forward-looking way to support future
platforms.

On the other hand, "long long" is a non-standard extension to C++.

Anyway...I don't have access to a 64-bit Big-endian system, and I want to
make sure I understand how the data is internally represented on such a
system.

Suppose I have a file containing 8 bytes. In ASCII, it contains:
"abcdefgh"
In hex, the file contains:
61 62 63 64 65 66 67 68
The file is then read into memory on my machine (2 different ways) and on a
hypothetical big-endian 64-bit machine. Each system does an operation on
the data then writes the data to a binary file. Will all three files
contain the identical bit-sequence? Thanks.

Maybe. In general C++ cannot guarantee that your file is portable.
However, in these cases I think one usually assumes that both computers use
the same char-size, and a set of chars written by one computer can be read
in the same order by the other computer. On different computers, bit
sequences are not required to have the same textual representation, or
signify the same numbers.

Case1)
I read this data as binary into a 2-element array of 32-bit words, on my
little-endian machine using something like:
in.read(reinterpret_cast<char*>(array), 8)
after which:
array[0] == 1684234849 == 0x64636261
array[1] == 1751606885 == 0x68676665

I then do the following transformation (rotate "right" 1-bit) and write the
binary output to a file:
int carrybit = array[0] & 1;
array[0] = (array[0] >> 1) | ((array[1] & 1) << 31);
array[1] = (array[1] >> 1) | (carrybit << 31);
ofstream out ("fileout1.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(array);
out.write(o, 8);

Case2)
I read this data as binary into long long variable (64-bit), on my
little-endian machine using something like:
in.read(reinterpret_cast<char*>(&variable), 8)
after which:
variable == 7523094288207667809 == 0x6867666564636261

I then do the following transformation (rotate "right" 1-bit) and write the
binary output to a file:
variable = (variable >> 1) | (variable << 63);
ofstream out ("fileout2.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(&variable);
out.write(o, 8);

Case3)
I read this data as binary into a 64-bit variable on a hypothetical
big-endian machine using something like:
in.read(reinterpret_cast<char*>(&variable), 8)
after which:
variable == 7017280452245743464 == 0x6162636465666768

I then do the following transformation (rotate "left" 1-bit) and write the
binary output to a file:
variable = (variable << 1) | (variable >> 63);
ofstream out ("fileout3.dat", ios::binary | ios::out);
char* o = reinterpret_cast<char*>(&variable);
out.write(o, 8);

Some confusions you might have here:

1) You are confused about the meaning of shift left/right. Left shifting by
one bit is always multiplication by two (if the result fits), and right
shifting is division by two, regardless of how the value is stored in memory.

2) Big-endian vs. little-endian is about the order of BYTES, not the order
of BITS. In fact, since a char is by definition the smallest addressable
unit of memory in C++, it doesn't really make much sense to talk about bit
order. OTOH byte order can be important, especially since IO involves
streaming objects as byte sequences.
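
To make the two points above concrete, here is a minimal sketch. The helper names (host_is_little_endian, shift_demo) are made up for illustration, and it assumes 8-bit chars and a host that is purely little- or big-endian. The shift results are the same on every host; only the byte-wise view of the object differs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if this host stores the low-order byte first.
bool host_is_little_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // look at the byte at the lowest address
    return first == 0x02;
}

void shift_demo() {
    const std::uint32_t value = 0x64636261u;

    // Shifts are defined on the value, so these hold on any host.
    assert((value >> 1) == 0x3231B130u);  // divide by two
    assert((value << 1) == 0xC8C6C4C2u);  // multiply by two

    // Byte order only shows up when the object is viewed as chars.
    unsigned char bytes[4];
    std::memcpy(bytes, &value, sizeof value);
    if (host_is_little_endian()) {
        assert(bytes[0] == 0x61);  // low-order byte at the lowest address
    } else {
        assert(bytes[0] == 0x64);  // high-order byte at the lowest address
    }
}
```
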
The question...do all three files contain identical data, namely(hex):
30 b1 31 b2 32 b3 33 b4

Taking a much easier example, say we have the short (0x0102) saved on the
Intel (as 0x02 0x01). Then the "future computer" reads this in as (0x0201).
The Intel short rotates right to (0x0081), saved as (0x81 0x00), whereas the
"future computer" rotates left to (0x0402), written (0x04 0x02). OTOH if
the "future computer" also rotates right, it arrives at (0x8100), which writes
(0x81 0x00), the same as the Intel. (A plain right shift without the
wrap-around would give 0x0100 instead, since the low bit is dropped.)
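
The same reasoning can be applied to the original 8-byte file. The sketch below simulates both byte orders with explicit shifts, so it behaves the same on any host; the names (Bytes, load_le, store_be, and so on) are made up for illustration and are not from the posted code. Worked through by hand, cases 1 and 2 both produce 30 b1 31 b2 32 b3 33 b4, while case 3 (rotate left on the big-endian machine) produces c2 c4 c6 c8 ca cc ce d0, so the three files are not identical. One caveat: rotating in the same direction on both machines only happens to agree for "abcdefgh" because the low bits of its bytes alternate. With more than two bytes it does not agree in general, as the last check shows.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes = std::array<unsigned char, 8>;

// Interpret 8 file bytes the way a little-endian host would.
std::uint64_t load_le(const Bytes& b) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | b[i];
    return v;
}

// Interpret 8 file bytes the way a big-endian host would.
std::uint64_t load_be(const Bytes& b) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v = (v << 8) | b[i];
    return v;
}

// Bytes a little-endian host would write for v (low-order byte first).
Bytes store_le(std::uint64_t v) {
    Bytes b{};
    for (int i = 0; i < 8; ++i) { b[i] = static_cast<unsigned char>(v & 0xFF); v >>= 8; }
    return b;
}

// Bytes a big-endian host would write for v (high-order byte first).
Bytes store_be(std::uint64_t v) {
    Bytes b{};
    for (int i = 7; i >= 0; --i) { b[i] = static_cast<unsigned char>(v & 0xFF); v >>= 8; }
    return b;
}

std::uint64_t rotr1(std::uint64_t v) { return (v >> 1) | (v << 63); }
std::uint64_t rotl1(std::uint64_t v) { return (v << 1) | (v >> 63); }

void three_cases_demo() {
    const Bytes file = {{0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68}};
    const Bytes le_right = {{0x30, 0xB1, 0x31, 0xB2, 0x32, 0xB3, 0x33, 0xB4}};
    const Bytes be_left  = {{0xC2, 0xC4, 0xC6, 0xC8, 0xCA, 0xCC, 0xCE, 0xD0}};

    assert(store_le(rotr1(load_le(file))) == le_right);  // case 2 (and case 1)
    assert(store_be(rotl1(load_be(file))) == be_left);   // case 3 as posted

    // Same direction on both hosts: agrees for this particular input...
    assert(store_be(rotr1(load_be(file))) == le_right);

    // ...but not in general once there are more than two bytes.
    const Bytes other = {{0x01, 0, 0, 0, 0, 0, 0, 0}};
    assert(store_le(rotr1(load_le(other))) != store_be(rotr1(load_be(other))));
}
```
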
Thanks for your help.

It probably isn't worth coding for this until it comes up. At the least
someone would have to compile and test for the new platform, when needed,
anyway. If/when it is needed an entire compatibility layer would probably
need to be added, which is too much work. Doing this compatibility work
will limit your current design, since it will make it much harder to make
needed changes to your binary format - every new feature will need to be
endian-proofed, and this will discourage real improvements.
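
If it ever does come up, one common approach (a sketch only, not something from the code in this thread) is to fix the byte order of the file format and build each value from individual bytes with shifts, instead of reinterpret_cast-ing the object. The function names below are hypothetical, and the choice of little-endian as the file order is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Read 8 bytes and assemble them as a little-endian 64-bit value,
// whatever the host's own byte order is. Returns false at end of input.
bool read_u64_le(std::istream& in, std::uint64_t& out) {
    char raw[8];
    if (!in.read(raw, 8)) return false;
    out = 0;
    for (int i = 7; i >= 0; --i)
        out = (out << 8) | static_cast<unsigned char>(raw[i]);
    return true;
}

// Write a 64-bit value as 8 bytes, low-order byte first.
void write_u64_le(std::ostream& os, std::uint64_t v) {
    char raw[8];
    for (int i = 0; i < 8; ++i) {
        raw[i] = static_cast<char>(v & 0xFF);
        v >>= 8;
    }
    os.write(raw, 8);
}

// Usage sketch: in-memory streams stand in for the input and output files.
// Assumes the input length is a multiple of 8 bytes.
std::string rotate_file_contents(const std::string& contents) {
    assert(contents.size() % 8 == 0);
    std::istringstream in(contents);
    std::ostringstream out;
    std::uint64_t v = 0;
    while (read_u64_le(in, v)) {
        write_u64_le(out, (v >> 1) | (v << 63));  // rotate right by one bit
    }
    return out.str();
}
```

With this, the output bytes depend only on the input bytes and the chosen file order, not on the machine running the code.
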

HTH
 
Joe C

On the other hand, "long long" is a non-standard extension to C++.

Right...but a 64-bit integer data type will surely be available.
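
For reference, since C++11 both "long long" and the fixed-width types in <cstdint> are part of the standard, so std::uint64_t is available on platforms that have an exact 64-bit type. A small sketch (rotate_right_1 and rotate_check are hypothetical names) of the rotation on an unsigned type, where both shifts are fully defined. On a signed long long the same expression is not fully portable under older standards.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Rotate right by one bit: the low bit wraps around to the top.
constexpr std::uint64_t rotate_right_1(std::uint64_t v) {
    return (v >> 1) | (v << 63);
}

void rotate_check() {
    // 0x6867666564636261 is "abcdefgh" as read on a little-endian host.
    assert(rotate_right_1(0x6867666564636261ULL) == 0xB433B332B231B130ULL);
    assert(rotate_right_1(1) == 0x8000000000000000ULL);
}
```
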

Taking a much easier example, say we have the short (0x0102) saved on the
Intel (as 0x02 0x01). Then the "future computer" reads this in as (0x0201).
The Intel short rotates right to (0x0081), saved as (0x81 0x00), whereas the
"future computer" rotates left to (0x0402), written (0x04 0x02). OTOH if
the "future computer" also rotates right, it arrives at (0x8100), which writes
(0x81 0x00), the same as the Intel. (A plain right shift without the
wrap-around would give 0x0100 instead, since the low bit is dropped.)

Thanks a bunch for this good explanation. My analysis was flawed and you
have shed bright light on the issues.

Joe
 
