Q about endian-ness/portability

Discussion in 'C++' started by Joe C, Jan 13, 2004.

  1. Joe C

    I have some code that performs bitwise operations on files. I'm trying to
    make the code portable on different endian systems. This is not work/school
    related...just trying to learn/understand.

    My computer is little endian 32-bit (intel, imagine that). My code deals
    with binary data in files, and I've been treating all data as native words
    (32-bit) for performance reasons. I found out that if I treat the data as
    long long (64-bit) I only suffer a 5% performance hit. So...I'm thinking
    that making it 64-bit might be a forward-looking way to support future
    platforms.
    Anyway...I don't have access to a 64-bit Big-endian system, and I want to
    make sure I understand how the data is internally represented on such a
    system.

    Suppose I have a file containing 8 bytes. In ASCII, it contains:
    "abcdefgh"
    In hex, the file contains:
    61 62 63 64 65 66 67 68
    The file is then read into memory on my machine (2 different ways) and on a
    hypothetical big-endian 64-bit machine. Each system does an operation on
    the data then writes the data to a binary file. Will all three files
    contain the identical bit-sequence? Thanks.

    Case1)
    I read this data as binary into a 2-element array of 32-bit words, on my
    little-endian machine using something like:
    in.read(reinterpret_cast<char*>(array), 8)
    after which:
    array[0] == 1684234849 == 0x64636261
    array[1] == 1751606885 == 0x68676665

    I then do the following transformation (rotate "right" 1-bit) and write the
    binary output to a file:
    int carrybit = array[0] & 1;
    array[0] = (array[0] >> 1) | ((array[1] & 1) << 31);
    array[1] = (array[1] >> 1) | (carrybit << 31);
    ofstream out ("fileout1.dat", ios::binary | ios::out);
    char* o = reinterpret_cast<char*>(array);
    out.write(o, 8);
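A minimal sketch of Case 1 that produces the same bytes on any host. Instead of reading straight into the array, it decodes the file bytes as two little-endian 32-bit words (emulating the Intel layout explicitly), and it uses `std::uint32_t` from `<cstdint>` (a C++11 header, newer than this thread) so that the `<< 31` shifts happen in unsigned arithmetic. `Bytes8`, `load_le32`, `store_le32` and `case1_rotate_right` are hypothetical names for illustration.

```cpp
#include <array>
#include <cstdint>

using Bytes8 = std::array<unsigned char, 8>;  // hypothetical alias: the 8 file bytes

// Decode 4 bytes as a little-endian 32-bit word (p[0] is least significant).
std::uint32_t load_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0])
         | static_cast<std::uint32_t>(p[1]) << 8
         | static_cast<std::uint32_t>(p[2]) << 16
         | static_cast<std::uint32_t>(p[3]) << 24;
}

// Encode a 32-bit word as 4 little-endian bytes.
void store_le32(unsigned char* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) {
        p[i] = static_cast<unsigned char>(v & 0xFFu);
        v >>= 8;
    }
}

// Case 1: treat word[1]:word[0] as one 64-bit quantity and rotate it right by one bit.
Bytes8 case1_rotate_right(const Bytes8& in) {
    std::uint32_t w0 = load_le32(in.data());
    std::uint32_t w1 = load_le32(in.data() + 4);
    const std::uint32_t carry = w0 & 1u;       // bit shifted out of the low word
    w0 = (w0 >> 1) | ((w1 & 1u) << 31);        // low bit of w1 enters the top of w0
    w1 = (w1 >> 1) | (carry << 31);            // the carry wraps around to the top of w1
    Bytes8 out{};
    store_le32(out.data(), w0);
    store_le32(out.data() + 4, w1);
    return out;
}
```

With the input bytes `61 62 63 64 65 66 67 68`, `case1_rotate_right` returns `30 b1 31 b2 32 b3 33 b4`, the sequence Joe expects.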

    Case2)
    I read this data as binary into long long variable (64-bit), on my
    little-endian machine using something like:
    in.read(reinterpret_cast<char*>(&variable), 8)
    after which:
    variable == 7523094288207667809 == 0x6867666564636261

    I then do the following transformation (rotate "right" 1-bit) and write the
    binary output to a file:
    variable = (variable >> 1) | (variable << 63);
    ofstream out ("fileout2.dat", ios::binary | ios::out);
    char* o = reinterpret_cast<char*>(&variable);
    out.write(o, 8);
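A sketch of Case 2, assuming a little-endian file layout and `std::uint64_t` in place of `long long`. The unsigned type matters: with a signed `long long`, `variable << 63` shifts set bits out of the type, which is undefined behaviour before C++20. `Bytes8`, `load_le64`, `store_le64`, `rotr1` and `case2_rotate_right` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

using Bytes8 = std::array<unsigned char, 8>;  // hypothetical alias: the 8 file bytes

// Decode 8 bytes as a little-endian 64-bit value (b[0] is least significant).
std::uint64_t load_le64(const Bytes8& b) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | b[i];
    return v;
}

// Encode a 64-bit value as 8 little-endian bytes.
Bytes8 store_le64(std::uint64_t v) {
    Bytes8 b{};
    for (int i = 0; i < 8; ++i) {
        b[i] = static_cast<unsigned char>(v & 0xFFu);
        v >>= 8;
    }
    return b;
}

// Rotate right by one bit; both shifts are well-defined on an unsigned type.
std::uint64_t rotr1(std::uint64_t v) { return (v >> 1) | (v << 63); }

// Case 2: one 64-bit little-endian value, rotated right by one bit.
Bytes8 case2_rotate_right(const Bytes8& in) { return store_le64(rotr1(load_le64(in))); }
```

For the `abcdefgh` input this yields `0xB433B332B231B130`, written as `30 b1 31 b2 32 b3 33 b4`.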

    Case3)
    I read this data as binary into a 64-bit variable on a hypothetical
    big-endian machine using something like:
    in.read(reinterpret_cast<char*>(&variable), 8)
    after which:
    variable == 7017280452245743464 == 0x6162636465666768

    I then do the following transformation (rotate "left" 1-bit) and write the
    binary output to a file:
    variable = (variable << 1) | (variable >> 63);
    ofstream out ("fileout3.dat", ios::binary | ios::out);
    char* o = reinterpret_cast<char*>(&variable);
    out.write(o, 8);
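A sketch of Case 3 that emulates the hypothetical big-endian machine on any host by decoding and encoding the bytes big-endian, again with an unsigned 64-bit type. `Bytes8`, `load_be64`, `store_be64`, `rotl1` and `case3_rotate_left` are hypothetical names. Working it through shows what the rotate-left actually produces:

```cpp
#include <array>
#include <cstdint>

using Bytes8 = std::array<unsigned char, 8>;  // hypothetical alias: the 8 file bytes

// Decode 8 bytes as a big-endian 64-bit value (b[0] is most significant).
std::uint64_t load_be64(const Bytes8& b) {
    std::uint64_t v = 0;
    for (int i = 0; i < 8; ++i) v = (v << 8) | b[i];
    return v;
}

// Encode a 64-bit value as 8 big-endian bytes.
Bytes8 store_be64(std::uint64_t v) {
    Bytes8 b{};
    for (int i = 7; i >= 0; --i) {
        b[i] = static_cast<unsigned char>(v & 0xFFu);
        v >>= 8;
    }
    return b;
}

// Rotate left by one bit (unsigned, so both shifts are well-defined).
std::uint64_t rotl1(std::uint64_t v) { return (v << 1) | (v >> 63); }

// Case 3: the value as a big-endian machine sees it, rotated left by one bit.
Bytes8 case3_rotate_left(const Bytes8& in) { return store_be64(rotl1(load_be64(in))); }
```

For `abcdefgh` the result is `0xC2C4C6C8CACCCED0`, written `c2 c4 c6 c8 ca cc ce d0`, so as written Case 3 does not produce the `30 b1 31 b2 32 b3 33 b4` sequence.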

    _______________________

    The question...do all three files contain identical data, namely (hex):
    30 b1 31 b2 32 b3 33 b4

    Thanks for your help.

    Joe
    Joe C, Jan 13, 2004

  2. Kevin Saff

    "Joe C" <> wrote in message
    news:RYWMb.50837$...
    > I have some code that performs bitwise operations on files. I'm trying to
    > make the code portable on different endian systems. This is not work/school
    > related...just trying to learn/understand.
    >
    > My computer is little endian 32-bit (intel, imagine that). My code deals
    > with binary data in files, and I've been treating all data as native words
    > (32-bit) for performance reasons. I found out that if I treat the data as
    > long long (64-bit) I only suffer a 5% performance hit. So...I'm thinking
    > that making it 64-bit might be a forward-looking way to support future
    > platforms.


    On the other hand, "long long" is a non-standard extension to C++.

    > Anyway...I don't have access to a 64-bit Big-endian system, and I want to
    > make sure I understand how the data is internally represented on such a
    > system.
    >
    > Suppose I have a file containing 8-bytes. In Ascii, it contains:
    > "abcdefgh"
    > In hex, the file contains:
    > 61 62 63 64 65 66 67 68
    > The file is then read into memory on my machine (2 different ways) and on a
    > hypothetical big-endian 64-bit machine. Each system does an operation on
    > the data then writes the data to a binary file. Will all three files
    > contain the identical bit-sequence? Thanks.


    Maybe. In general C++ cannot guarantee that your file is portable.
    However, in these cases I think one usually assumes that both computers use
    the same char-size, and a set of chars written by one computer can be read
    in the same order by the other computer. On different computers, bit
    sequences are not required to have the same textual representation, or
    signify the same numbers.
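To illustrate this under Kevin's stated assumption (8-bit chars, bytes read back in the same order), here is a sketch contrasting two ways of turning the eight file bytes into a number. Copying the bytes into the object, which is in effect what `in.read(reinterpret_cast<char*>(&v), 8)` does, gives a host-dependent value; assembling the value with shifts from an agreed file byte order gives the same value everywhere. The little-endian file order and the names `decode_native` and `decode_file_le` are assumptions for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Host-dependent: copy the raw bytes into the object, as reading straight
// into the variable does.
std::uint64_t decode_native(const unsigned char (&b)[8]) {
    std::uint64_t v = 0;
    std::memcpy(&v, b, sizeof v);
    return v;
}

// Host-independent: build the value arithmetically from an agreed file byte
// order (little-endian here, an arbitrary choice), so every host gets the same number.
std::uint64_t decode_file_le(const unsigned char (&b)[8]) {
    std::uint64_t v = 0;
    for (int i = 7; i >= 0; --i) v = (v << 8) | b[i];
    return v;
}
```

On Joe's Intel machine both functions return `0x6867666564636261`; on a big-endian host only `decode_file_le` still does, while `decode_native` returns `0x6162636465666768`.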

    > Case1)
    > I read this data as binary into a 2-element array of 32-bit words, on my
    > little-endian machine using something like:
    > in.read(reinterpret_cast<char*>(array), 8)
    > after which:
    > array[0] == 1684234849 == 0x64636261
    > array[1] == 1751606885 == 0x68676665
    >
    > I then do the following transformation (rotate "right" 1-bit) and write the
    > binary output to a file:
    > int carrybit = array[0] & 1;
    > array[0] = (array[0] >> 1) | ((array[1] & 1) << 31);
    > array[1] = (array[1] >> 1) | (carrybit << 31);
    > ofstream out ("fileout1.dat", ios::binary | ios::out);
    > char* o = reinterpret_cast<char*>(array);
    > out.write(o, 8);
    >
    > Case2)
    > I read this data as binary into long long variable (64-bit), on my
    > little-endian machine using something like:
    > in.read(reinterpret_cast<char*>(&variable), 8)
    > after which:
    > variable == 7523094288207667809 == 0x6867666564636261
    >
    > I then do the following transformation (rotate "right" 1-bit) and write the
    > binary output to a file:
    > variable = (variable >> 1) | (variable << 63);
    > ofstream out ("fileout2.dat", ios::binary | ios::out);
    > char* o = reinterpret_cast<char*>(&variable);
    > out.write(o, 8);
    >
    > Case3)
    > I read this data as binary into a 64-bit variable on a hypothetical
    > big-endian machine using something like:
    > in.read(reinterpret_cast<char*>(&variable), 8)
    > after which:
    > variable == 7017280452245743464 == 0x6162636465666768
    >
    > I then do the following transformation (rotate "left" 1-bit) and write the
    > binary output to a file:
    > variable = (variable << 1) | (variable >> 63);
    > ofstream out ("fileout3.dat", ios::binary | ios::out);
    > char* o = reinterpret_cast<char*>(&variable);
    > out.write(o, 8);


    Some confusions you might have here:

    1) You are confused about the meaning of shift left/right. Left shifting is
    always multiplication by two (if possible), right shifting division by two,
    regardless of the bit representation.

    2) Big-endian vs. little-endian is about the order of BYTES, not the order
    of BITS. In fact, since a char is by definition the smallest addressable
    unit of memory in C++, it doesn't really make much sense to talk about bit
    order. OTOH byte order can be important, especially since IO involves
    streaming objects as byte sequences.
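Both points can be checked with a short sketch (the names `sample`, `bytes_of` and `host_is_little_endian` are hypothetical; `std::uint16_t` and `std::uint32_t` come from C++11's `<cstdint>`). The `static_assert`s compare shifts with multiplication and division on the value 0x0102, while `bytes_of` copies a value's object representation into bytes to expose the host's byte order:

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Point 1: shifts operate on the numeric value, so these checks hold
// whatever the host's byte order.
constexpr std::uint16_t sample = 0x0102;
static_assert((sample << 1) == sample * 2 && (sample << 1) == 0x0204, "left shift doubles the value");
static_assert((sample >> 1) == sample / 2 && (sample >> 1) == 0x0081, "right shift halves it, rounding down");

// Point 2: byte order shows up only when a value is viewed as bytes.
std::array<unsigned char, 4> bytes_of(std::uint32_t value) {
    std::array<unsigned char, 4> bytes{};
    std::memcpy(bytes.data(), &value, sizeof value);
    return bytes;
}

// 0x01020304 is stored as 04 03 02 01 on a little-endian host and as
// 01 02 03 04 on a big-endian one; each byte keeps its value either way,
// and the bits inside a byte are only ever seen through that value.
bool host_is_little_endian() { return bytes_of(0x01020304u)[0] == 0x04; }
```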

    > The question...do all three files contain identical data, namely(hex):
    > 30 b1 31 b2 32 b3 33 b4


    Taking a much easier example, say we have the short (0x0102) saved on the
    intel (as 0x02 0x01). Then "future computer" reads this in as (0x0201).
    Whereas the intel short right-shifts to (0x0081), saving as (0x81 0x00); the
    "future computer" will left-shift to (0x0402), written (0x04 0x02). OTOH if
    the "future computer" right-shifts, it arrives at (0x8100), which writes
    (0x81 0x00), the same as the intel.
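The (0x8100) above comes from a 1-bit rotate, as in Joe's code; a plain `>>` would give (0x0100). The sketch below (a hypothetical `rotate_right_as`, handling 1 to 8 bytes) emulates "load with a given byte order, rotate right by one bit, store with the same order". For 2-byte values the two byte orders always agree, because the byte that supplies the wrapped-in bit is the other byte whichever way you go. For 8-byte values they agree on `abcdefgh` only because of that input's bit pattern; a buffer such as `01 00 00 00 00 00 00 00` comes out differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Emulate a host with the given byte order: load the n bytes (1 <= n <= 8)
// as one integer, rotate it right by one bit within n*8 bits, store it back.
Bytes rotate_right_as(const Bytes& in, bool little_endian) {
    const std::size_t n = in.size();
    const unsigned width = static_cast<unsigned>(n * 8);
    std::uint64_t v = 0;
    for (std::size_t k = 0; k < n; ++k) {            // most significant byte first
        const std::size_t i = little_endian ? n - 1 - k : k;
        v = (v << 8) | in[i];
    }
    const std::uint64_t low = v & 1u;                // bit that wraps around
    v = (v >> 1) | (low << (width - 1));
    Bytes out(n);
    for (std::size_t k = 0; k < n; ++k) {            // least significant byte first
        const std::size_t i = little_endian ? k : n - 1 - k;
        out[i] = static_cast<unsigned char>(v & 0xFFu);
        v >>= 8;
    }
    return out;
}
```

With `01 00 00 00 00 00 00 00` the little-endian emulation writes `00 00 00 00 00 00 00 80` and the big-endian one `00 80 00 00 00 00 00 00`, so matching output for arbitrary data comes from agreeing on the file's byte order, not from picking the shift direction per machine.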

    >
    > Thanks for your help.
    >


    It probably isn't worth coding for this until it comes up. At the least
    someone would have to compile and test for the new platform, when needed,
    anyway. If/when it is needed an entire compatibility layer would probably
    need to be added, which is too much work. Doing this compatibility work
    will limit your current design, since it will make it much harder to make
    needed changes to your binary format - every new feature will need to be
    endian-proofed, and this will discourage real improvements.

    HTH
    --
    KCS
    Kevin Saff, Jan 14, 2004

  3. Joe C

    "Kevin Saff" <> wrote in message
    news:...

    > On the other hand, "long long" is a non-standard extension to C++.


    Right...but a 64-bit integer data type will surely be available.


    >
    > Taking a much easier example, say we have the short (0x0102) saved on the
    > intel (as 0x02 0x01). Then "future computer" reads this in as (0x0201).
    > Whereas the intel short right-shifts to (0x0081), saving as (0x81 0x00); the
    > "future computer" will left-shift to (0x0402), written (0x04 0x02). OTOH if
    > the "future computer" right-shifts, it arrives at (0x8100), which writes
    > (0x81 0x00), the same as the intel.
    >


    Thanks a bunch for this good explanation. My analysis was flawed and you
    have shed bright light on the issues.

    Joe
    Joe C, Jan 14, 2004
  4. EventHelix.com, Jan 15, 2004