I have seen code to read this as an int in Java but don't understand what
is happening. I guess that I just don't understand what is going on with
the shifting of bits and "anding" with other values. Could someone please
explain this in detail? Or lead me to a good source for in-depth reading?
I don't just want code... I want to understand. If this seems trivial,
please excuse me... I'm only a second semester Computer Science major who
likes to keep his brain busy during the summer off and would like to make
a new front-end for an old program that I use at work.
Thanks!
I will do my best to explain - without using any code.
Endian-ness is fun. It adds excitement and joy to the otherwise tedious
task of developing portable code to read arbitrary binary data formats.
Java has made the life of the data processor much less interesting by
taking this task and wrapping it up in the ByteBuffer class. However, for
the purposes of learning it is a good thing to understand what is going on
behind the scenes.
There are [essentially] two types of endianness, big-endian and
little-endian. Big-endian hardware stores bytes in memory in their
"natural" format, with the "big" (most significant) end on the "left"
(lower memory address). Little-endian hardware was designed to do the
opposite, just to be awkward.
Let's assume we have 3 variables, containing a char, a 16-bit int
("short") and a 32-bit int (a "long" in older C usage; in Java this is an
int). We'll assign the hex values 0x11, 0x1122 and 0x11223344 to these
variables respectively. If these variables occupied consecutive memory
addresses (or were output to a binary file in sequence) on big-endian
hardware the contents of memory would be 0x11, 0x11, 0x22, 0x11, 0x22,
0x33, 0x44. On little-endian hardware the values would be 0x11, 0x22,
0x11, 0x44, 0x33, 0x22, 0x11. As you can see, little-endian hardware has
reversed the bytes of each multi-byte value. (NOTE: If you write binary
data from Java with DataOutputStream it is *always* output in big-endian
order, and ByteBuffer also uses big-endian order unless you tell it
otherwise.)
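If you want to see the two layouts for yourself, the following is a
minimal sketch that uses ByteBuffer to write the three example values in
each byte order and print the resulting bytes. The class and method names
(EndianLayout, layout, hex) are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper class, named only for this illustration.
class EndianLayout {
    // Writes the three example values in the requested byte order
    // and returns the seven raw bytes that result.
    static byte[] layout(ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocate(7).order(order);
        buf.put((byte) 0x11);          // 1-byte value
        buf.putShort((short) 0x1122);  // 16-bit value
        buf.putInt(0x11223344);        // 32-bit value
        return buf.array();
    }

    // Formats bytes as two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println("big:    " + hex(layout(ByteOrder.BIG_ENDIAN)));
        System.out.println("little: " + hex(layout(ByteOrder.LITTLE_ENDIAN)));
    }
}
```

The "big" line should match the big-endian sequence given above and the
"little" line the little-endian one.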
If you write the data to a file and read it back on the same hardware
using the same variable types [and the same language] then there is no
problem. The bytes will be stored in the correct locations and the
variables will have the correct contents. The fun comes when you read the
data as a byte array, or attempt to read it on the other type of hardware
or use a language which makes different assumptions about the type of
data.
To see how it all goes horribly wrong let's try to read the little-endian
data file (written by some language other than Java) into Java. Remember,
the order of the bytes in the little-endian binary file is
11 22 11 44 33 22 11. So we read the first byte and treat it as a byte,
and this is OK. Next we read the two bytes 0x22 and 0x11 and get the short
integer 0x2211, not what we wanted at all. The situation is the same for
the "long" integer, which will contain 0x44332211.
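The failure described above can be reproduced with a short sketch. It
assumes the file is read with DataInputStream (my choice for the example;
it always reads big-endian) and uses an in-memory byte array in place of
a real file. The names NaiveRead and readNaively are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical class, named only for this illustration.
class NaiveRead {
    // The seven bytes of the little-endian file from the example.
    static final byte[] LITTLE_ENDIAN_FILE = {0x11, 0x22, 0x11, 0x44, 0x33, 0x22, 0x11};

    // Reads a byte, a short and an int the way DataInputStream always
    // does: most significant byte first (big-endian).
    static int[] readNaively(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int b = in.readUnsignedByte();
            int s = in.readUnsignedShort();
            int i = in.readInt();
            return new int[] {b, s, i};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] v = readNaively(LITTLE_ENDIAN_FILE);
        // Prints: byte=0x11 short=0x2211 int=0x44332211
        System.out.printf("byte=0x%02x short=0x%04x int=0x%08x%n", v[0], v[1], v[2]);
    }
}
```

The single byte survives, but the two multi-byte values come out with
their bytes in the wrong order.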
This is where byte shifting and masking become necessary (if you don't
use Java, or don't use ByteBuffer in Java): the bytes of the "short" and
"long" integers have to be reversed.
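You can do this more easily by reading into a byte array and extracting
the correct bytes. For example, for the 4-byte "long" integer, reading the
bytes into a byte array you will get array[0]=0x44, array[1]=0x33,
array[2]=0x22 and array[3]=0x11. To construct the correct integer
(0x11223344) you need to shift array[3] left 24 places so it becomes
0x11000000, then combine that with array[2] left shifted 16 bits
(0x220000), array[1] left shifted 8 bits (0x3300) and array[0] unshifted
(0x44). The "anding" is usually a mask with 0xFF applied to each byte
first: Java bytes are signed, so a byte of 0x80 or above would otherwise
be sign-extended when it is widened to an int and would corrupt the higher
bytes of the result. How you write the code to do this is up to you; I
said I wouldn't use any code.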
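For anyone who would like to check their own attempt against something,
here is one possible sketch of the shift-and-combine step for the byte
array above. It is not the only way to write it, and the names
LittleEndianDecode, toInt and toShort are hypothetical.

```java
// Hypothetical class, named only for this illustration.
class LittleEndianDecode {
    // Builds a 32-bit int from 4 bytes stored least significant byte first.
    // "& 0xFF" keeps only the low 8 bits after the signed byte is widened.
    static int toInt(byte[] a) {
        return ((a[3] & 0xFF) << 24)   // most significant byte
             | ((a[2] & 0xFF) << 16)
             | ((a[1] & 0xFF) << 8)
             |  (a[0] & 0xFF);         // least significant byte
    }

    // The same idea for a 16-bit value stored in 2 bytes.
    static short toShort(byte[] a) {
        return (short) (((a[1] & 0xFF) << 8) | (a[0] & 0xFF));
    }

    public static void main(String[] args) {
        byte[] array = {0x44, 0x33, 0x22, 0x11};
        System.out.printf("0x%08x%n", toInt(array)); // prints 0x11223344
    }
}
```

If you would rather not do this by hand,
ByteBuffer.wrap(array).order(ByteOrder.LITTLE_ENDIAN).getInt() should give
the same value.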
--
Nigel Wade, System Administrator, Space Plasma Physics Group,
University of Leicester, Leicester, LE1 7RH, UK
E-mail : (e-mail address removed)
Phone : +44 (0)116 2523548, Fax : +44 (0)116 2523555