python and bit shifts and byte order, oh my!

Reid Nichol

I played with bit shifts on my PC and tried the same thing on a Mac OS X
machine. They produced the same results, so I would assume that the bits
are laid out and interpreted as described in this link:

http://www.khakipants.org/archives/2003/03/bitlevel_input_and_output.html

At least in memory while the interpreter is running.

It's my first attempt down there and I need to know. So, is this how
Python works at the bit level?

I know that storing the number on disk means byte-order, etc. But, that
doesn't seem to be in play here.

Do I have the right thinking here? Or am I wrong? If so, how does
it actually work?
 
Daniel Dittmar

Reid said:
I played with bit shifts on my PC and tried the same thing on a Mac OS X
machine. They produced the same results, so I would assume that the bits
are laid out and interpreted as described in this link:

http://www.khakipants.org/archives/2003/03/bitlevel_input_and_output.html

At least in memory while the interpreter is running.

CPUs differ in the way integers are stored in memory. But the shift
operators of the CPU are implemented to work on logical integers in
registers, not on consecutive bytes in memory.
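A minimal sketch of this point, assuming Python 3 (`int.to_bytes` did not exist when this thread was written). The value `x` is arbitrary and chosen only for illustration:

```python
import sys

x = 0x12345678

# Shifts and masks act on the integer's value, so these results are the
# same on every platform, whatever the CPU's native byte order is.
shifted = x >> 8       # 0x123456
low_byte = x & 0xFF    # 0x78

# Byte order only shows up once the integer is laid out as bytes.
big = x.to_bytes(4, "big")        # bytes 12 34 56 78
little = x.to_bytes(4, "little")  # bytes 78 56 34 12

print(sys.byteorder)  # native order of this machine: 'little' or 'big'
print(hex(shifted), hex(low_byte), big.hex(), little.hex())
```

Only the last two values depend on a byte-order choice, and that choice is made explicitly in the code rather than by the hardware.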

Daniel
 
Reid Nichol

Daniel said:
CPUs differ in the way integers are stored in memory. But the shift
operators of the CPU are implemented to work on logical integers in
registers, not on consecutive bytes in memory.

Daniel

I'm wondering because the file format I'm trying to write uses
bit-packing, so I need to be able to write, for example, a 5 bit integer
to the disk.

I do think regardless of language I'm going to have an unfun time doing
this. But, since cross platform is a want approaching a need, I'd like
to use Python.

Would getting a specific bit from the integer be the same or would I
have to worry about the byte-order?

ie Would:
x & SOMEBIT
be portable?
 
Grant Edwards

Reid said:
I'm wondering because the file format I'm trying to write uses
bit-packing, so I need to be able to write, for example, a 5 bit integer
to the disk.

I presume by "to disk" you mean to a file in a filesystem.

You can't do that under any OS and filesystem with which I'm
familiar. You can only write an integral number of bytes to a
file. You can only write an integral number of blocks to disk
(a block is typically 512 bytes).

Reid said:
I do think regardless of language I'm going to have an unfun
time doing this. But, since cross platform is a want
approaching a need, I'd like to use Python.

Would getting a specific bit from the integer be the same or
would I have to worry about the byte-order?

You only have to worry about byte order when reading/writing
binary objects from/to a file.

Reid said:
ie Would:
x & SOMEBIT
be portable?

Yes, that's portable.

The code that writes x to a file and reads it from a file is
what you have to worry about.
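As a rough illustration of the portable part, here is a sketch of testing one bit and pulling out a multi-bit field. The mask value chosen for `SOMEBIT` and the helper `get_field` are hypothetical names used only for this example:

```python
SOMEBIT = 1 << 3   # hypothetical mask selecting bit 3 (value 8)

x = 0b101101       # 45

# Testing a single bit: the result depends only on the value of x.
bit_is_set = bool(x & SOMEBIT)


def get_field(value, offset, width):
    """Return `width` bits of `value`, starting `offset` bits above bit 0."""
    return (value >> offset) & ((1 << width) - 1)


# A 5-bit field occupying bits 1..5 of x.
field = get_field(x, 1, 5)
```

Neither operation touches the in-memory byte layout, so both give the same answer on a PC and on a Mac.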
 
Reid Nichol

Grant said:
Yes, that's portable.

Great!


The code that writes x to a file and reads it from a file is
what you have to worry about.

So, I have to handle the byte order myself during file i/o.


Thank you Grant and Daniel! I cleared up a bunch of my fuzziness :)
 
Phil Frost

The standard 'struct' module provides methods to specify the byte order
of the input.
 
Reid Nichol

Phil said:
The standard 'struct' module provides methods to specify the byte order
of the input.

It's my understanding that pack and unpack of the struct module return
strings and not rearranged integers.

At any rate I would rather do it myself if only to teach myself
something about this.
 
Grant Edwards

Reid said:
It's my understanding that pack and unpack of the struct module return
strings and not rearranged integers.

Partly right. Pack returns a string. Unpack returns whatever
you tell it to return (integers, floats, etc.).

When reading a 4 byte integer from a file:

Read a string of 4 bytes from the file.

Use struct.unpack() on the string to convert it to an integer
object.

When writing a 4 byte integer to a file:

Use struct.pack() to convert the integer object into a 4-byte
string with the desired byte order.

Write the 4 byte string to the file.
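The read and write steps above can be sketched as follows, assuming Python 3, where `struct.pack` returns a `bytes` object rather than a string. The big-endian format `">I"` and the use of `io.BytesIO` in place of a real file are choices made for this example only:

```python
import io
import struct

value = 1633837924

# io.BytesIO stands in for a file opened in binary mode ('wb' / 'rb').
f = io.BytesIO()

# Writing: convert the integer to 4 bytes in an explicit byte order.
# '>' means big-endian, 'I' means unsigned 32-bit integer.
f.write(struct.pack(">I", value))

# Reading: read 4 bytes back, then convert them to an integer.
f.seek(0)
raw = f.read(4)
(restored,) = struct.unpack(">I", raw)   # unpack always returns a tuple
```

Because the byte order is spelled out in the format string, the file contents are the same no matter which machine wrote them.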
Reid said:
At any rate I would rather do it myself if only to teach
myself something about this.

Do what yourself?
 
Jason Lai

Reid said:
It's my understanding that pack and unpack of the struct module return
strings and not rearranged integers.

At any rate I would rather do it myself if only to teach myself
something about this.

pack returns a string. Which is a sequence of characters, which also
happen to be bytes. So you write the string (sequence of bytes) to the
file. I don't think you usually need to rearrange integers in memory;
it's mainly when you're writing to disk.

unpack returns a tuple of objects according to the format string. So
struct.unpack("!i", "abcd") returns (1633837924,). In your case, you'd
read in 4 bytes and unpack it to a 32-bit int.
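For readers on Python 3, the same call needs a `bytes` argument instead of a string. The sketch below repeats the example and, as an addition not in the original post, decodes the same four bytes as little-endian for comparison:

```python
import struct

data = b"abcd"  # bytes 0x61 0x62 0x63 0x64

(network,) = struct.unpack("!i", data)  # '!' = network (big-endian) order
(little,) = struct.unpack("<i", data)   # '<' = little-endian order

print(network)  # 1633837924 == 0x61626364
print(little)   # 1684234849 == 0x64636261
```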

If efficiency isn't important, you could forget about the whole
byte-order thing and just read/write it byte-by-byte. Then you can think
of the file as a bit-stream (everything gets written in order and read
back in order), although you still have to read/write a whole 8-bit byte
at a time.
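One way to picture the bit-stream idea is the sketch below, which assumes Python 3 and assumes that fields are stored most significant bit first with zero padding at the end. Neither convention comes from the thread, and `pack_fields` and `unpack_fields` are hypothetical helper names:

```python
def pack_fields(fields):
    """Pack (value, width) pairs MSB-first into bytes, zero-padding the end."""
    acc = 0        # all bits collected so far, as one integer
    nbits = 0      # how many bits `acc` holds
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    pad = -nbits % 8               # bits needed to reach a byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")


def unpack_fields(data, widths):
    """Inverse of pack_fields for a known list of field widths."""
    acc = int.from_bytes(data, "big")
    remaining = len(data) * 8
    values = []
    for width in widths:
        remaining -= width
        values.append((acc >> remaining) & ((1 << width) - 1))
    return values


# A 5-bit field followed by two 24-bit fields: 53 bits, stored in 7 bytes.
packed = pack_fields([(0b10110, 5), (0xABCDEF, 24), (0x123456, 24)])
print(unpack_fields(packed, [5, 24, 24]))
```

The bytes are always produced in the same order, so no byte-order decision is left to the platform. The reader does need to know the field widths in advance.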

- Jason Lai
 
Reid Nichol

Jason said:
If efficiency isn't important, you could forget about the whole
byte-order thing and just read/write it byte-by-byte. Then you can think
of the file as a bit-stream (everything gets written in order and read
back in order), although you still have to read/write a whole 8-bit byte
at a time.

- Jason Lai

Since the format can have:
5bit
24bit
24bit

I assumed that I would have to write byte by byte. And I don't really
consider speed important so I think that it's viable to do it this way.

@Grant
This is what I meant.
 
Bengt Richter

Reid said:
Since the format can have:
5bit
24bit
24bit

I assumed that I would have to write byte by byte. And I don't really
consider speed important so I think that it's viable to do it this way.

@Grant
This is what I meant.

I would suggest you define a class (e.g., subclass the builtin file type)
that serves as a convenient (for you) bit-wise interface to a binary file
(binary is important on Windows, or you will get EOL conversions when you write).
E.g., so you will be able to write code like:

bf = BitFile('data/bitfile.dat', 'wb')
bf.write(0xfa, 5)
bf.write(whatever, 24)
bf.close()

The class will have to take care of buffering and packing and unpacking and endianness
and how to deal with a file that is not an integral number times 8 bits total (if you
are defining the format, you could always append an extra byte on close that says how many bits
there are in the last (preceding) data byte, so you could read back exactly the bits specified).
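A minimal sketch of the writing half of such a class, assuming Python 3. It wraps a binary file object instead of subclassing the file type, writes bits most significant first, and masks each value to its width (so `0xfa` written as 5 bits becomes `0b11010`). All three are assumptions of this example, as is the name `BitWriter`:

```python
import io


class BitWriter:
    """Buffer bit fields and write whole bytes to a binary file object."""

    def __init__(self, fileobj):
        self._f = fileobj
        self._acc = 0      # pending bits not yet written
        self._nbits = 0    # number of pending bits (below 8 between calls)

    def write(self, value, width):
        value &= (1 << width) - 1          # keep only the low `width` bits
        self._acc = (self._acc << width) | value
        self._nbits += width
        while self._nbits >= 8:            # flush every complete byte
            self._nbits -= 8
            self._f.write(bytes([(self._acc >> self._nbits) & 0xFF]))
        self._acc &= (1 << self._nbits) - 1

    def close(self):
        used = self._nbits or 8            # 8 if the last data byte is full
        if self._nbits:
            # Left-align the leftover bits and pad the byte with zeros.
            self._f.write(bytes([(self._acc << (8 - self._nbits)) & 0xFF]))
        self._f.write(bytes([used]))       # trailer: valid bits in last byte
        self._acc = self._nbits = 0


buf = io.BytesIO()          # stands in for open('data/bitfile.dat', 'wb')
bf = BitWriter(buf)
bf.write(0xfa, 5)
bf.write(0x123456, 24)
bf.close()
print(buf.getvalue().hex())
```

In this sketch `close()` only flushes the bit buffer and writes the trailer byte; closing the underlying file is left to the caller.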

You could also give the class properties for common bit field widths, so that the
above writes would look like

bf.b5 = 0xfa
bf.b24 = whatever

The effect of the first assignment would be to write five bits (actually to buffer them,
since you have to do that for fractional bytes anyway, and you will gain in i/o performance
for larger chunks). On the read side, you might want to distinguish between signed and
unsigned bitfields, e.g.,

signed = bf.s5 # read next 5 bits as signed integer
unsigned = bf.u5 # ditto, except unsigned
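The difference between the two reads comes down to how the top bit is interpreted. A small sketch, assuming the signed fields use two's complement (the thread does not say which representation the file format uses) and using a hypothetical helper `to_signed`:

```python
def to_signed(value, width):
    """Interpret the low `width` bits of `value` as two's complement."""
    value &= (1 << width) - 1
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


raw = 0b11010                # five bits as read from the stream
unsigned = raw               # 26
signed = to_signed(raw, 5)   # -6, because the top bit is set
```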

Of course, packing bits together from a sequence of numbers into a string of bytes has
nothing necessarily to do with file i/o, so you might want to factor that out. E.g.,
you could take inspiration from struct to create something that works by bit fields, e.g.,
say '.n' means pack n bits adjacent to previously buffered bits. Say ',n' means skip n
bits as if you were reading or writing (introducing default 0 if not re-writing), and then
use the struct type letters for alignment skips, e.g., 'h' to skip to end of current short,
or 'l' to skip to end of current long. Then

pack('<.3,2.7h.24l', x, y, z)

could be a little-endian packing of size(short)+size(long) bytes, with two fields
x and y of 3 and 7 bits respectively, separated by a 2-bit space, packed into a short,
followed by z packed into the bottom of a long, for six bytes total.

Probably pack should be a class so that you get back an object that has both data bytes
and total bit length and methods for convenient concatenation, so

pack('.3', 10) + pack('.4', 15) == pack('.3.4', 10, 15)
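The concatenation property can be sketched without the format-string parser. The example below assumes Python 3.7+ for `dataclasses`; the names `Bits` and `pack_bits` are hypothetical, fields are passed as (width, value) pairs instead of a format string, and values are masked to their width:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bits:
    """An immutable run of `length` bits, stored as an integer, MSB first."""
    value: int
    length: int

    def __add__(self, other):
        # Concatenation: shift our bits up and append the other's bits.
        return Bits((self.value << other.length) | other.value,
                    self.length + other.length)

    def to_bytes(self):
        pad = -self.length % 8     # zero bits appended to fill the last byte
        return (self.value << pad).to_bytes((self.length + pad) // 8, "big")


def pack_bits(*fields):
    """Hypothetical helper: pack (width, value) pairs into one Bits object."""
    result = Bits(0, 0)
    for width, value in fields:
        result = result + Bits(value & ((1 << width) - 1), width)
    return result


# Two separately packed fields equal the same fields packed together.
print(pack_bits((3, 10)) + pack_bits((4, 15)) == pack_bits((3, 10), (4, 15)))
```

Keeping the bit length next to the data is what makes concatenation well defined, since a bare byte string cannot say how many of its trailing bits are padding.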

Sorry I don't have time to implement this now (actually, I have a strictly-little-endian
hack that I used for some music compression experiments a while back, maybe I can find
it later). API preferences could probably stand a little discussion anyway ;-)

Regards,
Bengt Richter
 
Francesco Bochicchio

Reid said:
I played with bit shifts on my PC and tried the same thing on a Mac OS X
machine. They produced the same results, so I would assume that the bits
are laid out and interpreted as described in this link:

http://www.khakipants.org/archives/2003/03/bitlevel_input_and_output.html

At least in memory while the interpreter is running.

It's my first attempt down there and I need to know. So, is this how
Python works at the bit level?

I had similar problems, not for reading/writing binary files but to
pack/unpack network binary messages, with bitfields in it, between
big-endian and little-endian computers.

I ended up writing a 'bitstring' class that is loaded with a bunch of
bytes ( e.g. the output of a socket.recv () or a file.read() ) and is
able to read/set any arbitrary group of bits in it in a portable way,
which is not hard, since bit operators work on 'logical integers' and
are independent of byte order, as others pointed out (I believe this is
also true in C and other languages).

I'm not showing my code, because since then I discovered that on the Net
there are better implementations than mine :). Just google for "python bit
manipulation" and you will find examples like this:

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/113799

which should work with arbitrary length bitstrings, since bit operators
work on long integers, also :)
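This is not the linked recipe, only a rough sketch of the general idea, assuming Python 3 and assuming bits are numbered from the most significant bit of the first byte. The helper names `get_bits` and `set_bits` are hypothetical, and no bounds checking is done:

```python
def get_bits(data, offset, width):
    """Read `width` bits starting `offset` bits from the start of `data`."""
    total = len(data) * 8
    as_int = int.from_bytes(data, "big")
    return (as_int >> (total - offset - width)) & ((1 << width) - 1)


def set_bits(data, offset, width, value):
    """Return a copy of `data` with the given bit group replaced by `value`."""
    total = len(data) * 8
    shift = total - offset - width
    mask = ((1 << width) - 1) << shift
    as_int = int.from_bytes(data, "big")
    as_int = (as_int & ~mask) | ((value << shift) & mask)
    return as_int.to_bytes(len(data), "big")


msg = bytes([0b10110101, 0x0F])   # e.g. the result of a socket recv()
first = get_bits(msg, 0, 5)       # the first five bits
middle = get_bits(msg, 5, 7)      # seven bits spanning both bytes
changed = set_bits(msg, 5, 7, 0b1111111)
```

The whole buffer is treated as one arbitrary-length integer, so a bit group that straddles a byte boundary needs no special handling.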
 
