python and bit shifts and byte order, oh my!

Discussion in 'Python' started by Reid Nichol, Sep 10, 2004.

  1. Reid Nichol

    Reid Nichol Guest

    I played with bit shifts on my PC and tried the same thing on a Mac OS X
    machine. They produced the same results, so I assume that the bits are
    laid out and interpreted as described in this link:

    http://www.khakipants.org/archives/2003/03/bitlevel_input_and_output.html

    At least in memory while the interpreter is running.

    It's my first attempt down there and I need to know. So, is this how
    Python works at the bit level?

    I know that storing the number on disk means byte-order, etc. But, that
    doesn't seem to be in play here.

    Do I have the right thinking here? Or am I wrong? If so, how does
    it actually work?
     
    Reid Nichol, Sep 10, 2004
    #1

  2. CPUs differ in the way integers are stored in memory. But the shift
    operators of the CPU are implemented to work on logical integers in
    registers, not on consecutive bytes in memory.
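
    A small sketch of this point, assuming Python 3 (which this thread predates): the result of a shift is the same on any CPU, and byte order only appears once the integer is laid out as bytes. The format codes "<H" and ">H" are standard struct codes for a 2-byte unsigned integer in little- and big-endian order.

```python
import struct
import sys

x = 1 << 8          # the shift acts on the integer's value, not its memory layout
assert x == 256     # holds on little- and big-endian machines alike

# Byte order only shows up when the value is serialised to bytes.
little = struct.pack("<H", x)   # least significant byte first
big = struct.pack(">H", x)      # most significant byte first

print(sys.byteorder)            # 'little' or 'big', depending on the host
print(little, big)
```

    The two packed strings differ, but x itself and every shift or mask applied to it do not.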

    Daniel
     
    Daniel Dittmar, Sep 10, 2004
    #2

  3. Reid Nichol

    Reid Nichol Guest

    I'm wondering because the file format I'm trying to write uses
    bit-packing, so I need to be able to write, for example, a 5 bit integer
    to the disk.

    I do think that, regardless of language, I'm going to have an unfun time
    doing this. But since cross-platform support is a want approaching a
    need, I'd like to use Python.

    Would getting a specific bit from the integer be the same or would I
    have to worry about the byte-order?

    I.e., would:
    x & SOMEBIT
    be portable?
     
    Reid Nichol, Sep 10, 2004
    #3
  4. I presume by "to disk" you mean to a file in a filesystem.

    You can't do that under any OS and filesystem with which I'm
    familiar. You can only write an integral number of bytes to a
    file. You can only write an integral number of blocks to disk
    (a block is 512 bytes, typically).

    You only have to worry about byte order when reading/writing
    binary objects from/to a file.

    Yes, x & SOMEBIT is portable.

    The code that writes x to a file and reads it from a file is
    what you have to worry about.
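
    To illustrate the portable part, here is a minimal sketch in Python 3. The mask value and both helper names are made up for the example; the point is only that masks and shifts work on the integer's value, so they behave the same on every platform.

```python
SOMEBIT = 1 << 3   # hypothetical mask selecting bit 3 (counting from the LSB)

def bit_is_set(x, mask=SOMEBIT):
    """Return True if any bit selected by mask is set in x."""
    return (x & mask) != 0

def extract_field(x, shift, width):
    """Return the width-bit field of x that starts shift bits above the LSB."""
    return (x >> shift) & ((1 << width) - 1)

print(bit_is_set(0b1000))               # bit 3 is set
print(bit_is_set(0b0111))               # bit 3 is clear
print(extract_field(0b10110100, 2, 5))  # the five bits 01101
```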
     
    Grant Edwards, Sep 10, 2004
    #4
  5. Reid Nichol

    Reid Nichol Guest

    So, I have to handle the byte order myself during file i/o.


    Thank you Grant and Daniel! I cleared up a bunch of my fuzziness :)
     
    Reid Nichol, Sep 10, 2004
    #5
  6. Phil Frost

    Phil Frost Guest

    The standard 'struct' module lets you specify the byte order of the
    data with a prefix character in the format string.
     
    Phil Frost, Sep 10, 2004
    #6
  7. Reid Nichol

    Reid Nichol Guest

    It's my understanding that pack and unpack of the struct module return
    strings and not rearranged integers.

    At any rate I would rather do it myself if only to teach myself
    something about this.
     
    Reid Nichol, Sep 10, 2004
    #7
  8. Partly right. Pack returns a string. Unpack returns whatever
    you tell it to return (integers, floats, etc.).

    When reading a 4 byte integer from a file:

    Read a string of 4 bytes from the file.

    Use struct.unpack() on the string to convert it to an integer
    object.

    When writing a 4 byte integer to a file:

    Use struct.pack() to convert the integer object into a 4-byte
    string with the desired byte order.

    Write the 4 byte string to the file.
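
    The two recipes above can be sketched as follows, assuming Python 3, where pack() returns a bytes object rather than a str. The helper names are invented for the example, the choice of an unsigned integer (format code "I") is an assumption, and io.BytesIO stands in for a file opened in binary mode.

```python
import io
import struct

def write_u32(f, value, byteorder=">"):
    """Write value as a 4-byte unsigned integer ('>' big-endian, '<' little-endian)."""
    f.write(struct.pack(byteorder + "I", value))

def read_u32(f, byteorder=">"):
    """Read 4 bytes from f and convert them back to an integer."""
    data = f.read(4)
    if len(data) != 4:
        raise EOFError("expected 4 bytes, got %d" % len(data))
    (value,) = struct.unpack(byteorder + "I", data)   # unpack returns a tuple
    return value

buf = io.BytesIO()          # stands in for open(path, "wb") / open(path, "rb")
write_u32(buf, 0x12345678)
raw = buf.getvalue()        # bytes 12 34 56 78, most significant first
buf.seek(0)
roundtrip = read_u32(buf)
```

    As long as the writer and the reader use the same byte-order prefix, the file reads back identically on any machine.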

    Do what yourself?
     
    Grant Edwards, Sep 10, 2004
    #8
  9. Jason Lai

    Jason Lai Guest

    pack returns a string. Which is a sequence of characters, which also
    happen to be bytes. So you write the string (sequence of bytes) to the
    file. I don't think you usually need to rearrange integers in memory;
    it's mainly when you're writing to disk.

    unpack returns a tuple of objects according to the format string. So
    struct.unpack("!i", "abcd") returns (1633837924,). In your case, you'd
    read in 4 bytes and unpack it to a 32-bit int.

    If efficiency isn't important, you could forget about the whole
    byte-order thing and just read/write it byte-by-byte. Then you can think
    of the file as a bit-stream (everything gets written in order and read
    back in order), although you still have to read/write a whole 8-bit byte
    at a time.
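
    A rough sketch of the byte-by-byte approach, assuming Python 3 and a most-significant-byte-first layout (the file format in question may specify otherwise). The helper names are invented for the example. Note that in Python 3 the unpack example above needs a bytes literal.

```python
import struct

# Python 3 needs a bytes literal where the original post used a str.
assert struct.unpack("!i", b"abcd") == (1633837924,)

def u24_to_bytes(value):
    """Split a 24-bit unsigned integer into 3 bytes, most significant first."""
    if not 0 <= value < (1 << 24):
        raise ValueError("value does not fit in 24 bits")
    return bytes([(value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF])

def u24_from_bytes(data):
    """Rebuild the integer from 3 bytes produced by u24_to_bytes."""
    b0, b1, b2 = data
    return (b0 << 16) | (b1 << 8) | b2
```

    Because the code itself decides which byte goes first, the host's native byte order never enters the picture.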

    - Jason Lai
     
    Jason Lai, Sep 10, 2004
    #9
  10. Reid Nichol

    Reid Nichol Guest

    Since the format can have:
    5bit
    24bit
    24bit

    I assumed that I would have to write byte by byte. And I don't really
    consider speed important so I think that it's viable to do it this way.

    @Grant
    This is what I meant.
     
    Reid Nichol, Sep 10, 2004
    #10
  11. I would suggest you define a class (e.g., subclass the builtin file type)
    that serves as a convenient (for you) bit-wise interface to a binary file
    (binary is important on Windows, or you will get EOL conversions when you write).
    E.g., so you will be able to write code like:

    bf = BitFile('data/bitfile.dat', 'wb')
    bf.write(0xfa, 5)
    bf.write(whatever, 24)
    bf.close()

    The class will have to take care of buffering and packing and unpacking and endianness
    and how to deal with a file that is not an integral number times 8 bits total (if you
    are defining the format, you could always append an extra byte on close that says how many bits
    there are in the last (preceding) data byte, so you could read back exactly the bits specified).
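
    One possible shape for the write side, as a minimal sketch in Python 3 and not the implementation alluded to in this post. Assumptions: the class is called BitWriter and wraps a binary stream instead of subclassing file, bits are packed most significant first, values wider than the field are masked to their low bits (0xfa does not fit in 5 bits, so this is one reading of the example above), and close() returns the number of meaningful bits in the last byte and leaves the trailer byte to the caller.

```python
import io

class BitWriter:
    """Buffer bit fields MSB-first and emit whole bytes to a binary stream."""

    def __init__(self, stream):
        self.stream = stream
        self.acc = 0      # pending bits, held as an integer
        self.nbits = 0    # number of pending bits in acc (< 8 between calls)

    def write(self, value, width):
        value &= (1 << width) - 1              # keep only the low `width` bits
        self.acc = (self.acc << width) | value
        self.nbits += width
        while self.nbits >= 8:                 # flush every complete byte
            self.nbits -= 8
            self.stream.write(bytes([(self.acc >> self.nbits) & 0xFF]))
        self.acc &= (1 << self.nbits) - 1      # drop the bits already written

    def close(self):
        """Zero-pad the last partial byte; return how many of its bits were data."""
        used = self.nbits
        if used:
            self.stream.write(bytes([(self.acc << (8 - used)) & 0xFF]))
            self.acc = self.nbits = 0
        return used

out = io.BytesIO()               # stands in for open('data/bitfile.dat', 'wb')
bw = BitWriter(out)
bw.write(0xFA, 5)                # masked to 5 bits: 0b11010
bw.write(0x123456, 24)
tail_bits = bw.close()           # 29 bits written, so 5 data bits in the last byte
packed = out.getvalue()
```

    A matching reader would need the same bit order and some way to learn tail_bits, for example the extra trailing byte suggested above.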

    You could also give the class properties for common bit field widths, so that
    the above writes could look like

    bf.b5 = 0xfa
    bf.b24 = whatever

    The first assignment would write five bits (actually buffer them, since you have
    to do that for fractional bytes anyway, and will gain in i/o performance for
    larger chunks). On the read side, you might want to distinguish between signed
    and unsigned bitfields, e.g.,

    signed = bf.s5 # read next 5 bits as signed integer
    unsigned = bf.u5 # ditto, except unsigned

    Of course, packing bits together from a sequence of numbers into a string of bytes has
    nothing necessarily to do with file i/o, so you might want to factor that out. E.g.,
    you could take inspiration from struct to create something that works by bit fields, e.g.,
    say '.n' means pack n bits adjacent to previously buffered bits. Say ',n' means skip n
    bits as if you were reading or writing (introducing default 0 if not re-writing), and then
    use the struct type letters for alignment skips, e.g., 'h' to skip to end of current short,
    or 'l' to skip to end of current long. Then

    pack('<.3,2.7h.24l', x, y, z)

    could be a little-endian packing of size(short)+size(long) bits, with two fields
    x and y of 3 and 7 bits respectively, separated by a 2-bit space, packed into a short,
    followed by z packed into the bottom of a long, for six bytes total.

    Probably pack should be a class so that you get back an object that has both data bytes
    and total bit length and methods for convenient concatenation, so

    pack('.3', 10) + pack('.4', 15) == pack('.3.4', 10, 15)
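
    The concatenation idea could look something like the sketch below (Python 3). Everything here is hypothetical: the Bits class, the pack_bits helper that takes a list of widths in place of a '.3.4' format string, the MSB-first ordering, and the masking of oversized values (10 does not fit in 3 bits, so it is reduced to its low 3 bits).

```python
class Bits:
    """A run of bits: an integer value plus an explicit bit length."""

    def __init__(self, value=0, length=0):
        self.value = value & ((1 << length) - 1)   # keep only `length` bits
        self.length = length

    def __add__(self, other):
        # Append other's bits after ours; ours become the more significant bits.
        return Bits((self.value << other.length) | other.value,
                    self.length + other.length)

    def __eq__(self, other):
        if not isinstance(other, Bits):
            return NotImplemented
        return (self.value, self.length) == (other.value, other.length)

    def __hash__(self):
        return hash((self.value, self.length))

    def to_bytes(self):
        """Return the bits MSB-first, zero-padded on the right to a whole byte."""
        pad = -self.length % 8
        return (self.value << pad).to_bytes((self.length + pad) // 8, "big")


def pack_bits(widths, *values):
    """Hypothetical stand-in for pack('.3.4', ...): widths lists the field sizes."""
    result = Bits()
    for width, value in zip(widths, values):
        result = result + Bits(value, width)
    return result


combined = pack_bits([3], 10) + pack_bits([4], 15)
assert combined == pack_bits([3, 4], 10, 15)
```

    Carrying the bit length alongside the data is what makes concatenation well defined when a field does not end on a byte boundary.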

    Sorry I don't have time to implement this now (actually, I have a strictly-little-endian
    hack that I used for some music compression experiments a while back, maybe I can find
    it later). API preferences could probably stand a little discussion anyway ;-)

    Regards,
    Bengt Richter
     
    Bengt Richter, Sep 11, 2004
    #11
  12. I had similar problems, not for reading/writing binary files but to
    pack/unpack network binary messages, with bitfields in them, between
    big-endian and little-endian computers.

    I ended up writing a 'bitstring' class that is loaded with a bunch of
    bytes (e.g. the output of a socket.recv() or a file.read()) and is
    able to read/set any arbitrary group of bits in it in a portable way,
    which is not hard, since bit operators work on 'logical integers' and
    are independent of byte order, as others pointed out (I believe this is
    also true in C and other languages).

    I'm not showing my code, because since then I discovered that on the Net
    there are better implementations than mine :). Just google for "python bit
    manipulation" and you will find examples like this:

    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/113799

    which should work with arbitrary length bitstrings, since bit operators
    work on long integers, also :)
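
    For a flavour of the approach, here is a minimal sketch in Python 3. It is neither the code mentioned in this post nor the linked recipe. The function names are invented, and bit offsets are counted from the most significant bit of the first byte, which is an assumption about the layout.

```python
def get_bits(data, offset, width):
    """Return width bits of data, starting offset bits from the first byte's MSB."""
    total = len(data) * 8
    if offset < 0 or width < 0 or offset + width > total:
        raise ValueError("bit range outside the data")
    n = int.from_bytes(data, "big")          # one arbitrary-length integer
    return (n >> (total - offset - width)) & ((1 << width) - 1)

def set_bits(data, offset, width, value):
    """Return a copy of data with the given bit group replaced by value."""
    total = len(data) * 8
    if offset < 0 or width < 0 or offset + width > total:
        raise ValueError("bit range outside the data")
    shift = total - offset - width
    mask = ((1 << width) - 1) << shift
    n = int.from_bytes(data, "big")
    n = (n & ~mask) | ((value << shift) & mask)   # clear the field, then fill it
    return n.to_bytes(len(data), "big")

message = b"\xd0\x91\xa2\xb0"                # e.g. the result of a recv() or read()
first = get_bits(message, 0, 5)              # leading 5-bit field
second = get_bits(message, 5, 24)            # following 24-bit field
```

    The byte order is fixed once, in the conversion between bytes and integer, and everything after that is plain shifting and masking.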
     
    Francesco Bochicchio, Sep 11, 2004
    #12