python and bit shifts and byte order, oh my!

Discussion in 'Python' started by Reid Nichol, Sep 10, 2004.

  1. Reid Nichol

    Reid Nichol Guest

    I played with bit shifts on my PC and tried the same thing on a Mac OS X
    machine. They produced the same results, so I would assume that the way
    the bits and how they are interpreted are as in this link:

    http://www.khakipants.org/archives/2003/03/bitlevel_input_and_output.html

    At least in memory while the interpreter is running.

    It's my first attempt down there and I need to know. So, is this how
    Python works at the bit level?

    I know that storing the number on disk brings byte order into play. But that
    doesn't seem to apply here.

    Do I have the right thinking here? Or am I wrong? If so, how does
    it actually work?
     
    Reid Nichol, Sep 10, 2004
    #1

  2. Reid Nichol wrote:
    > I played with bit shifts on my PC and tried the same thing on a Mac OS X
    > machine. They produced the same results, so I would assume that the way
    > the bits and how they are interpreted are as in this link:
    >
    > http://www.khakipants.org/archives/2003/03/bitlevel_input_and_output.html
    >
    > At least in memory while the interpreter is running.


    CPUs differ in the way integers are stored in memory. But the shift
    operators of the CPU are implemented to work on logical integers in
    registers, not on consecutive bytes in memory.
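A small Python 3 check of this point (an illustrative sketch, not code from the thread): shifts and masks act on the integer's value, and the host byte order reported by `sys.byteorder` only matters once the value is converted to raw bytes.

```python
import sys

x = 0x12345678

# Shifts and masks operate on the numeric value, so these results are the
# same on a little-endian PC and on a big-endian PowerPC Mac.
high_byte = (x >> 24) & 0xFF
low_byte = x & 0xFF

# Byte order only shows up when the value is turned into a sequence of bytes.
native_bytes = x.to_bytes(4, sys.byteorder)

print(hex(high_byte), hex(low_byte))  # 0x12 0x78 on any platform
print(sys.byteorder, native_bytes.hex())
```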

    Daniel
     
    Daniel Dittmar, Sep 10, 2004
    #2

  3. Reid Nichol

    Reid Nichol Guest

    Daniel Dittmar wrote:
    >> I played with bit shifts on my PC and tried the same thing on a Mac OS
    >> X machine. They produced the same results, so I would assume that the
    >> way the bits and how they are interpreted are as in this link:
    >>
    >> http://www.khakipants.org/archives/2003/03/bitlevel_input_and_output.html
    >>
    >> At least in memory while the interpreter is running.

    >
    >
    > CPUs differ in the way integers are stored in memory. But the shift
    > operators of the CPU are implemented to work on logical integers in
    > registers, not on consecutive bytes in memory.
    >
    > Daniel


    I'm wondering because the file format I'm trying to write uses
    bit-packing, so I need to be able to write, for example, a 5 bit integer
    to the disk.

    I do think regardless of language I'm going to have an unfun time doing
    this. But, since cross-platform is a want approaching a need, I'd like
    to use Python.

    Would getting a specific bit from the integer be the same or would I
    have to worry about the byte-order?

    ie Would:
    x & SOMEBIT
    be portable?
     
    Reid Nichol, Sep 10, 2004
    #3
  4. On 2004-09-10, Reid Nichol wrote:

    >> CPUs differ in the way integers are stored in memory. But the shift
    >> operators of the CPU are implemented to work on logical integers in
    >> registers, not on consecutive bytes in memory.

    >
    > I'm wondering because the file format I'm trying to write uses
    > bit-packing, so I need to be able to write, for example, a 5 bit integer
    > to the disk.


    I presume by "to disk" you mean to a file in a filesystem.

    You can't do that under any OS and filesystem with which I'm
    familiar. You can only write an integral number of bytes to a
    file. You can only write an integral number of blocks to disk
    (a block is typically 512 bytes).

    > I do think regardless of language I'm going to have an unfun
    > time doing this. But, since cross platform is a want
    > approaching a need, I'd like to use Python.
    >
    > Would getting a specific bit from the integer be the same or
    > would I have to worry about the byte-order?


    You only have to worry about byte order when reading/writing
    binary objects from/to a file.
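To see where byte order does matter, here is a sketch in Python 3 (which has `int.to_bytes` and `int.from_bytes`; the thread predates them) that serializes one integer both ways and then misreads it:

```python
value = 0x00ABCDEF

# The same integer serialized with each byte order.
big = value.to_bytes(4, "big")        # most significant byte first
little = value.to_bytes(4, "little")  # least significant byte first

print(big.hex())     # 00abcdef
print(little.hex())  # efcdab00

# Reading the bytes back with the wrong order silently gives another number.
print(hex(int.from_bytes(big, "little")))  # 0xefcdab00
```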

    > ie Would:
    > x & SOMEBIT
    > be portable?


    Yes, that's portable.

    The code that writes x to a file and reads it from a file is
    what you have to worry about.
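Here is a minimal sketch of portable bit tests and field extraction. `SOMEBIT` and the helper names are hypothetical, and bit 0 is taken to be the least significant bit:

```python
def get_bit(x, n):
    """Return bit n of x (bit 0 is the least significant)."""
    return (x >> n) & 1

def get_field(x, shift, width):
    """Return the width-bit field of x starting at bit position shift."""
    return (x >> shift) & ((1 << width) - 1)

SOMEBIT = 1 << 3   # hypothetical mask selecting bit 3
x = 0b1011_0110

print(bool(x & SOMEBIT))   # False: bit 3 of 0b10110110 is 0
print(get_bit(x, 2))       # 1
print(get_field(x, 4, 4))  # 11 (0b1011, the high nibble)
```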

    --
    Grant Edwards (grante at visi.com)
    Yow! Put FIVE DOZEN red GIRDLES in each CIRCULAR OPENING!!
     
    Grant Edwards, Sep 10, 2004
    #4
  5. Reid Nichol

    Reid Nichol Guest

    Grant Edwards wrote:
    >>ie Would:
    >>x & SOMEBIT
    >>be portable?

    >
    >
    > Yes, that's portable.

    Great!

    >
    > The code that writes x to a file and reads it from a file is
    > what you have to worry about.
    >


    So, I have to handle the byte order myself during file i/o.


    Thank you Grant and Daniel! I cleared up a bunch of my fuzziness :)
     
    Reid Nichol, Sep 10, 2004
    #5
  6. Phil Frost

    Phil Frost Guest

    The standard 'struct' module provides methods to specify the byte order
    of the input.
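For example, here is a sketch using the format prefixes documented for `struct` (`<` little-endian, `>` or `!` big-endian, each with standard sizes). In Python 3, `pack` returns `bytes` rather than `str`:

```python
import struct

n = 1
# 'I' is an unsigned 4-byte integer when a standard-size prefix is used.
print(struct.pack("<I", n))  # b'\x01\x00\x00\x00'
print(struct.pack(">I", n))  # b'\x00\x00\x00\x01'

# unpack applies the byte order you specify, whatever the host order is.
print(struct.unpack(">I", b"\x00\x00\x00\x01"))  # (1,)
```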

    On Fri, Sep 10, 2004 at 02:13:52PM -0500, Reid Nichol wrote:
    >
    > >
    > >The code that writes x to a file and reads it from a file is
    > >what you have to worry about.
    > >

    >
    > So, I have to handle the byte order myself during file i/o.
    >
    >
    > Thank you Grant and Daniel! I cleared up a bunch of my fuzziness :)
     
    Phil Frost, Sep 10, 2004
    #6
  7. Reid Nichol

    Reid Nichol Guest

    Phil Frost wrote:
    > The standard 'struct' module provides methods to specify the byte order
    > of the input.
    >
    > On Fri, Sep 10, 2004 at 02:13:52PM -0500, Reid Nichol wrote:
    >
    >>>The code that writes x to a file and reads it from a file is
    >>>what you have to worry about.
    >>>

    >>
    >>So, I have to handle the byte order myself during file i/o.
    >>
    >>
    >>Thank you Grant and Daniel! I cleared up a bunch of my fuzziness :)


    It's my understanding that pack and unpack of the struct module return
    strings and not rearranged integers.

    At any rate I would rather do it myself if only to teach myself
    something about this.
     
    Reid Nichol, Sep 10, 2004
    #7
  8. On 2004-09-10, Reid Nichol wrote:

    > It's my understanding that pack and unpack of the struct module return
    > strings and not rearranged integers.


    Partly right. Pack returns a string. Unpack returns whatever
    you tell it to return (integers, floats, etc.).

    When reading a 4-byte integer from a file:

    Read a string of 4 bytes from the file.

    Use struct.unpack() on the string to convert it to an integer
    object.

    When writing a 4-byte integer to a file:

    Use struct.pack() to convert the integer object into a 4-byte
    string with the desired byte order.

    Write the 4-byte string to the file.
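Those steps might look like this in Python 3. The helper names `write_u32`/`read_u32` and the choice of big-endian order are assumptions for illustration, and `io.BytesIO` stands in for a real file opened in binary mode:

```python
import io
import struct

def write_u32(f, value):
    # pack() turns the integer into exactly 4 bytes, big-endian ('>').
    f.write(struct.pack(">I", value))

def read_u32(f):
    data = f.read(4)
    if len(data) != 4:
        raise EOFError("expected 4 bytes, got %d" % len(data))
    # unpack() always returns a tuple, even for a single value.
    (value,) = struct.unpack(">I", data)
    return value

buf = io.BytesIO()          # stands in for open(..., 'wb') / open(..., 'rb')
write_u32(buf, 0xDEADBEEF)
buf.seek(0)
print(hex(read_u32(buf)))   # 0xdeadbeef
```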

    > At any rate I would rather do it myself if only to teach
    > myself something about this.


    Do what yourself?

    --
    Grant Edwards (grante at visi.com)
    Yow! NOW, I'm supposed to SCRAMBLE two, and HOLD th' MAYO!!
     
    Grant Edwards, Sep 10, 2004
    #8
  9. Jason Lai

    Jason Lai Guest

    Reid Nichol wrote:
    > Phil Frost wrote:
    >
    >> The standard 'struct' module provides methods to specify the byte order
    >> of the input.
    >>
    >> On Fri, Sep 10, 2004 at 02:13:52PM -0500, Reid Nichol wrote:
    >>
    >>>> The code that writes x to a file and reads it from a file is
    >>>> what you have to worry about.
    >>>>
    >>>
    >>> So, I have to handle the byte order myself during file i/o.
    >>>
    >>>
    >>> Thank you Grant and Daniel! I cleared up a bunch of my fuzziness :)

    >
    >
    > It's my understanding that pack and unpack of the struct module return
    > strings and not rearranged integers.
    >
    > At any rate I would rather do it myself if only to teach myself
    > something about this.


    pack returns a string, which is a sequence of characters that also
    happen to be bytes. So you write the string (sequence of bytes) to the
    file. I don't think you usually need to rearrange integers in memory;
    it's mainly an issue when you're writing to disk.

    unpack returns a tuple of objects according to the format string. So
    struct.unpack("!i", "abcd") returns (1633837924,). In your case, you'd
    read in 4 bytes and unpack it to a 32-bit int.
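Here is the same check in Python 3, where `struct.unpack` needs a `bytes` argument. The arithmetic is done by hand for comparison:

```python
import struct

# '!' is network (big-endian) order; 'i' is a signed 4-byte integer.
result = struct.unpack("!i", b"abcd")
print(result)  # (1633837924,)

# The same number assembled by hand: 'a'=0x61, 'b'=0x62, 'c'=0x63, 'd'=0x64.
print(0x61 << 24 | 0x62 << 16 | 0x63 << 8 | 0x64)  # 1633837924
```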

    If efficiency isn't important, you could forget about the whole
    byte-order thing and just read/write it byte-by-byte. Then you can think
    of the file as a bit-stream (everything gets written in order and read
    back in order), although you still have to read/write a whole 8-bit byte
    at a time.
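To handle byte order yourself rather than through `struct`, you can work byte by byte with shifts and masks. This sketch uses hypothetical helper names and assumes big-endian order:

```python
def int_to_bytes_be(value, nbytes):
    """Split value into nbytes bytes, most significant byte first."""
    out = bytearray()
    for i in reversed(range(nbytes)):
        out.append((value >> (8 * i)) & 0xFF)
    return bytes(out)

def bytes_to_int_be(data):
    """Rebuild an integer from bytes stored most significant byte first."""
    value = 0
    for b in data:  # iterating over bytes yields ints in Python 3
        value = (value << 8) | b
    return value

encoded = int_to_bytes_be(0x123456, 3)  # a 24-bit value
print(encoded.hex())                    # 123456
print(hex(bytes_to_int_be(encoded)))    # 0x123456
```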

    - Jason Lai
     
    Jason Lai, Sep 10, 2004
    #9
  10. Reid Nichol

    Reid Nichol Guest

    Jason Lai wrote:
    > If efficiency isn't important, you could forget about the whole
    > byte-order thing and just read/write it byte-by-byte. Then you can think
    > of the file as a bit-stream (everything gets written in order and read
    > back in order), although you still have to read/write a whole 8-bit byte
    > at a time.
    >
    > - Jason Lai


    Since the format can have:
    5bit
    24bit
    24bit

    I assumed that I would have to write byte by byte. And I don't really
    consider speed important so I think that it's viable to do it this way.
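Here is a rough sketch of how those fields could be laid out. It assumes the format stores fields most significant bit first and zero-pads the end to a whole byte; the thread states neither. The three fields take 53 bits, which round up to 7 bytes:

```python
def pack_fields(fields):
    """Pack (value, width) pairs MSB-first into bytes, zero-padding the end.

    Assumes every value is unsigned and fits in its width.
    """
    acc = 0
    nbits = 0
    for value, width in fields:
        if value < 0 or value >= (1 << width):
            raise ValueError("%r does not fit in %d bits" % (value, width))
        acc = (acc << width) | value
        nbits += width
    pad = -nbits % 8  # zero bits needed to reach a whole byte
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")

data = pack_fields([(0b10101, 5), (0xABCDEF, 24), (0x123456, 24)])
print(len(data))  # 7: 53 bits rounded up to 56
```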

    @Grant
    This is what I meant.
     
    Reid Nichol, Sep 10, 2004
    #10
  11. On Fri, 10 Sep 2004 15:51:33 -0500, Reid Nichol wrote:

    >Jason Lai wrote:
    >> If efficiency isn't important, you could forget about the whole
    >> byte-order thing and just read/write it byte-by-byte. Then you can think
    >> of the file as a bit-stream (everything gets written in order and read
    >> back in order), although you still have to read/write a whole 8-bit byte
    >> at a time.
    >>
    >> - Jason Lai

    >
    >Since the format can have:
    >5bit
    >24bit
    >24bit
    >
    >I assumed that I would have to write byte by byte. And I don't really
    >consider speed important so I think that it's viable to do it this way.
    >
    >@Grant
    >This is what I meant.


    I would suggest you define a class (e.g., subclass the builtin file type)
    that serves as a convenient (for you) bit-wise interface to a binary file
    (binary mode is important on Windows, or you will get EOL conversions when you write).
    E.g., so you will be able to write code like:

    bf = BitFile('data/bitfile.dat', 'wb')
    bf.write(0xfa, 5)
    bf.write(whatever, 24)
    bf.close()
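Here is a minimal sketch of such a class. It makes several assumptions that the post leaves open:

- It wraps a binary file object instead of subclassing `file`, which is no longer a built-in type in Python 3.
- It writes bits most significant first.
- It keeps only the low `width` bits of each value.
- It pads the last byte with zeros.

`BitWriter` is a hypothetical name:

```python
import io

class BitWriter:
    """Hypothetical bit-wise writer around a binary file object."""

    def __init__(self, f):
        self.f = f      # any object with write(bytes), e.g. open(..., 'wb')
        self.acc = 0    # pending bits not yet written
        self.nbits = 0  # how many bits are pending in acc

    def write(self, value, width):
        # Keep only the low `width` bits, so write(0xfa, 5) stores 0b11010.
        self.acc = (self.acc << width) | (value & ((1 << width) - 1))
        self.nbits += width
        # Emit every complete byte from the top of the accumulator.
        while self.nbits >= 8:
            self.nbits -= 8
            self.f.write(bytes([(self.acc >> self.nbits) & 0xFF]))
        self.acc &= (1 << self.nbits) - 1  # drop bits already written

    def close(self):
        # Pads and writes a partial last byte; does not close self.f.
        if self.nbits:
            self.f.write(bytes([(self.acc << (8 - self.nbits)) & 0xFF]))
            self.acc = self.nbits = 0

buf = io.BytesIO()
bw = BitWriter(buf)
bw.write(0xfa, 5)
bw.write(0x123456, 24)
bw.close()
print(buf.getvalue().hex())  # d091a2b0 (29 bits plus 3 padding bits)
```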

    The class will have to take care of buffering, packing and unpacking, endianness,
    and how to deal with a file whose total length is not a multiple of 8 bits (if you
    are defining the format, you could always append an extra byte on close that says how many bits
    there are in the last (preceding) data byte, so you could read back exactly the bits specified).
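The trailing count byte could work roughly as follows. For readability, this sketch represents bits as a string of '0'/'1' characters. The layout (data bytes, then one byte giving the number of used bits in the last data byte, or 0 if there is no data) is one hypothetical reading of the suggestion:

```python
def encode_bits(bits):
    """Pack a '0'/'1' string into bytes plus a trailing count byte."""
    padded = bits + "0" * (-len(bits) % 8)
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    # Bits used in the last data byte: 1..8, or 0 when there is no data.
    used = len(bits) % 8 or (8 if bits else 0)
    return data + bytes([used])

def decode_bits(blob):
    """Inverse of encode_bits: return exactly the bits that were stored."""
    data, used = blob[:-1], blob[-1]
    bits = "".join(format(b, "08b") for b in data)
    if data:
        bits = bits[:len(bits) - (8 - used)]  # drop the padding bits
    return bits

blob = encode_bits("10110" + "1" * 8)  # 13 bits
print(blob.hex())                      # b7f805: two data bytes, then 5
print(decode_bits(blob))               # 1011011111111
```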

    You could also give the class properties for common bit field widths, so that the
    above writes could look like

    bf.b5 = 0xfa
    bf.b24 = whatever

    where assigning to bf.b5 would write (actually buffer, since you have to do that for fractional bytes
    anyway, and you will gain in i/o performance for larger chunks) five bits. On the read
    side, you might want to distinguish between signed and unsigned bitfields, e.g.,

    signed = bf.s5 # read next 5 bits as signed integer
    unsigned = bf.u5 # ditto, except unsigned

    Of course, packing bits together from a sequence of numbers into a string of bytes has
    nothing necessarily to do with file i/o, so you might want to factor that out. E.g.,
    you could take inspiration from struct to create something that works by bit fields, e.g.,
    say '.n' means pack n bits adjacent to previously buffered bits. Say ',n' means skip n
    bits as if you were reading or writing (introducing default 0 if not re-writing), and then
    use the struct type letters for alignment skips, e.g., 'h' to skip to end of current short,
    or 'l' to skip to end of current long. Then

    pack('<.3,2.7h.24l', x, y, z)

    could be a little-endian packing into size(short)+size(long) bytes, with two fields
    x and y of 3 and 7 bits respectively, separated by a 2-bit space, packed into a short,
    followed by z packed into the bottom of a long, for six bytes total.
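One reading of that hypothetical format string fills fields from bit 0 upward within each little-endian unit. Under that reading, the same six bytes can be built with plain shifts and `struct`. The function name and field positions are illustrative assumptions:

```python
import struct

def pack_example(x, y, z):
    # Short: x in bits 0-2, bits 3-4 skipped (left 0), y in bits 5-11,
    # bits 12-15 padding up to the end of the short.
    short = (x & 0x7) | ((y & 0x7F) << 5)
    # Long: z in bits 0-23, bits 24-31 padding.
    long_ = z & 0xFFFFFF
    # '<' = little-endian, standard sizes, no alignment padding:
    # 'H' is 2 bytes and 'I' is 4 bytes, 6 bytes in total.
    return struct.pack("<HI", short, long_)

data = pack_example(5, 100, 0xABCDEF)
print(len(data), data.hex())  # 6 850cefcdab00
```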

    Probably pack should be a class so that you get back an object that has both data bytes
    and total bit length and methods for convenient concatenation, so

    pack('.3', 10) + pack('.4', 15) == pack('.3.4', 10, 15)
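Here is a sketch of such an object. It assumes most-significant-first concatenation, and it truncates values wider than their field, so a 3-bit field given 10 keeps only 0b010. `Bits` is a hypothetical name, and its bit order differs from the little-endian hack mentioned below:

```python
class Bits:
    """Hypothetical value type: an integer payload plus its length in bits."""

    def __init__(self, value, nbits):
        self.value = value & ((1 << nbits) - 1)  # truncate to nbits
        self.nbits = nbits

    def __add__(self, other):
        # The left operand's bits come first (most significant).
        return Bits((self.value << other.nbits) | other.value,
                    self.nbits + other.nbits)

    def __eq__(self, other):
        return (self.value, self.nbits) == (other.value, other.nbits)

    def to_bytes(self):
        pad = -self.nbits % 8  # zero-fill the last partial byte
        return (self.value << pad).to_bytes((self.nbits + pad) // 8, "big")

a = Bits(5, 3) + Bits(15, 4)
print(a.nbits, bin(a.value))  # 7 0b1011111
print(a.to_bytes().hex())     # be
```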

    Sorry I don't have time to implement this now (actually, I have a strictly-little-endian
    hack that I used for some music compression experiments a while back, maybe I can find
    it later). API preferences could probably stand a little discussion anyway ;-)

    Regards,
    Bengt Richter
     
    Bengt Richter, Sep 11, 2004
    #11
  12. On Thu, 09 Sep 2004 22:21:42 -0500, Reid Nichol wrote:

    > I played with bit shifts on my PC and tried the same thing on a Mac OS X
    > machine. They produced the same results, so I would assume that the way
    > the bits and how they are interpreted are as in this link:
    >
    > http://www.khakipants.org/archives/2003/03/bitlevel_input_and_output.html
    >
    > At least in memory while the interpreter is running.
    >
    > It's my first attempt down there and I need to know. So, is this how
    > Python works at the bit level?
    >


    I had similar problems, not with reading/writing binary files but with
    packing/unpacking binary network messages, with bitfields in them, between
    big-endian and little-endian computers.

    I ended up writing a 'bitstring' class that is loaded with a bunch of
    bytes (e.g. the output of socket.recv() or file.read()) and is
    able to read/set any arbitrary group of bits in it in a portable way,
    which is not hard, since bit operators work on 'logical integers' and
    are independent of byte order, as others pointed out (I believe this is
    also true in C and other languages).

    I'm not showing my code, because since then I discovered that on the Net
    there are better implementations than mine :). Just google for "python bit
    manipulation" and you will find examples like this:

    http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/113799

    which should work with arbitrary length bitstrings, since bit operators
    work on long integers, also :)
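Here is a minimal sketch of such a class. It is not the recipe linked above and has no bounds checking. It treats the whole buffer as one Python integer and numbers bits from the most significant bit of the first byte:

```python
class BitBuffer:
    """Hypothetical sketch: read/set arbitrary bit ranges in a byte buffer."""

    def __init__(self, data):
        self.nbits = len(data) * 8
        self.value = int.from_bytes(data, "big")  # whole buffer as one int

    def get(self, offset, width):
        # offset 0 is the first bit on the wire (MSB of the first byte).
        shift = self.nbits - offset - width
        return (self.value >> shift) & ((1 << width) - 1)

    def set(self, offset, width, field):
        shift = self.nbits - offset - width
        mask = ((1 << width) - 1) << shift
        self.value = (self.value & ~mask) | ((field << shift) & mask)

    def to_bytes(self):
        return self.value.to_bytes(self.nbits // 8, "big")

buf = BitBuffer(b"\xd0\x91\xa2\xb0")
print(buf.get(0, 5))         # 26 (0b11010)
print(hex(buf.get(5, 24)))   # 0x123456
buf.set(0, 5, 0b00001)
print(buf.to_bytes().hex())  # 0891a2b0
```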
     
    Francesco Bochicchio, Sep 11, 2004
    #12
