Since the format can have:
5bit
24bit
24bit
I assumed that I would have to write byte by byte. And I don't really
consider speed important so I think that it's viable to do it this way.
@Grant
This is what I meant.
I would suggest you define a class (e.g., subclass the builtin file type)
that serves as a convenient (for you) bit-wise interface to a binary file
(binary is important on windows, or you will EOL conversions when you write).
E.g., so you will be able to write code like:
bf = BitFile('data/bitfile.dat', 'wb')
bf.write(0xfa, 5)
bf.write(whatever, 24)
bf.close()
The class will have to take care of buffering and packing and unpacking and endianness
and how to deal with a file that is not an integral number times 8 bits total (if you
are defining the format, you could always append an extra byte on close that says how many bits
there are in the last (preceding) data byte, so you could read back exactly the bits specified).
You could also give the class properties for common bit field widths, so that the
effect of e.g., the above writes would look like
bf.b5 = 0xfa
bf.b24 = whatever
would be to write (actually buffer, since you have to do that for fractional bytes
anyway, and will gain in i/o performance for larger chunks) five bits. On the read
side, you might want to distinguish between signed and unsigned bitfields, e.g.,
signed = bf.s5 # read next 5 bits as signed integer
unsigned = bf.u5 # ditto, except unsigned
Of course, packing bits together from a sequence of numbers into a string of bytes has
nothing necessarily to do with file i/o, so you might want to factor that out. E.g.,
you could take inspiration from struct to create something that works by bit fields, e.g.,
say '.n' means pack n bits adjacent to previously buffered bits. Say ',n' means skip n
bits as if you were reading or writing (introducing default 0 if not re-writing), and then
use the struct type letters for alignment skips, e.g., 'h' to skip to end of current short,
or 'l' to skip to end of current long. Then
pack('<.3,2.7h.24l', x, y, z)
could be a little-endian packing of size(short)+size(long) bits, with two fields
x and y of 3 and 7 bits respectively, separated by a 2-bit space, packed into a short,
followed z packed into the bottom of a long, for six bytes total.
Probably pack should be a class so that you get back an object that has both data bytes
and total bit length and methods for convenient concatenation, so
pack('.3', 10) + pack('.4', 15) == pack('.3.4', 10, 15)
Sorry I don't have time to implement this now (actually, I have a strictly-little-endian
hack that I used for some music compression experiments a while back, maybe I can find
it later). API preferences could probably stand a little discussion anyway ;-)
Regards,
Bengt Richter