Reading file bit by bit

Alfred Bovin

Hi all.

I'm working on something where I need to read a (binary) file bit by bit and
do something depending on whether the bit is 0 or 1.

Any help on doing the actual file reading is appreciated.

Thanks in advance
 
Ulrich Eckhardt

Alfred said:
I'm working on something where I need to read a (binary) file bit by bit
and do something depending on whether the bit is 0 or 1.

Well, the smallest unit you can read is an octet/byte. You then check the
individual bits of the byte using binary masks.


f = open(...)
data = f.read()
for byte in data:
    for i in range(8):
        bit = 2**i & byte
        ...

Uli
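
For readers on Python 3, the masking idea above could be sketched roughly as follows. This is only an illustration, not the poster's exact code: the helper name bits_lsb_first and the sample byte are made up for the example. Note that a mask test such as 2**i & byte yields either 0 or the mask value itself, so the sketch normalises the result to 0 or 1.

```python
def bits_lsb_first(data):
    """Yield each bit of `data` (a bytes object), least significant bit first."""
    for byte in data:            # Python 3: iterating bytes yields ints, no ord() needed
        for i in range(8):
            mask = 1 << i        # same value as 2**i
            # byte & mask is either 0 or the mask itself, so normalise to 0/1
            yield 1 if byte & mask else 0


sample = bytes([0b11001010])     # hypothetical one-byte input
print(list(bits_lsb_first(sample)))   # [0, 1, 0, 1, 0, 0, 1, 1]
```

With a real file, the bytes would come from open(path, "rb").read() instead of the hard-coded sample.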
 
Peter Otten

Alfred said:
I'm working on something where I need to read a (binary) file bit by bit
and do something depending on whether the bit is 0 or 1.

Any help on doing the actual file reading is appreciated.

The logical unit in which files are written is the byte. You can split the
bytes into 8 bits...
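>>> def bits(f):
...     while True:
...         b = f.read(1)
...         if not b: break
...         b = ord(b)
...         for i in range(8):
...             yield b & 1
...             b >>= 1
...
>>> with open("tmp.dat", "wb") as f: # create a file with some example data
...     f.write(chr(0b11001010)+chr(0b10101111))
...
>>> with open("tmp.dat", "rb") as f:
...     for bit in bits(f):
...         print bit
...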
0
1
0
1
0
0
1
1
1
1
1
1
0
1
0
1

but that's a very inefficient approach. If you explain what you are planning
to do we can most certainly come up with a better alternative.

Peter
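
One likely cost in a loop like the one above is the read(1) call for every single byte. A possible variation, sketched here for Python 3, reads the file in larger chunks and keeps the same LSB-first bit order. The chunk_size parameter and the use of io.BytesIO as a stand-in for a real file are assumptions made for the example, not part of the original post.

```python
import io


def bits(f, chunk_size=4096):
    """Yield the bits of binary file object `f`, LSB first within each byte."""
    while True:
        chunk = f.read(chunk_size)   # one read call per chunk instead of per byte
        if not chunk:                # an empty bytes object means end of file
            break
        for byte in chunk:           # Python 3: each item is already an int
            for i in range(8):
                yield (byte >> i) & 1


# io.BytesIO stands in for a file opened with open(path, "rb")
f = io.BytesIO(bytes([0b11001010, 0b10101111]))
print(list(bits(f)))
```

Whether this is fast enough depends on what is done with each bit, which is why knowing the actual task matters.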
 
Richard Thomas

The logical unit in which files are written is the byte. You can split the
bytes into 8 bits...


>>> def bits(f):
...     while True:
...         b = f.read(1)
...         if not b: break
...         b = ord(b)
...         for i in range(8):
...             yield b & 1
...             b >>= 1
...
>>> with open("tmp.dat", "wb") as f: # create a file with some example data
...     f.write(chr(0b11001010)+chr(0b10101111))
...
>>> with open("tmp.dat", "rb") as f:
...     for bit in bits(f):
...         print bit
...
0
1
0
1
0
0
1
1
1
1
1
1
0
1
0
1

but that's a very inefficient approach. If you explain what you are planning
to do we can most certainly come up with a better alternative.

Peter

You're reading those bits backwards. You want to read the most
significant bit of each byte first...

Richard.
 
Ulrich Eckhardt

Ulrich said:
data = f.read()
for byte in data:
    for i in range(8):
        bit = 2**i & byte
        ...

Correction: Of course you have to use ord() to get from the single-element
string ("byte" above) to its integral value first.

Uli
 
Peter Otten

Richard said:
You're reading those bits backwards. You want to read the most
significant bit of each byte first...

Richard.
>>> def bits(f):
...     while True:
...         byte = f.read(1)
...         if not byte: break
...         byte = ord(byte)
...         for i in reversed(range(8)):
...             yield byte >> i & 1
...
>>> with open("tmp.dat", "rb") as f:
...     for bit in bits(f):
...         print bit,
...
1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1
 
Nobody

You're reading those bits backwards. You want to read the most
significant bit of each byte first...

Says who?

There is no universal standard for bit-order.

Among bitmap image formats, XBM is LSB-first while BMP and PBM are
MSB-first. OpenGL reads or writes bitmap data in either order, controlled
by glPixelStorei().

Most serial communication links (e.g. RS-232, ethernet) transmit the LSB
first, although there are exceptions (e.g. I2C uses MSB-first).
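
Since the order depends on the format being read, one option is to make it an explicit choice in the code. The following Python 3 sketch assumes a hypothetical helper byte_bits with an msb_first flag; neither name comes from the thread.

```python
def byte_bits(byte, msb_first=True):
    """Return the 8 bits of an int in 0..255 as a list, in the requested order."""
    positions = range(7, -1, -1) if msb_first else range(8)
    return [(byte >> i) & 1 for i in positions]


print(byte_bits(0b11001010))                   # [1, 1, 0, 0, 1, 0, 1, 0]
print(byte_bits(0b11001010, msb_first=False))  # [0, 1, 0, 1, 0, 0, 1, 1]
```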
 
Ulrich Eckhardt

Nobody said:
Says who?

Says Python:
'0x11000000'

That said, I totally agree that there is no inherently right way and I guess
Richard was just a smiley or two short in order to have correct markup in
his not-so-serious posting.

:^)

Uli
 
Ulrich Eckhardt

Peter said:
Hmm, if that's what /your/ Python says, here's mine to counter:

'0_totally_faked_binary_00000011'

Argh! Of course one of my Pythons says '0b11000000' and not what I mistyped
above.... =(

Uli
*goes and hides under a stone*
 
Ulrich Eckhardt

superpollo said:
mine goes like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'bin' is not defined

Yep, one of mine, too. The "bin" function was new in 2.6, as were binary
number literals ("0b1100").

Uli
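
As a small illustration of the two features mentioned above, here is a sketch that uses a binary literal and bin(), followed by a hand-rolled conversion that does not rely on bin(). The helper to_binary is hypothetical and only meant to show the idea.

```python
value = 0b11000000           # binary literal, needs Python 2.6+ or 3.x
print(value)                 # 192
print(bin(value))            # 0b11000000


def to_binary(n, width=8):
    """Hand-rolled conversion: n as a zero-padded binary string, MSB first."""
    return "".join(str((n >> i) & 1) for i in range(width - 1, -1, -1))


print(to_binary(192))        # 11000000
```

On an interpreter older than 2.6 the literal on the first line would have to be written as 192, since the 0b syntax is not available there either.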
 
Grant Edwards

You're reading those bits backwards. You want to read the most
significant bit of each byte first...

Can you explain the reasoning behind that assertion?
 
Terry Reedy

Correction: Of course you have to use ord() to get from the single-element
string ("byte" above) to its integral value first.

In Py3 (OP did not specify), a binary file is read as bytes, which is a
sequence of ints, and one would have to not use ord() ;=)

tjr
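
To make the difference concrete, here is a minimal sketch of a conversion that tolerates both behaviours. The helper name to_int and the sample bytes are assumptions for the example.

```python
data = b"\xca\xaf"           # hypothetical two-byte sample


def to_int(item):
    """Accept an int (a Python 3 bytes item) or a length-1 string/bytes object."""
    return item if isinstance(item, int) else ord(item)


print([to_int(b) for b in data])   # [202, 175]
print(to_int(data[:1]))            # 202
```

In Python 3, iterating over data already yields ints, so the ord() branch is only taken for length-1 slices such as data[:1].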
 
Martin

Hi all.

I'm working on something where I need to read a (binary) file bit by bit and
do something depending on whether the bit is 0 or 1.

Any help on doing the actual file reading is appreciated.

Thanks in advance

Hi,

Have you looked at the numpy libraries?

It would be very easy to do...

import numpy as np

f = open("something.bin", "rb")
data = np.fromfile(f, np.uint8)
data = np.where(data == 0, data * 5, data)

So in this example I am just saying: wherever a value in data equals 0,
multiply it by 5. This avoids the need for slow loops as well.

Mart.
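
The example above works on whole bytes. For the bit-level question in this thread, NumPy also offers np.unpackbits, which expands a uint8 array into one 0/1 entry per bit. The following is a rough sketch with made-up sample data rather than a real file.

```python
import numpy as np

# Hypothetical sample bytes; with a real file one could use
# np.fromfile("something.bin", dtype=np.uint8) instead.
data = np.array([0b11001010, 0b10101111], dtype=np.uint8)

bits = np.unpackbits(data)   # one 0/1 entry per bit, MSB first within each byte
print(bits.tolist())         # [1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1]

# Boolean mask of the set bits; usable to act on ones and zeros without a loop
ones = bits == 1
print(int(ones.sum()))       # 10
```

Recent NumPy versions (1.17 and later, as far as I know) also accept bitorder="little" in np.unpackbits for LSB-first output.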
 
