How to Read Bytes from a file


gregpinero

It seems like this would be easy but I'm drawing a blank.

What I want to do is be able to open any file in binary mode, and read
in one byte (8 bits) at a time and then count the number of 1 bits in
that byte.

I got as far as this but it is giving me strings and I'm not sure how
to accurately get to the byte/bit level.

    f1=file('somefile','rb')
    while 1:
        abyte=f1.read(1)

Thanks in advance for any help.

-Greg
 

Alex Martelli


You should probably prepare, before the loop, a mapping from char to number
of 1 bits in that char:

    m = {}
    for c in range(256):
        m[c] = countones(c)

and then sum up the values of m[abyte] into a running total (break from
the loop when 'not abyte', i.e. you're reading 0 bytes even though
asking for 1 -- that tells you the file is finished; remember to close
it).

A trivial way to do the countones function:

    def countones(x):
        assert x>=0
        c = 0
        while x:
            c += (x&1)
            x >>= 1
        return c

you just don't want to call it too often, whence the previous advice to
call it just 256 times to prep a mapping.
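
Putting the two pieces together, a minimal sketch might look like the following. It assumes Python 3, where iterating over a bytes object yields ints in 0..255, so the table can be a plain list indexed by byte value. The helper name count_one_bits and the chunk size are illustrative choices, not something from this thread.

```python
def countones(x):
    """Count the 1 bits in a non-negative integer."""
    assert x >= 0
    c = 0
    while x:
        c += x & 1
        x >>= 1
    return c

# Precompute the bit count of every possible byte value once (256 calls).
ONES = [countones(i) for i in range(256)]

def count_one_bits(path, chunk_size=65536):
    """Return the total number of 1 bits in the file at `path` (hypothetical helper)."""
    total = 0
    with open(path, 'rb') as f:        # the with-block closes the file
        while True:
            chunk = f.read(chunk_size)
            if not chunk:              # empty read: end of file
                break
            for b in chunk:            # Python 3: each b is an int in 0..255
                total += ONES[b]
    return total
```

Reading in chunks rather than one byte per call only reduces the number of read() calls; the per-byte lookup is the same.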

If you download and install gmpy you can use gmpy.popcount as a fast
implementation of countones:).


Alex
 

Leif K-Brooks

Alex said:
You should probably prepare, before the loop, a mapping from char to number
of 1 bits in that char:

    m = {}
    for c in range(256):
        m[c] = countones(c)

Wouldn't a list be more efficient?

m = [countones(c) for c in xrange(256)]
 

Bart Ogryczak


    import struct
    buf = open('somefile','rb').read()
    count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                        (x&64>0)+(x&128>0))
    byteOnes = map(count1,struct.unpack('B'*len(buf),buf))

byteOnes[n] is the number of ones in byte n.
 

Jussi Salmela

Bart Ogryczak wrote:
    import struct
    buf = open('somefile','rb').read()
    count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                        (x&64>0)+(x&128>0))
    byteOnes = map(count1,struct.unpack('B'*len(buf),buf))

byteOnes[n] is the number of ones in byte n.

I guess struct.unpack is not necessary, because:

byteOnes2 = map(count1, (ord(ch) for ch in buf))

seems to do the trick also.

Cheers,
Jussi
 

Alex Martelli

Leif K-Brooks said:
Alex said:
You should probably prepare, before the loop, a mapping from char to number
of 1 bits in that char:

    m = {}
    for c in range(256):
        m[c] = countones(c)

Wouldn't a list be more efficient?

m = [countones(c) for c in xrange(256)]

Yes, or an array.array -- actually I meant to use m[chr(c)] above (so
you could use the character you're reading directly to index m, rather
than calling ord(byte) a bazillion times for each byte you're reading),
but if you're using the numbers (as I did before) a list or array is
better.
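
As a sketch of the character-keyed variant, assuming Python 3, where f.read(1) returns a length-1 bytes object rather than a str: the table can be keyed by those one-byte objects, so no ord() call is needed per byte. The names by_char and total_ones are made up for this illustration.

```python
import io

# Key the table by the one-byte objects that read(1) returns
# (Python 3 assumption: bytes([c]) is the single byte with value c).
by_char = {bytes([c]): bin(c).count('1') for c in range(256)}

def total_ones(f):
    """Sum the 1 bits of a binary file object, one byte at a time."""
    total = 0
    while True:
        abyte = f.read(1)
        if not abyte:           # b'' means end of file
            break
        total += by_char[abyte]
    return total

# Usage with an in-memory stand-in for a file opened in 'rb' mode.
print(total_ones(io.BytesIO(b'\x01\x03\xff')))  # 1 + 2 + 8 = 11
```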


Alex
 

gregpinero


    import struct
    buf = open('somefile','rb').read()
    count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                        (x&64>0)+(x&128>0))
    byteOnes = map(count1,struct.unpack('B'*len(buf),buf))

byteOnes[n] is the number of ones in byte n.


This solution looks nice, but how does it work? I'm guessing
struct.unpack will provide me with 8 bit bytes (will this work on any
system?)

How does count1 work exactly?

Thanks for the help.

-Greg
 

John Machin

import struct
buf = open('somefile','rb').read()
count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                    (x&64>0)+(x&128>0))
byteOnes = map(count1,struct.unpack('B'*len(buf),buf))

byteOnes = map(count1,struct.unpack('%dB'%len(buf),buf))
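
The two format strings describe the same layout: a repeat count in front of a format character is shorthand for repeating that character. A quick check, assuming Python 3 and a made-up sample buffer in place of the file contents:

```python
import struct

buf = b'\x00\x01\x80\xff'   # sample data standing in for the file contents

long_form = struct.unpack('B' * len(buf), buf)      # format 'BBBB'
short_form = struct.unpack('%dB' % len(buf), buf)   # format '4B'

print(long_form)                  # (0, 1, 128, 255)
print(long_form == short_form)    # True
# In Python 3 a bytes object already iterates as ints, so this matches too.
print(short_form == tuple(buf))   # True
```

The counted form keeps the format string short no matter how large the file is.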
 

Bart Ogryczak

On Mar 1, 7:52 am, gregpinero wrote:
import struct
buf = open('somefile','rb').read()
count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                    (x&64>0)+(x&128>0))
byteOnes = map(count1,struct.unpack('B'*len(buf),buf))
byteOnes[n] is the number of ones in byte n.

This solution looks nice, but how does it work? I'm guessing
struct.unpack will provide me with 8 bit bytes


unpack with the 'B' format gives you an int value equivalent to an unsigned
char (1 byte).

(will this work on any system?)

Any system with 8-bit bytes, which would mean any system made after
1965. I'm not aware of any Python implementation for UNIVAC, so I
wouldn't worry ;-)

How does count1 work exactly?

1,2,4,8,16,32,64,128 in binary are
1,10,100,1000,10000,100000,1000000,10000000
x&1 == 1 if x has first bit set to 1
x&2 == 2, so (x&2>0) == True if x has second bit set to 1
... and so on.
In the context of int, True is interpreted as 1, False as 0.
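
To make the masking concrete, here is a small worked example (Python 3 syntax; the sample value is arbitrary). Each mask isolates one bit, and the comparison turns a nonzero result into True, which adds as 1.

```python
x = 0b10110010  # 178: bits 1, 4, 5 and 7 are set (counting from 0)

masks = [1, 2, 4, 8, 16, 32, 64, 128]
for m in masks:
    # x & m is either 0 or m; comparing with 0 gives a bool
    print(format(m, '08b'), x & m, (x & m) > 0)

count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                    (x&64>0)+(x&128>0))
print(count1(x))            # 4
print(True + True + False)  # 2, since bool is a subclass of int
```

Note that in Python `&` binds more tightly than `>`, so x&2>0 means (x&2)>0. In C the same expression would parse as x&(2>0), so the lambda cannot be ported without extra parentheses.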
 

gregpinero


Thanks Bart. That's perfect. The other suggestion was to precompute
count1 for all possible bytes, I guess that's 0-256, right?

Thanks again everyone for the help.

-Greg
 

Hendrik van Rooyen

Thanks Bart. That's perfect. The other suggestion was to precompute
count1 for all possible bytes, I guess that's 0-256, right?

0 to 255 inclusive, actually - that is 256 numbers...

The largest number representable in a byte is 255

eight bits, of value 128,64,32,16,8,4,2,1

Their sum is 255...

And then there is zero.
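
The arithmetic is easy to confirm at the interpreter (Python 3 syntax assumed):

```python
bit_values = [128, 64, 32, 16, 8, 4, 2, 1]

print(sum(bit_values))    # 255, the largest value a byte can hold
print(len(range(256)))    # 256 distinct values: 0 through 255
print(max(range(256)))    # 255
```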

- Hendrik
 

Bart Ogryczak

Thanks Bart. That's perfect. The other suggestion was to precompute
count1 for all possible bytes, I guess that's 0-256, right?

0-255 actually. It'd be worth it if accessing a dictionary of
precomputed values were significantly faster than calculating the
lambda, which I doubt. I suspect it actually might be slower.
 

Piet van Oostrum

Bart Ogryczak said:
BO> Any system with 8-bit bytes, which would mean any system made after
BO> 1965. I'm not aware of any Python implementation for UNIVAC, so I
BO> wouldn't worry ;-)

1965? I worked with non-8-bit-byte machines (CDC) until the beginning of the
80's. :=( In fact, at that time the institution where Guido worked also had such
a machine, but Python came later.
 

Bart Ogryczak

1965? I worked with non-8-bit-byte machines (CDC) until the beginning of the
80's. :=( In fact, at that time the institution where Guido worked also had such
a machine, but Python came later.

Right, I should have written 'designed', not 'made'. UNIVACs were also
produced until the early 1980s. Anyway, I'd call it
paleoinformatics ;-)
 

Gabriel Genellina

0-255 actually. It'd be worth it if accessing a dictionary of
precomputed values were significantly faster than calculating the
lambda, which I doubt. I suspect it actually might be slower.

Dictionary access is highly optimized in Python. In fact, using a
precomputed dictionary is about 12 times faster:

py> import timeit
py> count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
...                     (x&64>0)+(x&128>0))
py> d256 = dict((i, count1(i)) for i in range(256))
py> timeit.Timer("for x in range(256): w = d256[x]",
...              "from __main__ import d256").repeat(number=10000)
[0.54261253874445003, 0.54763468541393934, 0.54499943428564279]
py> timeit.Timer("for x in range(256): w = count1(x)",
...              "from __main__ import count1").repeat(number=10000)
[6.1867963665773118, 6.1967124313285638, 6.1666287195719178]
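
For anyone wanting to repeat the measurement on a current interpreter, a sketch along these lines should work. It assumes Python 3, adds a list-based table for comparison, and passes the names in through timeit's globals parameter instead of importing from __main__. Absolute numbers depend on the machine and the Python version, so none are shown here.

```python
import timeit

count1 = lambda x: ((x&1)+(x&2>0)+(x&4>0)+(x&8>0)+(x&16>0)+(x&32>0)+
                    (x&64>0)+(x&128>0))

d256 = {i: count1(i) for i in range(256)}   # precomputed dictionary
l256 = [count1(i) for i in range(256)]      # precomputed list

names = {'count1': count1, 'd256': d256, 'l256': l256}
statements = {
    'dict':   "for x in range(256): w = d256[x]",
    'list':   "for x in range(256): w = l256[x]",
    'lambda': "for x in range(256): w = count1(x)",
}

for label, stmt in statements.items():
    # repeat() returns one total time per run; the minimum is the usual figure to compare
    times = timeit.repeat(stmt, globals=names, repeat=3, number=1000)
    print(label, min(times))
```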
 

Hendrik van Rooyen

Piet van Oostrum said:
1965? I worked with non-8-bit-byte machines (CDC) until the beginning of the
80's. :=( In fact, at that time the institution where Guido worked also had such
a machine, but Python came later.

Those behemoths were EXPENSIVE - so it made a lot of sense to keep using
them until the point that it became obvious even to an accountant that the
maintenance cost was no longer worth it...

Would actually not surprise me if there were still a few around, doing
electricity accounts or something.

- Hendrik
 

Hendrik van Rooyen

Right, I should have written 'designed', not 'made'. UNIVACs were also
produced until the early 1980s. Anyway, I'd call it
paleoinformatics ;-)

The correct term is: "Data Processing", or DP for short.

- Hendrik
 

Dennis Lee Bieber

Would actually not surprise me if there were still a few around, doing
electricity accounts or something.

I would hope the accountants for that electric company have taken
into consideration the cost of the electricity to do that billing <G>
--
Wulfraed Dennis Lee Bieber KD6MOG
HTTP://wlfraed.home.netcom.com/
HTTP://www.bestiaria.com/
 

Matthias Julius

Gabriel Genellina said:
On Fri, 02 Mar 2007 08:22:36 -0300, Bart Ogryczak


Dictionary access is highly optimized in Python. In fact, using a
precomputed dictionary is about 12 times faster:

Why use a dictionary and not a list?

Matthias
 
