File Read issue by using module binascii

Jimmie He · Apr 27, 2013

When I run read_bmp() on an example.bmp (about 100k), the shell becomes "Not responding". When I change f.read() to f.read(1000), it is OK. Could someone tell me the exact reason for this?
Thank you in advance!

Python code below:

import binascii

def read_bmp():
    f = open('example.bmp', 'rb')
    rawdata = f.read()                   # f.read(1000) is ok
    hexstr = binascii.b2a_hex(rawdata)   # get a hex string
    bsstr = bin(int(hexstr, 16))[2:]
    f.close()
    print('bin: ', bsstr, type(bsstr))
    return

Jimmie He · Apr 27, 2013

When I comment out the line print('bin: ',bsstr,type(bsstr)), it runs, so maybe the problem is the memory allocation for such long strings. Am I right?

Fábio Santos · Apr 27, 2013

It may be that you are printing too much data at once. Holding 100k in
memory at once is a bit much, but it should run anyway. Your console,
however, may be having trouble. Try looping over small chunks of the file
and printing them one at a time, using a while loop. I do know that on
Windows the console is not very efficient at printing, so when I print too
much data the console itself starts taking up a lot of processor time.
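
The chunked approach suggested above could look roughly like the following. This is only a sketch: the function name print_bits_in_chunks and the default chunk size of 1000 bytes are assumptions made for illustration, not something posted in the thread.

```python
def print_bits_in_chunks(path, chunk_size=1000):
    """Print a file's bits one chunk at a time; return the number of bytes read.

    The helper name and the default chunk size are illustrative assumptions.
    """
    total = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:      # an empty bytes object signals end of file
                break
            # Iterating over bytes yields ints in Python 3;
            # the '08b' format keeps each byte's leading zeros.
            print(''.join(format(byte, '08b') for byte in chunk))
            total += len(chunk)
    return total
```

Calling print_bits_in_chunks('example.bmp') would then print one line of at most 8000 characters per chunk, so the shell never has to render the whole file as a single string.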

Jimmie He · Apr 27, 2013

What you said makes sense, and I've already corrected my code following your advice. Thanks for your response!

Peter Otten · Apr 27, 2013

What shell are you using? The one provided by Idle?

Jimmie He · Apr 27, 2013

Yes, I use IDLE; the Python version is 3.3.1. What else could I use?

Fábio Santos · Apr 27, 2013

You could use your operating system's shell. Even on Windows it should be
a lot better and faster, and thus not block so easily.

Peter Otten · Apr 27, 2013

The shell provided by the operating system is usually much faster. When I
modify your code to

import binascii

def read_bmp():
    f = open('example.bmp', 'rb')
    rawdata = f.read()                   # f.read(1000) is ok
    hexstr = binascii.b2a_hex(rawdata)   # get a hex string
    bsstr = bin(int(hexstr, 16))[2:]
    f.close()
    print('bin: ', bsstr, type(bsstr))
    return

if __name__ == "__main__":
    read_bmp()

and generate a dummy example.bmp with 2**20 (about 1 million) bytes it takes
about 2 seconds to terminate -- on hardware that is quite old. If I redirect
the output it is even faster:

$ time python3 bmp_to_bin.py > /dev/null

real 0m0.766s
user 0m0.300s
sys 0m0.180s

I am a Linux user, but expect similar numbers on Windows (in the DOS box or
one of its successors).
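
To repeat this kind of measurement, a dummy file can be generated and the conversion timed on its own, without any printing. The following is a sketch only: the helper names make_dummy_file and time_conversion are assumptions, and random bytes stand in for a real bitmap.

```python
import binascii
import os
import time

def make_dummy_file(path, size=2**20):
    """Write `size` random bytes to `path` (a stand-in for a real bitmap)."""
    with open(path, 'wb') as f:
        f.write(os.urandom(size))

def time_conversion(path):
    """Return (bit_string, seconds) for the conversion alone, with no printing."""
    start = time.perf_counter()
    with open(path, 'rb') as f:
        rawdata = f.read()
    # Same conversion as in read_bmp(): hex string, big integer, binary string.
    bits = bin(int(binascii.b2a_hex(rawdata), 16))[2:]
    return bits, time.perf_counter() - start
```

If the conversion alone finishes quickly while printing the result stalls, that points at the shell's output handling and not at the conversion.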

I have considered filing a bug* to ask for a tweak in idle that improves its
responsiveness, but first wanted you to confirm that this was indeed the
problem.

(*) on http://bugs.python.org, if you want to do it yourself

Jimmie He · Apr 27, 2013

Peter,
Thanks for your detailed test. Because I'm a newbie in Python, I cannot confirm whether it is a bug. My code, including my example.bmp, is at the link below.
https://hotfile.com/dl/204795514/62fed41/BMPTool.zip.html

Peter Otten · Apr 28, 2013

Tim said:
I suspect the root of the problem here is that you don't understand what
this is actually doing. You should run this code in the command-line
interpreter, one line at a time, and print the results.

The "read" instruction produces a string with 100k bytes. The b2a_hex then
produces a string with 200k bytes. Then, int(hexstr,16) takes that 200,000
byte hex string and converts it to an integer, roughly equal to 10 to the
240,000 power, a number with some 240,000 decimal digits. You then convert
that integer to a binary string. That string will contain about 800,000
characters after the '0b' prefix. You then drop the first two characters
(the prefix) and print the remaining 800,000 or so characters, each of
which will be either '0' or '1'.
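
The sizes described above can be checked with a few lines. This is a sketch under assumptions: conversion_sizes is a hypothetical helper, and the decimal digit count is estimated from the bit length instead of building the decimal string, which would be slow and which newer Python versions refuse by default for very large integers.

```python
import binascii
import math

def conversion_sizes(rawdata):
    """Return the size of each intermediate value in the original conversion."""
    hexstr = binascii.b2a_hex(rawdata)   # two hex characters per input byte
    number = int(hexstr, 16)             # one arbitrary-precision integer
    bits = bin(number)[2:]               # binary digits without the '0b' prefix
    return {
        'input_bytes': len(rawdata),
        'hex_chars': len(hexstr),
        'bit_chars': len(bits),
        # Estimate only (may be off by one): digits ~ bit_length * log10(2).
        'decimal_digits': int(number.bit_length() * math.log10(2)) + 1,
    }
```

For 100,000 bytes whose first byte is non-zero, this reports 200,000 hex characters, close to 800,000 binary digits, and on the order of 240,000 decimal digits, in line with the figures above.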

I am absolutely, positively convinced that's not what you wanted to do.
What point is there in printing out the binary equivalent of a bitmap?

Even if you did, it would be much quicker for you to do the conversion one
byte at a time, completely skipping the conversion to hex and then the
creation of a massive multi-precision number. Example:

f = open('example.bmp', 'rb')
rawdata = f.read()
f.close()
bsstr = []
for b in rawdata:
    # ord() is needed on Python 2, where b is a one-character string;
    # on Python 3, b is already an int, so use bin(b) instead.
    bsstr.append(bin(ord(b))[2:])
bsstr = ''.join(bsstr)

or even:

f = open('example.bmp', 'rb')
bsstr = ''.join(bin(ord(b))[2:] for b in f.read())

Hm, if you fix the long integer arithmetic "problem" you should also attack
the unbounded memory consumption problem in general.

Yes, the original is horrible newbie code, but that's what you tend to
write while learning to program, and Python can handle it all right. On the
other hand, Idle becomes unresponsive when I do

in its shell. I'm still investigating, but the problem seems to be that it's
a single line.

takes under 7 secs. Not as good as konsole (KDE's terminal emulation) which
finishes in 0.5 secs, but acceptable.
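
If the output being one very long line is indeed what slows the shell down, one workaround is to break the bit string into shorter lines before printing. A minimal sketch, assuming a hypothetical helper print_wrapped and an arbitrary width of 80 characters; neither comes from the thread, and this is not the code that was timed above.

```python
def print_wrapped(text, width=80):
    """Print `text` in lines of at most `width` characters; return the line count."""
    # Slice the string into consecutive pieces of `width` characters each.
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    print('\n'.join(lines))
    return len(lines)
```

With an 800,000-character bit string this emits 10,000 lines of 80 characters each instead of a single line.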

Jens Thoms Toerring · Apr 28, 2013

Converting byte by byte, as Tim suggested, was exactly my idea at first.
But then I started to time it (using the timeit module) by comparing the
following functions:

import binascii

# As written, c2() to c4() assume that iterating over rawdata yields
# one-character strings (as with a Python 2 str). On Python 3, iterating
# over bytes yields ints, so the ord() calls would have to go.

# Original version
def c1(rawdata):
    h = binascii.b2a_hex(rawdata)
    z = bin(int(h, 16))[2:]
    # Pad with the leading zero bits that int() dropped
    return '0' * (8 * len(rawdata) - len(z)) + z

# Convert each byte directly
def c2(rawdata):
    return ''.join(bin(ord(x))[2:].rjust(8, '0') for x in rawdata)

# Convert each byte using a list for table look-up
def c3(rawdata):
    h = [bin(i)[2:].rjust(8, '0') for i in range(256)]
    return ''.join(h[ord(x)] for x in rawdata)

# Convert each byte using a dictionary for table look-up (avoids
# lots of ord() calls)
def c4(rawdata):
    h = {chr(i): bin(i)[2:].rjust(8, '0') for i in range(256)}
    return ''.join(h[x] for x in rawdata)

As you can see, in c3() and c4() I even tried to speed things up
further by using a table look-up instead of calling bin() etc.
on each byte. But the result was that c2() is nearly 15 times
slower than c1(), c3() about 3 times and c4() still more than 2
times slower! So the method the OP uses seems to be quite a bit
more efficient than one might be tempted to assume.
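
A comparison along these lines can be rerun on Python 3 with the sketch below. It is not the code that produced the numbers above: the function names are made up for illustration, the per-byte variants use format() on the ints that Python 3 yields when iterating over bytes, and the resulting ratios will differ between machines and Python versions.

```python
import binascii
import timeit

def bits_via_bigint(rawdata):
    """The original approach: hex string, one big integer, binary string."""
    z = bin(int(binascii.b2a_hex(rawdata), 16))[2:]
    return z.rjust(8 * len(rawdata), '0')   # restore leading zero bits

def bits_per_byte(rawdata):
    """Format each byte separately; iterating over bytes yields ints."""
    return ''.join(format(b, '08b') for b in rawdata)

# Table look-up: one precomputed 8-character string per possible byte value.
BYTE_TO_BITS = [format(i, '08b') for i in range(256)]

def bits_via_table(rawdata):
    """Look each byte up in the precomputed table."""
    return ''.join(BYTE_TO_BITS[b] for b in rawdata)

def compare(rawdata, repeat=3):
    """Return the best-of-`repeat` time in seconds for each function."""
    return {
        f.__name__: min(timeit.repeat(lambda: f(rawdata), number=1, repeat=repeat))
        for f in (bits_via_bigint, bits_per_byte, bits_via_table)
    }
```

Calling compare() on the contents of example.bmp returns a dictionary of timings keyed by function name, which makes it easy to see whether the big-integer route still wins on a given setup.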

I would guess that the reason is that c1() makes just a small
number of calls to functions that are probably implemented in C
rather than in Python, and thus can be a lot faster than anything
you could achieve with Python, while the other functions use a
for loop in Python, which seems to account for a good part of
the CPU time used. To test that, I split the 'rawdata' string
into a list of characters (i.e. single-letter strings) and
reassembled it using join() and a for loop:

r = list(rawdata)
z = ''.join(x for x in r)

The second line alone took about 1.7 times longer than the
whole, seemingly convoluted c1() function!

What I take away from this is that many of the assumptions one
is prone to make when coming from e.g. a C/C++ background can
be quite misleading when extrapolated to Python (or other
interpreted languages)...
Best regards, Jens

Jimmie He · Apr 28, 2013

Hi Jens, Peter and Tim,
Thank you very much for your wonderful analysis of my newbie question.
I admit that I asked this question much too early, because I just wanted some guru to inspire me ;-) If it confused you, please excuse the noise.

What I intend to do is make a BMP font maker (convert the BMP to a data array). What I did wrong was printing it directly to the screen without understanding it at all first.
The c1() to c4() functions that Jens provided clearly show that we should think about efficiency, because Python is an interpreted language.
Anyway, thanks for all your kind help.
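
For the stated goal of turning the file into a data array, the bytes can be formatted directly, with no detour through a bit string. The following is a rough sketch under assumptions: bytes_to_array_text is a hypothetical helper, the hex-literal layout is a guess at what a font table might need, and it dumps the whole file without parsing the BMP header.

```python
def bytes_to_array_text(rawdata, per_line=8):
    """Format bytes as comma-separated hex literals, `per_line` values per row."""
    rows = []
    for i in range(0, len(rawdata), per_line):
        row = rawdata[i:i + per_line]
        # Iterating over bytes yields ints in Python 3.
        rows.append(', '.join('0x{:02X}'.format(b) for b in row))
    return ',\n'.join(rows)
```

The returned text can be written to a file or pasted between the braces of an array initializer.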
