File Read issue by using module binascii

Jimmie He

When I run read_bmp on an example.bmp (about 100k), the Shell becomes "No response". When I change f.read() to f.read(1000), it is OK. Could someone tell me the exact reason for this?
Thank you in advance!

Python Code as below!!

import binascii

def read_bmp():
    f = open('example.bmp','rb')
    rawdata = f.read()                  # f.read(1000) is OK
    hexstr = binascii.b2a_hex(rawdata)  # get a hex string
    bsstr = bin(int(hexstr,16))[2:]
    f.close()
    print('bin: ',bsstr,type(bsstr))
    return
 
Jimmie He

When I comment out the line "print('bin: ',bsstr,type(bsstr))", it runs, so maybe the problem is the memory allocation for such long strings... Am I right?

 
Fábio Santos

It may be that you are printing too much data at once. 100k is a bit much
to hold in memory, but it should run anyway. Your console, however, may be
having trouble. Try looping over small chunks of the file and printing them
one at a time, using a while loop. I do know that on Windows the console is
not very efficient at printing, so when I print too much data the console
itself starts taking up a lot of processor time.
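
The chunked approach suggested above could look roughly like the sketch below. It keeps the conversion from the original post but applies it to one chunk at a time inside a while loop. The function name print_bits_in_chunks, the chunk_size default and the out parameter are made up for illustration.

```python
import binascii
import sys

def print_bits_in_chunks(path, chunk_size=1024, out=sys.stdout):
    """Write the file's bits to `out`, one line per chunk; return bytes read."""
    total = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)   # at most chunk_size bytes
            if not chunk:                # an empty result means end of file
                break
            # Same conversion as in the original post, applied to one chunk.
            # zfill() restores the leading zero bits that bin() drops.
            bits = bin(int(binascii.b2a_hex(chunk), 16))[2:].zfill(8 * len(chunk))
            out.write(bits + '\n')
            total += len(chunk)
    return total
```

With the default chunk size each printed line is 8192 characters long; a smaller chunk_size gives shorter lines, which a slow console may handle better.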
 
Jimmie He

What you said makes sense, and I've already corrected my code following your advice. Thanks for your response!


 
Peter Otten

What shell are you using? The one provided by Idle?
 
Fábio Santos

You could use your operating system's shell. Even on Windows, it
should be a lot better and faster, and thus not block so easily.
 
Peter Otten

Jimmie said:
Yes, I use IDLE; the Python version is 3.3.1. What else could I use?

The shell provided by the operating system is usually much faster. When I
modify your code to

import binascii

def read_bmp():
    f = open('example.bmp','rb')
    rawdata = f.read()                  # f.read(1000) is OK
    hexstr = binascii.b2a_hex(rawdata)  # get a hex string
    bsstr = bin(int(hexstr,16))[2:]
    f.close()
    print('bin: ',bsstr,type(bsstr))
    return

if __name__ == "__main__":
    read_bmp()

and generate a dummy example.bmp with 2**20 (about 1 million) bytes it takes
about 2 seconds to terminate -- on hardware that is quite old. If I redirect
the output it is even faster:

$ time python3 bmp_to_bin.py > /dev/null

real 0m0.766s
user 0m0.300s
sys 0m0.180s

I am a Linux user, but expect similar numbers on Windows (in the DOS box or
one of its successors).
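
How the dummy file was generated is not stated in the post. A minimal sketch, assuming random bytes are good enough for a timing test: the result is not a valid BMP, but read_bmp never parses the header. The helper name make_dummy_file is made up for illustration.

```python
import os

def make_dummy_file(path, size=2**20):
    """Write `size` random bytes to `path` and return the number written."""
    with open(path, 'wb') as f:
        # os.urandom() returns a bytes object of the requested length.
        return f.write(os.urandom(size))
```

Calling make_dummy_file('example.bmp') would overwrite any existing example.bmp in the current directory, so it is safer to run it in a scratch directory.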

I have considered filing a bug* to ask for a tweak in IDLE that improves its
responsiveness, but first wanted you to confirm that this was indeed the
problem.

(*) on http://bugs.python.org, if you want to do it yourself
 
Peter Otten

Tim said:
I suspect the root of the problem here is that you don't understand what
this is actually doing. You should run this code in the command-line
interpreter, one line at a time, and print the results.

The "read" instruction produces a string with 100k bytes. The b2a_hex then
produces a string with 200k bytes. Then, int(hexstr,16) takes that 200,000
byte hex string and converts it to an integer, roughly equal to 10 to the
240,000 power, a number with some 240,000 decimal digits. You then convert
that integer to a binary string. That string will contain about 800,000
characters after the '0b' prefix. You then drop that two-character prefix
and print the remaining 800,000 or so characters, each of which will be
either '0' or '1'.
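
The size arithmetic above can be checked on a small scale. The sketch below uses 100 made-up bytes in place of the 100k file (an assumption for illustration only), so every length is a thousandth of the figures quoted.

```python
import binascii

rawdata = b'BM' + bytes(98)          # 100 made-up bytes standing in for the file
hexstr = binascii.b2a_hex(rawdata)   # two hex digits per input byte
number = int(hexstr, 16)             # one arbitrary-precision integer
bsstr = bin(number)[2:]              # strip the '0b' prefix

print(len(rawdata), len(hexstr), number.bit_length(), len(bsstr))
```

Because the first byte 'B' (0x42) has a zero top bit and bin() drops leading zeros, the binary string comes out one character shorter than 8 bits per byte.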

I am absolutely, positively convinced that's not what you wanted to do.
What point is there in printing out the binary equivalent of a bitmap?

Even if you did, it would be much quicker for you to do the conversion one
byte at a time, completely skipping the conversion to hex and then the
creation of a massive multi-precision number. Example:

Hm, if you fix the long integer arithmetic "problem" you should also attack
the unbounded memory consumption problem in general ;)
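# Works as written on Python 2 only: there, iterating over the data yields
# 1-character strings. On Python 3 it yields ints and ord(b) raises TypeError.
f = open('example.bmp','rb')
rawdata = f.read()
f.close()
bsstr = []
for b in rawdata:
    bsstr.append( bin(ord(b))[2:] )   # [2:] strips the '0b' prefix
bsstr = ''.join(bsstr)

or even:

f = open('example.bmp','rb')
bsstr = ''.join( bin(ord(b))[2:] for b in f.read() )
f.close()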
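
Since the original poster runs Python 3.3.1, where iterating over a bytes object yields ints rather than 1-character strings, a per-byte version for that interpreter might look like the sketch below. The name bytes_to_bits is hypothetical, and the zero-padding to eight digits is an addition that the snippets above do not perform.

```python
def bytes_to_bits(rawdata):
    """Return the bits of a bytes object as a string of '0' and '1'."""
    # Iterating over bytes in Python 3 yields ints, so no ord() is needed.
    # The '08b' format spec zero-pads each byte to eight binary digits.
    return ''.join(format(b, '08b') for b in rawdata)
```

Without the padding, a byte such as 0x05 would contribute only '101', and the byte boundaries in the result could not be recovered.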

Yes, the original is horrible newbie code ;) but that's what you tend to
write while learning to program -- and python can handle it alright. On the
other hand, Idle becomes unresponsive when I do

in its shell. I'm still investigating, but the problem seems to be that it's
a single line.

takes under 7 secs. Not as good as konsole (KDE's terminal emulation) which
finishes in 0.5 secs, but acceptable.
 
Jens Thoms Toerring

Exactly my idea at first. But then I started to time it (using
the timeit module) by comparing the following functions:

import binascii

# Note: these functions rely on Python 2 behaviour, where rawdata is a str
# and iterating over it yields 1-character strings.

# Original version

def c1( rawdata ) :
    h = binascii.b2a_hex( rawdata )
    z = bin( int( h, 16 ) )[ 2 : ]
    return '0' * ( 8 * len( rawdata ) - len( z ) ) + z

# Convert each byte directly

def c2( rawdata ) :
    return ''.join( bin( ord( x ) )[ 2 : ].rjust( 8, '0' ) for x in rawdata )

# Convert each byte using a list for table look-up

def c3( rawdata ) :
    h = [ bin( i )[ 2 : ].rjust( 8, '0' ) for i in range( 256 ) ]
    return ''.join( h[ ord( x ) ] for x in rawdata )

# Convert each byte using a dictionary for table look-up (avoids
# lots of ord() calls)

def c4( rawdata ) :
    h = { chr( i ) : bin( i )[ 2 : ].rjust( 8, '0' ) for i in range( 256 ) }
    return ''.join( h[ x ] for x in rawdata )

As you can see, in c3() and c4() I even tried to speed things up
further by using a table look-up instead of calling bin() etc.
on each byte. But the result was that c2() is nearly 15 times
slower than c1(), c3() about 3 times and c4() still more than 2
times slower! So the method the OP uses seems to be quite a bit
more efficient than one might be tempted to assume.
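
For anyone who wants to repeat the comparison on Python 3, a rough timeit harness could look like the sketch below. The function names, the 100,000 random test bytes and the repeat count are assumptions for illustration, not the setup used for the figures above, and the ratios will differ by interpreter version and machine.

```python
import binascii
import os
import timeit

def via_big_int(rawdata):
    """The original approach: hex string -> one huge int -> binary string."""
    z = bin(int(binascii.b2a_hex(rawdata), 16))[2:]
    return z.zfill(8 * len(rawdata))     # restore dropped leading zero bits

def per_byte(rawdata):
    """Convert each byte separately (Python 3: bytes iterate as ints)."""
    return ''.join(format(b, '08b') for b in rawdata)

def table_lookup(rawdata):
    """Convert each byte through a precomputed 256-entry list."""
    table = [format(i, '08b') for i in range(256)]
    return ''.join(table[b] for b in rawdata)

if __name__ == '__main__':
    data = os.urandom(100000)
    # All three variants must agree before timing them makes sense.
    assert via_big_int(data) == per_byte(data) == table_lookup(data)
    for func in (via_big_int, per_byte, table_lookup):
        seconds = timeit.timeit(lambda: func(data), number=10)
        print('{:<14} {:.3f} s'.format(func.__name__, seconds))
```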

I would guess that the reason is that c1() does just a small
number of calls of functions that probably aren't implemented
in Python but in C and thus can be a lot faster than anything
you could achieve with Python, while the other functions use a
for loop in Python, which seems to account for a good part of
the CPU time used. To test for that I split the 'rawdata' string
into a list of characters (i.e. single letter strings) and
reassembled it using join() and a for loop:

r = list( rawdata )
z = ''.join( x for x in r )

The second line alone took about 1.7 times longer than the
whole, seemingly convoluted c1() function!

What I take away from this is that a lot of the assumptions one
is prone to make when coming from e.g. a C/C++ background can
be quite misleading when extrapolating to Python (or other
interpreted languages)...
Best regards, Jens
 
Jimmie He

Hi Jens, Peter and Tim,
Thank you very much for your wonderful analysis of my newbie question.
I admit that I posted this question much too early because I just wanted some guru to inspire me ;-) If it really confused you, excuse my noise :)
What I intend to do is make a BMP font maker (convert the BMP to a data array). What I did wrong was printing it directly to the screen without understanding it at all first.
The c1() to c4() functions that Jens provided clearly show that we should think about efficiency because Python is an interpreted language.
Anyway, thanks for all your kind help :)
 
