Print a string in binary format

neutrino

Greetings to the Python gurus,

I have a binary file and wish to see the "raw" content of it. So I open
it in binary mode and read one byte at a time into a variable, which
will be of the string type. Now the problem is how to print the binary
representation of that character to standard output. It seems a common task,
but I just cannot find the appropriate method in the documentation.
Thanks a lot.
 
Mark McEahern

neutrino said:
[snip original question]

How is this *not* what you want:

import sys
f = open(filename, 'rb')
data = f.read(1)
while data:
    sys.stdout.write(data)
    data = f.read(1)

Of course, that's the long version of:

print open(filename, 'rb').read()
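
For readers on Python 3 (this thread predates it): files opened in 'rb' mode return bytes, and the text-mode sys.stdout will not accept them, so raw bytes have to go through sys.stdout.buffer. A minimal sketch of the same byte-at-a-time copy, assuming Python 3; dump_raw is a hypothetical helper name, and like the code above it writes the raw bytes, not their binary digits:

```python
import io
import sys


def dump_raw(src, dst=None):
    """Copy a binary stream to dst one byte at a time (hypothetical helper).

    This reproduces the raw-copy approach; it does NOT render bits.
    """
    if dst is None:
        # In Python 3, sys.stdout is a text stream; its .buffer accepts bytes.
        dst = sys.stdout.buffer
    data = src.read(1)
    while data:  # read() returns b"" at end of file
        dst.write(data)
        data = src.read(1)


# Usage with in-memory streams standing in for a real file:
out = io.BytesIO()
dump_raw(io.BytesIO(b"\x00AB\xff"), out)
```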

// m
 
neutrino

Mmm, the output format that I want is like:
0001110011111111000000001111111100000000....

I tried your code on Cygwin under Windows 2000, and on Linux, but it
prints out ASCII characters.
 
Kartic

neutrino said the following on 1/20/2005 7:53 PM:
[snip original question]

Not a guru, but I will try to help out :)

how about this:
Now this is what telnet.hex looks like:
(This file must be converted with BinHex 4.0)

:#R4PE'jPG#jPH'8!2j!)!!!!!4B!N!@9[deDN!!!!`!!!!3!!!$rr`!!Z!#3"d!
!N#2J!!!!$Kqk$J#d#FdKZ!&-c5&8D'Pc)("bEfGbB@dJBf&ZEQpd)'*P)(*eEL"
TEL"%6e-JE@pNC5i0$3SN!*!(Q$X`mpaDAU$F@PkJh&THS#Cj(U$G@PkJh&TIS'P
DAU!QH8HJceTHS%Yj'k$G@PkJ"RP#S-TDAU!'H81Jh9THS#CjBk$G@PkJ8QPMD0a
DAU!!N""343!!6!%$!&VGE6d!N!MJ!!m"#`%(!!$!!!!!"J)!N!AQ[3!!!"!!!!$
3!*!&!3!3!!!!!J!!"3!"!!8!!3!%!*!)i!)!!!3!!&Hl!3!$!!#!!!!%!!!3!*!
%%!!!%!#3"K!!N!Z3!-%!!-J!N!5J!J"B1!#3'[!5!!!F!*!M8!)!!-`!N!33!!$
F!J#3'LjdCAKd!!!!rVm!!!!3!!!!`!!!!!3!N!iJ!!"J,Q4KG'%!!!"mb`%!!0!

(... Big Snip ...)


Or how about this?
>>> f = open('C:/windows/system32/telnet.exe', 'rb')
>>> fcontents = f.read()
>>> import binhex
>>> print binhex.binascii.hexlify(fcontents[0:10])
4d5a9000030000000400
>>>
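
On Python 3 the same first-ten-bytes hex dump works without going through the binhex module: binascii.hexlify() still exists, and bytes objects also have a .hex() method (Python 3.5+). A sketch, with hex_head as a hypothetical helper and the sample bytes taken from the output above (the "MZ" signature that starts DOS/Windows executables):

```python
import binascii


def hex_head(data: bytes, n: int = 10) -> str:
    """Return the first n bytes of data as lowercase hex digits."""
    return data[:n].hex()


# The ten bytes shown in the session above.
header = b"MZ\x90\x00\x03\x00\x00\x00\x04\x00"
print(hex_head(header))          # 4d5a9000030000000400
print(binascii.hexlify(header))  # b'4d5a9000030000000400' (bytes, not str)
```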

Is this what you want???

Thanks,
--Kartic
 
Kartic

neutrino said the following on 1/20/2005 8:21 PM:
Mmm, the output format that I want is like:
0001110011111111000000001111111100000000....

I tried your code on Cygwin under Windows 2000, and on Linux, but it
prints out ASCII characters.

Aha... I guess I posted too soon.

You might want to take a look at this cookbook entry:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/219300

It defines lambdas to convert integers to binary. The one you probably want is -
'1100001'

--Kartic
 
Peter Hansen

neutrino said:
Mmm, the output format that I want is like:
0001110011111111000000001111111100000000....

I tried your code on Cygwin under Windows 2000, and on Linux, but it
prints out ASCII characters.

Generally speaking, trying to work with binary like that
will drive you mad. Most people find themselves far more
productive learning to work with hexadecimal, which is nice
in this case since you'll automatically get the hex
representation of any non-printable character if you just
print the string with repr() around it.
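
To illustrate Peter's suggestion: repr() leaves printable ASCII alone and shows every other byte as a \xNN escape. A sketch assuming Python 3, where data read in 'rb' mode is bytes, so the repr carries a b prefix that the Python 2 str repr does not:

```python
# Four sample bytes: two non-printable, a NUL, and the printable letter "A".
data = b"\x1c\xff\x00A"
print(repr(data))  # b'\x1c\xff\x00A'
```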

-Peter
 
Stephen Thorne

Kartic said (replying to neutrino's post of 1/20/2005 8:21 PM):

Aha..I guess I posted too soon.

You might want to take a look at this cookbook entry:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/219300

Defines lambdas to convert integer to binary. The one you probably want is -
'1100001'

Death to inappropriate usage of lambda.
First of all, that lambda is buggy: it doesn't work for negative
numbers, but you can't immediately see that because of the compressed
nature of the code.

The lambda, with the bug corrected, converts to a function that looks like this:
def bstr(n, length=16):
    if n == 0:
        return '0'
    if n<0:
        return bstr((long(2)<<length)+n)
    return bstr(n>>1).lstrip('0') + str(n&1)

As you can see, the keyword argument is only used for negative
numbers, so we can make this more concise as:
def bstr(n):
    if n == 0:
        return '0'
    return bstr(n>>1).lstrip('0') + str(n&1)

But this isn't what we want, because we want bstr(1) == '00000001',
not '1'. So let's just throw the POS away and rewrite it, starting with:

Test Cases:

assert bstr(0) == '00000000'
assert bstr(1) == '00000001'
assert bstr(255) == '11111111'
assert bstr(128) == '10000000'

def bstr(n): # n in range 0-255
    return ''.join([str(n >> x & 1) for x in (7,6,5,4,3,2,1,0)])

Took me a little bit, and I replaced xrange(7,-1,-1) with the much
clearer precomputed tuple, but what we have is a recursion-free function
that actually fulfills the requirements.
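
For what it's worth, Python 2.6 and later (including Python 3) can do the zero-padding through the 'b' presentation type of the format mini-language, which gives an equivalent of bstr() without the bit loop. A sketch that keeps Stephen's test cases; the range check is an addition, not part of his version:

```python
def bstr(n: int) -> str:
    """Return a byte value (0-255) as an 8-character binary string."""
    if not 0 <= n <= 255:
        raise ValueError("bstr() expects a value in range 0-255, got %r" % (n,))
    return format(n, "08b")  # '0' fill, width 8, binary digits


assert bstr(0) == "00000000"
assert bstr(1) == "00000001"
assert bstr(255) == "11111111"
assert bstr(128) == "10000000"
```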

This can be used in the original poster's situation to output data in
(almost) readable binary format using something like the following:

f = file('myfile', 'rb')
while 1:
    bytes = f.read(8)
    if not bytes:
        break
    print ' '.join([bstr(ord(c)) for c in bytes])
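
The same loop on Python 3, as a sketch: iterating over a bytes object already yields integers, so the ord() call disappears, and format(b, '08b') stands in for bstr(). dump_bits and its parameters are hypothetical names, and in-memory streams stand in for 'myfile':

```python
import io
import sys


def dump_bits(stream, width=8, out=None):
    """Write a binary stream as space-separated 8-bit groups, width bytes per line."""
    if out is None:
        out = sys.stdout
    while True:
        chunk = stream.read(width)
        if not chunk:  # b"" signals end of file
            break
        out.write(" ".join(format(b, "08b") for b in chunk) + "\n")


# Usage: with a real file, pass open("myfile", "rb") instead of a BytesIO.
buf = io.StringIO()
dump_bits(io.BytesIO(b"\x00\x01\xff"), out=buf)
```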

Regards,
Stephen Thorne
 
Bengt Richter

Mmm, the output format that I want is like:
0001110011111111000000001111111100000000....

I tried your code on Cygwin under Windows 2000, and on Linux, but it
prints out ASCII characters.

You can make a 256-length list to translate 0..255 byte codes into binary strings thus:
>>> byte2bits = [''.join(['01'[i&(1<<b)>0] for b in xrange(7,-1,-1)]) for i in xrange(256)]

Then you can look up string representations for bytes thus:
>>> byte2bits[10]
'00001010'
>>> byte2bits[255]
'11111111'

and if you had a byte (single character string) that was part of a binary file buffer
that you had read whose value was e.g., '\xa3' you could get its representation thus:
>>> byte2bits[ord('\xa3')]
'10100011'

and then you can make a generator routine to convert and join little sequences of bytes, e.g.,
>>> def bytes2bin(byte_chunk_seq, bspacer=' '):
...     for byte_chunk in byte_chunk_seq:
...         yield bspacer.join([byte2bits[ord(byte)] for byte in byte_chunk])
...
>>> bytes2bin(['\x00\x01', '\xa3\xa4\xa5\xa6', '@ABCDEFG'])
>>> for line in bytes2bin(['\x00\x01', '\xa3\xa4\xa5\xa6', '@ABCDEFG']): print line
...
00000000 00000001
10100011 10100100 10100101 10100110
01000000 01000001 01000010 01000011 01000100 01000101 01000110 01000111
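
A Python 3 rendering of the lookup table and generator above, as a sketch: format() builds the 256 strings, and since iterating over bytes yields ints, the table can be indexed directly without ord(). The names follow Bengt's, but this is a port, not his code:

```python
from typing import Iterable, Iterator

# byte2bits[i] is the 8-digit binary string for byte value i (0-255).
byte2bits = [format(i, "08b") for i in range(256)]


def bytes2bin(byte_chunk_seq: Iterable[bytes], bspacer: str = " ") -> Iterator[str]:
    """Yield one line of separated bit groups per chunk of bytes."""
    for byte_chunk in byte_chunk_seq:
        yield bspacer.join(byte2bits[b] for b in byte_chunk)


for line in bytes2bin([b"\x00\x01", b"\xa3\xa4\xa5\xa6", b"@ABCDEFG"]):
    print(line)
```

Run as-is, it should print the same three lines as the Python 2 session.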

If you want to feed bytes2bin with a sequence of 8-byte chunks from a binary file,
you can define a chunk sequence thus (the last chunk might not be a full 8 bytes):

chunkseq = iter(lambda f=open(filename, 'rb'):f.read(8), '')

and then you can use that as above to print a binary file's content. E.g.,

First we'll make a quick test file
>>> open('btest.bin', 'wb').write(''.join([chr(i&0xff) for i in xrange(384)]))

Then we'll read, convert and print:
...
00000000 00000001 00000010 00000011 00000100 00000101 00000110 00000111
00001000 00001001 00001010 00001011 00001100 00001101 00001110 00001111
00010000 00010001 00010010 00010011 00010100 00010101 00010110 00010111
00011000 00011001 00011010 00011011 00011100 00011101 00011110 00011111
[...snip stuff you can infer ...]
11100000 11100001 11100010 11100011 11100100 11100101 11100110 11100111
11101000 11101001 11101010 11101011 11101100 11101101 11101110 11101111
11110000 11110001 11110010 11110011 11110100 11110101 11110110 11110111
11111000 11111001 11111010 11111011 11111100 11111101 11111110 11111111
00000000 00000001 00000010 00000011 00000100 00000101 00000110 00000111
00001000 00001001 00001010 00001011 00001100 00001101 00001110 00001111
00010000 00010001 00010010 00010011 00010100 00010101 00010110 00010111
[...snip stuff you can infer ...]
01110000 01110001 01110010 01110011 01110100 01110101 01110110 01110111
01111000 01111001 01111010 01111011 01111100 01111101 01111110 01111111

Well, you get the idea.
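
One porting detail if you try the chunk trick on Python 3: read() on a file opened in 'rb' mode returns bytes, so the sentinel passed to iter() must be b'' rather than ''. Since b'' != '' in Python 3, the original sentinel would never match and the loop would not end. A sketch using functools.partial in place of the default-argument lambda (a swapped-in idiom, not the one above); iter_chunks is a hypothetical name:

```python
import functools
import io


def iter_chunks(stream, size=8):
    """Yield successive chunks of up to size bytes until read() returns b""."""
    # iter(callable, sentinel) keeps calling stream.read(size) until the
    # result equals the sentinel; for binary streams that is b"", not "".
    return iter(functools.partial(stream.read, size), b"")


chunks = list(iter_chunks(io.BytesIO(bytes(range(20)))))
print([len(c) for c in chunks])  # [8, 8, 4]
```

With a real file, opening it in a with-block also guarantees it gets closed, which the lambda default-argument version never does explicitly.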

Regards,
Bengt Richter
 
Paul Rubin

neutrino said:
[snip original question]

There's not a builtin for it. There was some discussion in the bug
system of adding it to the binascii module. Meanwhile you sort of
have to write actual code.
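
For later readers: builtins for this did arrive after this thread. Python 2.6 added bin() and the 'b' format type, and both are in Python 3. A short sketch of what they return; note that bin() adds a '0b' prefix and does no zero-padding, so format() is the closer fit for fixed-width byte output:

```python
value = 0xA3

print(bin(value))            # 0b10100011
print(format(value, "08b"))  # 10100011
print(format(5, "08b"))      # 00000101 (zero-padded to 8 digits)
print(bin(5))                # 0b101 (no padding)
print(bin(-5))               # -0b101 (sign and magnitude, not two's complement)
```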
 
Nick Coghlan

neutrino said:
[snip original question]

FWIW, I work with serial data a lot, and I find the following list comprehension
to be a handy output tool for binary data:

print " ".join(["%0.2X" % ord(c) for c in data])

The space between each byte helps keep things from degenerating into a
meaningless mass of numbers, and using 2-digit hex instead of binary works
towards the same purpose. (I currently use the hex() builtin, but the
above just occurred to me; it gives nicer formatting and avoids the
C-style "0x" prefix on each byte.)
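
Nick's one-liner carries over to Python 3 almost unchanged; the main difference is that iterating over bytes yields ints, so ord() is dropped. Python 3.8 also added a separator argument to bytes.hex(). A sketch of both, assuming data holds bytes read in 'rb' mode:

```python
data = b"\x1c\xff\x00A"  # sample bytes; "A" is 0x41

# Two uppercase hex digits per byte, space-separated (ord() not needed).
print(" ".join("%02X" % b for b in data))  # 1C FF 00 41

# bytes.hex() with a separator (Python 3.8+); it emits lowercase digits.
print(data.hex(" ").upper())               # 1C FF 00 41
```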

Here's an interesting twiddle, though (there's probably already something along
these lines in the cookbook):

Py> def show_base(val, base, min_length = 1):
...     chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
...     if base < 2: raise ValueError("2 is minimum meaningful base")
...     if base > len(chars): raise ValueError("Not enough characters for base")
...     new_val = []
...     while val:
...         val, remainder = divmod(val, base)
...         new_val.append(chars[remainder])
...     result = "".join(reversed(new_val))
...     return ("0" * (min_length - len(result))) + result
...
Py> show_base(10, 2)
'1010'
Py> show_base(10, 2, 8)
'00001010'
Py> show_base(10, 16, 2)
'0A'
Py> show_base(254, 16, 2)
'FE'
Py> show_base(0, 16)
'0'
Py> for base in range(2, 36):
...     for testval in range(1000):
...         assert testval == int(show_base(testval, base), base)
...
Py>

Cheers,
Nick.
 
Steven Bethard

Stephen said:
Death to inappropriate usage of lambda.
First of all, that lambda is buggy, it doesn't work for negative
numbers, but you can't immediately see that because of the compressed
nature of the code.
[snip how to write better code without lambdas]

Your discussion here was nicely illustrative. You might consider adding
something to the "Overuse of Lambda" discussion in:

http://www.python.org/moin/DubiousPython

Steve
 
Bengt Richter

Nick Coghlan said:
[snip quoted question and hex-formatting tip]

Here's an interesting twiddle, though (there's probably already something along
these lines in the cookbook):
Looks like you also played with this problem, after Alex posted a request for alternative
one-liner solutions to a question on an Italian newsgroup last October? ("show_base" reminded me
of "number_in_base")

http://groups-beta.google.com/groups?hl=en&lr=&q=number_in_base&qt_s=Search+Groups

BTW, my final version was this (it can be put on one line ;-)

def number_in_base(x, N=10, digits='0123456789ABCDEF'):
    return '-'[:x<0]+''.join([digits[r] for q in [abs(x)]
        for q,r in iter(lambda:divmod(q, N), (0,0))][::-1]) or digits[0]

(it also takes care of sign and lets you encode with alternative digits if you like ;-)
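
In the spirit of Stephen's point about lambdas, here is the same algorithm unrolled into an ordinary loop, as a sketch: take divmod() by N until the quotient is zero, collect the remainders as digits, reverse them, and prefix a minus sign for negative input. The signature mirrors Bengt's; the base check is an addition:

```python
def number_in_base(x, N=10, digits="0123456789ABCDEF"):
    """Return integer x written in base N using the given digit alphabet."""
    if not 2 <= N <= len(digits):
        raise ValueError("base must be between 2 and %d" % len(digits))
    q = abs(x)
    out = []
    while q:
        q, r = divmod(q, N)
        out.append(digits[r])  # least significant digit first
    # Reverse to most-significant-first; zero has no digits, so use digits[0].
    result = "".join(reversed(out)) or digits[0]
    return "-" + result if x < 0 else result


print(number_in_base(255, 16))  # FF
print(number_in_base(-10, 2))   # -1010
print(number_in_base(0))        # 0
```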

Py> def show_base(val, base, min_length = 1):
...     chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
...     if base < 2: raise ValueError("2 is minimum meaningful base")
...     if base > len(chars): raise ValueError("Not enough characters for base")
...     new_val = []
...     while val:
...         val, remainder = divmod(val, base)
...         new_val.append(chars[remainder])
...     result = "".join(reversed(new_val))
...     return ("0" * (min_length - len(result))) + result
Hm, learn something every day ;-)
It didn't occur to me that a string multiplied by a negative number would default
nicely to the same result as multiplying by zero.
[snip examples and round-trip test]
I guess that's a good idea (supplying the full set of alphanumeric digits):
>>> def number_in_base(x, N=10, digits='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
...     return '-'[:x<0]+''.join([digits[r] for q in [abs(x)]
...         for q,r in iter(lambda:divmod(q, N), (0,0))][::-1]) or digits[0]
...
>>> for base in range(2, 36):
...     for testval in range(1000):
...         assert testval == int(number_in_base(testval, base), base)
...
For that matter, might as well go for Z base and negatives too:
>>> for base in range(2, 37):
...     for testval in range(-500, 500):
...         assert testval == int(number_in_base(testval, base), base)
...
(assert did not complain)
>>> for base in range(2, 37):
...     for testval in (-base+2, -base+1, -base, base-2, base-1,base):
...         print '%4s:%-4s'%(testval, number_in_base(testval, base)),
...     print
...
0:0 -1:-1 -2:-10 0:0 1:1 2:10
-1:-1 -2:-2 -3:-10 1:1 2:2 3:10
-2:-2 -3:-3 -4:-10 2:2 3:3 4:10
-3:-3 -4:-4 -5:-10 3:3 4:4 5:10
-4:-4 -5:-5 -6:-10 4:4 5:5 6:10
-5:-5 -6:-6 -7:-10 5:5 6:6 7:10
-6:-6 -7:-7 -8:-10 6:6 7:7 8:10
-7:-7 -8:-8 -9:-10 7:7 8:8 9:10
-8:-8 -9:-9 -10:-10 8:8 9:9 10:10
-9:-9 -10:-A -11:-10 9:9 10:A 11:10
-10:-A -11:-B -12:-10 10:A 11:B 12:10
-11:-B -12:-C -13:-10 11:B 12:C 13:10
-12:-C -13:-D -14:-10 12:C 13:D 14:10
-13:-D -14:-E -15:-10 13:D 14:E 15:10
-14:-E -15:-F -16:-10 14:E 15:F 16:10
-15:-F -16:-G -17:-10 15:F 16:G 17:10
-16:-G -17:-H -18:-10 16:G 17:H 18:10
-17:-H -18:-I -19:-10 17:H 18:I 19:10
-18:-I -19:-J -20:-10 18:I 19:J 20:10
-19:-J -20:-K -21:-10 19:J 20:K 21:10
-20:-K -21:-L -22:-10 20:K 21:L 22:10
-21:-L -22:-M -23:-10 21:L 22:M 23:10
-22:-M -23:-N -24:-10 22:M 23:N 24:10
-23:-N -24:-O -25:-10 23:N 24:O 25:10
-24:-O -25:-P -26:-10 24:O 25:P 26:10
-25:-P -26:-Q -27:-10 25:P 26:Q 27:10
-26:-Q -27:-R -28:-10 26:Q 27:R 28:10
-27:-R -28:-S -29:-10 27:R 28:S 29:10
-28:-S -29:-T -30:-10 28:S 29:T 30:10
-29:-T -30:-U -31:-10 29:T 30:U 31:10
-30:-U -31:-V -32:-10 30:U 31:V 32:10
-31:-V -32:-W -33:-10 31:V 32:W 33:10
-32:-W -33:-X -34:-10 32:W 33:X 34:10
-33:-X -34:-Y -35:-10 33:X 34:Y 35:10
-34:-Y -35:-Z -36:-10 34:Y 35:Z 36:10
Of course, this is the prefixed-sign and absolute value representation,
which is no good if you are using base 2, 8, or 16 to get an idea of
underlying bits in a negative two's complement representation.
But that's another thread ;-)

Regards,
Bengt Richter
 
Nick Coghlan

Bengt said:
Looks like you also played with this problem, after Alex posted a request for alternative
one-liner solutions to a question on an Italian newsgroup last October? ("show_base" reminded me
of "number_in_base")

http://groups-beta.google.com/groups?hl=en&lr=&q=number_in_base&qt_s=Search+Groups

See, I knew I wouldn't be the first one to think of it :)

I stole some ideas from that thread to add to the new version down below (I did
not, however, try to make the function expressible as a one-liner, since, after
the function has been defined, *using* it is a one-liner!)
Hm, learn something every day ;-)
It didn't occur to me that a string multiplied by a negative number would default
nicely to the same result as multiplying by zero.

Where'd I learn that trick? . . . Oh, that's right, Facundo used it when working
out the string representation for Decimal. It certainly makes padding to a
desired minimum field width pretty easy.
Of course, this is the prefixed-sign and absolute value representation,
which is no good if you are using base 2, 8, or 16 to get an idea of
underlying bits in a negative two's complement representation.

Dealing with negative numbers isn't really needed for printing a string as
binary, since ord() never returns a negative result.

However, we should be able to add complement formatting fairly easily:

Py> def show_base(val, base, min_digits=1, complement=False,
...               digits="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
...     if base > len(digits): raise ValueError("Not enough digits for base")
...     negative = val < 0
...     val = abs(val)
...     if complement:
...         sign = ""
...         max = base**min_digits
...         if (val > max) or (not negative and val == max):
...             raise ValueError("Value out of range for complemented format")
...         if negative:
...             val = (max - val)
...     else:
...         sign = "-" * negative
...     val_digits = []
...     while val:
...         val, digit = divmod(val, base)
...         val_digits.append(digits[digit])
...     result = "".join(reversed(val_digits))
...     return sign + ("0" * (min_digits - len(result))) + result
....
Py> show_base(10, 2)
'1010'
Py> show_base(-10, 2)
'-1010'
Py> show_base(10, 2, 8)
'00001010'
Py> show_base(-10, 2, 8)
'-00001010'
Py> show_base(10, 2, 8, complement=True)
'00001010'
Py> show_base(-10, 2, 8, complement=True)
'11110110'
Py> show_base(10, 16, 2, complement=True)
'0A'
Py> show_base(-10, 16, 2, complement=True)
'F6'
Py> show_base(127, 16, 2, complement=True)
'7F'
Py> show_base(-127, 16, 2, complement=True)
'81'
Py> show_base(255, 16, 2, complement=True)
'FF'
Py> show_base(-255, 16, 2, complement=True)
'01'
Py> show_base(256, 16, 2, complement=True)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "<stdin>", line 10, in show_base
ValueError: Value out of range for complemented format
Py> show_base(-256, 16, 2, complement=True)
'00'
Py>
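
An alternative for the base-2 case, as a sketch: Python's & on a negative int behaves as if the value were in two's complement with infinitely many sign bits, so masking with (1 << bits) - 1 yields the bit pattern directly. twos_complement_bits is a hypothetical name; unlike show_base above, it accepts the usual signed-or-unsigned range for the width (-128..255 for 8 bits), so inputs such as -255 are rejected rather than wrapped:

```python
def twos_complement_bits(val: int, bits: int = 8) -> str:
    """Return val as a bits-wide two's complement binary string."""
    if not -(1 << (bits - 1)) <= val < (1 << bits):
        raise ValueError("%d does not fit in %d bits" % (val, bits))
    # & keeps only the low `bits` bits; for negative val this equals 2**bits + val.
    return format(val & ((1 << bits) - 1), "0%db" % bits)


print(twos_complement_bits(10))    # 00001010
print(twos_complement_bits(-10))   # 11110110
print(twos_complement_bits(-127))  # 10000001
```

For -10 it agrees with show_base(-10, 2, 8, complement=True) above.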

Cheers,
Nick.
 
