Convert raw binary file to ascii

r2 · Jul 27, 2009

I have a memory dump from a machine I am trying to analyze. I can view
the file in a hex editor to see text strings in the binary code. I
don't see a way to save these ascii representations of the binary, so
I went digging into Python to see if there were any modules to help.

I found one that I think might do what I want: the binascii
module. Can anyone describe how to convert a raw binary file to
an ascii file using this module? Have I tried? Boy, I've tried.

Am I correct in assuming that this module can give me the ascii text
I see in a hex editor? I'm new to this forensics
thing and it's quite possible I am mixing technical terms. I am not
new to Python, however. Thanks for your help.

Peter Otten · Jul 27, 2009

r2 said:
I have a memory dump from a machine I am trying to analyze. I can view
the file in a hex editor to see text strings in the binary code. I
don't see a way to save these ascii representations of the binary, so
I went digging into Python to see if there were any modules to help.

I found one I think might do what I want it to do - the binascii
module. Can anyone describe to me how to convert a raw binary file to
an ascii file using this module. I've tried? Boy, I've tried.

That won't work because a text editor doesn't need any help to convert the
bytes into characters. If it expects ascii, it will simply be puzzled by bytes
that are not valid ascii. It will also happily display byte sequences that
are valid ascii but look like gibberish to you as a user, because the
program that wrote them meant them as binary data.

Am I correct in assuming I can get the converted binary to ascii text
I see in a hex editor using this module? I'm new to this forensics
thing and it's quite possible I am mixing technical terms. I am not
new to Python, however. Thanks for your help.

Unix has the "strings" command-line tool to extract text from a binary.
Get hold of a copy of the MinGW tools if you are on Windows.

Peter
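
For reference, binascii converts between raw bytes and ASCII encodings of those bytes (hex, base64 and similar); it does not pick text out of a binary. A minimal Python 3 sketch of what the module does, using made-up sample bytes rather than real dump data:

```python
import binascii

raw = b"\x00\x01Hi!\xff"  # hypothetical sample, not taken from a real dump

# hexlify renders every byte as two ASCII hex digits; no text is extracted.
hex_text = binascii.hexlify(raw)
print(hex_text)  # b'0001486921ff'

# unhexlify reverses the mapping and returns the original bytes.
assert binascii.unhexlify(hex_text) == raw
```

The output is ASCII, but it is an encoding of every byte, readable or not, which is why the module is not the right tool for pulling strings out of a dump.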

r2 · Jul 27, 2009

Okay. Thanks for the guidance. I have a machine with Linux, so I
should be able to do what you describe above. Could Python extract the
strings from the binary as well? Just wondering.

r2 · Jul 27, 2009

Grant Edwards said:
$ strings memdump.binary >memdump.strings

$ hexdump -C memdump.binary >memdump.hex+ascii

Grant,

Thanks for the commands!
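
The hex-plus-ascii view that a hex editor or hexdump shows can also be approximated in Python. The following is only a sketch in Python 3; the function name `hexdump_lines` and the sample bytes are made up for illustration, and the layout is loosely modelled on `hexdump -C` rather than an exact copy of it:

```python
def hexdump_lines(data, width=16):
    """Yield one text line per `width` bytes: offset, hex bytes, ASCII column."""
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Printable ASCII is shown as-is; every other byte becomes a dot.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short last lines.
        yield "%08x  %-*s  |%s|" % (offset, width * 3 - 1, hex_part, ascii_part)


if __name__ == "__main__":
    sample = b"GET /index.html\x00\x01\x02\xffend"  # hypothetical sample bytes
    for line in hexdump_lines(sample):
        print(line)
```

To dump a real file, the bytes would come from something like `open("memdump.binary", "rb").read()`, and the lines could be written to a text file instead of printed.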

Peter Otten · Jul 27, 2009

r2 said:
Okay. Thanks for the guidance. I have a machine with Linux, so I
should be able to do what you describe above. Could Python extract the
strings from the binary as well? Just wondering.

As a special service for you here is a naive implementation to build upon:

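#!/usr/bin/env python
# Python 2 script: reads the dump from stdin, prints the strings it finds.
import sys

# Translation table: printable ASCII and tab map to themselves,
# every other byte maps to NUL.
wanted_chars = ["\0"]*256
for i in range(32, 127):
    wanted_chars[i] = chr(i)
wanted_chars[ord("\t")] = "\t"
wanted_chars = "".join(wanted_chars)

THRESHOLD = 4  # shortest run that is printed

for s in sys.stdin.read().translate(wanted_chars).split("\0"):
    if len(s) >= THRESHOLD:
        print s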

Peter
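
The script above is Python 2 (it uses the print statement and treats file contents as str). A possible Python 3 port of the same translate-and-split idea might look like the sketch below; the function name `extract_strings` and the sample bytes are assumptions for illustration, not part of the original post:

```python
THRESHOLD = 4  # same minimum length as in the post above

# 256-entry table: printable ASCII and tab map to themselves, the rest to NUL.
TABLE = bytes(b if (32 <= b < 127 or b == 9) else 0 for b in range(256))


def extract_strings(data, threshold=THRESHOLD):
    """Return runs of wanted bytes that are at least `threshold` long."""
    return [
        s.decode("ascii")
        for s in data.translate(TABLE).split(b"\0")
        if len(s) >= threshold
    ]


if __name__ == "__main__":
    sample = b"\x00\x01hello\x02ab\x03wor\tld\xff"  # hypothetical sample bytes
    for s in extract_strings(sample):
        print(s)
```

To filter a real dump from standard input, the bytes would be read with `sys.stdin.buffer.read()`, since `sys.stdin.read()` returns decoded text in Python 3.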

r2 · Jul 27, 2009

Perfect! Thanks.

Dave Angel · Jul 27, 2009

r2 said:
Okay. Thanks for the guidance. I have a machine with Linux, so I
should be able to do what you describe above. Could Python extract the
strings from the binary as well? Just wondering.

Yes, you could do the same thing in Python easily enough. And with the
advantage that you could define your own meanings for "characters."

The memory dump could be storing characters that are strictly ASCII. Or
it could have EBCDIC, or UTF-8. And it could be Unicode, 16 bit or 32
bits, and big-endian or little-endian. Or the characters could be in
some other format specific to a particular program.

However, it's probably very useful to see what a "strings" program might
look like, because you can quickly code variations on it, to suit your
particular data.
Something like the following (totally untested):

def isprintable(char):
    # Python 2: char is a one-character str, so compare its byte value.
    # 0x7f (DEL) is a control character and is excluded.
    return 0x20 <= ord(char) < 0x7f

def string(filename):
    data = open(filename, "rb").read()
    count = 0
    line = ""
    for ch in data:
        if isprintable(ch):
            count += 1
            line = line + ch
        else:
            # cutoff: don't print strings smaller than this because
            # they're probably just coincidence
            if count > 4:
                print line
            count = 0
            line = ""
    if count > 4:  # flush a run that reaches the end of the file
        print line

Now you can change the definition of what's "printable", you can change
the min-length that you care about. And of course you can fine-tune
things like max-length lines and such.

DaveA
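
As an illustration of the variations described above, here is a hedged Python 3 sketch that uses a regular expression instead of a byte-by-byte loop, with the minimum length as a parameter and an optional mode for ASCII text stored as 16-bit little-endian units. The name `find_strings`, its parameters, and the sample data are assumptions made for this example; note that it keeps runs of at least `min_len` characters, whereas the loop above keeps runs longer than 4:

```python
import re


def find_strings(data, min_len=4, utf16le=False):
    """Return printable-ASCII runs of at least `min_len` characters.

    With utf16le=True, look for the same characters stored as 16-bit
    little-endian units (each ASCII byte followed by a zero byte).
    """
    unit = rb"[\x20-\x7e]\x00" if utf16le else rb"[\x20-\x7e]"
    pattern = re.compile(b"(?:" + unit + b")" + (b"{%d,}" % min_len))
    encoding = "utf-16-le" if utf16le else "ascii"
    return [m.group().decode(encoding) for m in pattern.finditer(data)]


if __name__ == "__main__":
    # Hypothetical sample: one ASCII string and one UTF-16LE string.
    sample = b"\x01abcd\x02xy\x03" + "wide".encode("utf-16-le") + b"\xff\xff"
    print(find_strings(sample))
    print(find_strings(sample, utf16le=True))
```

Other encodings mentioned above (EBCDIC, big-endian or 32-bit Unicode) would need a different `unit` pattern and decoder; this sketch does not attempt them.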

Jan Kaliszewski · Jul 27, 2009

Hello Friends,

It's my first post to python-list, so first let me introduce myself...
* my name is Jan Kaliszewski,
* country -- Poland,
* occupation -- composer (studied in F. Chopin Academy of Music @Warsaw)
and programmer (currently in Record System company,
working on Anakonda -- ERP system for
big companies [developed in Python + WX
+ Postgres]).

Now, to the matter...

27-07-2009 Grant Edwards said:
$ strings memdump.binary >memdump.strings

$ hexdump -C memdump.binary >memdump.hex+as

Do you (r2) want to get ASCII substrings (i.e. extract only those
pieces of the file that consist of ASCII codes, i.e. 7-bit values in the
range 0...127), or rather a "possibly readable ascii representation" of
the whole file, with printable ascii characters preserved 'as is' and
non-printable/non-ascii characters replaced with their codes
(e.g. with '\x...' notation)?

If the latter, you probably want something like this:

# Python 2 only: the 'string_escape' codec does not exist in Python 3.
import codecs
with open('memdump.binary', 'rb') as source:
    with open('memdump.txt', 'w') as target:
        for quasiline in codecs.iterencode(source, 'string_escape'):
            target.write(quasiline)
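
The 'string_escape' codec is specific to Python 2. In Python 3, a roughly similar effect can be had by decoding the bytes as latin-1 and re-encoding with 'unicode_escape'. This is a sketch under that assumption, with a made-up helper name `escape_bytes`; the output is close to, but not guaranteed identical to, what 'string_escape' produced (quote characters, for example, are handled differently):

```python
def escape_bytes(data):
    """Return an ASCII-only rendering of arbitrary bytes."""
    # latin-1 maps each byte 0..255 to the code point of the same value,
    # so decoding never fails; unicode_escape then writes control and
    # non-ASCII characters as backslash escapes and leaves printable
    # ASCII (other than the backslash itself) unchanged.
    return data.decode("latin-1").encode("unicode_escape")


if __name__ == "__main__":
    print(escape_bytes(b"\x00AB\xff\n"))  # b'\\x00AB\\xff\\n'
```

Because every byte is escaped independently of its neighbours, a large dump could be read in fixed-size chunks from a file opened with 'rb' and each escaped chunk written to a file opened with 'wb'.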
