blatt447477
Hi all,
I developed a script which, IMHO, is more useful than the well-known
"hexdump" command-line utility.
Unfortunately, I can't handle UTF-8 encoding very easily, so in my
output the literal part and the hex part lose synchronization...
The script is not very long, but it is not very well written (no functions,
no classes...). I didn't succeed in formulating my doubts in
a more concise way, so here it is!
# -*- coding: utf-8 -*-
# px.py  # python 2.6.6
# hex conversion on 2 lines (except spaces):
# high nibbles on the first line, low nibbles on the second
# various run options: std      : python px.py file
#                      bash cat : cat file | python px.py (alias hex)
#                      bash echo: echo line | python px.py " "
# works on any number of bytes for utf-8
import sys
import signal

nLenN = 3  # number of digits for line numbers

signal.signal(signal.SIGPIPE, signal.SIG_DFL)
try:
    sFN = sys.argv[1]
    f = open(sFN)
    lF = f.readlines()
    f.close()
except (IndexError, IOError):
    # no argument, or the argument is not a readable file: read stdin
    lF = sys.stdin.readlines()
#################################################################
lP = []
for n in xrange(len(lF)):
    lP.append(str(n+1).zfill(nLenN) + ' ' + lF[n])
    # split on spaces directly, so '~' and '!' in the input are left alone
    lNoSpaces = lF[n].split(' ')
    sHexH = sHexL = ' ' * nLenN + ' '
    for k in xrange(len(lNoSpaces)):
        sHex = lNoSpaces[k].encode('hex')
        sHexH += sHex[0::2] + ' '  # high nibble of every byte
        sHexL += sHex[1::2] + ' '  # low nibble of every byte
    lP.append(sHexH + '\n')
    lP.append(sHexL + '\n\n')  # to jump a line
# the insertion of one or more spaces after the unicode characters must be
# done manually on the output (lP)
print ''.join(lP)
#--------------------------------------------------------------
print '---------------------\n'
for n in xrange(0, len(lP), 3):
    try:
        lP[n].encode('utf-8')
    except UnicodeError:
        # the line contains non-ASCII bytes
        print lP[n],    # to be modified by hand in presence of utf-8 chars
        print lP[n+1],  # to synchronize ascii and hex
        print lP[n+2],
As you can see, it is a hex conversion on two lines (spaces are not
converted), with various run options:
    std      : python px.py file
    bash cat : cat file | python px.py (alias hex)
    bash echo: echo line | python px.py " "
Besides that, it could work (once I solve my problems) on any number of
bytes per UTF-8 character.
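For anyone who wants to try the two-line layout on its own, here is a minimal sketch of the conversion described above. It is written for Python 3 (px.py itself targets Python 2.6.6), and the names nibble_rows and two_line_hex are hypothetical, not part of px.py.

```python
def nibble_rows(word):
    """Return (high nibbles, low nibbles) of every byte in a bytes object."""
    hex_text = word.hex()                 # e.g. b"not" -> "6e6f74"
    return hex_text[0::2], hex_text[1::2]


def two_line_hex(line, encoding="utf-8"):
    """Hex-convert each space-separated token; spaces stay as separators."""
    highs, lows = [], []
    for token in line.split(" "):
        high, low = nibble_rows(token.encode(encoding))
        highs.append(high)
        lows.append(low)
    return " ".join(highs), " ".join(lows)


if __name__ == "__main__":
    text = "# qwerty: not unicode but ascii"
    high_row, low_row = two_line_hex(text)
    print(text)
    print(high_row)
    print(low_row)
```

Reading each column top to bottom gives the byte value: "#" sits above 2 and 3, i.e. 0x23. The sketch leaves out the line numbers and the trailing newline byte that px.py also dumps.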
As an example of the synchronization problem, compare the output for a
pure ASCII line (004) with the output for a line containing UTF-8
characters (005):
004 # qwerty: not unicode but ascii
    2 7767773 667 7666666 677 676660
    3 175249a ef4 5e93f45 254 13399a
005 # qwerty: non è unicode bensì ascii
    2 7767773 666 ca 7666666 6667ca 676660
    3 175249a efe 38 5e93f45 25e33c 13399a
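The mismatch in line 005 comes from "è" and "ì" each taking two bytes in UTF-8 (c3 a8 and c3 ac), so the hex rows grow by one column per such character while the literal row does not. One possible way to automate the manual space insertion is to pad every character to the number of bytes it encodes to. The following is only a sketch under stated assumptions: Python 3, a hypothetical function name aligned_rows, input with the trailing newline already stripped, and every character occupying exactly one terminal column (so no wide East Asian or combining characters).

```python
def aligned_rows(line, encoding="utf-8"):
    """Return (literal, high nibbles, low nibbles) with equal column widths."""
    literal, highs, lows = [], [], []
    for ch in line:
        if ch == " ":
            # spaces are not converted; keep them as separators in all rows
            literal.append(" ")
            highs.append(" ")
            lows.append(" ")
            continue
        hex_text = ch.encode(encoding).hex()
        width = len(hex_text) // 2          # bytes used by this character
        literal.append(ch.ljust(width))     # pad the literal to the hex width
        highs.append(hex_text[0::2])        # high nibble of every byte
        lows.append(hex_text[1::2])         # low nibble of every byte
    return "".join(literal), "".join(highs), "".join(lows)


if __name__ == "__main__":
    for row in aligned_rows("non è unicode bensì ascii"):
        print(row)
```

With this padding each multi-byte character is followed by as many extra spaces as it has extra bytes, so the three rows have the same length and every character starts above its first byte.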
Thanks in advance for any help!
Blatt