Newbie lost

Angelo Secchi

I'm fighting with a binary file and I am definitely lost.
I know that each line of the file starts with a string of length 113,
followed by a group of identical fields. I do not know the precise
format of these fields, but I do know that the file was created on an
IBM mainframe and that the binary part should contain 223 fields, each
4 bytes wide.
Just to give you an idea, if I read the first line of my file as a
string I obtain something like this (just a small part of the first line):

13510010222010341341F\xee;\xb4\x00\x00\x00\x00\x00\x00\x00\x00F]\xe3\x9a\x00


Still, I am not able to convert this binary data. Can anybody give me
some advice?

Thanks
Angelo



--
========================================================
Angelo Secchi PGP Key ID:EA280337
========================================================
Current Position:
Graduate Fellow Scuola Superiore S.Anna
Piazza Martiri della Liberta' 33, Pisa, 56127 Italy
ph.: +39 050 883365
email: (e-mail address removed) www.sssup.it/~secchi/
========================================================
 
Anton Vredegoor

Angelo Secchi said:
13510010222010341341F\xee;\xb4\x00\x00\x00\x00\x00\x00\x00\x00F]\xe3\x9a\x00

Still, I am not able to convert this binary data. Can anybody give me
some advice?

The string above contains escape sequences, so sometimes four
characters correspond to one byte, sometimes a char is just a byte.
This is not really present in the file but just an artifact of the way
you chose to print it. In order to gain more insight:

#open the file in binary mode e.g:
inf = file('somefile','rb')

#read 1 line e.g:
line = inf.readline()

#turn this line into a list of characters:
L = list(line)

#Inspect the list L and come back here with further questions,
#if you have any :)
print L
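
As a rough illustration of the point about escape sequences, here is a small sketch written for Python 3 (the code in this thread is Python 2). The `fragment` value is an assumption: it was typed in by hand from the printed string above, not read from the real file.

```python
# Bytes typed in by hand from the printed fragment (an assumption; the
# real data would come from the file opened in binary mode).
fragment = b"F\xee;\xb4\x00\x00\x00\x00"

# repr() shows non-printable bytes as four-character \xNN escapes and
# printable ones (here 'F' and ';') as single characters.
print(repr(fragment))

# The underlying data is still one byte per position: 8 bytes in total.
print(len(fragment))

# A hex view gives every byte the same width, which is easier to read.
print(fragment.hex())
```

The hex view makes it easier to see where each 4-byte field starts and ends.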

Anton
 
Angelo Secchi

I checked with a statistical package (SAS) and was able to convert my
file using a format called s370frb4. which, according to its manual,
corresponds to a float in C and to a REAL*4 in Fortran. Now I know
that the first 3 numbers in the binary part should be exactly:


15612852 0 0

I also noticed that packing these values as integers with the struct
module gives
'\xb4;\xee\x00'
'\x00\x00\x00\x00'


which is not very different (just the byte order and the leading F) from
the beginning of the binary part of my file:

13510010222010341341F\xee;\xb4\x00\x00\x00\x00\x00\x00\x00\x00F]\xe3\x9a\x00
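
The observation about byte order can be checked with a short Python 3 sketch. The `"<i"` format (4-byte little-endian integer) is an assumption about how the struct output above was produced; the post does not show the exact call.

```python
import struct

# Pack the value reported by SAS as a 4-byte little-endian integer.
# The "<i" format is an assumption; the thread does not show the call.
packed = struct.pack("<i", 15612852)
print(packed)

# First 4-byte field of the file, taken from the printed fragment.
file_field = bytes.fromhex("46ee3bb4")

# The three low-order bytes of the packed integer, reversed, match the
# three bytes that follow the leading 'F' (0x46) in the file.
print(packed[:3][::-1] == file_field[1:])
```

So the last three bytes of the field hold the integer value in big-endian order, and only the leading byte 0x46 remains to be explained.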


Can this information help anybody to help me?
Thanks again, hoping to be able to throw away any proprietary software...
Angelo




John Roth

If you could read it with SAS, most likely the floats are in
IBM's proprietary format, not in standard IEEE 754
format. (IBM, by the way, was there first...) I'm not certain whether
Python can convert them. You might have to do some bit twiddling,
which is going to be awfully slow.

For the rest of it, I'd like to see a *real* hex dump in mainframe
format. From what you've given us so far I'm not certain whether
the struct module can convert the data for you.

John Roth



Anton Vredegoor

John Roth said:
For the rest of it, I'd like to see a *real* hex dump in mainframe
format. From what you've given us so far I'm not certain whether
the struct module can convert the data for you.

Probably what John wants to see is the output of something like this:

#one 'line' of data (since it's a binary file there is no real line
#ending convention: a line is just a specific number of bytes long,
#it's important to find out how many exactly)

from binascii import hexlify
inf = file('somefile','rb')
data = inf.read(1005) #a 113-byte string + 223 4-byte floats
L = map(hexlify,data)
print L

Anton
 
John Roth

Thank you! I wasn't aware of that module. It looks like it
should do exactly what's needed.

Although this would probably do instead of the last 2 lines
(and excuse the fact that it's real ugly code, as well as
untested)

for i in range(4):
    index = i * 32
    print hexlify(data[index: index+32])
for i in range(223):
    index = i * 4 + 113
    print hexlify(data[index: index + 4])
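
The same slicing can be sketched in Python 3 as a small helper. The name `split_record` and the synthetic `record` are hypothetical and only stand in for real data; the three constants come from the numbers given in this thread.

```python
from binascii import hexlify

HEADER_LEN = 113    # length of the leading text part, from the thread
FIELD_WIDTH = 4     # width of each binary field, from the thread
FIELD_COUNT = 223   # number of binary fields, from the thread

def split_record(record):
    """Return (header, fields) for one fixed-length record."""
    header = record[:HEADER_LEN]
    end = HEADER_LEN + FIELD_COUNT * FIELD_WIDTH
    fields = [record[start:start + FIELD_WIDTH]
              for start in range(HEADER_LEN, end, FIELD_WIDTH)]
    return header, fields

# Synthetic record standing in for real data: spaces for the header,
# the first field from the thread, then zero-filled fields.
record = (b" " * HEADER_LEN
          + bytes.fromhex("46ee3bb4")
          + bytes(FIELD_WIDTH * (FIELD_COUNT - 1)))

header, fields = split_record(record)
print(len(header), len(fields))
print(hexlify(fields[0]).decode("ascii"))
```

Keeping the header and the fields separate makes it easy to decode the text part and the numeric part with different rules.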

John Roth
 
Angelo Secchi

John and Anton, thanks for your help.
If I did it correctly, this is the outcome of the code you asked me to
try:
from binascii import hexlify
inf = file('foo','rb')
data = inf.read(1970) # 1970 is the exact length of the line
for i in range(4):
    index = i * 32
    print hexlify(data[index: index+32])

3131313131313131312030313032323230313033343130323238353332303031
3220202020203032323835333230303132303030303031303030303030314338
3930303030303120203939333531302030313130303030312020313220313335
313030313032323230313033343133343146ee3bb40000000000000000465de3



from binascii import hexlify
inf = file('foo','rb')
data = inf.read(1970)
for i in range(223):
    index = i * 4 + 113
    print hexlify(data[index: index + 4])


46ee3bb4 (I know that this should be 15612852)
00000000 (I know that this should be 0)
00000000 (I know that this should be 0)
465de39a (I know that this should be 6153114)
00000000 (I know that this should be 0)
00000000 (I know that this should be 0)
....


John, just to understand what you said: if the file is in an IBM proprietary
binary format (EBCDIC?), can I not convert it to ASCII using Python?

Thanks again
Angelo







John Roth

Angelo Secchi said:
John, just to understand what you said: if the file is in an IBM proprietary
binary format (EBCDIC?), can I not convert it to ASCII using Python?

The character data seems to be in ASCII, so it won't be any
problem.
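
One way to check the ASCII claim, sketched in Python 3: the hex string below is the first line of the hex dump Angelo posted, and `cp037` is one common EBCDIC code page, chosen here only for comparison.

```python
# First 32 header bytes, copied from the first line of the hex dump.
header_hex = "3131313131313131312030313032323230313033343130323238353332303031"
header = bytes.fromhex(header_hex)

# Every byte falls in the printable ASCII range (0x20 to 0x7e).
print(all(0x20 <= b <= 0x7E for b in header))

# Decoded as ASCII, the bytes read as digits and a space.
print(header.decode("ascii"))

# For comparison, EBCDIC (code page 037) uses different byte values for
# the same characters: digits are 0xF0-0xF9 and space is 0x40.
print("1 0".encode("cp037").hex())
```

Since the dump shows 0x31 and 0x20 rather than 0xF1 and 0x40, the text part does not look like EBCDIC.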

I'm going to have to go onto
the IBM site later today to check the floating point formats; someone who
knows IEEE 754 should be able to tell you whether or not those match
standard formats, though.

John Roth

Bob Ippolito

John, just to understand what you said: if the file is in an IBM proprietary
binary format (EBCDIC?), can I not convert it to ASCII using Python?

The question is whether the existing infrastructure (probably the
struct module) can do it in one or two lines quickly, or if more
elaborate code would have to be written.

Of course, you do need to know how the data was encoded in order to
reliably decode it at all.

-bob
 
John Roth

Angelo Secchi said:
46ee3bb4 (I know that this should be 15612852)

Using the following little program:

-----------------------
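import binascii
import struct

str1 = "46ee3bb4"
str2 = binascii.unhexlify(str1)
# interpret the 4 bytes as a big-endian IEEE float
big_endian = struct.unpack(">f", str2)
print big_endian
# interpret the same bytes in the machine's native byte order
native = struct.unpack("f", str2)
print native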
-----------------------------

I get:

-----------------------------
(30493.8515625,)
(-1.7502415516901237e-007,)
-----------------------------

So it's clearly *not* standard IEEE 754
format, and it's going to require some bit
twiddling to convert.

The last 3 bytes (in big-endian format)
are the fraction. I believe the first bit is
the fraction sign, and the next 7 bits are
the exponent, biased by 64 and counted in
4-bit chunks. In other words, the exponent of your
normalized example is 0x46 - 64 = 6. The shift
quantity is in hex digits, not bits!

I've verified that this is the actual format
using the Windows Calc applet - handy
sucker when you want to do hex to decimal
conversions.
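
The same check can be done by hand in Python 3 instead of a calculator. This sketch reads the 7-bit exponent as an excess-64 value, an assumption that agrees with the number SAS reported for this field.

```python
# First 4-byte field from the hex dump, as one 32-bit integer.
raw = int("46ee3bb4", 16)

sign_bit = raw >> 31                   # 0 means positive
exponent = ((raw >> 24) & 0x7F) - 64   # excess-64, counts hex digits
fraction = raw & 0xFFFFFF              # 24-bit fraction, 6 hex digits

# value = (fraction / 16**6) * 16**exponent
value = fraction * 16.0 ** (exponent - 6)
print(sign_bit, exponent, hex(fraction), value)
```

With an exponent of 6 the six hex digits of the fraction are shifted all the way to the left of the radix point, so the value is simply 0xee3bb4 read as an integer.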

John Roth

Anton Vredegoor

Angelo Secchi said:
from binascii import hexlify
inf = file('foo','rb')
data = inf.read(1970)
for i in range(223):
    index = i * 4 + 113
    print hexlify(data[index: index + 4])


46ee3bb4 (I know that this should be 15612852)
00000000 (I know that this should be 0)
00000000 (I know that this should be 0)
465de39a (I know that this should be 6153114)
00000000 (I know that this should be 0)
00000000 (I know that this should be 0)

Based on this message:

http://groups.google.com/groups?&[email protected]

I wrote a function that maybe could do it (use at own risk). It
converts a 4-byte real from "IBM System/370 Floating Point Format" to
a Python float. In your code above replace one line:
print hexlify(data[index: index + 4])
with:
print ibm370tofloat(data[index: index + 4])


The conversion function and a test function:

import struct

def ibm370tofloat(fourbytes):
    # interpret the 4 bytes as one big-endian unsigned integer
    i = struct.unpack('>I',fourbytes)[0]
    # the top bit (bit 31) is the sign bit
    sign = [1,-1][bool(i & 0x80000000L)]
    # the next 7 bits are the base-16 exponent, biased by 64
    characteristic = ((i >> 24) & 0x7f) - 64
    # the low 24 bits are the fraction
    fraction = (i & 0xffffff)/float(0x1000000L)
    return sign*16**characteristic*fraction

def test():
    import binascii
    pi = "413243f7"
    s1 = "46ee3bb4"
    s2 = "465de39a"
    for x in [pi,s1,s2]:
        y = binascii.unhexlify(x)
        print ibm370tofloat(y)

if __name__=='__main__':
    test()

output:

3.14159297943
15612852.0
6153114.0
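
For readers on Python 3, a minimal port of the same idea might look like the sketch below. The name `ibm370_to_float` and the extra test values are hypothetical; the sign is taken from the top bit of the 32-bit word, so negative values are covered as well.

```python
import struct

def ibm370_to_float(four_bytes):
    """Convert a 4-byte IBM System/370 single-precision value to a float."""
    (raw,) = struct.unpack(">I", four_bytes)
    sign = -1.0 if raw & 0x80000000 else 1.0      # top bit is the sign
    exponent = ((raw >> 24) & 0x7F) - 64          # base-16, excess-64
    fraction = (raw & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** exponent

# Two fields from the thread, plus -1.0 and zero as extra checks.
for hex_text in ["46ee3bb4", "465de39a", "c1100000", "00000000"]:
    print(hex_text, ibm370_to_float(bytes.fromhex(hex_text)))
```

The function takes the raw 4-byte slice of a record, so it can be applied directly to each field.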

I hope this helps.

Anton
 
Howard Lightstone

Angelo Secchi said:
46ee3bb4 (I know that this should be 15612852)
00000000 (I know that this should be 0)
00000000 (I know that this should be 0)
465de39a (I know that this should be 6153114)
00000000 (I know that this should be 0)
00000000 (I know that this should be 0)
...


I seem to recall that the format is
sCCCCCCC MMMMMMMM MMMMMMMM MMMMMMMM

The CCCCCCC exponent is biased by 64 and is HEX (16**(CCCCCCC-64)).

I recall significance issues on Gould machines because the normalization
was only to the nibble so there might be 2 or 3 0-bits in the mantissa.

41100000 was 1.0

My sample code:

import struct

dividend=float(16**6)

def fromhex370(invalue):
    "convert a 4-byte binary string in 370 floating point format to a float"
    # an all-zero field is exactly 0.0
    if invalue == '\x00\x00\x00\x00':
        return 0.0
    istic,a,b,c=struct.unpack('>BBBB',invalue)
    # the top bit of the first byte is the sign
    if istic >= 128:
        sign= -1.0
        istic = istic - 128
    else:
        sign = 1.0
    mant= float(a<<16) + float(b<<8) +float(c)
    return sign* 16**(istic-64)*(mant/dividend)

Of course, no optimization was even thought of ......

I believe the long floating point format just added 4 more bytes of
significance to the end of the 4 bytes I mentioned above.
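
If the long format is laid out that way (same sign and exponent byte, with a 56-bit fraction), a Python 3 sketch for it could look like this. The name `ibm370_long_to_float` and the test inputs are hypothetical, and the layout is taken as an assumption from the description above.

```python
import struct

def ibm370_long_to_float(eight_bytes):
    """Convert an 8-byte System/370 long floating point value to a float."""
    (raw,) = struct.unpack(">Q", eight_bytes)
    sign = -1.0 if raw >> 63 else 1.0
    exponent = ((raw >> 56) & 0x7F) - 64      # base-16, excess-64
    fraction = raw & ((1 << 56) - 1)          # low 56 bits
    # A Python float keeps 53 significant bits, so the lowest bits of a
    # 56-bit fraction may be rounded.
    return sign * (fraction / float(1 << 56)) * 16.0 ** exponent

# The short-format examples, padded with 4 zero bytes.
print(ibm370_long_to_float(bytes.fromhex("4110000000000000")))
print(ibm370_long_to_float(bytes.fromhex("46ee3bb400000000")))
```

Padding a short value with four zero bytes gives the same number, which matches the idea that the long format only extends the fraction.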

HTH
 
Angelo Secchi

Anton and Howard,
thank you very much (both of your code samples do the job). I was really
lost in bit manipulation without any real hope of success!!

Angelo



