Newbie lost

Discussion in 'Python' started by Angelo Secchi, Feb 25, 2004.

  1. I'm fighting with a binary file and I am definitely lost.
    I know that each line (record) of the file starts with a 113-character
    string, followed by a group of identical fields. I do not know the
    precise format of these fields, but I do know that the file was created
    on an IBM mainframe and that the binary part should contain 223 fields,
    each 4 bytes wide.
    Just to give you an idea: if I read the first line of my file as a
    string, I obtain something like this (just a small part of the first
    line):

    13510010222010341341F\xee;\xb4\x00\x00\x00\x00\x00\x00\x00\x00F]\xe3\x9a\x00


    Still, I am not able to convert this binary data. Can anybody give me
    some advice?

    Thanks
    Angelo



    --
    ========================================================
    Angelo Secchi PGP Key ID:EA280337
    ========================================================
    Current Position:
    Graduate Fellow Scuola Superiore S.Anna
    Piazza Martiri della Liberta' 33, Pisa, 56127 Italy
    ph.: +39 050 883365
    email: www.sssup.it/~secchi/
    ========================================================
     
    Angelo Secchi, Feb 25, 2004
    #1
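The layout Angelo describes (a 113-byte text header followed by 223 four-byte fields) can be checked with a struct format string. This is a minimal Python 3 sketch that assumes the fields are big-endian, as is usual on IBM mainframes, and leaves them as raw 32-bit words because their float format is still unknown at this point in the thread. `RECORD` and `split_record` are hypothetical names.

```python
import struct

# Hypothetical layout: 113-byte text header + 223 big-endian 4-byte fields.
# '>' = big-endian with no padding, '113s' = raw bytes, '223I' = unsigned 32-bit words.
RECORD = struct.Struct('>113s223I')

def split_record(record):
    """Split one fixed-length record into (header bytes, list of raw 32-bit words)."""
    fields = RECORD.unpack(record)
    return fields[0], list(fields[1:])

print(RECORD.size)  # 113 + 223 * 4 = 1005
```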

  2. The string above contains escape sequences, so sometimes four
    characters correspond to one byte, sometimes a char is just a byte.
    This is not really present in the file but just an artifact of the way
    you chose to print it. In order to gain more insight:

    #open the file in binary mode e.g:
    inf = file('somefile','rb')

    #read 1 line e.g:
    line = inf.readline()

    #turn this line into a list of characters:
    L = list(line)

    #Inspect the list L and come back here with further questions,
    #if you have any :)
    print L

    Anton
     
    Anton Vredegoor, Feb 25, 2004
    #2
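Anton's point about escape sequences is easy to see in Python 3: the printed form of a bytes object is much longer than the data it holds. The snippet uses four bytes from Angelo's sample; note that in Python 3, `list()` of a bytes object gives integers rather than one-character strings.

```python
# Four bytes from the sample: 'F', 0xee, ';', 0xb4
chunk = b'F\xee;\xb4'

print(len(chunk))        # 4 bytes of actual data
print(len(repr(chunk)))  # the printed form needs many more characters
print(list(chunk))       # byte values as integers: [70, 238, 59, 180]
print(chunk.hex())       # one two-digit hex pair per byte: 46ee3bb4
```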

  3. I checked with a statistical package (SAS) and I was able to convert my
    file using a format called s370frb4., which according to its manual
    should correspond to a float in C and to a REAL*4 in Fortran. Now I
    know that the first 3 numbers in the binary part should be exactly:


    15612852 0 0

    I also noticed, using the struct module, that 15612852 and 0 packed
    as 4-byte integers give

    '\xb4;\xee\x00'
    '\x00\x00\x00\x00'

    which is not very different (apart from the byte order and the F in
    front) from the beginning of the binary part of my file:

    13510010222010341341F\xee;\xb4\x00\x00\x00\x00\x00\x00\x00\x00F]\xe3\x9a\x00


    Can this information help anybody help me?
    Thanks again, hoping to be able to throw away any proprietary software...
    Angelo




     
    Angelo Secchi, Feb 25, 2004
    #3
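The similarity Angelo noticed comes down to byte order plus one extra leading byte. A small Python 3 check, assuming the known value 15612852 sits in the last three bytes of the field: packing it little-endian gives the reversed bytes he saw from struct, while packing it big-endian reproduces the three bytes that follow the F in the file.

```python
import struct

value = 15612852  # first known value, 0xee3bb4

little = struct.pack('<I', value)  # Intel byte order, as struct printed it
big = struct.pack('>I', value)     # mainframe (big-endian) byte order

print(little)  # b'\xb4;\xee\x00'
print(big)     # b'\x00\xee;\xb4'

# The first field in the file is b'F\xee;\xb4': the same last three bytes
# as `big`, but with 0x46 ('F') instead of 0x00 in the first byte.
field = b'F\xee;\xb4'
print(field[1:] == big[1:])  # True
```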
  4. Angelo Secchi

    John Roth Guest

    If you could read it with SAS, most likely the floats are in
    IBM's proprietary format, not in standard IEEE 754 format.
    (IBM, by the way, was there first...) I'm not certain whether
    Python can convert them directly. You might have to do some bit
    twiddling, which is going to be awfully slow.

    For the rest of it, I'd like to see a *real* hex dump in mainframe
    format. From what you've given us so far I'm not certain whether
    the struct module can convert the data for you.

    John Roth




     
    John Roth, Feb 25, 2004
    #4
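John asks for a real hex dump. One way to produce a conventional one (offset, hex bytes, printable characters) in Python 3 is sketched below; `hexdump` is a hypothetical helper, not a standard-library function, and 16 bytes per row is an arbitrary choice.

```python
def hexdump(data, width=16):
    """Return a classic hex dump: offset, hex bytes, and printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = ' '.join(f'{b:02x}' for b in chunk)
        # Show printable ASCII characters and replace everything else with '.'
        text_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        lines.append(f'{offset:08x}  {hex_part:<{width * 3}}{text_part}')
    return '\n'.join(lines)

print(hexdump(b'13510010222010341341F\xee;\xb4\x00\x00\x00\x00'))
```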
  5. Probably what John wants to see is the output of something like this:

    #one 'line' of data (since it's a binary file there is no real line
    #ending convention: a line is just a specific number of bytes long,
    #it's important to find out how many exactly)

    from binascii import hexlify
    inf = file('somefile','rb')
    data = inf.read(1005) #a 113-byte string + 223 4-byte floats
    L = map(hexlify,data)
    print L

    Anton
     
    Anton Vredegoor, Feb 26, 2004
    #5
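Anton's comment that a "line" in this file is just a fixed number of bytes suggests reading it record by record instead of with readline(). A Python 3 sketch, assuming every record is exactly 1005 bytes (113 + 223 × 4) with no separator between records; `iter_records` is a hypothetical helper.

```python
import io
from functools import partial

RECORD_LEN = 113 + 223 * 4  # 1005 bytes, assuming no record separator

def iter_records(f, record_len=RECORD_LEN):
    """Yield fixed-length records from a binary file object."""
    for record in iter(partial(f.read, record_len), b''):
        if len(record) != record_len:
            raise ValueError(f'truncated record: {len(record)} bytes')
        yield record

# Usage with an in-memory stand-in for open('somefile', 'rb'):
fake_file = io.BytesIO(b'x' * (2 * RECORD_LEN))
print(sum(1 for _ in iter_records(fake_file)))  # 2
```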
  6. Angelo Secchi

    John Roth Guest

    Thank you! I wasn't aware of that module. It looks like it
    should do exactly what's needed.

    Although this would probably do instead of the last 2 lines
    (excuse the fact that it's really ugly code, as well as
    untested):

    header = data[:113]
    for index in range(0, len(header), 32):
        print hexlify(header[index:index + 32])
    for i in range(223):
        index = i * 4 + 113
        print hexlify(data[index:index + 4])

    John Roth
     
    John Roth, Feb 26, 2004
    #6
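In Python 3 the same per-field dump can be written without index arithmetic: `bytes.hex()` replaces hexlify, and `struct.iter_unpack` walks the binary part in 4-byte steps. This sketch assumes the 113-byte header and 223 fields discussed above; `dump_fields` is a hypothetical name.

```python
import struct

def dump_fields(data, header_len=113, n_fields=223):
    """Return the header and the hex string of each 4-byte field."""
    header = data[:header_len]
    body = data[header_len:header_len + 4 * n_fields]
    fields = [raw.hex() for (raw,) in struct.iter_unpack('4s', body)]
    return header, fields

record = b'1' * 113 + bytes.fromhex('46ee3bb4') + bytes(4) * 222
header, fields = dump_fields(record)
print(fields[:3])  # ['46ee3bb4', '00000000', '00000000']
```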
  7. John and Anton, thanks for your help.
    If I did it correctly, this is the output of the code you asked me to
    try:

    46ee3bb4 (I know that this should be 15612852)
    00000000 (I know that this should be 0)
    00000000 (I know that this should be 0)
    465de39a (I know that this should be 6153114)
    00000000 (I know that this should be 0)
    00000000 (I know that this should be 0)
    ....


    John, just to be sure I understand what you said: if the file is in
    IBM's proprietary binary format (EBCDIC?), can't I convert it to ASCII
    using Python?

    Thanks again
    Angelo








     
    Angelo Secchi, Feb 26, 2004
    #7
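Angelo's table already contains the key observation: for each nonzero field, the last three bytes read as a hex integer are exactly the expected value, and the first byte is 0x46 in both cases. A quick Python 3 check of the two nonzero examples:

```python
# Hex dumps of the first nonzero fields, with the values SAS reported.
observed = {'46ee3bb4': 15612852, '465de39a': 6153114}

for hex_field, expected in observed.items():
    first_byte = int(hex_field[:2], 16)  # 0x46 in both cases
    low_three = int(hex_field[2:], 16)   # the remaining 24 bits
    print(hex_field, first_byte, low_three, low_three == expected)
```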
  8. Angelo Secchi

    John Roth Guest

    The character data seems to be in ASCII, so it won't be any
    problem.

    I'll have to check the IBM site later today for the floating point
    formats; someone who knows IEEE 754 should be able to tell you
    whether or not they match the standard formats, though.

    John Roth
     
    John Roth, Feb 26, 2004
    #8
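John's conclusion that the character part is ASCII can be checked by decoding the header. Python ships EBCDIC codecs such as cp037 and cp500; which EBCDIC code page a given mainframe used is an assumption, so cp037 here is only an example. In EBCDIC the digits are bytes 0xF0 to 0xF9, so a header that decodes as readable ASCII digits is not EBCDIC.

```python
header = b'13510010222010341341'  # start of Angelo's header, as printed

print(header.decode('ascii'))  # readable digits: the data is ASCII

# If the header were EBCDIC (code page 037 assumed), digits would be 0xF0-0xF9:
ebcdic_digits = '13510'.encode('cp037')
print(ebcdic_digits.hex())            # f1f3f5f1f0
print(ebcdic_digits.decode('cp037'))  # 13510
```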
  9. Angelo Secchi

    Bob Ippolito Guest

    The question is whether the existing infrastructure (probably the
    struct module) can do it in one or two lines quickly, or if more
    elaborate code would have to be written.

    Of course, you do need to know how the data was encoded in order to
    reliably decode it at all.

    -bob
     
    Bob Ippolito, Feb 26, 2004
    #9
  10. Angelo Secchi

    John Roth Guest

    Using the following little program:

    -----------------------
    import binascii
    import struct

    str1 = "46ee3bb4"
    str2 = binascii.unhexlify(str1)
    f_big = struct.unpack(">f", str2)     # big-endian IEEE single
    print f_big
    f_native = struct.unpack("f", str2)   # native byte order (little-endian on a PC)
    print f_native
    -----------------------------

    I get:

    -----------------------------
    (30493.8515625,)
    (-1.7502415516901237e-007,)
    -----------------------------

    So it's clearly *not* standard IEEE 754
    format, and it's going to require some bit
    twiddling to convert.

    The last 3 bytes (in big-endian order)
    are the fraction. I believe the first bit is
    the sign, and the next 7 bits are the
    exponent in 4-bit chunks, biased by 64.
    In other words, the exponent of your first
    example (first byte 0x46 = 70) is 70 - 64 = 6,
    so the fraction 0x.ee3bb4 is shifted six hex
    digits to the left. The shift quantity is in
    hex digits, not bits!

    I've verified that this is the actual format
    using the Windows Calc applet - handy
    sucker when you want to do hex to decimal
    conversions.

    John Roth

     
    John Roth, Feb 26, 2004
    #10
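John's description of the layout can be spelled out with shifts and masks. This Python 3 sketch splits one 32-bit word into the sign bit, the 7-bit exponent, and the 24-bit fraction; the bias of 64 follows Howard's description later in the thread, and `split_ibm_word` is a hypothetical name.

```python
import struct

def split_ibm_word(four_bytes):
    """Split a big-endian 32-bit word into (sign, exponent, fraction) bit fields."""
    (word,) = struct.unpack('>I', four_bytes)
    sign = word >> 31                # 1 bit
    exponent = (word >> 24) & 0x7F   # 7 bits: a power of 16, biased by 64
    fraction = word & 0xFFFFFF       # 24 bits = 6 hex digits
    return sign, exponent, fraction

sign, exponent, fraction = split_ibm_word(bytes.fromhex('46ee3bb4'))
print(sign, exponent - 64, hex(fraction))  # 0 6 0xee3bb4
# value = 0.ee3bb4 (hex) * 16**6, i.e. the 24-bit fraction itself
print(fraction / 16**6 * 16**(exponent - 64))  # 15612852.0
```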
  11. Based on this message:

    http://groups.google.com/groups?&selm=

    I wrote a function that might do it (use at your own risk). It
    converts a 4-byte real from "IBM System/370 Floating Point Format" to
    a Python float. In your code above replace one line:

    The conversion function and a test function:

    import struct

    def ibm370tofloat(fourbytes):
        # the 4 bytes as one big-endian unsigned 32-bit integer
        i = struct.unpack('>I', fourbytes)[0]
        # bit 31 is the sign bit
        sign = [1, -1][bool(i & 0x80000000L)]
        # bits 24-30: power-of-16 exponent, biased by 64
        characteristic = ((i >> 24) & 0x7f) - 64
        # bits 0-23: the fraction 0.xxxxxx (hex)
        fraction = (i & 0xffffff) / float(0x1000000L)
        return sign * 16**characteristic * fraction

    def test():
        import binascii
        pi = "413243f7"
        s1 = "46ee3bb4"
        s2 = "465de39a"
        for x in [pi, s1, s2]:
            y = binascii.unhexlify(x)
            print ibm370tofloat(y)

    if __name__ == '__main__':
        test()

    output:

    3.14159297943
    15612852.0
    6153114.0

    I hope this helps.

    Anton
     
    Anton Vredegoor, Feb 28, 2004
    #11
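For readers on Python 3 (no `L` suffix, no print statement), the same conversion can be written with `int.from_bytes` and `math.ldexp`. Since 16**e equals 2**(4*e), ldexp scales the 24-bit fraction by an exact power of two. This is a variant sketch rather than Anton's original code, and `ibm32_to_float` is a hypothetical name.

```python
import math

def ibm32_to_float(four_bytes):
    """Convert 4 bytes of IBM System/370 single precision to a Python float."""
    word = int.from_bytes(four_bytes, 'big')
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = ((word >> 24) & 0x7F) - 64  # power of 16
    fraction = word & 0xFFFFFF             # 24-bit integer fraction
    # value = fraction / 16**6 * 16**exponent = fraction * 2**(4*exponent - 24)
    return sign * math.ldexp(fraction, 4 * exponent - 24)

for h in ['413243f7', '46ee3bb4', '465de39a', 'c1100000']:
    print(h, ibm32_to_float(bytes.fromhex(h)))
```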

  12. I seem to recall that the format is
    sCCCCCCC MMMMMMMM MMMMMMMM MMMMMMMM

    The CCCCCCC exponent is biased by 64 and is HEX (16**(CCCCCCC-64)).

    I recall significance issues on Gould machines because normalization
    was only to the nibble (hex digit), so there might be up to 3 leading
    0-bits in the mantissa.

    41100000 was 1.0

    My sample code:

    import struct

    dividend = float(16**6)

    def fromhex370(invalue):
        "convert a 4-char binary string in 370 floating point format to a float"
        if invalue == '\x00\x00\x00\x00':
            return 0.0
        istic, a, b, c = struct.unpack('>BBBB', invalue)
        if istic >= 128:
            sign = -1.0
            istic = istic - 128
        else:
            sign = 1.0
        mant = float(a << 16) + float(b << 8) + float(c)
        return sign * 16**(istic - 64) * (mant / dividend)

    Of course, no optimization was even thought of...

    I believe the long floating point format just added 4 more bytes of
    significance to the end of the 4 bytes I mentioned above.

    HTH
     
    Howard Lightstone, Feb 28, 2004
    #12
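Howard's remark about the long format suggests the same decoding for 8-byte values with a 56-bit fraction. A Python 3 sketch under that assumption; note that 56 bits is more than a Python float's 53-bit significand can hold, so the lowest bits may be rounded. `ibm64_to_float` is a hypothetical name.

```python
import math

def ibm64_to_float(eight_bytes):
    """Convert 8 bytes of IBM long floating point (assumed: sign bit,
    7-bit excess-64 hex exponent, 56-bit fraction) to a Python float."""
    word = int.from_bytes(eight_bytes, 'big')
    sign = -1.0 if word >> 63 else 1.0
    exponent = ((word >> 56) & 0x7F) - 64
    fraction = word & ((1 << 56) - 1)  # 14 hex digits
    # value = fraction / 16**14 * 16**exponent; converting to float may round
    return sign * math.ldexp(fraction, 4 * exponent - 56)

# 1.0 is the short-format 41100000 followed by four zero bytes
print(ibm64_to_float(bytes.fromhex('4110000000000000')))  # 1.0
```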
  13. Anton and Howard,
    thank you very much; both of your samples do the job. I was really lost
    in bit manipulation without any real hope of success!!

    Angelo




     
    Angelo Secchi, Feb 28, 2004
    #13
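Putting the pieces of the thread together, here is a Python 3 sketch that reads fixed-length records and converts each 4-byte field with the System/370 rule. It assumes 1005-byte records (a 113-byte header followed by 223 big-endian fields) with no separators, and an ASCII header based on John's reading of the data; `read_records` and `ibm32_to_float` are hypothetical names.

```python
import io
import struct

HEADER_LEN, N_FIELDS = 113, 223
RECORD = struct.Struct(f'>{HEADER_LEN}s{N_FIELDS}I')  # 1005 bytes

def ibm32_to_float(word):
    """Convert one 32-bit IBM System/370 single-precision word to a float."""
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = ((word >> 24) & 0x7F) - 64
    return sign * (word & 0xFFFFFF) / 16.0**6 * 16.0**exponent

def read_records(f):
    """Yield (header text, list of 223 floats) for each record in a binary file."""
    while True:
        record = f.read(RECORD.size)
        if not record:
            break
        if len(record) != RECORD.size:
            raise ValueError('file ends with a truncated record')
        header, *words = RECORD.unpack(record)
        yield header.decode('ascii'), [ibm32_to_float(w) for w in words]

# Usage with an in-memory file; for real data use open('somefile', 'rb').
sample = b'1' * HEADER_LEN + bytes.fromhex('46ee3bb4465de39a') + bytes(4 * (N_FIELDS - 2))
for header, values in read_records(io.BytesIO(sample)):
    print(header[:5], values[:3])
```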