Extract double in binary file

Pascal

Hello,

I have a binary file with data in it.
The file comes from an old MS-DOS application (Multilog, circa 1980).
In this application, a field is declared as a 'decimal' (999 999
999.99).
I put 0.00 in the field and saved the record to the file.
When I look at the binary file (with Python or a hex editor), the
field is stored in 8 bytes: 00-00-00-00-00-00-7F-00.
I tried unpack from the struct module, but the result isn't right.

Can someone help me?

Thanks.
 
Gandalf

Most likely it is a BCD field. Look for the number in the file with
a simple text viewer. If it shows up there, you can read the number
as a string (length is len('999999999.99') or similar) and convert it
with int() or long().

Cheers,

L
 
Terry Reedy

Gandalf said:
Most likely it is a BCD field.

I believe the hex the OP gave is for double 0.0e0 (exponents are biased).
OTOH, BCD for 11 digits would be 6 bytes, all 0, which it also has.
Putting -1.0 and 1.0 in the field and storing them would clarify the
storage format if it is really not known.
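
One way to test the IEEE guess is to let struct read the eight bytes as a double in both byte orders. This is only a minimal sketch in Python 3; the names `raw`, `little` and `big` are made up for illustration.

```python
import struct

# The 8 bytes the OP found for 0.00, in file order.
raw = bytes.fromhex("0000000000007F00")

little = struct.unpack("<d", raw)[0]   # read as a little-endian IEEE 754 double
big = struct.unpack(">d", raw)[0]      # read as a big-endian IEEE 754 double

print(little, big)
# For comparison: an IEEE 754 double 0.0 is stored as eight zero bytes.
print(struct.pack("<d", 0.0).hex())
```

Both readings come out as tiny non-zero numbers, which hints that the field is probably not a plain IEEE double.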

tjr
 
Christopher A. Craig

You're sort of vague here, but I don't think struct is going to help
you regardless. "decimal" in this case is almost certainly some sort
of BCD, which isn't a standard C struct (and therefore unknown to the
struct module).

You really need to figure out how the data is stored. Based on your
one example it looks like it's stored as a series of 7 bit values
representing the decimal digits with 0x7f indicating the decimal
point. If this is correct you could use something like

tstr=''
for c in instr:
    if c == chr(0x7f):
        tstr+='.'
    else:
        tstr += str(ord(c))
fl = float(tstr)

With two major caveats:
1) that this is going to return a float, not a decimal
2) There's no way for me to even guess how negative numbers are
represented
 
Colin Brown

If the number is saved in a floating point representation (IEEE?),
typically [sign][exponent][fraction] then you really need to know
what the type is. For example, I had to make cross-platform real
numbers at one stage and fabricated them as below.

Colin Brown
PyNZ


import math

def vmsR4(real):
    '''vmsR4(real): returns an integer that is equivalent to a VMS real*4 '''
    (m, e) = math.frexp(real)
    if m == 0.0:
        return 0
    else:
        sign = m < 0
        exp = e + 128
        mant = int((16777216L * abs(m)) + 0.5) - 8388608
        return (sign << 15) + (exp << 7) + (mant >> 16) + (mant << 16)
 
Pascal

First, thanks for the answers.

Some precisions:
0.00 > 00-00-00-00-00-00-7F-00
1.00 > 00-00-00-00-00-00-00-81
2.00 > 00-00-00-00-00-00-00-82
3.00 > 00-00-00-00-00-00-40-82
4.00 > 00-00-00-00-00-00-00-83

10.00 > 00-00-00-00-00-00-20-84
1000.00 > 00-00-00-00-00-00-7A-8A

1.11 > 14-AE-47-E1-7A-14-0E-81
 
Terry Reedy

Pascal said:
Some precisions:

('examples': 'precisions' does not work here in English)
0.00 > 00-00-00-00-00-00-7F-00
1.00 > 00-00-00-00-00-00-00-81
2.00 > 00-00-00-00-00-00-00-82
3.00 > 00-00-00-00-00-00-40-82
4.00 > 00-00-00-00-00-00-00-83

10.00 > 00-00-00-00-00-00-20-84
1000.00 > 00-00-00-00-00-00-7A-8A

1.11 > 14-AE-47-E1-7A-14-0E-81

The only obvious pattern I see is that 2**0 -> 81, 2**1 -> 82, ...
2**9 -> 8A (where A == 10), i.e., for non-zero values, the last byte is
81 + the exponent of the largest power of two, which looks like a type of
float, and the first 6 bytes are 0 if the value is integral. May be a
proprietary format.
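
The last-byte pattern can be checked mechanically against the posted samples. A minimal sketch in Python 3; `samples` and `top_power_of_two` are hypothetical names, and the byte values are copied from the examples above.

```python
import math

# (value, last byte) pairs taken from the samples posted above.
samples = [(1.0, 0x81), (2.0, 0x82), (3.0, 0x82), (4.0, 0x83),
           (10.0, 0x84), (1000.0, 0x8A), (1.11, 0x81)]

def top_power_of_two(value):
    """Exponent of the largest power of two not exceeding value (value > 0)."""
    # math.frexp returns (m, e) with value == m * 2**e and 0.5 <= m < 1.
    return math.frexp(value)[1] - 1

for value, last_byte in samples:
    print(value, hex(last_byte), last_byte - 0x81 == top_power_of_two(value))
```

Every sample satisfies last byte == 0x81 + exponent, which fits a binary floating-point layout with a biased exponent.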

TJR
 
Dennis Lee Bieber

Pascal fed this fish to the penguins on Thursday 27 November 2003 01:07
am:

Some precisions:
0.00 > 00-00-00-00-00-00-7F-00
1.00 > 00-00-00-00-00-00-00-81
2.00 > 00-00-00-00-00-00-00-82
3.00 > 00-00-00-00-00-00-40-82
4.00 > 00-00-00-00-00-00-00-83

10.00 > 00-00-00-00-00-00-20-84
1000.00 > 00-00-00-00-00-00-7A-8A

1.11 > 14-AE-47-E1-7A-14-0E-81

You didn't show any negative values but...

My first attack would be to byte reverse that series:

00 7F 00 00 00 00 00 00    # strange, I'm used to "true" 0 formats

81 00 00 00 00 00 00 00    # many binary formats drop the
82 00 00 00 00 00 00 00    # leading bit, and I still don't
82 40 00 00 00 00 00 00    # know where a sign bit is
83 00 00 00 00 00 00 00    # I'm used to the left bit being
                           # the sign, but these seem to be
                           # using it as the exponent offset
                           # from 81

84 20 00 00 00 00 00 00

Inserting the implied/hidden leading bit gives

81 1 00 00 00 00 00 00 00
82 1 00 00 00 00 00 00 00
82 1 40 00 00 00 00 00 00
83 1 00 00 00 00 00 00 00
84 1 20 00 00 00 00 00 00

converting exponents excess 81...

0 1 00 00 00 00 00 00 00
1 1 00 00 00 00 00 00 00
1 1 40 00 00 00 00 00 00
2 1 00 00 00 00 00 00 00
3 1 20 00 00 00 00 00 00

Working bitwise, treating each bit as 2^x... (x varies by position)

2^0          1.0
2^1          2.0
2^1 + 2^0    3.0     #treating 80 as a sign bit
2^2          4.0
2^3 + 2^1    10.0
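
The same walk-through (byte-reverse, insert the hidden bit, subtract the excess, weigh each bit) can be sketched in Python 3. `set_bit_powers` is a hypothetical helper, and it only handles positive values because the sign bit position is simply overwritten by the hidden bit.

```python
def set_bit_powers(dump):
    """Return the powers of two that add up to the value (positive values only)."""
    raw = bytes.fromhex(dump.replace("-", ""))   # bytes in file order
    msb_first = raw[::-1]                        # byte-reverse: exponent byte first
    exponent = msb_first[0] - 0x81               # excess-0x81 exponent
    mantissa = int.from_bytes(msb_first[1:], "big")
    mantissa |= 1 << 55                          # insert the hidden leading bit
    # Bit i (counted from the left) of the 56-bit mantissa weighs 2**(exponent - i).
    return [exponent - i for i in range(56) if mantissa & (1 << (55 - i))]

for dump in ("00-00-00-00-00-00-40-82", "00-00-00-00-00-00-20-84"):
    powers = set_bit_powers(dump)
    print(dump, powers, sum(2.0 ** p for p in powers))
```

For the 3.00 and 10.00 samples this gives the powers [1, 0] and [3, 1], matching the table above.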


--
 
Francis Avila

Dennis Lee Bieber wrote in message ...
Pascal fed this fish to the penguins on Thursday 27 November 2003 01:07
am:
converting exponents excess 81...
0 1 00 00 00 00 00 00 00
1 1 00 00 00 00 00 00 00
1 1 40 00 00 00 00 00 00
2 1 00 00 00 00 00 00 00
3 1 20 00 00 00 00 00 00


That is a bizarre format, and of course I had to implement it. (Even C is
more pleasant in Python!).

It works for the cases given, but do find out where the sign bit is for the
mantissa. (This code assumes it's the MSB of the mantissa.)

Also tease out the NaN and +-Infinity cases.

--- Code ---
#! /usr/bin/env python
# by Francis Avila
#
# Decode a peculiar binary floating point encoding
# used by 'multilog', an old dos spreadsheet.

import struct

_known = (('0.00', '\x00\x00\x00\x00\x00\x00\x7F\x00'),
          ('1.00', '\x00\x00\x00\x00\x00\x00\x00\x81'),
          ('2.00', '\x00\x00\x00\x00\x00\x00\x00\x82'),
          ('3.00', '\x00\x00\x00\x00\x00\x00@\x82'),
          ('4.00', '\x00\x00\x00\x00\x00\x00\x00\x83'),
          ('10.00','\x00\x00\x00\x00\x00\x00 \x84'),
          ('1000.00','\x00\x00\x00\x00\x00\x00z\x8a'),
          ('1.11', '\x14\xaeG\xe1z\x14\x0e\x81'))

def _test():
    tests = [(str(float(i)),str(dectofloat(j))) for i,j in _known]
    results = [expect==got for expect,got in tests]

    # Collect the (expected, got) pairs that did not match.
    failed = [tests[i] for i, passed in enumerate(results) if not passed]

    if failed: return failed
    else: return 'Passed'

def bin(I):
    """Return list of bits of int I, most significant bit first."""
    if I < 0:
        raise ValueError("I must be >= 0")
    bits = []
    if I == 0:
        bits = [0]
    while I>0:
        r = (I & 0x1)
        if r: r = 1
        bits.append(r)
        I >>= 1
    bits.reverse()
    return bits

def binaryE(n, exp):
    """Return result of a binary n*10**exp.

    As n*10**exp is to decimal, so binaryE(n, exp) is to binary.
    """
    return sum([2**(exp-i) for i,bit in enumerate(bin(n)) if bit])


#Add special cases here:
SPECIAL = {'\x00\x00\x00\x00\x00\x00\x7f\x00':0.0}

def dectofloat(S):
    """Return float value of 8-byte 'decimal' string."""
    if S in SPECIAL:
        return SPECIAL[S]

    # Convert to byteswapped long.
    N, = struct.unpack('<Q', S)

    # Grab exponent and mantissa parts using bitmasks.
    # The eight MSBs are exponent; rest mantissa.
    exp, mant = (N&(0xffL<<56))>>56, N&~(0xffL<<56)

    exp -= 0x81 # Exponential part is excess 0x81 (e.g., 0x82 is 1).

    msign = mant & (0x80L<<48) # MSB of mantissa is sign bit. 0==positive.
    if not msign:
        msign = 1
    else:
        msign = -1

    mant |= (0x80L<<48) # Add implied 1 to the MSB of mantissa.

    # Now, binary scientific notation:

    return float(msign * binaryE(mant, exp))
 
Dennis Lee Bieber

Francis Avila fed this fish to the penguins on Friday 28 November 2003
01:45 am:
That is a bizarre format, and of course I had to implement it. (Even C
is more pleasant in Python!).
And I thought /I/ was the masochist...

I do have to confess that short tests with struct did reveal that, on
my system, regular doubles do have the same byte order as the original
data. I'm just more comfortable with seeing hex representations of
numbers with the MSB on the left.
It works for the cases given, but do find out where the sign bit is
for the mantissa. (This code assumes it's the MSB of the mantissa.)
I suspect most would consider the format my college computer used to
be weird... Xerox Sigma 6... Excess 64 (decimal) (as I recall) exponent,
powers of sixteen! A "normalized" mantissa could have up to three
leading 0 bits, and there was no "hidden" bit.

S eeeeeee mmmmmmmm mmmmmmmm mmmmmmmm ...

--
 
Bengt Richter

Does the OP have the ability to generate example values at will,
or is it a matter of scrounging through some old recorded data with
no way of making more?

If he can make more, I'd suggest a number with most of the nybbles
of data individually numbered, e.g.,
0xfedcba987654321
E.g., if it's a 64-bit format, a 64-bit integer converted would probably
tell you a lot about where the bits go (how many get shifted out, hidden,
how they're ordered). And then the same number negative. E.g.,
'-0xFEDCBA987654321L'
'0xFFFFF0123456789ABCDFL'
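
In Python 3, test patterns of that kind could be generated roughly as below. The name `probe` is made up, and the 80-bit width is an assumption chosen only because it reproduces the last value shown above.

```python
probe = 0xfedcba987654321          # every nibble has a distinct value

print(hex(probe))
print(hex(-probe))
# Two's-complement view of the negative value in an assumed 80-bit field.
print(hex(-probe & ((1 << 80) - 1)))
```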

Regards,
Bengt Richter
 
Pascal

First, thanks for trying!

Maybe these values will tell you something:
1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x81
-1 0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x81
2 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x82
-2 0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x82
3 0x0 0x0 0x0 0x0 0x0 0x0 0x40 0x82
-3 0x0 0x0 0x0 0x0 0x0 0x0 0xc0 0x82
1.1 0xcd 0xcc 0xcc 0xcc 0xcc 0xcc 0xc 0x81
1.2 0x9a 0x99 0x99 0x99 0x99 0x99 0x19 0x81
1.3 0x66 0x66 0x66 0x66 0x66 0x66 0x26 0x81
1.4 0x33 0x33 0x33 0x33 0x33 0x33 0x33 0x81
1.01 0x48 0xe1 0x7a 0x14 0xae 0x47 0x1 0x81
0.01 0xd7 0xa3 0x70 0x3d 0xa 0xd7 0x23 0x7a
1.02 0x8f 0xc2 0xf5 0x28 0x5c 0x8f 0x2 0x81
0.02 0xd7 0xa3 0x70 0x3d 0xa 0xd7 0x23 0x7b
 
Bengt Richter

This needs some cleanup and optimization, but for the above it seems to work:

====< PascalParent.py >=================================
#First thanks for trying!

#May be these values will tell you somethings:
data = """\
1 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x81
-1 0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x81
2 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x82
-2 0x0 0x0 0x0 0x0 0x0 0x0 0x80 0x82
3 0x0 0x0 0x0 0x0 0x0 0x0 0x40 0x82
-3 0x0 0x0 0x0 0x0 0x0 0x0 0xc0 0x82
1.1 0xcd 0xcc 0xcc 0xcc 0xcc 0xcc 0xc 0x81
1.2 0x9a 0x99 0x99 0x99 0x99 0x99 0x19 0x81
1.3 0x66 0x66 0x66 0x66 0x66 0x66 0x26 0x81
1.4 0x33 0x33 0x33 0x33 0x33 0x33 0x33 0x81
1.01 0x48 0xe1 0x7a 0x14 0xae 0x47 0x1 0x81
0.01 0xd7 0xa3 0x70 0x3d 0xa 0xd7 0x23 0x7a
0.0 0x0 0x0 0x0 0x0 0x0 0x0 0x7F 0x0
"""
def bytes2float(bytes):
    if bytes == [0,0,0,0,0,0,0x7f,0]: return 0.0
    b = bytes[:]
    sign = bytes[-2]&0x80
    b[-2] |= 0x80 # hidden most significant bit in place of sign
    exp = bytes[-1] - 0x80 -56 # exponent offset
    acc = 0L
    for i,byte in enumerate(b[:-1]):
        acc |= (long(byte)<<(i*8))
    return (float(acc)*2.0**exp)*((1.,-1.)[sign!=0])

for line in data.splitlines():
    nlist = line.split()
    fnum = float(nlist[0])
    le_bytes = map(lambda x:int(x,16) ,nlist[1:])
    test = bytes2float(le_bytes)
    print ' in: %r\nout: %r\n'%(fnum,test)
========================================================
Result:

[10:44] C:\pywk\clp>PascalParent.py
in: 1.0
out: 1.0

in: -1.0
out: -1.0

in: 2.0
out: 2.0

in: -2.0
out: -2.0

in: 3.0
out: 3.0

in: -3.0
out: -3.0

in: 1.1000000000000001
out: 1.1000000000000001

in: 1.2
out: 1.2

in: 1.3
out: 1.3

in: 1.3999999999999999
out: 1.3999999999999999

in: 1.01
out: 1.01

in: 0.01
out: 0.01

in: 0.0
out: 0.0

HTH

Regards,
Bengt Richter
 
Pascal

A very big thanks to you.
The function runs perfectly (after installing Python 2.3 for the enumerate function).
If you can, please give me more details on the method or the number's representation.

Thanks a lot!
 
Bengt Richter

According to an old MASM 5.0 programmer's guide, there was a Microsoft Binary format
for encoding real numbers, both short (32 bits) and long (64 bits).

There were 3 parts:

1. Biased 8-bit exponent in the highest byte (last in the little-endian view we've been using)
It says the bias is 0x81 for short numbers and 0x401 for long, but I'm not sure where that lines up.
I just got there by experimentation.

2. Sign bit (0 for +, 1 for -) in upper bit of second highest byte.

3. All except the first set bit of the mantissa in the remaining 7 bits of the second highest byte,
and the rest of the bytes. And since the most significant bit for non-zero numbers is 1, it
is not represented. But if it were, it would share the same bit position where the sign is
(that's why I or-ed it in there to complete the actual mantissa).

MASM also supported a 10-byte format similar to IEEE. I didn't see anything in that section
on NaNs and INFs.
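
The three parts can be put together in a few lines of Python 3. This is only a sketch under assumptions: `mbf64_to_float` is a made-up name, the 0x81 bias is the one found by experimentation in this thread, and treating a zero exponent byte as 0.0 is a guess that happens to match the OP's 0.00 sample.

```python
import math

def mbf64_to_float(raw):
    """Decode 8 bytes (file order) using the layout described above."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    exponent = raw[7]                      # part 1: biased exponent, highest byte
    if exponent == 0:
        # Assumption: a zero exponent byte means 0.0, as in the 0.00 sample.
        return 0.0
    sign = -1.0 if raw[6] & 0x80 else 1.0  # part 2: sign bit
    # Part 3: 55 stored mantissa bits, with the hidden bit or-ed in over the sign bit.
    mantissa = int.from_bytes(raw[:7], "little") | (1 << 55)
    # The top bit of the 56-bit mantissa weighs 2**(exponent - 0x81).
    return sign * math.ldexp(mantissa, exponent - 0x81 - 55)

for dump in ("0000000000008081", "cdcccccccccc0c81",
             "d7a3703d0ad7237a", "0000000000007f00"):
    print(dump, mbf64_to_float(bytes.fromhex(dump)))
```

The mantissa holds 56 bits while a Python float keeps 53, so a few values can come out one unit in the last place away from the decimal that was typed in.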

HTH

Regards,
Bengt Richter
 
Dennis Lee Bieber

Bengt Richter fed this fish to the penguins on Tuesday 02 December 2003
14:17 pm:
According to an old MASM 5.0 programmer's guide, there was a Microsoft
Binary format for encoding real numbers, both short (32 bits) and long
(64 bits).
You mean after all this, all we've done is reverse engineer a
Microsoft floating-point (software library) format?! <G>

--
 
