Does anyone recognize this binary data storage format?

geskerrett

I am hoping someone can help me solve a bit of a puzzle.

We are working on a data file reader and extraction tool for an old
MS-DOS accounting system dating back to the mid-80s.

In the data files, the text information is stored in clearly readable
ASCII text, so I am comfortable that this file isn't EBCDIC. However,
some of the numbers are stored in a format that we can't seem to
recognize or unpack using the standard Python tools (struct, binascii)
... or at least our understanding of how these tools work!


Any assistance would be appreciated.

Here are a few examples of telephone numbers:

Example 1:

Phone 1: 5616864700
Hex On Disk: C0DBA8ECF441

Phone 2: 5616885403
Hex on Disk: B0E9ADECF4F1



Another example:
Phone 1: 8003346488
Hex On Disk: 800396d0fd41

Phone2: 9544261331
Hex On Disk: F8f50ec70142

Phone3: 9544278601
Hex On Disk: 481211c70142


TIA.
 
Christopher Subich

geskerrett said:
Phone 1: 5616864700
Hex On Disk: C0DBA8ECF441

Phone 2: 5616885403
Hex on Disk: B0E9ADECF4F1

Is this value a typo instead of ...F441?
 
Dejan Rodiger

(e-mail address removed) said the following on 9.08.2005 19:29:
We are working on a data file reader and extraction tool for an old
MS-DOS accounting system dating back to the mid 80's.

Could you tell us what is the extension of those files?

Could you post full 5-10 records (ASCII + HEX)?
 
Dejan Rodiger

(e-mail address removed) said the following on 9.08.2005 19:29:
Phone 1: 5616864700
Hex On Disk: C0DBA8ECF441

5616864700(10)=14ECA8DBC(16)
14 EC A8 DB C      left-shift by 4 bits (this appends a 0 after the final C)
C0 DB A8 EC 14 00  write the bytes from right to left
C0 DB A8 EC F4 41  add E041

Phone 1: 8003346488
Hex On Disk: 800396d0fd41

8003346488(10)=1DD096038(16)
1D D0 96 03 8
80 03 96 D0 1D 00
80 03 96 d0 fd 41 Add E041

But works only for Phone 1 :)
 
geskerrett

The extension on the files is *.mas, but I am pretty sure it is not
relevant. I believe it is only used by the application.

I can't post full records as it would take up too much space.
But all three phone numbers are stored in 8 bytes with null bytes (i.e.
00) stored in the leading positions (i.e. the left-hand side).

I do have some more examples:

I have inserted the leading null bytes and separated the bytes with
spaces for clarity.

Ex #1) 333-3333
Hex On disk: 00 00 00 80 6a 6e 49 41

Ex #2) 666-6666
Hex On disk: 00 00 00 80 6a 6e 59 41

Ex#3) 777-7777
Hex On Disk: 00 00 00 40 7C AB 5D 41

Ex#4) 123-4567
Hex On Disk: 00 00 00 00 87 D6 32 41

Ex#5) 000-0001
Hex On disk: 00 00 00 00 00 00 F0 3F

Ex#6) 999-9999
Hex On disk: 00 00 00 E0 CF 12 63 41
 
Christopher Subich

Dejan said:
8003346488(10)=1DD096038(16)
1D D0 96 03 8
80 03 96 D0 1D 00
80 03 96 d0 fd 41 Add E041

I'm pretty sure that the last full byte is a parity check of some sort.
I still think that Phone2 (..F1) is a typo and should be 41. Even if
it's not, it could be a more detailed parity (crc-like?) check.

If the F1/41 is a typo, the last byte is ..41 if the parity of the other
40 bits is odd, and ..42 if the parity is even. (Since ..41 and ..42
each have two 1s, it does not change the parity of the entire string).
If not, Lucy has some 'splaining to do.

Taking the last byte out of the equation entirely, 40 bits for 10
decimal digits is 4 bits / digit, meaning there is some redundancy
still in the remainder (the full 10-digit number can be expressed with
room to spare in 36 bits).

Thinking like an 80s Mess-Dos programmer, 32-bit math is out of the
question since the CPU doesn't support it. Five decimal digits already
pushes the 16-bit boundary, so thinking of using the full phone number
or any computation is insane.

#1/#2 and #4/#5 share both the first five digits of the real phone
number and the last 16 bits of the remaining expression. Both pairs
*also* share bits 5-8 (the second hex digit).

Therefore, we may possibly conclude that digits 5-10 are stored in bits
5-8 and 21-36. This is an even 20 of the 40 data-bits. In fact, bits
6-8 of all examples given were all 0, but since I can't find an
equivalent always-x set for the other 5 digits I'm not sure that this is
significant.

Therefore:
95442 = 8c701 = 1 + c701 (?)
56168 = 0ECF4 = 0 + ecf4 (?)

I'm not coming up with a terribly useful algorithm from this, though. :/
My guess is that somewhere, there's a boolean check based on whether a
digit is >= 6 [maybe 3?] (to prevent running afoul of 16-bitness). I'm
also 90% sure that the first and second halves of the phone number are
processed separately, at least mostly, for the same reason.
 
Roel Schroeven

Dejan said:
(e-mail address removed) said the following on 9.08.2005 19:29:



5616864700(10)=14ECA8DBC(16)
14 EC A8 DB C leftshift by 4 bits (it will add 0 on last C)
C0 DB A8 EC 14 00 write bytes from right to left
C0 DB A8 EC F4 41 Add E041




8003346488(10)=1DD096038(16)
1D D0 96 03 8
80 03 96 D0 1D 00
80 03 96 d0 fd 41 Add E041

But works only for Phone 1 :)

E041 is some kind of checksum perhaps?
 
geskerrett

You are correct, that was a typo.
the second example should end in F441.

Thanks.
 
Grant Edwards

geskerrett said:
I can't post full records as it would take up too much space. But all
three phone numbers are stored in 8 bytes with null bytes (i.e.
00) stored in the leading positions (i.e. the left-hand side).

I do have some more examples:

I have inserted the leading null bytes and separated the bytes with
spaces for clarity.

Ex #1) 333-3333
Hex On disk: 00 00 00 80 6a 6e 49 41

Ex #2) 666-6666
Hex On disk: 00 00 00 80 6a 6e 59 41

So there's only a 1-bit difference between the on-disk
representations of 333-3333 and 666-6666.

That sounds pretty unlikely. Are you 100% sure you're looking
at the correct bytes?
 
geskerrett

Yes I double checked as I appreciate any help, but that is what is
stored on disk.

If it helps, we modified Ex#3. to be 777-777-7777
On disk this is now 00 00 10 87 77 F9 Fc 41

All the input fields are filled in this new example.
 
Dejan Rodiger

(e-mail address removed) said the following on 9.08.2005 22:45:
Yes I double checked as I appreciate any help, but that is what is
stored on disk.

If it helps, we modified Ex#3. to be 777-777-7777
On disk this is now 00 00 10 87 77 F9 Fc 41

All the input fields are filled in this new example.

So for 10-digit numbers you could say that it is:
7777777777(10)=1CF977871(16)
1CF977871 SHL 4 bits = 1C F9 77 87 10
write them from right to left and shift left for 8 bits
10 87 77 f9 1C 00
And then add F0 41

Could you also give some examples with nine to one digits?
 
Dejan Rodiger

Dejan Rodiger said the following on 9.08.2005 23:28:
(e-mail address removed) said the following on 9.08.2005 22:45:



So for number with 10 digit numbers you could say that it is:
7777777777(10)=1CF977871(16)
1CF977871 SHL 4 bits = 1C F9 77 87 10
write them from right to left and shift left for 8 bits
10 87 77 f9 1C 00
And then add F0 41

And add E041 (not F0 41)
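
The recipe above can be sketched in Python 3 roughly as follows. The function name dejan_encode is made up for illustration, and this is only the empirical pattern described in this post: it reproduces the three 10-digit examples whose hex form starts with 1 (33-bit values) and is not a confirmed description of the file format.

```python
def dejan_encode(number):
    """Shift left 4 bits, store least-significant byte first, add E0 41 on top.

    Hypothetical helper; only checked against the 33-bit examples in the thread.
    """
    shifted = number << 4
    # "E0 41" written right-to-left sits in bytes 5 and 6 of the field,
    # which is the integer 0x41E0 shifted up by four bytes (32 bits).
    combined = shifted + (0x41E0 << 32)
    return combined.to_bytes(6, "little").hex().upper()


for phone in (5616864700, 8003346488, 7777777777):
    print(phone, dejan_encode(phone))
```

Numbers outside the 33-bit range (for example the 954-... entries or the 7-digit ones) do not follow this recipe, which matches the remark that it only works for some of the samples.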
 
Scott David Daniels

Grant said:
So there's only a 1-bit difference between the on-disk
representations of 333-3333 and 666-6666.

That sounds pretty unlikely. Are you 100% sure you're looking
at the correct bytes?

Perhaps the one bit is an exponent -- some kind of floating point
based format? That matches the doubling of all digits.

--Scott David Daniels
(e-mail address removed)
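
One way to test this idea, as a minimal Python 3 sketch: assume (and it is only an assumption at this point in the thread) that the 8-byte field is a little-endian IEEE 754 double, pack 3333333 and 6666666 that way, and count how many bits differ. The helper name on_disk is hypothetical.

```python
import struct


def on_disk(value):
    """Pack a number as a little-endian IEEE 754 double (assumed layout)."""
    return struct.pack("<d", float(value))


a = on_disk(3333333)
b = on_disk(6666666)

# XOR the two byte strings and count the set bits to see how many differ.
diff_bits = sum(bin(x ^ y).count("1") for x, y in zip(a, b))

print(a.hex(), b.hex(), diff_bits)
```

Doubling a value only bumps the exponent field by one, so under this assumption a single differing bit between 333-3333 and 666-6666 is what one would expect to see.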
 
Grant Edwards

Perhaps the one bit is an exponent -- some kind of floating point
based format? That matches the doubling of all digits.

That would just be sick. I can't imagine anybody on an 8-bit
CPU using FP for a phone number.
 
Christopher Subich

Grant said:
That would just be sick. I can't imagine anybody on an 8-bit
CPU using FP for a phone number.

Nobody on an 8-bit CPU would have an FPU, so I'll guarantee that this is
done using only 8 or 16-bit (probably 8) integer math.
 
Grant Edwards

Nobody on an 8-bit CPU would have a FPU, so I'll guarantee that this is
done using only 8 or 16-bit (probably 8) integer math.

And I'll guarantee that the difference between 333-3333 and
666-6666 has to be more than 1-bit. There's no way that can be
the correct data unless it's something like an index into a
different table or a pointer or something along those lines.
 
Christopher Subich

Grant said:
And I'll guarantee that the difference between 333-3333 and
666-6666 has to be more than 1-bit. There's no way that can be
the correct data unless it's something like an index into a
different table or a pointer or something along those lines.

Absolutely. I hadn't even taken a good look at those datapoints yet.

The dataset that I'd like to see:
000-000-0001
000-000-0010
(etc)
000-000-0002
000-000-0004
000-000-0008
000-000-0016
(etc)

I also wonder if the last 8-16 bits involves, at least in part, a count
of the length of the phone number, or at least a flag to distinguish 7
from 10 digits.
 
Bengt Richter

Grant said:
That would just be sick. I can't imagine anybody on an 8-bit
CPU using FP for a phone number.

>>> def hex2double(dhex):  # name assumed; the def line and the call were lost from the post
...     "convert little-endian hex of ieee double binary to double"
...     assert len(dhex)==16, (
...         "hex of double in binary must be 8 bytes (hex pairs in little-endian order)")
...     dhex = ''.join(reversed([dhex[i:i+2] for i in range(0,16,2)]))
...     m = int(dhex, 16)
...     x = ((m>>52)&0x7ff) - 0x3ff - 52            # unbiased exponent, rescaled for an integer mantissa
...     s = (m>>63)&0x1                             # sign bit
...     f = (m & ((1<<52)-1))|((m and 1 or 0)<<52)  # 52-bit fraction plus the implicit leading 1
...     return (1.0,-1.0)[s]*f*2.0**x
...
>>> hex2double('0000108777F9Fc41')
7777777777.0

;-)

Regards,
Bengt Richter
 
John Machin

Well done, Scott & Bengt!!
I've just verified that this works with all 12 corrected examples posted
by the OP.

Grant, MS-DOS implies 16 bits at least; and yes there was an FPU (the
8087). And yes there are a lot of sick people who store things as
numbers (whether integer or FP) when the only arithmetic operations that
can be applied to them stuff them up mightily (like losing leading
zeroes off post-codes, having NEGATIVE tax file numbers, etc) and it's
still happening on the best OSes and 64-bit CPUS. Welcome to the real
world :)

Cheers,
John
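
To illustrate the leading-zero problem: once a phone number has been stored as a number, 000-0001 comes back as just 1, and the reader has to know the intended width to restore it. A minimal Python 3 sketch follows, where format_phone is a hypothetical helper and the 7- and 10-digit layouts are assumed from the examples in this thread; nothing in the stored value itself says which width was meant.

```python
def format_phone(value, digits=7):
    """Re-pad a decoded number to `digits` and insert dashes.

    Hypothetical helper: the caller must supply the width, because the
    stored number does not carry it.
    """
    text = str(int(value)).zfill(digits)
    if len(text) != digits:
        raise ValueError("value has more digits than the requested width")
    if digits == 7:
        return f"{text[:3]}-{text[3:]}"
    if digits == 10:
        return f"{text[:3]}-{text[3:6]}-{text[6:]}"
    raise ValueError("only 7- and 10-digit layouts are sketched here")


print(format_phone(1.0, 7))
print(format_phone(7777777777.0, 10))
```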
 
Bengt Richter

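Now the easy way ;-)

>>> import struct
>>> def hex2double_easy(h):  # name assumed; the def line and the call were lost from the post
...     return struct.unpack('<d', bytes(bytearray(int(h[i:i+2],16) for i in range(0,16,2))))[0]
...
>>> hex2double_easy('0000108777F9Fc41')
7777777777.0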

Regards,
Bengt Richter
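
For current Python 3, the same decoding might look like the sketch below. It assumes the field is a little-endian IEEE 754 double and that the 6-byte samples posted earlier simply omit their two leading null bytes, as the later 8-byte samples suggest. The name decode_phone is illustrative only.

```python
import struct


def decode_phone(hex_text):
    """Decode a hex dump of the phone field into an integer.

    Assumes a little-endian IEEE 754 double; shorter dumps are padded on
    the left with null bytes to the full 8-byte field.
    """
    raw = bytes.fromhex(hex_text.replace(" ", ""))
    raw = raw.rjust(8, b"\x00")  # restore omitted leading null bytes
    (value,) = struct.unpack("<d", raw)
    return int(round(value))


for sample in ("C0DBA8ECF441", "800396d0fd41",
               "00 00 00 00 87 D6 32 41", "00 00 00 00 00 00 F0 3F"):
    print(sample, "->", decode_phone(sample))
```

Note that the result is a plain integer, so leading zeros (as in 000-0001) are not recoverable from the stored value alone.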
 
