Help wanted with md2 hash algorithm

W

wjb131

hi all,

below you find my simple python version of MD2 algorithm
as described in RFC1319 (http://rfc1319.x42.com/MD2).
It produces correct results for strings shorter than 16 Bytes and wrong
results for longer strings.

I can't find what's wrong.

Can anybody help?

Regards
Wolfgang

-------------------------------------

#--- MD2 validation data
md2_test = [
('', '8350e5a3e24c153df2275c9f80692773'),
("a", '32ec01ec4a6dac72c0ab96fb34c0b5d1'),
("abc", 'da853b0d3f88d99b30283a69e6ded6bb'),
("message digest", 'ab4f496bfb2a530b219ff33031fe06b0'),
("abcdefghijklmnopqrstuvwxyz",
'4e8ddff3650292ab5a4108c3aa47940b'),
("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
'da33def2a42df13975352846c30338cd'),

("12345678901234567890123456789012345678901234567890123456789012345678901234567890",

'd5976f79d83d3a0dc9806c3c66f3efd8' )
]


#--- 256-byte "random" permutation constructed from the digits of pi
PI_SUBST = [41, 46, 67, 201, 162, 216, 124, 1, 61, 54, 84, 161, 236,
240, 6,
19, 98, 167, 5, 243, 192, 199, 115, 140, 152, 147, 43, 217, 188,
76, 130, 202, 30, 155, 87, 60, 253, 212, 224, 22, 103, 66, 111, 24,
138, 23, 229, 18, 190, 78, 196, 214, 218, 158, 222, 73, 160, 251,
245, 142, 187, 47, 238, 122, 169, 104, 121, 145, 21, 178, 7, 63,
148, 194, 16, 137, 11, 34, 95, 33, 128, 127, 93, 154, 90, 144, 50,
39, 53, 62, 204, 231, 191, 247, 151, 3, 255, 25, 48, 179, 72, 165,
181, 209, 215, 94, 146, 42, 172, 86, 170, 198, 79, 184, 56, 210,
150, 164, 125, 182, 118, 252, 107, 226, 156, 116, 4, 241, 69, 157,
112, 89, 100, 113, 135, 32, 134, 91, 207, 101, 230, 45, 168, 2, 27,
96, 37, 173, 174, 176, 185, 246, 28, 70, 97, 105, 52, 64, 126, 15,
85, 71, 163, 35, 221, 81, 175, 58, 195, 92, 249, 206, 186, 197,
234, 38, 44, 83, 13, 110, 133, 40, 132, 9, 211, 223, 205, 244, 65,
129, 77, 82, 106, 220, 55, 200, 108, 193, 171, 250, 36, 225, 123,
8, 12, 189, 177, 74, 120, 136, 149, 139, 227, 99, 232, 109, 233,
203, 213, 254, 59, 0, 29, 57, 242, 239, 183, 14, 102, 88, 208, 228,
166, 119, 114, 248, 235, 117, 75, 10, 49, 68, 80, 180, 143, 237,
31, 26, 219, 153, 141, 51, 159, 17, 131, 20]


PADDING = ["".join(map(chr, *i)) for i in range(17)]
SIZE = 16

#----------------------------------------------------------
def md2(m):

## (1) prepare message

#--- append to m i byte with value i, len(m) % 16 == 0
padLen = SIZE - len(m) % SIZE
m += PADDING[padLen]

#--- compute checksum C of m and append it to m
C = [0] * SIZE
L = 0
for i in range(len(m) / SIZE):
m16 = m[i*SIZE : (i+1)*SIZE]
for j in range(SIZE):
c = ord(m16[j])
C[j] = PI_SUBST[ c ^ L ]
L = C[j]
C = "".join( map(chr, C) )
m += C

## (2) compress message

X = [0] * 48 # 'compressor'

for i in range(len(m) / SIZE):

# fill X
m16 = m[i*SIZE : (i+1)*SIZE]
X[16:32] = map(ord, m16)
X[32:48] = [ a^b for (a,b) in zip(X[16:48], X[:16]) ]

# compress m
t = 0
for j in range(18):
for k in range(48):
t = X[k] ^ PI_SUBST[t]
X[k] = t
t = (t+j) % 256

X = "".join(map(lambda d: "%02x" % d, X[:SIZE]))
return X


def test():
for (i, j) in md2_test:
md = md2(i)
print "Message: %s" % i
print "My MD:%s" % md
print "Test MD:%s" % j
print "%s" % (md== j)
print

if __name__ == "__main__":
test()
 
T

Tom Anderson

below you find my simple python version of MD2 algorithm
as described in RFC1319 (http://rfc1319.x42.com/MD2).
It produces correct results for strings shorter than 16 Bytes and wrong
results for longer strings.

I can't find what's wrong.

Can anybody help?

Okay, i've reimplemented the code from scratch, based on the RFC, without
even looking at your code, as a basis for comparison.

The trouble is, i get exactly the same results as you!

Here's mine:

http://urchin.earth.li/~twic/md2.py

I guess the thing to do is extract the C code from the RFC and compile it,
verify that it works, then stick loads of print statements in the C and
the python, to see where the states of the checksum engines diverge.

tom
 
P

Paul Rubin

below you find my simple python version of MD2 algorithm
as described in RFC1319 (http://rfc1319.x42.com/MD2).
It produces correct results for strings shorter than 16 Bytes and wrong
results for longer strings.

Why do you want to use MD2? It's very slow and it's also been
deprecated for security reasons. Use SHA1 (built into Python library)
or SHA256/384/512 (implementations are circulating) instead.
 
W

wjb131

Paul said:
Why do you want to use MD2? It's very slow and it's also been
deprecated for security reasons. Use SHA1 (built into Python library)
or SHA256/384/512 (implementations are circulating) instead.

I want to understand it, and -- therefor ;-) -- I want to implement it
in pure Pyhton.
 
P

Paul Rubin

I want to understand it, and -- therefor ;-) -- I want to implement it
in pure Pyhton.

OK. It should be pretty easy to implement. You should find the
official rfc at ietf.org. I remember there was some minor erratum in
the original version that may or may not have been corrected, but it
shouldn't cause any serious confusion. I implemented it in Javascript
a long time ago and as I remember, all the test vectors passed.
 
W

wjb131

Paul said:
OK. It should be pretty easy to implement. You should find the
official rfc at ietf.org. I remember there was some minor erratum in
the original version that may or may not have been corrected, but it
shouldn't cause any serious confusion. I implemented it in Javascript
a long time ago and as I remember, all the test vectors passed.

I thought I had build a proper implementation in Python. The error you
mention can be avoided by studying the C implementation in RFC 1319.
BUT: Some of the test vectors failed. That's my problem ;-(
And therefore I asked for help.
Wolfgang
 
P

Paul Rubin

Already done before my first posting. But the problem was there. I
studied the C sources of MD2 of that package, too. But all test cases
with more than 16 bytes failed.

Hmm, did the test cases work for the RFC 1319 reference code? What
about OpenSSL?

I thought when I did my JS implementation, I checked all the test
vectors, but it was a long time ago and I can't absolutely be sure.
 
P

Paul Rubin

Paul Rubin said:
Hmm, did the test cases work for the RFC 1319 reference code? What
about OpenSSL?

I just checked OpenSSL and all the test values it computes match the RFC.
 
T

Tom Anderson

I guess the thing to do is extract the C code from the RFC and compile
it, verify that it works, then stick loads of print statements in the C
and the python, to see where the states of the checksum engines diverge.

Okay, i've done this. I had to fiddle with the source a bit - added a
#include "global.h" to md2.h (it needs it for the PROTO_LIST macro) and
took the corresponding includes out of md2c.c and mddriver.c (to avoid
duplicate definitions) - but after that, it built cleanly with:

gcc -DMD=2 *.c *.h -o mddriver

A couple of pairs of (somewhat spurious) parentheses in mddriver.c, and it
even built cleanly with -Wall.

Running the test suite with mddriver -x gives results matching the test
vectors in the RFC - a good start!

Patching the code to dump the checksums immediately after updating with
the pad, and before updating with the checksum:

*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 4a42d3a377b7e9988fb9289699e4d3a3
MD2 ("abcdefghijklmnopqrstuvwxyz") = 4e8ddff3650292ab5a4108c3aa47940b
*** checksum after padding = c3db7592ee1dd9b84505cfb4e2f9a765
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = da33def2a42df13975352846c30338cd
*** checksum after padding = 59ca5673c8f931bc41214f56b5c6c01
MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = d5976f79d83d3a0dc9806c3c66f3efd8

And here's my python code with the same modification, running the test
suite:

*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 539ba695f264f365bcabc5c8b10913c7
MD2 ("abcdefghijklmnopqrstuvwxyz") = 65182bb8c569485fcba44dbc66a02b56
*** checksum after padding = 365fe0617f5f56a56090af1cfd6caac3
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = a1ccc835ea9654d6a2926c21f0b20813
*** checksum after padding = 9acf39425d22c4e3b4ddbdc563d23716
MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = 8f1f49dc8de490b9aa7c99cec3fbccdf

As you can see, the checksums start to go wrong when we hit 16 bytes.

So, let us turn our attention to the checksum function.

Here's the python i wrote:

def checksum_old(c, buf): # c is checksum array, buf is input block
l = c[-1]
for i in xrange(digest_size):
l = S[(buf ^ l)]
c = l

Here's the C from the RFC:

unsigned int i, j, t;
t = checksum[15];
for (i = 0; i < 16; i++)
t = checksum ^= PI_SUBST[block ^ t];

Spot the difference. Yes, the assignment into the checksum array is a ^=,
not a straight = - checksum bytes get set to
current-value-of-checksum-byte xor S-box-transformation-of (input-byte xor
accumulator). Translating that into python, we get:

def checksum(c, buf):
l = c[-1]
for i in xrange(digest_size):
l = S[(buf ^ l)] ^ c
c = l

And when we put that back into the code, we get the right digests out.
Victory!

However, here's what the pseudocode in the RFC says:

For j = 0 to 15 do
Set c to M[i*16+j].
Set C[j] to S[c xor L].
Set L to C[j].
end /* of loop on j */

I certainly don't see any sign of a xor with the
current-value-of-checksum-byte in there - it looks like the C and
pseudocode in the RFC don't match up.

And, yes, googling for "RFC 1319 errata" brings up a report correcting
this. They really ought to amend RFCs to mention errata!

Correct code here:

http://urchin.earth.li/~twic/md2.py

tom
 
W

wjb131

Tom said:
I guess the thing to do is extract the C code from the RFC and compile
it, verify that it works, then stick loads of print statements in the C
and the python, to see where the states of the checksum engines diverge.

Okay, i've done this. I had to fiddle with the source a bit - added a
#include "global.h" to md2.h (it needs it for the PROTO_LIST macro) and
took the corresponding includes out of md2c.c and mddriver.c (to avoid
duplicate definitions) - but after that, it built cleanly with:

gcc -DMD=2 *.c *.h -o mddriver

A couple of pairs of (somewhat spurious) parentheses in mddriver.c, and it
even built cleanly with -Wall.

Running the test suite with mddriver -x gives results matching the test
vectors in the RFC - a good start!

Patching the code to dump the checksums immediately after updating with
the pad, and before updating with the checksum:

*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 4a42d3a377b7e9988fb9289699e4d3a3
MD2 ("abcdefghijklmnopqrstuvwxyz") = 4e8ddff3650292ab5a4108c3aa47940b
*** checksum after padding = c3db7592ee1dd9b84505cfb4e2f9a765
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = da33def2a42df13975352846c30338cd
*** checksum after padding = 59ca5673c8f931bc41214f56b5c6c01
MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = d5976f79d83d3a0dc9806c3c66f3efd8

And here's my python code with the same modification, running the test
suite:

*** checksum after padding = 623867b6af52795e5f214e9720beea8d
MD2 ("") = 8350e5a3e24c153df2275c9f80692773
*** checksum after padding = 19739cada3ba281693348e9d256fff31
MD2 ("a") = 32ec01ec4a6dac72c0ab96fb34c0b5d1
*** checksum after padding = 19e29d1b7304368e595a276f302f57cc
MD2 ("abc") = da853b0d3f88d99b30283a69e6ded6bb
*** checksum after padding = 56d65157dedfcd75a7b1e82d970eec4b
MD2 ("message digest") = ab4f496bfb2a530b219ff33031fe06b0
*** checksum after padding = 539ba695f264f365bcabc5c8b10913c7
MD2 ("abcdefghijklmnopqrstuvwxyz") = 65182bb8c569485fcba44dbc66a02b56
*** checksum after padding = 365fe0617f5f56a56090af1cfd6caac3
MD2 ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789") = a1ccc835ea9654d6a2926c21f0b20813
*** checksum after padding = 9acf39425d22c4e3b4ddbdc563d23716
MD2 ("12345678901234567890123456789012345678901234567890123456789012345678901234567890") = 8f1f49dc8de490b9aa7c99cec3fbccdf

As you can see, the checksums start to go wrong when we hit 16 bytes.

So, let us turn our attention to the checksum function.

Here's the python i wrote:

def checksum_old(c, buf): # c is checksum array, buf is input block
l = c[-1]
for i in xrange(digest_size):
l = S[(buf ^ l)]
c = l

Here's the C from the RFC:

unsigned int i, j, t;
t = checksum[15];
for (i = 0; i < 16; i++)
t = checksum ^= PI_SUBST[block ^ t];

Spot the difference. Yes, the assignment into the checksum array is a ^=,
not a straight = - checksum bytes get set to
current-value-of-checksum-byte xor S-box-transformation-of (input-byte xor
accumulator). Translating that into python, we get:

def checksum(c, buf):
l = c[-1]
for i in xrange(digest_size):
l = S[(buf ^ l)] ^ c
c = l

And when we put that back into the code, we get the right digests out.
Victory!

However, here's what the pseudocode in the RFC says:

For j = 0 to 15 do
Set c to M[i*16+j].
Set C[j] to S[c xor L].
Set L to C[j].
end /* of loop on j */

I certainly don't see any sign of a xor with the
current-value-of-checksum-byte in there - it looks like the C and
pseudocode in the RFC don't match up.

And, yes, googling for "RFC 1319 errata" brings up a report correcting
this. They really ought to amend RFCs to mention errata!

Correct code here:

http://urchin.earth.li/~twic/md2.py

tom



Hi Tom,

thank you very much for your analysis and your solution. Great.
(My knowledge of C language is not that good.)
Wolfgang
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,045
Messages
2,570,389
Members
47,052
Latest member
ketan

Latest Threads

Top