LaundroMat
Hi -
I'm trying to calculate unique hash values for binary files,
independent of their location and filename, and I was wondering
whether I'm going in the right direction.
Basically, the hash values are calculated thusly:
f = open('binaryfile.bin')
import hashlib
h = hashlib.sha1()
h.update(f.read())
hash = h.hexdigest()
f.close()
A quick test shows that, after renaming a file, its hash does indeed
remain the same as it was before.
I have my doubts, however, about the usefulness of this. f.read()
does not seem to read until the end of the file (for a 3.3 MB file, only
a string of 639 bytes is returned; perhaps a 00 byte counts as
EOF?), so is there a high risk of collisions?
Are there better ways of calculating hash values of binary files?
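One variant that may be worth comparing against is sketched below. It assumes the short read comes from opening the file in text mode, which on some platforms can stop at certain byte values or translate line endings; that cause is an assumption, not something confirmed here. The helper name file_sha1 and the 64 KB chunk size are made up for illustration.

```python
import hashlib

def file_sha1(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file's raw bytes (hypothetical helper)."""
    h = hashlib.sha1()
    # 'rb' opens the file in binary mode: no byte value is treated as
    # end-of-file and no newline translation is applied.
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)  # read a bounded piece at a time
            if not chunk:               # empty result means real end of file
                break
            h.update(chunk)
    return h.hexdigest()
```

Feeding the hash in chunks gives the same digest as a single h.update(f.read()) call, but keeps memory use flat for large files. Since only the file's contents go into the hash, the result does not depend on the file's name or location.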
Thanks in advance,
Mathieu