Does anybody know of a linkable, quick MD5/SHA1 calculator library?

Discussion in 'Python' started by DurumDara, May 29, 2006.

  1. DurumDara

    DurumDara Guest

    Hi!

    I need to speed up my MD5/SHA1 calculator app, which works on files in
    the filesystem.
    I use the Python standard modules, but I think it could be faster if
    I used C, or some other module, for it.

    I used FSUM before, but I ran into problems because I had to "move" into
    the "DOS area", and passing parameters to the external process made me
    very angry (it did not work).
    You can see the details here:
    http://mail.python.org/pipermail/python-win32/2006-May/004697.html

    So: I must handle Unicode filenames. I think that if I could find a
    library that works with Python's Unicode strings, and that I can load
    and use to hash files, the code would be better and faster.

    Does anybody know of such code?

    Py2.4, Windows, Py2Exe, wxPy... That is the specification.

    Thanks for the help:
    dd
    DurumDara, May 29, 2006
    #1

  2. John Machin

    John Machin Guest

    Re: Does anybody know of a linkable, quick MD5/SHA1 calculator library?

    On 30/05/2006 2:57 AM, DurumDara wrote:
    > Hi!
    >
    > I need to speed up my MD5/SHA1 calculator app, which works on files in
    > the filesystem.
    > I use the Python standard modules, but I think it could be faster if
    > I used C, or some other module, for it.
    >
    > I used FSUM before, but I ran into problems because I had to "move" into
    > the "DOS area", and passing parameters to the external process made me
    > very angry (it did not work).
    > You can see the details here:
    > http://mail.python.org/pipermail/python-win32/2006-May/004697.html
    >
    > So: I must handle Unicode filenames. I think that if I could find a
    > library that works with Python's Unicode strings, and that I can load
    > and use to hash files, the code would be better and faster.
    >
    > Does anybody know of such code?
    >
    > Py2.4, Windows, Py2Exe, wxPy... That is the specification.
    >


    Hello (again), dd ...

    As the effbot has said, the Python md5 and sha modules are written in C.
    Hints: (1) the helpfile index says "builtin module" (2) you don't find a
    sha.py or md5.py in c:\Python24\Lib\

    An md5/sha library will concern itself with strings (which you obtain
    from a file's *contents*), just like Python's modules do. Any struggle
    with Unicode characters in the *names* of files is a separate concern.

    Let's all stop worrying about low-level things like getting the 8.3
    filename so that you can pass it to an MS-DOS program, and let's try to
    explore why you think there is a problem with your initial approach.

    At the end of this posting is a very simple Python function that
    calculates the hash of a file (and its length), given the name of the
    file (str or unicode, doesn't matter), which hashing module to use, and
    a blocksize to use when reading. There is a really flash :) user
    interface that allows you to try it with either a glob pattern "*.txt",
    or (as glob doesn't grok Windows mbcs/unicode filenames) a single
    utf8-encoded filename.

    Please try it out. My expectation is that, with a suitable choice of
    blocksize, you will not be able to find anything that is significantly
    faster and won't be difficult to interface to (like the FSUM program!).
    If you have any problems or more questions, please don't hesitate to ask.

    HTH,
    John

    === function and driver ===
    C:\junk>type hashtestbed.py

    def hash_of_file(hash_module, fname, block_size):
        f = open(fname, 'rb')
        hashobj = hash_module.new()
        filesize = 0
        while True:
            block = f.read(block_size)
            if not block: break
            filesize += len(block)
            hashobj.update(block)
        f.close()
        return (filesize, hashobj.digest())

    def to_hex(s):
        return ''.join('%02x' % ord(c) for c in s)

    if __name__ == "__main__":
        import sha, md5, time, sys, glob
        # print sys.argv
        mdlname = sys.argv[1]
        mdl = {'sha': sha, 'md5': md5}[mdlname]
        szs = sys.argv[2].lower()
        factor = {'m': 1024*1024, 'k': 1024}.get(szs[-1], 1)
        if factor == 1:
            bsz = int(szs)
        else:
            bsz = int(szs[:-1]) * factor
        filearg = sys.argv[3]
        if filearg.startswith("'"):
            # repr(single filename, encoded in utf8)
            filenames = [eval(filearg).decode('utf8')]
            # print filenames
        else:
            filenames = glob.glob(sys.argv[3])
        # I'm entering the above for the "Best UI of the Year" award :)
        for fn in filenames:
            t0 = time.time()
            fsz, digest = hash_of_file(mdl, fn, bsz)
            seconds = time.time() - t0
            print "%s, %r, bksz %d: %d bytes," \
                " %.2f secs (%.4f secs/MB)\n\thash = %s" \
                % (mdlname, fn, bsz, fsz,
                   seconds, seconds/fsz*1024*1024, to_hex(digest))

    C:\junk>

    === sample usage ===

    C:\junk>hashtestbed.py md5 32k '\xe5\xbc\xa0\xe6\x95\x8f.txt'
    md5, u'\u5f20\u654f.txt', bksz 32768: 17 bytes, 0.00 secs (0.0000 secs/MB)
    hash = 746d0931605368989a20691a906a67f8

    C:\junk>hashtestbed.py md5 32k \downloads\python*.msi
    md5, '\\downloads\\python-2.4.2.msi', bksz 32768: 9671168 bytes, 0.08 secs (0.0086 secs/MB)
    hash = bfb6fc0704d225c7a86d4ba8c922c7f5
    md5, '\\downloads\\python-2.4.3.msi', bksz 32768: 9688576 bytes, 0.06 secs (0.0067 secs/MB)
    hash = ab946459d7cfba4a8500f9ff8d35cc97
    md5, '\\downloads\\python-2.5a2.msi', bksz 32768: 10274816 bytes, 0.05 secs (0.0048 secs/MB)
    hash = cedc1e1fed9c4cd137921a80485bf007

    === end ===
    John Machin, May 30, 2006
    #2
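
For readers on a current Python 3, where the separate md5 and sha modules no longer exist and hashlib replaces them, the hash_of_file function from the post above might look roughly like the sketch below. It is an adaptation under that assumption, not the code John posted; the string parameter `algorithm` and the 32 KB default block size are choices made here for illustration.

```python
import hashlib


def hash_of_file(algorithm, fname, block_size=32 * 1024):
    """Return (file size in bytes, hex digest) for the file named fname.

    algorithm is a hashlib name such as 'md5' or 'sha1'.
    """
    hashobj = hashlib.new(algorithm)
    filesize = 0
    # Read in fixed-size blocks so that large files never have to fit
    # into memory all at once.
    with open(fname, 'rb') as f:
        while True:
            block = f.read(block_size)
            if not block:          # empty bytes object means end of file
                break
            filesize += len(block)
            hashobj.update(block)
    return filesize, hashobj.hexdigest()
```

As in the original, the block size is the main tuning knob; timing a few values (for example 4 KB, 32 KB, 1 MB) on your own files is the only reliable way to pick one.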

  3. Re: Does anybody know of a linkable, quick MD5/SHA1 calculator library?

    DurumDara wrote:
    > Hi!
    >
    > I need to speed up my MD5/SHA1 calculator app, which works on files in
    > the filesystem.


    You could try using threads. This would allow the CPU and the disk to
    work in parallel.

    The sha/md5 modules don't seem to release the global interpreter lock,
    so you won't be able to use multiple CPUs/cores yet.

    Daniel
    Daniel Dittmar, May 30, 2006
    #3
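
The thread suggestion above can be sketched as a reader thread that fills a small queue while the main thread hashes. This is only an illustration written for a current Python 3 with hashlib, not code from the thread; the function name `hash_file_threaded` and the `depth` parameter are made up here. Whether it is actually faster depends on the disk, the OS cache, and the block size, so measure before adopting it. Note also that the current hashlib documentation says the GIL is released for updates larger than 2047 bytes, so the 2006 remark about the lock may not apply to modern versions.

```python
import hashlib
import queue
import threading


def hash_file_threaded(algorithm, fname, block_size=1024 * 1024, depth=4):
    """Return the hex digest of fname, reading ahead in a helper thread."""
    # Bounded queue: the reader can get at most `depth` blocks ahead,
    # which keeps memory use limited.
    blocks = queue.Queue(maxsize=depth)

    def reader():
        try:
            with open(fname, 'rb') as f:
                while True:
                    block = f.read(block_size)
                    blocks.put(block)
                    if not block:      # empty bytes object marks end of file
                        break
        except OSError as exc:
            # Hand the error to the consumer so it does not wait forever.
            blocks.put(exc)

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()

    hashobj = hashlib.new(algorithm)
    while True:
        item = blocks.get()
        if isinstance(item, Exception):
            raise item
        if not item:
            break
        hashobj.update(item)
    worker.join()
    return hashobj.hexdigest()
```

The blocks are hashed in the order they were read, so the digest is the same as a single-threaded pass would produce.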
  4. Serge Orlov

    Serge Orlov Guest

    DurumDara wrote:
    > Hi!
    >
    > I need to speed up my MD5/SHA1 calculator app, which works on files in
    > the filesystem.
    > I use the Python standard modules, but I think it could be faster if
    > I used C, or some other module, for it.
    >
    > I used FSUM before, but I ran into problems because I had to "move" into
    > the "DOS area", and passing parameters to the external process made me
    > very angry (it did not work).
    > You can see the details here:
    > http://mail.python.org/pipermail/python-win32/2006-May/004697.html


    FWIW, I looked into what the problem is: apparently fsum converts the
    name back to Unicode, tries to print it, and silently corrupts the output.
    You give it the short name XA02BB~1 of the file xAÿ and fsum prints xA

    Use the Python module or try another utility.
    Serge Orlov, May 30, 2006
    #4
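
To illustrate the advice to stay inside Python: a Unicode filename can be passed straight to open(), so no 8.3 short name or external program is involved. The sketch below assumes a current Python 3 and a filesystem that accepts non-ASCII names; `md5_hex_of_file` and `demo` are names invented here, and the filename is the one from the sample usage earlier in the thread.

```python
import hashlib
import os
import tempfile


def md5_hex_of_file(fname, block_size=32 * 1024):
    """Return the MD5 hex digest of the file's contents."""
    hashobj = hashlib.md5()
    with open(fname, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b''.
        for block in iter(lambda: f.read(block_size), b''):
            hashobj.update(block)
    return hashobj.hexdigest()


def demo():
    """Create a file with a non-ASCII name in a temp dir and hash it."""
    name = '\u5f20\u654f.txt'
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name)
        with open(path, 'wb') as f:
            f.write(b'hello')
        # The hash depends only on the contents; the name is just a path.
        return name in os.listdir(tmp), md5_hex_of_file(path)
```

The point is the one John made: the hashing code only ever sees the file's contents, so Unicode in the name is a separate concern that open() handles.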