External Hashing [was Re: matching strings in a large set of strings]

Discussion in 'Python' started by Helmut Jarausch, Apr 30, 2010.

  1. I think one could apply an external hashing technique which would require only
    very few disk accesses per lookup.
    Unfortunately, I'm now aware of an implementation in Python.
    Does anybody know about a Python implementation of external hashing?

    Thanks,
    Helmut.

    --
    Helmut Jarausch

    Lehrstuhl fuer Numerische Mathematik
    RWTH - Aachen University
    D 52056 Aachen, Germany
     
    Helmut Jarausch, Apr 30, 2010
    #1
    1. Advertising

  2. Helmut Jarausch

    John Bokma Guest

    Re: External Hashing

    Helmut Jarausch <-aachen.de> writes:

    > I think one could apply an external hashing technique which would require only
    > very few disk accesses per lookup.
    > Unfortunately, I'm now aware of an implementation in Python.
    > Does anybody know about a Python implementation of external hashing?


    SQLite?

    --
    John Bokma j3b

    Hacking & Hiking in Mexico - http://johnbokma.com/
    http://castleamber.com/ - Perl & Python Development
     
    John Bokma, Apr 30, 2010
    #2
    1. Advertising

  3. Helmut Jarausch

    Tim Chase Guest

    Re: External Hashing [was Re: matching strings in a large set ofstrings]

    On 04/30/2010 12:51 PM, Helmut Jarausch wrote:
    > I think one could apply an external hashing technique which would require only
    > very few disk accesses per lookup.
    > Unfortunately, I'm now aware of an implementation in Python.
    > Does anybody know about a Python implementation of external hashing?


    While you don't detail what you're hashing, Stephan Behnel
    already suggested (in the parent thread) using one of Python's
    native dbm modules (I just use anydbm and let it choose). The
    underlying implementations should be fairly efficient assuming
    you don't use the dumbdbm last-resort fallback). With the anydbm
    interface, you can implement dict/set semantics as long as you
    take care that everything is marshalled into and out of strings
    for keys/values.

    -tkc
     
    Tim Chase, Apr 30, 2010
    #3
  4. Helmut Jarausch

    Dave Angel Guest

    Re: External Hashing [was Re: matching strings in a large set ofstrings]

    Helmut Jarausch wrote:
    > I think one could apply an external hashing technique which would require only
    > very few disk accesses per lookup.
    > Unfortunately, I'm now aware of an implementation in Python.
    > Does anybody know about a Python implementation of external hashing?
    >
    > Thanks,
    > Helmut.
    >
    >

    That's what databases are for. But if you're trying to streamline it
    further than a database, you need to give some constraints. In an
    earlier thread, you mentioned strings with a constant size. I don't
    think you ever actually said that the match string was also of the same
    size, but I'll assume it is. Final question is how often you'll be
    updating the list, versus searching it. If you do both frequently,
    stick to a database.

    On the other hand, if the list on disk is fixed, you could thoroughly
    optimize its makeup. Perhaps the easiest efficient form would be to
    have a file in two parts. The first part is the hash table, where each
    entry has a seek-offset and a size. And the rest of the file is just a
    "list" of items, sorted by hash value.

    If you make a hash large enough that the maximum collision list is a
    couple of k, then two seeks would get any record. First seek to the
    hash entry, then use that to decide where to seek for the data, and how
    much to read. Then you simply search that block in memory.

    You could also play games with a two-level hash, where each block of
    records has its own mini hash table. Doesn't seem worth it for a mere
    83million records.

    DaveA
     
    Dave Angel, Apr 30, 2010
    #4
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. anonym
    Replies:
    1
    Views:
    1,043
    Knute Johnson
    Jan 15, 2009
  2. Karin Lagesen

    matching strings in a large set of strings

    Karin Lagesen, Apr 29, 2010, in forum: Python
    Replies:
    13
    Views:
    466
    Bryan
    May 3, 2010
  3. WTH

    Hashing long strings...

    WTH, Aug 18, 2010, in forum: Java
    Replies:
    8
    Views:
    626
    Roedy Green
    Aug 22, 2010
  4. Replies:
    5
    Views:
    927
    Xho Jingleheimerschmidt
    Apr 2, 2009
  5. Adam Funk
    Replies:
    22
    Views:
    157
    Adam Funk
    Jun 3, 2014
Loading...

Share This Page