Re: Binary search trees or hash tables?

Discussion in 'C Programming' started by BGB / cr88192, Jan 6, 2010.

  1. "remod" <> wrote in message
    news:4b42f961$0$1131$...
    > Talking about containers made me wanting to reconsider my choice of hash
    > tables as the basis for the mapping key/value I have in my library.
    >
    > My (self imposed) requirement are:
    >
    > - container for key/value mapping
    > - simplicity of implementation
    > - reasonable fast insert/search/delete
    > - reasonable overhead
    >


    <snip>

    >
    > This very rough estimate made me lean forward hashtables. Did I miss
    > anything?
    >
    > I should now make a more precise analyis and take a final decision:
    > rewrite/optimize the current hashtable functions or move to BST?
    >
    > What would you suggest?
    >


    personally, I have not used BST's much for key/value lookup and similar
    uses.

    typically, I have used either:
    large hashes, which can be very fast for certain uses;
    hashed arrays, possibly with chaining between the array elements.

    so, yeah, large hashes, allow quick lookup, and do have costs for resizes or
    deletes (as a result, most places where I have used them, they have been
    fixed-size without support for delete).

    a hashed array is typically a bit better, since in this case one can have a
    small fixed-size hash, and use it to speed up lookup into a potentially much
    larger array (and, as well, the array need not have bunches of wasted
    space).

    sorted arrays are also nice, because they allow an O(log n) lookup without
    the added bulk or rebalancing of a BST, but may have a higher cost for
    inserts or deletes if the array needs to be kept sorted. sorted arrays are
    possibly a good option if one needs fast lookup but inserts/deletes are
    infrequent (a BST will often involve a little higher overhead for lookups,
    due mostly node stepping, and possibly a notably higher cost if one
    implements them via recursion rather than a loop).

    another observation: for small N, a linear lookup may actually be faster
    than either a binary lookup or a BST (for N<=5 or so, linear-search seems to
    be the best bet).

    yes, IME, one can find themselves doing a very large number of lookups over
    a small number of items, in contrast to the much more "popular" case of a
    small number of lookups over a large number items (well, except if the
    number of lookups is very rare, where the cost of maintaining a BST or
    sorted array could easily exceed the costs of just doing lookups with a
    linear search).


    so, I don't think there is a generally "best" option here, more specific
    needs in specific use cases.


    in my case, I have mostly used binary trees for spatial lookup tasks, since
    these are not readily optimized with other strategies (one can't exactly
    "hash" a spatial coordinate).

    actually, one algo I like is actually derived partly from quicksort, since I
    had noted that with slight tweaking a quicksort-like strategy can build BSPs
    in O(n log n) time, vs the O(n^2) or O(n^3) as is more commonly the case for
    BSP building, but this is at the cost that they tend to be far from optimal.

    so, it is usually useful for building a BSP in real-time, but not so good if
    one needs a good static BSP...
     
    BGB / cr88192, Jan 6, 2010
    #1
    1. Advertising

  2. "Richard Harter" <> wrote in message
    news:...
    > On Wed, 6 Jan 2010 12:19:19 -0700, "BGB / cr88192"
    > <> wrote:
    >
    > [snip]
    >
    >>another observation: for small N, a linear lookup may actually be faster
    >>than either a binary lookup or a BST (for N<=5 or so, linear-search seems
    >>to
    >>be the best bet).

    >
    > For searches of integers linear lookups using a sentinel are
    > typically faster up to about N=20 to 25.
    >


    ok, I hadn't checked exactly...

    where I had observed this mostly was while using it for looking up a value
    within a range:
    min<=value<max.

    I tried switching to a binary lookup, but had observed that is was actually
    slower (and also that N was generally small), but had not gone on an attempt
    to figure out where the "break-even" point was.

    my estimate of "5 or so" was based roughly on algorithmic complexity, noting
    that, for example, log2(20) is a bit less than 20, ...
     
    BGB / cr88192, Jan 6, 2010
    #2
    1. Advertising

  3. "remod" <> wrote in message
    news:4b44f637$0$1119$...
    > Richard Harter ha scritto:
    >> For searches of integers linear lookups using a sentinel are
    >> typically faster up to about N=20 to 25.

    >
    > This would make me lean even further towards hashtables. I was thinking
    > about implmenting some form of cuckoo hashing but I learned yesterday
    > about "Hopscotch hashing" (which I had never heard) and that would be
    > intersting to use.
    >


    I have usually implemented the step-hopping via a small PRNG (pseudo-random
    number generator).
    the PRNG is usually a fairly simple expression:
    key1=key*prime;

    which can them be masked (or shifted and masked, depending on the prime
    chosen, ...) to get the hash-spot.

    IME this usually works fairly well...


    oddly, I have never heard of anyone else doing this.

    then again, I am probably also in a minority for using a memory allocator
    based on bit-maps as well, ...
     
    BGB / cr88192, Jan 6, 2010
    #3
  4. BGB / cr88192

    Moi Guest

    On Wed, 06 Jan 2010 14:10:33 -0700, BGB / cr88192 wrote:

    > "remod" <> wrote in message
    > news:4b44f637$0$1119$...
    >> Richard Harter ha scritto:


    > I have usually implemented the step-hopping via a small PRNG
    > (pseudo-random number generator).
    > the PRNG is usually a fairly simple expression: key1=key*prime;



    odd * odd -->> odd
    odd * even -->> even
    even * even --> even

    Now consider what happens when your prime is odd (which is very likely ;-)


    > which can them be masked (or shifted and masked, depending on the prime
    > chosen, ...) to get the hash-spot.
    >
    > IME this usually works fairly well...
    >
    >
    > oddly, I have never heard of anyone else doing this.


    Maybe there is a reason for that.

    AvK
     
    Moi, Jan 6, 2010
    #4
  5. "Moi" <> wrote in message
    news:69aee$4b4505ac$5350c024$...
    > On Wed, 06 Jan 2010 14:10:33 -0700, BGB / cr88192 wrote:
    >
    >> "remod" <> wrote in message
    >> news:4b44f637$0$1119$...
    >>> Richard Harter ha scritto:

    >
    >> I have usually implemented the step-hopping via a small PRNG
    >> (pseudo-random number generator).
    >> the PRNG is usually a fairly simple expression: key1=key*prime;

    >
    >
    > odd * odd -->> odd
    > odd * even -->> even
    > even * even --> even
    >
    > Now consider what happens when your prime is odd (which is very likely ;-)
    >


    I often do a right shift as well (to key the slot index).

    so, if my prime is 4093, I often use >>12.
    however, if the prime is mersenne, it seems to be better not to shift.

    this basic strategy was grabbed from 'rand()', and in general seems to have
    fairly good distribution properties.


    >
    >> which can them be masked (or shifted and masked, depending on the prime
    >> chosen, ...) to get the hash-spot.
    >>
    >> IME this usually works fairly well...
    >>
    >>
    >> oddly, I have never heard of anyone else doing this.

    >
    > Maybe there is a reason for that.
    >


    dunno...

    most people "seem" to be a lot of funkiness with xor, but in my tests the
    statistical distribution tends to be worse with these xor based tricks, and
    also there does not seem to be much speed difference between an integer
    multiply and an xor.


    > AvK
     
    BGB / cr88192, Jan 6, 2010
    #5
  6. "remod" <> wrote in message
    news:4b450975$0$1115$...
    > BGB / cr88192 ha scritto:
    >
    >> "remod" <> wrote in message
    >>> [...] I learned yesterday about "Hopscotch hashing"(which
    > >> I had never heard of) and that would be intersting to use.

    >
    >> I have usually implemented the step-hopping via a small PRNG
    >> (pseudo-random number generator). oddly, I have never heard of anyone
    >> else doing this.

    >
    > Probably because it's not easy to ensure the PRNG has a period that is
    > long enough. The technique is well documented in the literature, this
    > report, for example, gives a good explanation (section 2.2 and section 3)
    > on why the generator (5i mod 4*n)/4 works well with tables that have a
    > number of slots n that is a power of 2:
    > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.95.5606
    >
    > The Hopscotch hashing should be more cache-friendly that using a PRNG,
    > since it's highly probable that elements that hashes to the same number
    > are also in the same cache line.
    >


    well, I never really looked into the theory that much, and most of this was
    actually a result of occasional use of statistical tests and benchmarking...


    > Anyway, seems that I discarded BST again! :)
    >


    ok.


    > R.D.
     
    BGB / cr88192, Jan 7, 2010
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. ptrSriram

    Help : Merging Binary Search Trees

    ptrSriram, Dec 10, 2005, in forum: C Programming
    Replies:
    3
    Views:
    1,019
    Thad Smith
    Dec 12, 2005
  2. binary search trees

    , Mar 5, 2008, in forum: C++
    Replies:
    2
    Views:
    273
  3. jacob navia

    Binary search trees (AVL trees)

    jacob navia, Jan 3, 2010, in forum: C Programming
    Replies:
    34
    Views:
    1,435
    Dann Corbit
    Jan 8, 2010
  4. Dann Corbit

    Re: Binary search trees or hash tables?

    Dann Corbit, Jan 6, 2010, in forum: C Programming
    Replies:
    0
    Views:
    630
    Dann Corbit
    Jan 6, 2010
  5. Replies:
    0
    Views:
    82
Loading...

Share This Page