maximum size of a hash table

Discussion in 'Perl Misc' started by Lee, Feb 24, 2005.

  1. Lee

    Lee Guest

    I couldn't find a clear answer to this. Seems to be a lot of hearsay.

    Does anyone know for certain the maximum size that a hash table can be?
    Lee, Feb 24, 2005

  2. No, because it depends on the particular data being stored and the
    size of the memory on your particular machine.

    You can keep adding elements until your machine runs out of memory.
    Tad McClellan, Feb 24, 2005
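
    [Editor's sketch of Tad's point; the 100_000 figure is an arbitrary,
    comfortably small example, not a limit from the thread:]

    ```perl
    # A hash has no fixed capacity; it keeps accepting keys until
    # the perl process runs out of memory. 100_000 is arbitrary.
    my %h;
    $h{"key$_"} = $_ for 1 .. 100_000;
    print scalar(keys %h), " keys stored\n";   # 100000 keys stored
    ```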

  3. Lee

    John Bokma Guest

    Isn't there a limit on the number of bits in the hash?
    Just curious.
    John Bokma, Feb 24, 2005
  4. Also sprach John Bokma:
    What bits? Perl hashes are not Bloom filters, so the concept of bits (or
    the number thereof) isn't involved anywhere.

    Tassilo v. Parseval, Feb 24, 2005
  5. No, they don't, because the answer changes depending on your situation.

    A hash can grow until it uses all the free memory available to the
    perl process, same as any other data structure in Perl.
    Christopher Mattern

    "Which one you figure tracked us?"
    "The ugly one, sir."
    "...Could you be more specific?"
    Chris Mattern, Feb 24, 2005
  6. Maybe this has long been resolved, but we had an app using 5.00404
    that would keel over dead if the process size exceeded 1GB on
    linux and HPUX.

    Our software support guys made a customized version for Solaris
    that would get around this limit. We've since re-done the app
    to use Mysql for part of the data storage.

    Chris Richmond - MD6-FDC ~, Feb 24, 2005
  7. Lee

    xhoster Guest

    The maximum size of a hash table is 7 times bigger than the largest possible
    skein of yarn.

    xhoster, Feb 24, 2005
  8. Lee

    John Bokma Guest

    Ages ago, when I wrote my own hash table stuff (C), I used a function to
    calculate the hash code of a string. The result was a 32 bit value, and
    hence the number of available slots was 2^32. Of course this doesn't
    mean that one can store only 2^32 values, since I handled collisions by
    using lists. But if you put enough values in such a hash, it gets
    more and more inefficient. (Moreover, the pointers used in the lists must
    be able to address the memory.)

    So, I was just wondering if Perl has a similar "limit" (ie. the result
    of "its" hash function is always n bits).
    John Bokma, Feb 24, 2005
  9. Also sprach John Bokma:
    Naturally, this is what happens. If you have a mapping from a
    potentially infinite set into a finite one, collisions are going to
    happen. Perl, too, uses lists for buckets with collisions.
    That's the case for any hash function. Yet the number of those bits (as
    you put it) has no bearing on the capacity of a hash. You could use a
    hash function that only returns 2-bit-wide numbers and still store in
    the hash a number of items limited only by the available memory.

    Tassilo v. Parseval, Feb 24, 2005
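
    [Editor's toy illustration of Tassilo's point; the bucket count, hash
    function, and 100 keys are all made up for the example. A 2-bit hash
    function (four buckets) can store any number of items if collisions go
    into per-bucket lists; look-ups just degrade toward O(n):]

    ```perl
    # 2-bit hash: sum the byte values of the key, keep only 2 bits.
    sub tiny_hash { my ($key) = @_; my $h = 0; $h += ord for split //, $key; $h % 4 }

    # Four buckets, each a list of [key, value] pairs (chaining).
    my @buckets = ([], [], [], []);
    for my $key (map { "item$_" } 1 .. 100) {
        push @{ $buckets[ tiny_hash($key) ] }, [ $key => length $key ];
    }

    my $total = 0;
    $total += @$_ for @buckets;
    print "$total items in 4 buckets\n";   # 100 items in 4 buckets
    ```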
  10. Lee

    John Bokma Guest

    Yes, but then one ends up with a hash table with O(n) look-up time :)
    So if the hash function is limited to n bits, there is a point where the
    hash table acts more like 2^n big lists than like a hash table.
    John Bokma, Feb 24, 2005
  11. Lee

    Anno Siegel Guest

    That's no contradiction. A hash of size k (with this kind of collision
    management) *is* a way to distribute a long list of length n over k
    lists of average length n/k, no more, no less. Keeping the average
    length <= 1 is an option, but not the only way to run a hash.

    Anno Siegel, Feb 25, 2005
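
    [Anno's observation in numbers; the figures below are arbitrary
    examples, not anything from the thread:]

    ```perl
    # n keys spread over k buckets gives an average chain length of n/k.
    my ($n, $k) = (1_000_000, 65_536);
    printf "average chain length: %.2f\n", $n / $k;   # average chain length: 15.26
    ```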
  12. Lee

    Ted Zlatanov Guest

    In the Perl 5.8.6 hv.h, you can see:

    #define PERL_HASH_INTERNAL(hash,str,len) \
        STMT_START { \
            register const char *s_PeRlHaSh_tmp = str; \
            register const unsigned char *s_PeRlHaSh = (const unsigned char *)s_PeRlHaSh_tmp; \
            register I32 i_PeRlHaSh = len; \
            register U32 hash_PeRlHaSh = PL_rehash_seed; \
            while (i_PeRlHaSh--) { \
                hash_PeRlHaSh += *s_PeRlHaSh++; \
                hash_PeRlHaSh += (hash_PeRlHaSh << 10); \
                hash_PeRlHaSh ^= (hash_PeRlHaSh >> 6); \
            } \
            hash_PeRlHaSh += (hash_PeRlHaSh << 3); \
            hash_PeRlHaSh ^= (hash_PeRlHaSh >> 11); \
            (hash) = (hash_PeRlHaSh + (hash_PeRlHaSh << 15)); \
        } STMT_END

    which is what hv.c uses most of the time (there's also a PERL_HASH macro,
    which is slower but uses the same data types).

    This seems, to my still-bleeding eyes, to say that the actual hash is
    an unsigned 32-bit value and hence the answer to your question is 32.
    If I've misunderstood something, please let me know :) I haven't done
    C in ages.

    Ted Zlatanov, Feb 25, 2005
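
    [For readers who'd rather not parse the macro, here is an editor's
    sketch of the same computation in Perl itself. The `& 0xffffffff`
    masks emulate U32 wraparound on a 64-bit perl, and the default seed
    of 0 merely stands in for PL_rehash_seed:]

    ```perl
    # Perl rendition of the hash loop in PERL_HASH_INTERNAL above.
    sub perl_hash {
        my ($str, $seed) = @_;
        my $hash = $seed // 0;                  # stand-in for PL_rehash_seed
        for my $byte (unpack 'C*', $str) {      # one byte at a time
            $hash = ($hash + $byte)         & 0xffffffff;
            $hash = ($hash + ($hash << 10)) & 0xffffffff;
            $hash ^= ($hash >> 6);
        }
        $hash = ($hash + ($hash << 3)) & 0xffffffff;
        $hash ^= ($hash >> 11);
        return ($hash + ($hash << 15)) & 0xffffffff;
    }

    printf "0x%08x\n", perl_hash("larch");      # always fits in 32 bits
    ```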
  13. Lee

    John Bokma Guest

    I am more than aware of that (they teach those things in Utrecht, you know
    ;-) ). I didn't say it was a contradiction, only that O(n) hash look-up
    is not what I want from a hash table.

    I am happy with constant avg length of c, and hence O(1) look up.
    John Bokma, Feb 25, 2005
  14. You could use your own hashing function, if Perl's built-in gives you such
    poor results. XS code can pass a hash value to hv_fetch_ent() and
    hv_store_ent() - normally you'd pass 0 to have Perl calculate it for you
    using its built-in, but that's just a convenience, not a requirement.

    You could write a couple of fetch/store routines in C, export them to Perl,
    and provide a nice %hash wrapper for them with tie(). The rest of your Perl
    code would never need to know the difference.

    Sherm Pendley, Feb 25, 2005
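
    [A minimal pure-Perl sketch of the tie() wrapper Sherm describes; a
    real version would do the fetch/store in XS via hv_fetch_ent() and
    hv_store_ent(). The class name, 8-bucket table, and byte-sum hash here
    are invented for illustration:]

    ```perl
    package MyHash;

    # Tied-hash class backed by a hand-rolled bucket table with chaining.
    sub TIEHASH { my $class = shift; bless { buckets => [ map { [] } 1 .. 8 ] }, $class }
    sub _slot   { my ($self, $k) = @_; my $h = 0; $h += ord for split //, $k; $h % 8 }

    sub STORE {
        my ($self, $k, $v) = @_;
        my $b = $self->{buckets}[ $self->_slot($k) ];
        for my $pair (@$b) { return $pair->[1] = $v if $pair->[0] eq $k }
        push @$b, [ $k, $v ];
    }

    sub FETCH {
        my ($self, $k) = @_;
        for my $pair (@{ $self->{buckets}[ $self->_slot($k) ] }) {
            return $pair->[1] if $pair->[0] eq $k;
        }
        return undef;
    }

    package main;

    tie my %hash, 'MyHash';
    $hash{colour} = 'blue';          # goes through MyHash::STORE
    print $hash{colour}, "\n";       # goes through MyHash::FETCH; prints "blue"
    ```

    The rest of the Perl code sees an ordinary %hash, as Sherm says.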
  15. Lee

    John Bokma Guest

    [ snip ]
    Shouldn't that be StIlLBlEeDiNgEyEs << 3 ? :-D
    I was guessing that :-D.
    Same here :-D. (I am now even more happy about this).

    So to answer the OP's question: you can have at most 2^32 slots, and just
    storing the 32-bit hash value for each slot already requires 16 GB
    ( 4 bytes x 2^32 = 2^2 x 2^32 = 2^34 bytes ).

    So, until you can buy a few TB at your local supermarket, don't worry
    too much :-D.
    John Bokma, Feb 25, 2005
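
    [Spelling out that arithmetic, for 2^32 slots at 4 bytes per 32-bit
    hash value:]

    ```perl
    # 2**32 slots, each holding a 4-byte (32-bit) hash value:
    my $bytes = 2**32 * 4;              # 2**34 bytes
    printf "%d GB\n", $bytes / 2**30;   # 16 GB
    ```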
  16. Actually, there's a comment towards the top of handy.h that sheds a bit more
    light. I32 & U32 are at *least* 32 bits, but on systems with 64-bit ints,
    I32 & U32 are also 64 bits.

    Sherm Pendley, Feb 25, 2005
  17. That's not entirely accurate. On 64-bit architectures, I32 & U32 are 64-bit
    ints (see handy.h), so on those architectures you can have 2^64 slots.

    Thus, even on a machine with a 64-bit address space, and a perl that's built
    to support it, the limiting factor is RAM.

    Sherm Pendley, Feb 25, 2005
  18. Lee

    John Bokma Guest

    The poor results I was talking about occur when the 32-bit limit on the
    hash code leads to O(n) hash look-ups because there is a lot of data in
    the hash. :-D
    Thanks. I never had this problem, but it's good to know.
    John Bokma, Feb 25, 2005
  19. Lee

    Anno Siegel Guest

    Hey, what's wrong? It's a public discussion. Some readers may appreciate
    an explicit argument.
    Fine. My point is that there are useful applications of hash tables
    in the O(n) range as well.

    Anno Siegel, Feb 26, 2005
  20. Lee

    John Bokma Guest

    Yup, but it sounded to me a bit like I had no clue how a hash table
    works :-D So apologies if I read it wrong.
    Uhm, like what? (OK, memory stress testing is one.)
    John Bokma, Feb 26, 2005
