maximum size of a hash table

Discussion in 'Perl Misc' started by Lee, Feb 24, 2005.

  1. Lee

    Lee Guest

    I couldn't find a clear answer to this. Seems to be a lot of hearsay.

    Does anyone absolutely know the maximum size that a hash table can
    achieve?
     
    Lee, Feb 24, 2005
    #1

  2. Lee <> wrote:

    > Does anyone absolutely know the maximum size that a hash table can
    > achieve?



    No, because it depends on the particular data being stored and the
    size of the memory on your particular machine.

    You can keep adding elements until your machine runs out of memory.
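    A quick way to see this in practice (just a sketch; the key names and
    the count are arbitrary -- push the count up until you run out of RAM):

```perl
# A Perl hash has no fixed element limit; only available memory
# (and your patience) ends the party.
my %h;
$h{"key$_"} = $_ for 1 .. 100_000;
print scalar(keys %h), "\n";    # 100000
```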


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
     
    Tad McClellan, Feb 24, 2005
    #2

  3. Lee

    John Bokma Guest

    Tad McClellan wrote:

    > Lee <> wrote:
    >
    >> Does anyone absolutely know the maximum size that a hash table can
    >> achieve?

    >
    > No, because it depends on the particular data being stored and the
    > size of the memory on your particular machine.


    Isn't there a limit on the number of bits in the hash?
    Just curious.

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Feb 24, 2005
    #3
  4. Also sprach John Bokma:
    > Tad McClellan wrote:
    >
    >> Lee <> wrote:
    >>
    >>> Does anyone absolutely know the maximum size that a hash table can
    >>> achieve?

    >>
    >> No, because it depends on the particular data being stored and the
    >> size of the memory on your particular machine.

    >
    > Isn't there a limit on the number of bits in the hash?
    > Just curious.


    What bits? Perl hashes are not Bloom filters, so the concept of bits (or
    the number thereof) isn't involved anywhere.

    Tassilo
    --
    use bigint;
    $n=71423350343770280161397026330337371139054411854220053437565440;
    $m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);
     
    Tassilo v. Parseval, Feb 24, 2005
    #4
  5. Lee wrote:

    > I couldn't find a clear answer to this. Seems to be a lot of hearsay.
    >
    > Does anyone absolutely know the maximum size that a hash table can
    > achieve?


    No, they don't, because the answer changes depending on your situation.

    A hash can grow until it uses all the free memory available to the
    perl process, same as any other data structure in Perl.
    --
    Christopher Mattern

    "Which one you figure tracked us?"
    "The ugly one, sir."
    "...Could you be more specific?"
     
    Chris Mattern, Feb 24, 2005
    #5
  6. In article <>,
    Tad McClellan <> writes:
    >You can keep adding elements until your machine runs out of memory.


    Maybe this has long been resolved, but we had an app using 5.00404
    that would keel over dead if the process size exceeded 1 GB on
    Linux and HP-UX.

    Our software support guys made a customized version for Solaris
    that would get around this limit. We've since re-done the app
    to use Mysql for part of the data storage.

    Chris

    --
    Chris Richmond | I don't speak for Intel & vice versa
     
    Chris Richmond - MD6-FDC ~, Feb 24, 2005
    #6
  7. Lee

    Guest

    "Lee" <> wrote:
    > I couldn't find a clear answer to this. Seems to be a lot of hearsay.
    >
    > Does anyone absolutely know the maximum size that a hash table can
    > achieve?


    The maximum size of a hash table is 7 times bigger than the largest
    possible skein of yarn.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Feb 24, 2005
    #7
  8. Lee

    John Bokma Guest

    Tassilo v. Parseval wrote:

    > Also sprach John Bokma:
    >> Tad McClellan wrote:
    >>
    >>> Lee <> wrote:
    >>>
    >>>> Does anyone absolutely know the maximum size that a hash table can
    >>>> achieve?
    >>>
    >>> No, because it depends on the particular data being stored and the
    >>> size of the memory on your particular machine.

    >>
    >> Isn't there a limit on the number of bits in the hash?
    >> Just curious.

    >
    > What bits? Perl hashes are not bloom filters so the concepts of bits
    > (or the number thereof) isn't involved anywhere.


    Ages ago, when I wrote my own hash table code (in C), I used a function
    to calculate the hash code of a string. The result was a 32-bit value,
    and hence the number of available slots was 2^32. Of course this doesn't
    mean that one can store only 2^32 values, since I handled collisions by
    using lists. But if you put enough values in such a hash, it gets more
    and more inefficient. (Moreover, the pointers used in the lists must be
    able to address the memory.)

    So, I was just wondering if Perl has a similar "limit" (ie. the result
    of "its" hash function is always n bits).

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Feb 24, 2005
    #8
  9. Also sprach John Bokma:

    > Tassilo v. Parseval wrote:
    >
    >> Also sprach John Bokma:


    >>> Isn't there a limit on the number of bits in the hash?
    >>> Just curious.

    >>
    >> What bits? Perl hashes are not bloom filters so the concepts of bits
    >> (or the number thereof) isn't involved anywhere.

    >
    > Ages ago, when I wrote my own hash table stuff (C), I used a function to
    > calculate the hash code of a string. The result was a 32 bit value, and
    > hence the number of available slots was 2^32. Of course this doesn't
    > mean that one can store only 2^32 values, since I handled collision by
    > using lists. But if you put enough values in such a hash, it will get
    > more and more inefficient. (Moreover, the pointers used in the list must
    > be able to address the memory).


    Naturally, this is what happens. If you have a mapping from a
    potentially infinite set into a finite one, collisions are going to
    happen. Perl, too, uses lists for buckets with collisions.

    > So, I was just wondering if Perl has a similar "limit" (ie. the result
    > of "its" hash function is always n bits).


    That's the case for any hash function. Yet the number of those bits (as
    you put it) has no bearing on the capacity of a hash. You could use a
    hash function that only returns 2-bit-wide numbers and still store a
    number of items limited only by the available memory.
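    To make that concrete, here is a sketch (the bucket rule is
    deliberately silly) of a "hash" whose hash function is only 2 bits
    wide -- 4 buckets total -- yet still stores thousands of items; only
    the per-bucket scan grows:

```perl
# A 2-bit hash function: only the low 2 bits of the key's length.
# Capacity is still unbounded; each of the 4 buckets just becomes a
# longer and longer list, so lookups degrade toward O(n).
my @bucket = ([], [], [], []);

sub put {
    my ($key, $val) = @_;
    push @{ $bucket[ length($key) & 0b11 ] }, [ $key, $val ];
}

sub get {
    my ($key) = @_;
    for my $pair (@{ $bucket[ length($key) & 0b11 ] }) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return undef;
}

put("k$_", $_) for 1 .. 10_000;    # far more items than buckets
print get("k123"), "\n";           # 123
```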

    Tassilo
    --
    use bigint;
    $n=71423350343770280161397026330337371139054411854220053437565440;
    $m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);
     
    Tassilo v. Parseval, Feb 24, 2005
    #9
  10. Lee

    John Bokma Guest

    Tassilo v. Parseval wrote:

    > Also sprach John Bokma:


    > Yet, the number of those bits (as
    > you put it) has no bearing on the capacity of a hash. You could use a
    > hash function that only returns 2bit-wide numbers and still store an
    > amount of items only limited by the available memory in the hash.


    Yes, but then one ends up with a hash table with O(n) look-up time :)
    So if the hash function is limited to n bits, there is a point at which
    the hash table acts more like 2^n big lists than like a hash table.

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Feb 24, 2005
    #10
  11. Lee

    Anno Siegel Guest

    John Bokma <> wrote in comp.lang.perl.misc:
    > Tassilo v. Parseval wrote:
    >
    > > Also sprach John Bokma:

    >
    > > Yet, the number of those bits (as
    > > you put it) has no bearing on the capacity of a hash. You could use a
    > > hash function that only returns 2bit-wide numbers and still store an
    > > amount of items only limited by the available memory in the hash.

    >
    > Yes, but than one ends up with a hash table with O(n) look up time :)
    > So if the hash function is limited to n bits there is a point that the hash
    > table acts more like 2^n big lists instead of a hash table.


    That's no contradiction. A hash of size k (with this kind of collision
    management) *is* a way to distribute a long list of length n over k
    lists of average length n/k, no more, no less. Keeping the average
    length <= 1 is an option, but not the only way to run a hash.
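    A tiny numeric illustration of that n/k average (the modulo here is
    just a stand-in for a hash function):

```perl
# A hash of k slots splits n entries into k chains whose lengths
# average n/k; nothing requires that average to stay <= 1.
my ($n, $k) = (1000, 8);
my @chain = (0) x $k;
$chain[ $_ % $k ]++ for 1 .. $n;    # assign each entry to a chain
my $total = 0;
$total += $_ for @chain;
printf "%g\n", $total / $k;         # 125, i.e. n/k
```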

    Anno
     
    Anno Siegel, Feb 25, 2005
    #11
  12. Lee

    Ted Zlatanov Guest

    On 24 Feb 2005, wrote:

    > So, I was just wondering if Perl has a similar "limit" (ie. the result
    > of "its" hash function is always n bits).


    In the Perl 5.8.6 hv.h, you can see:

    #ifdef PERL_HASH_INTERNAL_ACCESS
    #define PERL_HASH_INTERNAL(hash,str,len) \
        STMT_START { \
            register const char *s_PeRlHaSh_tmp = str; \
            register const unsigned char *s_PeRlHaSh = \
                (const unsigned char *)s_PeRlHaSh_tmp; \
            register I32 i_PeRlHaSh = len; \
            register U32 hash_PeRlHaSh = PL_rehash_seed; \
            while (i_PeRlHaSh--) { \
                hash_PeRlHaSh += *s_PeRlHaSh++; \
                hash_PeRlHaSh += (hash_PeRlHaSh << 10); \
                hash_PeRlHaSh ^= (hash_PeRlHaSh >> 6); \
            } \
            hash_PeRlHaSh += (hash_PeRlHaSh << 3); \
            hash_PeRlHaSh ^= (hash_PeRlHaSh >> 11); \
            (hash) = (hash_PeRlHaSh + (hash_PeRlHaSh << 15)); \
        } STMT_END
    #endif

    which is used most of the time in hv.c (there's also a PERL_HASH macro,
    which is slower but uses the same data types).

    This seems, to my still-bleeding eyes, to say that the actual hash is
    an unsigned 32-bit value and hence the answer to your question is 32.
    If I've misunderstood something, please let me know :) I haven't done
    C in ages.
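    For the curious: that macro is the "one-at-a-time" hash, and a
    pure-Perl transcription (an approximation -- the & 0xffffffff masks
    emulate C's unsigned 32-bit wraparound) makes the 32-bit claim easy
    to check:

```perl
# Pure-Perl transcription of the macro above. The masks stand in for
# C's unsigned 32-bit overflow, so the result always fits in 32 bits
# regardless of the input's length.
sub one_at_a_time {
    my ($str, $seed) = @_;
    my $h = $seed // 0;                       # PL_rehash_seed in the macro
    for my $byte (unpack 'C*', $str) {
        $h = ($h + $byte)                     & 0xffffffff;
        $h = ($h + (($h << 10) & 0xffffffff)) & 0xffffffff;
        $h ^= ($h >> 6);
    }
    $h = ($h + (($h << 3)  & 0xffffffff)) & 0xffffffff;
    $h ^= ($h >> 11);
    $h = ($h + (($h << 15) & 0xffffffff)) & 0xffffffff;
    return $h;
}

my $hash = one_at_a_time("some key");
print $hash < 2**32 ? "fits in 32 bits\n" : "oops\n";
```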

    Ted
     
    Ted Zlatanov, Feb 25, 2005
    #12
  13. Lee

    John Bokma Guest

    Anno Siegel wrote:

    > John Bokma <> wrote in comp.lang.perl.misc:
    >> Tassilo v. Parseval wrote:
    >>
    >> > Also sprach John Bokma:

    >>
    >> > Yet, the number of those bits (as
    >> > you put it) has no bearing on the capacity of a hash. You could use
    >> > a hash function that only returns 2bit-wide numbers and still store
    >> > an amount of items only limited by the available memory in the
    >> > hash.

    >>
    >> Yes, but than one ends up with a hash table with O(n) look up time
    >> :) So if the hash function is limited to n bits there is a point
    >> that the hash table acts more like 2^n big lists instead of a hash
    >> table.

    >
    > That's no contradiction. A hash of size k (with this kind of
    > collision management) *is* a way to distribute a long list of length n
    > over k lists of average length n/k, no more, no less. Keeping the
    > average length <= 1 is an option, but not the only way to run a hash.


    I am more than aware of that (they teach those things in Utrecht, you
    know ;-) ). I didn't state it was a contradiction, only that O(n) hash
    look-up is not what I want out of a hash table.

    I am happy with constant avg length of c, and hence O(1) look up.

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Feb 25, 2005
    #13
  14. John Bokma wrote:

    > I am more than aware of that, (they teach those things in Utrecht, you
    > know ;-) ). I didn't state it was a contradiction, but only that O(n) hash
    > look up is not what I want out of a hash table.


    You could use your own hashing function, if Perl's built-in gives you such
    poor results. XS code can pass a hash value to hv_fetch_ent() and
    hv_store_ent() - normally you'd pass 0 to have Perl calculate it for you
    using its built-in, but that's just a convenience, not a requirement.

    You could write a couple of fetch/store routines in C, export them to Perl,
    and provide a nice %hash wrapper for them with tie(). The rest of your Perl
    code would never need to know the difference.
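    The XS route aside, the plumbing can be sketched in pure Perl --
    everything below (class name, bucket count, hash function) is
    hypothetical, just to show the tie() shape:

```perl
# A tied hash that picks its own buckets with its own hash function.
# Each bucket is itself a small ordinary hash, purely to keep the
# sketch short; XS code would manage its own storage instead.
package CustomHash;

sub TIEHASH { bless { buckets => [ map { {} } 1 .. 16 ] }, shift }

sub _bucket {    # custom hash function: djb2-ish, reduced to 4 bits
    my ($self, $key) = @_;
    my $h = 5381;
    $h = ($h * 33 + ord) & 0xffffffff for split //, $key;
    return $self->{buckets}[ $h & 0xf ];
}

sub STORE  { my ($s, $k, $v) = @_; $s->_bucket($k)->{$k} = $v }
sub FETCH  { my ($s, $k) = @_;     $s->_bucket($k)->{$k} }
sub EXISTS { my ($s, $k) = @_;     exists $s->_bucket($k)->{$k} }
sub DELETE { my ($s, $k) = @_;     delete $s->_bucket($k)->{$k} }

package main;
tie my %h, 'CustomHash';
$h{camel} = "ok";
print $h{camel}, "\n";    # ok
```

    As Sherm says, the rest of the Perl code just sees %h and never needs
    to know what's underneath.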

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Feb 25, 2005
    #14
  15. Lee

    John Bokma Guest

    Ted Zlatanov wrote:

    > On 24 Feb 2005, wrote:
    >
    >> So, I was just wondering if Perl has a similar "limit" (ie. the
    >> result of "its" hash function is always n bits).


    [ snip ]

    > This seems, to my still-bleeding eyes,


    Shouldn't that be StIlLBlEeDiNgEyEs << 3 ? :-D

    > to say that the actual hash is
    > an unsigned 32-bit value and hence the answer to your question is 32.


    I was guessing that :-D.

    > If I've misunderstood something, please let me know :) I haven't done
    > C in ages.


    Same here :-D. (I am now even more happy about this).

    So to answer the OP's question: you can have at most 2^32 slots, and
    storing just a 32-bit hash value for each slot already requires 16 GB
    (4 bytes x 2^32 = 2^34 bytes).

    So, until you can buy a few TB at your local supermarket, don't worry
    too much :-D.

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Feb 25, 2005
    #15
  16. Ted Zlatanov wrote:

    > This seems, to my still-bleeding eyes, to say that the actual hash is
    > an unsigned 32-bit value and hence the answer to your question is 32.


    Actually, there's a comment towards the top of handy.h that sheds a bit more
    light. I32 & U32 are at *least* 32 bits, but on systems with 64-bit ints,
    I32 & U32 are also 64 bits.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Feb 25, 2005
    #16
  17. John Bokma wrote:

    > So to answer the OPs question, you can have at max 2^32 slots


    That's not entirely accurate. On 64-bit architectures, I32 & U32 are 64-bit
    ints (see handy.h), so on those architectures you can have 2^64 slots.

    Thus, even on a machine with a 64-bit address space, and a perl that's built
    to support it, the limiting factor is RAM.

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
     
    Sherm Pendley, Feb 25, 2005
    #17
  18. Lee

    John Bokma Guest

    Sherm Pendley wrote:

    > John Bokma wrote:
    >
    >> I am more than aware of that, (they teach those things in Utrecht,
    >> you know ;-) ). I didn't state it was a contradiction, but only that
    >> O(n) hash look up is not what I want out of a hash table.

    >
    > You could use your own hashing function, if Perl's built-in gives you
    > such poor results.


    The poor result I was talking about was when the 32-bit limit on the
    hash code results in O(n) hash look-ups because there is a lot of data
    in the hash. :-D

    > XS code can pass a hash value to hv_fetch_ent() and
    > hv_store_ent() - normally you'd pass 0 to have Perl calculate it for
    > you using its built-in, but that's just a convenience, not a
    > requirement.
    >
    > You could write a couple of fetch/store routines in C, export them to
    > Perl, and provide a nice %hash wrapper for them with tie(). The rest
    > of your Perl code would never need to know the difference.


    Thanks. I never had this problem, but it's good to know.

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Feb 25, 2005
    #18
  19. Lee

    Anno Siegel Guest

    John Bokma <> wrote in comp.lang.perl.misc:
    > Anno Siegel wrote:
    >
    > > John Bokma <> wrote in comp.lang.perl.misc:
    > >> Tassilo v. Parseval wrote:
    > >>
    > >> > Also sprach John Bokma:
    > >>
    > >> > Yet, the number of those bits (as
    > >> > you put it) has no bearing on the capacity of a hash. You could use
    > >> > a hash function that only returns 2bit-wide numbers and still store
    > >> > an amount of items only limited by the available memory in the
    > >> > hash.
    > >>
    > >> Yes, but than one ends up with a hash table with O(n) look up time
    > >> :) So if the hash function is limited to n bits there is a point
    > >> that the hash table acts more like 2^n big lists instead of a hash
    > >> table.

    > >
    > > That's no contradiction. A hash of size k (with this kind of
    > > collision management) *is* a way to distribute a long list of length n
    > > over k lists of average length n/k, no more, no less. Keeping the
    > > average length <= 1 is an option, but not the only way to run a hash.

    >
    > I am more than aware of that, (they teach those things in Utrecht, you know


    Hey, what's wrong? It's a public discussion. Some readers may appreciate
    an explicit argument.

    > ;-) ). I didn't state it was a contradiction, but only that O(n) hash look
    > up is not what I want out of a hash table.
    >
    > I am happy with constant avg length of c, and hence O(1) look up.


    Fine. My point is that there are useful applications of hash tables
    also in the O(n) range.

    Anno
     
    Anno Siegel, Feb 26, 2005
    #19
  20. Lee

    John Bokma Guest

    Anno Siegel wrote:

    > John Bokma <> wrote in comp.lang.perl.misc:
    >> Anno Siegel wrote:
    >>
    >> > John Bokma <> wrote in
    >> > comp.lang.perl.misc:
    >> >> Tassilo v. Parseval wrote:
    >> >>
    >> >> > Also sprach John Bokma:
    >> >>
    >> >> > Yet, the number of those bits (as
    >> >> > you put it) has no bearing on the capacity of a hash. You could
    >> >> > use a hash function that only returns 2bit-wide numbers and
    >> >> > still store an amount of items only limited by the available
    >> >> > memory in the hash.
    >> >>
    >> >> Yes, but than one ends up with a hash table with O(n) look up time
    >> >> :) So if the hash function is limited to n bits there is a point
    >> >> that the hash table acts more like 2^n big lists instead of a hash
    >> >> table.
    >> >
    >> > That's no contradiction. A hash of size k (with this kind of
    >> > collision management) *is* a way to distribute a long list of
    >> > length n over k lists of average length n/k, no more, no less.
    >> > Keeping the average length <= 1 is an option, but not the only way
    >> > to run a hash.

    >>
    >> I am more than aware of that, (they teach those things in Utrecht,
    >> you know

    >
    > Hey, what's wrong? It's a public discussion. Some readers may
    > appreciate an explicit argument.


    Yup, but it sounded to me a bit like I had no clue how a hash table
    works :-D So apologies if I read it wrong.

    >> ;-) ). I didn't state it was a contradiction, but only that O(n) hash
    >> look up is not what I want out of a hash table.
    >>
    >> I am happy with constant avg length of c, and hence O(1) look up.

    >
    > Fine. My point is that there are useful applications of hash tables
    > also in the O(n) range.


    Uhm, like? (Ok, memory stress testing is one).

    --
    John Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
     
    John Bokma, Feb 26, 2005
    #20
