maximum size of a hash table

Lee

I couldn't find a clear answer to this. Seems to be a lot of hearsay.

Does anyone absolutely know the maximum size that a hash table can
achieve?
 
Tad McClellan

Lee said:
Does anyone absolutely know the maximum size that a hash table can
achieve?


No, because it depends on the particular data being stored and the
size of the memory on your particular machine.

You can keep adding elements until your machine runs out of memory.
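
A quick way to watch this happen (a sketch: on perls of this vintage, i.e. before 5.26, a hash evaluated in scalar context reports "used/allocated" buckets, so you can see the table grow as you add keys; exact counts vary by build):

use strict;
use warnings;

my %h;
for my $n (1 .. 1_000_000) {
    $h{$n} = 1;
    # report at 1, 10, 100, ... keys
    print "$n keys, buckets used/allocated: ", scalar(%h), "\n"
        if $n =~ /^10*$/;
}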
 
John Bokma

Tad said:
No, because it depends on the particular data being stored and the
size of the memory on your particular machine.

Isn't there a limit on the number of bits in the hash?
Just curious.
 
Tassilo v. Parseval

Also sprach John Bokma:
Isn't there a limit on the number of bits in the hash?
Just curious.

What bits? Perl hashes are not Bloom filters, so the concept of bits (or
the number thereof) isn't involved anywhere.

Tassilo
 
Chris Mattern

Lee said:
I couldn't find a clear answer to this. Seems to be a lot of hearsay.

Does anyone absolutely know the maximum size that a hash table can
achieve?

No, they don't, because the answer changes depending on your situation.

A hash can grow until it uses all the free memory available to the
perl process, same as any other data structure in Perl.
--
Christopher Mattern

"Which one you figure tracked us?"
"The ugly one, sir."
"...Could you be more specific?"
 
Chris Richmond - MD6-FDC ~

You can keep adding elements until your machine runs out of memory.

Maybe this has long been resolved, but we had an app using 5.00404
that would keel over dead if the process size exceeded 1 GB on
Linux and HP-UX.

Our software support guys made a customized version for Solaris
that would get around this limit. We've since redone the app
to use MySQL for part of the data storage.

Chris
 
xhoster

Lee said:
I couldn't find a clear answer to this. Seems to be a lot of hearsay.

Does anyone absolutely know the maximum size that a hash table can
achieve?

The maximum size of a hash table is 7 times bigger than the largest
possible skein of yarn.

Xho
 
John Bokma

Tassilo said:
Also sprach John Bokma:

What bits? Perl hashes are not Bloom filters, so the concept of bits
(or the number thereof) isn't involved anywhere.

Ages ago, when I wrote my own hash table stuff (C), I used a function to
calculate the hash code of a string. The result was a 32-bit value, and
hence the number of available slots was 2^32. Of course this doesn't
mean that one can store only 2^32 values, since I handled collisions by
using lists. But if you put enough values in such a hash, it will get
more and more inefficient. (Moreover, the pointers used in the lists must
be able to address the memory.)

So, I was just wondering if Perl has a similar "limit" (i.e. the result
of "its" hash function is always n bits).
 
Tassilo v. Parseval

Also sprach John Bokma:
Ages ago, when I wrote my own hash table stuff (C), I used a function to
calculate the hash code of a string. The result was a 32-bit value, and
hence the number of available slots was 2^32. Of course this doesn't
mean that one can store only 2^32 values, since I handled collisions by
using lists. But if you put enough values in such a hash, it will get
more and more inefficient. (Moreover, the pointers used in the lists must
be able to address the memory.)

Naturally, this is what happens. If you have a mapping from a
potentially infinite set into a finite one, collisions are going to
happen. Perl, too, uses lists for buckets with collisions.
So, I was just wondering if Perl has a similar "limit" (i.e. the result
of "its" hash function is always n bits).

That's the case for any hash function. Yet the number of those bits (as
you put it) has no bearing on the capacity of a hash. You could use a
hash function that only returns 2-bit-wide numbers and still store a
number of items in the hash limited only by the available memory.
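
A toy sketch of the idea (not how perl's hv.c actually lays things out):
a 2-bit hash gives only 2**2 = 4 buckets, but since each bucket is a
chain, capacity is bounded by memory and lookups merely degrade toward
O(n):

use strict;
use warnings;

# A (deliberately terrible) 2-bit hash function: only 4 possible values.
sub hash2bit { length($_[0]) % 4 }

my @buckets = ([], [], [], []);
for my $key (map { "key$_" } 1 .. 1000) {
    # collisions are chained as [key, value] pairs
    push @{ $buckets[ hash2bit($key) ] }, [ $key, 1 ];
}
printf "bucket %d holds %d items\n", $_, scalar @{ $buckets[$_] }
    for 0 .. 3;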

Tassilo
 
John Bokma

Tassilo said:
Also sprach John Bokma:
Yet the number of those bits (as
you put it) has no bearing on the capacity of a hash. You could use a
hash function that only returns 2-bit-wide numbers and still store a
number of items in the hash limited only by the available memory.

Yes, but then one ends up with a hash table with O(n) lookup time :)
So if the hash function is limited to n bits, there is a point at which
the hash table acts more like 2^n big lists than like a hash table.
 
Anno Siegel

John Bokma said:
Yes, but then one ends up with a hash table with O(n) lookup time :)
So if the hash function is limited to n bits, there is a point at which
the hash table acts more like 2^n big lists than like a hash table.

That's no contradiction. A hash of size k (with this kind of collision
management) *is* a way to distribute a long list of length n over k
lists of average length n/k, no more, no less. (With k = 2^32 buckets
and n = 2^35 keys, for instance, the average chain is only 8 entries
long.) Keeping the average length <= 1 is an option, but not the only
way to run a hash.

Anno
 
Ted Zlatanov

So, I was just wondering if Perl has a similar "limit" (i.e. the result
of "its" hash function is always n bits).

In the Perl 5.8.6 hv.h, you can see:

#ifdef PERL_HASH_INTERNAL_ACCESS
#define PERL_HASH_INTERNAL(hash,str,len) \
    STMT_START { \
        register const char *s_PeRlHaSh_tmp = str; \
        register const unsigned char *s_PeRlHaSh = (const unsigned char *)s_PeRlHaSh_tmp; \
        register I32 i_PeRlHaSh = len; \
        register U32 hash_PeRlHaSh = PL_rehash_seed; \
        while (i_PeRlHaSh--) { \
            hash_PeRlHaSh += *s_PeRlHaSh++; \
            hash_PeRlHaSh += (hash_PeRlHaSh << 10); \
            hash_PeRlHaSh ^= (hash_PeRlHaSh >> 6); \
        } \
        hash_PeRlHaSh += (hash_PeRlHaSh << 3); \
        hash_PeRlHaSh ^= (hash_PeRlHaSh >> 11); \
        (hash) = (hash_PeRlHaSh + (hash_PeRlHaSh << 15)); \
    } STMT_END
#endif

which is used most of the time in hv.c (there's also a PERL_HASH macro,
which is slower but uses the same data types).

This seems, to my still-bleeding eyes, to say that the actual hash is
an unsigned 32-bit value and hence the answer to your question is 32.
If I've misunderstood something, please let me know :) I haven't done
C in ages.
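
For the curious, here's a rough pure-Perl transliteration of that macro
(untested, and assuming a perl whose integers are at least 64 bits wide;
the & 0xFFFF_FFFF masking stands in for C's natural U32 wraparound):

use strict;
use warnings;

sub perl_hash_internal {
    my ($str, $seed) = @_;
    my $hash = $seed || 0;                      # PL_rehash_seed in hv.h
    for my $byte (unpack 'C*', $str) {
        $hash = ($hash + $byte)         & 0xFFFF_FFFF;
        $hash = ($hash + ($hash << 10)) & 0xFFFF_FFFF;
        $hash = ($hash ^ ($hash >> 6))  & 0xFFFF_FFFF;
    }
    $hash = ($hash + ($hash << 3))  & 0xFFFF_FFFF;
    $hash = ($hash ^ ($hash >> 11)) & 0xFFFF_FFFF;
    return ($hash + ($hash << 15))  & 0xFFFF_FFFF;   # always fits in 32 bits
}

printf "%u\n", perl_hash_internal("some key");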

Ted
 
John Bokma

Anno said:
That's no contradiction. A hash of size k (with this kind of
collision management) *is* a way to distribute a long list of length n
over k lists of average length n/k, no more, no less. Keeping the
average length <= 1 is an option, but not the only way to run a hash.

I am more than aware of that (they teach those things in Utrecht, you
know ;-)). I didn't state it was a contradiction, only that O(n) hash
lookup is not what I want out of a hash table.

I am happy with a constant average chain length c, and hence O(1) lookup.
 
Sherm Pendley

John said:
I am more than aware of that (they teach those things in Utrecht, you
know ;-)). I didn't state it was a contradiction, only that O(n) hash
lookup is not what I want out of a hash table.

You could use your own hashing function, if Perl's built-in gives you such
poor results. XS code can pass a hash value to hv_fetch_ent() and
hv_store_ent(); normally you'd pass 0 to have Perl calculate it for you
using its built-in, but that's just a convenience, not a requirement.

You could write a couple of fetch/store routines in C, export them to Perl,
and provide a nice %hash wrapper for them with tie(). The rest of your Perl
code would never need to know the difference.
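
An untested sketch of that plumbing (the C fetch/store routines are faked
here with a pure-Perl case-insensitive store, purely to show that callers
just see an ordinary %hash):

package My::CHash;
use strict;
use warnings;

# In the real thing, STORE/FETCH would call your XS routines, which in
# turn pass your own precomputed hash value to hv_store_ent()/hv_fetch_ent().
sub TIEHASH { bless {}, shift }
sub STORE   { my ($self, $k, $v) = @_; $self->{ lc $k } = $v }
sub FETCH   { my ($self, $k) = @_; $self->{ lc $k } }
sub EXISTS  { my ($self, $k) = @_; exists $self->{ lc $k } }
sub DELETE  { my ($self, $k) = @_; delete $self->{ lc $k } }

package main;
tie my %hash, 'My::CHash';
$hash{Foo} = 42;
print $hash{FOO}, "\n";    # prints 42; the caller never knows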

sherm--
 
John Bokma

[ snip ]
This seems, to my still-bleeding eyes,

Shouldn't that be StIlLBlEeDiNgEyEs << 3 ? :-D
to say that the actual hash is
an unsigned 32-bit value and hence the answer to your question is 32.

I was guessing that :-D.
If I've misunderstood something, please let me know :) I haven't done
C in ages.

Same here :-D. (I am now even more happy about this).

So to answer the OP's question: you can have at most 2^32 slots, and
just storing a 32-bit hash value for each slot already requires 16 GB
(32 bits = 4 bytes, and 4 bytes x 2^32 = 2^34 bytes = 16 GB).
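
Quick sanity check of that arithmetic:

$ perl -le 'print 4 * 2**32 / 2**30, " GB"'
16 GB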

So, until you can buy a few TB at your local supermarket, don't worry
too much :-D.
 
Sherm Pendley

Ted said:
This seems, to my still-bleeding eyes, to say that the actual hash is
an unsigned 32-bit value and hence the answer to your question is 32.

Actually, there's a comment towards the top of handy.h that sheds a bit more
light. I32 & U32 are at *least* 32 bits, but on systems with 64-bit ints,
I32 & U32 are also 64 bits.

sherm--
 
Sherm Pendley

John said:
So to answer the OP's question: you can have at most 2^32 slots

That's not entirely accurate. On 64-bit architectures, I32 & U32 are 64-bit
ints (see handy.h), so on those architectures you can have 2^64 slots.

Thus, even on a machine with a 64-bit address space, and a perl that's built
to support it, the limiting factor is RAM.
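
You can check what your own perl was built with (these Config keys come
from config.sh; output varies by build):

use strict;
use warnings;
use Config;

print "U32 type:    $Config{u32type} ($Config{u32size} bytes)\n";
print "IV size:     $Config{ivsize} bytes\n";
print "ptr size:    $Config{ptrsize} bytes\n";
print "64-bit ints: ",
    (defined $Config{use64bitint} && $Config{use64bitint} eq 'define'
        ? "yes" : "no"), "\n";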

sherm--
 
John Bokma

Sherm said:
You could use your own hashing function, if Perl's built-in gives you
such poor results.

The poor result I was talking about was when the 32-bit limit on the hash
code results in O(n) hash lookups because there is a lot of data in the
hash. :-D
XS code can pass a hash value to hv_fetch_ent() and
hv_store_ent(); normally you'd pass 0 to have Perl calculate it for
you using its built-in, but that's just a convenience, not a
requirement.

You could write a couple of fetch/store routines in C, export them to
Perl, and provide a nice %hash wrapper for them with tie(). The rest
of your Perl code would never need to know the difference.

Thanks. I never had this problem, but it's good to know.
 
Anno Siegel

John Bokma said:
I am more than aware of that (they teach those things in Utrecht, you know

Hey, what's wrong? It's a public discussion. Some readers may appreciate
an explicit argument.
;-)). I didn't state it was a contradiction, only that O(n) hash
lookup is not what I want out of a hash table.

I am happy with a constant average chain length c, and hence O(1) lookup.

Fine. My point is that there are useful applications of hash tables
in the O(n) range, too.

Anno
 
John Bokma

Anno said:
Hey, what's wrong? It's a public discussion. Some readers may
appreciate an explicit argument.

Yup, but it sounded to me a bit like I had no clue how a hash table
works :-D So apologies if I read it wrong.
Fine. My point is that there are useful applications of hash tables
in the O(n) range, too.

Uhm, like? (Ok, memory stress testing is one).
 
