accessing perl's internal hashing algorithm

R

rob

Hi, I have a performance intensive script that needs to hash a lot of
strings. I am guessing that perl's internal hashing algorithm is
implemented at a low level, is optimized, etc. and I would like to
access it though my perl scripts. Does anyone know how to do this?

Thanks,
Rob
 
M

Mark Clements

rob said:
Hi, I have a performance intensive script that needs to hash a lot of
strings. I am guessing that perl's internal hashing algorithm is
implemented at a low level, is optimized, etc. and I would like to
access it though my perl scripts. Does anyone know how to do this?
I wouldn't have thought that the algorithm would be suitable for general
purpose use, but I stand open to correction. The Digest::SHA1 and
Digest::MD5 modules, for example, are implemented via xs and not native
perl (though clearly this is not a guarantee of speed: the algorithm can
suck whatever the implementation), so if raw performance is your
requirement then you could do worse than starting with these. You could
always check out the perl source directly if you're stuck on using
perl's internal algorithm: hv.c?

Mark
 
A

Ala Qumsieh

rob said:
Hi, I have a performance intensive script that needs to hash a lot of
strings. I am guessing that perl's internal hashing algorithm is
implemented at a low level, is optimized, etc. and I would like to
access it though my perl scripts. Does anyone know how to do this?

A lot of the performance hit is probably due to Perl dynamically
enlarging hashes. If I remember correctly, when you declare a hash, Perl
allocates a certain size for it. As you starting populating this hash,
and once the number of keys hashed to the same integer exceeds a certain
threshold, Perl doubles the size of the hash (implemented as an array in
C), and re-indexes the keys. This can cause a lot of performace delay in
the case of a large number of keys to be hashed, especially if this
needs to be done multiple times.

The solution is to preallocate a large enough hash, if you think you can
guess how many keys you will need:

keys %hash = 5000;

Perl will round that number to the next power of two.

All of this is explained in Srinivasan's "Advanced Perl Programming"
(the Panther).

--Ala
 
C

ctcgag

Hi, I have a performance intensive script that needs to hash a lot of
strings. I am guessing that perl's internal hashing algorithm is
implemented at a low level, is optimized, etc. and I would like to
access it though my perl scripts. Does anyone know how to do this?

I do this by using Perl hash variables :)

It is not clear to me exactly what you want to do. You want to use
Perl's hashing algorithm from within a Perl script, but without the
rest of the Hash-table machinery? I suppose you could copy the hashing
algorithm out of the Perl source and use it to create your own XS or
Inline::C code.
perlguts:

The hash algorithm is defined in the "PERL_HASH(hash, key, klen)"
macro:

hash = 0;
while (klen--)
hash = (hash * 33) + *key++;
hash = hash + (hash >> 5); /* after 5.6 */

But I don't see what this will get you. Calling this many times from
within Perl will probably spend more time on function-call overhead than on
hashing.



Xho
 
M

Michele Dondi

The solution is to preallocate a large enough hash, if you think you can
guess how many keys you will need:

keys %hash = 5000;

Cool! A good example of how precious bits of information can leak in
clpmisc... actually it's all there in the docs, as I have promptly
checked, but, well: would I have done that had I not read this?!?


Michele
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top