accessing perl's internal hashing algorithm

Discussion in 'Perl Misc' started by rob, Sep 27, 2004.

  1. rob

    rob Guest

    Hi, I have a performance intensive script that needs to hash a lot of
    strings. I am guessing that perl's internal hashing algorithm is
    implemented at a low level, is optimized, etc. and I would like to
    access it though my perl scripts. Does anyone know how to do this?

    Thanks,
    Rob
     
    rob, Sep 27, 2004
    #1
    1. Advertising

  2. rob wrote:
    > Hi, I have a performance intensive script that needs to hash a lot of
    > strings. I am guessing that perl's internal hashing algorithm is
    > implemented at a low level, is optimized, etc. and I would like to
    > access it though my perl scripts. Does anyone know how to do this?

    I wouldn't have thought that the algorithm would be suitable for general
    purpose use, but I stand open to correction. The Digest::SHA1 and
    Digest::MD5 modules, for example, are implemented via xs and not native
    perl (though clearly this is not a guarantee of speed: the algorithm can
    suck whatever the implementation), so if raw performance is your
    requirement then you could do worse than starting with these. You could
    always check out the perl source directly if you're stuck on using
    perl's internal algorithm: hv.c?

    Mark
     
    Mark Clements, Sep 27, 2004
    #2
    1. Advertising

  3. rob

    Ala Qumsieh Guest

    rob wrote:

    > Hi, I have a performance intensive script that needs to hash a lot of
    > strings. I am guessing that perl's internal hashing algorithm is
    > implemented at a low level, is optimized, etc. and I would like to
    > access it though my perl scripts. Does anyone know how to do this?


    A lot of the performance hit is probably due to Perl dynamically
    enlarging hashes. If I remember correctly, when you declare a hash, Perl
    allocates a certain size for it. As you starting populating this hash,
    and once the number of keys hashed to the same integer exceeds a certain
    threshold, Perl doubles the size of the hash (implemented as an array in
    C), and re-indexes the keys. This can cause a lot of performace delay in
    the case of a large number of keys to be hashed, especially if this
    needs to be done multiple times.

    The solution is to preallocate a large enough hash, if you think you can
    guess how many keys you will need:

    keys %hash = 5000;

    Perl will round that number to the next power of two.

    All of this is explained in Srinivasan's "Advanced Perl Programming"
    (the Panther).

    --Ala
     
    Ala Qumsieh, Sep 27, 2004
    #3
  4. rob

    Guest

    (rob) wrote:
    > Hi, I have a performance intensive script that needs to hash a lot of
    > strings. I am guessing that perl's internal hashing algorithm is
    > implemented at a low level, is optimized, etc. and I would like to
    > access it though my perl scripts. Does anyone know how to do this?


    I do this by using Perl hash variables :)

    It is not clear to me exactly what you want to do. You want to use
    Perl's hashing algorithm from within a Perl script, but without the
    rest of the Hash-table machinery? I suppose you could copy the hashing
    algorithm out of the Perl source and use it to create your own XS or
    Inline::C code.
    perlguts:

    The hash algorithm is defined in the "PERL_HASH(hash, key, klen)"
    macro:

    hash = 0;
    while (klen--)
    hash = (hash * 33) + *key++;
    hash = hash + (hash >> 5); /* after 5.6 */

    But I don't see what this will get you. Calling this many times from
    within Perl will probably spend more time on function-call overhead than on
    hashing.



    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
     
    , Sep 27, 2004
    #4
  5. On Mon, 27 Sep 2004 18:00:04 GMT, Ala Qumsieh <>
    wrote:

    >The solution is to preallocate a large enough hash, if you think you can
    > guess how many keys you will need:
    >
    > keys %hash = 5000;


    Cool! A good example of how precious bits of information can leak in
    clpmisc... actually it's all there in the docs, as I have promptly
    checked, but, well: would I have done that had I not read this?!?


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
     
    Michele Dondi, Sep 29, 2004
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. dpackwood
    Replies:
    3
    Views:
    1,824
  2. Ahmed Moustafa
    Replies:
    0
    Views:
    785
    Ahmed Moustafa
    Nov 15, 2003
  3. Michael Pernkopf
    Replies:
    0
    Views:
    543
    Michael Pernkopf
    May 27, 2004
  4. howa
    Replies:
    8
    Views:
    486
    Roedy Green
    Jun 15, 2007
  5. Arthur Chan
    Replies:
    0
    Views:
    172
    Arthur Chan
    Apr 27, 2009
Loading...

Share This Page