Trying to convert small C++ subroutine (by Peter Weinberger) to Perl

Discussion in 'Perl Misc' started by David Filmer, Feb 24, 2009.

  1. David Filmer

    David Filmer Guest

    Greetings.

    In the book, "Programming in C++" by Stephen Dewhurst and Kathy Stark
    (1989, Prentice Hall, page 14), I find a short hashing subroutine
    ("hash" as in cryptographic hash, not an associative array) by the
    famous Peter Weinberger.

    The subroutine accepts a string, and returns a numerical value. One
    possible application of such a subroutine would be for storing (and
    retrieving) of a vast number of files (many thousands or millions) -
    the files could be stored in an arbitrary number of directories/
    filesystems (ie, "hash buckets") and retrieved based on the hash value
    of their filename.

    I wish to convert this C++ subroutine to a Perl subroutine. But my
    knowledge of C++ is limited to what I learned in a class in college -
    I have never coded in C++ and I am severely deficient in this
    language.

    Based on my limited C++ skills, I have made an effort to do this
    conversion, but it does not produce the expected results. I was
    hoping someone here who was more familiar with C++ could point out the
    error of my ways.

    Here is subroutine from the book, based on Peter Weinberger's code:

    int
    hashpjw( char *s ) {
    const prime = 211;
    unsigned hash = 0, g;

    for( char *p = s; *p, p++ ) {
    hash = ( hash << 4 ) + *p;
    // assumes 32 bit int size
    if( g = hash & 0xf0000000 ) {
    hash ^= g >> 24;
    hsh ^= g;
    }
    }
    return hash % prime;
    }


    Here is my <lame>attempt</lame> to convert it to Perl:

    #!/usr/bin/perl
    use strict;

    print hashpjw( 'bar', 123 );

    sub hashpjw {
    my( $char, $s ) = @_;
    my $prime = 211;
    my ( $hash, $g, $p );
    for my $char( $p = $s, $p, $p++ ) {
    $hash = ( $hash << 4 ) + $p;
    if( $g = $hash & 0xf0000000 ) {
    $hash = $hash ^ ($g >> 24);
    $hash = $hash ^ $g;
    }
    }
    return $hash % $prime;
    }
    __END__

    Any assistance is greatly appreciated!
     
    David Filmer, Feb 24, 2009
    #1
    1. Advertising

  2. David Filmer <> wrote:
    > In the book, "Programming in C++" by Stephen Dewhurst and Kathy Stark
    > (1989, Prentice Hall, page 14), I find a short hashing subroutine
    > ("hash" as in cryptographic hash, not an associative array) by the
    > famous Peter Weinberger.


    > The subroutine accepts a string, and returns a numerical value. One
    > possible application of such a subroutine would be for storing (and
    > retrieving) of a vast number of files (many thousands or millions) -
    > the files could be stored in an arbitrary number of directories/
    > filesystems (ie, "hash buckets") and retrieved based on the hash value
    > of their filename.


    Mmmm, but you also have to take into account possible "hash
    collisions"...

    > I wish to convert this C++ subroutine to a Perl subroutine. But my
    > knowledge of C++ is limited to what I learned in a class in college -
    > I have never coded in C++ and I am severely deficient in this
    > language.


    > Based on my limited C++ skills, I have made an effort to do this
    > conversion, but it does not produce the expected results. I was
    > hoping someone here who was more familiar with C++ could point out the
    > error of my ways.


    > Here is subroutine from the book, based on Peter Weinberger's code:


    > int
    > hashpjw( char *s ) {
    > const prime = 211;
    > unsigned hash = 0, g;


    > for( char *p = s; *p, p++ ) {
    > hash = ( hash << 4 ) + *p;
    > // assumes 32 bit int size
    > if( g = hash & 0xf0000000 ) {
    > hash ^= g >> 24;
    > hsh ^= g;
    > }
    > }
    > return hash % prime;
    > }


    Your 'prime' is awfully small, with that you never get more
    than 211 different hash values...

    > Here is my <lame>attempt</lame> to convert it to Perl:


    > #!/usr/bin/perl
    > use strict;


    use warnings;

    ;-)

    > print hashpjw( 'bar', 123 );


    Why do you pass it two arguments if the C/C++ version only
    takes a single one, the string of which you want to compute
    the hash value?

    > sub hashpjw {
    > my( $char, $s ) = @_;
    > my $prime = 211;
    > my ( $hash, $g, $p );
    > for my $char( $p = $s, $p, $p++ ) {
    > $hash = ( $hash << 4 ) + $p;
    > if( $g = $hash & 0xf0000000 ) {
    > $hash = $hash ^ ($g >> 24);
    > $hash = $hash ^ $g;
    > }
    > }
    > return $hash % $prime;
    > }


    Try this instead (and pass it just the string)

    sub hashpjw {
    my ( $s, $prime, $hash ) = ( shift, 211, 0 );

    for ( split //, $s ) {
    $hash = ( $hash << 4 ) + ord $_;
    if ( my $g = $hash & 0xf0000000 ) {
    $hash ^= $g >> 24;
    $hash ^= $g;
    }
    }

    return $hash % $prime;
    }

    One warning: the C/C++ version is rather likely meant to be used
    with 8-bit characters. The Perl version as it is will happily
    accept e.g. Unicode characters and then the result will differ
    from the C/C++ version!
    Regards, Jens
    --
    \ Jens Thoms Toerring ___
    \__________________________ http://toerring.de
     
    Jens Thoms Toerring, Feb 24, 2009
    #2
    1. Advertising

  3. David Filmer

    smallpond Guest

    Re: Trying to convert small C++ subroutine (by Peter Weinberger) toPerl

    On Feb 24, 1:36 am, David Filmer <> wrote:

    > #!/usr/bin/perl
    > use strict;
    >
    > print hashpjw( 'bar', 123 );


    Call hashpjw with two args, a string 'bar' and a number 123

    >
    > sub hashpjw {
    > my( $char, $s ) = @_;


    Get two args: $char = 'bar' and $s = 123


    > for my $char( $p = $s, $p, $p++ ) {


    Here you define a new $char hiding the one that has 'bar'.

    Also this loop is almost certainly not doing what you want, unless
    what you want is very strange:

    perl -e 'for $c ($p=123, $p, $p++) { print $c,"\n" }'
    124
    124
    123

    > Any assistance is greatly appreciated!


    Start by treating perl as a different language than C. First off,
    there are no pointers, so trying to do pointer arithmetic is, well,
    pointless.
     
    smallpond, Feb 24, 2009
    #3
  4. David Filmer

    Uri Guttman Guest

    Re: Trying to convert small C++ subroutine (by Peter Weinberger) toPerl

    >>>>> "DF" == David Filmer <> writes:

    hi dave!

    DF> In the book, "Programming in C++" by Stephen Dewhurst and Kathy Stark
    DF> (1989, Prentice Hall, page 14), I find a short hashing subroutine
    DF> ("hash" as in cryptographic hash, not an associative array) by the
    DF> famous Peter Weinberger.

    well, perl hashes are called that because they use a hash function to
    allow O(1) (or close to it) access!

    DF> The subroutine accepts a string, and returns a numerical value. One
    DF> possible application of such a subroutine would be for storing (and
    DF> retrieving) of a vast number of files (many thousands or millions) -
    DF> the files could be stored in an arbitrary number of directories/
    DF> filesystems (ie, "hash buckets") and retrieved based on the hash value
    DF> of their filename.

    which is also how perl hashes work!

    DF> int
    DF> hashpjw( char *s ) {
    DF> const prime = 211;
    DF> unsigned hash = 0, g;

    DF> for( char *p = s; *p, p++ ) {
    DF> hash = ( hash << 4 ) + *p;
    DF> // assumes 32 bit int size
    DF> if( g = hash & 0xf0000000 ) {
    DF> hash ^= g >> 24;
    DF> hsh ^= g;
    DF> }
    DF> }
    DF> return hash % prime;
    DF> }


    DF> print hashpjw( 'bar', 123 );

    you pass a string to $char but a number to $s

    DF> sub hashpjw {
    DF> my( $char, $s ) = @_;

    the c++ code declares only a single char pointer arg which is similar to
    a perl string. and as i pointed out above, you pass in two args.

    DF> my $prime = 211;
    DF> my ( $hash, $g, $p );
    DF> for my $char( $p = $s, $p, $p++ ) {

    that redeclares $char so the string arg you pass in is never used. also
    this loop just scans the string to be hashed. there are many easy idioms
    for this. this s about the simplest but may be slow on very long
    strings:

    foreach my $char ( split //, $s ) {

    this works better on long strings:

    while( my ($char) = $s =~ /(.)/sg ) {


    DF> $hash = ( $hash << 4 ) + $p;

    you never set $p to anything. it needs to be the next char. so change $p
    to $char. *p is accessing the next char in the c++ string.

    DF> if( $g = $hash & 0xf0000000 ) {
    DF> $hash = $hash ^ ($g >> 24);
    DF> $hash = $hash ^ $g;
    DF> }

    that looks ok but i can't tell without testing.

    DF> }
    DF> return $hash % $prime;

    same here.

    well, you never scan the input string which is never properly passed
    in. fix those bugs and see what happens. also test this out with a
    single char string so you can debug all the steps more easily.

    uri

    --
    Uri Guttman ------ -------- http://www.sysarch.com --
    ----- Perl Code Review , Architecture, Development, Training, Support ------
    --------- Free Perl Training --- http://perlhunter.com/college.html ---------
    --------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
     
    Uri Guttman, Feb 24, 2009
    #4
  5. David Filmer

    Guest

    On 24 Feb 2009 09:11:20 GMT, (Jens Thoms Toerring) wrote:

    >David Filmer <> wrote:
    >> In the book, "Programming in C++" by Stephen Dewhurst and Kathy Stark
    >> (1989, Prentice Hall, page 14), I find a short hashing subroutine

    [snip]

    >Try this instead (and pass it just the string)
    >
    >sub hashpjw {
    > my ( $s, $prime, $hash ) = ( shift, 211, 0 );
    >
    > for ( split //, $s ) {
    > $hash = ( $hash << 4 ) + ord $_;
    > if ( my $g = $hash & 0xf0000000 ) {
    > $hash ^= $g >> 24;
    > $hash ^= $g;
    > }
    > }
    >
    > return $hash % $prime;
    >}
    >


    The oputput for this is the same as the C function, for
    'bar' its 168. However the C function as typed in won't compile
    without some fixes.

    If there were ever a need for speed, the last two $hash asignments
    can be combined into a single asignment.

    $hash = ($hash ^ $g >> 24) ^ $g;

    -sln
     
    , Feb 24, 2009
    #5
  6. David Filmer

    Guest

    On Mon, 23 Feb 2009 22:36:22 -0800 (PST), David Filmer <> wrote:

    >Greetings.
    >
    >In the book, "Programming in C++" by Stephen Dewhurst and Kathy Stark
    >(1989, Prentice Hall, page 14), I find a short hashing subroutine
    >("hash" as in cryptographic hash, not an associative array) by the
    >famous Peter Weinberger.
    >
    >The subroutine accepts a string, and returns a numerical value. One
    >possible application of such a subroutine would be for storing (and
    >retrieving) of a vast number of files (many thousands or millions) -
    >the files could be stored in an arbitrary number of directories/
    >filesystems (ie, "hash buckets") and retrieved based on the hash value
    >of their filename.
    >
    >I wish to convert this C++ subroutine to a Perl subroutine. But my
    >knowledge of C++ is limited to what I learned in a class in college -
    >I have never coded in C++ and I am severely deficient in this
    >language.
    >
    >Based on my limited C++ skills, I have made an effort to do this
    >conversion, but it does not produce the expected results. I was
    >hoping someone here who was more familiar with C++ could point out the
    >error of my ways.
    >
    >Here is subroutine from the book, based on Peter Weinberger's code:
    >
    >int
    >hashpjw( char *s ) {
    > const prime = 211;
    > unsigned hash = 0, g;
    >
    > for( char *p = s; *p, p++ ) {
    > hash = ( hash << 4 ) + *p;
    > // assumes 32 bit int size
    > if( g = hash & 0xf0000000 ) {
    > hash ^= g >> 24;
    > hsh ^= g;
    > }
    > }
    > return hash % prime;
    >}
    >

    [snip]

    Jen has a nice Perl solution.
    But, the C code won't compile under C++, probably some typo's.

    C++ does not support default-int
    There is no default type for const.
    The for loop has a ',' where it should have a ';'.

    Fixed up and speedier version:

    int
    hashpjw( char *s ) {
    const int prime = 211;
    unsigned hash = 0, g;

    for( char *p = s; *p; p++ ) {
    hash = ( hash << 4 ) + *p;
    // assumes 32 bit int size
    if( g = hash & 0xf0000000 )
    hash = (hash ^ g >> 24) ^ g;
    }
    return hash % prime;
    }

    -sln
     
    , Feb 24, 2009
    #6
  7. David Filmer

    Ted Zlatanov Guest

    On 24 Feb 2009 09:11:20 GMT (Jens Thoms Toerring) wrote:

    JTT> for ( split //, $s ) {
    JTT> $hash = ( $hash << 4 ) + ord $_;
    ....
    JTT> The Perl version as it is will happily accept e.g. Unicode
    JTT> characters and then the result will differ from the C/C++ version!

    An easy way to fix that is to replace the lines with

    for ( unpack "C*", $s) {
    $hash = ( $hash << 4 ) + $_;

    which will probably also be faster, a nice property for a hashing
    function (unpack over split, and no ord() call).

    Ted
     
    Ted Zlatanov, Feb 25, 2009
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Davisro
    Replies:
    3
    Views:
    405
    bruce barker
    Apr 29, 2004
  2. Davisro
    Replies:
    1
    Views:
    389
    George Ter-Saakov
    May 3, 2004
  3. Roedy Green

    Peter van der Linden

    Roedy Green, May 7, 2004, in forum: Java
    Replies:
    7
    Views:
    537
    marcus
    May 10, 2004
  4. Justus Ohlhaver
    Replies:
    22
    Views:
    270
    Lloyd Linklater
    Nov 14, 2008
  5. king
    Replies:
    5
    Views:
    186
Loading...

Share This Page