trying to generate integer from string


bpatton

I'm trying to generate a unique integer from a string. It must
generate the same integer each time it has the same string. I'm
trying to use unpack to do this.
Here is a small sample. My real version now has about 2000 strings, but
this is only going up.
my $s1 = '-Dfull_drc=true -Dgds_file=gds.VIA4T -Dgds_layer=VIA4T -Dstore_layer=VIA4T';
my $s2 = '-Dfull_drc=true -Dgds_file=gds.VIA1T -Dgds_layer=VIA1T -Dstore_layer=VIA1T';
my ($u1,$u2);
($u1) = unpack("%J*",$s1);
($u2) = unpack("%J*",$s2);
print "u1 = $u1\n";
print "u2 = $u2\n";

If I change the J to an A this example works OK, but hundreds of other
ones fail.
I'm checking these by creating a Perl hash where the $U# is the key
and the string is the value.
I check for the existence of a key; if it exists, I compare
the values. If they are not equal, then it is an error.


Here is my actual code (minus genRppPermutations, which is too large). $s1 and
$s2 are examples from genRppPermutations.
my @switchList;
my ($rpp,%hash,$key,$string);
foreach $rpp ( qw ( COMBINE GEN_STORE L2G.gatet L2G.met L2G.primary
                    L2G.umc MASTER_RPP PG_PASS1 PG_PASS2 PG_PASS2.SPLITPOL PG_PASS2.met
                    PG_PASS3) ) {
    @switchList = genRppPermutations($rpp);
    foreach $string (@switchList) {
        ($key) = unpack("%A*",$string);
        if (exists $hash{$key}) {
            unless ($hash{$key} eq $string) {
                print "collision between strings, both generated '$key'\n";
                print " s1 : $string\n";
                print " s2 : $hash{$key}\n";
            }
        } else {
            $hash{$key} = $string;
        }
    }
}
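
A note on why the `%` checksum collides so readily: as far as perlfunc describes it, the `%<bits>` prefix makes unpack return the sum of the unpacked values instead of the values themselves. A plain sum does not depend on the order of the characters, so two strings that are rearrangements of each other get the same number. A minimal sketch of that effect, assuming the `C` (unsigned char) format; the helper name `char_sum` and the two sample strings are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: 32-bit sum of the byte values of a string,
# computed with unpack's '%' checksum prefix.
sub char_sum {
    my ($string) = @_;
    return unpack("%32C*", $string);
}

# Illustrative strings: same characters, with the 1 and the 4 swapped.
my $first  = '-Dgds_layer=VIA1T -Dstore_layer=VIA4T';
my $second = '-Dgds_layer=VIA4T -Dstore_layer=VIA1T';

print "first  = ", char_sum($first),  "\n";
print "second = ", char_sum($second), "\n";
print "collision\n"
    if $first ne $second && char_sum($first) == char_sum($second);
```

With switch lists that differ only in which layer name goes where, this kind of collision is likely to be common, which would fit the hundreds of failures described above.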
 

Mirco Wahab

bpatton said:
I'm trying to generate a unique integer from a string. It must
generate the same integer each time it has the same string. I'm
trying to use unpack to do this.

Depending on the length of the strings, compute a 10-20 byte 'fingerprint'
of each one, for example with the MD5 or SHA-1 algorithm. There are modules for
this purpose; you may use one of the Digest:: modules
(http://search.cpan.org/~gaas/Digest-1.15/Digest.pm), e.g. SHA1.

Example:
==>

use strict;
use warnings;
# sha1_hex returns the 20-byte SHA-1 digest as a 40-character hex string
use Digest::SHA1 qw(sha1_hex);

my @strings = qw'
    COMBINE GEN_STORE L2G.gatet L2G.met L2G.primary L2G.umc MASTER_RPP PG_PASS1
    PG_PASS2 PG_PASS2.SPLITPOL PG_PASS2.met PG_PASS3';

my ($rpp, %hash, $key, $string, $collision);

foreach $rpp (@strings) {
    # genRppPermutations() is the original poster's function (not shown here)
    foreach $string ( genRppPermutations($rpp) ) {
        $key = sha1_hex( $string );
        if( exists $hash{$key} ) {
            # same digest, different string: a real collision
            if( $hash{$key} ne $string ) {
                print "collision " . ++$collision . " between generated '$key'\n";
                print " s1 : $string\n";
                print " s2 : $hash{$key}\n";
            }
        }
        else {
            $hash{$key} = $string;
            print "$key, ";
        }
    }
}
print "all ok!\n" unless $collision;

<==
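
Since the original question asks for an integer rather than a hex string, one possible way to get there is to keep only the first 8 hex digits of the digest and convert them with `hex`. This is only a sketch under assumptions: it uses the core module Digest::SHA in place of Digest::SHA1 (both export `sha1_hex`), and the helper name `string_to_int` is made up for illustration.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);   # assumption: core Digest::SHA instead of Digest::SHA1

# Hypothetical helper: reduce a string to a 32-bit unsigned integer by
# keeping the first 8 hex digits of its SHA-1 digest.
sub string_to_int {
    my ($string) = @_;
    return hex( substr( sha1_hex($string), 0, 8 ) );
}

my $s1 = '-Dfull_drc=true -Dgds_file=gds.VIA4T -Dgds_layer=VIA4T -Dstore_layer=VIA4T';
my $s2 = '-Dfull_drc=true -Dgds_file=gds.VIA1T -Dgds_layer=VIA1T -Dstore_layer=VIA1T';

printf "u1 = %u\n", string_to_int($s1);
printf "u2 = %u\n", string_to_int($s2);
```

Truncating to 32 bits throws away most of the digest, so collisions become far more likely than with the full 40-character key; the collision check in the loop above would still be needed.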

Regards

M.
 

Mirco Wahab

Mirco said:
Depending on the length of the strings, compute a 10-20 byte 'fingerprint'
of each one, for example with the MD5 or SHA-1 algorithm. There are modules for
this purpose; you may use one of the Digest:: modules
(http://search.cpan.org/~gaas/Digest-1.15/Digest.pm), e.g. SHA1.

If you need "normal" (4-byte) integers as keys,
you could look at the CRC32 algorithm, for which a
module is also available. The following
uses "regular" integers as keys
(only the modified parts are shown):
==>
...
use Digest::CRC qw'crc32';
...

...
    foreach $string ( genRppPermutations($rpp) ) {
        my $key = crc32($string);
        if( exists $hash{$key} ) {
            if( $hash{$key} ne $string ) {
                print "collision " . ++$collision . " between generated '$key'\n";
                print " s1 : $string\n s2 : $hash{$key}\n";
            }
        }
        else {
            $hash{$key} = $string;
            printf "0x%08X, ", $key;
        }
    }
...


<==

Regards

M.
 

anno4000

bpatton said:
I'm trying to generate a unique integer from a string. It must
generate the same integer each time it has the same string. I'm
trying to use unpack to do this.
Here is a small sample. My real version now has about 2000 strings, but
this is only going up.

But two thousand is nothing. Just use the strings as hash keys.

If it were two million or more, using a digest could be meaningful.
If so, use a module that generates a tried-and-(mathematically-)proven
digest instead of an ad-hoc solution.
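
Using the strings themselves as keys could look roughly like the sketch below. It assumes the goal is to spot strings that occur more than once; the helper name `find_duplicates` and the sample list are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return each string that occurs more than once,
# using the strings themselves as hash keys (no digest involved).
sub find_duplicates {
    my @strings = @_;
    my %seen;
    my @duplicates;
    foreach my $string (@strings) {
        # The post-increment returns the old count, so this is true
        # exactly once per string: on its second occurrence.
        push @duplicates, $string if $seen{$string}++ == 1;
    }
    return @duplicates;
}

my @dups = find_duplicates(qw(COMBINE GEN_STORE COMBINE MASTER_RPP));
print "duplicate: $_\n" for @dups;
```

Because the key is the string, two different strings can never share a key, so no collision handling is needed.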

Anno
 
