hash two keys to one index

M

Mark

Hi,

I need to figure out how to map two different keys (a string and an
integer) to the same index. I don't even know where to start... but I
don't want two copies of the the hash table.

Any ideas?
 
E

Eric Sosman

Mark said:
Hi,

I need to figure out how to map two different keys (a string and an
integer) to the same index. I don't even know where to start... but I
don't want two copies of the the hash table.

Any ideas?

My only "idea" is that you should take a few steps back
from the immediate tactical issue and explain your overall
strategic goal. *Why* do you want "ABC" and new Integer(42)
to produce the same hashCode(), and what good do you think
it will do you? On first sight, at least, it seems daft.
Pray explain the well-concealed sanity that lies beneath.
 
M

Mark

Eric said:
My only "idea" is that you should take a few steps back
from the immediate tactical issue and explain your overall
strategic goal. *Why* do you want "ABC" and new Integer(42)
to produce the same hashCode(), and what good do you think
it will do you? On first sight, at least, it seems daft.
Pray explain the well-concealed sanity that lies beneath.

Well, if I want to pull up all the data pertaining to a student, but I
only know his last name or his student number, not both...
 
P

Patricia Shanahan

Mark said:
Well, if I want to pull up all the data pertaining to a student, but I
only know his last name or his student number, not both...

That is a very good reason for having a HashMap with names as keys, and
a HashMap with student numbers as keys...

Note that every non-null non-primitive variable or expression in Java is
a pointer to some object, not the value of an object. That means that
the two HashMap instances can point to the same Student instances.

For the specific case of name and number, you could get away with a
single table. It might be slightly slower than two tables, but probably
only slightly. However, mixed key tables cannot take advantage of
generics, and you must be very sure that you never need e.g. to index by
two different numbers, where there could be a risk of hitting on the
wrong key.

Generally, two tables is much cleaner.

Patricia
 
M

Mark

Patricia said:
That is a very good reason for having a HashMap with names as keys, and
a HashMap with student numbers as keys...

Note that every non-null non-primitive variable or expression in Java is
a pointer to some object, not the value of an object. That means that
the two HashMap instances can point to the same Student instances.

For the specific case of name and number, you could get away with a
single table. It might be slightly slower than two tables, but probably
only slightly. However, mixed key tables cannot take advantage of
generics, and you must be very sure that you never need e.g. to index by
two different numbers, where there could be a risk of hitting on the
wrong key.

Generally, two tables is much cleaner.

Patricia

Well, two tables is okay I guess as long as there aren't actually two
instances of the data.
So basically... I have two arrays of references...
 
E

Eric Sosman

Mark said:
Well, two tables is okay I guess as long as there aren't actually two
instances of the data.
So basically... I have two arrays of references...

Almost: you have two Maps that associate keys with references.
But you seem to have grasped the main point, which is that the
references point to the objects but are not the object instances.
What goes into the map are pairs of (reference to key, reference
to target), not the keys and the targets themselves.

Faulty analogy: You have your best friends' phone numbers in
your Palm Pilot, and maybe written down in a few places, and for
that matter you could always look 'em up in the phone book. But
your friends' actual, physical phones are in none of these places;
the things you store and retrieve are "references" to those phones.
Through the magic of the phone network you can use a phone number
to make a friend's phone ring, just as through the magic of the JVM
you can use a reference to make an object instance do something.
But the friend's phone, like the object instance, is not "in your
possession;" all you're holding is the reference.
 
T

tam

Patricia said:
That is a very good reason for having a HashMap with names as keys, and
a HashMap with student numbers as keys...

Note that every non-null non-primitive variable or expression in Java is
a pointer to some object, not the value of an object. That means that
the two HashMap instances can point to the same Student instances.

For the specific case of name and number, you could get away with a
single table. It might be slightly slower than two tables, but probably
only slightly. However, mixed key tables cannot take advantage of
generics, and you must be very sure that you never need e.g. to index by
two different numbers, where there could be a risk of hitting on the
wrong key.

Generally, two tables is much cleaner.

Patricia

An alternative implementation is two hashes where
one hash gives the student number given the name and the other
gives the student record given the student number. I suspect
this would mimic the underlying database more closely
and it seems a bit cleaner to me. There could still be a
convenience method that does
studentRecord.get(studentNumber.get(studentName))
so from the user perspective it's just as convenient. [Might
be slower but I doubt that will be an issue.]
When we create a new student record, we only update a single
hash, and when we update a name (e.g., after marriage) we
only update items relating to the name, not the the student
record hash.
Regards,
Tom McGlynn
 
M

Mark

Patricia said:
That is a very good reason for having a HashMap with names as keys, and
a HashMap with student numbers as keys...

Note that every non-null non-primitive variable or expression in Java is
a pointer to some object, not the value of an object. That means that
the two HashMap instances can point to the same Student instances.

For the specific case of name and number, you could get away with a
single table. It might be slightly slower than two tables, but probably
only slightly. However, mixed key tables cannot take advantage of
generics, and you must be very sure that you never need e.g. to index by
two different numbers, where there could be a risk of hitting on the
wrong key.

Generally, two tables is much cleaner.

Patricia

An alternative implementation is two hashes where
one hash gives the student number given the name and the other
gives the student record given the student number. I suspect
this would mimic the underlying database more closely
and it seems a bit cleaner to me. There could still be a
convenience method that does
studentRecord.get(studentNumber.get(studentName))
so from the user perspective it's just as convenient. [Might
be slower but I doubt that will be an issue.]
When we create a new student record, we only update a single
hash, and when we update a name (e.g., after marriage) we
only update items relating to the name, not the the student
record hash.
Regards,
Tom McGlynn

In that case, the student number is dependent on the student's name,
no?
Unfortunately, that doesn't work for me.
 
M

Mark

Eric said:
Almost: you have two Maps that associate keys with references.
But you seem to have grasped the main point, which is that the
references point to the objects but are not the object instances.
What goes into the map are pairs of (reference to key, reference
to target), not the keys and the targets themselves.

Faulty analogy: You have your best friends' phone numbers in
your Palm Pilot, and maybe written down in a few places, and for
that matter you could always look 'em up in the phone book. But
your friends' actual, physical phones are in none of these places;
the things you store and retrieve are "references" to those phones.
Through the magic of the phone network you can use a phone number
to make a friend's phone ring, just as through the magic of the JVM
you can use a reference to make an object instance do something.
But the friend's phone, like the object instance, is not "in your
possession;" all you're holding is the reference.

I've run into a bit of a snag.

When I insert an object into the hash table, I pass in (a reference to)
the object, and a hash code, which can either be based on the last name
or the student number. To deal with collisions, I use a quadratic
probing technique...

void insert(Object obj, int hash) throws HashTableFull
{
if( count == table.length ) throw new HashTableFull("Cannot insert:
hash table is full");
int key = probe(hash);
table[key] = obj;
++count;
}

int sign(int i)
{
if( i < 0 ) return -1;
return 1;
}

int probe(int hash)
{
int probe = 0;

while(true)
{
int key = (hash + probe*probe*sign(probe)) % table.length;

if( table[key] == null )
return key;

if( probe <= 0 )
probe = -probe+1;
else
probe = -probe;
}
}

Which, in theory, should work nicely. However... when retrieving the
object, how do I know when I've found the right one?

Object find(int hash) throws ObjectNotFound
{
int probe = 0;

for(int i=0; i<table.length; i++)
{
int key = (hash + probe*probe*sign(probe)) % table.length;

if( /* table[key] == ??? */ )
return table[key];

if( probe <= 0 )
probe = -probe+1;
else
probe = -probe;
}

throw new ObjectNotFound("Object not found");
}
 
M

Mark

Mark said:
Eric said:
Almost: you have two Maps that associate keys with references.
But you seem to have grasped the main point, which is that the
references point to the objects but are not the object instances.
What goes into the map are pairs of (reference to key, reference
to target), not the keys and the targets themselves.

Faulty analogy: You have your best friends' phone numbers in
your Palm Pilot, and maybe written down in a few places, and for
that matter you could always look 'em up in the phone book. But
your friends' actual, physical phones are in none of these places;
the things you store and retrieve are "references" to those phones.
Through the magic of the phone network you can use a phone number
to make a friend's phone ring, just as through the magic of the JVM
you can use a reference to make an object instance do something.
But the friend's phone, like the object instance, is not "in your
possession;" all you're holding is the reference.

I've run into a bit of a snag.

When I insert an object into the hash table, I pass in (a reference to)
the object, and a hash code, which can either be based on the last name
or the student number. To deal with collisions, I use a quadratic
probing technique...

void insert(Object obj, int hash) throws HashTableFull
{
if( count == table.length ) throw new HashTableFull("Cannot insert:
hash table is full");
int key = probe(hash);
table[key] = obj;
++count;
}

int sign(int i)
{
if( i < 0 ) return -1;
return 1;
}

int probe(int hash)
{
int probe = 0;

while(true)
{
int key = (hash + probe*probe*sign(probe)) % table.length;

if( table[key] == null )
return key;

if( probe <= 0 )
probe = -probe+1;
else
probe = -probe;
}
}

Which, in theory, should work nicely. However... when retrieving the
object, how do I know when I've found the right one?

Object find(int hash) throws ObjectNotFound
{
int probe = 0;

for(int i=0; i<table.length; i++)
{
int key = (hash + probe*probe*sign(probe)) % table.length;

if( /* table[key] == ??? */ )
return table[key];

if( probe <= 0 )
probe = -probe+1;
else
probe = -probe;
}

throw new ObjectNotFound("Object not found");
}

Rereading what you wrote, you indicate that a reference to the key
should also be stored. I didn't really understand why a reference to
the key should be stored, if the key could easily be derived from
hashing the object. However... I suppose it could be useful when
trying to find an object...

For example, if one object has a hash "5" and another object has hash
"15", and the hash table capacity is 10... then 5%10 and 15%10 both
equal 5..so they have the same key, but different hashes.. then we can
check if we've found the right object like so...

Object find(int hash) throws ObjectNotFound
{
int probe = 0;

for(int i=0; i<m_table.length; i++)
{
int key = (hash + probe*probe*sign(probe)) % m_table.length;

if( m_hash[key] == hash )
return m_table[key];

if( probe <= 0 )
probe = -probe+1;
else
probe = -probe;
}

throw new ObjectNotFound("Object not found");
}

using the modified insert...

void insert(Object obj, int hash) throws HashTableFull
{
if( count == m_table.length ) throw new HashTableFull("Cannot insert:
hash m_table is full");
int key = probe(hash);
m_table[key] = obj;
m_hash[key] = hash;
++count;
}

am I correct?
 
E

Eric Sosman

Mark wrote On 11/21/06 16:57,:
If you're trying to learn how to implement a hash table,
fine: go ahead and fiddle around to your curiosity's limits.
But if you're just trying to use a hash table in the solution
of some other problem, I recommend java.util.HashMap to your
attention. The wheel has already been invented, and is not
too badly out of round ...
void insert(Object obj, int hash) throws HashTableFull
{
if( count == table.length ) throw new HashTableFull("Cannot insert:
hash table is full");
int key = probe(hash);
table[key] = obj;
++count;
}

int sign(int i)
{
if( i < 0 ) return -1;
return 1;
}

int probe(int hash)
{
int probe = 0;

while(true)
{
int key = (hash + probe*probe*sign(probe)) % table.length;

if( table[key] == null )
return key;

if( probe <= 0 )
probe = -probe+1;
else
probe = -probe;
}
}

Which, in theory, should work nicely.

I haven't studied it closely, but at a quick glance it
doesn't look noticeably better than linear probing. Every
hash value that starts by probing a particular bucket [k]
then follows the same path: [k+1], [k-1], [k+4], [k-4], ...
Also, I haven't convinced myself that this probe sequence
will eventually visit every table location instead of just
cycling back on itself.

Also, the % operator doesn't do quite what you want.
For example, -7 % 4 yields -3, not 1.
However... when retrieving the
object, how do I know when I've found the right one?
[...]

Rereading what you wrote, you indicate that a reference to the key
should also be stored. I didn't really understand why a reference to
the key should be stored, if the key could easily be derived from
hashing the object. However... I suppose it could be useful when
trying to find an object...

The "key" is the actual Student name or number, not the
hash code derived from it nor the location of some bucket in
the table. You hash the key, do some numerical hocus-pocus,
and derive a bucket number. In that bucket, if it's not empty,
you find a (key,value) pair. You compare the bucket's key to
the search key to find out whether this is the desired pair or
whether you need to probe further.

Note: Since you stop searching as soon as you've found a
bucket whose key equals the search key, it follows that the
keys (not necessarily their hash codes) must be unique. If
you've got two different Students both named "John Smith"
there'll be trouble. You could, if you wanted, have a Map
that associated a name with a collection of all the Students
sharing that same name; many collections would contain just
one Student apiece, but some might contain several.

Recommended reading: D.E. Knuth, "The Art of Computer
Programming, Volume III: Sorting and Searching." Some people
are intimidated by Knuth's rigor (I confess that some sections
are beyond my own mathematical skills), but he always ends the
deeper derivations with a straightforward statement of the
conclusions -- very helpful for those of us unlikely to win
a Fields Medal any time soon.
 
T

tam

Mark said:
(e-mail address removed) wrote: ....>
An alternative implementation is two hashes where
one hash gives the student number given the name and the other
gives the student record given the student number. I suspect
this would mimic the underlying database more closely
and it seems a bit cleaner to me. There could still be a
convenience method that does
studentRecord.get(studentNumber.get(studentName))
so from the user perspective it's just as convenient. [Might
be slower but I doubt that will be an issue.]
When we create a new student record, we only update a single
hash, and when we update a name (e.g., after marriage) we
only update items relating to the name, not the the student
record hash.
Regards,
Tom McGlynn

In that case, the student number is dependent on the student's name,
no?
Unfortunately, that doesn't work for me.

Not sure quite what you mean by 'the number is dependent on the name'.
You would need to be able to map to a student's number given a student
name but the name does not determine the ID -- they are associated
by a table. If you have students who have a name but no number
then this approach won't work but if every student has a number if
should be fine.

In practice one normally uses a student number because student
names are not reliable indices for the database. It's common for
multiple students to have the same name and there can be many
variations on how the name is coded. E.g., I've seen just my last name
written as McGlynn, Mc Glynn, MCGLYNN or Mcglynn.
Numbers (or some equivalent coding) insure uniqueness and standardize
how the a coding for the record indices.

The database records are indexed by the number/code and when users
want to query by name there is a translation from the name to the
number and
then a search for the associated record.

This allows graceful handling of the case when two students
have the same name. If multiple numbers match a name,
the user is asked for additional information to resolve the ambiguity.

This code is cleanly separated from the code that uses the student
records. There is a single method for getting the student's records
once we have resolved any ambiguities about which student is
involved.

Of course, this may not fit with your database or your other
requirements
but it's a bit of what database programmers call database normalization
poking into a Java implementation.

Regards,
Tom McGlynn
 
M

Mark

Ahh.. yes. I was getting my terminology mixed up. Ive been using
"hash" to mean the value that is returned by the hash function, and
"key" to mean the array index (after applying hocus pocus to the hash).

Eric said:
You compare the bucket's key to
the search key to find out whether this is the desired pair or
whether you need to probe further.

I was thinking about passing a reference to the object into the "find"
method and then comparing the two objects... but of couse, this really
doesn't make sense. If you already know what the object is, you don't
need to search for it. However passing just the key makes much more
sense.
If you're trying to learn how to implement a hash table,
fine: go ahead and fiddle around to your curiosity's limits.
But if you're just trying to use a hash table in the solution
of some other problem, I recommend java.util.HashMap to your
attention. The wheel has already been invented, and is not
too badly out of round ...

It's a learning process :) And evidentally a needed one too.
I haven't studied it closely, but at a quick glance it
doesn't look noticeably better than linear probing. Every
hash value that starts by probing a particular bucket [k]
then follows the same path: [k+1], [k-1], [k+4], [k-4], ...
Also, I haven't convinced myself that this probe sequence
will eventually visit every table location instead of just
cycling back on itself.

Quadratic probing is actually considerably better. Yes, each inserted
object will follow the same path, but you won't get these large
clusters around the bucket, which means other objects that are inserted
nearby won't need to probe as far. However, a random or rehash probing
technique would be much better.
Also, the % operator doesn't do quite what you want.
For example, -7 % 4 yields -3, not 1.

Heheh... I realized this when I started getting "ArrayIndexOutOfBounds"
exceptions. Easily fixed.
Recommended reading: D.E. Knuth, "The Art of Computer
Programming, Volume III: Sorting and Searching." Some people
are intimidated by Knuth's rigor (I confess that some sections
are beyond my own mathematical skills), but he always ends the
deeper derivations with a straightforward statement of the
conclusions -- very helpful for those of us unlikely to win
a Fields Medal any time soon.

I'm not really a textbook sort of guy (more hands on), but I'll keep
this is mind.

Thank you so much for your help.
 
E

Eric Sosman

Mark said:
Ahh.. yes. I was getting my terminology mixed up. Ive been using
"hash" to mean the value that is returned by the hash function, and
"key" to mean the array index (after applying hocus pocus to the hash).

Eric said:
You compare the bucket's key to
the search key to find out whether this is the desired pair or
whether you need to probe further.


I was thinking about passing a reference to the object into the "find"
method and then comparing the two objects... but of couse, this really
doesn't make sense. If you already know what the object is, you don't
need to search for it. [...]

No, but you might want to ask a different kind of question,
like "Is this Student enrolled in Basket Weaving 101?" Given a
Student and a collection of all the enrollees in BW101, the
problem isn't so much to find the Student as to determine whether
he is or isn't in the collection. That's why Java has HashSet
(and other Set flavors, too).
> [in re Knuth:]
> I'm not really a textbook sort of guy (more hands on), but I'll keep
> this is mind.

As in many other endeavors, hands-on will suffice for a while
and take you for a certain distance. But eventually you will find
that progress depends on being able to gain from the experience of
others without expending all the time they did while acquiring it.
(Unless you happen to be immortal with a lot of time on your hands,
or such a blazing genius that you can re-invent an entire field of
knowledge overnight and without aid.) To paraphrase a fairly smart
person, it is foolish to turn down a giant's offer to allow you to
stand on his shoulder.
 
C

Chris Uppal

Eric said:
[in re Knuth:]
I'm not really a textbook sort of guy (more hands on), but I'll keep
this is mind.

As in many other endeavors, hands-on will suffice for a while
and take you for a certain distance. But eventually you will find
that progress depends on being able to gain from the experience of
others without expending all the time they did while acquiring it. [...]
To paraphrase a fairly smart
person, it is foolish to turn down a giant's offer to allow you to
stand on his shoulder.

Though that shoulder may itself to be too high and dry to be reachable without
the aid of a lesser giant.

Or, to put it less indirectly: maybe a less daunting text would be a better
place to start (assuming the desire to start at all). If so, then I'd
recommend Sedgewick's one-volume "Algorithms in <Language>" book(s). (That's
the one-volume versions, not the significantly expanded, and not yet complete,
n-volume versions.)

-- chris
 
M

Mark

Eric said:
Mark said:
Ahh.. yes. I was getting my terminology mixed up. Ive been using
"hash" to mean the value that is returned by the hash function, and
"key" to mean the array index (after applying hocus pocus to the hash).

Eric said:
You compare the bucket's key to
the search key to find out whether this is the desired pair or
whether you need to probe further.


I was thinking about passing a reference to the object into the "find"
method and then comparing the two objects... but of couse, this really
doesn't make sense. If you already know what the object is, you don't
need to search for it. [...]

No, but you might want to ask a different kind of question,
like "Is this Student enrolled in Basket Weaving 101?" Given a
Student and a collection of all the enrollees in BW101, the
problem isn't so much to find the Student as to determine whether
he is or isn't in the collection. That's why Java has HashSet
(and other Set flavors, too).
[in re Knuth:]
I'm not really a textbook sort of guy (more hands on), but I'll keep
this is mind.

As in many other endeavors, hands-on will suffice for a while
and take you for a certain distance. But eventually you will find
that progress depends on being able to gain from the experience of
others without expending all the time they did while acquiring it.
(Unless you happen to be immortal with a lot of time on your hands,
or such a blazing genius that you can re-invent an entire field of
knowledge overnight and without aid.) To paraphrase a fairly smart
person, it is foolish to turn down a giant's offer to allow you to
stand on his shoulder.

Well... I *had* a textbook (~$130), but then I decided it wasn't worth
my money since almost all the information I need is on the internet, so
I returned it. Nevertheless, I've learnt about "chaining"... never
would have thought of it myself. Much easier to implement AND much
more efficient than probing. I'm using the binary search tree class I
used earlier for the chains.

This is going to be one crazy abstract data type... it can retrieve any
student by last name or student number in O(1), or administrator via a
password in O(1)...and print all students in ascending order by last
name or student number in O(n)... however, at the minor cost of
worsening insertion time to O(log n). Heh.. I'm using two hash tables,
and two binary search trees... kind of space inefficient, yes.
 
M

Mark

Heh.. I'm using two hash tables,
and two binary search trees... kind of space inefficient, yes.

Scratch that. A doubly linked list instead of two hash tables. Just
realized I couldn't insert the same object in both trees and have them
sorted differently.
 
C

Chris Uppal

Mark said:
Nevertheless, I've learnt about "chaining"... never
would have thought of it myself. Much easier to implement AND much
more efficient than probing.

When considering the efficiency of data-structures in the modern world, it's
always worth thinking a bit about cache effects. Both open chaining, and any
form of double hashing, have poor effects on locality-of-reference, with memory
accesses jumping all over the shop rather than sequential.

May not make much difference in Java (which tends to be cache-unfriendly anyway
for objects), but worth thinking about. Especially if you end up considering
on-disk structures (I realise that isn't relevant to your immediate task).

-- chris
 
M

Mark Thornton

Chris said:
May not make much difference in Java (which tends to be cache-unfriendly anyway
for objects),

Newly allocated objects are usually adjacent in memory regardless of
their type. Depending on the garbage collector, objects with links to
each other may well be copied to nearby locations in the main heap.
Many C/C++ allocators might often place objects allocated in succession
a considerable distance apart if the heap is badly fragmented.

Mark Thornton
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top