Indexing by multiple keys

  • Thread starter nooneinparticular314159
  • Start date
A

Arved Sandstrom

rossum said:
As an alternative you could use a database. Set up a table with Name,
SSN, etc. and build indexes for the Name and SSN fields for fast
retrieval.

rossum
But these two approaches (the other being a map with Names and SSNs as
keys, each pointing to, say, a Person object) will not be quite the
same. For Name in particular, this is not unique (I'm guessing not,
seeing as how this name is associated with an SSN), so a query on Name
could return multiple results, whereas retrieving a value from a plain
map with a Name as a key will return a single result.

For the Name key, the OP could use a multimap implementation easily
enough. In fact, one approach would be to use a interface that allows
for either a Java multimap implementation or the use of a database
(again, for the Name key). Lookups by SSN would return single values;
lookups by name ought to (just my 2 ISK worth) return multiple
results...either implementation would do that.

AHS
 
E

Eric Sosman

I would like to use an object that behaves like a hashmap (ie. put,
get, with a key), but I'd like to be able to index items in that
object by more than one key.

Is there a way to do this?

Ie. if I had a name for the objects I'm adding as well as a social
security number, I'd like to be able to return the object by either
name or number depending on need. Is there an object that does this,
or do I have to write it myself?

There are two main strategies I know of, and you can blend
them in various degrees:

1) Use multiple Maps, one for each key. Your example would
have a Map<Name,Thing> and a Map<SSN,Thing>. Each time you add
a Thing to one Map you also add it to the other; same business
on deletions. It may be convenient to gather all these Maps
into one big MegaMap, so you can more easily produce things
like Iterators that support remove(). See "inverted file."
(I doubt that a MegaMap could easily implement Map, since the
keySet() method would be problematic. But maybe your problem
suggests some domain-specific approach to that difficulty.)

2) Use a single Map<Object,Thing>, entering each Thing
into the Map once for each of its keys. If the keys are easily
distinguished (e.g., "Mortimer Snerd" is easily recognizable as
a Name and 011-22-4343 as an SSN), you might just use the keys
as they stand. If not, it might be best to make yourself a
little TaggedKey class holding the key's value along with an
enum designating its type:

class TaggedKey {
enum KeyType { NAME, SSN }
private final KeyType type;
private Object value;
}

.... and use a Map<TaggedKey,Thing>. See "library card catalog."
Note that an Iterator over the values() of such a Map will
encounter each Thing as many times as it has keys, just as
"The Sot-Weed Factor" appears many times in the library's catalog.

It will often turn out that not all the "keys" satisfy the
properties one wants in a key, properties like uniqueness. In
your example, Name is unlikely to be a unique key and even SSN
has problems. If so, you may want to synthesize a "primary key"
of some kind (this is why you likely have an "employee number")
and use one Map<PrimaryKey,Thing> for it, and use additional
Map<SecondaryKey,Collection<Thing>> maps for the others. (As
a refugee from the Bad Old Small-Memory days, I might instead
use a Map<SecondaryKey,Object> and store a mixture of Things
and Collection<Thing>s, using `instanceof Thing' at run-time
to figure out what I'd retrieved, and earning the scorn of
every modern-era right-thinking Java zealot.[*]) See also
TAOCP Volume III, "Retrieval on Secondary Keys."

[*] No offense, folks. But I recall a respected senior
programmer/architect at a former job whose response to "Why
worry? Memory's cheap" was to extend his hand, palm upwards,
and wait for the speaker to give him some. Oddly, nobody
ever did ...
 
R

Robert Klemme

I would like to use an object that behaves like a hashmap (ie. put,
get, with a key), but I'd like to be able to index items in that
object by more than one key.

Is there a way to do this?

Ie. if I had a name for the objects I'm adding as well as a social
security number, I'd like to be able to return the object by either
name or number depending on need. Is there an object that does this,
or do I have to write it myself?

You will need multiple Maps for that. Even though you could stuff
everything in a single Map I'd rather not do this because then you
cannot manipulate indexes individually any more. I would probably write
my own class which does all the Map handling and ensuring consistency
internally.

If you need this in multiple places with different data types and index
values it may pay off to look for a generic solution to the problem or
write it yourself (probably using reflection).

If the properties of objects which you use as keys are allowed to
change, things get more complicated. You probably then would have to
use some form of observer pattern to notify your multi index class of
the change. At the very least you would need a reindex operation which
can be invoked manually.

The fun starts, if you mix both (generic solution and mutable key
fields). :)

Another interesting question is whether your indexes are all unique or
non unique and what happens if you have a unique index and get two
objects with equivalent key values...

Kind regards

robert
 
L

Lew

Arved said:
Without being facetious, all over their website, including at
http://www.socialsecurity.gov/history/ssn/geocard.html.

They use "number" in the loose English sense of "numeric string"; they make no
claim there that the SSN is actually numeric, I term I selected advisedly.
Their use of "odd" and "even" is cognate to describing letters as "capital" or
"lower case". There is a numeric algorithm involved in creating an SSN, but
once assigned it acts as a label, and not as a number.

I could have been more accurate and said that the SSN is non-numeric as
commonly used in situations such as described in this thread.
The group number and serial number definitely use numerical concepts
(like odd and even for group number). ....

The only numeric operation that makes sense for an SSN is confirmation of a
check digit, an operation also commonly performed in software on character
strings with letters in them.
What is the checksum procedure for a SSN? I wasn't aware there was any.
The SSA does not describe any kind of checksum procedure for figuring
out whether an SSN is valid. There is a procedure of course for doing so.

I don't know the procedure in detail, and I don't think the SSA particularly
wants to publicize it lest identity hackers mess with it.

Note that their site, in particular the page you linked, describes three
groups of characters (that they call digits) that are independent of each
other. Note also that leading zeros are significant - you cannot have an SSN
of "12345678" or "012-4-5678". Also, group numbers are assigned in an
arbitrary (but fixed) lexical order, not a numeric order.

As you point out, it makes no sense to do any calculations (other than
checksum) on an SSN. What is the square root of an SSN?

Bear in mind that the English word "number" is used for numeric strings as
well as quantitative entities. My point is that SSNs are the former, not the
latter, not that the word "number" does not apply.

I do agree that it is possible to model an SSN as an int or as a String.

For the OP's purpose in particular, it makes more sense to model an SSN as a
String than an int.
 
A

Arne Vajhøj

Arved said:
It's numeric, consisting of 3 numbers. Even the SSA says so. It's not a
string. It's just that it makes little or no sense to contemplate
arithmetic operations involving SSNs, except perhaps for incrementing
the serial number part.

Given that:
- arithmetic operations does not make sense
- leading zero is significant
then I would suggest String over int.

Arne
 
A

Arne Vajhøj

As an alternative you could use a database. Set up a table with Name,
SSN, etc. and build indexes for the Name and SSN fields for fast
retrieval.

In most cases apps have both a persistent and an in memory
representation.

Even with correct index a database does not exactly have the
same performance characteristics as a HashMap.

Arne
 
A

Arved Sandstrom

Lew said:
They use "number" in the loose English sense of "numeric string"; they
make no claim there that the SSN is actually numeric, I term I selected
advisedly. Their use of "odd" and "even" is cognate to describing
letters as "capital" or "lower case". There is a numeric algorithm
involved in creating an SSN, but once assigned it acts as a label, and
not as a number.

I could have been more accurate and said that the SSN is non-numeric as
commonly used in situations such as described in this thread.


The only numeric operation that makes sense for an SSN is confirmation
of a check digit, an operation also commonly performed in software on
character strings with letters in them.


I don't know the procedure in detail, and I don't think the SSA
particularly wants to publicize it lest identity hackers mess with it.

Note that their site, in particular the page you linked, describes three
groups of characters (that they call digits) that are independent of
each other. Note also that leading zeros are significant - you cannot
have an SSN of "12345678" or "012-4-5678". Also, group numbers are
assigned in an arbitrary (but fixed) lexical order, not a numeric order.

As you point out, it makes no sense to do any calculations (other than
checksum) on an SSN. What is the square root of an SSN?

Bear in mind that the English word "number" is used for numeric strings
as well as quantitative entities. My point is that SSNs are the former,
not the latter, not that the word "number" does not apply.

I do agree that it is possible to model an SSN as an int or as a String.

For the OP's purpose in particular, it makes more sense to model an SSN
as a String than an int.
In general I agree - I was playing a bit of devil's advocate. I myself
would model an SSN as a string. That the individual 3 numbers making up
an SSN are not quantitative entities is undeniable. But then again,
neither are ordinal numbers, and both the group number and the serial
number behave like specialized ordinal numbers.

I'm still not sold on the idea that there is a checksum for SSNs. I
myself don't see how there can be one. You yourself say that the three
numbers making up an SSN are independent of each other.

AHS
 
R

Robert Klemme

Lew wrote:
In general I agree - I was playing a bit of devil's advocate. I myself
would model an SSN as a string. That the individual 3 numbers making up
an SSN are not quantitative entities is undeniable. But then again,
neither are ordinal numbers, and both the group number and the serial
number behave like specialized ordinal numbers.

I'm still not sold on the idea that there is a checksum for SSNs. I
myself don't see how there can be one. You yourself say that the three
numbers making up an SSN are independent of each other.

I would model a SSN as an SSN - meaning: since a SSN is not exactly a
number and has more properties than a simple length (for example, a
valid format) I would create a class for SSN handling. It's likely that
it will include a constructor with a String (or even CharSequence)
argument but the internal representation does not really matter. And it
can be changed, too if you notice that all of a sudden the long you used
initially is not sufficient to hold all the relevant information. My
0.02 EUR anyway.

Kind regards

robert
 
A

Arved Sandstrom

Robert said:
I would model a SSN as an SSN - meaning: since a SSN is not exactly a
number and has more properties than a simple length (for example, a
valid format) I would create a class for SSN handling. It's likely that
it will include a constructor with a String (or even CharSequence)
argument but the internal representation does not really matter. And it
can be changed, too if you notice that all of a sudden the long you used
initially is not sufficient to hold all the relevant information. My
0.02 EUR anyway.

Kind regards

robert

How I would approach SSN processing (or SIN or credit card number
processing, for that matter) would depend on where I am getting the
number from, and what I need to do with it. For example, a number of the
government applications I am helping to maintain deal with SINs, but we
get them electronically from trusted partners, and so there is no need
to deal with them other than as opaque strings.

If OTOH the SSN/SIN was being typed in by a clerk I'd consider
validation of some sort. It won't be perfect but it'll catch some typos.

I would contemplate your design if my application was tasked with
generating and/or validating an SSN/SIN, but not otherwise.

AHS
 
L

Lew

Arved said:
I'm still not sold on the idea that there is a checksum for SSNs. I
myself don't see how there can be one. You yourself say that the three
numbers making up an SSN are independent of each other.

I have been told by programmers at SSA that there is a checksum, but not the
details of the algorithm. However, Wikipedia states that there is no check
digit in an SSN, so it seems I was wrong.
<http://en.wikipedia.org/wiki/Social_security_number>
<http://en.wikipedia.org/wiki/Social_security_number#Absence_of_a_check_digit>

I was probably confusing the checksum idea with
<http://www.socialsecurity.gov/employer/ssnvhighgroup.htm>
 
A

Arne Vajhøj

Arved said:
How I would approach SSN processing (or SIN or credit card number
processing, for that matter) would depend on where I am getting the
number from, and what I need to do with it. For example, a number of the
government applications I am helping to maintain deal with SINs, but we
get them electronically from trusted partners, and so there is no need
to deal with them other than as opaque strings.

If OTOH the SSN/SIN was being typed in by a clerk I'd consider
validation of some sort. It won't be perfect but it'll catch some typos.

Even with validation String would probably be a better format than
int or int's.

Like matching the string against \d{3}-\d{2}-\d{4}.

Arne
 
R

Robert Klemme

Even with validation String would probably be a better format than
int or int's.

Like matching the string against \d{3}-\d{2}-\d{4}.

I would not want to go with plain String in an OO language if there must
be validation. IMHO that approach is only justifiable in the light of
performance issues. The default for me would be a separate class with
ensures all the invariants that are needed for an SSN in the application.

Only if that class is the bottleneck of the application I would consider
changing this. First line of defense might be to make the validation
optional if that is actually the bottleneck and I have sources that I
can trust. Then I'd consider switching to String. It's easier to
modify a complete program to exchange class SSN for class String as the
reverse operation.

Kind regards

robert
 
D

Donkey Hottie

Arne Vajhøj said:
In most cases apps have both a persistent and an in memory
representation.

Even with correct index a database does not exactly have
the same performance characteristics as a HashMap.

Arne

Given the HashMap contains the population of U.S. and the use case needs one
search from that popula, a HashMap that needs to be loaded before the search
does not have the performance characters of an SQL SELECT with the key.
 
A

Arne Vajhøj

Donkey said:
Given the HashMap contains the population of U.S. and the use case needs
one search from that popula, a HashMap that needs to be loaded before
the search does not have the performance characters of an SQL SELECT
with the key.

True.

But having then entire US population and only needing to lookup
one person is a rather unlikely scenario.

Arne
 
A

Arne Vajhøj

Robert said:
I would not want to go with plain String in an OO language if there must
be validation. IMHO that approach is only justifiable in the light of
performance issues. The default for me would be a separate class with
ensures all the invariants that are needed for an SSN in the application.

I disagree.

It comes in as string, it is persisted as string and i can not think of
any operation to be done on it that will not work fine on a string.

It is not particular OO to convert data to unnatural forms.

Arne
 
A

Arved Sandstrom

Arne said:
I disagree.

It comes in as string, it is persisted as string and i can not think of
any operation to be done on it that will not work fine on a string.

It is not particular OO to convert data to unnatural forms.

Arne

When I stated earlier that I would contemplate Robert's design for an
SSN class if validation was demanded, I would still store the actual SSN
as a String, with no structure. I am not sure that Robert was suggesting
otherwise himself.

AHS
 
R

Robert Klemme

It's not that there are operations that would not work on a String. My
point is that there are _too many_ operations that work on a String that
are either superfluous for a SSN or actually do harm (for mutable
strings). From what I have read a SSN is significantly important and
special enough that a dedicated class is warranted.

Whatever "unnatural" means in this context. I find it most natural when
working in an OO language to model important entities of the real world
as separate classes and not try to cover everything with basic types.
After all, that's what OO is about.
When I stated earlier that I would contemplate Robert's design for an
SSN class if validation was demanded, I would still store the actual SSN
as a String, with no structure. I am not sure that Robert was suggesting
otherwise himself.

As you rightly assumed, I wasn't. If Arne's argument were followed,
there would be no point in having other classes as String in an
application at all because all the data comes in as strings and goes out
as strings. (Of course I am exaggerating here - but just a bit.) ;-)

Kind regards

robert
 
D

Daniel Pitts

I would like to use an object that behaves like a hashmap (ie. put,
get, with a key), but I'd like to be able to index items in that
object by more than one key.

Is there a way to do this?

Ie. if I had a name for the objects I'm adding as well as a social
security number, I'd like to be able to return the object by either
name or number depending on need. Is there an object that does this,
or do I have to write it myself?

Thanks!
You can use whatever keys you want in a HashMap.
You can also use multiple hash-maps if you're keyspaces aren't disjoint.

Map<String, Person> byName;
Map<SocialSecurityNumber, Person> bySsn;
Map<Color, List<Person>> byFavoriteColor;
 
D

Daniel Pitts

Lew said:
Silly me.

I must have been counting the dashes.

Everyone who corrected me on that point is correct, of course.

An SSN is still not numeric, though.
I agree, but it is also not a String.

It is a type in-and-of itself.

come on people, we're OO programmers, use classes over primitives!
 
D

Daniel Pitts

Lew said:
I have been told by programmers at SSA that there is a checksum, but not
the details of the algorithm. However, Wikipedia states that there is
no check digit in an SSN, so it seems I was wrong.
<http://en.wikipedia.org/wiki/Social_security_number>
<http://en.wikipedia.org/wiki/Social_security_number#Absence_of_a_check_digit>


I was probably confusing the checksum idea with
<http://www.socialsecurity.gov/employer/ssnvhighgroup.htm>
Perhaps you were thinking of ISBN as well, which has a check digit.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,780
Messages
2,569,611
Members
45,276
Latest member
Sawatmakal

Latest Threads

Top