Arrays as key in a HashMap

T

trippy

Hendrik Maryns said:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
NotDashEscaped: You need GnuPG to verify this message

trippy schreef:
John C. Bollinger said:
trippy wrote:
I want to use arrays as keys in a HashMap.

But a key has to be unique.

1 key to 1 value
And your point is?

There is nothing inherently unworkable with arrays being map keys;
arrays are objects, after all. You might as well argue that Strings
cannot be used as keys because they (often) contain multiple characters.

But arrays have more than one thing in them by definition.

Okay, say using this idea, you keep a list of users and their
information. Now, you have multiple people accessing the same user's
information because thisguy[0] will give you the same value as thisguy
[4].

I guess that's my point. 1 key to 1 value. With the arrays you'll have
multiple keys to values.

I guess if you want that, whatever but I think it's a bad idea. Why
don't you keep things in a plain old ArrayList if you're gonna do that.

Still don?t see your point. What?s the difference if I only want some
objects to be stored as a list, between ArrayList and array?

Actually, the difference would be between ArrayList and Map (probably
HashMap) ArrayList wouldn't require the 1 key to 1 value like a map
would.
Only for
reading, I see no point in using a List.

H.

--
trippy
mhm31x9 Smeeter#29 WSD#30
sTaRShInE_mOOnBeAm aT HoTmAil dOt CoM

NP: "I Am The Law" -- Anthrax

"Now, technology's getting better all the time and that's fine,
but most of the time all you need is a stick of gum, a pocketknife,
and a smile."

-- Robert Redford "Spy Game"
 
C

Chris Smith

tom fredriksen said:
My point is to advise about possible dangers and mystical bugs by doing
it that way. A complex key like that is just begging for trouble along
the way. KISS.

Not to mention the idiomatic breakdown.

[Warning: I'm going to make up terminology here -- strong identity, weak
identity, and confused identity -- because I don't think there is widely
accepted terminology in use throughout the Java programming language to
describe these concepts.]

A key should be a key. It ought to be just as complex as it needs to be
for the application. If I need to look up records that match an array,
then I ought to use an array for a key.

The problem here, as I see it, is that HashMap doesn't work well (or at
a minimum, is dangerous) with keys that have confused identities. An
object with a confused identity is an object whose logical identity may
change throughout its lifetime; loosely speaking, an object that
overrides equals and hashCode in such a way that they may change through
its lifetime. I advocate avoiding objects with confused identities
whenever possible... but certain APIs such as the main Collections API
interfaces require confused identity.

Note that arrays themselves work just fine as keys. It's just necessary
to first recognize that they have strong identity; changing the contents
doesn't change who they are, and the same contents don't make two
different arrays equal to each other. Similarly, objects with weak
identity make fine HashMap keys, regardless of their complexity, so long
as they represent the information that you really want to use as a key.
Instances of the List or Map interfaces, though, do not work well as
keys unless they are immutable, since mutable collections are required
to have confused identities.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
T

tom fredriksen

Chris said:
A key should be a key. It ought to be just as complex as it needs to be
for the application.

The problem here, as I see it, is that HashMap doesn't work well (or at
a minimum, is dangerous) with keys that have confused identities.

Well, it seems like you have a different view of it than I have.

I prefer to keep things as minimalistic and as simple as possible, thats
the way I have found to keep the bug count close to zero and a high
degree of autonomy and maintainability on the servers I have created. If
you have a different approach, please share. I would love to hear other
ways of doing it.

/tom
 
J

John C. Bollinger

Roedy said:
You can't use arrays literally as keys.

Sure you can.
You need sensible equals and
hashCode for them.

Object provides equals() and hashCode() methods that meet all the
technical requirements.
Absolute identity is probably not what you want.

It is not uncommon for me to use Maps where object identity is exactly
the right choice for my keys -- such a map associates values with
specific key objects (object semantics). On the other hand, it may be
more typical in general to use keys that have value semantics, such as
Strings and Integers. In that case the map can be considered as
associating values with whatever things the key objects /represent/; as
a result, you can retrieve each value with any equivalent representation
of the underlying word, number, or whatever that constitutes its logical
key. Both uses have their places.
So you must embed you array in a class that lets you override those
two methods.

If you want value semantics, then that's probably the most
straightforward way to go about it. For what it's worth, it appears
that the OP does want value semantics, and I agree that embedding his
arrays in wrapper objects would be the approach I would consider first.
 
C

Chris Smith

tom fredriksen said:
Well, it seems like you have a different view of it than I have.

I prefer to keep things as minimalistic and as simple as possible, thats
the way I have found to keep the bug count close to zero and a high
degree of autonomy and maintainability on the servers I have created. If
you have a different approach, please share. I would love to hear other
ways of doing it.

The point that any necessary complexity can be encapsulated behind the
contract for Object.equals and Object.hashCode. Somewhat recently, for
example, I wrote some code (I'm leaving out some details to prevent this
post from getting too long) where I needed to keep a Map<Graph,Strategy>
where Graph is a directed graph in the conventional sense, and Strategy
encapsulates a series of transformations that may be performed on that
graph. A user would look up a graph and be given certain options for
transformation. The result of the transformation might then be looked
up again, resulting in a few further options.

Now, I don't think too many people would consider a digraph to be a
minimalistic data structure... but it's what I needed to look up in the
structure. The Graph class was immutable, and tied to Strategy so that
graphs were manipulated by Strategy objects that return a different
Graph as a result. Graph encapsulated the lookup simply by being able
to calculate a consistent hash code for any graph, and being able to
compare itself against another class for equality. Implementing those
things wasn't exactly trivial (though not really hard, either), but once
it was done, the HashMap data structure worked fine, of course.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
T

tom fredriksen

Chris said:
The point that any necessary complexity can be encapsulated behind the
contract for Object.equals and Object.hashCode.

Just because you have a contract about an interface, and it can be kept
constant, does not mean that the code behind it needs to be as
complicated as you can manage, or want to, create it. Thats the key to
low complexity/high maintainability code.
Somewhat recently, for
example, I wrote some code (I'm leaving out some details to prevent this
post from getting too long) where I needed to keep a Map<Graph,Strategy>
where Graph is a directed graph in the conventional sense, and Strategy
encapsulates a series of transformations that may be performed on that
graph.

I am not in a position to say if that was the best way to do it. But of
course, its an intriguing way of doing it. But does it solve the low
complexity/high maintainability equation? Have you needed to change
things because of bugs or any other unforeseen design-/problems?

/tom
 
R

Roedy Green

It is not uncommon for me to use Maps where object identity is exactly
the right choice for my keys -- such a map associates values with
specific key objects (object semantics)

If you index by identity, you need the object itself to feed in as the
retrieval key. So what's the point of looking it up if you already
have it?

You would not use this to find the objects matched by identity
themselves,, but to find other unrelated associated objects, that the
key object does not link to, only by Map membership.

It would be like insisting on canonical interned strings for all your
HashMap lookup keys.
 
R

Roedy Green

Similarly, objects with weak
identity make fine HashMap keys, regardless of their complexity, so long
as they represent the information that you really want to use as a key.

That is a mindbending idea, using an array as an effectively immutable
and unique key for HashMap lookup even though the values of the
elements themselves change.

If you do this, you had better write a paragraph of explanation with
circles and arrows.
 
R

Roedy Green

That is a mindbending idea, using an array as an effectively immutable
and unique key for HashMap lookup even though the values of the
elements themselves change.

If you do this, you had better write a paragraph of explanation with
circles and arrows.

If I were implementing that I think I would put the array in a wrapper
object even if just to help maintenance programmers understand that
the object itself has its own identity that is preserved even though
the elements change.

I would also put in the hashCode and equals methods even if they just
called super with an explanation that was intentional. You DON'T want
them looking at the individual array elements.

Otherwise someone will be tempted to "fix" your code by adding a
"proper" equals method.
 
C

Chris Smith

tom fredriksen said:
Just because you have a contract about an interface, and it can be kept
constant, does not mean that the code behind it needs to be as
complicated as you can manage, or want to, create it. Thats the key to
low complexity/high maintainability code.

Can you solve that problem with less complexity, or do you just move the
complexity elsewhere? Somewhere, you've got to have code that compares
two graphs for equality. The hash code is, of course, an optimization
to avoid a linear search... but it's an even more welcome optimization
when the graphs are so expensive to compare.
I am not in a position to say if that was the best way to do it. But of
course, its an intriguing way of doing it. But does it solve the low
complexity/high maintainability equation? Have you needed to change
things because of bugs or any other unforeseen design-/problems?

Since I wrote the code two weeks ago and it's not used in production, it
doesn't really mean much that I haven't seen any bugs. But no, I
haven't seen any bugs.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
C

Chris Smith

Roedy said:
That is a mindbending idea, using an array as an effectively immutable
and unique key for HashMap lookup even though the values of the
elements themselves change.

If you do this, you had better write a paragraph of explanation with
circles and arrows.

It's definitely not a common way to use a plain array. The point is
that it works fine, and the complexity of the object's state makes it no
more likely to fail. Explanation is necessary, I agree... though I
don't see the utility of circles or arrows.

--
www.designacourse.com
The Easiest Way To Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
I

Ian Pilcher

Roedy said:
That is a mindbending idea, using an array as an effectively immutable
and unique key for HashMap lookup even though the values of the
elements themselves change.

If you do this, you had better write a paragraph of explanation with
circles and arrows.

Or use an IdentityHashMap.
 
M

Mike Schilling

James McGill said:
They are getting younger every day, Roedy. I feel your pain :)

I made a poster to display on the door of my office: A snarling Dick Cheney,
captioned "Seven hunters, two game wardens, and a cow". Many people have
complimented me on how funny it is, and then asked "Why a cow?"
 
T

tom fredriksen

Chris said:
Can you solve that problem with less complexity, or do you just move the
complexity elsewhere? Somewhere, you've got to have code that compares
two graphs for equality. The hash code is, of course, an optimization
to avoid a linear search... but it's an even more welcome optimization
when the graphs are so expensive to compare.

Of course somewhere you have to have the logic to perform the operations
you need done. And using equals() to initiate a comparison of two graphs
is as good as any.

But, using a hash code of the graph as a unique identifier of the graph
is not the way to go, as you then have to take into consideration many
factors that are quite complex, such as

- the complexity of the graph,
- a correct algorithm to produce a representative hash code for the
type and complexity of the graph,
- accuracy of the representation,
- collision rate of the hash code
- distribution properties of the hash code, when considering the
accuracy, collision rate etc

I would try to look for another piece of simple information about a
graph that can be used as a unique identifier.

/tom
 
O

Oliver Wong

Roedy Green said:
If you index by identity, you need the object itself to feed in as the
retrieval key. So what's the point of looking it up if you already
have it?

You would not use this to find the objects matched by identity
themselves,, but to find other unrelated associated objects, that the
key object does not link to, only by Map membership.

It would be like insisting on canonical interned strings for all your
HashMap lookup keys.

One case where I've used object identity as a key for maps was for
annotating trees for a compiler. The source code for the nodes of tree were
generated, so I didn't want to edit their source code and add in attributes
that way. So I make the nodes the keys in my map, and add annotations for
symbol-table construction and type checking as values.

Two nodes, representing the token "x" might be exactly equal in all
their attributes, but may be conceptually considered different, due to their
position in the tree. One such "x" may be a local variable, while another
one is a method parameter. So in cases like these, you really want identity,
and not just equality.

- Oliver
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,599
Members
45,175
Latest member
Vinay Kumar_ Nevatia
Top