Using a lot of Maps

markspace

I must admit that I don't really get this code. What should be the
initial value of x? Obviously this is a typo, as x isn't even
initialized and this won't compile.


You're right, I totally blew that. x should be initialized to 0.

However, if x is initialized to hash
then x = x * 53 + o.hashCode() doesn't make any sense to me (because
hash==0). What am I missing here?


Try it. Add a line after the x = x * 53 + o.hashCode(); to print out
the value of x in each loop. Make a Heirarchy object with some dummy
values:

Heirarchy h = new Heirarchy( 1, 2, 5, 7, 9 );

then call the hashCode() method and observe the results:

System.out.println( h.hashCode() );

Maybe I loused up the rest of the hashCode method too but I think it's
correct. You should see x being calculated and then a final value for
the hashCode after it's computed.
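
Roughly what I have in mind -- a sketch, assuming Heirarchy (spelling as
in the thread) keeps its values in a List<Object> field named values;
that field name is my guess, not the original code:

// Sketch of the hashCode() under discussion, with the debug print added.
// Assumes a field: private final List<Object> values;
@Override
public int hashCode() {
    int x = 0;                           // initialized to 0, as corrected above
    for (Object o : values) {
        x = x * 53 + o.hashCode();       // throws NPE if an element is null
        System.out.println("x = " + x);  // watch x evolve on each iteration
    }
    return x;
}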
 
markspace

I learned '31' as the coefficient to use for 32-bit hashes, though,
back in my Knuth-reading days. Is '53' better?


When NetBeans auto-generates hashCode() methods, I've noticed it
randomizes the constants it uses within a small range of values. I
don't think it matters much, but I'm just doing the same thing. Any
constant with some low-order bits set as well as some bits set above bit
position 3 will do fine, I think.

31, 33, 35, 51, 53, 55, etc. all work pretty much the same, and generate
slightly different hash codes, meaning the chance of collision is
reduced if you use different constants once in a while.
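
To make the comparison concrete, here's the same multiply-and-add hash
with the coefficient pulled out as a parameter (a sketch; the names are
mine). String.hashCode() uses 31 in exactly this scheme:

// Same polynomial scheme, coefficient passed in for comparison.
static int polyHash(Object[] values, int coeff) {
    int x = 0;
    for (Object o : values) {
        x = x * coeff + o.hashCode();
    }
    return x;
}

// polyHash(vals, 31) != polyHash(vals, 53) in general, but both spread
// inputs across the int range about equally well.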
 
markspace

No, it's still O(1). The depth of the nesting is specific to, and constant
for, a given program. For a depth of nesting k = 6 and a number of (ultimate)
objects to find n, the algorithm will require k * 1 lookups to find a
value, giving O(k * 1) => O(6 * 1) (as k is a constant), which is O(1).


I disagree. Recall that in a tree structure, the average runtime of
insertions and retrievals is log n. Clearly we have a hierarchy like a
tree here, so we really can't expect to get better than this. And for
unbalanced or degenerate or otherwise poorly constructed trees, the
"tree" becomes a list and we do get O(n) lookup times.

Like I said, we really can't tell. For HashMaps with a large fan-out
and only a small depth, the base of the logarithm will be large and the
number of look-ups small. If the trees are all tall and narrow, he's
going to be tending towards O(n) -- effectively O(n), I would say. It
all depends on his problem space.

But O(1) is better than either O(n) or O(lg n), so why not do that?
 
Screamin' Lord Byron

The additive term. 'o.hashCode()' is unlikely to be 0 (though if it
is 'null' the whole thing blows up). If it is, then it rightly
contributes zero to the result. So at each iteration through the loop
after the first, 'x' will also be non-zero, most likely.

Right. The loop. :) Apparently my brain shut down in the middle of the post.
 
Screamin' Lord Byron

You're right, I totally blew that. x should be initialized to 0.

Try it. Add a line after the x = x * 53 + o.hashCode(); to print out
the value of x in each loop.

I somehow managed to miss the fact that the expression may be evaluated more
than once. With that in mind it's clear that x most likely won't stay 0
after the first iteration, as Lew pointed out. It all makes perfect
sense when you're not sleeping while replying to a post. :)
 
Arne Vajhøj

I disagree. Recall that in a tree structure, the average runtime of
insertions and retrievals is log n. Clearly we have a hierarchy like a
tree here, so we really can't expect to get better than this. And for
unbalanced or degenerate or otherwise poorly constructed trees, the
"tree" becomes a list and we do get O(n) lookup times.

You do not have a tree here.

A tree is characterized by the depth increasing when n increases.

Here you have a fixed number of nested maps that does not increase
as n increases.

O(1)
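
To make that concrete, a sketch (the map shape and names are mine, not
the OP's). However many entries the maps hold, a lookup is always
exactly three get() calls, each expected O(1) for a HashMap:

import java.util.Map;

// Fixed-depth lookup: the depth (3 here) is baked into the code, so the
// number of get() calls never grows with the number of entries n.
static String lookup(Map<Integer, Map<Integer, Map<Integer, String>>> root,
                     int a, int b, int c) {
    Map<Integer, Map<Integer, String>> level2 = root.get(a);
    if (level2 == null) return null;
    Map<Integer, String> level3 = level2.get(b);
    if (level3 == null) return null;
    return level3.get(c);  // third and final expected-O(1) lookup
}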

Arne
 
Arne Vajhøj

I think we are going to have to agree to disagree on this.
Apparently.


That's interesting. I'm vaguely familiar with message passing systems.
Message passing is not new; I did work briefly on one lo these many
years ago.

My understanding is that there are two types of synchronization --
shared memory, and message passing. Java's monitors are a shared memory
design. I'm not sure what it would take to get a monitor in Java to run
on a message passing system. And the rest of the Java memory model might
be impossible to port.

Anyway, good info, but unlikely to affect the Java world, which is
shared memory only afaict.

The Java memory model was specifically created to handle
systems without cache coherency, so it will survive fine.
Apps assuming that it will not be a problem on modern
systems will not.

Arne
 
markspace

The Java memory model was specifically created to handle
systems without cache coherency, so it will survive fine.
Apps assuming that it will not be a problem on modern
systems will not.


Cache coherency isn't the real concern; shared memory is. Threads require
it, message passing systems generally eschew it. Message passing
systems generally use processes, not threads, to do their concurrency.

There's also the problem of simulating shared memory with a message passing
system. There's a fairly high overhead in messaging systems with
passing the message. Trying to copy a large block of memory -- or
multiple large blocks -- to simulate a Java synchronization action could
really bog down performance. Do-able, perhaps, but not a great idea.

Read the article you linked to again: Intel is saying they want to
connect 1000's of processors but not use cache coherency. So if you
fill an array with 10000 bytes of output and then have to make it
visible by releasing a shared lock, that memory has to be sent to all
systems that may need it.

One message, sent to N systems. In a shared memory design, this happens
automatically. In message passing, you have to send a message to each
process, and each process has to execute a receive to get it.

It gets worse if the initial thread has updated memory hither and yon
throughout the system -- many messages to many processes.

I think this might be doable, but still the overhead appears nasty -- as
bad as Java 5 intrinsic locks, which required a system call per lock
(unlike the faster lightweight locks in use now).

Message passing works well, I think, when there are fewer messages: one
message at the beginning with the data to be worked on, one message at
the end of the computation when the result is calculated. That's a
pattern that could profit from a simplified hardware design. Java's
fine-grained shared memory and monitors wouldn't work well.

Anyway, I'm not 100% sure, but I don't think this would be a simple port
of Java.
 
Arved Sandstrom

Patricia said:
On 11/22/2010 7:59 AM, ses wrote:
...

A higher level strategy I often follow in this sort of situation is to
isolate the data structure inside a class of its own that only offers
the methods you need. Those methods should be designed in terms of the
callers' needs, not the implementation. For example, if the key is
really a pair of ints, and Integer is only being used because you are
implementing it using collections, the class might have methods such
as:
public boolean insert(int key1, int key2, String data)
public String get(int key1, int key2)
public boolean delete(int key1, int key2)

Implement the class the simplest, most direct way that you can think
of, without spending a lot of time worrying about it.

Effectively, this kicks the problem of how best to implement the data
structure down the road to a time when some additional requirement,
such as a performance or memory usage requirement, has become clear.
At that time, knowing a specific requirement that your initial
implementation is not meeting will often make the correct course of
action obvious.
Patricia

Good advice. As it happens the OP's requirement is not uncommon, and neither
is his implementation. A little thought indicates that using a composite key
and a single map satisfies the same _requirement_, with a different
implementation, and might be a better approach. Commons Collections even has
a MultiKey class for creating composite keys.

Whatever the OP does, whether nested maps or using composite keys, following
your advice would be a good idea, IMO.
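
For illustration, a sketch of the composite-key approach behind
Patricia's suggested interface (all names here are mine; Commons
Collections' MultiKey could replace the hand-rolled key class):

import java.util.HashMap;
import java.util.Map;

// One flat map keyed on both ints at once, instead of nested maps.
public class PairKeyedStore {

    // Composite key: equals/hashCode over both ints.
    private static final class Key {
        final int k1, k2;
        Key(int k1, int k2) { this.k1 = k1; this.k2 = k2; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key other = (Key) o;
            return k1 == other.k1 && k2 == other.k2;
        }
        @Override public int hashCode() { return 31 * k1 + k2; }
    }

    private final Map<Key, String> map = new HashMap<Key, String>();

    public boolean insert(int key1, int key2, String data) {
        return map.put(new Key(key1, key2), data) == null;
    }

    public String get(int key1, int key2) {
        return map.get(new Key(key1, key2));
    }

    public boolean delete(int key1, int key2) {
        return map.remove(new Key(key1, key2)) != null;
    }
}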

AHS
 
Arne Vajhøj

Cache coherency isn't the real concern; shared memory is. Threads require it,
message passing systems generally eschew it. Message passing systems
generally use processes, not threads, to do their concurrency.

I think that system will still have shared main memory.

L1, L2 and L3 cache will not be shared.

There's also the problem of simulating shared memory with a message passing
system. There's a fairly high overhead in messaging systems with passing
the message. Trying to copy a large block of memory -- or multiple large
blocks -- to simulate a Java synchronization action could really bog down
performance. Do-able, perhaps, but not a great idea.

Read the article you linked to again: Intel is saying they want to
connect 1000's of processors but not use cache coherency. So if you fill
an array with 10000 bytes of output and then have to make it visible by
releasing a shared lock, that memory has to be sent to all systems that
may need it.

They will not need to send any data - they will need to send an
invalidate cache message.

One message, sent to N systems. In a shared memory design, this happens
automatically. In message passing, you have to send a message to each
process, and each process has to execute a receive to get it.

It gets worse if the initial thread has updated memory hither and yon
throughout the system -- many messages to many processes.

I believe this is at the HW level, so no processes are involved.

Arne
 
markspace

They will not need to send any data - they will need to send an
invalidate cache message.


OK, but still, see below.

I believe this is at the HW level, so no processes are involved.


On the message passing system I worked on, you needed a sender and a
receiver. You won't want to broadcast to all processes, just the ones
that want to listen. If you have 50 threads that need to synchronize,
you don't want to dump the cache of the other 950 in the system. This
is the advantage of message systems, as I understand it. They have an
explicit target, so you can target only 50 instead of the entire 1000.

Message passing systems are typically implemented in the kernel, with
support from hardware, but not fully hardware themselves. In this case,
the kernel would probably need to look up the PID of the processes, get
which CPU they're running on (if any -- they could be blocked and swapped out),
and send the message to those HW caches.

This could be received asynchronously by the CPUs, without an explicit
receive on the target process, I'll give that. And HW can assist in the
look-up, that's been done. But there's still a context switch to deal
with (these HW accesses are usually privileged and not available to user
processes). And the context switch is the overhead, as I understand it.
Fine-grained context switching is going to cause slowdowns.
 
Lew

markspace said:
On the message passing system I worked on, you needed a sender and a
receiver. You won't want to broadcast to all processes, just the ones
that want to listen. If you have 50 threads that need to synchronize,
you don't want to dump the cache of the other 950 in the system. This is
the advantage of message systems, as I understand it. They have an
explicit target, so you can target only 50 instead of the entire 1000.

Message passing systems are typically implemented in the kernel, with
support from hardware, but not fully hardware themselves. In this case,
the kernel would probably need to look up the PID of the processes, get
which CPU they're running on (if any -- they could be blocked and swapped out),
and send the message to those HW caches.

This could be received asynchronously by the CPUs, without an explicit
receive on the target process, I'll give that. And HW can assist in the
look-up, that's been done. But there's still a context switch to deal
with (these HW accesses are usually privileged and not available to user
processes). And the context switch is the overhead, as I understand it.
Fine-grained context switching is going to cause slowdowns.

Are you discussing message passing on top of shared memory or shared memory on
top of message passing? I got a little dizzy, but there are ways to do either
that play to the strengths of the underlying platform. (There's also
something to be said for the hybrid approach.)

You can use shared memory to deliver messages using locking and releasing
threads to send and receive to/from the shared area. This allows quite fast
delivery of messages. In a message-passing system you might actually have
such a thing happening to support processor-local messages.

Conversely, in message-passing systems a shared resource typically hides
behind a "resource manager" that operates receive-mode to dole out resource
services on demand. You could put shared data behind a resource manager,
which then will deliver (or simulate) synchronization services.

Upthread, markspace alluded to alternative scenarios where message passing or
shared memory might be the natural paradigm. Different layers, diverse
modules, might call for different patterns.
 
Lew

When NetBeans auto-generates hashCode() methods, I've noticed it
randomizes the constants it uses within a small range of values. I
don't think it matters much, but I'm just doing the same thing. Any
constant with some low-order bits set as well as some bits set above bit
position 3 will do fine, I think.

31, 33, 35, 51, 53, 55, etc. all work pretty much the same, and generate
slightly different hash codes, meaning the chance of collision is
reduced if you use different constants once in a while.

I'm too lazy to do the research right now, but I'm going to delve into it
later. Here's what I recall:

- There's a difference between coefficient choices for the pseudorandom
generator (PRG) wrt cycle length - how long the pseudorandom sequence goes
before repeating.

- 31 gives a good cycle length.

- I think I recall 53 also being among the small selection of pretty decent
coefficients long-cycle-wise for 32-bit PRGs.

- I don't recall the choice being as wide as your post conveys.

- Cycle length is not the only measure of PRGs for hash goodness.

- There are ways to reduce the likelihood of hash collision for a given set of
values regardless of the goodness of the pseudorandom sequence, and the Java
API uses them.

Thus I conclude that your advice is sound, though by preference I will use a
prime coefficient, and that will be 31 for 32-bit hashes at least until I've
done the research again.
 
markspace

Are you discussing message passing on top of shared memory or shared
memory on top of message passing? I got a little dizzy, but there are
ways to do either that play to the strengths of the underlying platform.
(There's also something to be said for the hybrid approach.)


You'd have to read the article Arne posted to be sure. I think it talks
about a system with no cache coherency (so no bus sniffing to determine
dirty cache lines) and uses messages instead to achieve the same effect.

I think that's "shared memory on top of message passing" because there
really is no shared memory, just messages.

Upthread, markspace alluded to alternative scenarios where message
passing or shared memory might be the natural paradigm. Different
layers, diverse modules, might call for different patterns.


Different languages, too. Erlang is a language that uses message passing
inherently, unlike Java, which uses threads that share a heap space.
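
For a flavor of the difference, a sketch of message-passing style in
Java using java.util.concurrent (a rough analogue of an Erlang mailbox;
the "stop" sentinel protocol is mine):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Threads communicate only through the queue, never through shared
// mutable state.
public class Mailbox {
    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<String> mailbox =
                new ArrayBlockingQueue<String>(16);

        Thread receiver = new Thread(new Runnable() {
            public void run() {
                try {
                    String msg;
                    while (!(msg = mailbox.take()).equals("stop")) {
                        System.out.println("got: " + msg);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        receiver.start();

        mailbox.put("hello");  // send a message...
        mailbox.put("stop");   // ...then the shutdown sentinel
        receiver.join();
    }
}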

I think I also have to take back what I said upthread about the cache
flush message working. It wouldn't, by itself. You'd also have to send
a "lock" message when the critical region was entered. Otherwise you
could have your cache flushed out from under you when some other process
exited the critical region, even if you were still using it.

A "lock" can be done fairly efficiently on shared memory systems. Low
level "test and set" or "compare and swap" instructions will do it. But
on a message passing system messages tend to have higher overhead, and
you're back to the slower performance of Java 5 intrinsic locks again.
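
In Java terms that fast path is essentially a CAS loop; here's a
minimal spinlock sketch using java.util.concurrent.atomic (not how
HotSpot actually implements monitors):

import java.util.concurrent.atomic.AtomicBoolean;

// Minimal compare-and-swap spinlock. On shared-memory hardware the
// compareAndSet is a single instruction; on a pure message-passing
// design each acquire/release would instead cost a message round trip.
final class SpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        while (!locked.compareAndSet(false, true)) {
            Thread.yield();  // back off until the CAS succeeds
        }
    }

    void unlock() {
        locked.set(false);
    }
}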

I'm sure it would be possible to use Java on such a system. But I think
you'd have to use it differently. I.e., message passing is one of those
paradigms that require retraining a lot of programmers. Intel has some
good ideas but they have some dumb ideas too. I think it remains to be
seen which camp this falls in.
 
markspace

What we have is a fixed number of
lookups in a HashMap.


I don't think we do. The OP's example gave a fixed look-up depth, but
his description to me said "I'm going to continue adding HashMaps and
increasing the depth as long as I need to."

That's one reason I'm asking for clarification and a use case. I think it matters.
 
Arne Vajhøj

I don't think we do. The OP's example gave a fixed look-up depth, but
his description to me said "I'm going to continue adding HashMaps and
increasing the depth as long as I need to."

I cannot see that quote in the thread!?!?

Arne
 
Arne Vajhøj

I'm too lazy to do the research right now, but I'm going to delve into
it later. Here's what I recall:

- There's a difference between coefficient choices for the pseudorandom
generator (PRG) wrt cycle length - how long the pseudorandom sequence
goes before repeating.

For x(n+1) = (a * x(n) + b) mod c where c = 2^i, the only requirements
necessary for a maximal cycle of 2^i are that a % 4 == 1 and b % 2 == 1.

For x(n+1) = (a * x(n)) mod c where c = 2^i, the only requirements
necessary for a maximal cycle of 2^(i-2) are that a % 8 == 3 or a % 8 == 5,
and that x(1) % 2 == 1. (A quick empirical check is sketched at the end
of this post.)

- 31 gives a good cycle length.

- I think I recall 53 also being among the small selection of pretty
decent coefficients long-cycle-wise for 32-bit PRGs.

- I don't recall the choice being as wide as your post conveys.

- Cycle length is not the only measure of PRGs for hash goodness.

- There are ways to reduce the likelihood of hash collision for a given
set of values regardless of the goodness of the pseudorandom sequence,
and the Java API uses them.

Thus I conclude that your advice is sound, though by preference I will
use a prime coefficient, and that will be 31 for 32-bit hashes at least
until I've done the research again.

It is not obvious to me that coefficients that give long cycles
for LCGs also give a good distribution for hashes.
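
Here is the quick empirical check mentioned above (my own sketch; a
small modulus so the full cycle fits in a loop):

// Counts steps until the LCG returns to its seed. With a odd, the map
// x -> (a*x + b) mod 2^i is a bijection, so the orbit is a pure cycle.
static int cycleLength(int a, int b, int mod, int seed) {
    int x = seed;
    int steps = 0;
    do {
        x = (a * x + b) % mod;
        steps++;
    } while (x != seed && steps <= mod);
    return steps;
}

// cycleLength(53, 1, 1 << 16, 0) -> 65536: full cycle (53 % 4 == 1, b odd)
// cycleLength(31, 1, 1 << 16, 0) -> shorter (31 % 4 == 3)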

Arne
 
Tom Anderson

I for one welcome our new brainy... naw, that's overdone.

I guess any pedantic post correcting someone has to contain at least one
really ridiculous and obvious spelling error, doesn't it?

Skitt's Law, it's been called. And needless to say, i misspelled 'Lea'
when i composed the post - but due to the aforementioned law, i profred it
extra-carefully.

The Brian/Brain error is a particular favourite of mine; it always raises
a grin. Somehow, it does always seem to get applied to appropriately
eggheaded Brians:

http://news.ansible.co.uk/a147.html#09

tom
 
Tom Anderson

It is not obvious to me that coefficients that give long cycles for
LCGs also give a good distribution for hashes.

The whole discussion seems of academic interest to me (which is certainly
not to say it is of no interest!) - there are a number of hashes which are
faster and better-distributed than any multiply-and-add hash. Read these:

http://www.strchr.com/hash_functions
http://www.team5150.com/~andrew/noncryptohashzoo/

Implement this:

http://code.google.com/p/smhasher/wiki/MurmurHash3
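
As a taste, the 32-bit finalizer ("fmix32") from MurmurHash3, written
from memory - check the link above for the authoritative version:

// Mixes h so that every input bit affects every output bit.
static int fmix32(int h) {
    h ^= h >>> 16;
    h *= 0x85ebca6b;
    h ^= h >>> 13;
    h *= 0xc2b2ae35;
    h ^= h >>> 16;
    return h;
}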

tom
 
